This paper addresses single-channel speech enhancement when the noise statistics are difficult to predict. We describe an approach that aims to make maximal use of two features of speech, its temporal dynamics and speaker characteristics, to improve noise immunity. This is achieved by recognizing long speech segments as whole units in noise. In the recognition, clean speech sentences taken from a speech corpus are used as examples. Experiments have been conducted on the TIMIT database for separating speech from various types of nonstationary noise, including song, music, and crosstalk speech. The new approach has demonstrated improved performance over conventional speech enhancement algorithms in both objective and subjective evaluations.