ISCA Archive ICSLP 2002

Modeling varying pauses to develop robust acoustic models for recognizing noisy conversational speech

Jin-Song Zhang, Satoshi Nakamura

Pauses occur frequently in noisy conversational speech and vary widely in their acoustics, which makes it difficult to automatically generate an accurate phonetic transcription of the training data for developing robust acoustic models. This paper proposes exploiting reliable phonetic heuristics about pauses in speech to aid the detection of these varying pauses. Building on this idea, a stepwise approach to optimizing pause HMMs was applied to the data of the SPINEII project and yielded a more accurate phonetic transcription. The cross-word triphone HMMs trained on this transcription achieved an absolute word error reduction of 5.2% compared with the baseline model.