ISCA Archive ICSLP 2002

Using time-stretched pulses for accurate splitting of speech utterances played back in noisy reverberant environments

Dorothea Kolossa, Qiang Huo

In speech recognition, there is often a need to play back existing speech corpora in a new environment in order to generate a large amount of close-to-realistic speech data for developing a new application. Such playback corpora can be constructed efficiently by first concatenating many speech utterances into large files with a marker tone inserted between consecutive utterances, then playing back and recording these files, and finally extracting the individual utterances from the recordings by detecting the marker tone responses. In this paper, we propose using the TSP (time-stretched pulse) as the marker tone for this purpose. We present a matched-filtering-based procedure for detecting TSP responses in playback recordings. Using this approach, we have played back the TI46 speech corpus at loudspeaker-to-microphone distances ranging from 5 cm to 1.5 m in a noisy, reverberant lab environment. The playback sounds were recorded by an iPAQ Pocket PC. All speech utterances were successfully extracted, with a maximum temporal error of about 1.9 ms.
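The marker-based splitting idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the TSP is built from a common frequency-domain definition (flat magnitude, quadratic phase), the function names `make_tsp`, `matched_filter`, `find_markers`, and `split_utterances` are hypothetical, the frame length `n` and stretch parameter `m` are arbitrary, and the detector assumes the number of markers is known in advance rather than using whatever decision rule the paper applies.

```python
import numpy as np


def make_tsp(n=4096, m=1024):
    """Build a time-stretched pulse of frame length n (illustrative parameters).

    The pulse has a flat magnitude spectrum and a quadratic phase, so its
    energy is spread over roughly 2*m samples as a frequency sweep.
    """
    k = np.arange(n // 2 + 1)
    spectrum = np.exp(-1j * 4.0 * np.pi * m * k**2 / n**2)
    pulse = np.fft.irfft(spectrum, n)
    # Circularly shift the sweep towards the middle of the frame.
    pulse = np.roll(pulse, n // 2 - m)
    return pulse / np.max(np.abs(pulse))


def matched_filter(recording, template):
    """Cross-correlate recording with template via the FFT.

    Element i of the result is the correlation of the template with the
    recording segment starting at sample i (fully overlapping lags only).
    """
    n = len(recording) + len(template) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two, avoids wrap-around
    spec = np.fft.rfft(recording, nfft) * np.conj(np.fft.rfft(template, nfft))
    corr = np.fft.irfft(spec, nfft)
    return corr[: len(recording) - len(template) + 1]


def find_markers(recording, template, n_markers):
    """Pick the n_markers strongest, well-separated correlation peaks.

    Assumption for this sketch: the number of markers is known beforehand.
    """
    corr = np.abs(matched_filter(recording, template))
    guard = len(template)
    onsets = []
    for _ in range(n_markers):
        peak = int(np.argmax(corr))
        onsets.append(peak)
        # Suppress the neighbourhood so the same marker is not picked twice.
        corr[max(0, peak - guard): peak + guard] = 0.0
    return sorted(onsets)


def split_utterances(recording, onsets, marker_len):
    """Return the segments lying between consecutive detected markers."""
    return [recording[a + marker_len: b] for a, b in zip(onsets[:-1], onsets[1:])]
```

In use, one would call `find_markers(recording, make_tsp(), n_markers)` on the playback recording and pass the resulting onsets to `split_utterances`. Because the pulse has a flat spectrum, its autocorrelation is sharply peaked, which is what makes the correlation maximum a precise time reference even when the sweep itself is smeared by noise and reverberation.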