ISCA Archive Eurospeech 1995

Top-down speech detection and n-best meaning search in a voice activated telephone extension system

Kazuya Takeda, Shingo Kuroiwa, Masaki Naito, Seiichi Yamamoto

In this paper, a robust speech detection method and an effective N-best search method are proposed. In the proposed speech endpoint detection method, robustness to varying speech levels is improved by using the likelihood of partially matched word sequences instead of the short-time speech level used in conventional methods. As a result, the degradation in recognition accuracy caused by endpoint detection failures remains very small even at an SNR of 7 dB, where level-based speech detection fails completely. In the proposed N-best search method, candidates are kept more effectively by merging word sequences whose meanings are identical. Because this reduces the number of candidates, the time needed to reorder the N-best candidates is cut to one quarter without any degradation in recognition accuracy.
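The idea of merging N-best candidates that share a meaning can be illustrated with a small sketch. It assumes that each hypothesis is a (word sequence, log score) pair and that a hypothetical `extension_meaning` function maps a word sequence to a meaning key by dropping filler words (the `FILLERS` set and the example names are invented for illustration). Among hypotheses with the same meaning, only the best-scoring one is kept. For simplicity the merge runs as a post-processing step over a finished list. The abstract describes merging while candidates are being kept during the search, so this is not the authors' implementation.

```python
from typing import Callable, Dict, List, Tuple

# A hypothesis is (word sequence, log-likelihood score); higher is better.
Hypothesis = Tuple[Tuple[str, ...], float]

# Hypothetical filler words that do not change which extension is requested.
FILLERS = frozenset({"uh", "please", "to", "extension"})


def extension_meaning(words: Tuple[str, ...]) -> str:
    """Hypothetical meaning key: the word sequence with filler words removed."""
    return " ".join(w for w in words if w not in FILLERS)


def merge_by_meaning(
    hyps: List[Hypothesis],
    meaning_of: Callable[[Tuple[str, ...]], str],
    n: int,
) -> List[Hypothesis]:
    """Keep only the best-scoring hypothesis per meaning, then return the top n."""
    best: Dict[str, Hypothesis] = {}
    for words, score in hyps:
        key = meaning_of(words)
        # Replace the stored hypothesis only if this one scores strictly higher.
        if key not in best or score > best[key][1]:
            best[key] = (words, score)
    # Reorder the surviving, meaning-distinct candidates by score.
    return sorted(best.values(), key=lambda h: h[1], reverse=True)[:n]
```

For example, with the candidates `("sato", "please")`, `("sato",)` and `("uh", "sato")`, all three collapse into one meaning, `"sato"`. The remaining slots in the N-best list can then hold genuinely different alternatives such as `"kato"` or `"saito"`, and the later reordering step has fewer candidates to process.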