ISCA Archive Interspeech 2007

Automatic transcription for a Web 2.0 service to search podcasts

Jun Ogata, Masataka Goto, Kouichirou Eto

This paper describes speech recognition techniques that enable "PodCastle", a Web 2.0 service where users can search and read transcribed podcast texts and correct recognition errors in them. Most previous speech recognizers had difficulty transcribing podcasts because podcasts include various kinds of content recorded under different conditions and cover recent topics that tend to contain many out-of-vocabulary words. To overcome these difficulties, we continuously improve the speech recognizers by using information aggregated on the basis of Web 2.0. For example, the language model is adapted on the fly to the topic of the target podcast, the pronunciations of out-of-vocabulary words are obtained from a Web 2.0 service, and the acoustic model is trained using the results of error correction by anonymous users. The experiments reported in this paper show that our techniques produce promising results for podcasts.