ISCA Archive Interspeech 2006

Open-vocabulary spoken document retrieval based on new subword models and subword phonetic similarity

Kohei Iwata, Yoshiaki Itoh, Kazunori Kojima, Masaaki Ishigame, Kazuyo Tanaka, Shi-wook Lee

A new type of video retrieval system is proposed that identifies a target video section by searching for a word passage submitted as a quoted speech or text query. The proposed system has two distinguishing characteristics. First, it is based on subword models such as phonemes, syllables, and morphemes, so it can handle any type of query, including new words and personal names. Second, it relies on the acoustic similarity between subword models. In addition, new subword models were constructed for the retrieval system to improve performance. The new models are based on two concepts: they are context-dependent, and they are more sophisticated along the time axis than phone models. Experiments confirmed the effectiveness and scope of the proposed spoken document retrieval system, and suitable subword models for the proposed method are discussed.
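The abstract does not specify the matching procedure, so the following is only an illustrative sketch of how a query could be located in a document when both are represented as subword sequences and compared by phonetic similarity. It assumes a dynamic-programming alignment with a free start and end point in the document. The distance table `TOY_DISTANCE`, the functions `sub_cost` and `spot_query`, and the `gap_cost` parameter are all hypothetical names and values chosen for illustration, not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical phonetic distances in [0, 1] between subword units.
# Pairs that are not listed fall back to the maximum distance of 1.0.
TOY_DISTANCE: Dict[Tuple[str, str], float] = {
    ("b", "p"): 0.3,
    ("d", "t"): 0.3,
    ("m", "n"): 0.2,
}


def sub_cost(a: str, b: str,
             table: Dict[Tuple[str, str], float] = TOY_DISTANCE) -> float:
    """Substitution cost: 0 for identical units, table lookup otherwise."""
    if a == b:
        return 0.0
    return table.get((a, b), table.get((b, a), 1.0))


def spot_query(query: List[str], document: List[str],
               gap_cost: float = 1.0) -> Tuple[float, int]:
    """Find the document position where the query matches best.

    Returns (best_cost, end), where `end` is the exclusive index in
    `document` at which the best-matching section ends.
    """
    n = len(document)
    # An empty query matches at any document position with zero cost,
    # which lets the match start anywhere in the document.
    prev = [0.0] * (n + 1)
    for q in query:
        curr = [prev[0] + gap_cost] + [0.0] * n
        for j in range(1, n + 1):
            curr[j] = min(
                prev[j - 1] + sub_cost(q, document[j - 1]),  # align units
                prev[j] + gap_cost,      # query unit missing from document
                curr[j - 1] + gap_cost,  # extra unit in the document
            )
        prev = curr
    # The match may also end anywhere, so take the best final-row cell.
    end = min(range(n + 1), key=lambda j: prev[j])
    return prev[end], end


if __name__ == "__main__":
    cost, end = spot_query(["t", "a", "n"], ["k", "o", "d", "a", "m", "i"])
    print(cost, end)
```

In this toy example the query `t a n` has no exact occurrence in the document, but the section `d a m` is phonetically close under the assumed distances, so it is returned as the best match. This is how similarity-based matching can tolerate recognition errors and out-of-vocabulary queries.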