Phonetic search is a method for fast search of spoken keywords in large volumes of audio recordings. The process consists of two stages: an indexing phase, in which a phonetic lattice is constructed, and a search phase, in which keywords are searched for in this lattice. The performance of phonetic-search systems is highly sensitive to the accuracy of phonetic recognition, and acoustic model training therefore requires substantial amounts of audio and linguistic resources. There is a growing demand for applications that support keyword spotting in many different languages, including under-resourced languages. Supporting such languages, however, poses a substantial challenge for phonetic search, since achieving even reasonable performance requires a large amount of training data. In this work, we propose methods for supporting a new language (the target language) with limited resources by using existing acoustic models of another language (the source language). In the indexing phase, acoustic models of the source language are used to generate phonetic lattices. Keywords in the target language are then searched for in these lattices using a cross-language phonetic mapping between target-language and source-language phonemes. This paper presents cross-language phonetic-search configurations that depend on the amount of target-language data available. Phonetic-search experiments were performed with Spanish as the target language and with American English and Levantine Arabic as source languages. Results are compared with standard monolingual acoustic modeling in Spanish and show that reasonable accuracy, adequate for practical retrieval of spoken words, can be achieved using different combinations of phonetic mappings.
Index Terms. Keyword-spotting; phonetic-search; under-resourced languages