This paper describes work on Japanese voice search at Yahoo! Japan. We first describe several implementation details of our internal WFST-based decoder that make the voice search task more efficient, including a simple but effective compressed WFST arc representation. We then describe a baseline system and compare our internal decoder with two open-source decoders, Juicer and Julius. We also describe our initial attempts to adapt the baseline system through simple language model adaptation using manually transcribed, anonymized voice queries. To achieve this, we present a sequence of WFST operations that preserves consistency of segmentation between the manual and automatic transcriptions. We show that even this simple adaptation method yields a relative reduction in sentence error rate of up to 4.64%.
Index Terms: ASR, Japanese, voice search, WFST