ISCA Archive Interspeech 2025

Revisiting WFST-based Hybrid Japanese Speech Recognition System for Individuals with Organic Speech Disorders

Naoki Hojo, Ryoichi Takashima, Chihiro Sugiyama, Nobukazu Tanaka, Kanji Nohara, Kazunori Nozaki, Tetsuya Takiguchi

End-to-end automatic speech recognition (ASR) technology has advanced significantly, yet it remains ineffective for individuals with speech disorders. Japanese ASR in particular faces unique challenges because of its diverse character set, which includes kanji (Chinese characters) as well as hiragana and katakana (the Japanese phonetic syllabaries). Adapting an end-to-end ASR model to this task requires extensive training data. In this paper, we revisit a WFST-based hybrid ASR system, which decomposes recognition into an acoustic model, a pronunciation dictionary, and a language model. This approach is effective when speech data is limited because speech data is needed only to train the acoustic model, while the other components are learned from text data alone. In addition, to enhance the acoustic model, we introduce multi-step model adaptation using synthetic speech. Experimental results on speakers with organic speech disorders demonstrate that the proposed system outperforms Whisper.
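
For orientation, the decomposition described above can be written in the textbook form of hybrid decoding. This is a sketch under standard assumptions rather than the exact formulation used in this paper: the symbols $X$ (acoustic features), $W$ (word sequence), and $Q$ (phone sequence) are introduced here for illustration, and the Viterbi (max) approximation over pronunciations is assumed.

\[
\hat{W} \;=\; \arg\max_{W} P(W \mid X)
\;\approx\; \arg\max_{W} \, \max_{Q} \;
\underbrace{P(X \mid Q)}_{\text{acoustic model}} \;
\underbrace{P(Q \mid W)}_{\text{pronunciation dictionary}} \;
\underbrace{P(W)}_{\text{language model}}
\]

Only $P(X \mid Q)$ is estimated from speech; $P(Q \mid W)$ and $P(W)$ can be built from text resources, which is why the approach suits low-resource speech conditions. In a WFST decoder these factors are commonly compiled into a single search graph by composition, for example $L \circ G$ for a lexicon transducer $L$ and a grammar (language model) transducer $G$, with further transducers for phonetic context and HMM topology in many toolkits. Those additional transducers are an assumption about typical practice, not a detail stated in the abstract.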