ISCA Archive Interspeech 2006

Improving speech recognition of two simultaneous speech signals by integrating ICA BSS and automatic missing feature mask generation

Ryu Takeda, Shun'ichi Yamamoto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Robot audition systems require the capabilities of sound source separation and recognition of the separated sounds, since we hear a mixture of sounds in our daily lives, especially mixtures of speech. We report a robot audition system with a pair of omni-directional microphones embedded in a humanoid that recognizes two simultaneous talkers. It first separates the sound sources by Independent Component Analysis (ICA) with the single-input multiple-output (SIMO) model. Spectral distortion in the separated sounds is then estimated to generate missing feature masks. Finally, the separated sounds are recognized by Automatic Speech Recognition (ASR) based on missing-feature theory (MFT). The novel aspect of our system is that spectral distortion is estimated in the time-frequency domain, in terms of feature vectors, from estimation errors in the SIMO-ICA signals. The resulting system outperformed the baseline robot audition system by 7%.
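
To make the mask-generation step concrete, here is a minimal sketch of threshold-based missing feature mask generation. It is not the authors' exact algorithm: the per-bin distortion estimate, the relative-error criterion, and the fixed threshold value are all illustrative assumptions; the paper derives the distortion from SIMO-ICA estimation errors.

```python
# Hypothetical sketch: binary missing-feature mask from an estimated
# spectral distortion map. Assumptions: per-bin distortion magnitudes are
# available and a fixed relative threshold separates reliable bins.
import numpy as np

def generate_mf_mask(separated_spec: np.ndarray,
                     distortion_estimate: np.ndarray,
                     threshold: float = 0.2) -> np.ndarray:
    """Return a binary mask over time-frequency bins.

    separated_spec      -- (T x F) magnitude spectrogram of a separated source
    distortion_estimate -- (T x F) estimated spectral distortion (same shape)
    threshold           -- relative distortion above which a bin is "missing"
    """
    # Relative distortion per bin; small epsilon avoids division by zero.
    rel_error = distortion_estimate / (separated_spec + 1e-10)
    # Reliable bins (mask = 1) are used by the MFT-based recognizer;
    # unreliable bins (mask = 0) are ignored or marginalized during decoding.
    return (rel_error < threshold).astype(np.float32)

if __name__ == "__main__":
    # Example usage with random data standing in for real spectrograms.
    T, F = 100, 257
    spec = np.abs(np.random.randn(T, F))
    distortion = 0.1 * np.abs(np.random.randn(T, F))
    mask = generate_mf_mask(spec, distortion)
    print("fraction of reliable bins:", mask.mean())
```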


doi: 10.21437/Interspeech.2006-92

Cite as: Takeda, R., Yamamoto, S., Komatani, K., Ogata, T., Okuno, H.G. (2006) Improving speech recognition of two simultaneous speech signals by integrating ICA BSS and automatic missing feature mask generation. Proc. Interspeech 2006, paper 1729-Thu1CaP.1, doi: 10.21437/Interspeech.2006-92

@inproceedings{takeda06_interspeech,
  author={Ryu Takeda and Shun'ichi Yamamoto and Kazunori Komatani and Tetsuya Ogata and Hiroshi G. Okuno},
  title={{Improving speech recognition of two simultaneous speech signals by integrating ICA BSS and automatic missing feature mask generation}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1729-Thu1CaP.1},
  doi={10.21437/Interspeech.2006-92},
  issn={2958-1796}
}