ISCA Archive ICSLP 2002

Auditory fovea based speech enhancement and its application to human-robot dialog system

Kazuhiro Nakadai, Hiroshi G. Okuno, Hiroaki Kitano

This paper presents an active direction-pass filter (ADPF) that separates sound originating from a specified direction by using a pair of microphones. Its application to front-end processing for speech recognition is also reported. The ADPF improves sound source separation by using accurate sound source directions obtained through multi-modal integration, together with active motor control that keeps the robot facing the sound source. Facing the source helps because the directional resolution is much higher at the center than at the periphery, a property similar to that of the visual fovea. To recognize the separated sound streams, a Hidden Markov Model (HMM) based automatic speech recognition system is built with multiple acoustic models trained on the output of the ADPF under various conditions. Experimental results with a preliminary dialog system show that it works well even when two speakers speak simultaneously.
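
The idea of passing only sound from a specified direction with two microphones can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a free-field, far-field model in which the interaural phase difference (IPD) of each frequency bin depends only on the source azimuth, and it handles only frequencies below the spatial aliasing limit of the microphone pair. The microphone spacing, the pass-range values, and all function names (`expected_ipd`, `pass_range_deg`, `direction_pass_mask`) are hypothetical choices made for illustration. The pass range is narrow at the front and widens toward the sides to mimic the fovea-like resolution described above.

```python
import numpy as np

SOUND_SPEED = 343.0   # m/s, speed of sound in air
MIC_DISTANCE = 0.18   # m, hypothetical spacing of the two microphones


def expected_ipd(freq_hz, azimuth_deg, mic_distance=MIC_DISTANCE):
    """IPD in radians predicted by a free-field, far-field model.

    Azimuth 0 is straight ahead; positive azimuth is toward the right microphone.
    """
    delay = mic_distance * np.sin(np.deg2rad(azimuth_deg)) / SOUND_SPEED
    return 2.0 * np.pi * np.asarray(freq_hz) * delay


def pass_range_deg(azimuth_deg, narrow=10.0, wide=30.0):
    """Hypothetical half-width of the pass range: narrow at the front, wide at the sides."""
    frac = min(abs(azimuth_deg), 90.0) / 90.0
    return narrow + (wide - narrow) * frac


def direction_pass_mask(left_spec, right_spec, freqs_hz, azimuth_deg):
    """Boolean mask of frequency bins whose observed IPD matches the pass range."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    half = pass_range_deg(azimuth_deg)
    lo = expected_ipd(freqs_hz, max(azimuth_deg - half, -90.0))
    hi = expected_ipd(freqs_hz, min(azimuth_deg + half, 90.0))
    # Phase of right relative to left; positive when the right microphone leads.
    observed = np.angle(np.asarray(right_spec) * np.conj(np.asarray(left_spec)))
    # Above c / (2 d) the phase wraps and this simple comparison is no longer valid.
    unaliased = freqs_hz < SOUND_SPEED / (2.0 * MIC_DISTANCE)
    return (observed >= lo) & (observed <= hi) & unaliased


def direction_pass_filter(left_spec, right_spec, freqs_hz, azimuth_deg):
    """Keep only the bins of the left spectrum that pass the direction mask."""
    mask = direction_pass_mask(left_spec, right_spec, freqs_hz, azimuth_deg)
    return np.asarray(left_spec) * mask
```

In use, `left_spec` and `right_spec` would be one short-time Fourier transform frame from each microphone, and `azimuth_deg` would come from whatever localization the system provides. Turning the robot to face the speaker moves the target toward azimuth 0, where this sketch applies its narrowest pass range.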