ISCA Archive Interspeech 2011

Speech recognition in mixed sound of speech and music based on vector quantization and non-negative matrix factorization

Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa

This paper describes a speech recognition method for mixed sound consisting of speech and music. The method removes only the music component, using vector quantization (VQ) and non-negative matrix factorization (NMF). For isolated word recognition with a clean speech model, an improvement of about 15% was obtained compared with not removing the music. Furthermore, a high recognition rate of about 90% was achieved even under the 0 dB condition when using a model trained on mixed sound from which the music had been removed by the VQ method.
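
To illustrate the general idea of NMF-based music removal, the following is a minimal sketch of supervised NMF separation. It is not the authors' implementation: the function name `separate_speech`, the fixed pre-trained dictionaries `W_speech` and `W_music`, the Euclidean-cost multiplicative update, and the soft mask are all assumptions chosen for illustration, and the abstract does not specify any of these details.

```python
import numpy as np

def separate_speech(V, W_speech, W_music, n_iter=200, eps=1e-9):
    """Estimate the speech part of a magnitude spectrogram V (freq x time).

    W_speech and W_music are hypothetical pre-trained, non-negative basis
    matrices (freq x components); only the activations H are estimated.
    """
    # Fixed dictionary: speech bases followed by music bases.
    W = np.hstack([W_speech, W_music])
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps

    for _ in range(n_iter):
        # Multiplicative update for the Euclidean cost ||V - WH||^2;
        # it keeps H non-negative as long as it starts non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)

    k = W_speech.shape[1]
    speech = W_speech @ H[:k]   # reconstructed speech component
    music = W_music @ H[k:]     # reconstructed music component

    # Soft mask: keep the fraction of each bin explained by speech.
    mask = speech / (speech + music + eps)
    return mask * V
```

In a sketch like this, the masked spectrogram would then be passed to feature extraction for the recognizer; how the dictionaries are trained and how VQ is combined with NMF are left open here because the abstract does not describe them.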