ISCA Archive Interspeech 2014

An empirical study of multilingual and low-resource spoken term detection using deep neural networks

Jie Li, Xiaorui Wang, Bo Xu

Building on our previous work, this paper investigates how a shared-hidden-layer multilingual DNN (SHL-MDNN) can improve a multilingual spoken term detection (STD) system. Seven languages are used in our experiments: Arabic, English, German, Japanese, Korean, Mandarin and Spanish. Compared with our original multilingual STD system, which is based on subspace GMMs (SGMMs), the resulting system reduces the average equal error rate (EER) over the seven languages by 17.2%. We also evaluate the STD system under low-resource conditions, choosing Mandarin and English as the two target languages and simulating different amounts of available resources. The experimental results show that cross-lingual model transfer substantially improves the STD system in low-resource settings. To further improve performance, we also apply dropout during cross-lingual model transfer. However, we observe no significant improvement in our experiments, which suggests that dropout is not very effective for the cross-lingual model transfer task.
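For readers unfamiliar with the shared-hidden-layer idea, the following is a minimal structural sketch in plain Python, not the authors' implementation. It assumes the common SHL-MDNN layout in which the hidden layers are shared by all languages and each language has its own softmax output layer, and it models cross-lingual transfer as attaching a fresh output layer for a new language on top of the existing shared layers. The class name `SharedHiddenLayerDNN`, the method names, the layer sizes and the random initialisation are all hypothetical, and no training is performed.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return a dense layer as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def affine(layer, x):
    """Compute W x + b for one input vector."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class SharedHiddenLayerDNN:
    """Hypothetical sketch: shared hidden layers plus one output layer per language."""

    def __init__(self, n_in, hidden_sizes, outputs_per_language, seed=0):
        self.rng = random.Random(seed)
        sizes = [n_in] + list(hidden_sizes)
        # Hidden layers are shared by every language.
        self.shared = [make_layer(a, b, self.rng) for a, b in zip(sizes, sizes[1:])]
        # Each language owns its softmax output layer.
        self.heads = {
            lang: make_layer(sizes[-1], n_out, self.rng)
            for lang, n_out in outputs_per_language.items()
        }

    def add_language(self, lang, n_out):
        """Transfer step (assumed): keep the shared layers, add a new output layer."""
        last_hidden_size = len(self.shared[-1][1])
        self.heads[lang] = make_layer(last_hidden_size, n_out, self.rng)

    def posteriors(self, lang, features):
        """Forward one feature vector through the shared layers and the language's head."""
        h = features
        for layer in self.shared:
            h = sigmoid(affine(layer, h))
        return softmax(affine(self.heads[lang], h))


# Illustrative sizes only; the real feature and output dimensions are not given here.
model = SharedHiddenLayerDNN(4, [8, 8], {"English": 5, "Mandarin": 6})
english_post = model.posteriors("English", [0.1, 0.2, 0.3, 0.4])
model.add_language("Korean", 3)
korean_post = model.posteriors("Korean", [0.1, 0.2, 0.3, 0.4])
```

In a low-resource setting, the new output layer (and optionally the shared layers) would then be trained on the limited target-language data; that training step is omitted from this sketch.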


doi: 10.21437/Interspeech.2014-399

Cite as: Li, J., Wang, X., Xu, B. (2014) An empirical study of multilingual and low-resource spoken term detection using deep neural networks. Proc. Interspeech 2014, 1747-1751, doi: 10.21437/Interspeech.2014-399

@inproceedings{li14g_interspeech,
  author={Jie Li and Xiaorui Wang and Bo Xu},
  title={{An empirical study of multilingual and low-resource spoken term detection using deep neural networks}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={1747--1751},
  doi={10.21437/Interspeech.2014-399},
  issn={2308-457X}
}