ISCA Archive Eurospeech 2001

Automatic word acquisition from continuous speech

Helmut Lucke, Masanori Omote

A method for the unsupervised learning of lexical representations of unknown words is described. The unknown words are automatically extracted from continuous speech, and a clustering algorithm is used to derive word clusters and lexical representations based on the set of phonetic units used in the system. In experiments, we verify the robustness of the approach. An interesting feature is that extraction errors usually do no harm: wrongly extracted words tend to occupy clusters by themselves and thus do not adversely affect the modeling of correctly extracted words.
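
The clustering idea can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the distance measure, or how a lexical representation is chosen, so everything below is an assumption made for illustration: extracted segments are represented as phone sequences, compared by edit distance, grouped by single-link clustering under a threshold (the hypothetical parameter `max_dist`), and each cluster is summarized by its medoid. The phone strings are invented toy data, not results from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Phones = Tuple[str, ...]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cluster_tokens(tokens: List[Phones], max_dist: int = 1) -> List[List[Phones]]:
    """Single-link clustering: link any two tokens within max_dist (assumed threshold)."""
    parent = list(range(len(tokens)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            if edit_distance(tokens[i], tokens[j]) <= max_dist:
                parent[find(j)] = find(i)

    groups: Dict[int, List[Phones]] = {}
    for i, tok in enumerate(tokens):
        groups.setdefault(find(i), []).append(tok)
    return list(groups.values())


def medoid(cluster: List[Phones]) -> Phones:
    """Member with the smallest total distance to the rest of its cluster."""
    return min(cluster, key=lambda c: sum(edit_distance(c, o) for o in cluster))


# Toy phone sequences standing in for automatically extracted segments.
tokens: List[Phones] = [
    ("t", "o", "k", "y", "o"),
    ("s", "a", "k", "u", "r", "a"),
    ("t", "o", "k", "i", "o"),
    ("e", "t", "o"),                 # stands in for a wrongly extracted segment
    ("t", "o", "k", "y", "o"),
    ("s", "a", "k", "u", "l", "a"),
    ("d", "o", "k", "y", "o"),
]

clusters = cluster_tokens(tokens, max_dist=1)
lexicon = [medoid(c) for c in clusters]
```

In this toy run the stray segment has no close neighbour, so it ends up in a cluster of its own and plays no part in choosing the medoids of the other clusters, which mirrors the behaviour the abstract describes for extraction errors.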