ISCA Archive ICSLP 1992

A phoneme labelling workbench using HMM and spectrogram reading knowledge

Shingo Fujiwara, Yasuhiro Komori, Masahide Sugiyama

This paper proposes a workbench for the phoneme labelling of speech data that reduces the effort required to create phoneme labels. The workbench consists of two modules: a user interface module and a phoneme segmentation engine that performs automatic phoneme segmentation. An operator labels speech data interactively in the workbench window, using the automatically detected phoneme boundaries as a reference. The phoneme segmentation engine is based on the hidden Markov model (HMM) and spectrogram reading knowledge (SRK). The performance of the engine was evaluated on a speech database of 5,240 Japanese words. Its segmentation rates were 96.1% (50 ms) and 89.1% (30 ms). The quality of the phoneme labels produced by operators using the workbench was then evaluated: the average error of the created labels was about 6 ms, with a standard deviation of about 10 ms. The paper presents the workbench architecture, the user interface, and the performance of the phoneme segmentation engine, and describes an implemented workbench.
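The abstract does not define how the segmentation rate or the label error is computed. As a minimal sketch, the following assumes that the segmentation rate is the fraction of automatic boundaries falling within a given tolerance of the corresponding reference boundary, and that the label error is the absolute deviation from the reference. The function names and the boundary times are hypothetical and serve only as illustration; they are not taken from the paper.

```python
from statistics import mean, pstdev


def segmentation_rate(auto_ms, ref_ms, tolerance_ms):
    """Fraction of boundaries within tolerance_ms of the reference.

    Assumes auto_ms and ref_ms are paired one-to-one (same phoneme sequence).
    """
    if len(auto_ms) != len(ref_ms) or not ref_ms:
        raise ValueError("boundary lists must be non-empty and of equal length")
    hits = sum(1 for a, r in zip(auto_ms, ref_ms) if abs(a - r) <= tolerance_ms)
    return hits / len(ref_ms)


def error_stats(label_ms, ref_ms):
    """Mean and population standard deviation of absolute boundary errors (ms)."""
    if len(label_ms) != len(ref_ms) or not ref_ms:
        raise ValueError("boundary lists must be non-empty and of equal length")
    errors = [abs(a - r) for a, r in zip(label_ms, ref_ms)]
    return mean(errors), pstdev(errors)


# Hypothetical boundary times in milliseconds (not data from the paper).
ref = [120, 250, 410, 600]
auto = [125, 290, 405, 660]

rate_50 = segmentation_rate(auto, ref, 50)  # boundaries within 50 ms
rate_30 = segmentation_rate(auto, ref, 30)  # boundaries within 30 ms
avg_err, std_err = error_stats(auto, ref)
print(rate_50, rate_30, avg_err, std_err)
```

Under this reading, a tighter tolerance can only lower the rate, which is consistent with the reported drop from 96.1% at 50 ms to 89.1% at 30 ms.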