Unsupervised cluster adaptive training of acoustic models is a promising way to improve recognition accuracy, especially for speech recognition systems that store massive sets of speech samples from unknown speakers. A key problem in clustering adaptation samples is how to classify their diverse acoustic characteristics. We propose a novel speech sample clustering method that focuses on the phoneme error trend in each speech sample. The proposed method classifies adaptation samples according to the trend of phoneme discrimination in each sample, and represents each sample as a compact phoneme error trend vector whose dimension is at most the number of phonemes. Experiments show that the phoneme error trend vectors are expressive enough to classify acoustic characteristics effectively, and compact enough to provide robustness against unknown data.
Index Terms: speech recognition, acoustic model adaptation, data clustering, phoneme error trend
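As an illustration only, the idea of a phoneme error trend vector can be sketched as a per-phoneme error rate computed for one speech sample. The abstract does not give the exact definition, so everything here is an assumption: the toy phoneme inventory `PHONEMES`, the hypothetical helper names, the use of aligned (intended phoneme, recognized phoneme) pairs as input, and the nearest-centroid assignment with squared Euclidean distance.

```python
from collections import Counter

# Hypothetical toy phoneme inventory; a real system would use its full phoneme set.
PHONEMES = ["a", "i", "u", "k", "s"]


def error_trend_vector(pairs, phonemes=PHONEMES):
    """Return one error rate per phoneme for a single speech sample.

    `pairs` is an iterable of (intended_phoneme, recognized_phoneme) tuples.
    How such aligned pairs are obtained is outside this sketch (assumption).
    Phonemes that never occur in the sample get an error rate of 0.0.
    """
    total = Counter()
    errors = Counter()
    for intended, recognized in pairs:
        total[intended] += 1
        if intended != recognized:
            errors[intended] += 1
    return [errors[p] / total[p] if total[p] else 0.0 for p in phonemes]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def assign_to_nearest(vector, centroids):
    """Return the index of the centroid closest to `vector`."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(vector, centroids[k]))


# Usage: one sample with four aligned phoneme pairs and two assumed centroids.
sample_pairs = [("a", "a"), ("a", "i"), ("k", "k"), ("s", "k")]
sample_vector = error_trend_vector(sample_pairs)
cluster_index = assign_to_nearest(
    sample_vector,
    [[0.0, 0.0, 0.0, 0.0, 0.0], [0.5, 0.0, 0.0, 0.0, 1.0]],
)
```

The vector has one entry per phoneme, which matches the stated bound that its dimension is at most the number of phonemes; the clustering step shown is a generic nearest-centroid assignment, not necessarily the procedure used in the paper.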