ISCA Archive Eurospeech 1997

A novel tree-based clustering algorithm for statistical language modeling

G. Damnati, J. Simonin

This paper proposes a new method for clustering words into classes in order to define a statistical language model. The purpose of the algorithm is to decrease the computational cost of the clustering task without degrading speech recognition performance. The algorithm performs a bottom-up hierarchical clustering using the reciprocal neighbours method, which consists of merging several pairs of classes within a single iteration. Experiments on a spontaneous speech corpus are presented, with results given in terms of both perplexity and word recognition error rate. We obtain a large reduction in the number of iterations needed to build a classification tree, and thus a reduction in the CPU time needed to build the model, as well as a reduction in both perplexity and word error rate.
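The idea of merging several pairs per iteration can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the merging criterion, so the sketch assumes a toy setting: items are 1-D points and the distance between two classes is the distance between their centroids. The names `reciprocal_neighbour_clustering` and `centroid_distance` are hypothetical. Two classes are reciprocal neighbours when each is the other's nearest neighbour, and every such pair is merged in the same pass.

```python
from typing import Callable, List, Tuple

Cluster = Tuple[int, ...]  # a class is a tuple of item indices


def reciprocal_neighbour_clustering(
    n_items: int,
    distance: Callable[[Cluster, Cluster], float],
) -> List[List[Cluster]]:
    """Bottom-up clustering that merges all reciprocal-neighbour pairs per pass.

    Returns one list of newly created classes per iteration.
    """
    clusters: List[Cluster] = [(i,) for i in range(n_items)]
    history: List[List[Cluster]] = []
    while len(clusters) > 1:
        # Nearest neighbour of every class (ties go to the lowest index).
        nearest = []
        for a in range(len(clusters)):
            candidates = [b for b in range(len(clusters)) if b != a]
            nearest.append(
                min(candidates, key=lambda b: distance(clusters[a], clusters[b]))
            )
        # A pair (a, b) is reciprocal when a's nearest is b and b's nearest is a.
        pairs = [
            (a, b)
            for a, b in enumerate(nearest)
            if a < b and nearest[b] == a
        ]
        merged_indices = {i for pair in pairs for i in pair}
        merged = [clusters[a] + clusters[b] for a, b in pairs]
        history.append(merged)
        # Keep untouched classes and add the merged ones for the next pass.
        clusters = [
            c for i, c in enumerate(clusters) if i not in merged_indices
        ] + merged
    return history


# Toy data (assumption): 1-D points standing in for words.
points = [0.0, 1.0, 10.0, 11.0, 50.0]


def centroid_distance(a: Cluster, b: Cluster) -> float:
    """Assumed criterion: absolute difference between class centroids."""
    mean_a = sum(points[i] for i in a) / len(a)
    mean_b = sum(points[i] for i in b) / len(b)
    return abs(mean_a - mean_b)


history = reciprocal_neighbour_clustering(len(points), centroid_distance)
for step, new_classes in enumerate(history, start=1):
    print(f"iteration {step}: merged into {new_classes}")
```

On this toy input the first pass merges two pairs at once, so the tree is built in three iterations, whereas merging one pair at a time would take four. In a language modeling setting the distance would presumably be replaced by a criterion computed from word statistics, which this sketch leaves unspecified.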


doi: 10.21437/Eurospeech.1997-685

Cite as: Damnati, G., Simonin, J. (1997) A novel tree-based clustering algorithm for statistical language modeling. Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997), 2715-2718, doi: 10.21437/Eurospeech.1997-685

@inproceedings{damnati97_eurospeech,
  author={G. Damnati and J. Simonin},
  title={{A novel tree-based clustering algorithm for statistical language modeling}},
  year=1997,
  booktitle={Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997)},
  pages={2715--2718},
  doi={10.21437/Eurospeech.1997-685},
  issn={1018-4074}
}