A two-layer non-negative matrix factorization (NMF) model is proposed for vocabulary discovery. The model first extracts low-level vocabulary patterns from a histogram of co-occurrences of Gaussians. Latent units are then discovered by spectral embedding of the Gaussians at layer 1. Layer 2 discovers vocabulary patterns from the histogram of co-occurrences of these latent units. On the Aurora2/Clean database, unordered word error rates improve from the low-level representation to the two-layer model. The relation between the latent units and the states of an HMM is discussed.
Index Terms: non-negative matrix factorization, hidden Markov models, speech recognition
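As a rough illustration of the building block used at both layers, the sketch below builds a histogram of co-occurrences from a sequence of symbol indices and factorizes a matrix of such histograms with NMF. It is a minimal sketch under several assumptions that are not taken from the abstract: each frame is reduced to a single hard symbol index (a Gaussian at layer 1, a latent unit at layer 2), the factorization minimizes the Kullback-Leibler divergence with the standard multiplicative updates, and the names `cooccurrence_histogram`, `nmf_kl`, `lag`, and `rank` are hypothetical. The spectral embedding step that links the two layers is not shown.

```python
import numpy as np


def cooccurrence_histogram(labels, n_symbols, lag=1):
    """Count ordered pairs (labels[t], labels[t + lag]) in one utterance.

    Assumption: hard symbol indices per frame; lag must be >= 1.
    Returns a vector of length n_symbols**2, where entry a * n_symbols + b
    counts how often symbol a is followed by symbol b after `lag` frames.
    """
    labels = np.asarray(labels, dtype=int)
    pair_index = labels[:-lag] * n_symbols + labels[lag:]
    return np.bincount(pair_index, minlength=n_symbols ** 2).astype(float)


def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorize V (features x utterances) as W @ H with W, H >= 0.

    Uses the multiplicative updates for the KL divergence; this choice of
    cost function is an assumption made for illustration.
    """
    rng = np.random.default_rng(seed)
    n_features, n_utterances = V.shape
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_utterances)) + eps
    for _ in range(n_iter):
        ratio = V / (W @ H + eps)
        W *= (ratio @ H.T) / (H.sum(axis=1) + eps)
        ratio = V / (W @ H + eps)
        H *= (W.T @ ratio) / (W.sum(axis=0)[:, None] + eps)
    return W, H


# Toy usage: three short "utterances" over 3 symbols, one histogram per column.
utterances = [[0, 1, 0, 1, 2], [2, 2, 1, 0, 1], [0, 1, 2, 2, 1]]
V = np.stack([cooccurrence_histogram(u, n_symbols=3) for u in utterances], axis=1)
W, H = nmf_kl(V, rank=2)
```

In this sketch the columns of `W` play the role of recurring co-occurrence patterns and the columns of `H` indicate how strongly each pattern is present in each utterance. A second layer would apply the same two functions to sequences of latent-unit indices instead of Gaussian indices.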