Speech is usually observed after passing through some form of "channel" that introduces distortion. In some scenarios it is possible to build an explicit model of this channel distortion and use it to compensate the acoustic models. However, the accuracy of the distortion model is sometimes poor, and more general adaptation approaches are then required. This paper investigates these more general model-based adaptation approaches for modelling the communication channel, or link. In particular, the paper examines the interaction of link models with speaker adaptation and adaptive training. CMLLR link models with multiple transforms can yield multiple, inconsistent feature-spaces. When such a link model is combined with speaker adaptation that uses very few transforms, this inconsistency can limit the performance gains from adaptation. In contrast, using a front-end CMLLR (FE-CMLLR) transform yields a consistent feature-space for speaker adaptation. These schemes are compared on dialect Arabic conversational speech distorted by a communication channel. Preliminary results on this task indicate the benefits of performing adaptation in a consistent feature-space.
Index Terms: acoustic model adaptation, adaptive training