The articulators of human speech can only move relatively slowly, which causes the acoustic properties of speech to change gradually and continuously over time. Nevertheless, this speech continuity is rarely exploited to discriminate between different phones. To exploit it, this paper investigates a multiple-frame MFCC representation, which is expected to retain sufficient time-continuity information, in combination with a supervised dimensionality reduction method whose goal is to find low-dimensional representations that optimally separate different phone classes. Speech continuity information is integrated into this framework through regularization terms that penalize discontinuities. Experimental results on TIMIT phonetic classification show that these regularizers help improve the separability of phone classes.
Index Terms: Dimensionality Reduction; Contextual Representation; TIMIT Phone Classification; Regularization; Laplacian Smoothing
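As a hedged illustration of the kind of objective described above (not the paper's exact formulation), a supervised projection $W$ applied to stacked multi-frame MFCC vectors $x_t$ might combine an LDA-style class-separability criterion with a Laplacian-smoothing penalty on temporal discontinuities; the symbols $S_b$, $S_w$, and $\lambda$ below are illustrative assumptions:
\[
\max_{W}\;
\operatorname{tr}\!\left[\left(W^{\top} S_w W\right)^{-1} W^{\top} S_b W\right]
\;-\;
\lambda \sum_{t} \left\lVert W^{\top} x_{t+1} - W^{\top} x_{t} \right\rVert^{2},
\]
where $S_b$ and $S_w$ denote between- and within-class scatter matrices of the stacked MFCC frames, and $\lambda$ controls how strongly discontinuities between consecutive projected frames are penalized.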