ISCA Archive SpeechProsody 2008

Holistic and prosodic representation of the segmental aspect of speech

Nobuaki Minematsu, T. Nishimura, D. Saito, S. Asakawa, Y. Qiao

Speech communication involves several steps: encoding, transmission, and decoding. At each step, acoustic distortions are inevitably introduced by non-linguistic factors such as differences in age, gender, microphone, line, room, and the auditory characteristics of a hearer’s ears. Despite this large variability, humans can process speech very precisely. Recently, the first author proposed a novel representation of speech [1, 2] that is completely invariant to these factors. It focuses only on the dynamic motions in speech and completely discards the static features. The high validity of this new representation for speech recognition has already been verified experimentally [3, 4, 5]. In this paper, we show that the new representation of the segmental aspect of speech can be interpreted as a kind of holistic and prosodic feature, because the representation captures speech as music, i.e., as a timbre-based melody.