ISCA Archive ICSLP 1992

Frequency domain speech coding

Shane Switzer, Tim Anderson, Matthew Kabrisky, Steven K. Rogers, Bruce Suter

This paper describes research into speech coding techniques that aim to achieve high-quality speech transmittable at 4800 and 2400 bits per second (bps). The approach is to code the raw frequency domain representation of speech sampled at 8 kHz, representing the speech by a sparse set of frequency components. Four frequency selection schemes are implemented, and the resulting frequency coefficients (magnitude and phase) are coded efficiently for transmission. Specific techniques used in the speech coder include: (1) a recurrent neural network to make periodic/noiselike decisions, (2) variable-length windows for analysis and synthesis, and (3) representation of noiselike speech using frequency-banded energy information. The quality of the reconstructed speech is evaluated using listening tests that compare speech produced with the different frequency selection schemes alongside the original and sampled versions of the test utterances. Although the system does not achieve "toll quality" speech, the resulting speech is intelligible, as shown by its scores on a Modified Rhyme Test (MRT). At various noise levels, MRT scores for the reconstructed speech are not significantly lower than those achieved by the McAulay/Quatieri sinusoidal coder. An attempt is made to lower the transmission bit rate to 2400 bps by decreasing the number of frequency coefficients per unit time. The resulting speech quality suffers, but the MRT results show little degradation.
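
To make the idea of a sparse frequency representation concrete, the following is a minimal sketch and not the coder described in the paper. The abstract does not specify the four frequency selection schemes, so the sketch assumes the simplest possible rule: keep the largest-magnitude DFT bins of a frame and resynthesize from their magnitude and phase. The function names, the 64-sample frame length, and the two-tone test signal are all hypothetical; only the 8 kHz sampling rate comes from the abstract.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def select_sparse(frame, num_keep):
    """Keep the num_keep largest-magnitude bins in 0..N/2.

    Assumed selection rule (largest magnitude); the paper's own four
    schemes are not described in the abstract.
    Returns a list of (bin, magnitude, phase) sorted by bin index.
    """
    spectrum = dft(frame)
    half = range(len(frame) // 2 + 1)
    strongest = sorted(half, key=lambda k: abs(spectrum[k]), reverse=True)[:num_keep]
    return [(k, abs(spectrum[k]), cmath.phase(spectrum[k])) for k in sorted(strongest)]


def reconstruct(components, n):
    """Resynthesize a real frame of length n from the retained components."""
    out = [0.0] * n
    for k, mag, phase in components:
        # DC and Nyquist bins occur once; every other bin has a mirror image.
        scale = 1.0 if k == 0 or 2 * k == n else 2.0
        for t in range(n):
            out[t] += scale * mag / n * math.cos(2 * math.pi * k * t / n + phase)
    return out


SAMPLE_RATE = 8000  # Hz, as stated in the abstract
FRAME_LEN = 64      # hypothetical frame length (125 Hz bin spacing)

# Hypothetical test frame: two tones that fall exactly on DFT bins 4 and 12.
frame = [
    math.sin(2 * math.pi * 500 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 1500 * t / SAMPLE_RATE)
    for t in range(FRAME_LEN)
]

components = select_sparse(frame, num_keep=2)
rebuilt = reconstruct(components, FRAME_LEN)
```

In this toy case two retained components reproduce the frame almost exactly, because the signal contains only two sinusoids aligned with DFT bins. Real speech spreads energy across many bins, so the number of coefficients kept per unit time trades bit rate against quality, which is the trade-off the abstract reports when moving from 4800 to 2400 bps.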