ISCA Archive Eurospeech 1997

Between recognition and synthesis - 300 bits/second speech coding

Mohamed Ismail, Keith Ponting

This paper describes a system for speech coding designed to operate at 300 bits/sec and below. A continuous speech recogniser is used to transcribe incoming speech as a sequence of sub-word units termed acoustic segments. Prosodic information is combined with segment identity to form a serial data stream suitable for transmission. A rule-based system maps segment identity and prosodic information to parameters suitable for driving a parallel formant speech synthesiser. Acoustic segment Hidden Markov Models (HMMs) are shown to perform as well as conventional phone HMMs during recognition. A segment error rate of 3.8% was achieved in a speaker-dependent, task-dependent configuration. An average data rate of 262 bits/sec was obtained. Speech from the synthesiser was better than that obtainable from a purely textual representation, though not as good as 2400 bits/sec Linear Predictive Coding (LPC) vocoded speech.
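
To illustrate how a stream of segment identities plus prosodic information could stay under 300 bits/sec, here is a minimal sketch of the bit-budget arithmetic. Every number and name in it is hypothetical: the inventory size, the pitch and duration quantisation levels, the segment rate, and the fixed-length field layout are assumptions chosen for illustration, not the paper's actual encoding or its reported figures.

```python
def field_bits(n_values: int) -> int:
    """Bits needed for a fixed-length field that can take n_values distinct values."""
    return (n_values - 1).bit_length()


def bits_per_segment(n_segment_types: int, pitch_levels: int, duration_levels: int) -> int:
    """Bits to transmit one segment: identity field plus two prosodic fields."""
    return (
        field_bits(n_segment_types)
        + field_bits(pitch_levels)
        + field_bits(duration_levels)
    )


def average_bit_rate(segments_per_second: int, n_segment_types: int,
                     pitch_levels: int, duration_levels: int) -> int:
    """Average data rate in bits/sec for a given segment rate."""
    return segments_per_second * bits_per_segment(
        n_segment_types, pitch_levels, duration_levels
    )


# Hypothetical values, NOT taken from the paper:
# 64 segment types, 16 pitch levels, 8 duration levels, 15 segments per second.
rate = average_bit_rate(15, 64, 16, 8)
print(rate)
```

With these assumed values each segment costs 6 + 4 + 3 = 13 bits, so the stream runs well below the 300 bits/sec target; a real system could trade inventory size against prosodic resolution within the same budget.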


doi: 10.21437/Eurospeech.1997-182

Cite as: Ismail, M., Ponting, K. (1997) Between recognition and synthesis - 300 bits/second speech coding. Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997), 441-444, doi: 10.21437/Eurospeech.1997-182

@inproceedings{ismail97_eurospeech,
  author={Mohamed Ismail and Keith Ponting},
  title={{Between recognition and synthesis - 300 bits/second speech coding}},
  year=1997,
  booktitle={Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997)},
  pages={441--444},
  doi={10.21437/Eurospeech.1997-182},
  issn={1018-4074}
}