ISCA Archive SUS 1995

Improving recognition and synthesis of stressed speech via feature perturbation in a source generator framework

Sahar E. Bou-Ghazale, John H. L. Hansen

The objective of this work is twofold: to improve both the recognition and the synthesis of speech under stress. Improved recognition is achieved by generating simulated-stress tokens to replace neutral data in the recognizer training phase. The second goal, generating stressed speech from neutral speech, is accomplished by formulating speech parameter models for angry, Lombard effect, and loud speaking conditions and perturbing the parameters of neutral speech accordingly. The evaluations are based on a previously established stress database, SUSAS (Speech Under Simulated and Actual Stress). Results show that the token generation training method improved isolated-word recognition by an overall average of 15% compared with neutral-trained isolated-word recognition. Formal listener evaluations of stress-perturbed neutral speech show successful classification rates of 87% for angry speech, 75% for Lombard effect speech, and 92% for loud speech.


Cite as: Bou-Ghazale, S.E., Hansen, J.H.L. (1995) Improving recognition and synthesis of stressed speech via feature perturbation in a source generator framework. Proc. ESCA/NATO Workshop on Speech under Stress, 45-48

@inproceedings{boughazale95_sus,
  author={Sahar E. Bou-Ghazale and John H. L. Hansen},
  title={{Improving recognition and synthesis of stressed speech via feature perturbation in a source generator framework}},
  year=1995,
  booktitle={Proc. ESCA/NATO Workshop on Speech under Stress},
  pages={45--48}
}