ISCA Archive Interspeech 2015

Using tilt for automatic emphasis detection with Bayesian networks

Yishuang Ning, Zhiyong Wu, Xiaoyan Lou, Helen Meng, Jia Jia, Lianhong Cai

This paper proposes a new framework for emphasis detection from natural speech, where emphasis refers to a word or part of a word perceived as standing out from its surrounding words. Labeling emphatic words in speech recordings plays a significant role not only in human-computer interaction, but also in building speech corpora for expressive speech synthesis. Many previous studies train their models on global features, neglecting the effectiveness of local ones. In this paper, we introduce tilt parameters, which correspond to the phonetic prominence of an intonation event, to the task. In addition, traditional approaches such as emphasis detection with support vector machines (SVMs) neglect the correlations between features, which degrades detection accuracy. We therefore use Bayesian networks (BNs), which model the dependencies between features, as the detector. Experimental results demonstrate that BNs outperform both the baseline and SVMs on this task. Specifically, by combining the tilt feature with the traditional segmental features and semitone, the proposed method yields an 11.6% improvement in emphasis detection accuracy compared with the baseline and a 2.2%-3.1% improvement compared with other feature combinations.
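
The tilt and semitone features named in the abstract can be illustrated with a short sketch. The abstract does not give the paper's exact feature definitions, so the code below is an assumption: it follows the tilt intonation model as it is commonly described, where an intonation event has a rise and a fall, each with an F0 amplitude and a duration. The function names, the argument names, and the 100 Hz semitone reference are hypothetical choices made for this example.

```python
import math


def tilt_parameter(a_rise, a_fall, d_rise, d_fall):
    """Tilt of one intonation event (assumed definition, not taken from the paper).

    a_rise, a_fall: F0 excursion of the rise and fall parts (e.g. in Hz).
    d_rise, d_fall: durations of the rise and fall parts (e.g. in seconds).
    Returns a value in [-1, 1]: +1 is a pure rise, -1 is a pure fall.
    """
    amp_sum = abs(a_rise) + abs(a_fall)
    dur_sum = d_rise + d_fall
    if amp_sum == 0 or dur_sum == 0:
        raise ValueError("event has no F0 movement or no duration")
    # Amplitude tilt and duration tilt are computed separately, then averaged.
    tilt_amp = (abs(a_rise) - abs(a_fall)) / amp_sum
    tilt_dur = (d_rise - d_fall) / dur_sum
    return 0.5 * (tilt_amp + tilt_dur)


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert F0 in Hz to semitones relative to ref_hz (reference is an assumption)."""
    return 12.0 * math.log2(f0_hz / ref_hz)
```

Under these assumptions, a symmetric rise-fall event gives a tilt of 0, and doubling F0 relative to the reference gives 12 semitones. Per-event values like these would then be combined with the segmental features as input to the detector.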