ISCA Archive SLaTE 2019

Comparison of automatic syllable stress detection quality with time-aligned boundaries and context dependencies

Chiranjeevi Yarra, Manoj Kumar Ramanathi, Prasanta Kumar Ghosh

Syllable stress is detected automatically using a classifier trained on stress labels and on features computed from the acoustics within each syllable. In real scenarios, the syllable data, such as time-aligned syllable boundaries, is typically estimated using an acoustic model (AM) and a lexicon. The quality of the AM and the lexicon therefore affects the stress detection performance (accuracy). In this work, we analyse how the accuracy varies on the ISLE corpus, which contains spoken English utterances from non-native speakers. In the analysis, we consider five AMs and five lexicons containing native English pronunciations augmented with different percentages of the non-native pronunciations collected from the corpus. For each AM and lexicon combination, we estimate the syllable data using two existing forced-alignment techniques and observe that the accuracies obtained with features computed from the two sets of estimated syllable data are comparable. Further, we propose a set of features based on the context dependencies of the syllable nuclei. For all combinations, the accuracy is higher when the context-based features are augmented with the acoustic features, and the highest accuracy is obtained for the combination whose estimated syllable data has the least error. Across the five lexicons, for Italian (ITA) speakers the highest and lowest accuracies are obtained when the lexicon includes all and none of the non-native pronunciations, respectively, whereas for German (GER) speakers the highest and lowest accuracies are obtained when it includes none and all of them, respectively.
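To make the idea of augmenting acoustic features with context information concrete, the following is a minimal sketch, not the authors' feature set. It assumes that each syllable nucleus in an utterance is already described by a fixed-length acoustic feature vector (for example duration, mean energy and mean F0, which are assumptions here), and it builds context-dependent features by appending the vectors of the neighbouring nuclei, zero-padding at utterance boundaries. The function name `add_context_features` and the `width` parameter are hypothetical.

```python
from typing import List


def add_context_features(nucleus_feats: List[List[float]], width: int = 1) -> List[List[float]]:
    """Augment each syllable nucleus's acoustic features with its neighbours' features.

    Hypothetical scheme: for nucleus i, concatenate its own feature vector with the
    vectors of nuclei i-width .. i+width (excluding i itself). Neighbours that fall
    outside the utterance are replaced by zero vectors of the same dimension.
    """
    if not nucleus_feats:
        return []
    dim = len(nucleus_feats[0])
    zeros = [0.0] * dim
    n = len(nucleus_feats)
    augmented = []
    for i in range(n):
        row = list(nucleus_feats[i])  # acoustic features of the current nucleus
        for offset in range(-width, width + 1):
            if offset == 0:
                continue  # the current nucleus is already included
            j = i + offset
            # Append the neighbour's features, or zeros beyond the utterance edges.
            row.extend(nucleus_feats[j] if 0 <= j < n else zeros)
        augmented.append(row)
    return augmented


# Three nuclei, each with two (assumed) acoustic features.
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
for row in add_context_features(feats):
    print(row)
```

In this sketch the augmented vectors would simply replace the acoustic-only vectors as input to the stress classifier; concatenating neighbours is one plausible way to expose the relative prominence of a nucleus, since stress is judged relative to surrounding syllables.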