ISCA Archive Interspeech 2020

Competency Evaluation in Voice Mimicking Using Acoustic Cues

Abhijith G., Adharsh S., Akshay P. L., Rajeev Rajan

In this paper, the fusion of i-vectors with prosodic features is used to identify the most competent voice imitator within a deep neural network (DNN) framework. The experiments analyze the spectral and prosodic characteristics of the voice during imitation. Spectral features include mel-frequency cepstral coefficients (MFCC) and modified group delay features (MODGDF). Prosodic features, computed using Legendre polynomial approximation, serve as complementary information to the i-vector model. The proposed system evaluates the competence of artists in voice mimicking and ranks them according to classifier scores, which are compared against a reference ranking based on mean opinion score (MOS). If the artist with the highest MOS is ranked first by the proposed system, a hit occurs. The DNN-based classifier makes its decision based on the probability values of the nodes in the output layer. Performance is evaluated on a mimicry dataset using top-X hit criteria. A top-2 hit rate of 81.81% is obtained in the fusion experiment. The experiments demonstrate the potential of the i-vector framework, and of its fusion with prosodic features, for competency evaluation in voice mimicking.
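The top-X hit criterion can be made concrete with a short sketch. It assumes that each trial consists of several artists imitating one target voice, that each artist receives a single system score (for instance, the DNN output-layer probability for the target), and that a top-X hit means the highest-MOS artist appears among the X highest-scoring artists. The trial format and the names `is_top_x_hit`, `top_x_hit_rate`, and `EXAMPLE_TRIALS` are hypothetical, and the scores and MOS values are toy numbers, not data from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical trial format: (system_scores, mos), each mapping artist -> value.
# One trial = one target voice imitated by several artists.
Trial = Tuple[Dict[str, float], Dict[str, float]]


def is_top_x_hit(system_scores: Dict[str, float],
                 mos: Dict[str, float],
                 x: int) -> bool:
    """True if the artist with the highest MOS is among the x artists
    ranked highest by the system score (ties keep dict order)."""
    if x < 1:
        raise ValueError("x must be at least 1")
    # Reference: the artist judged best by listeners (highest MOS).
    best_by_mos = max(mos, key=lambda a: mos[a])
    # System ranking: artists sorted by classifier score, best first.
    ranked = sorted(system_scores, key=lambda a: system_scores[a], reverse=True)
    return best_by_mos in ranked[:x]


def top_x_hit_rate(trials: List[Trial], x: int) -> float:
    """Percentage of trials counted as top-x hits."""
    if not trials:
        raise ValueError("need at least one trial")
    hits = sum(is_top_x_hit(scores, mos, x) for scores, mos in trials)
    return 100.0 * hits / len(trials)


# Toy example (made-up values): three trials, three artists each.
EXAMPLE_TRIALS: List[Trial] = [
    ({"A": 0.7, "B": 0.2, "C": 0.1}, {"A": 4.1, "B": 3.5, "C": 2.0}),
    ({"A": 0.3, "B": 0.5, "C": 0.2}, {"A": 4.0, "B": 3.0, "C": 2.5}),
    ({"A": 0.1, "B": 0.3, "C": 0.6}, {"A": 4.5, "B": 3.0, "C": 2.0}),
]

for x in (1, 2, 3):
    print(f"top-{x} hit rate: {top_x_hit_rate(EXAMPLE_TRIALS, x):.2f}%")
# top-1 hit rate: 33.33%
# top-2 hit rate: 66.67%
# top-3 hit rate: 100.00%
```

Under this reading, a top-1 hit is exactly the rank-1 hit defined above, and increasing X can only keep or raise the hit rate, since each top-X list contains the top-(X-1) list.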