In this paper, we propose a noisy-channel model for the task of voice conversion (VC). We use artificial neural networks (ANNs) to capture the speaker-specific characteristics of a target speaker, which avoids the need for any training utterances from a source speaker. We use articulatory features (AFs) as a canonical form, or speaker-independent representation, of the speech signal. Our studies show, however, that AFs contain a significant amount of speaker information in their trajectories. We propose suitable techniques to normalize the speaker-specific information in AF trajectories, and the resulting AFs are used in voice conversion. The results of voice conversion, evaluated using objective and subjective measures, confirm that the speaker-specific characteristics of the target speaker can be captured.
Index Terms: voice conversion, articulatory features, noisy-channel model, speaker-independent representation