GlottHMM is a previously developed vocoder that has been used
successfully in HMM-based synthesis. It parameterizes speech into two
parts (glottal flow and vocal tract), following the functioning of the
human voice production mechanism. In this study, a new glottal vocoding
method, GlottDNN, is proposed. The GlottDNN vocoder is built on the
principles of its predecessor, GlottHMM, but the new vocoder introduces
three main improvements: GlottDNN (1) takes advantage of a new, more
accurate glottal inverse filtering method, (2) uses a new method of
deep neural network (DNN)-based glottal excitation generation, and
(3) proposes a new approach to band-wise processing of full-band speech.
The proposed GlottDNN vocoder was evaluated as part of a full-band
state-of-the-art DNN-based text-to-speech (TTS) synthesis system, and
compared against the release version of the original GlottHMM vocoder
and the well-known STRAIGHT vocoder. The results of the subjective
listening test indicate that GlottDNN improves TTS quality over
the compared methods.
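
The two-part parameterization (excitation, vocal tract) that glottal
vocoders rely on can be illustrated with a minimal sketch. The code below
is not the inverse filtering method used in GlottDNN or GlottHMM: it
swaps in plain autocorrelation linear prediction as a stand-in, and all
function names, the toy signal, and the filter order are assumptions made
for illustration only. It estimates an all-pole filter from a synthetic
signal, inverse filters the signal to obtain an excitation-like residual,
and resynthesizes the signal from the two parts.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Return autocorrelation values r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the linear prediction normal equations.

    Returns the coefficients [1, a1, ..., ap] of the inverse filter A(z).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def inverse_filter(a, x):
    """Apply the FIR filter A(z) to x, removing the estimated envelope."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]


def synthesis_filter(a, e):
    """Apply the all-pole filter 1/A(z) to the excitation e."""
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(a[k] * y[n - k]
                            for k in range(1, len(a)) if n - k >= 0))
    return y


def energy(x):
    return sum(v * v for v in x)


# Toy "speech": a noisy impulse train (stand-in for the glottal excitation)
# passed through a single two-pole resonance (stand-in for the vocal tract).
rng = random.Random(0)
excitation = [(1.0 if n % 80 == 0 else 0.0) + 0.01 * rng.gauss(0.0, 1.0)
              for n in range(800)]
radius, angle = 0.95, 2.0 * math.pi * 500.0 / 16000.0
a_true = [1.0, -2.0 * radius * math.cos(angle), radius * radius]
speech = synthesis_filter(a_true, excitation)

# Analysis: estimate the filter, then inverse filter to get the residual.
order = 2  # assumed order, matching the toy resonator
a_est = levinson_durbin(autocorrelation(speech, order), order)
residual = inverse_filter(a_est, speech)

# Synthesis: filtering the residual with 1/A(z) restores the signal.
resynth = synthesis_filter(a_est, residual)
```

In this sketch the residual plays the role of the excitation and `a_est`
the role of the vocal tract filter; a glottal vocoder replaces both the
estimation step and the excitation model with more refined methods.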