This paper investigates a method for the real-time reconstruction of normal speech from whispers. Such a system could serve aphonic individuals as a voice prosthesis, and could also improve verbal communication in situations where normal speech is not appropriate. The normal speech is synthesized using the mixed excitation linear prediction (MELP) model. Differences between whispered and phonated speech are discussed, and methods are proposed for estimating the parameters of this model from whispered speech for real-time synthesis. These methods include modification of the formants, smoothing of the noisy linear prediction spectra, and synthesis of the excitation signal. Trade-offs between the computational complexity, delay, and accuracy of the different methods are discussed.
Index Terms. Whispered speech; voice parameter extraction; voice parameter modification; voice prosthesis