The human singing voice is a highly expressive instrument capable of
producing a variety of complex timbres. Singing synthesis today is
popular among composers and studio musicians, who typically access the
technology through offline sequencing platforms. Only a couple of singing
synthesizers are known to combine the real-time capability and the
user interface needed to target live performance: these
are LIMSI’s Cantor Digitalis and Yamaha’s VOCALOID Keyboard.
However, both systems have their own shortcomings. The former is limited
to vowels and does not synthesize complete words or syllables. The
latter is real-time only down to the syllable level and thus requires
the entire syllable to be specified before it commences in the performance.
A demand remains for a singing synthesis system that truly solves the
problem of real-time synthesis: a system capable of synthesizing both
vowels and consonants to form entire words, with real-time control
down to the sub-frame level. Such a system must be versatile
enough to present the user with every possible acoustic option
for maximal control, yet intelligent enough to fill in
acoustic details too fine for human reflexes to control.
SERAPHIM is a real-time singing synthesizer developed in answer
to this demand. This paper presents the implementation of SERAPHIM
for performing musicians and studio musicians, together with how 3D
game developers may use SERAPHIM to deploy singing in their games.