ISCA Archive Eurospeech 2003

Time alignment for scenario and sounds with voice, music and BGM

Yamato Wada, Masahide Sugiyama

This paper proposes a new method for time alignment between a scenario and sounds containing voice, music and BGM (background music), in order to generate video captions automatically. The proposed method, the Voice-Music-Pause+BGM method, is based on the composition of voice and music models. Experiments evaluating the proposed method show that it performs about 10 to 60 times better than conventional time alignment methods.