ISCA Archive ICSLP 1996

An experimental Japanese/English interpreting video phone system

M. Karaorman, T. H. Applebaum, T. Itoh, M. Endo, Y. Ohno, M. Hoshimi, T. Kamai, K. Matsui, K. Hata, S. Pearson, Jean-Claude Junqua

In this paper we report on the architectural design issues and experiences gained while building and demonstrating an experimental interpreting video phone (IVP) system. The IVP system has been demonstrated in an internet home shopping simulation, presented simultaneously before live audiences in Japan and the U.S. An American shop assistant and a Japanese customer engaged in task-directed dialogues, each using their native language. In addition to their direct audio/visual contact by ISDN video phone, each participant heard a translation of the remote speaker’s utterances in a synthetic voice in real time. Each site used a medium-size vocabulary, continuous speech recognition system and a text-to-speech synthesis (TTS) system for the local language. Recognition results were transmitted over the internet to the remote site, where the corresponding translated sentence was spoken by TTS in the listener’s native language. All of the speech and language processing software components of the system were independently developed proprietary technologies of the authors’ laboratories, which were integrated using commercially available hardware and communication media. Difficulties encountered in developing the system, the accommodations which were made, and other experiences gained through the process are reported in this paper.
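
The data flow described in the abstract (local recognition, transmission of the recognition result, remote lookup of the corresponding translated sentence, remote TTS) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and not taken from the paper: the JSON message format, the phrase table `EN_TO_JA` and its entries, the `RemoteSite` class, and the `speak` callback that stands in for a TTS engine. The abstract does not state how translations were actually stored or produced, and the network link is replaced by a direct function call.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical phrase table (assumption): maps a recognized English
# sentence to a Japanese sentence for the remote listener.
EN_TO_JA: Dict[str, str] = {
    "may i help you": "いらっしゃいませ",
    "this one costs fifty dollars": "こちらは五十ドルです",
}


@dataclass
class RecognitionResult:
    """Output of the local speech recognizer (assumed shape)."""
    source_lang: str
    text: str


def encode(result: RecognitionResult) -> bytes:
    """Serialize a recognition result for transmission (assumed format)."""
    message = {"lang": result.source_lang, "text": result.text}
    return json.dumps(message).encode("utf-8")


def decode(payload: bytes) -> RecognitionResult:
    """Rebuild a recognition result from received bytes."""
    message = json.loads(payload.decode("utf-8"))
    return RecognitionResult(message["lang"], message["text"])


class RemoteSite:
    """Receives recognition results and speaks the matching translation."""

    def __init__(self, table: Dict[str, str], speak: Callable[[str], None]):
        self.table = table
        self.speak = speak  # stand-in for a TTS engine

    def receive(self, payload: bytes) -> Optional[str]:
        result = decode(payload)
        # Normalize case and whitespace before the table lookup.
        translated = self.table.get(result.text.lower().strip())
        if translated is not None:
            self.speak(translated)
        return translated


if __name__ == "__main__":
    spoken = []
    site = RemoteSite(EN_TO_JA, spoken.append)
    site.receive(encode(RecognitionResult("en", "May I help you")))
    print(spoken)
```

In this sketch an utterance with no table entry is silently dropped; a real system would need some fallback, which the abstract does not describe.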