In this paper, we study the tight integration of speaker diarization with speaker-adaptive speech recognition in our broadcast transcription system. We motivate the use of speech transcripts in the diarization process and analyze their effect on diarization performance and computational cost. Further, we evaluate speaker adaptation under various scenarios of speaker segmentation and diarization of an audio stream. For better insight, we also evaluate the limit performance of the system by substituting most of its components with oracle ones.
Index Terms: Speaker diarization, i-vectors, speaker adaptation, CMLLR, broadcast transcription