ISCA Archive Interspeech 2014

Speech activity detection for NASA Apollo space missions: challenges and solutions

Ali Ziaei, Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen, Douglas W. Oard

Speech Activity Detection (SAD) is a well-researched problem for communication, command and control applications, where audio segments are of short duration and solutions have been proposed for noisy as well as clean environments. In this study, we investigate the SAD problem using NASA's Apollo space mission data [1]. Unlike traditional speech corpora, the Apollo audio recordings are long-running (i.e., 6–12 days each). From a SAD perspective, the data poses several challenges: (i) noise distortion with variable SNR, (ii) channel distortion, and (iii) extended periods of non-speech activity. Here, we use the recently proposed Combo-SAD, which has performed remarkably well in DARPA RATS evaluations, as our baseline system [2]. Our analysis reveals that Combo-SAD performs well when speech and pause durations are balanced in the audio segment, but deteriorates significantly when speech is sparse or absent. To mitigate this problem, we propose a simple yet efficient technique that builds an alternative model of speech using data from a separate corpus and embeds this new information within the Combo-SAD framework. Our experiments show that the proposed approach has a major impact on SAD performance (i.e., +30% absolute), especially in audio segments that contain sparse or no speech information.
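The failure mode and the remedy described above can be illustrated with a toy sketch. This is not the Combo-SAD implementation and not necessarily how the authors embed the external model: it assumes a hypothetical one-dimensional per-frame feature, substitutes a simple 2-means split for the segment-level speech/non-speech model, and uses synthetic Gaussian data in place of both the Apollo audio and the separate corpus. The function names (`two_cluster_threshold`, `baseline_sad`, `anchored_sad`) are invented for illustration.

```python
import numpy as np

def two_cluster_threshold(feat, iters=50):
    """Split 1-D frame features into a low and a high cluster (2-means).
    Returns (threshold, low_cluster_mean)."""
    feat = np.asarray(feat, dtype=float)
    lo, hi = feat.min(), feat.max()
    for _ in range(iters):
        thr = 0.5 * (lo + hi)
        low, high = feat[feat <= thr], feat[feat > thr]
        if low.size == 0 or high.size == 0:
            break  # degenerate split, keep the last estimates
        lo, hi = low.mean(), high.mean()
    return 0.5 * (lo + hi), lo

def baseline_sad(feat):
    """Segment-only decision: frames above the 2-means threshold are
    labelled speech, even if the segment contains no speech at all."""
    thr, _ = two_cluster_threshold(feat)
    return np.asarray(feat, dtype=float) > thr

def anchored_sad(feat, external_speech_mean):
    """Assumed remedy: take the 'speech' reference from a separate corpus
    and keep only the segment's low cluster as the noise estimate."""
    _, noise_mean = two_cluster_threshold(feat)
    thr = 0.5 * (noise_mean + external_speech_mean)
    return np.asarray(feat, dtype=float) > thr

# Synthetic stand-ins (assumptions): speech frames ~ N(5, 1), noise ~ N(0, 1).
rng = np.random.default_rng(0)
external_speech = rng.normal(5.0, 1.0, 5000)  # "separate corpus" features
noise_only = rng.normal(0.0, 1.0, 2000)       # segment with no speech

print("baseline false-alarm rate:", baseline_sad(noise_only).mean())
print("anchored false-alarm rate:",
      anchored_sad(noise_only, external_speech.mean()).mean())
```

In this toy setting the segment-only split has to place its threshold inside the noise distribution when no speech is present, so many noise frames are labelled speech. Anchoring the speech side to an externally estimated mean moves the threshold away from the noise.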

[1] A. Sangwan, L. Kaushik, C. Yu, John H. L. Hansen and D. Oard, “Houston, We have a solution: Using NASA Apollo Program to advance Speech and Language Processing Technology,” Interspeech 2013.
[2] S. O. Sadjadi and John H. L. Hansen, “Unsupervised Speech Activity Detection using Voicing Measures and Perceptual Spectral Flux,” IEEE Signal Processing Letters, Vol. 20, No. 3, March 2013.