ISCA Archive Interspeech 2022

Significance of single frequency filter for the development of children’s KWS system

Biswaranjan Pattanayak, Gayadhar Pradhan

Spotting a defined set of keywords in running speech is known as keyword spotting (KWS). When keywords are detected in speech from child speakers using an acoustic model built on speech from adult speakers, the system is referred to as a children's KWS system. Owing to the differences in pitch and speaking rate between the two kinds of speakers, the performance of a children's KWS system deteriorates severely. To address this issue, this paper proposes a pitch-independent feature extraction method based on the single frequency filtering (SFF) approach. The method finds the amplitude envelopes of the signal at Mel-spaced frequencies and averages each envelope over every analysis frame. The logarithm of these means is then computed, followed by the Discrete Cosine Transform (DCT), to obtain the required pitch-robust feature, denoted here as the Mel-spaced single frequency filtering cepstral coefficient (MS-SFF-CC). With a deep neural network-hidden Markov model (DNN-HMM) acoustic model, the proposed feature outperforms several other explored features under both pitch-matched and pitch-mismatched test scenarios, with and without data-augmented training.
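The feature pipeline described above (SFF envelopes at Mel-spaced frequencies, per-frame averaging, logarithm, DCT) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not specify the SFF filter form, the pole radius, the Mel formula, the frequency range, the frame sizes, or the DCT variant, so the following are assumptions made only for illustration: the commonly used single-pole SFF formulation with its pole at z = -r, `r = 0.995`, the 2595·log10(1 + f/700) Mel mapping, an unnormalized DCT-II, and all function names and default parameters.

```python
import cmath
import math


def hz_to_mel(f_hz):
    # Assumed Mel mapping (a common convention; not stated in the abstract).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(n_freqs, f_min, f_max):
    # Frequencies (Hz) spaced uniformly on the Mel scale between f_min and f_max.
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_freqs - 1)
    return [mel_to_hz(lo + i * step) for i in range(n_freqs)]


def sff_envelope(signal, freq_hz, fs, r=0.995):
    # Assumed SFF form: shift freq_hz to fs/2 by complex modulation, then apply
    # a single-pole filter y[n] = -r * y[n-1] + x[n] (pole at z = -r).
    # The amplitude envelope is the magnitude of the complex output.
    omega = math.pi - 2.0 * math.pi * freq_hz / fs
    y = 0j
    envelope = []
    for n, x in enumerate(signal):
        y = -r * y + x * cmath.exp(1j * omega * n)
        envelope.append(abs(y))
    return envelope


def ms_sff_cc(signal, fs, n_freqs=20, n_ceps=13, frame_len=200, hop=100,
              f_min=100.0, f_max=None, eps=1e-10):
    # All default values here are illustrative assumptions.
    f_max = 0.45 * fs if f_max is None else f_max
    freqs = mel_spaced_frequencies(n_freqs, f_min, f_max)
    envelopes = [sff_envelope(signal, f, fs) for f in freqs]
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Mean envelope per frequency within the frame, then logarithm.
        log_means = [
            math.log(sum(env[start:start + frame_len]) / frame_len + eps)
            for env in envelopes
        ]
        # Unnormalized DCT-II across the frequency axis.
        ceps = [
            sum(v * math.cos(math.pi * q * (k + 0.5) / n_freqs)
                for k, v in enumerate(log_means))
            for q in range(n_ceps)
        ]
        features.append(ceps)
    return features
```

Calling `ms_sff_cc(samples, fs)` on a list of audio samples returns one list of `n_ceps` coefficients per analysis frame. Because each envelope is obtained by filtering at a single frequency rather than from a short-time block transform, it is available at every sample, and the frame length only controls the averaging step.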