ISCA Archive Interspeech 2015
ISCA Archive Interspeech 2015

Combination of diverse subword units in spoken term detection

Shi-wook Lee, Kazuyo Tanaka, Yoshiaki Itoh

This paper focuses on the following two points: First, we try to clarify the effect of combination systems from two aspects, accuracy and heterogeneity. And then we evaluate our unique subword unit, called Sub-Phonetic Segment (SPS) to maximize performance improvement by combination. Combination systems usually yield higher performance than any individual system. When the systems being combined are individually accurate but also mutually heterogeneous, the improvement by combination can be maximized. From this consideration, we estimate heterogeneity by correlation of false alarm errors of combined systems and confirm that lower correlation of two systems yields the better performance improvement by combination. Comparative tests of several combination approaches are carried out on subword-based spoken term detection. Since subword-based systems use constrained linguistic knowledge, it is fairly straightforward to verify the heterogeneity of combined systems. Experimental results show that the most significant improvements can be achieved by combination of two different subword units, triphone and SPS, which are highly heterogeneous subword units with low correlation of false alarm detections.