We propose and investigate methods for identifying regions of speech that contain unexpected distortions not seen in the training data. The methods do not require knowledge of the correct labels and rely only on the divergence between the statistics of the test and training data. We propose two metrics, one with and one without probabilistic assumptions. Our experiments show that the proposed non-probabilistic method requires only a small amount of test data, on the order of several seconds, to stabilize, and that it correlates well with the recognition error observed on the test data.
Index Terms: Unexpected distortions, confidence estimation, machine recognition of speech