The problem of system performance degradation in mismatched acoustic
conditions is widely acknowledged in the community and is common
to many fields. Current state-of-the-art deep speaker embedding
models are domain-sensitive. The main idea of this research
is to develop a single method for automatic signal quality estimation
that makes it possible to evaluate short-term signal characteristics.
This paper presents a neural-network-based approach to blind speech
signal quality estimation in terms of signal-to-noise ratio (SNR) and
reverberation time (RT60); the approach can also classify the type of
underlying additive noise. Additionally,
this research revealed the need for an accurate voice activity detector
(VAD) that performs well in both clean and noisy unseen environments.
Therefore, a novel neural network VAD based on the U-Net architecture is
presented. The
proposed algorithms make it possible to analyze the NIST, SITW, and Voices
datasets, which are commonly used for objective comparison of speaker
verification systems, from a new point of view and to consider effective
calibration steps that improve speaker recognition quality on them.