ISCA Archive Interspeech 2022

NTF of Spectral and Spatial Features for Tracking and Separation of Moving Sound Sources in Spherical Harmonic Domain

Mateusz Guzik, Konrad Kowalczyk

This paper presents a novel Non-negative Tensor Factorization (NTF) based approach to the tracking and separation of moving sound sources, formulated in the Spherical Harmonic Domain (SHD). First, we redefine an existing Ambisonic NTF by introducing time dependence into the Spatial Covariance Matrix (SCM) model. Next, we extend the time-dependent SCM by incorporating a newly proposed NTF model of the spatial features, thereby introducing spatial components. To exploit the relationship between source positions in adjacent time frames, which follows from the natural continuity of the movement itself, we impose local smoothness on the time-dependent components of the spatial features. To this end, we propose a suitable posterior probability with a Gibbs prior and derive the corresponding update rules. The experimental evaluation is based on first-order Ambisonic recordings of speech utterances and musical instruments in several scenarios with moving sources.
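To make the two central ideas more concrete (a time-dependent SCM and a local smoothness prior on its time-varying spatial weights), the following is a minimal illustrative sketch and not the authors' model or update rules. It assumes that each candidate direction contributes a fixed rank-one SCM built from a real first-order spherical harmonic vector (ACN channel order, SN3D normalization), that the time-dependent SCM is a non-negative weighted sum of these direction SCMs, and that the Gibbs prior uses a simple quadratic energy between adjacent frames, p(Z) ∝ exp(-U(Z)). The function names `sh_vector_foa`, `direction_scms`, `time_dependent_scm` and `gibbs_smoothness_energy`, as well as the parameter `beta`, are hypothetical.

```python
import numpy as np


def sh_vector_foa(azimuth, elevation):
    """Real first-order SH vector (ACN order W, Y, Z, X; SN3D normalization)."""
    cos_el = np.cos(elevation)
    return np.array([
        1.0,                          # W (order 0)
        cos_el * np.sin(azimuth),     # Y
        np.sin(elevation),            # Z
        cos_el * np.cos(azimuth),     # X
    ])


def direction_scms(azimuths, elevations):
    """Rank-one SCM y y^T for each candidate direction; shape (O, 4, 4)."""
    ys = np.stack([sh_vector_foa(a, e) for a, e in zip(azimuths, elevations)])
    return np.einsum("oi,oj->oij", ys, ys)


def time_dependent_scm(weights, scms):
    """Hypothetical model R[t] = sum_o weights[o, t] * scms[o], weights >= 0.

    weights: (O, T) non-negative spatial weights, one column per time frame.
    Returns an array of shape (T, 4, 4).
    """
    return np.einsum("ot,oij->tij", weights, scms)


def gibbs_smoothness_energy(weights, beta=1.0):
    """Assumed quadratic Gibbs energy U = beta * sum_t ||z_t - z_{t-1}||^2.

    A lower energy means a higher prior probability, so trajectories whose
    spatial weights change little between adjacent frames are favoured.
    """
    diffs = np.diff(weights, axis=1)
    return beta * np.sum(diffs ** 2)


if __name__ == "__main__":
    # Candidate directions every 10 degrees on the horizontal plane (assumed grid).
    az = np.deg2rad(np.arange(0, 360, 10))
    scms = direction_scms(az, np.zeros_like(az))

    def bump(trajectory, kappa=10.0):
        # Soft, non-negative spatial weights concentrated around the source azimuth.
        return np.exp(kappa * (np.cos(az[:, None] - trajectory[None, :]) - 1.0))

    frames = np.arange(20)
    slow = bump(np.deg2rad(5.0 * frames))    # source moving 5 degrees per frame
    jumpy = bump(np.deg2rad(90.0 * frames))  # implausible 90-degree jumps
    print(time_dependent_scm(slow, scms).shape)
    print(gibbs_smoothness_energy(slow), gibbs_smoothness_energy(jumpy))
```

In this toy setup, a slowly moving source yields a lower smoothness energy (and thus a higher prior probability) than one whose weights jump between distant directions, which is the kind of continuity the local smoothness constraint is meant to exploit. The quadratic pairwise energy is chosen here only for simplicity; the actual prior and its update rules are those derived in the paper.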