This paper presents our contribution to the INTERSPEECH 2020 Breathing
Sub-challenge. Besides fulfilling the main goal of the challenge, which
is to automatically predict, from conversational speech, the breath
signals obtained from respiratory belts, we also analyse both the
original and the predicted signals in an attempt to overcome the main pitfalls
of the proposed systems. In particular, we identify the subsets of the
most irregular belt signals that yield the worst performance, as measured
by the Pearson correlation coefficient, and show how they affect the
results obtained by both the baseline end-to-end system and
variants such as a Bidirectional LSTM. The performance of this bidirectional
architecture indicates that future context is also relevant
when predicting breathing patterns.
We also study the
information retained in each component of the AM-FM decomposition of the
speech signal for this purpose, showing that the AM component significantly
outperforms the FM component in all experiments, but fails to surpass the
prediction results obtained using the original speech signal.
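
As a minimal illustration of what an AM-FM decomposition separates, the sketch below splits a signal into an amplitude envelope (AM) and an instantaneous frequency (FM). It assumes a Hilbert-transform analytic signal, which is one common way to obtain such a decomposition and is not necessarily the procedure used in this work. The function names, the toy signal and the 8 kHz sample rate are all hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT (assumed Hilbert-transform construction)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero out negative frequencies.
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)


def am_fm_decompose(x, sample_rate):
    """Return (AM envelope, FM instantaneous frequency in Hz)."""
    z = analytic_signal(x)
    am = np.abs(z)                          # amplitude modulation
    phase = np.unwrap(np.angle(z))          # continuous instantaneous phase
    fm = np.diff(phase) * sample_rate / (2.0 * np.pi)
    return am, fm


# Toy signal (hypothetical): a 200 Hz carrier with a slow 3 Hz envelope.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate    # one second of samples
envelope = 1.0 + 0.5 * np.cos(2 * np.pi * 3 * t)
toy = envelope * np.cos(2 * np.pi * 200 * t)

am, fm = am_fm_decompose(toy, sample_rate)
```

On this toy signal the AM output recovers the slow envelope and the FM output stays at the carrier frequency. The FM sequence is one sample shorter than the input because it is obtained by differencing the phase.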
Finally, we validate
the system’s performance in video-conferencing conditions by
using data augmentation, and we compare clinically relevant parameters,
such as breathing rate, extracted from both the original belt signals and the
ones predicted from the simulated video-conferencing signals.
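
The comparison described above can be sketched as follows: compute the Pearson correlation coefficient between a reference belt signal and a predicted one, then estimate a breathing rate from each. The spectral-peak rate estimator, the 0.1 to 1.0 Hz search band, the 25 Hz sample rate and the synthetic signals are assumptions made for illustration only; they are not taken from this work.

```python
import numpy as np


def pearson_r(reference, predicted):
    """Pearson correlation coefficient between two equal-length signals."""
    return float(np.corrcoef(reference, predicted)[0, 1])


def breathing_rate_bpm(signal, sample_rate, band=(0.1, 1.0)):
    """Breaths per minute from the dominant spectral peak inside `band` (Hz).

    This estimator is an assumption; peak counting is another option.
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()               # remove the DC offset
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    peak_hz = freqs[in_band][np.argmax(power[in_band])]
    return 60.0 * peak_hz


# Synthetic stand-ins (hypothetical): 60 s at 25 Hz, breathing at 0.25 Hz.
sample_rate = 25
t = np.arange(60 * sample_rate) / sample_rate
belt = np.sin(2 * np.pi * 0.25 * t)
# "Predicted" signal: attenuated, phase-shifted, with a small distortion.
predicted = (0.8 * np.sin(2 * np.pi * 0.25 * t + 0.3)
             + 0.1 * np.sin(2 * np.pi * 0.5 * t))

r = pearson_r(belt, predicted)
rate_belt = breathing_rate_bpm(belt, sample_rate)
rate_pred = breathing_rate_bpm(predicted, sample_rate)
```

The two measures answer different questions. The correlation is sensitive to phase shifts and waveform distortion, whereas the breathing rate can agree even when the waveforms differ, which is why a clinically oriented comparison may look at both.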