ISCA Archive Interspeech 2023

Insights into end-to-end audio-to-score transcription with real recordings: A case study with saxophone works

Juan Carlos Martínez-Sevilla, María Alfaro-Contreras, Jose J. Valero-Mas, Jorge Calvo-Zaragoza

Neural end-to-end Audio-to-Score (A2S) transcription aims to retrieve, in a single step, a score that encodes the music content of an audio recording. Because this formulation is recent, existing works have exclusively addressed controlled scenarios with synthetic data, which fail to provide conclusions applicable to real-world cases. In response to this gap in the literature, this work introduces a novel assortment of real saxophone recordings, together with their digital scores, and poses several experimental scenarios involving real and synthetic data. The obtained results confirm the adequacy of this A2S framework for dealing with real data and show the relevance of leveraging synthetic interpretations to improve the recognition rate in scenarios with real-data scarcity.


doi: 10.21437/Interspeech.2023-88

Cite as: Martínez-Sevilla, J.C., Alfaro-Contreras, M., Valero-Mas, J.J., Calvo-Zaragoza, J. (2023) Insights into end-to-end audio-to-score transcription with real recordings: A case study with saxophone works. Proc. INTERSPEECH 2023, 2793-2797, doi: 10.21437/Interspeech.2023-88

@inproceedings{martinezsevilla23_interspeech,
  author={Juan Carlos Martínez-Sevilla and María Alfaro-Contreras and Jose J. Valero-Mas and Jorge Calvo-Zaragoza},
  title={{Insights into end-to-end audio-to-score transcription with real recordings: A case study with saxophone works}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={2793--2797},
  doi={10.21437/Interspeech.2023-88},
  issn={2958-1796}
}