doi: 10.21437/odyssey.2024
A Brief History of the NIST Speaker Recognition Evaluations
Craig S. Greenberg
Toward Robust and Discriminative Emotional Speech Representations
Carlos Busso
Exploring individual speaker behaviour within a forensic automatic speaker recognition system
Vincent Hughes, Chenzi Xu, Paul Foulkes, Philip Harrison, Poppy Welch, Finnian Kelly, David van der Vloed
Forensic speaker recognition with BA-LR: calibration and evaluation on a forensically realistic database
Imen Ben-Amor, Jean-François Bonastre, David van der Vloed
ROXSD: The ROXANNE Multimodal and Simulated Dataset for Advancing Criminal Investigations
Petr Motlicek, Erinc Dikici, Srikanth Madikeri, Pradeep Rangappa, Miroslav Jánošík, Gerhard Backfried, Dorothea Thomas-Aniola, Maximilian Schürz, Johan Rohdin, Petr Schwarz, Marek Kováč, Květoslav Malý, Dominik Boboš, Mathias Leibiger, Costas Kalogiros, Andreas Alexopoulos, Daniel Kudenko, Zahra Ahmadi, Hoang H. Nguyen, Aravind Krishnan, Dawei Zhu, Dietrich Klakow, Maria Jofre, Francesco Calderoni, Denis Marraud, Nikolaos Koutras, Nikos Nikolau, Christiana Aposkiti, Panagiotis Douris, Konstantinos Gkountas, Eleni Sergidou, Wauter Bosma, Joshua Hughes, Hellenic Police Team
Exploring speaker similarity based selection of relevant populations for forensic automatic speaker recognition
Linda Gerlach, Finnian Kelly, Kirsty McDougall, Anil Alexander
Attention-based Comparison on Aligned Utterances for Text-Dependent Speaker Verification
Nathan Griot, Mohammad Mohammadamini, Driss Matrouf, Raphael Blouet, Jean-François Bonastre
Additive Margin in Contrastive Self-Supervised Frameworks to Learn Discriminative Speaker Representations
Theo Lepage, Reda Dehak
An investigative study of the effect of several regularization techniques on label noise robustness of self-supervised speaker verification systems
Abderrahim Fathan, Xiaolin Zhu, Jahangir Alam
Using Pretrained Language Models for Improved Speaker Identification
Oleksandra Zamana, Priit Käärd, Tanel Alumäe
A Phonetic Analysis of Speaker Verification Systems through Phoneme selection and Integrated Gradients
Thomas Thebaud, Gabriel Hernández, Sarah Flora Samson Juan, Marie Tahon
Low-resource speech recognition and dialect identification of Irish in a multi-task framework
Liam Lonergan, Mengjie Qian, Neasa Ní Chiaráin, Christer Gobl, Ailbhe Ní Chasaide
Normalizing Flows for Speaker and Language Recognition Backend
Aleix Espuña, Amrutha Prasad, Petr Motlicek, Srikanth Madikeri, Christof Schuepbach
Joint Language and Speaker Classification in Naturalistic Bilingual Adult-Toddler Interactions
Satwik Dutta, Iván López-Espejo, Dwight Irvin, John H. L. Hansen
MAGLIC: The Maghrebi Language Identification Corpus
Karen Jones, Kevin Walker, Christopher Caruso, Stephanie Strassel
On Speaker Attribution with SURT
Desh Raj, Matthew Wiesner, Matthew Maciejewski, Paola Garcia, Daniel Povey, Sanjeev Khudanpur
Improving Speaker Assignment in Speaker-Attributed ASR for Real Meeting Applications
Can Cui, Imran Sheikh, Mostafa Sadeghi, Emmanuel Vincent
Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios
Juan Ignacio Alvarez-Trejos, Beltrán Labrador, Alicia Lozano-Diez
PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings
Joonas Kalda, Clément Pagés, Ricard Marxer, Tanel Alumäe, Hervé Bredin
Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?
Lin Zhang, Themos Stafylakis, Federico Landini, Mireia Diez, Anna Silnova, Lukáš Burget
Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization
Jenthe Thienpondt, Kris Demuynck
Device Feature based on Graph Fourier Transformation with Logarithmic Processing For Detection of Replay Speech Attacks
Mingrui He, Longting Xu, Han Wang, Mingjun Zhang, Rohan Kumar Das
Spoofing detection in the wild: an investigation of approaches to improve generalisation
Anh-Tuan Dao, Nicholas Evans, Driss Matrouf
Meaningful Embeddings for Explainable Countermeasures
Matan Karo, Arie Yeredor, Itshak Lapidot
a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification
Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen, Nicholas Evans, Jean-François Bonastre, Itshak Lapidot
Unraveling Adversarial Examples against Speaker Identification - Techniques for Attack Detection and Victim Model Classification
Sonal Joshi, Thomas Thebaud, Jesús Villalba, Najim Dehak
Converting Anyone's Voice: End-to-End Expressive Voice Conversion with A Conditional Diffusion Model
Zongyang Du, Junchen Lu, Kun Zhou, Lakshmish Kaushik, Berrak Sisman
Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion
Kun Zhou, Berrak Sisman, Carlos Busso, Bin Ma, Haizhou Li
Automatic Voice Identification after Speech Resynthesis using PPG
Thibault Gaudier, Marie Tahon, Anthony Larcher, Yannick Estève
Exploring speech style spaces with language models: Emotional TTS without emotion labels
Shreeram Suresh Chandra, Zongyang Du, Berrak Sisman
Discovering Invariant Patterns of Cognitive Decline Via an Automated Analysis of the Cookie Thief Picture Description Task
Anna Favaro, Najim Dehak, Thomas Thebaud, Jesús Villalba, Esther Oh, Laureano Moro-Velázquez
A Comparison of Differential Performance Metrics for the Evaluation of Automatic Speaker Verification Fairness
Oubaïda Chouchane, Christoph Busch, Chiara Galdi, Nicholas Evans, Massimiliano Todisco
Noise Robust Whisper Features for Dysarthric Automatic Speech Recognition
Japan Bhatt, Harsh Patel, Hemant A. Patil
Optimizing Auditory Immersion Safety on Edge Devices: An On-Device Sound Event Detection System
Reza Amini Gougeh, Zhang Nu, Zeljko Zilic
3MAS: a multitask, multilabel, multidataset semi-supervised audio segmentation model
Martin Lebourdais, Pablo Gimeno, Théo Mariotte, Marie Tahon, Alfonso Ortega, Anthony Larcher
Cross-Modal Transformers for Audio-Visual Person Verification
Rajasekhar Gnana Praveen, Jahangir Alam
Odyssey 2024 - Speech Emotion Recognition Challenge: Dataset, Baseline Framework, and Results
Lucas Goncalves, Ali N. Salman, Abinay Reddy Naini, Laureano Moro-Velázquez, Thomas Thebaud, Paola Garcia, Najim Dehak, Berrak Sisman, Carlos Busso
TalTech Systems for the Odyssey 2024 Emotion Recognition Challenge
Henry Härm, Tanel Alumäe
1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem
Mingjie Chen, Hezhao Zhang, Yuanchao Li, Jiachen Luo, Wen Wu, Ziyang Ma, Peter Bell, Catherine Lai, Joshua D. Reiss, Lin Wang, Philip C. Woodland, Xie Chen, Huy Phan, Thomas Hain
Double Multi-Head Attention Multimodal System for Odyssey 2024 Speech Emotion Recognition Challenge
Federico Costa, Miquel India, Javier Hernando
The ViVoLab System for the Odyssey Emotion Recognition Challenge 2024 Evaluation
Miguel Ángel Pastor, Alfonso Ortega, Antonio Miguel, Dayana Ribas
The CONILIUM proposition for Odyssey Emotion Challenge : Leveraging major class with complex annotations
Meysam Shamsi, Lara Gauder, Marie Tahon
Multimodal Audio-Language Model for Speech Emotion Recognition
Jaime Bellver, Ivan Martín-Fernández, Jose M. Bravo-Pacheco, Sergio Esteban, Fernando Fernández-Martínez, Luis Fernando D'Haro
IRIT-MFU Multi-modal systems for emotion classification for Odyssey 2024 challenge
Adrien Lafore, Clément Pagés, Leila Moudjari, Sebastião Quintas, Hervé Bredin, Thomas Pellegrini, Farah Benamara, Isabelle Ferrané, Jérôme Bertrand, Marie-Françoise Bertrand, Véronique Moriceau, Jérôme Farinas
Adapting WavLM for Speech Emotion Recognition
Daria Diatlova, Anton Udalov, Vitalii Shutov, Egor Spirin
MSP-Podcast SER Challenge 2024: L'antenne du Ventoux Multimodal Self-Supervised Learning for Speech Emotion Recognition
Jarod Duret, Yannick Estève, Mickael Rouvier