ISCA Archive: SynData4GenAI 2024

Synthetic Data’s Transformative Role in Foundational Speech Models

Kos, Greece
31 August 2024

Chairs: Pedro Moreno Mengibar, Bhuvana Ramabhadran, Shinji Watanabe, and Ahmed Ali
doi: 10.21437/SynData4GenAI.2024



Morning Poster Session


Improving Text-To-Audio Models with Synthetic Captions
Zhifeng Kong, Sang-gil Lee, Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Rafael Valle, Soujanya Poria, Bryan Catanzaro

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
Samuele Cornell, Jordan Darefsky, Zhiyao Duan, Shinji Watanabe

Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments
Pai Zhu, Dhruuv Agarwal, Jacob W Bartel, Kurt Partridge, Hyun Jin Park, Quan Wang

Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model
Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob W Bartel, Kyle Kastner, Yuan Wang, Andrew Rosenberg, Quan Wang

On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
Nick Rossenbach, Sakriani Sakti, Ralf Schlüter

Leveraging LLM for Augmenting Textual Data in Code-Switching ASR: Arabic as an Example
Sadeen Alharbi, Reem Binmuqbil, Ahmed Ali, Raghad Aloraini, Saiful Bari, Areeb Alowisheq, Yaser Alonaizan

SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data
Yichen Lu, Jiaqi Song, Xuankai Chang, Hengwei Bian, Soumi Maiti, Shinji Watanabe

Using Voicebox-based Synthetic Speech for ASR Adaptation
Hira Dhamyal, Leda Sari, Vimal Manohar, Nayan Singhal, Chunyang Wu, Jay Mahadeokar, Matt Le, Apoorv Vyas, Bowen Shi, Wei-Ning Hsu, Suyoun Kim, Ozlem Kalinli

SpeechCaps: Advancing Instruction-Based Universal Speech Models with Multi-Talker Speaking Style Captioning
Chien-yu Huang, Min-Han Shih, Ke-Han Lu, Chi-Yuan Hsiao, Hung-yi Lee

On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures
Benedikt Hilmes, Nick Rossenbach, Ralf Schlüter



Afternoon Poster Session


Accent conversion using discrete units with parallel data synthesized from controllable accented TTS
Tuan-Nam Nguyen, Quan Pham, Alexander Waibel

Beyond Silence: Bias Analysis through Loss and Asymmetric Approach in Audio Anti-Spoofing
Hye-jin Shim, Md Sahidullah, Jee-weon Jung, Shinji Watanabe, Tomi Kinnunen

Audio Dialogues: Dialogues dataset for audio and music understanding
Arushi Goel, Zhifeng Kong, Rafael Valle, Bryan Catanzaro

Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech
Dareen Alharthi, Roshan S Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh

Improving Spoken Semantic Parsing using Synthetic Data from Large Generative Models
Roshan S Sharma, Suyoun Kim, Trang Le, Daniel A Lazar, Akshat Shrivastava, Kwanghoon An, Piyush Kansal, Leda Sari, Ozlem Kalinli, Mike Seltzer

Exploring synthetic data for cross-speaker style transfer in style representation based TTS
Lucas H Ueda, Leonardo Marques, Flávio Simões, Mário Uliani Neto, Fernando Runstein, Bianca Dal Bó, Paula D P Costa

Investigating the Use of Synthetic Speech Data for the Analysis of Spanish-Accented English Pronunciation Patterns in ASR
Margot Masson, Julie Carson-Berndsen

Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting
Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob W Bartel, Kyle Kastner, Yuan Wang, Andrew Rosenberg, Quan Wang

Navigating the United States Legislative Landscape on Voice Privacy: Existing Laws, Proposed Bills, Protection for Children, and Synthetic Data for AI
Satwik Dutta, John H Hansen

Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms
Joseph Konan, Shikhar Agnihotri, Ojas Bhargave, Shuo Han, Bhiksha Raj, Ankit Parag Shah, Yunyang Zeng

Naturalness and the Utility of Synthetic Speech in Model Pre-training
Diptasree Debnath, Asad Ullah, Helard Becerra, Andrew Hines




