doi: 10.21437/SynData4GenAI.2024
Improving Text-To-Audio Models with Synthetic Captions
Zhifeng Kong, Sang-gil Lee, Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Rafael Valle, Soujanya Poria, Bryan Catanzaro
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
Samuele Cornell, Jordan Darefsky, Zhiyao Duan, Shinji Watanabe
Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments
Pai Zhu, Dhruuv Agarwal, Jacob W Bartel, Kurt Partridge, Hyun Jin Park, Quan Wang
Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model
Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob W Bartel, Kyle Kastner, Yuan Wang, Andrew Rosenberg, Quan Wang
On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
Nick Rossenbach, Sakriani Sakti, Ralf Schlüter
Leveraging LLM for Augmenting Textual Data in Code-Switching ASR: Arabic as an Example
Sadeen Alharbi, Reem Binmuqbil, Ahmed Ali, Raghad Aloraini, Saiful Bari, Areeb Alowisheq, Yaser Alonaizan
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data
Yichen Lu, Jiaqi Song, Xuankai Chang, Hengwei Bian, Soumi Maiti, Shinji Watanabe
Using Voicebox-based Synthetic Speech for ASR Adaptation
Hira Dhamyal, Leda Sari, Vimal Manohar, Nayan Singhal, Chunyang Wu, Jay Mahadeokar, Matt Le, Apoorv Vyas, Bowen Shi, Wei-Ning Hsu, Suyoun Kim, Ozlem Kalinli
SpeechCaps: Advancing Instruction-Based Universal Speech Models with Multi-Talker Speaking Style Captioning
Chien-yu Huang, Min-Han Shih, Ke-Han Lu, Chi-Yuan Hsiao, Hung-yi Lee
On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures
Benedikt Hilmes, Nick Rossenbach, Ralf Schlüter
Accent conversion using discrete units with parallel data synthesized from controllable accented TTS
Tuan-Nam Nguyen, Quan Pham, Alexander Waibel
Beyond Silence: Bias Analysis through Loss and Asymmetric Approach in Audio Anti-Spoofing
Hye-jin Shim, Md Sahidullah, Jee-weon Jung, Shinji Watanabe, Tomi Kinnunen
Audio Dialogues: Dialogues dataset for audio and music understanding
Arushi Goel, Zhifeng Kong, Rafael Valle, Bryan Catanzaro
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech
Dareen Alharthi, Roshan S Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh
Improving Spoken Semantic Parsing using Synthetic Data from Large Generative Models
Roshan S Sharma, Suyoun Kim, Trang Le, Daniel A Lazar, Akshat Shrivastava, Kwanghoon An, Piyush Kansal, Leda Sari, Ozlem Kalinli, Mike Seltzer
Exploring synthetic data for cross-speaker style transfer in style representation based TTS
Lucas H Ueda, Leonardo Marques, Flávio Simões, Mário Uliani Neto, Fernando Runstein, Bianca Dal Bó, Paula D P Costa
Investigating the Use of Synthetic Speech Data for the Analysis of Spanish-Accented English Pronunciation Patterns in ASR
Margot Masson, Julie Carson-Berndsen
Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting
Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob W Bartel, Kyle Kastner, Yuan Wang, Andrew Rosenberg, Quan Wang
Navigating the United States Legislative Landscape on Voice Privacy: Existing Laws, Proposed Bills, Protection for Children, and Synthetic Data for AI
Satwik Dutta, John H Hansen
Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms
Joseph Konan, Shikhar Agnihotri, Ojas Bhargave, Shuo Han, Bhiksha Raj, Ankit Parag Shah, Yunyang Zeng
Naturalness and the Utility of Synthetic Speech in Model Pre-training
Diptasree Debnath, Asad Ullah, Helard Becerra, Andrew Hines