doi: 10.21437/SSW.2025
Fundamental Rights, and the AI Act: Why Too Much is Not Enough
Anna-Mari Wallenberg
Hot topics in speech synthesis evaluation
Gérard Bailly, Elisabeth André, Erica Cooper, Esther Klabbers, Benjamin Cowan, Jens Edlund, Naomi Harte, Simon King, Sébastien Le Maguer, Roger K. Moore, Bernd Möbius, Sebastian Möller, Ayushi Pandey, Olivier Perrotin, Fritz Seebauer, Sofia Strömbergsson, David R. Traum, Christina Tånnander, Petra Wagner, Junichi Yamagishi, Yusuke Yasuda
Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis
Marc-André Carbonneau, Benjamin van Niekerk, Hugo Seuté, Jean-Philippe Letendre, Herman Kamper, Julian Zaïdi
Continual Subjective Evaluation Method of Speech by Merging Sort-based Preference Tests Towards Ever-Expanding Corpus of Human Ratings
Yusuke Yasuda, Junichi Yamagishi, Tomoki Toda
Explicit Emphasis Control in Text-to-Speech Synthesis
Judith Bauer, Frank Zalkow, Meinard Müller, Christian Dittmar
Style and Prosody control for Zero-shot Speech Synthesis
Antti Suni, Sébastien Le Maguer, Sofoklis Kakouros, Tuukka Törö, Juraj Šimko
Lina-Style: Word-Level Style Control in TTS via Interleaved Synthetic Data
Théodor Lemerle, Nicolas Obin, Axel Roebel
Prosody Labeling with Phoneme-BERT and Speech Foundation Models
Tomoki Koriyama
Knowledge distillation for Transformer-based text-to-speech models
Erik Henriksson, Thomas Merritt, Rasmus Dall, Felix Vaughan, Veronica Morfi
Methods of efficient speech tokenization with multilingual semantic distillation
Vadim Popov, Tasnima Sadekova, Assel Yermekova, Georgii Aparin
The impact of stress and boundary information in the input to neural TTS
Christina Tånnander, Joakim Gustafsson, Jens Edlund
TTSDS2: Robust Objective Evaluation for Human-Quality Synthetic Speech
Christoph Minixhofer, Ondřej Klejch, Peter Bell
Practical & Contextual Speech Synthesis Evaluation
Aidan Pine, Delaney Lothian, Sonya Bird, Marion Caldecott, MENETIYE, PENAC, Korin Richmond, Tye Swallow, SXEDTELISIYE, Cassia Valentini-Botinhao, Dan Wells, Patrick Littell
Using a DeepFake Classifier to Rank Speech Synthesis Quality
Natacha Miniconi, Meysam Shamsi, Aghilas Sini, Anthony Larcher
Creakiness, Breathiness, and Nasality Contribute to the Perceived Suitability of Synthesized Speech in a Pragmatically-Rich Domain
Harm Lameris, Nigel Ward
Modelling degrees of Spontaneity in Text-to-Speech Synthesis
Adaeze Adigwe, Simon King, Catherine Lai
NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech
Maksim Borisov, Egor Spirin, Daria Diatlova
Multi-interaction TTS toward professional recording reproduction
Hiroki Kanagawa, Kenichi Fujita, Aya Watanabe, Yusuke Ijima
SSLZip: Simple Autoencoding for Enhancing Self-Supervised Speech Representations in Speech Generation
Takenori Yoshimura, Shinji Takaki, Kazuhiro Nakamura, Keiichiro Oura, Takato Fujimoto, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
Speech Synthesis Evaluation from a voting perspective - a starting point
Sébastien Le Maguer, Juraj Šimko
RepeaTTS: Towards Feature Discovery through Repeated Fine-Tuning
Atli Sigurgeirsson, Simon King
Evaluating Speech Synthesis in a Nonstandardized, Multidialectal Context: A Teochew Case Study
Agathe Wallet, Ilaine Wang, Emmett Strickland, Pierre Magistry
Investigating effects of participant factors on the subjective evaluation of synthetic speech
Fritz Seebauer, Petra Wagner
Integrating Feedback Loss from Bi-modal Sarcasm Detector for Sarcastic Speech Synthesis
Zhu Li, Yuqing Zhang, Xiyuan Gao, Devraj Raghuvanshi, Nagendra Kumar, Shekhar Nayak, Matt Coler
How Silent Are Silent Speech Interfaces? Speech Reconstruction From Whispered and Silent Ultrasound Tongue Images
Gábor Gosztolya, Ibrahim Ibrahimov, Csaba Zainkó
Exploring Language Dependency in Ultrasound-to-Speech Synthesis
Ibrahim Ibrahimov, Csaba Zainkó, Gábor Gosztolya
Interpolating Speaker Identities for Synthetic Voice Generation
Juliana Francis, Joakim Gustafsson, Éva Székely
SemAlignVC: Enhancing zero-shot timbre conversion using semantic alignment
Shivam Mehta, Yingru Liu, Zhenyu Tang, Kainan Peng, Vimal Manohar, Shun Zhang, Mike Seltzer, Qing He, Mingbo Ma
Fast-VGAN: Lightweight Voice Conversion with Explicit Control of F0 and Duration Parameters
Mathilde Abrassart, Nicolas Obin, Axel Roebel
Speech synthesis for Walloon, an under-resourced minority language
Jose Felipe Espinosa Orjuela, Philippe Boula de Mareüil, Marc Evrard
Does multilingual and multi-speaker modeling improve low-resource TTS? Experiments on Sámi languages
Katri Hiovain-Asikainen, Antti Suni
EmoSSLSphere: Multilingual Emotional Speech Synthesis with Spherical Vectors and Discrete Speech Tokens
Joonyong Park, Kenichi Nakamura
A Multi-dimensional Evaluation of the 2025 Blizzard Challenge
Sajad Shirali-Shahreza, Gerald Penn
What is Naturalness?
Ayushi Pandey, Sébastien Le Maguer, Naomi Harte
Fricatives in modern Text-to-Speech synthesizers
Sriyugesh Bhyravajulla, Ayushi Pandey, Arun Baby
You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties
Paige Tuttösí, Henny Yeung, Yue Wang, Jean-Julien Aucouturier, Angelica Lim
Pronunciation Editing for Finnish Speech using Phonetic Posteriorgrams
Zirui Li, Lauri Juvela, Mikko Kurimo