doi: 10.21437/Interspeech.2019
ISSN: 2958-1796
Statistical Approach to Speech Synthesis: Past, Present and Future
Keiichi Tokuda
Advances in Automatic Speech Recognition for Child Speech Using Factored Time Delay Neural Network
Fei Wu, Leibny Paola García-Perera, Daniel Povey, Sanjeev Khudanpur
A Frequency Normalization Technique for Kindergarten Speech Recognition Inspired by the Role of fo in Vowel Perception
Gary Yeung, Abeer Alwan
Improving ASR Systems for Children with Autism and Language Impairment Using Domain-Focused DNN Transfer Techniques
Robert Gale, Liu Chen, Jill Dolata, Jan van Santen, Meysam Asgari
Ultrasound Tongue Imaging for Diarization and Alignment of Child Speech Therapy Sessions
Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals
Automated Estimation of Oral Reading Fluency During Summer Camp e-Book Reading with MyTurnToRead
Anastassia Loukina, Beata Beigman Klebanov, Patrick Lange, Yao Qian, Binod Gyawali, Nitin Madnani, Abhinav Misra, Klaus Zechner, Zuowei Wang, John Sabatini
Sustained Vowel Game: A Computer Therapy Game for Children with Dysphonia
Vanessa Lopes, João Magalhães, Sofia Cavaco
The Dependability of Voice on Elders’ Acceptance of Humanoid Agents
Anna Esposito, Terry Amorese, Marialucia Cuciniello, Maria Teresa Riviello, Antonietta M. Esposito, Alda Troncone, Gennaro Cordasco
God as Interlocutor — Real or Imaginary? Prosodic Markers of Dialogue Speech and Expected Efficacy in Spoken Prayer
Oliver Niebuhr, Uffe Schjoedt
Expressiveness Influences Human Vocal Alignment Toward voice-AI
Michelle Cohn, Georgia Zellou
Detecting Topic-Oriented Speaker Stance in Conversational Speech
Catherine Lai, Beatrice Alex, Johanna D. Moore, Leimin Tian, Tatsuro Hori, Gianpiero Francesca
Fusion Techniques for Utterance-Level Emotion Recognition Combining Speech and Transcripts
Jilt Sebastian, Piero Pierucci
Explaining Sentiment Classification
Marvin Rajwadi, Cornelius Glackin, Julie Wall, Gérard Chollet, Nigel Cannings
Predicting Group-Level Skin Attention to Short Movies from Audio-Based LSTM-Mixture of Experts Models
Ricardo Kleinlein, Cristina Luna Jiménez, Juan Manuel Montero, Zoraida Callejas, Fernando Fernández-Martínez
Survey Talk: Modeling in Automatic Speech Recognition: Beyond Hidden Markov Models
Ralf Schlüter
Very Deep Self-Attention Networks for End-to-End Speech Recognition
Ngoc-Quan Pham, Thai-Son Nguyen, Jan Niehues, Markus Müller, Alex Waibel
Jasper: An End-to-End Convolutional Neural Acoustic Model
Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde
Unidirectional Neural Network Architectures for End-to-End Automatic Speech Recognition
Niko Moritz, Takaaki Hori, Jonathan Le Roux
Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition
Yonatan Belinkov, Ahmed Ali, James Glass
Multi-Channel Speech Enhancement Using Time-Domain Convolutional Denoising Autoencoder
Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa
On Nonlinear Spatial Filtering in Multichannel Speech Enhancement
Kristina Tesch, Robert Rehr, Timo Gerkmann
Multi-Channel Block-Online Source Extraction Based on Utterance Adaptation
Juan M. Martín-Doñas, Jens Heitkaemper, Reinhold Haeb-Umbach, Angel M. Gomez, Antonio M. Peinado
Exploiting Multi-Channel Speech Presence Probability in Parametric Multi-Channel Wiener Filter
Saeed Bagheri, Daniele Giacobello
Variational Bayesian Multi-Channel Speech Dereverberation Under Noisy Environments with Probabilistic Convolutive Transfer Function
Masahito Togami, Tatsuya Komatsu
Simultaneous Denoising and Dereverberation for Low-Latency Applications Using Frame-by-Frame Online Unified Convolutional Beamformer
Tomohiro Nakatani, Keisuke Kinoshita
Individual Variation in Cognitive Processing Style Predicts Differences in Phonetic Imitation of Device and Human Voices
Cathryn Snyder, Michelle Cohn, Georgia Zellou
An Investigation on Speaker Specific Articulatory Synthesis with Speaker Independent Articulatory Inversion
Aravind Illa, Prasanta Kumar Ghosh
Individual Difference of Relative Tongue Size and its Acoustic Effects
Xiaohan Zhang, Chongke Bi, Kiyoshi Honda, Wenhuan Lu, Jianguo Wei
Individual Differences of Airflow and Sound Generation in the Vocal Tract of Sibilant /s/
Tsukasa Yoshinaga, Kazunori Nozaki, Shigeo Wada
Hush-Hush Speak: Speech Reconstruction Using Silent Videos
Shashwat Uttam, Yaman Kumar, Dhruva Sahrawat, Mansi Aggarwal, Rajiv Ratn Shah, Debanjan Mahata, Amanda Stent
SPEAK YOUR MIND! Towards Imagined Speech Recognition with Hierarchical Deep Learning
Pramit Saha, Muhammad Abdul-Mageed, Sidney Fels
An Unsupervised Autoregressive Model for Speech Representation Learning
Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass
Harmonic-Aligned Frame Mask Based on Non-Stationary Gabor Transform with Application to Content-Dependent Speaker Comparison
Feng Huang, Peter Balazs
Glottal Closure Instants Detection from Speech Signal by Deep Features Extracted from Raw Speech and Linear Prediction Residual
Gurunath Reddy M., K. Sreenivasa Rao, Partha Pratim Das
Learning Problem-Agnostic Speech Representations from Multiple Self-Supervised Tasks
Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio
Excitation Source and Vocal Tract System Based Acoustic Features for Detection of Nasals in Continuous Speech
Bhanu Teja Nellore, Sri Harsha Dumpala, Karan Nathwani, Suryakanth V. Gangashetty
Data Augmentation Using GANs for Speech Emotion Recognition
Aggelina Chatziagapi, Georgios Paraskevopoulos, Dimitris Sgouropoulos, Georgios Pantazopoulos, Malvina Nikandrou, Theodoros Giannakopoulos, Athanasios Katsamanis, Alexandros Potamianos, Shrikanth Narayanan
High Quality, Lightweight and Adaptable TTS Using LPCNet
Zvi Kons, Slava Shechtman, Alex Sorin, Carmel Rabinovitz, Ron Hoory
Towards Achieving Robust Universal Neural Vocoding
Jaime Lorenzo-Trueba, Thomas Drugman, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote, Alexis Moinet, Vatsal Aggarwal
Expediting TTS Synthesis with Adversarial Vocoding
Paarth Neekhara, Chris Donahue, Miller Puckette, Shlomo Dubnov, Julian McAuley
Analysis by Adversarial Synthesis — A Novel Approach for Speech Vocoding
Ahmed Mustafa, Arijit Biswas, Christian Bergler, Julia Schottenhamml, Andreas Maier
Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation
Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda
A Speaker-Dependent WaveNet for Voice Conversion with Non-Parallel Data
Xiaohai Tian, Eng Siong Chng, Haizhou Li
Survey Talk: When Attention Meets Speech Applications: Speech & Speaker Recognition Perspective
Kyu J. Han, Ramon Prieto, Tao Ma
Attention-Enhanced Connectionist Temporal Classification for Discrete Speech Emotion Recognition
Ziping Zhao, Zhongtian Bao, Zixing Zhang, Nicholas Cummins, Haishuai Wang, Björn W. Schuller
Attentive to Individual: A Multimodal Emotion Recognition Network with Personalized Attention Profile
Jeng-Lin Li, Chi-Chun Lee
A Saliency-Based Attention LSTM Model for Cognitive Load Classification from Speech
Ascensión Gallardo-Antolín, Juan Manuel Montero
A Hierarchical Attention Network-Based Approach for Depression Detection from Transcribed Clinical Interviews
Adria Mallol-Ragolta, Ziping Zhao, Lukas Stappen, Nicholas Cummins, Björn W. Schuller
Untranscribed Web Audio for Low Resource Speech Recognition
Andrea Carmantini, Peter Bell, Steve Renals
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention
Christoph Lüscher, Eugen Beck, Kazuki Irie, Markus Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney
Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition
Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, Shinji Watanabe
Speaker Adaptation for Attention-Based End-to-End Speech Recognition
Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong
Large Margin Training for Attention Based End-to-End Speech Recognition
Peidong Wang, Jia Cui, Chao Weng, Dong Yu
Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition
Khoi-Nguyen C. Mac, Xiaodong Cui, Wei Zhang, Michael Picheny
SparseSpeech: Unsupervised Acoustic Unit Discovery with Memory-Augmented Sequence Autoencoders
Benjamin Milde, Chris Biemann
Bayesian Subspace Hidden Markov Model for Acoustic Unit Discovery
Lucas Ondel, Hari Krishna Vydana, Lukáš Burget, Jan Černocký
Speaker Adversarial Training of DPGMM-Based Feature Extractor for Zero-Resource Languages
Yosuke Higuchi, Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa
Building Large-Vocabulary ASR Systems for Languages Without Any Audio Training Data
Manasa Prasad, Daan van Esch, Sandy Ritchie, Jonas Fromseier Mortensen
Towards Bilingual Lexicon Discovery From Visually Grounded Speech Audio
Emmanuel Azuh, David Harwath, James Glass
Improving Unsupervised Subword Modeling via Disentangled Speech Representation Learning and Transformation
Siyuan Feng, Tan Lee
Listeners’ Ability to Identify the Gender of Preadolescent Children in Different Linguistic Contexts
Shawn Nissen, Sharalee Blunck, Anita Dromey, Christopher Dromey
Sibilant Variation in New Englishes: A Comparative Sociophonetic Study of Trinidadian and American English /s(tr)/-Retraction
Wiebke Ahlers, Philipp Meer
Tracking the New Zealand English NEAR/SQUARE Merger Using Functional Principal Components Analysis
Michele Gubian, Jonathan Harrington, Mary Stevens, Florian Schiel, Paul Warren
Phonetic Accommodation in a Wizard-of-Oz Experiment: Intonation and Segments
Iona Gessinger, Bernd Möbius, Bistra Andreeva, Eran Raveh, Ingmar Steiner
PASCAL and DPA: A Pilot Study on Using Prosodic Competence Scores to Predict Communicative Skills for Team Working and Public Speaking
Oliver Niebuhr, Jan Michalsky
Towards the Prosody of Persuasion in Competitive Negotiation. The Relationship Between f0 and Negotiation Success in Same Sex Sales Tasks
Jan Michalsky, Heike Schoormann, Thomas Schultze
VESUS: A Crowd-Annotated Database to Study Emotion Production and Perception in Spoken English
Jacob Sager, Ravi Shankar, Jacob Reinhold, Archana Venkataraman
Building the Singapore English National Speech Corpus
Jia Xin Koh, Aqilah Mislan, Kevin Khoo, Brian Ang, Wilson Ang, Charmaine Ng, Ying-Ying Tan
Challenging the Boundaries of Speech Recognition: The MALACH Corpus
Michael Picheny, Zoltán Tüske, Brian Kingsbury, Kartik Audhkhasi, Xiaodong Cui, George Saon
NITK Kids’ Speech Corpus
Pravin Bhaskar Ramteke, Sujata Supanekar, Pradyoth Hegde, Hanna Nelson, Venkataraja Aithal, Shashidhar G. Koolagudi
Towards Variability Resistant Dialectal Speech Evaluation
Ahmed Ali, Salam Khalifa, Nizar Habash
How to Annotate 100 Hours in 45 Minutes
Per Fallgren, Zofia Malisz, Jens Edlund
Bayesian HMM Based x-Vector Clustering for Speaker Diarization
Mireia Diez, Lukáš Burget, Shuai Wang, Johan Rohdin, Jan Černocký
Unleashing the Unused Potential of i-Vectors Enabled by GPU Acceleration
Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen, Takafumi Koshinaka
MCE 2018: The 1st Multi-Target Speaker Detection and Identification Challenge Evaluation
Suwon Shon, Najim Dehak, Douglas Reynolds, James Glass
Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System
Zhifu Gao, Yan Song, Ian McLoughlin, Pengcheng Li, Yiheng Jiang, Li-Rong Dai
LSTM Based Similarity Measurement with Spectral Clustering for Speaker Diarization
Qingjian Lin, Ruiqing Yin, Ming Li, Hervé Bredin, Claude Barras
Who Said That?: Audio-Visual Speaker Diarisation of Real-World Meetings
Joon Son Chung, Bong-Jin Lee, Icksang Han
Multi-PLDA Diarization on Children’s Speech
Jiamin Xie, Leibny Paola García-Perera, Daniel Povey, Sanjeev Khudanpur
Speaker Diarization Using Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings
Alan McCree, Gregory Sell, Daniel Garcia-Romero
Speaker-Corrupted Embeddings for Online Speaker Diarization
Omid Ghahabi, Volker Fischer
Speaker Diarization with Lexical Information
Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis Georgiou, Shrikanth Narayanan
Joint Speech Recognition and Speaker Diarization via Sequence Transduction
Laurent El Shafey, Hagen Soltau, Izhak Shafran
Normal Variance-Mean Mixtures for Unsupervised Score Calibration
Sandro Cumani
Speaker Augmentation and Bandwidth Extension for Deep Speaker Embedding
Hitoshi Yamamoto, Kong Aik Lee, Koji Okabe, Takafumi Koshinaka
Large-Scale Speaker Diarization of Radio Broadcast Archives
Emre Yılmaz, Adem Derinel, Kun Zhou, Henk van den Heuvel, Niko Brummer, Haizhou Li, David A. van Leeuwen
Toeplitz Inverse Covariance Based Robust Speaker Clustering for Naturalistic Audio Streams
Harishchandra Dubey, Abhijeet Sangwan, John H.L. Hansen
Examining the Combination of Multi-Band Processing and Channel Dropout for Robust Speech Recognition
György Kovács, László Tóth, Dirk Van Compernolle, Marcus Liwicki
Label Driven Time-Frequency Masking for Robust Continuous Speech Recognition
Meet Soni, Ashish Panda
Speaker-Invariant Feature-Mapping for Distant Speech Recognition via Adversarial Teacher-Student Learning
Long Wu, Hangting Chen, Li Wang, Pengyuan Zhang, Yonghong Yan
Full-Sentence Correlation: A Method to Handle Unpredictable Noise for Robust Speech Recognition
Ji Ming, Danny Crookes
Generative Noise Modeling and Channel Simulation for Robust Speech Recognition in Unseen Conditions
Meet Soni, Sonal Joshi, Ashish Panda
Far-Field Speech Enhancement Using Heteroscedastic Autoencoder for Improved Speech Recognition
Shashi Kumar, Shakti P. Rath
End-to-End SpeakerBeam for Single Channel Target Speech Recognition
Marc Delcroix, Shinji Watanabe, Tsubasa Ochiai, Keisuke Kinoshita, Shigeki Karita, Atsunori Ogawa, Tomohiro Nakatani
NIESR: Nuisance Invariant End-to-End Speech Recognition
I-Hung Hsu, Ayush Jaiswal, Premkumar Natarajan
Knowledge Distillation for Throat Microphone Speech Recognition
Takahito Suzuki, Jun Ogata, Takashi Tsunakawa, Masafumi Nishida, Masafumi Nishimura
Improved Speaker-Dependent Separation for CHiME-5 Challenge
Jian Wu, Yong Xu, Shi-Xiong Zhang, Lianwu Chen, Meng Yu, Lei Xie, Dong Yu
Bridging the Gap Between Monaural Speech Enhancement and Recognition with Distortion-Independent Acoustic Modeling
Peidong Wang, Ke Tan, DeLiang Wang
Enhanced Spectral Features for Distortion-Independent Acoustic Modeling
Peidong Wang, DeLiang Wang
Universal Adversarial Perturbations for Speech Recognition Systems
Paarth Neekhara, Shehzeen Hussain, Prakhar Pandey, Shlomo Dubnov, Julian McAuley, Farinaz Koushanfar
One-Pass Single-Channel Noisy Speech Recognition Using a Combination of Noisy and Enhanced Features
Masakiyo Fujimoto, Hisashi Kawai
Jointly Adversarial Enhancement Training for Robust End-to-End Speech Recognition
Bin Liu, Shuai Nie, Shan Liang, Wenju Liu, Meng Yu, Lianwu Chen, Shouye Peng, Changliang Li
Predicting Humor by Learning from Time-Aligned Comments
Zixiaofan Yang, Bingyan Hu, Julia Hirschberg
Predicting the Leading Political Ideology of YouTube Channels Using Acoustic, Textual, and Metadata Information
Yoan Dinkov, Ahmed Ali, Ivan Koychev, Preslav Nakov
Mitigating Gender and L1 Differences to Improve State and Trait Recognition
Guozhen An, Rivka Levitan
Deep Learning Based Mandarin Accent Identification for Accent Robust ASR
Felix Weninger, Yang Sun, Junho Park, Daniel Willett, Puming Zhan
Calibrating DNN Posterior Probability Estimates of HMM/DNN Models to Improve Social Signal Detection from Audio Data
Gábor Gosztolya, László Tóth
Conversational and Social Laughter Synthesis with WaveNet
Hiroki Mori, Tomohiro Nagata, Yoshiko Arimoto
Laughter Dynamics in Dyadic Conversations
Bogdan Ludusan, Petra Wagner
Towards an Annotation Scheme for Complex Laughter in Speech Corpora
Khiet P. Truong, Jürgen Trouvain, Michel-Pierre Jansen
Using Speech to Predict Sequentially Measured Cortisol Levels During a Trier Social Stress Test
Alice Baird, Shahin Amiriparian, Nicholas Cummins, Sarah Sturmbauer, Johanna Janson, Eva-Maria Messner, Harald Baumeister, Nicolas Rohleder, Björn W. Schuller
Sincerity in Acted Speech: Presenting the Sincere Apology Corpus and Results
Alice Baird, Eduardo Coutinho, Julia Hirschberg, Björn W. Schuller
Do not Hesitate! — Unless You Do it Shortly or Nasally: How the Phonetics of Filled Pauses Determine Their Subjective Frequency and Perceived Speaker Performance
Oliver Niebuhr, Kerstin Fischer
Phonet: A Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech
J.C. Vásquez-Correa, Philipp Klumpp, Juan Rafael Orozco-Arroyave, Elmar Nöth
Code-Switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation
Ching-Ting Chang, Shun-Po Chuang, Hung-Yi Lee
Comparative Analysis of Think-Aloud Methods for Everyday Activities in the Context of Cognitive Robotics
Moritz Meier, Celeste Mason, Felix Putze, Tanja Schultz
RadioTalk: A Large-Scale Corpus of Talk Radio Transcripts
Doug Beeferman, William Brannon, Deb Roy
Qualitative Evaluation of ASR Adaptation in a Lecture Context: Application to the PASTEL Corpus
Salima Mdhaffar, Yannick Estève, Nicolas Hernandez, Antoine Laurent, Richard Dufour, Solen Quiniou
Active Annotation: Bootstrapping Annotation Lexicon and Guidelines for Supervised NLU Learning
Federico Marinelli, Alessandra Cervone, Giuliano Tortoreto, Evgeny A. Stepanov, Giuseppe Di Fabbrizio, Giuseppe Riccardi
Automatic Lyric Transcription from Karaoke Vocal Tracks: Resources and a Baseline System
Gerardo Roa Dabike, Jon Barker
Detecting Mismatch Between Speech and Transcription Using Cross-Modal Attention
Qiang Huang, Thomas Hain
EpaDB: A Database for Development of Pronunciation Assessment Systems
Jazmín Vidal, Luciana Ferrer, Leonardo Brambilla
Automatic Compression of Subtitles with Neural Networks and its Effect on User Experience
Katrin Angerbauer, Heike Adel, Ngoc Thang Vu
Integrating Video Retrieval and Moment Detection in a Unified Corpus for Video Question Answering
Hongyin Luo, Mitra Mohtarami, James Glass, Karthik Krishnamurthy, Brigitte Richardson
Early Identification of Speech Changes Due to Amyotrophic Lateral Sclerosis Using Machine Classification
Sarah E. Gutz, Jun Wang, Yana Yunusova, Jordan R. Green
Automatic Detection of Breath Using Voice Activity Detection and SVM Classifier with Application on News Reports
Mohamed Ismail Yasar Arafath K., Aurobinda Routray
Acoustic Scene Classification Using Teacher-Student Learning with Soft-Labels
Hee-Soo Heo, Jee-weon Jung, Hye-jin Shim, Ha-Jin Yu
Rare Sound Event Detection Using Deep Learning and Data Augmentation
Yanping Chen, Hongxia Jin
A Combination of Model-Based and Feature-Based Strategy for Speech-to-Singing Alignment
Bidisha Sharma, Haizhou Li
Dr.VOT: Measuring Positive and Negative Voice Onset Time in the Wild
Yosi Shrem, Matthew Goldrick, Joseph Keshet
Effects of Base-Frequency and Spectral Envelope on Deep-Learning Speech Separation and Recognition Models
J. Hui, Y. Wei, S.T. Chen, R.H.Y. So
Phone Aware Nearest Neighbor Technique Using Spectral Transition Measure for Non-Parallel Voice Conversion
Nirmesh J. Shah, Hemant A. Patil
Weakly Supervised Syllable Segmentation by Vowel-Consonant Peak Classification
Ravi Shankar, Archana Venkataraman
An Approach to Online Speaker Change Point Detection Using DNNs and WFSTs
Lukas Mateju, Petr Cerva, Jindrich Zdansky
Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks
Zhenyu Tang, John D. Kanu, Kevin Hogan, Dinesh Manocha
Non-Parallel Voice Conversion Using Weighted Generative Adversarial Networks
Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou
One-Shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization
Ju-chieh Chou, Hung-Yi Lee
One-Shot Voice Conversion with Global Speaker Embeddings
Hui Lu, Zhiyong Wu, Dongyang Dai, Runnan Li, Shiyin Kang, Jia Jia, Helen Meng
Non-Parallel Voice Conversion with Cyclic Variational Autoencoder
Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda
StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo
Robustness of Statistical Voice Conversion Based on Direct Waveform Modification Against Background Sounds
Yusuke Kurita, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda
Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks
Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma
GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-Spectrogram
Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
Probability Density Distillation with Generative Adversarial Networks for High-Quality Parallel Waveform Generation
Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim
One-Shot Voice Conversion with Disentangled Representations by Leveraging Phonetic Posteriorgrams
Seyed Hamidreza Mohammadi, Taehwan Kim
Investigation of F0 Conditioning and Fully Convolutional Networks in Variational Autoencoder Based Voice Conversion
Wen-Chin Huang, Yi-Chiao Wu, Chen-Chou Lo, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang
Jointly Trained Conversion Model and WaveNet Vocoder for Non-Parallel Voice Conversion Using Mel-Spectrograms and Phonetic Posteriorgrams
Songxiang Liu, Yuewen Cao, Xixin Wu, Lifa Sun, Xunying Liu, Helen Meng
Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech
Li-Wei Chen, Hung-Yi Lee, Yu Tsao
Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion
Shaojin Ding, Ricardo Gutierrez-Osuna
Semi-Supervised Voice Conversion with Amortized Variational Inference
Cory Stephenson, Gokce Keskin, Anil Thomas, Oguz H. Elibol
Exploiting Semi-Supervised Training Through a Dropout Regularization in End-to-End Speech Recognition
Subhadeep Dey, Petr Motlicek, Trung Bui, Franck Dernoncourt
Improved Vocal Tract Length Perturbation for a State-of-the-Art End-to-End Speech Recognition System
Chanwoo Kim, Minkyu Shin, Abhinav Garg, Dhananjaya Gowda
Multi-Accent Adaptation Based on Gate Mechanism
Han Zhu, Li Wang, Pengyuan Zhang, Yonghong Yan
Unsupervised Adaptation with Adversarial Dropout Regularization for Robust Speech Recognition
Pengcheng Guo, Sining Sun, Lei Xie
Cumulative Adaptation for BLSTM Acoustic Models
Markus Kitza, Pavel Golik, Ralf Schlüter, Hermann Ney
Fast DNN Acoustic Model Speaker Adaptation by Learning Hidden Unit Contribution Features
Xurong Xie, Xunying Liu, Tan Lee, Lan Wang
End-to-End Adaptation with Backpropagation Through WFST for On-Device Speech Recognition System
Emiru Tsunoo, Yosuke Kashiwagi, Satoshi Asakawa, Toshiyuki Kumakura
Learning Speaker Aware Offsets for Speaker Adaptation of Neural Networks
Leda Sarı, Samuel Thomas, Mark A. Hasegawa-Johnson
An Investigation into On-Device Personalization of End-to-End Automatic Speech Recognition Models
Khe Chai Sim, Petr Zadrazil, Françoise Beaufays
A Multi-Accent Acoustic Model Using Mixture of Experts for Speech Recognition
Abhinav Jain, Vishwanath P. Singh, Shakti P. Rath
Personalizing ASR for Dysarthric and Accented Speech with Limited Data
Joel Shor, Dotan Emanuel, Oran Lang, Omry Tuval, Michael Brenner, Julie Cattiau, Fernando Vieira, Maeve McNally, Taylor Charbonneau, Melissa Nollstadt, Avinatan Hassidim, Yossi Matias
Mitigating Noisy Inputs for Question Answering
Denis Peskov, Joe Barrow, Pedro Rodriguez, Graham Neubig, Jordan Boyd-Graber
One-vs-All Models for Asynchronous Training: An Empirical Analysis
Rahul Gupta, Aman Alok, Shankar Ananthakrishnan
Adapting a FrameNet Semantic Parser for Spoken Language Understanding Using Adversarial Learning
Gabriel Marzinotto, Géraldine Damnati, Frédéric Béchet
M2H-GAN: A GAN-Based Mapping from Machine to Human Transcripts for Speech Understanding
Titouan Parcollet, Mohamed Morchid, Xavier Bost, Georges Linarès
Ultra-Compact NLU: Neuronal Network Binarization as Regularization
Munir Georges, Krzysztof Czarnowski, Tobias Bocklet
Speech Model Pre-Training for End-to-End Spoken Language Understanding
Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, Yoshua Bengio
Spoken Language Intent Detection Using Confusion2Vec
Prashanth Gurunath Shivakumar, Mu Yang, Panayiotis Georgiou
Investigating Adaptation and Transfer Learning for End-to-End Spoken Language Understanding from Speech
Natalia Tomashenko, Antoine Caubrière, Yannick Estève
Topic-Aware Dialogue Speech Recognition with Transfer Learning
Yuanfeng Song, Di Jiang, Xueyang Wu, Qian Xu, Raymond Chi-Wing Wong, Qiang Yang
Improving Conversation-Context Language Models with Multiple Spoken Language Understanding Models
Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Hosana Kamiyama, Takanobu Oba, Satoshi Kobashikawa, Yushi Aono
Meta Learning for Hyperparameter Optimization in Dialogue System
Jen-Tzung Chien, Wei Xiang Lieow
Zero Shot Intent Classification Using Long-Short Term Memory Networks
Kyle Williams
A Comparison of Deep Learning Methods for Language Understanding
Mandy Korpusik, Zoe Liu, James Glass
Slot Filling with Weighted Multi-Encoders for Out-of-Domain Values
Yuka Kobayashi, Takami Yoshida, Kenji Iwata, Hiroshi Fujimura
Multi-Corpus Acoustic-to-Articulatory Speech Inversion
Nadee Seneviratne, Ganesh Sivaraman, Carol Espy-Wilson
Towards a Speaker Independent Speech-BCI Using Speaker Adaptation
Debadatta Dash, Alan Wisler, Paul Ferrari, Jun Wang
Identifying Input Features for Development of Real-Time Translation of Neural Signals to Text
Janaki Sheth, Ariel Tankus, Michelle Tran, Lindy Comstock, Itzhak Fried, William Speier
Exploring Critical Articulator Identification from 50Hz RT-MRI Data of the Vocal Tract
Samuel Silva, António Teixeira, Conceição Cunha, Nuno Almeida, Arun A. Joseph, Jens Frahm
Towards a Method of Dynamic Vocal Tract Shapes Generation by Combining Static 3D and Dynamic 2D MRI Speech Data
Ioannis K. Douros, Anastasiia Tsukanova, Karyna Isaieva, Pierre-André Vuissoz, Yves Laprie
Temporal Coordination of Articulatory and Respiratory Events Prior to Speech Initiation
Oksana Rasskazova, Christine Mooshammer, Susanne Fuchs
Zooming in on Spatiotemporal V-to-C Coarticulation with Functional PCA
Michele Gubian, Manfred Pastätter, Marianne Pouplier
Ultrasound-Based Silent Speech Interface Built on a Continuous Vocoder
Tamás Gábor Csapó, Mohammed Salah Al-Radhi, Géza Németh, Gábor Gosztolya, Tamás Grósz, László Tóth, Alexandra Markó
Assessing Acoustic and Articulatory Dimensions of Speech Motor Adaptation with Random Forests
Eugen Klein, Jana Brunner, Phil Hoole
Speech Organ Contour Extraction Using Real-Time MRI and Machine Learning Method
Hironori Takemoto, Tsubasa Goto, Yuya Hagihara, Sayaka Hamanaka, Tatsuya Kitamura, Yukiko Nota, Kikuo Maekawa
CNN-Based Phoneme Classifier from Vocal Tract MRI Learns Embedding Consistent with Articulatory Topology
K.G. van Leeuwen, P. Bos, S. Trebeschi, M.J.A. van Alphen, L. Voskuilen, L.E. Smeele, F. van der Heijden, R.J.J.H. van Son
Strength and Structure: Coupling Tones with Oral Constriction Gestures
Doris Mücke, Anne Hermes, Sam Tilsen
Salient Speech Representations Based on Cloned Networks
W. Bastiaan Kleijn, Felicia S.C. Lim, Michael Chinen, Jan Skoglund
ASR Inspired Syllable Stress Detection for Pronunciation Evaluation Without Using a Supervised Classifier and Syllable Level Features
Manoj Kumar Ramanathi, Chiranjeevi Yarra, Prasanta Kumar Ghosh
Acoustic and Articulatory Feature Based Speech Rate Estimation Using a Convolutional Dense Neural Network
Renuka Mannem, Jhansi Mallela, Aravind Illa, Prasanta Kumar Ghosh
Predictive Auxiliary Variational Autoencoder for Representation Learning of Global Speech Characteristics
Sebastian Springenberg, Egor Lakomkin, Cornelius Weber, Stefan Wermter
Unsupervised Low-Rank Representations for Speech Emotion Recognition
Georgios Paraskevopoulos, Efthymios Tzinis, Nikolaos Ellinas, Theodoros Giannakopoulos, Alexandros Potamianos
On the Suitability of the Riesz Spectro-Temporal Envelope for WaveNet Based Speech Synthesis
Jitendra Kumar Dhiman, Nagaraj Adiga, Chandra Sekhar Seelamantula
Autonomous Emotion Learning in Speech: A View of Zero-Shot Speech Emotion Recognition
Xinzhou Xu, Jun Deng, Nicholas Cummins, Zixing Zhang, Li Zhao, Björn W. Schuller
An Improved Goodness of Pronunciation (GoP) Measure for Pronunciation Evaluation with DNN-HMM System Considering HMM Transition Probabilities
Sweekar Sudhakara, Manoj Kumar Ramanathi, Chiranjeevi Yarra, Prasanta Kumar Ghosh
Low Resource Automatic Intonation Classification Using Gated Recurrent Unit (GRU) Networks Pre-Trained with Synthesized Pitch Patterns
Atreyee Saha, Chiranjeevi Yarra, Prasanta Kumar Ghosh
Apkinson: A Mobile Solution for Multimodal Assessment of Patients with Parkinson’s Disease
J.C. Vásquez-Correa, T. Arias-Vergara, Philipp Klumpp, M. Strauss, A. Küderle, N. Roth, S. Bayerl, N. García-Ospina, P.A. Perez-Toro, L.F. Parra-Gallego, Cristian David Rios-Urrego, D. Escobar-Grisales, Juan Rafael Orozco-Arroyave, B. Eskofier, Elmar Nöth
Depression State Assessment: Application for Detection of Depression by Speech
Gábor Kiss, Dávid Sztahó, Klára Vicsi
SPIRE-fluent: A Self-Learning App for Tutoring Oral Fluency to Second Language English Learners
Chiranjeevi Yarra, Aparna Srinivasan, Sravani Gottimukkala, Prasanta Kumar Ghosh
Using Real-Time Visual Biofeedback for Second Language Instruction
Shawn Nissen, Rebecca Nissen
Splash: Speech and Language Assessment in Schools and Homes
A. Miwardelli, I. Gallagher, J. Gibson, N. Katsos, Kate M. Knill, H. Wood
Using Ultrasound Imaging to Create Augmented Visual Biofeedback for Articulatory Practice
Colin T. Annand, Maurice Lamb, Sarah Dugan, Sarah R. Li, Hannah M. Woeste, T. Douglas Mast, Michael A. Riley, Jack A. Masterson, Neeraja Mahalingam, Kathryn J. Eary, Caroline Spencer, Suzanne Boyce, Stephanie Jackson, Anoosha Baxi, Reneé Seward
Speech-Based Web Navigation for Limited Mobility Users
Vasiliy Radostev, Serge Berger, Justin Tabrizi, Pasha Kamyshev, Hisami Suzuki
The Second DIHARD Diarization Challenge: Dataset, Task, and Baselines
Neville Ryant, Kenneth Church, Christopher Cieri, Alejandrina Cristia, Jun Du, Sriram Ganapathy, Mark Liberman
LEAP Diarization System for the Second DIHARD Challenge
Prachi Singh, Harsha Vardhan M.A., Sriram Ganapathy, A. Kanagasundaram
ViVoLAB Speaker Diarization System for the DIHARD 2019 Challenge
Ignacio Viñals, Pablo Gimeno, Alfonso Ortega, Antonio Miguel, Eduardo Lleida
UWB-NTIS Speaker Diarization System for the DIHARD II 2019 Challenge
Zbyněk Zajíc, Marie Kunešová, Marek Hrúz, Jan Vaněk
The Second DIHARD Challenge: System Description for USC-SAIL Team
Tae Jin Park, Manoj Kumar, Nikolaos Flemotomos, Monisankha Pal, Raghuveer Peri, Rimita Lahiri, Panayiotis Georgiou, Shrikanth Narayanan
Speaker Diarization with Deep Speaker Embeddings for DIHARD Challenge II
Sergey Novoselov, Aleksei Gusev, Artem Ivanov, Timur Pekhovsky, Andrey Shulipa, Anastasia Avdeeva, Artem Gorlanov, Alexandr Kozlov
ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection
Massimiliano Todisco, Xin Wang, Ville Vestman, Md. Sahidullah, Héctor Delgado, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Tomi H. Kinnunen, Kong Aik Lee
ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual Networks
Cheng-I Lai, Nanxin Chen, Jesús Villalba, Najim Dehak
Ensemble Models for Spoofing Detection in Automatic Speaker Verification
Bhusan Chettri, Daniel Stoller, Veronica Morfi, Marco A. Martínez Ramírez, Emmanouil Benetos, Bob L. Sturm
The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion
Weicheng Cai, Haiwei Wu, Danwei Cai, Ming Li
Robust Bayesian and Light Neural Networks for Voice Spoofing Detection
Radosław Białobrzeski, Michał Kośmider, Mateusz Matuszewski, Marcin Plata, Alexander Rakowski
STC Antispoofing Systems for the ASVspoof2019 Challenge
Galina Lavrentyeva, Sergey Novoselov, Andzhukaev Tseren, Marina Volkova, Artem Gorlanov, Alexandr Kozlov
The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge
Yexin Yang, Hongji Wang, Heinrich Dinkel, Zhengyang Chen, Shuai Wang, Yanmin Qian, Kai Yu
IIIT-H Spoofing Countermeasures for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2019
K.N.R.K. Raju Alluri, Anil Kumar Vuppala
Anti-Spoofing Speaker Verification System with Multi-Feature Integration and Multi-Task Learning
Rongjin Li, Miao Zhao, Zheng Li, Lin Li, Qingyang Hong
Speech Replay Detection with x-Vector Attack Embeddings and Spectral Features
Jennifer Williams, Joanna Rownicka
Long Range Acoustic Features for Spoofed Speech Detection
Rohan Kumar Das, Jichen Yang, Haizhou Li
Transfer-Representation Learning for Detecting Spoofing Attacks with Converted and Synthesized Speech in Automatic Speaker Verification System
Su-Yu Chang, Kai-Cheng Wu, Chia-Ping Chen
A Light Convolutional GRU-RNN Deep Feature Extractor for ASV Spoofing Detection
Alejandro Gomez-Alanis, Antonio M. Peinado, Jose A. Gonzalez, Angel M. Gomez
Detecting Spoofing Attacks Using VGG and SincNet: BUT-Omilia Submission to ASVspoof 2019 Challenge
Hossein Zeinali, Themos Stafylakis, Georgia Athanasopoulou, Johan Rohdin, Ioannis Gkinis, Lukáš Burget, Jan Černocký
Deep Residual Neural Networks for Audio Spoofing Detection
Moustafa Alzantot, Ziqi Wang, Mani B. Srivastava
Replay Attack Detection with Complementary High-Resolution Information Using End-to-End DNN for the ASVspoof 2019 Challenge
Jee-weon Jung, Hye-jin Shim, Hee-Soo Heo, Ha-Jin Yu
The Zero Resource Speech Challenge 2019: TTS Without T
Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. Black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux
Combining Adversarial Training and Disentangled Speech Representation for Robust Zero-Resource Subword Modeling
Siyuan Feng, Tan Lee, Zhiyuan Peng
Temporally-Aware Acoustic Unit Discovery for Zerospeech 2019 Challenge
Bolaji Yusuf, Alican Gök, Batuhan Gundogdu, Oyku Deniz Kose, Murat Saraclar
Unsupervised Acoustic Unit Discovery for Speech Synthesis Using Discrete Latent-Variable Neural Networks
Ryan Eloff, André Nortje, Benjamin van Niekerk, Avashna Govender, Leanne Nortje, Arnu Pretorius, Elan van Biljon, Ewald van der Westhuizen, Lisa van Staden, Herman Kamper
Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion
Andy T. Liu, Po-chun Hsu, Hung-Yi Lee
Zero Resource Speech Synthesis Using Transcripts Derived from Perceptual Acoustic Units
Karthik Pandia D. S., Hema A. Murthy
VQVAE Unsupervised Unit Discovery and Multi-Scale Code2Spec Inverter for Zerospeech Challenge 2019
Andros Tjandra, Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura
Survey Talk: A Survey on Speech Translation
Jan Niehues
Direct Speech-to-Speech Translation with a Sequence-to-Sequence Model
Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, Yonghui Wu
End-to-End Speech Translation with Knowledge Distillation
Yuchen Liu, Hao Xiong, Jiajun Zhang, Zhongjun He, Hua Wu, Haifeng Wang, Chengqing Zong
Adapting Transformer to End-to-End Spoken Language Translation
Mattia A. Di Gangi, Matteo Negri, Marco Turchi
Unsupervised Phonetic and Word Level Discovery for Speech to Speech Translation for Unwritten Languages
Steven Hillis, Anushree Prasanna Kumar, Alan W. Black
Deep Speaker Recognition: Modular or Monolithic?
Gautam Bhattacharya, Jahangir Alam, Patrick Kenny
On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction
Shuai Wang, Johan Rohdin, Lukáš Burget, Oldřich Plchot, Yanmin Qian, Kai Yu, Jan Černocký
Learning Speaker Representations with Mutual Information
Mirco Ravanelli, Yoshua Bengio
Multi-Task Learning with High-Order Statistics for x-Vector Based Text-Independent Speaker Verification
Lanhua You, Wu Guo, Li-Rong Dai, Jun Du
Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification
Zhanghao Wu, Shuai Wang, Yanmin Qian, Kai Yu
Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification
Lanhua You, Wu Guo, Li-Rong Dai, Jun Du
Neural Transition Systems for Modeling Hierarchical Semantic Representations
Riyaz Bhat, John Chen, Rashmi Prasad, Srinivas Bangalore
Mining Polysemous Triplets with Recurrent Neural Networks for Spoken Language Understanding
Vedran Vukotić, Christian Raymond
Iterative Delexicalization for Improved Spoken Language Understanding
Avik Ray, Yilin Shen, Hongxia Jin
End-to-End Spoken Language Understanding: Bootstrapping in Low Resource Scenarios
Swapnil Bhosale, Imran Sheikh, Sri Harsha Dumpala, Sunil Kumar Kopparapu
Recognition of Intentions of Users’ Short Responses for Conversational News Delivery System
Hiroaki Takatsu, Katsuya Yokoyama, Yoichi Matsuyama, Hiroshi Honda, Shinya Fujie, Tetsunori Kobayashi
Curriculum-Based Transfer Learning for an Effective End-to-End Spoken Language Understanding and Domain Portability
Antoine Caubrière, Natalia Tomashenko, Antoine Laurent, Emmanuel Morin, Nathalie Camelin, Yannick Estève
Spatial and Spectral Fingerprint in the Brain: Speaker Identification from Single Trial MEG Signals
Debadatta Dash, Paul Ferrari, Jun Wang
ERP Signal Analysis with Temporal Resolution Using a Time Window Bank
Annika Nijveld, L. ten Bosch, Mirjam Ernestus
Phase Synchronization Between EEG Signals as a Function of Differences Between Stimuli Characteristics
L. ten Bosch, K. Mulder, L. Boves
The Processing of Prosodic Cues to Rhetorical Question Interpretation: Psycholinguistic and Neurolinguistics Evidence
Mariya Kharaman, Manluolan Xu, Carsten Eulitz, Bettina Braun
The Neural Correlates Underlying Lexically-Guided Perceptual Learning
Odette Scharenborg, Jiska Koemans, Cybelle Smith, Mark A. Hasegawa-Johnson, Kara D. Federmeier
Speech Quality Evaluation of Synthesized Japanese Speech Using EEG
Ivan Halim Parmonangan, Hiroki Tanaka, Sakriani Sakti, Shinnosuke Takamichi, Satoshi Nakamura
Multi-Microphone Adaptive Noise Cancellation for Robust Hotword Detection
Yiteng Huang, Turaj Z. Shabestary, Alexander Gruenstein, Li Wan
Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition
Shengkui Zhao, Chongjia Ni, Rong Tong, Bin Ma
R-Vectors: New Technique for Adaptation to Room Acoustics
Yuri Khokhlov, Alexander Zatvornitskiy, Ivan Medennikov, Ivan Sorokin, Tatiana Prisyach, Aleksei Romanenko, Anton Mitrofanov, Vladimir Bataev, Andrei Andrusenko, Mariya Korenevskaya, Oleg Petrov
Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR
Naoyuki Kanda, Christoph Boeddeker, Jens Heitkaemper, Yusuke Fujita, Shota Horiguchi, Kenji Nagamatsu, Reinhold Haeb-Umbach
Unsupervised Training of Neural Mask-Based Beamforming
Lukas Drude, Jahn Heymann, Reinhold Haeb-Umbach
Acoustic Model Ensembling Using Effective Data Augmentation for CHiME-5 Challenge
Feng Ma, Li Chai, Jun Du, Diyuan Liu, Zhongfu Ye, Chin-Hui Lee
Survey Talk: End-to-End Deep Neural Network Based Speaker and Language Recognition
Ming Li, Weicheng Cai, Danwei Cai
Attention Based Hybrid i-Vector BLSTM Model for Language Recognition
Bharat Padi, Anand Mohan, Sriram Ganapathy
RawNet: Advanced End-to-End Deep Neural Network Using Raw Waveforms for Text-Independent Speaker Verification
Jee-weon Jung, Hee-Soo Heo, Ju-ho Kim, Hye-jin Shim, Ha-Jin Yu
Target Speaker Extraction for Multi-Talker Speaker Verification
Wei Rao, Chenglin Xu, Eng Siong Chng, Haizhou Li
Improving Keyword Spotting and Language Identification via Neural Architecture Search at Scale
Hanna Mazzawi, Xavi Gonzalvo, Aleks Kracun, Prashant Sridhar, Niranjan Subrahmanya, Ignacio Lopez Moreno, Hyun Jin Park, Patrick Violette
Forward-Backward Decoding for Regularizing End-to-End TTS
Yibin Zheng, Xi Wang, Lei He, Shifeng Pan, Frank K. Soong, Zhengqi Wen, Jianhua Tao
A New GAN-Based End-to-End TTS Training Algorithm
Haohan Guo, Frank K. Soong, Lei He, Lei Xie
Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS
Mutian He, Yan Deng, Lei He
Joint Training Framework for Text-to-Speech and Voice Conversion Using Multi-Source Tacotron and WaveNet
Mingyang Zhang, Xin Wang, Fuming Fang, Haizhou Li, Junichi Yamagishi
Training Multi-Speaker Neural Text-to-Speech Systems Using Speaker-Imbalanced Speech Corpora
Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa
Real-Time Neural Text-to-Speech with Sequence-to-Sequence Acoustic Model and WaveGlow or Single Gaussian WaveRNN Vocoders
Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai
Fusion Strategy for Prosodic and Lexical Representations of Word Importance
Sushant Kafle, Cecilia Ovesdotter Alm, Matt Huenerfauth
Self Attention in Variational Sequential Learning for Summarization
Jen-Tzung Chien, Chun-Wei Wang
Multi-Modal Sentiment Analysis Using Deep Canonical Correlation Analysis
Zhongkai Sun, Prathusha K. Sarma, William Sethares, Erik P. Bucy
Interpreting and Improving Deep Neural SLU Models via Vocabulary Importance
Yilin Shen, Wenhu Chen, Hongxia Jin
Assessing the Semantic Space Bias Caused by ASR Error Propagation and its Effect on Spoken Document Summarization
Máté Ákos Tündik, Valér Kaszás, György Szaszák
Latent Topic Attention for Domain Classification
Peisong Huang, Peijie Huang, Wencheng Ai, Jiande Ding, Jinchuan Zhang
A Unified Bayesian Source Modelling for Determined Blind Source Separation
Chaitanya Narisetty
Recursive Speech Separation for Unknown Number of Speakers
Naoya Takahashi, Sudarsanam Parthasaarathy, Nabarun Goswami, Yuki Mitsufuji
Practical Applicability of Deep Neural Networks for Overlapping Speaker Separation
Pieter Appeltans, Jeroen Zegers, Hugo Van hamme
Speech Separation Using Independent Vector Analysis with an Amplitude Variable Gaussian Mixture Model
Zhaoyi Gu, Jing Lu, Kai Chen
Improved Speech Separation with Time-and-Frequency Cross-Domain Joint Embedding and Clustering
Gene-Ping Yang, Chao-I Tuan, Hung-Yi Lee, Lin-shan Lee
WHAM!: Extending Speech Separation to Noisy Environments
Gordon Wichern, Joe Antognini, Michael Flynn, Licheng Richard Zhu, Emmett McQuinn, Dwight Crow, Ethan Manilow, Jonathan Le Roux
Survey Talk: Preserving Privacy in Speaker and Speech Characterisation
Andreas Nautsch
Evaluating Near End Listening Enhancement Algorithms in Realistic Environments
Carol Chermaz, Cassia Valentini-Botinhao, Henning Schepker, Simon King
Improvement and Assessment of Spectro-Temporal Modulation Analysis for Speech Intelligibility Estimation
Amin Edraki, Wai-Yip Chan, Jesper Jensen, Daniel Fogerty
Listener Preference on the Local Criterion for Ideal Binary-Masked Speech
Zhuohuang Zhang, Yi Shen
Using a Manifold Vocoder for Spectral Voice and Style Conversion
Tuan Dinh, Alexander Kain, Kris Tjaden
Multi-Span Acoustic Modelling Using Raw Waveform Signals
P. von Platen, Chao Zhang, P.C. Woodland
An Analysis of Local Monotonic Attention Variants
André Merboldt, Albert Zeyer, Ralf Schlüter, Hermann Ney
Layer Trajectory BLSTM
Eric Sun, Jinyu Li, Yifan Gong
Improving Transformer-Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration
Shigeki Karita, Nelson Enrique Yalta Soplin, Shinji Watanabe, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani
Trainable Dynamic Subsampling for End-to-End Speech Recognition
Shucong Zhang, Erfan Loweimi, Yumo Xu, Peter Bell, Steve Renals
Shallow-Fusion End-to-End Contextual Biasing
Ding Zhao, Tara N. Sainath, David Rybach, Pat Rondon, Deepti Bhatia, Bo Li, Ruoming Pang
Modeling Interpersonal Linguistic Coordination in Conversations Using Word Mover’s Distance
Md. Nasir, Sandeep Nallan Chakravarthula, Brian R.W. Baucom, David C. Atkins, Panayiotis Georgiou, Shrikanth Narayanan
Bag-of-Acoustic-Words for Mental Health Assessment: A Deep Autoencoding Approach
Wenchao Du, Louis-Philippe Morency, Jeffrey Cohn, Alan W. Black
Objective Assessment of Social Skills Using Automated Language Analysis for Identification of Schizophrenia and Bipolar Disorder
Rohit Voleti, Stephanie Woolridge, Julie M. Liss, Melissa Milanovic, Christopher R. Bowie, Visar Berisha
Into the Wild: Transitioning from Recognizing Mood in Clinical Interactions to Personal Conversations for Individuals with Bipolar Disorder
Katie Matton, Melvin G. McInnis, Emily Mower Provost
Detecting Depression with Word-Level Multimodal Fusion
Morteza Rohanian, Julian Hough, Matthew Purver
Assessing Neuromotor Coordination in Depression Using Inverted Vocal Tract Variables
Carol Espy-Wilson, Adam C. Lammert, Nadee Seneviratne, Thomas F. Quatieri
Towards Universal Dialogue Act Tagging for Task-Oriented Dialogues
Shachi Paul, Rahul Goel, Dilek Hakkani-Tür
HyST: A Hybrid Approach for Flexible and Accurate Dialogue State Tracking
Rahul Goel, Shachi Paul, Dilek Hakkani-Tür
Multi-Lingual Dialogue Act Recognition with Deep Learning Methods
Jiří Martínek, Pavel Král, Ladislav Lenc, Christophe Cerisara
BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer
Guan-Lin Chao, Ian Lane
Discovering Dialog Rules by Means of an Evolutionary Approach
David Griol, Zoraida Callejas
Active Learning for Domain Classification in a Commercial Spoken Personal Assistant
Xi C. Chen, Adithya Sagar, Justine T. Kao, Tony Y. Li, Christopher Klein, Stephen Pulman, Ashish Garg, Jason D. Williams
The 2018 NIST Speaker Recognition Evaluation
Seyed Omid Sadjadi, Craig Greenberg, Elliot Singer, Douglas Reynolds, Lisa Mason, Jaime Hernandez-Cordero
State-of-the-Art Speaker Recognition for Telephone and Video Speech: The JHU-MIT Submission for NIST SRE18
Jesús Villalba, Nanxin Chen, David Snyder, Daniel Garcia-Romero, Alan McCree, Gregory Sell, Jonas Borgstrom, Fred Richardson, Suwon Shon, François Grondin, Réda Dehak, Leibny Paola García-Perera, Daniel Povey, Pedro A. Torres-Carrasquillo, Sanjeev Khudanpur, Najim Dehak
x-Vector DNN Refinement with Full-Length Recordings for Speaker Recognition
Daniel Garcia-Romero, David Snyder, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences
Kong Aik Lee, Ville Hautamäki, Tomi H. Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Héctor Delgado, Massimiliano Todisco
Pindrop Labs’ Submission to the First Multi-Target Speaker Detection and Identification Challenge
Elie Khoury, Khaled Lakhdhar, Andrew Vaughan, Ganesh Sivaraman, Parav Nagarsheth
Speaker Recognition Benchmark Using the CHiME-5 Corpus
Daniel Garcia-Romero, David Snyder, Shinji Watanabe, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur
Investigating the Effects of Noisy and Reverberant Speech in Text-to-Speech Systems
David Ayllón, Héctor A. Sánchez-Hevia, Carol Figueroa, Pierre Lanchantin
Selection and Training Schemes for Improving TTS Voice Built on Found Data
F.-Y. Kuo, I.C. Ouyang, S. Aryal, Pierre Lanchantin
All Together Now: The Living Audio Dataset
David A. Braude, Matthew P. Aylett, Caoimhín Laoide-Kemp, Simone Ashby, Kristen M. Scott, Brian Ó Raghallaigh, Anna Braudo, Alex Brouwer, Adriana Stan
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Heiga Zen, Viet Dang, Rob Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Zhifeng Chen, Yonghui Wu
Corpus Design Using Convolutional Auto-Encoder Embeddings for Audio-Book Synthesis
Meysam Shamsi, Damien Lolive, Nelly Barbot, Jonathan Chevelu
Evaluating Intention Communication by TTS Using Explicit Definitions of Illocutionary Act Performance
Nobukatsu Hojo, Noboru Miyazaki
MOSNet: Deep Learning-Based Objective Assessment for Voice Conversion
Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang
Investigating the Robustness of Sequence-to-Sequence Text-to-Speech Models to Imperfectly-Transcribed Training Data
Jason Fong, Pilar Oplustil Gallegos, Zack Hodari, Simon King
Using Pupil Dilation to Measure Cognitive Load When Listening to Text-to-Speech in Quiet and in Noise
Avashna Govender, Anita E. Wagner, Simon King
A Multimodal Real-Time MRI Articulatory Corpus of French for Speech Research
Ioannis K. Douros, Jacques Felblinger, Jens Frahm, Karyna Isaieva, Arun A. Joseph, Yves Laprie, Freddy Odille, Anastasiia Tsukanova, Dirk Voit, Pierre-André Vuissoz
A Chinese Dataset for Identifying Speakers in Novels
Jia-Xiang Chen, Zhen-Hua Ling, Li-Rong Dai
CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages
Kyubyong Park, Thomas Mulc
Attention Model for Articulatory Features Detection
Ievgen Karaulov, Dmytro Tkanov
Unbiased Semi-Supervised LF-MMI Training Using Dropout
Sibo Tong, Apoorv Vyas, Philip N. Garner, Hervé Bourlard
Acoustic Model Optimization Based on Evolutionary Stochastic Gradient Descent with Anchors for Automatic Speech Recognition
Xiaodong Cui, Michael Picheny
Whether to Pretrain DNN or not?: An Empirical Analysis for Voice Conversion
Nirmesh J. Shah, Hardik B. Sailor, Hemant A. Patil
Detection of Glottal Closure Instants from Raw Speech Using Convolutional Neural Networks
Mohit Goyal, Varun Srivastava, Prathosh A.P.
Lattice-Based Lightly-Supervised Acoustic Model Training
Joachim Fainberg, Ondřej Klejch, Steve Renals, Peter Bell
Comparison of Lattice-Free and Lattice-Based Sequence Discriminative Training Criteria for LVCSR
Wilfried Michel, Ralf Schlüter, Hermann Ney
End-to-End Automatic Speech Recognition with a Reconstruction Criterion Using Speech-to-Text and Text-to-Speech Encoder-Decoders
Ryo Masumura, Hiroshi Sato, Tomohiro Tanaka, Takafumi Moriya, Yusuke Ijima, Takanobu Oba
Char+CV-CTC: Combining Graphemes and Consonant/Vowel Units for CTC-Based ASR Using Multitask Learning
Abdelwahab Heba, Thomas Pellegrini, Jean-Pierre Lorré, Régine Andre-Obrecht
Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation
Gakuto Kurata, Kartik Audhkhasi
Direct Neuron-Wise Fusion of Cognate Neural Networks
Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata
Two Tiered Distributed Training Algorithm for Acoustic Modeling
Pranav Ladkat, Oleg Rybakov, Radhika Arava, Sree Hari Krishnan Parthasarathi, I-Fan Chen, Nikko Strom
Exploring the Encoder Layers of Discriminative Autoencoders for LVCSR
Pin-Tuan Huang, Hung-Shin Lee, Syu-Siang Wang, Kuan-Yu Chen, Yu Tsao, Hsin-Min Wang
Multi-Task CTC Training with Auxiliary Feature Reconstruction for End-to-End Speech Recognition
Gakuto Kurata, Kartik Audhkhasi
Framewise Supervised Training Towards End-to-End Speech Recognition Models: First Results
Mohan Li, Yuanjiang Cao, Weicong Zhou, Min Liu
Deep Hierarchical Fusion with Application in Sentiment Analysis
Efthymios Georgiou, Charilaos Papaioannou, Alexandros Potamianos
Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice
Vikramjit Mitra, Sue Booker, Erik Marchi, David Scott Farrar, Ute Dorothea Peitz, Bridget Cheng, Ermine Teves, Anuj Mehta, Devang Naik
Analysis of Deep Learning Architectures for Cross-Corpus Speech Emotion Recognition
Jack Parry, Dimitri Palaz, Georgia Clarke, Pauline Lecomte, Rebecca Mead, Michael Berger, Gregor Hofer
A Path Signature Approach for Speech Emotion Recognition
Bo Wang, Maria Liakata, Hao Ni, Terry Lyons, Alejo J. Nevado-Holgado, Kate Saunders
Employing Bottleneck and Convolutional Features for Speech-Based Physical Load Detection on Limited Data Amounts
Olga Egorow, Tarik Mrech, Norman Weißkirchen, Andreas Wendemuth
Speech Emotion Recognition in Dyadic Dialogues with Attentive Interaction Modeling
Jinming Zhao, Shizhe Chen, Jingjun Liang, Qin Jin
Predicting Group Performances Using a Personality Composite-Network Architecture During Collaborative Task
Shun-Chang Zhong, Yun-Shao Lin, Chun-Min Chang, Yi-Ching Liu, Chi-Chun Lee
Enforcing Semantic Consistency for Cross Corpus Valence Regression from Speech Using Adversarial Discrepancy Learning
Gao-Yi Chao, Yun-Shao Lin, Chun-Min Chang, Chi-Chun Lee
Deep Learning of Segment-Level Feature Representation with Multiple Instance Learning for Utterance-Level Speech Emotion Recognition
Shuiyang Mao, P.C. Ching, Tan Lee
Towards Robust Speech Emotion Recognition Using Deep Residual Networks for Speech Enhancement
Andreas Triantafyllopoulos, Gil Keren, Johannes Wagner, Ingmar Steiner, Björn W. Schuller
Towards Discriminative Representations and Unbiased Predictions: Class-Specific Angular Softmax for Speech Emotion Recognition
Zhixuan Li, Liang He, Jingyang Li, Li Wang, Wei-Qiang Zhang
Learning Temporal Clusters Using Capsule Routing for Speech Emotion Recognition
Md. Asif Jalal, Erfan Loweimi, Roger K. Moore, Thomas Hain
L2 Pronunciation Accuracy and Context: A Pilot Study on the Realization of Geminates in Italian as L2 by French Learners
Sonia d’Apolito, Barbara Gili Fivela
The Monophthongs of Formal Nigerian English: An Acoustic Analysis
Nisad Jamakovic, Robert Fuchs
Quantifying Fundamental Frequency Modulation as a Function of Language, Speaking Style and Speaker
Pablo Arantes, Anders Eriksson
The Voicing Contrast in Stops and Affricates in the Western Armenian of Lebanon
Niamh E. Kelly, Lara Keshishian
“Gra[f] e!” Word-Final Devoicing of Obstruents in Standard French: An Acoustic Study Based on Large Corpora
Adèle Jatteau, Ioana Vasilescu, Lori Lamel, Martine Adda-Decker, Nicolas Audibert
Acoustic Indicators of Deception in Mandarin Daily Conversations Recorded from an Interactive Game
Chih-Hsiang Huang, Huang-Cheng Chou, Yi-Tong Wu, Chi-Chun Lee, Yi-Wen Liu
Prosodic Effects on Plosive Duration in German and Austrian German
Barbara Schuppler, Margaret Zellers
Cross-Lingual Consistency of Phonological Features: An Empirical Study
Cibu Johny, Alexander Gutkin, Martin Jansche
Are IP Initial Vowels Acoustically More Distinct? Results from LDA and CNN Classifications
Fanny Guitard-Ivent, Gabriele Chignoli, Cécile Fougeron, Laurianne Georgeton
Neural Network-Based Modeling of Phonetic Durations
Xizi Wei, Melvyn Hunt, Adrian Skilling
An Acoustic Study of Vowel Undershoot in a System with Several Degrees of Prominence
Janina Mołczanow, Beata Łukaszewicz, Anna Łukaszewicz
A Preliminary Study of Charismatic Speech on YouTube: Correlating Prosodic Variation with Counts of Subscribers, Views and Likes
Stephanie Berger, Oliver Niebuhr, Margaret Zellers
Phonetic Detail Encoding in Explaining the Size of Speech Planning Window
Shan Luo
Acoustic Cues to Topic and Narrow Focus in Egyptian Arabic
Dina El Zarka, Barbara Schuppler, Francesco Cangemi
Acoustic and Articulatory Study of Ewe Vowels: A Comparative Study of Male and Female
Kowovi Comivi Alowonou, Jianguo Wei, Wenhuan Lu, Zhicheng Liu, Kiyoshi Honda, Jianwu Dang
Speech Augmentation via Speaker-Specific Noise in Unseen Environment
Ya’nan Guo, Ziping Zhao, Yide Ma, Björn W. Schuller
UNetGAN: A Robust Speech Enhancement Approach in Time Domain for Extremely Low Signal-to-Noise Ratio Condition
Xiang Hao, Xiangdong Su, Zhiyu Wang, Hui Zhang, Batushiren
Towards Generalized Speech Enhancement with Generative Adversarial Networks
Santiago Pascual, Joan Serrà, Antonio Bonafonte
A Convolutional Neural Network with Non-Local Module for Speech Enhancement
Xiaoqi Li, Yaxing Li, Meng Li, Shan Xu, Yuanjie Dong, Xinrong Sun, Shengwu Xiong
IA-NET: Acceleration and Compression of Speech Enhancement Using Integer-Adder Deep Neural Network
Yu-Chen Lin, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo
KL-Divergence Regularized Deep Neural Network Adaptation for Low-Resource Speaker-Dependent Speech Enhancement
Li Chai, Jun Du, Chin-Hui Lee
Speech Enhancement with Wide Residual Networks in Reverberant Environments
Jorge Llombart, Dayana Ribas, Antonio Miguel, Luis Vicente, Alfonso Ortega, Eduardo Lleida
A Scalable Noisy Speech Dataset and Online Subjective Test Framework
Chandan K.A. Reddy, Ebrahim Beyrami, Jamie Pool, Ross Cutler, Sriram Srinivasan, Johannes Gehrke
Speech Enhancement for Noise-Robust Speech Synthesis Using Wasserstein GAN
Nagaraj Adiga, Yannis Pantazis, Vassilis Tsiaras, Yannis Stylianou
A Non-Causal FFTNet Architecture for Speech Enhancement
Muhammed Shifas P.V., Nagaraj Adiga, Vassilis Tsiaras, Yannis Stylianou
Speech Enhancement with Variance Constrained Autoencoders
D.T. Braithwaite, W. Bastiaan Kleijn
A Deep Learning Approach to Automatic Characterisation of Rhythm in Non-Native English Speech
Konstantinos Kyriakopoulos, Kate M. Knill, Mark J.F. Gales
Language Learning Using Speech to Image Retrieval
Danny Merkx, Stefan L. Frank, Mirjam Ernestus
Using Alexa for Flashcard-Based Learning
Lucy Skidmore, Roger K. Moore
The 2019 Inaugural Fearless Steps Challenge: A Giant Leap for Naturalistic Audio
John H.L. Hansen, Aditya Joglekar, Meena Chandra Shekhar, Vinay Kothapally, Chengzhu Yu, Lakshmish Kaushik, Abhijeet Sangwan
Completely Unsupervised Phoneme Recognition by a Generative Adversarial Network Harmonized with Iteratively Refined Hidden Markov Models
Kuan-Yu Chen, Che-Ping Tsai, Da-Rong Liu, Hung-Yi Lee, Lin-shan Lee
Analysis of Native Listeners’ Facial Microexpressions While Shadowing Non-Native Speech — Potential of Shadowers’ Facial Expressions for Comprehensibility Prediction
Tasavat Trisitichoke, Shintaro Ando, Daisuke Saito, Nobuaki Minematsu
Transparent Pronunciation Scoring Using Articulatorily Weighted Phoneme Edit Distance
Reima Karhila, Anna-Riikka Smolander, Sari Ylinen, Mikko Kurimo
Development of Robust Automated Scoring Models Using Adversarial Input for Oral Proficiency Assessment
Su-Youn Yoon, Chong Min Lee, Klaus Zechner, Keelan Evanini
Impact of ASR Performance on Spoken Grammatical Error Detection
Y. Lu, Mark J.F. Gales, Kate M. Knill, P. Manakul, L. Wang, Y. Wang
Self-Imitating Feedback Generation Using GAN for Computer-Assisted Pronunciation Training
Seung Hee Yang, Minhwa Chung
Joint Student-Teacher Learning for Audio-Visual Scene-Aware Dialog
Chiori Hori, Anoop Cherian, Tim K. Marks, Takaaki Hori
Topical-Chat: Towards Knowledge-Grounded Open-Domain Conversations
Karthik Gopalakrishnan, Behnam Hedayatnia, Qinlang Chen, Anna Gottardi, Sanjeev Kwatra, Anu Venkatesh, Raefer Gabriel, Dilek Hakkani-Tür
Analyzing Verbal and Nonverbal Features for Predicting Group Performance
Uliyana Kubasova, Gabriel Murray, McKenzie Braley
Identifying Therapist and Client Personae for Therapeutic Alliance Estimation
Victor R. Martinez, Nikolaos Flemotomos, Victor Ardulov, Krishna Somandepalli, Simon B. Goldberg, Zac E. Imel, David C. Atkins, Shrikanth Narayanan
Do Hesitations Facilitate Processing of Partially Defective System Utterances? An Exploratory Eye Tracking Study
Kristin Haake, Sarah Schimke, Simon Betz, Sina Zarrieß
Influence of Contextuality on Prosodic Realization of Information Structure in Chinese Dialogues
Bin Li, Yuan Jia
Cross-Lingual Transfer Learning for Affective Spoken Dialogue Systems
Kristijan Gjoreski, Aleksandar Gjoreski, Ivan Kraljevski, Diane Hirschfeld
Identifying Personality Traits Using Overlap Dynamics in Multiparty Dialogue
Mingzhi Yu, Emer Gilmartin, Diane Litman
Identifying Mood Episodes Using Dialogue Features from Clinical Interviews
Zakaria Aldeneh, Mimansa Jaiswal, Michael Picheny, Melvin G. McInnis, Emily Mower Provost
Do Conversational Partners Entrain on Articulatory Precision?
Nichola Lubold, Stephanie A. Borrie, Tyson S. Barrett, Megan Willi, Visar Berisha
Conversational Emotion Analysis via Attention Mechanisms
Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang
The Effect of Phoneme Distribution on Perceptual Similarity in English
Emma O’Neill, Julie Carson-Berndsen
Prosodic Representations of Prominence Classification Neural Networks and Autoencoders Using Bottleneck Features
Sofoklis Kakouros, Antti Suni, Juraj Šimko, Martti Vainio
Compensation for French Liquid Deletion During Auditory Sentence Processing
Sharon Peperkamp, Alvaro Martin Iturralde Zurita
Prosodic Factors Influencing Vowel Reduction in Russian
Daniil Kocharov, Tatiana Kachkovskaia, Pavel Skrelin
Time to Frequency Domain Mapping of the Voice Source: The Influence of Open Quotient and Glottal Skew on the Low End of the Source Spectrum
Christer Gobl, Ailbhe Ní Chasaide
Testing the Distinctiveness of Intonational Tunes: Evidence from Imitative Productions in American English
Eleanor Chodroff, Jennifer S. Cole
A Study of a Cross-Language Perception Based on Cortical Analysis Using Biomimetic STRFs
Sangwook Park, David K. Han, Mounya Elhilali
Perceptual Evaluation of Early versus Late F0 Peaks in the Intonation Structure of Czech Question-Word Questions
Pavel Šturm, Jan Volín
Acoustic Correlates of Phonation Type in Chichimec
Anneliese Kelterer, Barbara Schuppler
F0 Variability Measures Based on Glottal Closure Instants
Yu-Ren Chien, Michal Borský, Jón Guðnason
Recognition of Creaky Voice from Emergency Calls
Lauri Tavi, Tanel Alumäe, Stefan Werner
Direct F0 Estimation with Neural-Network-Based Regression
Shuzhuang Xu, Hiroshi Shimodaira
Real Time Online Visual End Point Detection Using Unidirectional LSTM
Tanay Sharma, Rohith Chandrashekar Aralikatti, Dilip Kumar Margam, Abhinav Thanda, Sharad Roy, Pujitha Appan Kandala, Shankar M. Venkatesan
Fully-Convolutional Network for Pitch Estimation of Speech Signals
Luc Ardaillon, Axel Roebel
Vocal Pitch Extraction in Polyphonic Music Using Convolutional Residual Network
Mingye Dong, Jie Wu, Jian Luan
Multi-Level Adaptive Speech Activity Detector for Speech in Naturalistic Environments
Bidisha Sharma, Rohan Kumar Das, Haizhou Li
On the Importance of Audio-Source Separation for Singer Identification in Polyphonic Music
Bidisha Sharma, Rohan Kumar Das, Haizhou Li
Investigating the Physiological and Acoustic Contrasts Between Choral and Operatic Singing
Hiroko Terasawa, Kenta Wakasa, Hideki Kawahara, Ken-Ichi Sakakibara
Optimizing Voice Activity Detection for Noisy Conditions
Ruixi Lin, Charles Costello, Charles Jankowski, Vishwas Mruthyunjaya
Small-Footprint Magic Word Detection Method Using Convolutional LSTM Neural Network
Taiki Yamamoto, Ryota Nishimura, Masayuki Misaki, Norihide Kitaoka
Acoustic Modeling for Automatic Lyrics-to-Audio Alignment
Chitralekha Gupta, Emre Yılmaz, Haizhou Li
Two-Dimensional Convolutional Recurrent Neural Networks for Speech Activity Detection
Anastasios Vafeiadis, Eleftherios Fanioudakis, Ilyas Potamitis, Konstantinos Votis, Dimitrios Giakoumis, Dimitrios Tzovaras, Liming Chen, Raouf Hamzaoui
A Study of Soprano Singing in Light of the Source-Filter Interaction
Tokihiko Kaburagi
Boosting Character-Based Chinese Speech Synthesis via Multi-Task Learning and Dictionary Tutoring
Yuxiang Zou, Linhao Dong, Bo Xu
Building a Mixed-Lingual Neural TTS System with Only Monolingual Data
Liumeng Xue, Wei Song, Guanghui Xu, Lei Xie, Zhizheng Wu
Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion
Alex Sokolov, Tracy Rohlin, Ariya Rastrow
Analysis of Pronunciation Learning in End-to-End Speech Synthesis
Jason Taylor, Korin Richmond
End-to-End Text-to-Speech for Low-Resource Languages by Cross-Lingual Transfer Learning
Yuan-Jui Chen, Tao Tu, Cheng-chieh Yeh, Hung-Yi Lee
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning
Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, R.J. Skerry-Ryan, Ye Jia, Andrew Rosenberg, Bhuvana Ramabhadran
Unified Language-Independent DNN-Based G2P Converter
Markéta Jůzová, Daniel Tihelka, Jakub Vít
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT
Dongyang Dai, Zhiyong Wu, Shiyin Kang, Xixin Wu, Jia Jia, Dan Su, Dong Yu, Helen Meng
Transformer Based Grapheme-to-Phoneme Conversion
Sevinj Yolchuyeva, Géza Németh, Bálint Gyires-Tóth
Developing Pronunciation Models in New Languages Faster by Exploiting Common Grapheme-to-Phoneme Correspondences Across Languages
Harry Bleyan, Sandy Ritchie, Jonas Fromseier Mortensen, Daan van Esch
Cross-Lingual, Multi-Speaker Text-To-Speech Synthesis Using Neural Speaker Embedding
Mengnan Chen, Minchuan Chen, Shuang Liang, Jun Ma, Lei Chen, Shaojun Wang, Jing Xiao
Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-Level Embedding Features
Zexin Cai, Yaogen Yang, Chuxiong Zhang, Xiaoyi Qin, Ming Li
Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion
Hao Sun, Xu Tan, Jun-Wei Gan, Hongzhi Liu, Sheng Zhao, Tao Qin, Tie-Yan Liu
Multilingual Speech Recognition with Corpus Relatedness Sampling
Xinjian Li, Siddharth Dalmia, Alan W. Black, Florian Metze
Multi-Dialect Acoustic Modeling Using Phone Mapping and Online i-Vectors
Harish Arsikere, Ashtosh Sapru, Sri Garimella
Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model
Anjuli Kannan, Arindrima Datta, Tara N. Sainath, Eugene Weinstein, Bhuvana Ramabhadran, Yonghui Wu, Ankur Bapna, Zhifeng Chen, Seungji Lee
Recognition of Latin American Spanish Using Multi-Task Learning
Carlos Mendes, Alberto Abad, João Paulo Neto, Isabel Trancoso
End-to-End Accented Speech Recognition
Thibault Viglino, Petr Motlicek, Milos Cernak
End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition
Sheng Li, Chenchen Ding, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai
Exploiting Monolingual Speech Corpora for Code-Mixed Speech Recognition
Karan Taneja, Satarupa Guha, Preethi Jyothi, Basil Abraham
Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models
Ke Hu, Antoine Bruguier, Tara N. Sainath, Rohit Prabhavalkar, Golan Pundak
Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data
Yerbolat Khassanov, Haihua Xu, Van Tung Pham, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma
On the End-to-End Solution to Mandarin-English Code-Switching Speech Recognition
Zhiping Zeng, Yerbolat Khassanov, Van Tung Pham, Haihua Xu, Eng Siong Chng, Haizhou Li
Towards Language-Universal Mandarin-English Speech Recognition
Shiliang Zhang, Yuan Liu, Ming Lei, Bin Ma, Lei Xie
Improving ASR Confidence Scores for Alexa Using Acoustic and Hypothesis Embeddings
Prakhar Swarup, Roland Maas, Sri Garimella, Sri Harish Mallidi, Björn Hoffmeister
Investigation of Transformer Based Spelling Correction Model for CTC-Based End-to-End Mandarin Speech Recognition
Shiliang Zhang, Ming Lei, Zhijie Yan
Improving Performance of End-to-End ASR on Numeric Sequences
Cal Peyser, Hao Zhang, Tara N. Sainath, Zelin Wu
A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting
Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Zhengkun Tian, Chenghao Zhao, Cunhang Fan
Sub-Band Convolutional Neural Networks for Small-Footprint Spoken Term Classification
Chieh-Chi Kao, Ming Sun, Yixin Gao, Shiv Vitaladevuni, Chao Wang
Investigating Radical-Based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese
Sheng Li, Xugang Lu, Chenchen Ding, Peng Shen, Tatsuya Kawahara, Hisashi Kawai
Joint Decoding of CTC Based Systems for Speech Recognition
Jiaqi Guo, Yongbin You, Yanmin Qian, Kai Yu
A Joint End-to-End and DNN-HMM Hybrid Automatic Speech Recognition System with Transferring Sharable Knowledge
Tomohiro Tanaka, Ryo Masumura, Takafumi Moriya, Takanobu Oba, Yushi Aono
Active Learning Methods for Low Resource End-to-End Speech Recognition
Karan Malhotra, Shubham Bansal, Sriram Ganapathy
Analysis of Multilingual Sequence-to-Sequence Speech Recognition Systems
Martin Karafiát, Murali Karthick Baskar, Shinji Watanabe, Takaaki Hori, Matthew Wiesner, Jan Černocký
Lattice Generation in Attention-Based Speech Recognition Models
Michał Zapotoczny, Piotr Pietrzak, Adrian Łańcucki, Jan Chorowski
Sampling from Stochastic Finite Automata with Applications to CTC Decoding
Martin Jansche, Alexander Gutkin
ShrinkML: End-to-End ASR Model Compression Using Reinforcement Learning
Łukasz Dudziak, Mohamed S. Abdelfattah, Ravichander Vipperla, Stefanos Laskaridis, Nicholas D. Lane
Acoustic-to-Phrase Models for Speech Recognition
Yashesh Gaur, Jinyu Li, Zhong Meng, Yifan Gong
Performance Monitoring for End-to-End Speech Recognition
Ruizhi Li, Gregory Sell, Hynek Hermansky
The Role of Musical Experience in the Perceptual Weighting of Acoustic Cues for the Obstruent Coda Voicing Contrast in American English
Michelle Cohn, Georgia Zellou, Santiago Barreda
Individual Differences in Implicit Attention to Phonetic Detail in Speech Perception
Natalie Lewandowski, Daniel Duran
Effects of Natural Variability in Cross-Modal Temporal Correlations on Audiovisual Speech Recognition Benefit
Kaylah Lalonde
Listening with Great Expectations: An Investigation of Word Form Anticipations in Naturalistic Speech
M. Bentum, L. ten Bosch, A. van den Bosch, Mirjam Ernestus
Quantifying Expectation Modulation in Human Speech Processing
M. Bentum, L. ten Bosch, A. van den Bosch, Mirjam Ernestus
Perception of Pitch Contours in Speech and Nonspeech
Daniel R. Turner, Ann R. Bradlow, Jennifer S. Cole
Analyzing Reaction Time and Error Sequences in Lexical Decision Experiments
L. ten Bosch, L. Boves, K. Mulder
Automatic Detection of the Temporal Segmentation of Hand Movements in British English Cued Speech
Li Liu, Jianze Li, Gang Feng, Xiao-Ping Zhang
Place Shift as an Autonomous Process: Evidence from Japanese Listeners
Yuriko Yokoe
A Perceptual Study of CV Syllables in Both Spoken and Whistled Speech: A Tashlhiyt Berber Perspective
Julien Meyer, Laure Dentel, Silvain Gerber, Rachid Ridouane
Consonant Classification in Mandarin Based on the Depth Image Feature: A Pilot Study
Han-Chi Hsieh, Wei-Zhong Zheng, Ko-Chiang Chen, Ying-Hui Lai
The Different Roles of Expectations in Phonetic and Lexical Processing
Shiri Lev-Ari, Robin Dodsworth, Jeff Mielke, Sharon Peperkamp
Perceptual Adaptation to Device and Human Voices: Learning and Generalization of a Phonetic Shift Across Real and Voice-AI Talkers
Bruno Ferenc Segedin, Michelle Cohn, Georgia Zellou
End-to-End Convolutional Sequence Learning for ASL Fingerspelling Recognition
Katerina Papadimitriou, Gerasimos Potamianos
Multiview Shared Subspace Learning Across Speakers and Speech Commands
Krishna Somandepalli, Naveen Kumar, Arindam Jati, Panayiotis Georgiou, Shrikanth Narayanan
A Machine Learning Based Clustering Protocol for Determining Hearing Aid Initial Configurations from Pure-Tone Audiograms
Chelzy Belitz, Hussnain Ali, John H.L. Hansen
Acoustic Scene Classification with Mismatched Devices Using CliqueNets and Mixup Data Augmentation
Truc Nguyen, Franz Pernkopf
DeepLung: Smartphone Convolutional Neural Network-Based Inference of Lung Anomalies for Pulmonary Patients
Mohsin Y. Ahmed, Md. Mahbubur Rahman, Jilong Kuang
On the Use/Misuse of the Term ‘Phoneme’
Roger K. Moore, Lucy Skidmore
Understanding and Visualizing Raw Waveform-Based CNNs
Hannah Muckenhirn, Vinayak Abrol, Mathew Magimai-Doss, Sébastien Marcel
Fréchet Audio Distance: A Reference-Free Metric for Evaluating Music Enhancement Algorithms
Kevin Kilgour, Mauricio Zuluaga, Dominik Roblek, Matthew Sharifi
ReMASC: Realistic Replay Attack Corpus for Voice Controlled Systems
Yuan Gong, Jian Yang, Jacob Huber, Mitchell MacKnight, Christian Poellabauer
Analyzing Intra-Speaker and Inter-Speaker Vocal Tract Impedance Characteristics in a Low-Dimensional Feature Space Using t-SNE
Balamurali B.T., Jer-Ming Chen
Directional Audio Rendering Using a Neural Network Based Personalized HRTF
Geon Woo Lee, Jung Hyuk Lee, Seong Ju Kim, Hong Kook Kim
Online Speech Processing and Analysis Suite
Wikus Pienaar, Daan Wissing
Formant Pattern and Spectral Shape Ambiguity of Vowel Sounds, and Related Phenomena of Vowel Acoustics — Exemplary Evidence
Dieter Maurer, Heidy Suter, Christian d’Hereuse, Volker Dellwo
Sound Tools eXtended (STx) 5.0 — A Powerful Sound Analysis Tool Optimized for Speech
Anton Noll, Jonathan Stuefer, Nicola Klingler, Hannah Leykum, Carina Lozo, Jan Luttenberger, Michael Pucher, Carolin Schmid
FarSpeech: Arabic Natural Language Processing for Live Arabic Speech
Mohamed Eldesouki, Naassih Gopee, Ahmed Ali, Kareem Darwish
A System for Real-Time Privacy Preserving Data Collection for Ambient Assisted Living
Fasih Haider, Saturnino Luz
NUS Speak-to-Sing: A Web Platform for Personalized Speech-to-Singing Conversion
Chitralekha Gupta, Karthika Vijayan, Bidisha Sharma, Xiaoxue Gao, Haizhou Li
The INTERSPEECH 2019 Computational Paralinguistics Challenge: Styrian Dialects, Continuous Sleepiness, Baby Sounds & Orca Activity
Björn W. Schuller, Anton Batliner, Christian Bergler, Florian B. Pokorny, Jarek Krajewski, Margaret Cychosz, Ralf Vollmann, Sonja-Dana Roelen, Sebastian Schnieder, Elika Bergelson, Alejandrina Cristia, Amanda Seidl, Anne S. Warlaumont, Lisa Yankowitz, Elmar Nöth, Shahin Amiriparian, Simone Hantke, Maximilian Schmitt
Using Speech Production Knowledge for Raw Waveform Modelling Based Styrian Dialect Identification
S. Pavankumar Dubagunta, Mathew Magimai-Doss
Deep Neural Baselines for Computational Paralinguistics
Daniel Elsner, Stefan Langer, Fabian Ritz, Robert Mueller, Steffen Illium
Styrian Dialect Classification: Comparing and Fusing Classifiers Based on a Feature Selection Using a Genetic Algorithm
Thomas Kisler, Raphael Winkelmann, Florian Schiel
Using Attention Networks and Adversarial Augmentation for Styrian Dialect Continuous Sleepiness and Baby Sound Recognition
Sung-Lin Yeh, Gao-Yi Chao, Bo-Hao Su, Yu-Lin Huang, Meng-Han Lin, Yin-Chun Tsai, Yu-Wen Tai, Zheng-Chi Lu, Chieh-Yu Chen, Tsung-Ming Tai, Chiu-Wang Tseng, Cheng-Kuang Lee, Chi-Chun Lee
Ordinal Triplet Loss: Investigating Sleepiness Detection from Speech
Peter Wu, SaiKrishna Rallabandi, Alan W. Black, Eric Nyberg
Voice Quality and Between-Frame Entropy for Sleepiness Estimation
Vijay Ravi, Soo Jin Park, Amber Afshan, Abeer Alwan
Using Fisher Vector and Bag-of-Audio-Words Representations to Identify Styrian Dialects, Sleepiness, Baby & Orca Sounds
Gábor Gosztolya
Instantaneous Phase and Long-Term Acoustic Cues for Orca Activity Detection
Rohan Kumar Das, Haizhou Li
Relevance-Based Feature Masking: Improving Neural Network Based Whale Classification Through Explainable Artificial Intelligence
Dominik Schiller, Tobias Huber, Florian Lingenfelser, Michael Dietz, Andreas Seiderer, Elisabeth André
Spatial, Temporal and Spectral Multiresolution Analysis for the INTERSPEECH 2019 ComParE Challenge
Marie-José Caraty, Claude Montacié
The DKU-LENOVO Systems for the INTERSPEECH 2019 Computational Paralinguistic Challenge
Haiwei Wu, Weiqing Wang, Ming Li
The VOiCES from a Distance Challenge 2019
Mahesh Kumar Nandwana, Julien van Hout, Colleen Richey, Mitchell McLaren, Maria A. Barrios, Aaron Lawson
STC Speaker Recognition Systems for the VOiCES from a Distance Challenge
Sergey Novoselov, Aleksei Gusev, Artem Ivanov, Timur Pekhovsky, Andrey Shulipa, Galina Lavrentyeva, Vladimir Volokhov, Alexandr Kozlov
Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge
Pavel Matějka, Oldřich Plchot, Hossein Zeinali, Ladislav Mošner, Anna Silnova, Lukáš Burget, Ondřej Novotný, Ondřej Glembek
The STC ASR System for the VOiCES from a Distance Challenge 2019
Ivan Medennikov, Yuri Khokhlov, Aleksei Romanenko, Ivan Sorokin, Anton Mitrofanov, Vladimir Bataev, Andrei Andrusenko, Tatiana Prisyach, Mariya Korenevskaya, Oleg Petrov, Alexander Zatvornitskiy
The I2R’s ASR System for the VOiCES from a Distance Challenge 2019
Tze Yuang Chong, Kye Min Tan, Kah Kuan Teh, Chang Huai You, Hanwu Sun, Huy Dat Tran
Multi-Task Discriminative Training of Hybrid DNN-TVM Model for Speaker Verification with Noisy and Far-Field Speech
Arindam Jati, Raghuveer Peri, Monisankha Pal, Tae Jin Park, Naveen Kumar, Ruchir Travadi, Panayiotis Georgiou, Shrikanth Narayanan
The JHU Speaker Recognition System for the VOiCES 2019 Challenge
David Snyder, Jesús Villalba, Nanxin Chen, Daniel Povey, Gregory Sell, Najim Dehak, Sanjeev Khudanpur
Intel Far-Field Speaker Recognition System for VOiCES Challenge 2019
Jonathan Huang, Tobias Bocklet
The I2R’s Submission to VOiCES Distance Speaker Recognition Challenge 2019
Hanwu Sun, Kah Kuan Teh, Ivan Kukanov, Huy Dat Tran
The LeVoice Far-Field Speech Recognition System for VOiCES from a Distance Challenge 2019
Yulong Liang, Lin Yang, Xuyang Wang, Yingjie Li, Chen Jia, Junjie Wang
The JHU ASR System for VOiCES from a Distance Challenge 2019
Yiming Wang, David Snyder, Hainan Xu, Vimal Manohar, Phani Sankar Nidadavolu, Daniel Povey, Sanjeev Khudanpur
The DKU System for the Speaker Recognition Task of the 2019 VOiCES from a Distance Challenge
Danwei Cai, Xiaoyi Qin, Weicheng Cai, Ming Li
Identifying Distinctive Acoustic and Spectral Features in Parkinson’s Disease
Yermiyahu Hauptman, Ruth Aloni-Lavi, Itshak Lapidot, Tanya Gurevich, Yael Manor, Stav Naor, Noa Diamant, Irit Opher
Aerodynamics and Lumped-Masses Combined with Delay Lines for Modeling Vertical and Anterior-Posterior Phase Differences in Pathological Vocal Fold Vibration
Carlo Drioli, Philipp Aichinger
Mel-Frequency Cepstral Coefficients of Voice Source Waveforms for Classification of Phonation Types in Speech
Sudarsana Reddy Kadiri, Paavo Alku
Automatic Detection of Autism Spectrum Disorder in Children Using Acoustic and Text Features from Brief Natural Conversations
Sunghye Cho, Mark Liberman, Neville Ryant, Meredith Cola, Robert T. Schultz, Julia Parish-Morris
Analysis and Synthesis of Vocal Flutter and Vocal Jitter
Jean Schoentgen, Philipp Aichinger
Reliability of Clinical Voice Parameters Captured with Smartphones — Measurements of Added Noise and Spectral Tilt
Felix Schaeffler, Stephen Jannetts, Janet Beck
Say What? A Dataset for Exploring the Error Patterns That Two ASR Engines Make
Meredith Moore, Michael Saxon, Hemanth Venkateswara, Visar Berisha, Sethuraman Panchanathan
Survey Talk: Prosody Research and Applications: The State of the Art
Nigel G. Ward
Dimensions of Prosodic Prominence in an Attractor Model
Simon Roessig, Doris Mücke, Lena Pagel
Comparative Analysis of Prosodic Characteristics Using WaveNet Embeddings
Antti Suni, Marcin Włodarczak, Martti Vainio, Juraj Šimko
The Role of Voice Quality in the Perception of Prominence in Synthetic Speech
Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl
Phonological Awareness of French Rising Contours in Japanese Learners
Rachel Albar, Hiyon Yoo
Audio Classification of Bit-Representation Waveform
Masaki Okawa, Takuya Saito, Naoki Sawada, Hiromitsu Nishizaki
Locality-Constrained Linear Coding Based Fused Visual Features for Robust Acoustic Event Classification
Manjunath Mulimani, Shashidhar G. Koolagudi
Learning How to Listen: A Temporal-Frequential Attention Model for Sound Event Detection
Yu-Han Shen, Ke-Xin He, Wei-Qiang Zhang
A Deep Residual Network for Large-Scale Acoustic Scene Analysis
Logan Ford, Hao Tang, François Grondin, James Glass
Supervised Classifiers for Audio Impairments with Noisy Labels
Chandan K.A. Reddy, Ross Cutler, Johannes Gehrke
Self-Attention for Speech Emotion Recognition
Lorenzo Tarantino, Philip N. Garner, Alexandros Lazaridis
Unsupervised Singing Voice Conversion
Eliya Nachmani, Lior Wolf
Adversarially Trained End-to-End Korean Singing Voice Synthesis System
Juheon Lee, Hyeong-Seok Choi, Chang-Bin Jeon, Junghyun Koo, Kyogu Lee
Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling
Yuan-Hao Yi, Yang Ai, Zhen-Hua Ling, Li-Rong Dai
Conditional Variational Auto-Encoder for Text-Driven Expressive AudioVisual Speech Synthesis
Sara Dahmani, Vincent Colotte, Valérian Girard, Slim Ouni
A Strategy for Improved Phone-Level Lyrics-to-Audio Alignment for Speech-to-Singing Synthesis
David Ayllón, Fernando Villavicencio, Pierre Lanchantin
Modeling Labial Coarticulation with Bidirectional Gated Recurrent Networks and Transfer Learning
Théo Biasutto--Lervat, Sara Dahmani, Slim Ouni
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le
Forget a Bit to Learn Better: Soft Forgetting for CTC-Based Automatic Speech Recognition
Kartik Audhkhasi, George Saon, Zoltán Tüske, Brian Kingsbury, Michael Picheny
Online Hybrid CTC/Attention Architecture for End-to-End Speech Recognition
Haoran Miao, Gaofeng Cheng, Pengyuan Zhang, Ta Li, Yonghong Yan
A Highly Efficient Distributed Deep Learning System for Automatic Speech Recognition
Wei Zhang, Xiaodong Cui, Ulrich Finkler, George Saon, Abdullah Kayi, Alper Buyuktosunoglu, Brian Kingsbury, David Kung, Michael Picheny
Knowledge Distillation for End-to-End Monaural Multi-Talker ASR System
Wangyou Zhang, Xuankai Chang, Yanmin Qian
Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech
Tobias Menne, Ilya Sklyar, Ralf Schlüter, Hermann Ney
Survey Talk: Recognition of Foreign-Accented Speech: Challenges and Opportunities for Human and Computer Speech Communication
Ann R. Bradlow
The Effects of Time Expansion on English as a Second Language Individuals
John S. Novak, Daniel Bunn, Robert V. Kenyon
Capturing L1 Influence on L2 Pronunciation by Simulating Perceptual Space Using Acoustic Features
Shuju Shi, Chilin Shih, Jinsong Zhang
Cognitive Factors in Thai-Naïve Mandarin Speakers’ Imitation of Thai Lexical Tones
Juqiang Chen, Catherine T. Best, Mark Antoniou
Foreign-Language Knowledge Enhances Artificial-Language Segmentation
Annie Tremblay, Mirjam Broersma
Neural Named Entity Recognition from Subword Units
Abdalghani Abujabal, Judith Gaspers
Unsupervised Acoustic Segmentation and Clustering Using Siamese Network Embeddings
Saurabhchand Bhati, Shekhar Nayak, K. Sri Rama Murty, Najim Dehak
An Empirical Evaluation of DTW Subsampling Methods for Keyword Search
Bolaji Yusuf, Murat Saraclar
Linguistically-Informed Training of Acoustic Word Embeddings for Low-Resource Languages
Zixiaofan Yang, Julia Hirschberg
Multimodal Word Discovery and Retrieval with Phone Sequence and Image Concepts
Liming Wang, Mark A. Hasegawa-Johnson
Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-Resource Settings
Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier
Direct-Path Signal Cross-Correlation Estimation for Sound Source Localization in Reverberation
Wei Xue, Ying Tong, Guohong Ding, Chao Zhang, Tao Ma, Xiaodong He, Bowen Zhou
Multiple Sound Source Localization with SVD-PHAT
François Grondin, James Glass
Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking
Wangyou Zhang, Ying Zhou, Yanmin Qian
Multichannel Loss Function for Supervised Speech Source Separation by Mask-Based Beamforming
Yoshiki Masuyama, Masahito Togami, Tatsuya Komatsu
Direction-Aware Speaker Beam for Multi-Channel Speaker Extraction
Guanjun Li, Shan Liang, Shuai Nie, Wenju Liu, Meng Yu, Lianwu Chen, Shouye Peng, Changliang Li
Multimodal SpeakerBeam: Single Channel Target Speech Extraction with Audio-Visual Speaker Clues
Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Tomohiro Nakatani
Speech Denoising with Deep Feature Losses
François G. Germain, Qifeng Chen, Vladlen Koltun
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking
Quan Wang, Hannah Muckenhirn, Kevin Wilson, Prashant Sridhar, Zelin Wu, John R. Hershey, Rif A. Saurous, Ron J. Weiss, Ye Jia, Ignacio Lopez Moreno
Incorporating Symbolic Sequential Modeling for Speech Enhancement
Chien-Feng Liao, Yu Tsao, Xugang Lu, Hisashi Kawai
Maximum a posteriori Speech Enhancement Based on Double Spectrum
Pejman Mowlaee, Daniel Scheran, Johannes Stahl, Sean U.N. Wood, W. Bastiaan Kleijn
Coarse-to-Fine Optimization for Speech Enhancement
Jian Yao, Ahmad Al-Dahle
Kernel Machines Beat Deep Neural Networks on Mask-Based Single-Channel Speech Enhancement
Like Hui, Siyuan Ma, Mikhail Belkin
Survey Talk: Multimodal Processing of Speech and Language
Florian Metze
MobiVSR: Efficient and Light-Weight Neural Network for Visual Speech Recognition on Mobile Devices
Nilay Shrivastava, Astitwa Saxena, Yaman Kumar, Rajiv Ratn Shah, Amanda Stent, Debanjan Mahata, Preeti Kaur, Roger Zimmermann
Speaker Adaptation for Lip-Reading Using Visual Identity Vectors
Pujitha Appan Kandala, Abhinav Thanda, Dilip Kumar Margam, Rohith Chandrashekar Aralikatti, Tanay Sharma, Sharad Roy, Shankar M. Venkatesan
MobiLipNet: Resource-Efficient Deep Learning Based Lipreading
Alexandros Koumparoulis, Gerasimos Potamianos
LipSound: Neural Mel-Spectrogram Reconstruction for Lip Reading
Leyuan Qu, Cornelius Weber, Stefan Wermter
Two-Pass End-to-End Speech Recognition
Tara N. Sainath, Ruoming Pang, David Rybach, Yanzhang He, Rohit Prabhavalkar, Wei Li, Mirkó Visontai, Qiao Liang, Trevor Strohman, Yonghui Wu, Ian McGraw, Chung-Cheng Chiu
Extract, Adapt and Recognize: An End-to-End Neural Network for Corrupted Monaural Speech Recognition
Max W.Y. Lam, Jun Wang, Xunying Liu, Helen Meng, Dan Su, Dong Yu
Multi-Task Multi-Resolution Char-to-BPE Cross-Attention Decoder for End-to-End Speech Recognition
Dhananjaya Gowda, Abhinav Garg, Kwangyoun Kim, Mehul Kumar, Chanwoo Kim
Multi-Stride Self-Attention for Speech Recognition
Kyu J. Han, Jing Huang, Yun Tang, Xiaodong He, Bowen Zhou
LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition
Shoukang Hu, Xurong Xie, Shansong Liu, Max W.Y. Lam, Jianwei Yu, Xixin Wu, Xunying Liu, Helen Meng
Self-Teaching Networks
Liang Lu, Eric Sun, Yifan Gong
Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning
Yuanchao Li, Tianyu Zhao, Tatsuya Kawahara
Continuous Emotion Recognition in Speech — Do We Need Recurrence?
Maximilian Schmitt, Nicholas Cummins, Björn W. Schuller
Speech Based Emotion Prediction: Can a Linear Model Work?
Anda Ouyang, Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah
Speech Emotion Recognition Based on Multi-Label Emotion Existence Model
Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono
Gender De-Biasing in Speech Emotion Recognition
Cristina Gorrostieta, Reza Lotfian, Kye Taylor, Richard Brutti, John Kane
CycleGAN-Based Emotion Style Transfer as Data Augmentation for Speech Emotion Recognition
Fang Bao, Michael Neumann, Ngoc Thang Vu
Lombard Speech Synthesis Using Transfer Learning in a Tacotron Text-to-Speech System
Bajibabu Bollepalli, Lauri Juvela, Paavo Alku
Augmented CycleGANs for Continuous Scale Normal-to-Lombard Speaking Style Conversion
Shreyas Seshadri, Lauri Juvela, Paavo Alku, Okko Räsänen
Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams
Guanlong Zhao, Shaojin Ding, Ricardo Gutierrez-Osuna
A Multi-Speaker Emotion Morphing Model Using Highway Networks and Maximum Likelihood Objective
Ravi Shankar, Jacob Sager, Archana Venkataraman
Effects of Waveform PMF on Anti-Spoofing Detection
Itshak Lapidot, Jean-François Bonastre
Nonparallel Emotional Speech Conversion
Jian Gao, Deep Chakraborty, Hamidou Tembine, Olaitan Olaleye
Self-Supervised Speaker Embeddings
Themos Stafylakis, Johan Rohdin, Oldřich Plchot, Petr Mizera, Lukáš Burget
Privacy-Preserving Speaker Recognition with Cohort Score Normalisation
Andreas Nautsch, Jose Patino, Amos Treiber, Themos Stafylakis, Petr Mizera, Massimiliano Todisco, Thomas Schneider, Nicholas Evans
Large Margin Softmax Loss for Speaker Verification
Yi Liu, Liang He, Jia Liu
A Deep Neural Network for Short-Segment Speaker Recognition
Amirhossein Hajavi, Ali Etemad
Deep Speaker Embedding Extraction with Channel-Wise Feature Responses and Additive Supervision Softmax Loss Function
Jianfeng Zhou, Tao Jiang, Zheng Li, Lin Li, Qingyang Hong
VoiceID Loss: Speech Enhancement for Speaker Verification
Suwon Shon, Hao Tang, James Glass
Blind Channel Response Estimation for Replay Attack Detection
Anderson R. Avila, Jahangir Alam, Douglas O’Shaughnessy, Tiago H. Falk
Energy Separation-Based Instantaneous Frequency Estimation for Cochlear Cepstral Feature for Replay Spoof Detection
Ankur T. Patil, Rajul Acharya, Pulikonda Aditya Sai, Hemant A. Patil
Optimization of False Acceptance/Rejection Rates and Decision Threshold for End-to-End Text-Dependent Speaker Verification Systems
Victoria Mingote, Antonio Miguel, Dayana Ribas, Alfonso Ortega, Eduardo Lleida
Deep Hashing for Speaker Identification and Retrieval
Lei Fan, Qing-Yuan Jiang, Ya-Qi Yu, Wu-Jun Li
Adversarial Optimization for Dictionary Attacks on Speaker Verification
Mirko Marras, Paweł Korus, Nasir Memon, Gianni Fenu
An Adaptive-Q Cochlear Model for Replay Spoofing Detection
Tharshini Gunendradasan, Eliathamby Ambikairajah, Julien Epps, Haizhou Li
An End-to-End Text-Independent Speaker Verification Framework with a Keyword Adversarial Network
Sungrack Yun, Janghoon Cho, Jungyun Eum, Wonil Chang, Kyuwoong Hwang
Shortcut Connections Based Deep Speaker Embeddings for End-to-End Speaker Verification System
Soonshin Seo, Daniel Jun Rim, Minkyu Lim, Donghyun Lee, Hosung Park, Junseok Oh, Changmin Kim, Ji-Hwan Kim
Device Feature Extractor for Replay Spoofing Detection
Chang Huai You, Jichen Yang, Huy Dat Tran
Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training
Hongji Wang, Heinrich Dinkel, Shuai Wang, Yanmin Qian, Kai Yu
A Study of x-Vector Based Speaker Recognition on Short Utterances
A. Kanagasundaram, S. Sridharan, G. Sriram, S. Prachi, C. Fookes
Tied Mixture of Factor Analyzers Layer to Combine Frame Level Representations in Neural Speaker Embeddings
Nanxin Chen, Jesús Villalba, Najim Dehak
Biologically Inspired Adaptive-Q Filterbanks for Replay Spoofing Attack Detection
Buddhi Wickramasinghe, Eliathamby Ambikairajah, Julien Epps
On Robustness of Unsupervised Domain Adaptation for Speaker Recognition
Pierre-Michel Bousquet, Mickael Rouvier
Large-Scale Speaker Retrieval on Random Speaker Variability Subspace
Suwon Shon, Younggun Lee, Taesu Kim
Meeting Transcription Using Asynchronous Distant Microphones
Takuya Yoshioka, Dimitrios Dimitriadis, Andreas Stolcke, William Hinthorn, Zhuo Chen, Michael Zeng, Xuedong Huang
Detection and Recovery of OOVs for Improved English Broadcast News Captioning
Samuel Thomas, Kartik Audhkhasi, Zoltán Tüske, Yinghui Huang, Michael Picheny
Improving Large Vocabulary Urdu Speech Recognition System Using Deep Neural Networks
Muhammad Umar Farooq, Farah Adeeba, Sahar Rauf, Sarmad Hussain
Hybrid Arbitration Using Raw ASR String and NLU Information — Taking the Best of Both Embedded World and Cloud World
Min Tang
Leveraging a Character, Word and Prosody Triplet for an ASR Error Robust and Agglutination Friendly Punctuation Approach
György Szaszák, Máté Ákos Tündik
The Airbus Air Traffic Control Speech Recognition 2018 Challenge: Towards ATC Automatic Transcription and Call Sign Detection
Thomas Pellegrini, Jérôme Farinas, Estelle Delpech, François Lancelot
Kite: Automatic Speech Recognition for Unmanned Aerial Vehicles
Dan Oneață, Horia Cucu
Exploring Methods for the Automatic Detection of Errors in Manual Transcription
Xiaofei Wang, Jinyi Yang, Ruizhi Li, Samik Sadhu, Hynek Hermansky
Improved Low-Resource Somali Speech Recognition by Semi-Supervised Acoustic and Language Model Training
Astik Biswas, Raghav Menon, Ewald van der Westhuizen, Thomas Niesler
The Althingi ASR System
Inga R. Helgadóttir, Anna Björk Nikulásdóttir, Michal Borský, Judy Y. Fong, Róbert Kjaran, Jón Guðnason
CRIM’s Speech Transcription and Call Sign Detection System for the ATC Airbus Challenge Task
Vishwa Gupta, Lise Rebout, Gilles Boulianne, Pierre-André Ménard, Jahangir Alam
Optimizing Speech-Input Length for Speaker-Independent Depression Classification
Tomasz Rutowski, Amir Harati, Yang Lu, Elizabeth Shriberg
A New Approach for Automating Analysis of Responses on Verbal Fluency Tests from Subjects At-Risk for Schizophrenia
Mary Pietrowicz, Carla Agurto, Raquel Norel, Elif Eyigoz, Guillermo Cecchi, Zarina R. Bilgrami, Cheryl Corcoran
Comparison of Telephone Recordings and Professional Microphone Recordings for Early Detection of Parkinson’s Disease, Using Mel-Frequency Cepstral Coefficients with Gaussian Mixture Models
Laetitia Jeancolas, Graziella Mangone, Jean-Christophe Corvol, Marie Vidailhet, Stéphane Lehéricy, Badr-Eddine Benkelfat, Habib Benali, Dijana Petrovska-Delacrétaz
Spectral Subspace Analysis for Automatic Assessment of Pathological Speech Intelligibility
Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard
An Investigation of Therapeutic Rapport Through Prosody in Brief Psychodynamic Psychotherapy
Carolina De Pasquale, Charlie Cullen, Brian Vaughan
Feature Representation of Pathophysiology of Parkinsonian Dysarthria
Alice Rueda, J.C. Vásquez-Correa, Cristian David Rios-Urrego, Juan Rafael Orozco-Arroyave, Sridhar Krishnan, Elmar Nöth
Neural Transfer Learning for Cry-Based Diagnosis of Perinatal Asphyxia
Charles C. Onu, Jonathan Lebensold, William L. Hamilton, Doina Precup
Investigating the Variability of Voice Quality and Pain Levels as a Function of Multiple Clinical Parameters
Hui-Ting Hong, Jeng-Lin Li, Yi-Ming Weng, Chip-Jin Ng, Chi-Chun Lee
Assessing Parkinson’s Disease from Speech Using Fisher Vectors
José Vicente Egas López, Juan Rafael Orozco-Arroyave, Gábor Gosztolya
Feature Space Visualization with Spatial Similarity Maps for Pathological Speech Data
Philipp Klumpp, J.C. Vásquez-Correa, Tino Haderlein, Elmar Nöth
Predicting Behavior in Cancer-Afflicted Patient and Spouse Interactions Using Speech and Language
Sandeep Nallan Chakravarthula, Haoqi Li, Shao-Yen Tseng, Maija Reblin, Panayiotis Georgiou
Automatic Assessment of Language Impairment Based on Raw ASR Output
Ying Qin, Tan Lee, Anthony Pak Hin Kong
Effects of Spectral and Temporal Cues to Mandarin Concurrent-Vowels Identification for Normal-Hearing and Hearing-Impaired Listeners
Zhen Fu, Xihong Wu, Jing Chen
Disfluencies and Human Speech Transcription Errors
Vicky Zayats, Trang Tran, Richard Wright, Courtney Mansfield, Mari Ostendorf
The Influence of Distraction on Speech Processing: How Selective is Selective Attention?
Sandra I. Parhammer, Miriam Ebersberg, Jenny Tippmann, Katja Stärk, Andreas Opitz, Barbara Hinger, Sonja Rossi
Subjective Evaluation of Communicative Effort for Younger and Older Adults in Interactive Tasks with Energetic and Informational Masking
Valerie Hazan, Outi Tuomainen, Linda Taschenberger
Perceiving Older Adults Producing Clear and Lombard Speech
Chris Davis, Jeesun Kim
Phone-Attribute Posteriors to Evaluate the Speech of Cochlear Implant Users
T. Arias-Vergara, Juan Rafael Orozco-Arroyave, Milos Cernak, S. Gollwitzer, M. Schuster, Elmar Nöth
Effects of Urgent Speech and Congruent/Incongruent Text on Speech Intelligibility in Noise and Reverberation
Nao Hodoshima
Quantifying Cochlear Implant Users’ Ability for Speaker Identification Using CI Auditory Stimuli
Nursadul Mamun, Ria Ghosh, John H.L. Hansen
Lexically Guided Perceptual Learning of a Vowel Shift in an Interactive L2 Listening Context
E. Felker, Mirjam Ernestus, Mirjam Broersma
Talker Intelligibility and Listening Effort with Temporally Modified Speech
Maximillian Paulus, Valerie Hazan, Patti Adank
R2SPIN: Re-Recording the Revised Speech Perception in Noise Test
Lauren Ward, Catherine Robinson, Matthew Paradis, Katherine M. Tucker, Ben Shirley
Contributions of Consonant-Vowel Transitions to Mandarin Tone Identification in Simulated Electric-Acoustic Hearing
Fei Chen
Monaural Speech Enhancement with Dilated Convolutions
Shadi Pirhosseinloo, Jonathan S. Brumberg
Noise Adaptive Speech Enhancement Using Domain Adversarial Training
Chien-Feng Liao, Yu Tsao, Hung-Yi Lee, Hsin-Min Wang
Environment-Dependent Attention-Driven Recurrent Convolutional Neural Network for Robust Speech Enhancement
Meng Ge, Longbiao Wang, Nan Li, Hao Shi, Jianwu Dang, Xiangang Li
A Statistically Principled and Computationally Efficient Approach to Speech Enhancement Using Variational Autoencoders
Manuel Pariente, Antoine Deleforge, Emmanuel Vincent
Speech Enhancement Using Forked Generative Adversarial Networks with Spectral Subtraction
Ju Lin, Sufeng Niu, Zice Wei, Xiang Lan, Adriaan J. van Wijngaarden, Melissa C. Smith, Kuang-Ching Wang
Specialized Speech Enhancement Model Selection Based on Learned Non-Intrusive Quality Assessment Metric
Ryandhimas E. Zezario, Szu-Wei Fu, Xugang Lu, Hsin-Min Wang, Yu Tsao
Speaker-Aware Deep Denoising Autoencoder with Embedded Speaker Identity for Speech Enhancement
Fu-Kai Chuang, Syu-Siang Wang, Jeih-weih Hung, Yu Tsao, Shih-Hau Fang
Investigation of Cost Function for Supervised Monaural Speech Separation
Yun Liu, Hui Zhang, Xueliang Zhang, Yuhang Cao
Deep Attention Gated Dilated Temporal Convolutional Networks with Intra-Parallel Convolutional Modules for End-to-End Monaural Speech Separation
Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Jiqing Han, Anyan Shi
Masking Estimation with Phase Restoration of Clean Speech for Monaural Speech Enhancement
Xianyun Wang, Changchun Bao
Progressive Speech Enhancement with Residual Connections
Jorge Llombart, Dayana Ribas, Antonio Miguel, Luis Vicente, Alfonso Ortega, Eduardo Lleida
Acoustic Model Bootstrapping Using Semi-Supervised Learning
Langzhou Chen, Volker Leutnant
Bandwidth Embeddings for Mixed-Bandwidth Speech Recognition
Gautam Mantena, Ozlem Kalinli, Ossama Abdel-Hamid, Don McAllaster
Adversarial Black-Box Attacks on Automatic Speech Recognition Systems Using Multi-Objective Evolutionary Optimization
Shreya Khare, Rahul Aralikatte, Senthil Mani
Towards Debugging Deep Neural Networks by Generating Speech Utterances
Bilal Soomro, Anssi Kanervisto, Trung Ngo Trong, Ville Hautamäki
Compression of CTC-Trained Acoustic Models by Dynamic Frame-Wise Distillation or Segment-Wise N-Best Hypotheses Imitation
Haisong Ding, Kai Chen, Qiang Huo
Keyword Spotting for Hearing Assistive Devices Robust to External Speakers
Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen
Latent Dirichlet Allocation Based Acoustic Data Selection for Automatic Speech Recognition
Mortaza Doulaty, Thomas Hain
Target Speaker Recovery and Recognition Network with Average x-Vector and Global Training
Wenjie Li, Pengyuan Zhang, Yonghong Yan
Lyrics Recognition from Singing Voice Focused on Correspondence Between Voice and Notes
Motoyuki Suzuki, Sho Tomita, Tomoki Morita
Transfer Learning from Audio-Visual Grounding to Speech Recognition
Wei-Ning Hsu, David Harwath, James Glass
Cross-Corpus Speech Emotion Recognition Using Semi-Supervised Transfer Non-Negative Matrix Factorization with Adaptation Regularization
Hui Luo, Jiqing Han
Modeling User Context for Valence Prediction from Narratives
Aniruddha Tammewar, Alessandra Cervone, Eva-Maria Messner, Giuseppe Riccardi
Front-End Feature Compensation and Denoising for Noise Robust Speech Emotion Recognition
Rupayan Chakraborty, Ashish Panda, Meghna Pandharipande, Sonal Joshi, Sunil Kumar Kopparapu
The Contribution of Acoustic Features Analysis to Model Emotion Perceptual Process for Language Diversity
Xingfeng Li, Masato Akagi
Design and Development of a Multi-Lingual Speech Corpora (TaMaR-EmoDB) for Emotion Analysis
Rajeev Rajan, Haritha U.G., Sujitha A.C., Rejisha T.M.
Speech Emotion Recognition with a Reject Option
Kusha Sridhar, Carlos Busso
Development of Emotion Rankers Based on Intended and Perceived Emotion Labels
Zhenghao Jin, Houwei Cao
Emotion Recognition from Natural Phone Conversations in Individuals with and without Recent Suicidal Ideation
John Gideon, Heather T. Schatten, Melvin G. McInnis, Emily Mower Provost
An Acoustic and Lexical Analysis of Emotional Valence in Spontaneous Speech: Autobiographical Memory Recall in Older Adults
Deniece S. Nazareth, Ellen Tournier, Sarah Leimkötter, Esther Janse, Dirk Heylen, Gerben J. Westerhof, Khiet P. Truong
Does the Lombard Effect Improve Emotional Communication in Noise? — Analysis of Emotional Speech Acted in Noise
Yi Zhao, Atsushi Ando, Shinji Takaki, Junichi Yamagishi, Satoshi Kobashikawa
Linear Discriminant Differential Evolution for Feature Selection in Emotional Speech Recognition
Soumaya Gharsellaoui, Sid Ahmed Selouani, Mohammed Sidi Yakoub
Multi-Modal Learning for Speech Emotion Recognition: An Analysis and Comparison of ASR Outputs with Ground Truth Transcription
Saurabh Sahu, Vikramjit Mitra, Nadee Seneviratne, Carol Espy-Wilson
Articulatory Characteristics of Secondary Palatalization in Romanian Fricatives
Laura Spinu, Maida Percival, Alexei Kochetov
Articulation of Vowel Length Contrasts in Australian English
Louise Ratko, Michael Proctor, Felicity Cox
V-to-V Coarticulation Induced Acoustic and Articulatory Variability of Vowels: The Effect of Pitch-Accent
Andrea Deme, Márton Bartók, Tekla Etelka Gráczi, Tamás Gábor Csapó, Alexandra Markó
The Contribution of Lip Protrusion to Anglo-English /r/: Evidence from Hyper- and Non-Hyperarticulated Speech
Hannah King, Emmanuel Ferragne
Articulatory Analysis of Transparent Vowel /iː/ in Harmonic and Antiharmonic Hungarian Stems: Is There a Difference?
Alexandra Markó, Márton Bartók, Tamás Gábor Csapó, Tekla Etelka Gráczi, Andrea Deme
On the Role of Oral Configurations in European Portuguese Nasal Vowels
Conceição Cunha, Samuel Silva, António Teixeira, Catarina Oliveira, Paula Martins, Arun A. Joseph, Jens Frahm
Residual + Capsule Networks (ResCap) for Simultaneous Single-Channel Overlapped Keyword Recognition
Yan Xiong, Visar Berisha, Chaitali Chakrabarti
A Study for Improving Device-Directed Speech Detection Toward Frictionless Human-Machine Interaction
Che-Wei Huang, Roland Maas, Sri Harish Mallidi, Björn Hoffmeister
Unsupervised Methods for Audio Classification from Lecture Discussion Recordings
Hang Su, Borislav Dzodzo, Xixin Wu, Xunying Liu, Helen Meng
Neural Whispered Speech Detection with Imbalanced Learning
Takanori Ashihara, Yusuke Shinohara, Hiroshi Sato, Takafumi Moriya, Kiyoaki Matsui, Takaaki Fukutomi, Yoshikazu Yamaguchi, Yushi Aono
Deep Learning for Orca Call Type Identification — A Fully Unsupervised Approach
Christian Bergler, Manuel Schmitt, Rachael Xi Cheng, Andreas Maier, Volker Barth, Elmar Nöth
Open-Vocabulary Keyword Spotting with Audio and Text Embeddings
Niccolò Sacchi, Alexandre Nanchen, Martin Jaggi, Milos Cernak
ToneNet: A CNN Model of Tone Classification of Mandarin Chinese
Qiang Gao, Shutao Sun, Yaping Yang
Temporal Convolution for Real-Time Keyword Spotting on Mobile Devices
Seungwoo Choi, Seokjun Seo, Beomjun Shin, Hyeongmin Byun, Martin Kersner, Beomsu Kim, Dongyoung Kim, Sungjoo Ha
Audio Tagging with Compact Feedforward Sequential Memory Network and Audio-to-Audio Ratio Based Data Augmentation
Zhiying Huang, Shiliang Zhang, Ming Lei
Music Genre Classification Using Duplicated Convolutional Layers in Neural Networks
Hansi Yang, Wei-Qiang Zhang
A Storyteller’s Tale: Literature Audiobooks Genre Classification Using CNN and RNN Architectures
Nehory Carmi, Azaria Cohen, Mireille Avigal, Anat Lerner
Parameter Enhancement for MELP Speech Codec in Noisy Communication Environment
Min-Jae Hwang, Hong-Goo Kang
Cascaded Cross-Module Residual Learning Towards Lightweight End-to-End Speech Coding
Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beack, Minje Kim
End-to-End Optimization of Source Models for Speech and Audio Coding Using a Machine Learning Framework
Tom Bäckström
A Real-Time Wideband Neural Vocoder at 1.6kb/s Using LPCNet
Jean-Marc Valin, Jan Skoglund
Super-Wideband Spectral Envelope Modeling for Speech Coding
Guillaume Fuchs, Chamran Ashour, Tom Bäckström
Speech Audio Super-Resolution for Speech Recognition
Xinyu Li, Venkata Chebiyyam, Katrin Kirchhoff
Artificial Bandwidth Extension Using H∞ Optimization
Deepika Gupta, Hanumant Singh Shekhawat
Quality Degradation Diagnosis for Voice Networks — Estimating the Perceived Noisiness, Coloration, and Discontinuity of Transmitted Speech
Gabriel Mittag, Sebastian Möller
A Cross-Entropy-Guided (CEG) Measure for Speech Enhancement Front-End Assessing Performances of Back-End Automatic Speech Recognition
Li Chai, Jun Du, Chin-Hui Lee
Extending the E-Model Towards Super-Wideband and Fullband Speech Communication Scenarios
Sebastian Möller, Gabriel Mittag, Thilo Michael, Vincent Barriac, Hitoshi Aoki
Modulation Vectors as Robust Feature Representation for ASR in Domain Mismatched Conditions
Samik Sadhu, Hynek Hermansky
Prosody Usage Optimization for Children Speech Recognition with Zero Resource Children Speech
Chenda Li, Yanmin Qian
Unsupervised Raw Waveform Representation Learning for ASR
Purvi Agrawal, Sriram Ganapathy
Low-Dimensional Bottleneck Features for On-Device Continuous Speech Recognition
David B. Ramsay, Kevin Kilgour, Dominik Roblek, Matthew Sharifi
Binary Speech Features for Keyword Spotting Tasks
Alexandre Riviello, Jean-Pierre David
wav2vec: Unsupervised Pre-Training for Speech Recognition
Steffen Schneider, Alexei Baevski, Ronan Collobert, Michael Auli
Automatic Detection of Prosodic Focus in American English
Sunghye Cho, Mark Liberman, Yong-cheol Lee
Feature Exploration for Almost Zero-Resource ASR-Free Keyword Spotting Using a Multilingual Bottleneck Extractor and Correspondence Autoencoders
Raghav Menon, Herman Kamper, Ewald van der Westhuizen, John Quinn, Thomas Niesler
On Learning Interpretable CNNs with Parametric Modulated Kernel-Based Filters
Erfan Loweimi, Peter Bell, Steve Renals
Reverse Transfer Learning: Can Word Embeddings Trained for Different NLP Tasks Improve Neural Language Models?
Lyan Verwimp, Jerome R. Bellegarda
Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR
Zhehuai Chen, Mahaveer Jain, Yongqiang Wang, Michael L. Seltzer, Christian Fuegen
Character-Aware Sub-Word Level Language Modeling for Uyghur and Turkish ASR
Chang Liu, Zhen Zhang, Pengyuan Zhang, Yonghong Yan
Connecting and Comparing Language Model Interpolation Techniques
Ernest Pusateri, Christophe Van Gysel, Rami Botros, Sameer Badaskar, Mirko Hannemann, Youssef Oualil, Ilya Oparin
Enriching Rare Word Representations in Neural Language Models by Embedding Matrix Augmentation
Yerbolat Khassanov, Zhiping Zeng, Van Tung Pham, Haihua Xu, Eng Siong Chng
Comparative Study of Parametric and Representation Uncertainty Modeling for Recurrent Neural Network Language Models
Jianwei Yu, Max W.Y. Lam, Shoukang Hu, Xixin Wu, Xu Li, Yuewen Cao, Xunying Liu, Helen Meng
Improving Automatically Induced Lexicons for Highly Agglutinating Languages Using Data-Driven Morphological Segmentation
Wiehan Agenbag, Thomas Niesler
Attention-Based Word Vector Prediction with LSTMs and its Application to the OOV Problem in ASR
Alejandro Coucheiro-Limeres, Fernando Fernández-Martínez, Rubén San-Segundo, Javier Ferreiros-López
Code-Switching Sentence Generation by Bert and Generative Adversarial Networks
Yingying Gao, Junlan Feng, Ying Liu, Leijing Hou, Xin Pan, Yong Ma
Unified Verbalization for Speech Recognition & Synthesis Across Languages
Sandy Ritchie, Richard Sproat, Kyle Gorman, Daan van Esch, Christian Schallhart, Nikos Bampounis, Benoît Brard, Jonas Fromseier Mortensen, Millie Holt, Eoin Mahon
Better Morphology Prediction for Better Speech Systems
Dravyansh Sharma, Melissa Wilson, Antoine Bruguier
Vietnamese Learners Tackling the German /ʃt/ in Perception
Anke Sennema, Silke Hamann
An Articulatory-Acoustic Investigation into GOOSE-Fronting in German-English Bilinguals Residing in London, UK
Scott Lewis, Adib Mehrabi, Esther de Leeuw
Multimodal Articulation-Based Pronunciation Error Detection with Spectrogram and Acoustic Features
Sabrina Jenne, Ngoc Thang Vu
Using Prosody to Discover Word Order Alternations in a Novel Language
Anouschka Foltz, Sarah Cooper, Tamsin M. McKelvey
Speaking Rate, Information Density, and Information Rate in First-Language and Second-Language Speech
Ann R. Bradlow
Articulation Rate as a Metric in Spoken Language Assessment
Calbert Graham, Francis Nolan
Learning Alignment for Multimodal Emotion Recognition from Speech
Haiyang Xu, Hui Zhang, Kun Han, Yun Wang, Yiping Peng, Xiangang Li
Liquid Deletion in French Child-Directed Speech
Sharon Peperkamp, Monica Hegde, Maria Julia Carbajal
Towards Detection of Canonical Babbling by Citizen Scientists: Performance as a Function of Clip Length
Amanda Seidl, Anne S. Warlaumont, Alejandrina Cristia
Nasal Consonant Discrimination in Infant- and Adult-Directed Speech
Bogdan Ludusan, Annett Jorschick, Reiko Mazuka
No Distributional Learning in Adults from Attended Listening to Non-Speech
Ellen Marklund, Johan Sjons, Lisa Gustavsson, Elísabet Eir Cortes
A Computational Model of Early Language Acquisition from Audiovisual Experiences of Young Infants
Okko Räsänen, Khazar Khorrami
The Production of Chinese Affricates /ts/ and /tsʰ/ by Native Urdu Speakers
Dan Du, Jinsong Zhang
Multi-Stream Network with Temporal Attention for Environmental Sound Classification
Xinyu Li, Venkata Chebiyyam, Katrin Kirchhoff
Neural Network Distillation on IoT Platforms for Sound Event Detection
Gianmarco Cerutti, Rahul Prasad, Alessio Brutti, Elisabetta Farella
Class-Wise Centroid Distance Metric Learning for Acoustic Event Detection
Xugang Lu, Peng Shen, Sheng Li, Yu Tsao, Hisashi Kawai
A Hybrid Approach to Acoustic Scene Classification Based on Universal Acoustic Models
Xue Bai, Jun Du, Zi-Rui Wang, Chin-Hui Lee
Hierarchical Pooling Structure for Weakly Labeled Sound Event Detection
Ke-Xin He, Yu-Han Shen, Wei-Qiang Zhang
Sound Event Detection in Multichannel Audio Using Convolutional Time-Frequency-Channel Squeeze and Excitation
Wei Xia, Kazuhito Koishida
A Robust Framework for Acoustic Scene Classification
Lam Pham, Ian McLoughlin, Huy Phan, Ramaswamy Palaniappan
Compression of Acoustic Event Detection Models with Quantized Distillation
Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang
An End-to-End Audio Classification System Based on Raw Waveforms and Mix-Training Strategy
Jiaxu Chen, Jing Hao, Kai Chen, Di Xie, Shicai Yang, Shiliang Pu
Few-Shot Audio Classification with Attentional Graph Neural Networks
Shilei Zhang, Yong Qin, Kewei Sun, Yonghua Lin
Semi-Supervised Audio Classification with Consistency-Based Regularization
Kangkang Lu, Chuan-Sheng Foo, Kah Kuan Teh, Huy Dat Tran, Vijay Ramaseshan Chandrasekhar
Avaya Conversational Intelligence: A Real-Time System for Spoken Language Understanding in Human-Human Call Center Conversations
Jan Mizgajski, Adrian Szymczak, Robert Głowski, Piotr Szymański, Piotr Żelasko, Łukasz Augustyniak, Mikołaj Morzy, Yishay Carmiel, Jeff Hodson, Łukasz Wójciak, Daniel Smoczyk, Adam Wróbel, Bartosz Borowik, Adam Artajew, Marcin Baran, Cezary Kwiatkowski, Marzena Żyła-Hoppe
Robust Keyword Spotting via Recycle-Pooling for Mobile Game
Shounan An, Youngsoo Kim, Hu Xu, Jinwoo Lee, Myungwoo Lee, Insoo Oh
Multimodal Dialog with the MALACH Audiovisual Archive
Adam Chýlek, Luboš Šmídl, Jan Švec
SpeechMarker: A Voice Based Multi-Level Attendance Application
Sarfaraz Jelil, Abhishek Shrivastava, Rohan Kumar Das, S.R. Mahadeva Prasanna, Rohit Sinha
Robust Sound Recognition: A Neuromorphic Approach
Jibin Wu, Zihan Pan, Malu Zhang, Rohan Kumar Das, Yansong Chua, Haizhou Li
The CUHK Dysarthric Speech Recognition Systems for English and Cantonese
Shoukang Hu, Shansong Liu, Heng Fai Chang, Mengzhe Geng, Jiani Chen, Lau Wing Chung, To Ka Hei, Jianwei Yu, Ka Ho Wong, Xunying Liu, Helen Meng
BAS Web Services for Automatic Subtitle Creation and Anonymization
Florian Schiel, Thomas Kisler
A User-Friendly and Adaptable Re-Implementation of an Acoustic Prominence Detection and Annotation Tool
Jana Voße, Petra Wagner
PyToBI: A Toolkit for ToBI Labeling Under Python
Mónica Domínguez, Patrick Louis Rohrer, Juan Soler-Company
GECKO — A Tool for Effective Annotation of Human Conversations
Golan Levy, Raquel Sitman, Ido Amir, Eduard Golshtein, Ran Mochary, Eilon Reshef, Roi Reichart, Omri Allouche
SLP-AA: Tools for Sign Language Phonetic and Phonological Research
Roger Yu-Hsiang Lo, Kathleen Currie Hall
SANTLR: Speech Annotation Toolkit for Low Resource Languages
Xinjian Li, Zhong Zhou, Siddharth Dalmia, Alan W. Black, Florian Metze
Web-Based Speech Synthesis Editor
Martin Grůber, Jakub Vít, Jindřich Matoušek
GFM-Voc: A Real-Time Voice Quality Modification System
Olivier Perrotin, Ian McLoughlin
Off the Cuff: Exploring Extemporaneous Speech Delivery with TTS
Éva Székely, Gustav Eje Henter, Jonas Beskow, Joakim Gustafson
Synthesized Spoken Names: Biases Impacting Perception
Lucas Kessler, Cecilia Ovesdotter Alm, Reynold Bailey
Unbabel Talk — Human Verified Translations for Voice Instant Messaging
Luís Bernardo, Mathieu Giquel, Sebastião Quintas, Paulo Dimas, Helena Moniz, Isabel Trancoso
Adjusting Pleasure-Arousal-Dominance for Continuous Emotional Text-to-Speech Synthesizer
Azam Rabiee, Tae-Ho Kim, Soo-Young Lee
The GDPR & Speech Data: Reflections of Legal and Technology Communities, First Steps Towards a Common Understanding
Andreas Nautsch, Catherine Jasserand, Els Kindt, Massimiliano Todisco, Isabel Trancoso, Nicholas Evans
Privacy-Preserving Adversarial Representation Learning in ASR: Reality or Illusion?
Brij Mohan Lal Srivastava, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent
Privacy-Preserving Siamese Feature Extraction for Gender Recognition versus Speaker Identification
Alexandru Nelus, Silas Rech, Timm Koppelmann, Henrik Biermann, Rainer Martin
Privacy-Preserving Variational Information Feature Extraction for Domestic Activity Monitoring versus Speaker Identification
Alexandru Nelus, Janek Ebbers, Reinhold Haeb-Umbach, Rainer Martin
Extracting Mel-Frequency and Bark-Frequency Cepstral Coefficients from Encrypted Signals
Patricia Thaine, Gerald Penn
Sound Privacy: A Conversational Speech Corpus for Quantifying the Experience of Privacy
Pablo Pérez Zarazaga, Sneha Das, Tom Bäckström, V.V. Vidyadhara Raju, Anil Kumar Vuppala
Improving Code-Switched Language Modeling Performance Using Cognate Features
Victor Soto, Julia Hirschberg
Linguistically Motivated Parallel Data Augmentation for Code-Switch Language Modeling
Grandee Lee, Xianghu Yue, Haizhou Li
Variational Attention Using Articulatory Priors for Generating Code Mixed Speech Using Monolingual Corpora
SaiKrishna Rallabandi, Alan W. Black
Code-Switching Detection Using ASR-Generated Language Posteriors
Qinyi Wang, Emre Yılmaz, Adem Derinel, Haizhou Li
Semi-Supervised Acoustic Model Training for Five-Lingual Code-Switched ASR
Astik Biswas, Emre Yılmaz, Febe de Wet, Ewald van der Westhuizen, Thomas Niesler
Multi-Graph Decoding for Code-Switching ASR
Emre Yılmaz, Samuel Cohen, Xianghu Yue, David A. van Leeuwen, Haizhou Li
End-to-End Multilingual Multi-Speaker Speech Recognition
Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey
Survey Talk: Realistic Physics-Based Computational Voice Production
Oriol Guasch
An Extended Two-Dimensional Vocal Tract Model for Fast Acoustic Simulation of Single-Axis Symmetric Three-Dimensional Tubes
Debasish Ray Mohapatra, Victor Zappi, Sidney Fels
Perceptual Optimization of an Enhanced Geometric Vocal Fold Model for Articulatory Speech Synthesis
Peter Birkholz, Susanne Drechsel, Simon Stone
Articulatory Copy Synthesis Based on a Genetic Algorithm
Yingming Gao, Simon Stone, Peter Birkholz
A Phonetic-Level Analysis of Different Input Features for Articulatory Inversion
Abdolreza Sabzi Shahrebabaki, Negar Olfati, Ali Shariq Imran, Sabato Marco Siniscalchi, Torbjørn Svendsen
Advancing Sequence-to-Sequence Based Speech Recognition
Zoltán Tüske, Kartik Audhkhasi, George Saon
Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions
Awni Hannun, Ann Lee, Qiantong Xu, Ronan Collobert
Semi-Supervised Sequence-to-Sequence ASR Using Unpaired Speech and Text
Murali Karthick Baskar, Shinji Watanabe, Ramon Astudillo, Takaaki Hori, Lukáš Burget, Jan Černocký
Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition
Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen
On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition
Kazuki Irie, Rohit Prabhavalkar, Anjuli Kannan, Antoine Bruguier, David Rybach, Patrick Nguyen
Listen, Attend, Spell and Adapt: Speaker Adapted Sequence-to-Sequence ASR
Felix Weninger, Jesús Andrés-Ferrer, Xinwei Li, Puming Zhan
Lattice Re-Scoring During Manual Editing for Automatic Error Correction of ASR Transcripts
Anna V. Rúnarsdóttir, Inga R. Helgadóttir, Jón Guðnason
GPU-Based WFST Decoding with Extra Large Language Model
Daisuke Fukunaga, Yoshiki Tanaka, Yuichi Kageyama
Real-Time One-Pass Decoder for Speech Recognition Using LSTM Language Models
Javier Jorge, Adrià Giménez, Javier Iranzo-Sánchez, Jorge Civera, Albert Sanchis, Alfons Juan
Vectorized Beam Search for CTC-Attention-Based Speech Recognition
Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Niko Moritz, Jonathan Le Roux
Contextual Recovery of Out-of-Lattice Named Entities in Automatic Speech Recognition
Jack Serrino, Leonid Velikovich, Petar Aleksic, Cyril Allauzen
Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition
Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Unsupervised Representation Learning with Future Observation Prediction for Speech Emotion Recognition
Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang
Spatio-Temporal Attention Pooling for Audio Scene Classification
Huy Phan, Oliver Y. Chén, Lam Pham, Philipp Koch, Maarten De Vos, Ian McLoughlin, Alfred Mertins
Subspace Pooling Based Temporal Features Extraction for Audio Event Recognition
Qiuying Shi, Hui Luo, Jiqing Han
Multi-Scale Time-Frequency Attention for Acoustic Event Detection
Jingyang Zhang, Wenhao Ding, Jintao Kang, Liang He
Acoustic Scene Classification by Implicitly Identifying Distinct Sound Events
Hongwei Song, Jiqing Han, Shiwen Deng, Zhihao Du
Parameter-Transfer Learning for Low-Resource Individualization of Head-Related Transfer Functions
Xiaoke Qi, Lu Wang
Prosodic Characteristics of Mandarin Declarative and Interrogative Utterances in Parkinson’s Disease
Lei Liu, Meng Jian, Wentao Gu
Study of the Performance of Automatic Speech Recognition Systems in Speakers with Parkinson’s Disease
Laureano Moro-Velazquez, JaeJin Cho, Shinji Watanabe, Mark A. Hasegawa-Johnson, Odette Scharenborg, Heejin Kim, Najim Dehak
Towards the Speech Features of Mild Cognitive Impairment: Universal Evidence from Structured and Unstructured Connected Speech of Chinese
Tianqi Wang, Chongyuan Lian, Jingshen Pan, Quanlei Yan, Feiqi Zhu, Manwa L. Ng, Lan Wang, Nan Yan
Child Speech Disorder Detection with Siamese Recurrent Network Using Speech Attribute Features
Jiarui Wang, Ying Qin, Zhiyuan Peng, Tan Lee
Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech
Daniel Korzekwa, Roberto Barra-Chicote, Bozena Kostek, Thomas Drugman, Mateusz Lajszczak
Vocal Biomarker Assessment Following Pediatric Traumatic Brain Injury: A Retrospective Cohort Study
Camille Noufi, Adam C. Lammert, Daryush D. Mehta, James R. Williamson, Gregory Ciccarelli, Douglas Sturim, Jordan R. Green, Thomas F. Campbell, Thomas F. Quatieri
Survey Talk: Reaching Over the Gap: Cross- and Interdisciplinary Research on Human and Automatic Speech Processing
Odette Scharenborg
Improved Deep Duel Model for Rescoring N-Best Speech Recognition List Using Backward LSTMLM and Ensemble Encoders
Atsunori Ogawa, Marc Delcroix, Shigeki Karita, Tomohiro Nakatani
Language Modeling with Deep Transformers
Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney
Scalable Multi Corpora Neural Language Models for ASR
Anirudh Raju, Denis Filimonov, Gautam Tiwari, Guitang Lan, Ariya Rastrow
Who Needs Words? Lexicon-Free Speech Recognition
Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert
Direct Modelling of Speech Emotion from Raw Speech
Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Julien Epps
Improving Emotion Identification Using Phone Posteriors in Raw Speech Waveform Based DNN
Mousmita Sarma, Pegah Ghahremani, Daniel Povey, Nagendra Kumar Goel, Kandarpa Kumar Sarma, Najim Dehak
Pyramid Memory Block and Timestep Attention for Speech Emotion Recognition
Miao Cao, Chun Yang, Fang Zhou, Xu-cheng Yin
Robust Speech Emotion Recognition Under Different Encoding Conditions
Christopher Oates, Andreas Triantafyllopoulos, Ingmar Steiner, Björn W. Schuller
Using the Bag-of-Audio-Word Feature Representation of ASR DNN Posteriors for Paralinguistic Classification
Gábor Gosztolya
Disentangling Style Factors from Speaker Representations
Jennifer Williams, Simon King
Sentence Prosody and Wh-Indeterminates in Taiwan Mandarin
Yu-Yin Hsu, Anqi Xu
Frication as a Vowel Feature? — Evidence from the Rui’an Wu Chinese Dialect
Fang Hu, Youjue He
Vowels and Diphthongs in the Xupu Xiang Chinese Dialect
Zhenrui Zhang, Fang Hu
Age-Related Changes in European Portuguese Vowel Acoustics
Luciana Albuquerque, Catarina Oliveira, António Teixeira, Pedro Sa-Couto, Daniela Figueiredo
Vowel-Tone Interaction in Two Tibeto-Burman Languages
Wendy Lalhminghlui, Viyazonuo Terhiija, Priyankoo Sarmah
The Vowel System of Korebaju
Jenifer Vega Rodríguez
Fundamental Frequency Accommodation in Multi-Party Human-Robot Game Interactions: The Effect of Winning or Losing
Omnia Ibrahim, Gabriel Skantze, Sabine Stoll, Volker Dellwo
Pitch Accent Trajectories Across Different Conditions of Visibility and Information Structure — Evidence from Spontaneous Dyadic Interaction
Petra Wagner, Nataliya Bryhadyr, Marin Schröer
The Greennn Tree — Lengthening Position Influences Uncertainty Perception
Simon Betz, Sina Zarrieß, Éva Székely, Petra Wagner
CNN-BLSTM Based Question Detection from Dialogs Considering Phase and Context Information
Yuke Si, Longbiao Wang, Jianwu Dang, Mengfei Wu, Aijun Li
Mirroring to Build Trust in Digital Assistants
Katherine Metcalf, Barry-John Theobald, Garrett Weinberg, Robert Lee, Ing-Marie Jonsson, Russ Webb, Nicholas Apostoloff
Three’s a Crowd? Effects of a Second Human on Vocal Accommodation with a Voice Assistant
Eran Raveh, Ingo Siegert, Ingmar Steiner, Iona Gessinger, Bernd Möbius
Adversarial Regularization for End-to-End Robust Speaker Verification
Qing Wang, Pengcheng Guo, Sining Sun, Lei Xie, John H.L. Hansen
Combining Speaker Recognition and Metric Learning for Speaker-Dependent Representation Learning
João Monteiro, Jahangir Alam, Tiago H. Falk
VAE-Based Regularization for Deep Speaker Embedding
Yang Zhang, Lantian Li, Dong Wang
Language Recognition Using Triplet Neural Networks
Victoria Mingote, Diego Castan, Mitchell McLaren, Mahesh Kumar Nandwana, Alfonso Ortega, Eduardo Lleida, Antonio Miguel
Spatial Pyramid Encoding with Convex Length Normalization for Text-Independent Speaker Verification
Youngmoon Jung, Younggwan Kim, Hyungjun Lim, Yeunju Choi, Hoirin Kim
End-to-End Losses Based on Speaker Basis Vectors and All-Speaker Hard Negative Mining for Speaker Verification
Hee-Soo Heo, Jee-weon Jung, IL-Ho Yang, Sung-Hyun Yoon, Hye-jin Shim, Ha-Jin Yu
An Effective Deep Embedding Learning Architecture for Speaker Verification
Yiheng Jiang, Yan Song, Ian McLoughlin, Zhifu Gao, Li-Rong Dai
Far-Field End-to-End Text-Dependent Speaker Verification Based on Mixed Training Data with Transfer Learning and Enrollment Data Augmentation
Xiaoyi Qin, Danwei Cai, Ming Li
Two-Stage Training for Chinese Dialect Recognition
Zongze Ren, Guofu Yang, Shugong Xu
Investigation on Blind Bandwidth Extension with a Non-Linear Function and its Evaluation of x-Vector-Based Speaker Verification
Ryota Kaminishi, Haruna Miyamoto, Sayaka Shiota, Hitoshi Kiya
Auto-Encoding Nearest Neighbor i-Vectors for Speaker Verification
Umair Khan, Miquel India, Javier Hernando
Towards a Fault-Tolerant Speaker Verification System: A Regularization Approach to Reduce the Condition Number
Siqi Zheng, Gang Liu, Hongbin Suo, Yun Lei
Deep Learning Based Multi-Channel Speaker Recognition in Noisy and Reverberant Environments
Hassan Taherian, Zhong-Qiu Wang, DeLiang Wang
Joint Optimization of Neural Acoustic Beamforming and Dereverberation with x-Vectors for Robust Speaker Verification
Joon-Young Yang, Joon-Hyuk Chang
A New Time-Frequency Attention Mechanism for TDNN and CNN-LSTM-TDNN, with Application to Language Identification
Xiaoxiao Miao, Ian McLoughlin, Yonghong Yan
An Attention-Based Hybrid Network for Automatic Detection of Alzheimer’s Disease from Narrative Speech
Jun Chen, Ji Zhu, Jieping Ye
Investigating the Lombard Effect Influence on End-to-End Audio-Visual Speech Recognition
Pingchuan Ma, Stavros Petridis, Maja Pantic
“Computer, Test My Hearing”: Accurate Speech Audiometry with Smart Speakers
Jasper Ooster, Pia Nancy Porysek Moreta, Jörg-Hendrik Bach, Inga Holube, Bernd T. Meyer
Synchronising Audio and Ultrasound by Learning Cross-Modal Embeddings
Aciel Eshky, Manuel Sam Ribeiro, Korin Richmond, Steve Renals
Automatic Hierarchical Attention Neural Network for Detecting AD
Yilin Pan, Bahman Mirheidari, Markus Reuber, Annalena Venneri, Daniel Blackburn, Heidi Christensen
Deep Sensing of Breathing Signal During Conversational Speech
Venkata Srikanth Nallanthighal, Aki Härmä, Helmer Strik
Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation
Fadi Biadsy, Ron J. Weiss, Pedro J. Moreno, Dimitri Kanevsky, Ye Jia
Exploiting Visual Features Using Bayesian Gated Neural Networks for Disordered Speech Recognition
Shansong Liu, Shoukang Hu, Yi Wang, Jianwei Yu, Rongfeng Su, Xunying Liu, Helen Meng
Video-Driven Speech Reconstruction Using Generative Adversarial Networks
Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic
On the Use of Pitch Features for Disordered Speech Recognition
Shansong Liu, Shoukang Hu, Xunying Liu, Helen Meng
Large-Scale Visual Speech Recognition
Brendan Shillingford, Yannis Assael, Matthew W. Hoffman, Thomas Paine, Cían Hughes, Utsav Prabhu, Hank Liao, Hasim Sak, Kanishka Rao, Lorrayne Bennett, Marie Mulville, Misha Denil, Ben Coppin, Ben Laurie, Andrew Senior, Nando de Freitas
Investigating Linguistic and Semantic Features for Turn-Taking Prediction in Open-Domain Human-Computer Conversation
S. Zahra Razavi, Benjamin Kane, Lenhart K. Schubert
Benchmarking Benchmarks: Introducing New Automatic Indicators for Benchmarking Spoken Language Understanding Corpora
Frédéric Béchet, Christian Raymond
A Neural Turn-Taking Model without RNN
Chaoran Liu, Carlos Ishi, Hiroshi Ishiguro
An Incremental Turn-Taking Model for Task-Oriented Dialog Systems
Andrei C. Coman, Koichiro Yoshino, Yukitoshi Murase, Satoshi Nakamura, Giuseppe Riccardi
Personalized Dialogue Response Generation Learned from Monologues
Feng-Guang Su, Aliyah R. Hsu, Yi-Lin Tuan, Hung-Yi Lee
Voice Quality as a Turn-Taking Cue
Mattias Heldner, Marcin Włodarczak, Štefan Beňuš, Agustín Gravano
Turn-Taking Prediction Based on Detection of Transition Relevance Place
Kohei Hara, Koji Inoue, Katsuya Takanashi, Tatsuya Kawahara
Analysis of Effect and Timing of Fillers in Natural Turn-Taking
Divesh Lala, Shizuka Nakamura, Tatsuya Kawahara
Multimodal Response Obligation Detection with Unsupervised Online Domain Adaptation
Shota Horiguchi, Naoyuki Kanda, Kenji Nagamatsu
Follow-Up Question Generation Using Neural Tensor Network-Based Domain Ontology Population in an Interview Coaching System
Ming-Hsiang Su, Chung-Hsien Wu, Yi Chang
On the Role of Style in Parsing Speech with Neural Models
Trang Tran, Jiahong Yuan, Yang Liu, Mari Ostendorf
On the Contributions of Visual and Textual Supervision in Low-Resource Semantic Speech Retrieval
Ankita Pasad, Bowen Shi, Herman Kamper, Karen Livescu
Automatic Detection of Off-Topic Spoken Responses Using Very Deep Convolutional Neural Networks
Xinhao Wang, Su-Youn Yoon, Keelan Evanini, Klaus Zechner, Yao Qian
Rescoring Keyword Search Confidence Estimates with Graph-Based Re-Ranking Using Acoustic Word Embeddings
Anna Piunova, Eugen Beck, Ralf Schlüter, Hermann Ney
SpeechYOLO: Detection and Localization of Speech Objects
Yael Segal, Tzeviya Sylvia Fuchs, Joseph Keshet
Prosodic Phrase Alignment for Machine Dubbing
Alp Öktem, Mireia Farrús, Antonio Bonafonte
Spot the Pleasant People! Navigating the Cocktail Party Buzz
Christina Tånnander, Per Fallgren, Jens Edlund, Joakim Gustafson
Neural Text Clustering with Document-Level Attention Based on Dynamic Soft Labels
Zhi Chen, Wu Guo, Li-Rong Dai, Zhen-Hua Ling, Jun Du
Noisy BiLSTM-Based Models for Disfluency Detection
Nguyen Bach, Fei Huang
Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search
Mittul Singh, Sami Virpioja, Peter Smit, Mikko Kurimo
Simultaneous Detection and Localization of a Wake-Up Word Using Multi-Task Learning of the Duration and Endpoint
Takashi Maekaku, Yusuke Kida, Akihiko Sugiyama
On Mitigating Acoustic Feedback in Hearing Aids with Frequency Warping by All-Pass Networks
Ching-Hua Lee, Kuan-Lin Chen, Fred Harris, Bhaskar D. Rao, Harinath Garudadri
Deep Multitask Acoustic Echo Cancellation
Amin Fazel, Mostafa El-Khamy, Jungwon Lee
Deep Learning for Joint Acoustic Echo and Noise Cancellation with Nonlinear Distortions
Hao Zhang, Ke Tan, DeLiang Wang
Harmonic Beamformers for Non-Intrusive Speech Intelligibility Prediction
Charlotte Sørensen, Jesper B. Boldt, Mads G. Christensen
Convolutional Neural Network-Based Speech Enhancement for Cochlear Implant Recipients
Nursadul Mamun, Soheil Khorram, John H.L. Hansen
Validation of the Non-Intrusive Codebook-Based Short Time Objective Intelligibility Metric for Processed Speech
Charlotte Sørensen, Jesper B. Boldt, Mads G. Christensen
Predicting Speech Intelligibility of Enhanced Speech Using Phone Accuracy of DNN-Based ASR System
Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Katsuhiko Yamamoto, Toshio Irino
A Novel Method to Correct Steering Vectors in MVDR Beamformer for Noise Robust ASR
Suliang Bu, Yunxin Zhao, Mei-Yuh Hwang
End-to-End Multi-Channel Speech Enhancement Using Inter-Channel Time-Restricted Attention on Raw Waveform
Hyeonseung Lee, Hyung Yong Kim, Woo Hyun Kang, Jeunghun Kim, Nam Soo Kim
Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information
Rongzhi Gu, Lianwu Chen, Shi-Xiong Zhang, Jimeng Zheng, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu
My Lips Are Concealed: Audio-Visual Speech Enhancement Through Obstructions
Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman
End-to-End Neural Speaker Diarization with Permutation-Free Objectives
Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe
Self Multi-Head Attention for Speaker Recognition
Miquel India, Pooyan Safari, Javier Hernando
Phonetically-Aware Embeddings, Wide Residual Networks with Time-Delay Neural Networks and Self Attention Models for the 2018 NIST Speaker Recognition Evaluation
Ignacio Viñals, Dayana Ribas, Victoria Mingote, Jorge Llombart, Pablo Gimeno, Antonio Miguel, Alfonso Ortega, Eduardo Lleida
Variational Domain Adversarial Learning for Speaker Verification
Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien
A Unified Framework for Speaker and Utterance Verification
Tianchi Liu, Maulik Madhavi, Rohan Kumar Das, Haizhou Li
Analysis of Critical Metadata Factors for the Calibration of Speaker Recognition Systems
Mahesh Kumar Nandwana, Luciana Ferrer, Mitchell McLaren, Diego Castan, Aaron Lawson
Factorization of Discriminatively Trained i-Vector Extractor for Speaker Recognition
Ondřej Novotný, Oldřich Plchot, Ondřej Glembek, Lukáš Burget
End-to-End Speaker Identification in Noisy and Reverberant Environments Using Raw Waveform Convolutional Neural Networks
Daniele Salvati, Carlo Drioli, Gian Luca Foresti
Whisper to Neutral Mapping Using Cosine Similarity Maximization in i-Vector Space for Speaker Verification
Abinay Reddy Naini, Achuth Rao M.V., Prasanta Kumar Ghosh
Mixup Learning Strategies for Text-Independent Speaker Verification
Yingke Zhu, Tom Ko, Brian Mak
Optimizing a Speaker Embedding Extractor Through Backend-Driven Regularization
Luciana Ferrer, Mitchell McLaren
The NEC-TT 2018 Speaker Verification System
Kong Aik Lee, Hitoshi Yamamoto, Koji Okabe, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda
Autoencoder-Based Semi-Supervised Curriculum Learning for Out-of-Domain Speaker Verification
Siqi Zheng, Gang Liu, Hongbin Suo, Yun Lei
Multi-Channel Training for End-to-End Speaker Recognition Under Reverberant and Noisy Environment
Danwei Cai, Xiaoyi Qin, Ming Li
The DKU-SMIIP System for NIST 2018 Speaker Recognition Evaluation
Danwei Cai, Weicheng Cai, Ming Li
Pretraining by Backtranslation for End-to-End ASR in Low-Resource Settings
Matthew Wiesner, Adithya Renduchintala, Shinji Watanabe, Chunxi Liu, Najim Dehak, Sanjeev Khudanpur
Cross-Attention End-to-End ASR for Two-Party Conversations
Suyoun Kim, Siddharth Dalmia, Florian Metze
Towards Using Context-Dependent Symbols in CTC Without State-Tying Decision Trees
Jan Chorowski, Adrian Łańcucki, Bartosz Kostka, Michał Zapotoczny
An Online Attention-Based Model for Speech Recognition
Ruchao Fan, Pan Zhou, Wei Chen, Jia Jia, Gang Liu
Self-Attention Transducers for End-to-End Speech Recognition
Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Ye Bai, Zhengqi Wen
Improving Transformer-Based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation
Sheng Li, Dabre Raj, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai
Extending an Acoustic Data-Driven Phone Set for Spontaneous Speech Recognition
Jeong-Uk Bang, Mu-Yeol Choi, Sang-Hun Kim, Oh-Wook Kwon
Joint Maximization Decoder with Neural Converters for Fully Neural Network-Based Japanese Speech Recognition
Takafumi Moriya, Jian Wang, Tomohiro Tanaka, Ryo Masumura, Yusuke Shinohara, Yoshikazu Yamaguchi, Yushi Aono
Real to H-Space Encoder for Speech Recognition
Titouan Parcollet, Mohamed Morchid, Georges Linarès, Renato De Mori
Ectc-Docd: An End-to-End Structure with CTC Encoder and OCD Decoder for Speech Recognition
Cheng Yi, Feng Wang, Bo Xu
End-to-End Multi-Speaker Speech Recognition Using Speaker Embeddings and Transfer Learning
Pavel Denisov, Ngoc Thang Vu
Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis
Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Shubham Toshniwal, Karen Livescu
Spontaneous Conversational Speech Synthesis from Found Data
Éva Székely, Gustav Eje Henter, Jonas Beskow, Joakim Gustafson
Fine-Grained Robust Prosody Transfer for Single-Speaker Neural Text-To-Speech
Viacheslav Klimkov, Srikanth Ronanki, Jonas Rohnke, Thomas Drugman
Speech Driven Backchannel Generation Using Deep Q-Network for Enhancing Engagement in Human-Robot Interaction
Nusrah Hussain, Engin Erzin, T. Metin Sezgin, Yücel Yemez
Semi-Supervised Prosody Modeling Using Deep Gaussian Process Latent Variable Model
Tomoki Koriyama, Takao Kobayashi
Bootstrapping a Text Normalization System for an Inflected Language. Numbers as a Test Case
Anna Björk Nikulásdóttir, Jón Guðnason
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS
Haohan Guo, Frank K. Soong, Lei He, Lei Xie
Duration Modeling with Global Phoneme-Duration Vectors
Jinfu Ni, Yoshinori Shiga, Hisashi Kawai
Improving Speech Synthesis with Discourse Relations
Adèle Aubin, Alessandra Cervone, Oliver Watts, Simon King
Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis Through Audio Analysis
Noé Tits, Fengna Wang, Kevin El Haddad, Vincent Pagel, Thierry Dutoit
Pre-Trained Text Representations for Improving Front-End Text Processing in Mandarin Text-to-Speech Synthesis
Bing Yang, Jiaqi Zhong, Shan Liu
A Mandarin Prosodic Boundary Prediction Model Based on Multi-Task Learning
Huashan Pan, Xiulin Li, Zhiqiang Huang
Dual Encoder Classifier Models as Constraints in Neural Text Normalization
Ajda Gokcen, Hao Zhang, Richard Sproat
Knowledge-Based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis
Jingbei Li, Zhiyong Wu, Runnan Li, Pengpeng Zhi, Song Yang, Helen Meng
Automated Emotion Morphing in Speech Based on Diffeomorphic Curve Registration and Highway Networks
Ravi Shankar, Hsi-Wei Hsieh, Nicolas Charon, Archana Venkataraman
Use of Beiwe Smartphone App to Identify and Track Speech Decline in Amyotrophic Lateral Sclerosis (ALS)
Kathryn P. Connaghan, Jordan R. Green, Sabrina Paganoni, James Chan, Harli Weber, Ella Collins, Brian Richburg, Marziye Eshghi, J.P. Onnela, James D. Berry
Profiling Speech Motor Impairments in Persons with Amyotrophic Lateral Sclerosis: An Acoustic-Based Approach
Hannah P. Rowe, Jordan R. Green
Diagnosing Dysarthria with Long Short-Term Memory Networks
Alex Mayle, Zhiwei Mou, Razvan Bunescu, Sadegh Mirshekarian, Li Xu, Chang Liu
Modification of Devoicing Error in Cleft Lip and Palate Speech
Protima Nomo Sudro, S.R. Mahadeva Prasanna
Reduced Task Adaptation in Alternating Motion Rate Tasks as an Early Marker of Bulbar Involvement in Amyotrophic Lateral Sclerosis
Marziye Eshghi, Panying Rong, Antje S. Mefferd, Kaila L. Stipancic, Yana Yunusova, Jordan R. Green
Towards the Speech Features of Early-Stage Dementia: Design and Application of the Mandarin Elderly Cognitive Speech Database
Tianqi Wang, Quanlei Yan, Jingshen Pan, Feiqi Zhu, Rongfeng Su, Yi Guo, Lan Wang, Nan Yan
Acoustic Characteristics of Lexical Tone Disruption in Mandarin Speakers After Brain Damage
Wenjun Chen, Jeroen van de Weijer, Shuangshuang Zhu, Qian Qian, Manna Wang
Intragestural Variation in Natural Sentence Production: Essential Tremor Patients Treated with DBS
Anne Hermes, Doris Mücke, Tabea Thies, Michael T. Barbe
Nasal Air Emission in Sibilant Fricatives of Cleft Lip and Palate Speech
Sishir Kalita, Protima Nomo Sudro, S.R. Mahadeva Prasanna, S. Dandapat
Parallel vs. Non-Parallel Voice Conversion for Esophageal Speech
Luis Serrano, Sneha Raman, David Tavarez, Eva Navas, Inma Hernaez
Hypernasality Severity Detection Using Constant Q Cepstral Coefficients
Akhilesh Kumar Dubey, S.R. Mahadeva Prasanna, S. Dandapat
Automatic Depression Level Detection via ℓp-Norm Pooling
Mingyue Niu, Jianhua Tao, Bin Liu, Cunhang Fan
Comparison of Speech Tasks and Recording Devices for Voice Based Automatic Classification of Healthy Subjects and Patients with Amyotrophic Lateral Sclerosis
Suhas B.N., Deep Patel, Nithin Rao, Yamini Belur, Pradeep Reddy, Nalini Atchayaram, Ravi Yadav, Dipanjan Gope, Prasanta Kumar Ghosh
A Modified Algorithm for Multiple Input Spectrogram Inversion
Dongxiao Wang, Hirokazu Kameoka, Koichi Shinoda
A Comprehensive Study of Speech Separation: Spectrogram vs Waveform Separation
Fahimeh Bahmaninezhad, Jian Wu, Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu
Evaluating Audiovisual Source Separation in the Context of Video Conferencing
Berkay İnan, Milos Cernak, Helmut Grabner, Helena Peic Tukuljac, Rodrigo C.G. Pena, Benjamin Ricaud
Influence of Speaker-Specific Parameters on Speech Separation Systems
David Ditter, Timo Gerkmann
CNN-LSTM Models for Multi-Speaker Source Separation Using Bayesian Hyper Parameter Optimization
Jeroen Zegers, Hugo Van hamme
Towards Joint Sound Scene and Polyphonic Sound Event Recognition
Helen L. Bear, Inês Nolasco, Emmanouil Benetos
Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features
Cunhang Fan, Bin Liu, Jianhua Tao, Jiangyan Yi, Zhengqi Wen
Probabilistic Permutation Invariant Training for Speech Separation
Midia Yousefi, Soheil Khorram, John H.L. Hansen
Which Ones Are Speaking? Speaker-Inferred Model for Multi-Talker Speech Separation
Jing Shi, Jiaming Xu, Bo Xu
End-to-End Monaural Speech Separation with Multi-Scale Dynamic Weighted Gated Dilated Convolutional Pyramid Network
Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Shoji Hayakawa, Shouji Harada, Jiqing Han
End-to-End Music Source Separation: Is it Possible in the Waveform Domain?
Francesc Lluís, Jordi Pons, Xavier Serra
Elpis, an Accessible Speech-to-Text Tool
Ben Foley, Alina Rakhi, Nicholas Lambourne, Nicholas Buckeridge, Janet Wiles
Framework for Conducting Tasks Requiring Human Assessment
Martin Grůber, Adam Chýlek, Jindřich Matoušek
Multimedia Simultaneous Translation System for Minority Language Communication with Mandarin
Shen Huang, Bojie Hu, Shan Huang, Pengfei Hu, Jian Kang, Zhiqiang Lv, Jinghao Yan, Qi Ju, Shiyin Kang, Deyi Tuo, Guangzhi Li, Nurmemet Yolwas
The SAIL LABS Media Mining Indexer and the CAVA Framework
Erinc Dikici, Gerhard Backfried, Jürgen Riedler
CaptionAI: A Real-Time Multilingual Captioning Application
Nagendra Kumar Goel, Mousmita Sarma, Saikiran Valluri, Dharmeshkumar Agrawal, Steve Braich, Tejendra Singh Kuswah, Zikra Iqbal, Surbhi Chauhan, Raj Karbar