ISCA Archive

Interspeech 2019

Graz, Austria
15-19 September 2019

Chairs: Gernot Kubin and Zdravko Kačič
doi: 10.21437/Interspeech.2019


Speaker Recognition and Diarization


Bayesian HMM Based x-Vector Clustering for Speaker Diarization
Mireia Diez, Lukáš Burget, Shuai Wang, Johan Rohdin, Jan Černocký

Unleashing the Unused Potential of i-Vectors Enabled by GPU Acceleration
Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen, Takafumi Koshinaka

MCE 2018: The 1st Multi-Target Speaker Detection and Identification Challenge Evaluation
Suwon Shon, Najim Dehak, Douglas Reynolds, James Glass

Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System
Zhifu Gao, Yan Song, Ian McLoughlin, Pengcheng Li, Yiheng Jiang, Li-Rong Dai

LSTM Based Similarity Measurement with Spectral Clustering for Speaker Diarization
Qingjian Lin, Ruiqing Yin, Ming Li, Hervé Bredin, Claude Barras

Who Said That?: Audio-Visual Speaker Diarisation of Real-World Meetings
Joon Son Chung, Bong-Jin Lee, Icksang Han

Multi-PLDA Diarization on Children’s Speech
Jiamin Xie, Leibny Paola García-Perera, Daniel Povey, Sanjeev Khudanpur

Speaker Diarization Using Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings
Alan McCree, Gregory Sell, Daniel Garcia-Romero

Speaker-Corrupted Embeddings for Online Speaker Diarization
Omid Ghahabi, Volker Fischer

Speaker Diarization with Lexical Information
Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis Georgiou, Shrikanth Narayanan

Joint Speech Recognition and Speaker Diarization via Sequence Transduction
Laurent El Shafey, Hagen Soltau, Izhak Shafran

Normal Variance-Mean Mixtures for Unsupervised Score Calibration
Sandro Cumani

Speaker Augmentation and Bandwidth Extension for Deep Speaker Embedding
Hitoshi Yamamoto, Kong Aik Lee, Koji Okabe, Takafumi Koshinaka

Large-Scale Speaker Diarization of Radio Broadcast Archives
Emre Yılmaz, Adem Derinel, Kun Zhou, Henk van den Heuvel, Niko Brummer, Haizhou Li, David A. van Leeuwen

Toeplitz Inverse Covariance Based Robust Speaker Clustering for Naturalistic Audio Streams
Harishchandra Dubey, Abhijeet Sangwan, John H.L. Hansen


ASR for Noisy and Far-Field Speech


Examining the Combination of Multi-Band Processing and Channel Dropout for Robust Speech Recognition
György Kovács, László Tóth, Dirk Van Compernolle, Marcus Liwicki

Label Driven Time-Frequency Masking for Robust Continuous Speech Recognition
Meet Soni, Ashish Panda

Speaker-Invariant Feature-Mapping for Distant Speech Recognition via Adversarial Teacher-Student Learning
Long Wu, Hangting Chen, Li Wang, Pengyuan Zhang, Yonghong Yan

Full-Sentence Correlation: A Method to Handle Unpredictable Noise for Robust Speech Recognition
Ji Ming, Danny Crookes

Generative Noise Modeling and Channel Simulation for Robust Speech Recognition in Unseen Conditions
Meet Soni, Sonal Joshi, Ashish Panda

Far-Field Speech Enhancement Using Heteroscedastic Autoencoder for Improved Speech Recognition
Shashi Kumar, Shakti P. Rath

End-to-End SpeakerBeam for Single Channel Target Speech Recognition
Marc Delcroix, Shinji Watanabe, Tsubasa Ochiai, Keisuke Kinoshita, Shigeki Karita, Atsunori Ogawa, Tomohiro Nakatani

NIESR: Nuisance Invariant End-to-End Speech Recognition
I-Hung Hsu, Ayush Jaiswal, Premkumar Natarajan

Knowledge Distillation for Throat Microphone Speech Recognition
Takahito Suzuki, Jun Ogata, Takashi Tsunakawa, Masafumi Nishida, Masafumi Nishimura

Improved Speaker-Dependent Separation for CHiME-5 Challenge
Jian Wu, Yong Xu, Shi-Xiong Zhang, Lianwu Chen, Meng Yu, Lei Xie, Dong Yu

Bridging the Gap Between Monaural Speech Enhancement and Recognition with Distortion-Independent Acoustic Modeling
Peidong Wang, Ke Tan, DeLiang Wang

Enhanced Spectral Features for Distortion-Independent Acoustic Modeling
Peidong Wang, DeLiang Wang

Universal Adversarial Perturbations for Speech Recognition Systems
Paarth Neekhara, Shehzeen Hussain, Prakhar Pandey, Shlomo Dubnov, Julian McAuley, Farinaz Koushanfar

One-Pass Single-Channel Noisy Speech Recognition Using a Combination of Noisy and Enhanced Features
Masakiyo Fujimoto, Hisashi Kawai

Jointly Adversarial Enhancement Training for Robust End-to-End Speech Recognition
Bin Liu, Shuai Nie, Shan Liang, Wenju Liu, Meng Yu, Lianwu Chen, Shouye Peng, Changliang Li


Social Signals Detection and Speaker Traits Analysis


Predicting Humor by Learning from Time-Aligned Comments
Zixiaofan Yang, Bingyan Hu, Julia Hirschberg

Predicting the Leading Political Ideology of YouTube Channels Using Acoustic, Textual, and Metadata Information
Yoan Dinkov, Ahmed Ali, Ivan Koychev, Preslav Nakov

Mitigating Gender and L1 Differences to Improve State and Trait Recognition
Guozhen An, Rivka Levitan

Deep Learning Based Mandarin Accent Identification for Accent Robust ASR
Felix Weninger, Yang Sun, Junho Park, Daniel Willett, Puming Zhan

Calibrating DNN Posterior Probability Estimates of HMM/DNN Models to Improve Social Signal Detection from Audio Data
Gábor Gosztolya, László Tóth

Conversational and Social Laughter Synthesis with WaveNet
Hiroki Mori, Tomohiro Nagata, Yoshiko Arimoto

Laughter Dynamics in Dyadic Conversations
Bogdan Ludusan, Petra Wagner

Towards an Annotation Scheme for Complex Laughter in Speech Corpora
Khiet P. Truong, Jürgen Trouvain, Michel-Pierre Jansen

Using Speech to Predict Sequentially Measured Cortisol Levels During a Trier Social Stress Test
Alice Baird, Shahin Amiriparian, Nicholas Cummins, Sarah Sturmbauer, Johanna Janson, Eva-Maria Messner, Harald Baumeister, Nicolas Rohleder, Björn W. Schuller

Sincerity in Acted Speech: Presenting the Sincere Apology Corpus and Results
Alice Baird, Eduardo Coutinho, Julia Hirschberg, Björn W. Schuller

Do not Hesitate! — Unless You Do it Shortly or Nasally: How the Phonetics of Filled Pauses Determine Their Subjective Frequency and Perceived Speaker Performance
Oliver Niebuhr, Kerstin Fischer

Phonet: A Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech
J.C. Vásquez-Correa, Philipp Klumpp, Juan Rafael Orozco-Arroyave, Elmar Nöth


Applications of Language Technologies


Code-Switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation
Ching-Ting Chang, Shun-Po Chuang, Hung-Yi Lee

Comparative Analysis of Think-Aloud Methods for Everyday Activities in the Context of Cognitive Robotics
Moritz Meier, Celeste Mason, Felix Putze, Tanja Schultz

RadioTalk: A Large-Scale Corpus of Talk Radio Transcripts
Doug Beeferman, William Brannon, Deb Roy

Qualitative Evaluation of ASR Adaptation in a Lecture Context: Application to the PASTEL Corpus
Salima Mdhaffar, Yannick Estève, Nicolas Hernandez, Antoine Laurent, Richard Dufour, Solen Quiniou

Active Annotation: Bootstrapping Annotation Lexicon and Guidelines for Supervised NLU Learning
Federico Marinelli, Alessandra Cervone, Giuliano Tortoreto, Evgeny A. Stepanov, Giuseppe Di Fabbrizio, Giuseppe Riccardi

Automatic Lyric Transcription from Karaoke Vocal Tracks: Resources and a Baseline System
Gerardo Roa Dabike, Jon Barker

Detecting Mismatch Between Speech and Transcription Using Cross-Modal Attention
Qiang Huang, Thomas Hain

EpaDB: A Database for Development of Pronunciation Assessment Systems
Jazmín Vidal, Luciana Ferrer, Leonardo Brambilla

Automatic Compression of Subtitles with Neural Networks and its Effect on User Experience
Katrin Angerbauer, Heike Adel, Ngoc Thang Vu

Integrating Video Retrieval and Moment Detection in a Unified Corpus for Video Question Answering
Hongyin Luo, Mitra Mohtarami, James Glass, Karthik Krishnamurthy, Brigitte Richardson


Speech and Audio Characterization and Segmentation


Early Identification of Speech Changes Due to Amyotrophic Lateral Sclerosis Using Machine Classification
Sarah E. Gutz, Jun Wang, Yana Yunusova, Jordan R. Green

Automatic Detection of Breath Using Voice Activity Detection and SVM Classifier with Application on News Reports
Mohamed Ismail Yasar Arafath K., Aurobinda Routray

Acoustic Scene Classification Using Teacher-Student Learning with Soft-Labels
Hee-Soo Heo, Jee-weon Jung, Hye-jin Shim, Ha-Jin Yu

Rare Sound Event Detection Using Deep Learning and Data Augmentation
Yanping Chen, Hongxia Jin

A Combination of Model-Based and Feature-Based Strategy for Speech-to-Singing Alignment
Bidisha Sharma, Haizhou Li

Dr.VOT: Measuring Positive and Negative Voice Onset Time in the Wild
Yosi Shrem, Matthew Goldrick, Joseph Keshet

Effects of Base-Frequency and Spectral Envelope on Deep-Learning Speech Separation and Recognition Models
J. Hui, Y. Wei, S.T. Chen, R.H.Y. So

Phone Aware Nearest Neighbor Technique Using Spectral Transition Measure for Non-Parallel Voice Conversion
Nirmesh J. Shah, Hemant A. Patil

Weakly Supervised Syllable Segmentation by Vowel-Consonant Peak Classification
Ravi Shankar, Archana Venkataraman

An Approach to Online Speaker Change Point Detection Using DNNs and WFSTs
Lukas Mateju, Petr Cerva, Jindrich Zdansky

Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks
Zhenyu Tang, John D. Kanu, Kevin Hogan, Dinesh Manocha


Neural Techniques for Voice Conversion and Waveform Generation


Non-Parallel Voice Conversion Using Weighted Generative Adversarial Networks
Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou

One-Shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization
Ju-chieh Chou, Hung-Yi Lee

One-Shot Voice Conversion with Global Speaker Embeddings
Hui Lu, Zhiyong Wu, Dongyang Dai, Runnan Li, Shiyin Kang, Jia Jia, Helen Meng

Non-Parallel Voice Conversion with Cyclic Variational Autoencoder
Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda

StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo

Robustness of Statistical Voice Conversion Based on Direct Waveform Modification Against Background Sounds
Yusuke Kurita, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda

Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks
Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma

GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-Spectrogram
Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku

Probability Density Distillation with Generative Adversarial Networks for High-Quality Parallel Waveform Generation
Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim

One-Shot Voice Conversion with Disentangled Representations by Leveraging Phonetic Posteriorgrams
Seyed Hamidreza Mohammadi, Taehwan Kim

Investigation of F0 Conditioning and Fully Convolutional Networks in Variational Autoencoder Based Voice Conversion
Wen-Chin Huang, Yi-Chiao Wu, Chen-Chou Lo, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang

Jointly Trained Conversion Model and WaveNet Vocoder for Non-Parallel Voice Conversion Using Mel-Spectrograms and Phonetic Posteriorgrams
Songxiang Liu, Yuewen Cao, Xixin Wu, Lifa Sun, Xunying Liu, Helen Meng

Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech
Li-Wei Chen, Hung-Yi Lee, Yu Tsao

Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion
Shaojin Ding, Ricardo Gutierrez-Osuna

Semi-Supervised Voice Conversion with Amortized Variational Inference
Cory Stephenson, Gokce Keskin, Anil Thomas, Oguz H. Elibol


Model Adaptation for ASR


Exploiting Semi-Supervised Training Through a Dropout Regularization in End-to-End Speech Recognition
Subhadeep Dey, Petr Motlicek, Trung Bui, Franck Dernoncourt

Improved Vocal Tract Length Perturbation for a State-of-the-Art End-to-End Speech Recognition System
Chanwoo Kim, Minkyu Shin, Abhinav Garg, Dhananjaya Gowda

Multi-Accent Adaptation Based on Gate Mechanism
Han Zhu, Li Wang, Pengyuan Zhang, Yonghong Yan

Unsupervised Adaptation with Adversarial Dropout Regularization for Robust Speech Recognition
Pengcheng Guo, Sining Sun, Lei Xie

Cumulative Adaptation for BLSTM Acoustic Models
Markus Kitza, Pavel Golik, Ralf Schlüter, Hermann Ney

Fast DNN Acoustic Model Speaker Adaptation by Learning Hidden Unit Contribution Features
Xurong Xie, Xunying Liu, Tan Lee, Lan Wang

End-to-End Adaptation with Backpropagation Through WFST for On-Device Speech Recognition System
Emiru Tsunoo, Yosuke Kashiwagi, Satoshi Asakawa, Toshiyuki Kumakura

Learning Speaker Aware Offsets for Speaker Adaptation of Neural Networks
Leda Sarı, Samuel Thomas, Mark A. Hasegawa-Johnson

An Investigation into On-Device Personalization of End-to-End Automatic Speech Recognition Models
Khe Chai Sim, Petr Zadrazil, Françoise Beaufays

A Multi-Accent Acoustic Model Using Mixture of Experts for Speech Recognition
Abhinav Jain, Vishwanath P. Singh, Shakti P. Rath

Personalizing ASR for Dysarthric and Accented Speech with Limited Data
Joel Shor, Dotan Emanuel, Oran Lang, Omry Tuval, Michael Brenner, Julie Cattiau, Fernando Vieira, Maeve McNally, Taylor Charbonneau, Melissa Nollstadt, Avinatan Hassidim, Yossi Matias


Dialogue Speech Understanding


Mitigating Noisy Inputs for Question Answering
Denis Peskov, Joe Barrow, Pedro Rodriguez, Graham Neubig, Jordan Boyd-Graber

One-vs-All Models for Asynchronous Training: An Empirical Analysis
Rahul Gupta, Aman Alok, Shankar Ananthakrishnan

Adapting a FrameNet Semantic Parser for Spoken Language Understanding Using Adversarial Learning
Gabriel Marzinotto, Géraldine Damnati, Frédéric Béchet

M2H-GAN: A GAN-Based Mapping from Machine to Human Transcripts for Speech Understanding
Titouan Parcollet, Mohamed Morchid, Xavier Bost, Georges Linarès

Ultra-Compact NLU: Neuronal Network Binarization as Regularization
Munir Georges, Krzysztof Czarnowski, Tobias Bocklet

Speech Model Pre-Training for End-to-End Spoken Language Understanding
Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, Yoshua Bengio

Spoken Language Intent Detection Using Confusion2Vec
Prashanth Gurunath Shivakumar, Mu Yang, Panayiotis Georgiou

Investigating Adaptation and Transfer Learning for End-to-End Spoken Language Understanding from Speech
Natalia Tomashenko, Antoine Caubrière, Yannick Estève

Topic-Aware Dialogue Speech Recognition with Transfer Learning
Yuanfeng Song, Di Jiang, Xueyang Wu, Qian Xu, Raymond Chi-Wing Wong, Qiang Yang

Improving Conversation-Context Language Models with Multiple Spoken Language Understanding Models
Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Hosana Kamiyama, Takanobu Oba, Satoshi Kobashikawa, Yushi Aono

Meta Learning for Hyperparameter Optimization in Dialogue System
Jen-Tzung Chien, Wei Xiang Lieow

Zero Shot Intent Classification Using Long-Short Term Memory Networks
Kyle Williams

A Comparison of Deep Learning Methods for Language Understanding
Mandy Korpusik, Zoe Liu, James Glass

Slot Filling with Weighted Multi-Encoders for Out-of-Domain Values
Yuka Kobayashi, Takami Yoshida, Kenji Iwata, Hiroshi Fujimura


Speech Production and Silent Interfaces


Multi-Corpus Acoustic-to-Articulatory Speech Inversion
Nadee Seneviratne, Ganesh Sivaraman, Carol Espy-Wilson

Towards a Speaker Independent Speech-BCI Using Speaker Adaptation
Debadatta Dash, Alan Wisler, Paul Ferrari, Jun Wang

Identifying Input Features for Development of Real-Time Translation of Neural Signals to Text
Janaki Sheth, Ariel Tankus, Michelle Tran, Lindy Comstock, Itzhak Fried, William Speier

Exploring Critical Articulator Identification from 50Hz RT-MRI Data of the Vocal Tract
Samuel Silva, António Teixeira, Conceição Cunha, Nuno Almeida, Arun A. Joseph, Jens Frahm

Towards a Method of Dynamic Vocal Tract Shapes Generation by Combining Static 3D and Dynamic 2D MRI Speech Data
Ioannis K. Douros, Anastasiia Tsukanova, Karyna Isaieva, Pierre-André Vuissoz, Yves Laprie

Temporal Coordination of Articulatory and Respiratory Events Prior to Speech Initiation
Oksana Rasskazova, Christine Mooshammer, Susanne Fuchs

Zooming in on Spatiotemporal V-to-C Coarticulation with Functional PCA
Michele Gubian, Manfred Pastätter, Marianne Pouplier

Ultrasound-Based Silent Speech Interface Built on a Continuous Vocoder
Tamás Gábor Csapó, Mohammed Salah Al-Radhi, Géza Németh, Gábor Gosztolya, Tamás Grósz, László Tóth, Alexandra Markó

Assessing Acoustic and Articulatory Dimensions of Speech Motor Adaptation with Random Forests
Eugen Klein, Jana Brunner, Phil Hoole

Speech Organ Contour Extraction Using Real-Time MRI and Machine Learning Method
Hironori Takemoto, Tsubasa Goto, Yuya Hagihara, Sayaka Hamanaka, Tatsuya Kitamura, Yukiko Nota, Kikuo Maekawa

CNN-Based Phoneme Classifier from Vocal Tract MRI Learns Embedding Consistent with Articulatory Topology
K.G. van Leeuwen, P. Bos, S. Trebeschi, M.J.A. van Alphen, L. Voskuilen, L.E. Smeele, F. van der Heijden, R.J.J.H. van Son

Strength and Structure: Coupling Tones with Oral Constriction Gestures
Doris Mücke, Anne Hermes, Sam Tilsen


Speech Signal Characterization 2


Salient Speech Representations Based on Cloned Networks
W. Bastiaan Kleijn, Felicia S.C. Lim, Michael Chinen, Jan Skoglund

ASR Inspired Syllable Stress Detection for Pronunciation Evaluation Without Using a Supervised Classifier and Syllable Level Features
Manoj Kumar Ramanathi, Chiranjeevi Yarra, Prasanta Kumar Ghosh

Acoustic and Articulatory Feature Based Speech Rate Estimation Using a Convolutional Dense Neural Network
Renuka Mannem, Jhansi Mallela, Aravind Illa, Prasanta Kumar Ghosh

Predictive Auxiliary Variational Autoencoder for Representation Learning of Global Speech Characteristics
Sebastian Springenberg, Egor Lakomkin, Cornelius Weber, Stefan Wermter

Unsupervised Low-Rank Representations for Speech Emotion Recognition
Georgios Paraskevopoulos, Efthymios Tzinis, Nikolaos Ellinas, Theodoros Giannakopoulos, Alexandros Potamianos

On the Suitability of the Riesz Spectro-Temporal Envelope for WaveNet Based Speech Synthesis
Jitendra Kumar Dhiman, Nagaraj Adiga, Chandra Sekhar Seelamantula

Autonomous Emotion Learning in Speech: A View of Zero-Shot Speech Emotion Recognition
Xinzhou Xu, Jun Deng, Nicholas Cummins, Zixing Zhang, Li Zhao, Björn W. Schuller

An Improved Goodness of Pronunciation (GoP) Measure for Pronunciation Evaluation with DNN-HMM System Considering HMM Transition Probabilities
Sweekar Sudhakara, Manoj Kumar Ramanathi, Chiranjeevi Yarra, Prasanta Kumar Ghosh

Low Resource Automatic Intonation Classification Using Gated Recurrent Unit (GRU) Networks Pre-Trained with Synthesized Pitch Patterns
Atreyee Saha, Chiranjeevi Yarra, Prasanta Kumar Ghosh


The 2019 Automatic Speaker Verification Spoofing and Countermeasures Challenge: ASVspoof Challenge — P


ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual Networks
Cheng-I Lai, Nanxin Chen, Jesús Villalba, Najim Dehak

Ensemble Models for Spoofing Detection in Automatic Speaker Verification
Bhusan Chettri, Daniel Stoller, Veronica Morfi, Marco A. Martínez Ramírez, Emmanouil Benetos, Bob L. Sturm

The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion
Weicheng Cai, Haiwei Wu, Danwei Cai, Ming Li

Robust Bayesian and Light Neural Networks for Voice Spoofing Detection
Radosław Białobrzeski, Michał Kośmider, Mateusz Matuszewski, Marcin Plata, Alexander Rakowski

STC Antispoofing Systems for the ASVspoof2019 Challenge
Galina Lavrentyeva, Sergey Novoselov, Andzhukaev Tseren, Marina Volkova, Artem Gorlanov, Alexandr Kozlov

The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge
Yexin Yang, Hongji Wang, Heinrich Dinkel, Zhengyang Chen, Shuai Wang, Yanmin Qian, Kai Yu

IIIT-H Spoofing Countermeasures for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2019
K.N.R.K. Raju Alluri, Anil Kumar Vuppala

Anti-Spoofing Speaker Verification System with Multi-Feature Integration and Multi-Task Learning
Rongjin Li, Miao Zhao, Zheng Li, Lin Li, Qingyang Hong

Speech Replay Detection with x-Vector Attack Embeddings and Spectral Features
Jennifer Williams, Joanna Rownicka

Long Range Acoustic Features for Spoofed Speech Detection
Rohan Kumar Das, Jichen Yang, Haizhou Li

Transfer-Representation Learning for Detecting Spoofing Attacks with Converted and Synthesized Speech in Automatic Speaker Verification System
Su-Yu Chang, Kai-Cheng Wu, Chia-Ping Chen

A Light Convolutional GRU-RNN Deep Feature Extractor for ASV Spoofing Detection
Alejandro Gomez-Alanis, Antonio M. Peinado, Jose A. Gonzalez, Angel M. Gomez

Detecting Spoofing Attacks Using VGG and SincNet: BUT-Omilia Submission to ASVspoof 2019 Challenge
Hossein Zeinali, Themos Stafylakis, Georgia Athanasopoulou, Johan Rohdin, Ioannis Gkinis, Lukáš Burget, Jan Černocký

Deep Residual Neural Networks for Audio Spoofing Detection
Moustafa Alzantot, Ziqi Wang, Mani B. Srivastava

Replay Attack Detection with Complementary High-Resolution Information Using End-to-End DNN for the ASVspoof 2019 Challenge
Jee-weon Jung, Hye-jin Shim, Hee-Soo Heo, Ha-Jin Yu


Speech Synthesis: Data and Evaluation


Investigating the Effects of Noisy and Reverberant Speech in Text-to-Speech Systems
David Ayllón, Héctor A. Sánchez-Hevia, Carol Figueroa, Pierre Lanchantin

Selection and Training Schemes for Improving TTS Voice Built on Found Data
F.-Y. Kuo, I.C. Ouyang, S. Aryal, Pierre Lanchantin

All Together Now: The Living Audio Dataset
David A. Braude, Matthew P. Aylett, Caoimhín Laoide-Kemp, Simone Ashby, Kristen M. Scott, Brian Ó Raghallaigh, Anna Braudo, Alex Brouwer, Adriana Stan

LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Heiga Zen, Viet Dang, Rob Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Zhifeng Chen, Yonghui Wu

Corpus Design Using Convolutional Auto-Encoder Embeddings for Audio-Book Synthesis
Meysam Shamsi, Damien Lolive, Nelly Barbot, Jonathan Chevelu

Evaluating Intention Communication by TTS Using Explicit Definitions of Illocutionary Act Performance
Nobukatsu Hojo, Noboru Miyazaki

MOSNet: Deep Learning-Based Objective Assessment for Voice Conversion
Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang

Investigating the Robustness of Sequence-to-Sequence Text-to-Speech Models to Imperfectly-Transcribed Training Data
Jason Fong, Pilar Oplustil Gallegos, Zack Hodari, Simon King

Using Pupil Dilation to Measure Cognitive Load When Listening to Text-to-Speech in Quiet and in Noise
Avashna Govender, Anita E. Wagner, Simon King

A Multimodal Real-Time MRI Articulatory Corpus of French for Speech Research
Ioannis K. Douros, Jacques Felblinger, Jens Frahm, Karyna Isaieva, Arun A. Joseph, Yves Laprie, Freddy Odille, Anastasiia Tsukanova, Dirk Voit, Pierre-André Vuissoz

A Chinese Dataset for Identifying Speakers in Novels
Jia-Xiang Chen, Zhen-Hua Ling, Li-Rong Dai

CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages
Kyubyong Park, Thomas Mulc


Model Training for ASR


Attention Model for Articulatory Features Detection
Ievgen Karaulov, Dmytro Tkanov

Unbiased Semi-Supervised LF-MMI Training Using Dropout
Sibo Tong, Apoorv Vyas, Philip N. Garner, Hervé Bourlard

Acoustic Model Optimization Based on Evolutionary Stochastic Gradient Descent with Anchors for Automatic Speech Recognition
Xiaodong Cui, Michael Picheny

Whether to Pretrain DNN or not?: An Empirical Analysis for Voice Conversion
Nirmesh J. Shah, Hardik B. Sailor, Hemant A. Patil

Detection of Glottal Closure Instants from Raw Speech Using Convolutional Neural Networks
Mohit Goyal, Varun Srivastava, Prathosh A.P.

Lattice-Based Lightly-Supervised Acoustic Model Training
Joachim Fainberg, Ondřej Klejch, Steve Renals, Peter Bell

Comparison of Lattice-Free and Lattice-Based Sequence Discriminative Training Criteria for LVCSR
Wilfried Michel, Ralf Schlüter, Hermann Ney

End-to-End Automatic Speech Recognition with a Reconstruction Criterion Using Speech-to-Text and Text-to-Speech Encoder-Decoders
Ryo Masumura, Hiroshi Sato, Tomohiro Tanaka, Takafumi Moriya, Yusuke Ijima, Takanobu Oba

Char+CV-CTC: Combining Graphemes and Consonant/Vowel Units for CTC-Based ASR Using Multitask Learning
Abdelwahab Heba, Thomas Pellegrini, Jean-Pierre Lorré, Régine Andre-Obrecht

Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation
Gakuto Kurata, Kartik Audhkhasi

Direct Neuron-Wise Fusion of Cognate Neural Networks
Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata

Two Tiered Distributed Training Algorithm for Acoustic Modeling
Pranav Ladkat, Oleg Rybakov, Radhika Arava, Sree Hari Krishnan Parthasarathi, I-Fan Chen, Nikko Strom

Exploring the Encoder Layers of Discriminative Autoencoders for LVCSR
Pin-Tuan Huang, Hung-Shin Lee, Syu-Siang Wang, Kuan-Yu Chen, Yu Tsao, Hsin-Min Wang

Multi-Task CTC Training with Auxiliary Feature Reconstruction for End-to-End Speech Recognition
Gakuto Kurata, Kartik Audhkhasi

Framewise Supervised Training Towards End-to-End Speech Recognition Models: First Results
Mohan Li, Yuanjiang Cao, Weicong Zhou, Min Liu


Network Architectures for Emotion and Paralinguistics Recognition


Deep Hierarchical Fusion with Application in Sentiment Analysis
Efthymios Georgiou, Charilaos Papaioannou, Alexandros Potamianos

Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice
Vikramjit Mitra, Sue Booker, Erik Marchi, David Scott Farrar, Ute Dorothea Peitz, Bridget Cheng, Ermine Teves, Anuj Mehta, Devang Naik

Analysis of Deep Learning Architectures for Cross-Corpus Speech Emotion Recognition
Jack Parry, Dimitri Palaz, Georgia Clarke, Pauline Lecomte, Rebecca Mead, Michael Berger, Gregor Hofer

A Path Signature Approach for Speech Emotion Recognition
Bo Wang, Maria Liakata, Hao Ni, Terry Lyons, Alejo J. Nevado-Holgado, Kate Saunders

Employing Bottleneck and Convolutional Features for Speech-Based Physical Load Detection on Limited Data Amounts
Olga Egorow, Tarik Mrech, Norman Weißkirchen, Andreas Wendemuth

Speech Emotion Recognition in Dyadic Dialogues with Attentive Interaction Modeling
Jinming Zhao, Shizhe Chen, Jingjun Liang, Qin Jin

Predicting Group Performances Using a Personality Composite-Network Architecture During Collaborative Task
Shun-Chang Zhong, Yun-Shao Lin, Chun-Min Chang, Yi-Ching Liu, Chi-Chun Lee

Enforcing Semantic Consistency for Cross Corpus Valence Regression from Speech Using Adversarial Discrepancy Learning
Gao-Yi Chao, Yun-Shao Lin, Chun-Min Chang, Chi-Chun Lee

Deep Learning of Segment-Level Feature Representation with Multiple Instance Learning for Utterance-Level Speech Emotion Recognition
Shuiyang Mao, P.C. Ching, Tan Lee

Towards Robust Speech Emotion Recognition Using Deep Residual Networks for Speech Enhancement
Andreas Triantafyllopoulos, Gil Keren, Johannes Wagner, Ingmar Steiner, Björn W. Schuller

Towards Discriminative Representations and Unbiased Predictions: Class-Specific Angular Softmax for Speech Emotion Recognition
Zhixuan Li, Liang He, Jingyang Li, Li Wang, Wei-Qiang Zhang

Learning Temporal Clusters Using Capsule Routing for Speech Emotion Recognition
Md. Asif Jalal, Erfan Loweimi, Roger K. Moore, Thomas Hain


Acoustic Phonetics


L2 Pronunciation Accuracy and Context: A Pilot Study on the Realization of Geminates in Italian as L2 by French Learners
Sonia d’Apolito, Barbara Gili Fivela

The Monophthongs of Formal Nigerian English: An Acoustic Analysis
Nisad Jamakovic, Robert Fuchs

Quantifying Fundamental Frequency Modulation as a Function of Language, Speaking Style and Speaker
Pablo Arantes, Anders Eriksson

The Voicing Contrast in Stops and Affricates in the Western Armenian of Lebanon
Niamh E. Kelly, Lara Keshishian

“Gra[f] e!” Word-Final Devoicing of Obstruents in Standard French: An Acoustic Study Based on Large Corpora
Adèle Jatteau, Ioana Vasilescu, Lori Lamel, Martine Adda-Decker, Nicolas Audibert

Acoustic Indicators of Deception in Mandarin Daily Conversations Recorded from an Interactive Game
Chih-Hsiang Huang, Huang-Cheng Chou, Yi-Tong Wu, Chi-Chun Lee, Yi-Wen Liu

Prosodic Effects on Plosive Duration in German and Austrian German
Barbara Schuppler, Margaret Zellers

Cross-Lingual Consistency of Phonological Features: An Empirical Study
Cibu Johny, Alexander Gutkin, Martin Jansche

Are IP Initial Vowels Acoustically More Distinct? Results from LDA and CNN Classifications
Fanny Guitard-Ivent, Gabriele Chignoli, Cécile Fougeron, Laurianne Georgeton

Neural Network-Based Modeling of Phonetic Durations
Xizi Wei, Melvyn Hunt, Adrian Skilling

An Acoustic Study of Vowel Undershoot in a System with Several Degrees of Prominence
Janina Mołczanow, Beata Łukaszewicz, Anna Łukaszewicz

A Preliminary Study of Charismatic Speech on YouTube: Correlating Prosodic Variation with Counts of Subscribers, Views and Likes
Stephanie Berger, Oliver Niebuhr, Margaret Zellers

Phonetic Detail Encoding in Explaining the Size of Speech Planning Window
Shan Luo

Acoustic Cues to Topic and Narrow Focus in Egyptian Arabic
Dina El Zarka, Barbara Schuppler, Francesco Cangemi

Acoustic and Articulatory Study of Ewe Vowels: A Comparative Study of Male and Female
Kowovi Comivi Alowonou, Jianguo Wei, Wenhuan Lu, Zhicheng Liu, Kiyoshi Honda, Jianwu Dang


Speech Enhancement: Noise Attenuation


Speech Augmentation via Speaker-Specific Noise in Unseen Environment
Ya’nan Guo, Ziping Zhao, Yide Ma, Björn W. Schuller

UNetGAN: A Robust Speech Enhancement Approach in Time Domain for Extremely Low Signal-to-Noise Ratio Condition
Xiang Hao, Xiangdong Su, Zhiyu Wang, Hui Zhang, Batushiren

Towards Generalized Speech Enhancement with Generative Adversarial Networks
Santiago Pascual, Joan Serrà, Antonio Bonafonte

A Convolutional Neural Network with Non-Local Module for Speech Enhancement
Xiaoqi Li, Yaxing Li, Meng Li, Shan Xu, Yuanjie Dong, Xinrong Sun, Shengwu Xiong

IA-NET: Acceleration and Compression of Speech Enhancement Using Integer-Adder Deep Neural Network
Yu-Chen Lin, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo

KL-Divergence Regularized Deep Neural Network Adaptation for Low-Resource Speaker-Dependent Speech Enhancement
Li Chai, Jun Du, Chin-Hui Lee

Speech Enhancement with Wide Residual Networks in Reverberant Environments
Jorge Llombart, Dayana Ribas, Antonio Miguel, Luis Vicente, Alfonso Ortega, Eduardo Lleida

A Scalable Noisy Speech Dataset and Online Subjective Test Framework
Chandan K.A. Reddy, Ebrahim Beyrami, Jamie Pool, Ross Cutler, Sriram Srinivasan, Johannes Gehrke

Speech Enhancement for Noise-Robust Speech Synthesis Using Wasserstein GAN
Nagaraj Adiga, Yannis Pantazis, Vassilis Tsiaras, Yannis Stylianou

A Non-Causal FFTNet Architecture for Speech Enhancement
Muhammed Shifas P.V., Nagaraj Adiga, Vassilis Tsiaras, Yannis Stylianou

Speech Enhancement with Variance Constrained Autoencoders
D.T. Braithwaite, W. Bastiaan Kleijn


Language Learning and Databases


A Deep Learning Approach to Automatic Characterisation of Rhythm in Non-Native English Speech
Konstantinos Kyriakopoulos, Kate M. Knill, Mark J.F. Gales

Language Learning Using Speech to Image Retrieval
Danny Merkx, Stefan L. Frank, Mirjam Ernestus

Using Alexa for Flashcard-Based Learning
Lucy Skidmore, Roger K. Moore

The 2019 Inaugural Fearless Steps Challenge: A Giant Leap for Naturalistic Audio
John H.L. Hansen, Aditya Joglekar, Meena Chandra Shekhar, Vinay Kothapally, Chengzhu Yu, Lakshmish Kaushik, Abhijeet Sangwan

Completely Unsupervised Phoneme Recognition by a Generative Adversarial Network Harmonized with Iteratively Refined Hidden Markov Models
Kuan-Yu Chen, Che-Ping Tsai, Da-Rong Liu, Hung-Yi Lee, Lin-shan Lee

Analysis of Native Listeners’ Facial Microexpressions While Shadowing Non-Native Speech — Potential of Shadowers’ Facial Expressions for Comprehensibility Prediction
Tasavat Trisitichoke, Shintaro Ando, Daisuke Saito, Nobuaki Minematsu

Transparent Pronunciation Scoring Using Articulatorily Weighted Phoneme Edit Distance
Reima Karhila, Anna-Riikka Smolander, Sari Ylinen, Mikko Kurimo

Development of Robust Automated Scoring Models Using Adversarial Input for Oral Proficiency Assessment
Su-Youn Yoon, Chong Min Lee, Klaus Zechner, Keelan Evanini

Impact of ASR Performance on Spoken Grammatical Error Detection
Y. Lu, Mark J.F. Gales, Kate M. Knill, P. Manakul, L. Wang, Y. Wang

Self-Imitating Feedback Generation Using GAN for Computer-Assisted Pronunciation Training
Seung Hee Yang, Minhwa Chung


Emotion and Personality in Conversation


Joint Student-Teacher Learning for Audio-Visual Scene-Aware Dialog
Chiori Hori, Anoop Cherian, Tim K. Marks, Takaaki Hori

Topical-Chat: Towards Knowledge-Grounded Open-Domain Conversations
Karthik Gopalakrishnan, Behnam Hedayatnia, Qinlang Chen, Anna Gottardi, Sanjeev Kwatra, Anu Venkatesh, Raefer Gabriel, Dilek Hakkani-Tür

Analyzing Verbal and Nonverbal Features for Predicting Group Performance
Uliyana Kubasova, Gabriel Murray, McKenzie Braley

Identifying Therapist and Client Personae for Therapeutic Alliance Estimation
Victor R. Martinez, Nikolaos Flemotomos, Victor Ardulov, Krishna Somandepalli, Simon B. Goldberg, Zac E. Imel, David C. Atkins, Shrikanth Narayanan

Do Hesitations Facilitate Processing of Partially Defective System Utterances? An Exploratory Eye Tracking Study
Kristin Haake, Sarah Schimke, Simon Betz, Sina Zarrieß

Influence of Contextuality on Prosodic Realization of Information Structure in Chinese Dialogues
Bin Li, Yuan Jia

Cross-Lingual Transfer Learning for Affective Spoken Dialogue Systems
Kristijan Gjoreski, Aleksandar Gjoreski, Ivan Kraljevski, Diane Hirschfeld

Identifying Personality Traits Using Overlap Dynamics in Multiparty Dialogue
Mingzhi Yu, Emer Gilmartin, Diane Litman

Identifying Mood Episodes Using Dialogue Features from Clinical Interviews
Zakaria Aldeneh, Mimansa Jaiswal, Michael Picheny, Melvin G. McInnis, Emily Mower Provost

Do Conversational Partners Entrain on Articulatory Precision?
Nichola Lubold, Stephanie A. Borrie, Tyson S. Barrett, Megan Willi, Visar Berisha

Conversational Emotion Analysis via Attention Mechanisms
Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang



Speech Signal Characterization 3


Direct F0 Estimation with Neural-Network-Based Regression
Shuzhuang Xu, Hiroshi Shimodaira

Real Time Online Visual End Point Detection Using Unidirectional LSTM
Tanay Sharma, Rohith Chandrashekar Aralikatti, Dilip Kumar Margam, Abhinav Thanda, Sharad Roy, Pujitha Appan Kandala, Shankar M. Venkatesan

Fully-Convolutional Network for Pitch Estimation of Speech Signals
Luc Ardaillon, Axel Roebel

Vocal Pitch Extraction in Polyphonic Music Using Convolutional Residual Network
Mingye Dong, Jie Wu, Jian Luan

Multi-Level Adaptive Speech Activity Detector for Speech in Naturalistic Environments
Bidisha Sharma, Rohan Kumar Das, Haizhou Li

On the Importance of Audio-Source Separation for Singer Identification in Polyphonic Music
Bidisha Sharma, Rohan Kumar Das, Haizhou Li

Investigating the Physiological and Acoustic Contrasts Between Choral and Operatic Singing
Hiroko Terasawa, Kenta Wakasa, Hideki Kawahara, Ken-Ichi Sakakibara

Optimizing Voice Activity Detection for Noisy Conditions
Ruixi Lin, Charles Costello, Charles Jankowski, Vishwas Mruthyunjaya

Small-Footprint Magic Word Detection Method Using Convolutional LSTM Neural Network
Taiki Yamamoto, Ryota Nishimura, Masayuki Misaki, Norihide Kitaoka

Acoustic Modeling for Automatic Lyrics-to-Audio Alignment
Chitralekha Gupta, Emre Yılmaz, Haizhou Li

Two-Dimensional Convolutional Recurrent Neural Networks for Speech Activity Detection
Anastasios Vafeiadis, Eleftherios Fanioudakis, Ilyas Potamitis, Konstantinos Votis, Dimitrios Giakoumis, Dimitrios Tzovaras, Liming Chen, Raouf Hamzaoui

A Study of Soprano Singing in Light of the Source-Filter Interaction
Tokihiko Kaburagi


Speech Synthesis: Pronunciation, Multilingual, and Low Resource


Boosting Character-Based Chinese Speech Synthesis via Multi-Task Learning and Dictionary Tutoring
Yuxiang Zou, Linhao Dong, Bo Xu

Building a Mixed-Lingual Neural TTS System with Only Monolingual Data
Liumeng Xue, Wei Song, Guanghui Xu, Lei Xie, Zhizheng Wu

Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion
Alex Sokolov, Tracy Rohlin, Ariya Rastrow

Analysis of Pronunciation Learning in End-to-End Speech Synthesis
Jason Taylor, Korin Richmond

End-to-End Text-to-Speech for Low-Resource Languages by Cross-Lingual Transfer Learning
Yuan-Jui Chen, Tao Tu, Cheng-chieh Yeh, Hung-Yi Lee

Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning
Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, R.J. Skerry-Ryan, Ye Jia, Andrew Rosenberg, Bhuvana Ramabhadran

Unified Language-Independent DNN-Based G2P Converter
Markéta Jůzová, Daniel Tihelka, Jakub Vít

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT
Dongyang Dai, Zhiyong Wu, Shiyin Kang, Xixin Wu, Jia Jia, Dan Su, Dong Yu, Helen Meng

Transformer Based Grapheme-to-Phoneme Conversion
Sevinj Yolchuyeva, Géza Németh, Bálint Gyires-Tóth

Developing Pronunciation Models in New Languages Faster by Exploiting Common Grapheme-to-Phoneme Correspondences Across Languages
Harry Bleyan, Sandy Ritchie, Jonas Fromseier Mortensen, Daan van Esch

Cross-Lingual, Multi-Speaker Text-To-Speech Synthesis Using Neural Speaker Embedding
Mengnan Chen, Minchuan Chen, Shuang Liang, Jun Ma, Lei Chen, Shaojun Wang, Jing Xiao

Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-Level Embedding Features
Zexin Cai, Yaogen Yang, Chuxiong Zhang, Xiaoyi Qin, Ming Li

Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion
Hao Sun, Xu Tan, Jun-Wei Gan, Hongzhi Liu, Sheng Zhao, Tao Qin, Tie-Yan Liu


Cross-Lingual and Multilingual ASR


Multilingual Speech Recognition with Corpus Relatedness Sampling
Xinjian Li, Siddharth Dalmia, Alan W. Black, Florian Metze

Multi-Dialect Acoustic Modeling Using Phone Mapping and Online i-Vectors
Harish Arsikere, Ashtosh Sapru, Sri Garimella

Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model
Anjuli Kannan, Arindrima Datta, Tara N. Sainath, Eugene Weinstein, Bhuvana Ramabhadran, Yonghui Wu, Ankur Bapna, Zhifeng Chen, Seungji Lee

Recognition of Latin American Spanish Using Multi-Task Learning
Carlos Mendes, Alberto Abad, João Paulo Neto, Isabel Trancoso

End-to-End Accented Speech Recognition
Thibault Viglino, Petr Motlicek, Milos Cernak

End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition
Sheng Li, Chenchen Ding, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai

Exploiting Monolingual Speech Corpora for Code-Mixed Speech Recognition
Karan Taneja, Satarupa Guha, Preethi Jyothi, Basil Abraham

Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models
Ke Hu, Antoine Bruguier, Tara N. Sainath, Rohit Prabhavalkar, Golan Pundak

Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data
Yerbolat Khassanov, Haihua Xu, Van Tung Pham, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma

On the End-to-End Solution to Mandarin-English Code-Switching Speech Recognition
Zhiping Zeng, Yerbolat Khassanov, Van Tung Pham, Haihua Xu, Eng Siong Chng, Haizhou Li

Towards Language-Universal Mandarin-English Speech Recognition
Shiliang Zhang, Yuan Liu, Ming Lei, Bin Ma, Lei Xie


Spoken Term Detection, Confidence Measure, and End-to-End Speech Recognition


Improving ASR Confidence Scores for Alexa Using Acoustic and Hypothesis Embeddings
Prakhar Swarup, Roland Maas, Sri Garimella, Sri Harish Mallidi, Björn Hoffmeister

Investigation of Transformer Based Spelling Correction Model for CTC-Based End-to-End Mandarin Speech Recognition
Shiliang Zhang, Ming Lei, Zhijie Yan

Improving Performance of End-to-End ASR on Numeric Sequences
Cal Peyser, Hao Zhang, Tara N. Sainath, Zelin Wu

A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting
Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Zhengkun Tian, Chenghao Zhao, Cunhang Fan

Sub-Band Convolutional Neural Networks for Small-Footprint Spoken Term Classification
Chieh-Chi Kao, Ming Sun, Yixin Gao, Shiv Vitaladevuni, Chao Wang

Investigating Radical-Based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese
Sheng Li, Xugang Lu, Chenchen Ding, Peng Shen, Tatsuya Kawahara, Hisashi Kawai

Joint Decoding of CTC Based Systems for Speech Recognition
Jiaqi Guo, Yongbin You, Yanmin Qian, Kai Yu

A Joint End-to-End and DNN-HMM Hybrid Automatic Speech Recognition System with Transferring Sharable Knowledge
Tomohiro Tanaka, Ryo Masumura, Takafumi Moriya, Takanobu Oba, Yushi Aono

Active Learning Methods for Low Resource End-to-End Speech Recognition
Karan Malhotra, Shubham Bansal, Sriram Ganapathy

Analysis of Multilingual Sequence-to-Sequence Speech Recognition Systems
Martin Karafiát, Murali Karthick Baskar, Shinji Watanabe, Takaaki Hori, Matthew Wiesner, Jan Černocký

Lattice Generation in Attention-Based Speech Recognition Models
Michał Zapotoczny, Piotr Pietrzak, Adrian Łańcucki, Jan Chorowski

Sampling from Stochastic Finite Automata with Applications to CTC Decoding
Martin Jansche, Alexander Gutkin

ShrinkML: End-to-End ASR Model Compression Using Reinforcement Learning
Łukasz Dudziak, Mohamed S. Abdelfattah, Ravichander Vipperla, Stefanos Laskaridis, Nicholas D. Lane

Acoustic-to-Phrase Models for Speech Recognition
Yashesh Gaur, Jinyu Li, Zhong Meng, Yifan Gong

Performance Monitoring for End-to-End Speech Recognition
Ruizhi Li, Gregory Sell, Hynek Hermansky


Speech Perception


The Role of Musical Experience in the Perceptual Weighting of Acoustic Cues for the Obstruent Coda Voicing Contrast in American English
Michelle Cohn, Georgia Zellou, Santiago Barreda

Individual Differences in Implicit Attention to Phonetic Detail in Speech Perception
Natalie Lewandowski, Daniel Duran

Effects of Natural Variability in Cross-Modal Temporal Correlations on Audiovisual Speech Recognition Benefit
Kaylah Lalonde

Listening with Great Expectations: An Investigation of Word Form Anticipations in Naturalistic Speech
M. Bentum, L. ten Bosch, A. van den Bosch, Mirjam Ernestus

Quantifying Expectation Modulation in Human Speech Processing
M. Bentum, L. ten Bosch, A. van den Bosch, Mirjam Ernestus

Perception of Pitch Contours in Speech and Nonspeech
Daniel R. Turner, Ann R. Bradlow, Jennifer S. Cole

Analyzing Reaction Time and Error Sequences in Lexical Decision Experiments
L. ten Bosch, L. Boves, K. Mulder

Automatic Detection of the Temporal Segmentation of Hand Movements in British English Cued Speech
Li Liu, Jianze Li, Gang Feng, Xiao-Ping Zhang

Place Shift as an Autonomous Process: Evidence from Japanese Listeners
Yuriko Yokoe

A Perceptual Study of CV Syllables in Both Spoken and Whistled Speech: A Tashlhiyt Berber Perspective
Julien Meyer, Laure Dentel, Silvain Gerber, Rachid Ridouane

Consonant Classification in Mandarin Based on the Depth Image Feature: A Pilot Study
Han-Chi Hsieh, Wei-Zhong Zheng, Ko-Chiang Chen, Ying-Hui Lai

The Different Roles of Expectations in Phonetic and Lexical Processing
Shiri Lev-Ari, Robin Dodsworth, Jeff Mielke, Sharon Peperkamp

Perceptual Adaptation to Device and Human Voices: Learning and Generalization of a Phonetic Shift Across Real and Voice-AI Talkers
Bruno Ferenc Segedin, Michelle Cohn, Georgia Zellou

End-to-End Convolutional Sequence Learning for ASL Fingerspelling Recognition
Katerina Papadimitriou, Gerasimos Potamianos


The Interspeech 2019 Computational Paralinguistics Challenge (ComParE)


The INTERSPEECH 2019 Computational Paralinguistics Challenge: Styrian Dialects, Continuous Sleepiness, Baby Sounds & Orca Activity
Björn W. Schuller, Anton Batliner, Christian Bergler, Florian B. Pokorny, Jarek Krajewski, Margaret Cychosz, Ralf Vollmann, Sonja-Dana Roelen, Sebastian Schnieder, Elika Bergelson, Alejandrina Cristia, Amanda Seidl, Anne S. Warlaumont, Lisa Yankowitz, Elmar Nöth, Shahin Amiriparian, Simone Hantke, Maximilian Schmitt

Using Speech Production Knowledge for Raw Waveform Modelling Based Styrian Dialect Identification
S. Pavankumar Dubagunta, Mathew Magimai-Doss

Deep Neural Baselines for Computational Paralinguistics
Daniel Elsner, Stefan Langer, Fabian Ritz, Robert Mueller, Steffen Illium

Styrian Dialect Classification: Comparing and Fusing Classifiers Based on a Feature Selection Using a Genetic Algorithm
Thomas Kisler, Raphael Winkelmann, Florian Schiel

Using Attention Networks and Adversarial Augmentation for Styrian Dialect Continuous Sleepiness and Baby Sound Recognition
Sung-Lin Yeh, Gao-Yi Chao, Bo-Hao Su, Yu-Lin Huang, Meng-Han Lin, Yin-Chun Tsai, Yu-Wen Tai, Zheng-Chi Lu, Chieh-Yu Chen, Tsung-Ming Tai, Chiu-Wang Tseng, Cheng-Kuang Lee, Chi-Chun Lee

Ordinal Triplet Loss: Investigating Sleepiness Detection from Speech
Peter Wu, SaiKrishna Rallabandi, Alan W. Black, Eric Nyberg

Voice Quality and Between-Frame Entropy for Sleepiness Estimation
Vijay Ravi, Soo Jin Park, Amber Afshan, Abeer Alwan

Using Fisher Vector and Bag-of-Audio-Words Representations to Identify Styrian Dialects, Sleepiness, Baby & Orca Sounds
Gábor Gosztolya

Instantaneous Phase and Long-Term Acoustic Cues for Orca Activity Detection
Rohan Kumar Das, Haizhou Li

Relevance-Based Feature Masking: Improving Neural Network Based Whale Classification Through Explainable Artificial Intelligence
Dominik Schiller, Tobias Huber, Florian Lingenfelser, Michael Dietz, Andreas Seiderer, Elisabeth André

Spatial, Temporal and Spectral Multiresolution Analysis for the INTERSPEECH 2019 ComParE Challenge
Marie-José Caraty, Claude Montacié

The DKU-LENOVO Systems for the INTERSPEECH 2019 Computational Paralinguistic Challenge
Haiwei Wu, Weiqing Wang, Ming Li



The VOiCES from a Distance Challenge — P


The VOiCES from a Distance Challenge 2019
Mahesh Kumar Nandwana, Julien van Hout, Colleen Richey, Mitchell McLaren, Maria A. Barrios, Aaron Lawson

STC Speaker Recognition Systems for the VOiCES from a Distance Challenge
Sergey Novoselov, Aleksei Gusev, Artem Ivanov, Timur Pekhovsky, Andrey Shulipa, Galina Lavrentyeva, Vladimir Volokhov, Alexandr Kozlov

Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge
Pavel Matějka, Oldřich Plchot, Hossein Zeinali, Ladislav Mošner, Anna Silnova, Lukáš Burget, Ondřej Novotný, Ondřej Glembek

The STC ASR System for the VOiCES from a Distance Challenge 2019
Ivan Medennikov, Yuri Khokhlov, Aleksei Romanenko, Ivan Sorokin, Anton Mitrofanov, Vladimir Bataev, Andrei Andrusenko, Tatiana Prisyach, Mariya Korenevskaya, Oleg Petrov, Alexander Zatvornitskiy

The I2R’s ASR System for the VOiCES from a Distance Challenge 2019
Tze Yuang Chong, Kye Min Tan, Kah Kuan Teh, Chang Huai You, Hanwu Sun, Huy Dat Tran

Multi-Task Discriminative Training of Hybrid DNN-TVM Model for Speaker Verification with Noisy and Far-Field Speech
Arindam Jati, Raghuveer Peri, Monisankha Pal, Tae Jin Park, Naveen Kumar, Ruchir Travadi, Panayiotis Georgiou, Shrikanth Narayanan

The JHU Speaker Recognition System for the VOiCES 2019 Challenge
David Snyder, Jesús Villalba, Nanxin Chen, Daniel Povey, Gregory Sell, Najim Dehak, Sanjeev Khudanpur

Intel Far-Field Speaker Recognition System for VOiCES Challenge 2019
Jonathan Huang, Tobias Bocklet

The I2R’s Submission to VOiCES Distance Speaker Recognition Challenge 2019
Hanwu Sun, Kah Kuan Teh, Ivan Kukanov, Huy Dat Tran

The LeVoice Far-Field Speech Recognition System for VOiCES from a Distance Challenge 2019
Yulong Liang, Lin Yang, Xuyang Wang, Yingjie Li, Chen Jia, Junjie Wang

The JHU ASR System for VOiCES from a Distance Challenge 2019
Yiming Wang, David Snyder, Hainan Xu, Vimal Manohar, Phani Sankar Nidadavolu, Daniel Povey, Sanjeev Khudanpur

The DKU System for the Speaker Recognition Task of the 2019 VOiCES from a Distance Challenge
Danwei Cai, Xiaoyi Qin, Weicheng Cai, Ming Li


Speaker Recognition and Anti-Spoofing


Blind Channel Response Estimation for Replay Attack Detection
Anderson R. Avila, Jahangir Alam, Douglas O’Shaughnessy, Tiago H. Falk

Energy Separation-Based Instantaneous Frequency Estimation for Cochlear Cepstral Feature for Replay Spoof Detection
Ankur T. Patil, Rajul Acharya, Pulikonda Aditya Sai, Hemant A. Patil

Optimization of False Acceptance/Rejection Rates and Decision Threshold for End-to-End Text-Dependent Speaker Verification Systems
Victoria Mingote, Antonio Miguel, Dayana Ribas, Alfonso Ortega, Eduardo Lleida

Deep Hashing for Speaker Identification and Retrieval
Lei Fan, Qing-Yuan Jiang, Ya-Qi Yu, Wu-Jun Li

Adversarial Optimization for Dictionary Attacks on Speaker Verification
Mirko Marras, Paweł Korus, Nasir Memon, Gianni Fenu

An Adaptive-Q Cochlear Model for Replay Spoofing Detection
Tharshini Gunendradasan, Eliathamby Ambikairajah, Julien Epps, Haizhou Li

An End-to-End Text-Independent Speaker Verification Framework with a Keyword Adversarial Network
Sungrack Yun, Janghoon Cho, Jungyun Eum, Wonil Chang, Kyuwoong Hwang

Shortcut Connections Based Deep Speaker Embeddings for End-to-End Speaker Verification System
Soonshin Seo, Daniel Jun Rim, Minkyu Lim, Donghyun Lee, Hosung Park, Junseok Oh, Changmin Kim, Ji-Hwan Kim

Device Feature Extractor for Replay Spoofing Detection
Chang Huai You, Jichen Yang, Huy Dat Tran

Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training
Hongji Wang, Heinrich Dinkel, Shuai Wang, Yanmin Qian, Kai Yu

A Study of x-Vector Based Speaker Recognition on Short Utterances
A. Kanagasundaram, S. Sridharan, G. Sriram, S. Prachi, C. Fookes

Tied Mixture of Factor Analyzers Layer to Combine Frame Level Representations in Neural Speaker Embeddings
Nanxin Chen, Jesús Villalba, Najim Dehak

Biologically Inspired Adaptive-Q Filterbanks for Replay Spoofing Attack Detection
Buddhi Wickramasinghe, Eliathamby Ambikairajah, Julien Epps

On Robustness of Unsupervised Domain Adaptation for Speaker Recognition
Pierre-Michel Bousquet, Mickael Rouvier

Large-Scale Speaker Retrieval on Random Speaker Variability Subspace
Suwon Shon, Younggun Lee, Taesu Kim


Rich Transcription and ASR Systems


Meeting Transcription Using Asynchronous Distant Microphones
Takuya Yoshioka, Dimitrios Dimitriadis, Andreas Stolcke, William Hinthorn, Zhuo Chen, Michael Zeng, Xuedong Huang

Detection and Recovery of OOVs for Improved English Broadcast News Captioning
Samuel Thomas, Kartik Audhkhasi, Zoltán Tüske, Yinghui Huang, Michael Picheny

Improving Large Vocabulary Urdu Speech Recognition System Using Deep Neural Networks
Muhammad Umar Farooq, Farah Adeeba, Sahar Rauf, Sarmad Hussain

Hybrid Arbitration Using Raw ASR String and NLU Information — Taking the Best of Both Embedded World and Cloud World
Min Tang

Leveraging a Character, Word and Prosody Triplet for an ASR Error Robust and Agglutination Friendly Punctuation Approach
György Szaszák, Máté Ákos Tündik

The Airbus Air Traffic Control Speech Recognition 2018 Challenge: Towards ATC Automatic Transcription and Call Sign Detection
Thomas Pellegrini, Jérôme Farinas, Estelle Delpech, François Lancelot

Kite: Automatic Speech Recognition for Unmanned Aerial Vehicles
Dan Oneață, Horia Cucu

Exploring Methods for the Automatic Detection of Errors in Manual Transcription
Xiaofei Wang, Jinyi Yang, Ruizhi Li, Samik Sadhu, Hynek Hermansky

Improved Low-Resource Somali Speech Recognition by Semi-Supervised Acoustic and Language Model Training
Astik Biswas, Raghav Menon, Ewald van der Westhuizen, Thomas Niesler

The Althingi ASR System
Inga R. Helgadóttir, Anna Björk Nikulásdóttir, Michal Borský, Judy Y. Fong, Róbert Kjaran, Jón Guðnason

CRIM’s Speech Transcription and Call Sign Detection System for the ATC Airbus Challenge Task
Vishwa Gupta, Lise Rebout, Gilles Boulianne, Pierre-André Ménard, Jahangir Alam


Speech and Language Analytics for Medical Applications


Optimizing Speech-Input Length for Speaker-Independent Depression Classification
Tomasz Rutowski, Amir Harati, Yang Lu, Elizabeth Shriberg

A New Approach for Automating Analysis of Responses on Verbal Fluency Tests from Subjects At-Risk for Schizophrenia
Mary Pietrowicz, Carla Agurto, Raquel Norel, Elif Eyigoz, Guillermo Cecchi, Zarina R. Bilgrami, Cheryl Corcoran

Comparison of Telephone Recordings and Professional Microphone Recordings for Early Detection of Parkinson’s Disease, Using Mel-Frequency Cepstral Coefficients with Gaussian Mixture Models
Laetitia Jeancolas, Graziella Mangone, Jean-Christophe Corvol, Marie Vidailhet, Stéphane Lehéricy, Badr-Eddine Benkelfat, Habib Benali, Dijana Petrovska-Delacrétaz

Spectral Subspace Analysis for Automatic Assessment of Pathological Speech Intelligibility
Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard

An Investigation of Therapeutic Rapport Through Prosody in Brief Psychodynamic Psychotherapy
Carolina De Pasquale, Charlie Cullen, Brian Vaughan

Feature Representation of Pathophysiology of Parkinsonian Dysarthria
Alice Rueda, J.C. Vásquez-Correa, Cristian David Rios-Urrego, Juan Rafael Orozco-Arroyave, Sridhar Krishnan, Elmar Nöth

Neural Transfer Learning for Cry-Based Diagnosis of Perinatal Asphyxia
Charles C. Onu, Jonathan Lebensold, William L. Hamilton, Doina Precup

Investigating the Variability of Voice Quality and Pain Levels as a Function of Multiple Clinical Parameters
Hui-Ting Hong, Jeng-Lin Li, Yi-Ming Weng, Chip-Jin Ng, Chi-Chun Lee

Assessing Parkinson’s Disease from Speech Using Fisher Vectors
José Vicente Egas López, Juan Rafael Orozco-Arroyave, Gábor Gosztolya

Feature Space Visualization with Spatial Similarity Maps for Pathological Speech Data
Philipp Klumpp, J.C. Vásquez-Correa, Tino Haderlein, Elmar Nöth

Predicting Behavior in Cancer-Afflicted Patient and Spouse Interactions Using Speech and Language
Sandeep Nallan Chakravarthula, Haoqi Li, Shao-Yen Tseng, Maija Reblin, Panayiotis Georgiou

Automatic Assessment of Language Impairment Based on Raw ASR Output
Ying Qin, Tan Lee, Anthony Pak Hin Kong


Speech Perception in Adverse Listening Conditions


Effects of Spectral and Temporal Cues to Mandarin Concurrent-Vowels Identification for Normal-Hearing and Hearing-Impaired Listeners
Zhen Fu, Xihong Wu, Jing Chen

Disfluencies and Human Speech Transcription Errors
Vicky Zayats, Trang Tran, Richard Wright, Courtney Mansfield, Mari Ostendorf

The Influence of Distraction on Speech Processing: How Selective is Selective Attention?
Sandra I. Parhammer, Miriam Ebersberg, Jenny Tippmann, Katja Stärk, Andreas Opitz, Barbara Hinger, Sonja Rossi

Subjective Evaluation of Communicative Effort for Younger and Older Adults in Interactive Tasks with Energetic and Informational Masking
Valerie Hazan, Outi Tuomainen, Linda Taschenberger

Perceiving Older Adults Producing Clear and Lombard Speech
Chris Davis, Jeesun Kim

Phone-Attribute Posteriors to Evaluate the Speech of Cochlear Implant Users
T. Arias-Vergara, Juan Rafael Orozco-Arroyave, Milos Cernak, S. Gollwitzer, M. Schuster, Elmar Nöth

Effects of Urgent Speech and Congruent/Incongruent Text on Speech Intelligibility in Noise and Reverberation
Nao Hodoshima

Quantifying Cochlear Implant Users’ Ability for Speaker Identification Using CI Auditory Stimuli
Nursadul Mamun, Ria Ghosh, John H.L. Hansen

Lexically Guided Perceptual Learning of a Vowel Shift in an Interactive L2 Listening Context
E. Felker, Mirjam Ernestus, Mirjam Broersma

Talker Intelligibility and Listening Effort with Temporally Modified Speech
Maximillian Paulus, Valerie Hazan, Patti Adank

R2SPIN: Re-Recording the Revised Speech Perception in Noise Test
Lauren Ward, Catherine Robinson, Matthew Paradis, Katherine M. Tucker, Ben Shirley

Contributions of Consonant-Vowel Transitions to Mandarin Tone Identification in Simulated Electric-Acoustic Hearing
Fei Chen


Speech Enhancement: Single Channel 1


Monaural Speech Enhancement with Dilated Convolutions
Shadi Pirhosseinloo, Jonathan S. Brumberg

Noise Adaptive Speech Enhancement Using Domain Adversarial Training
Chien-Feng Liao, Yu Tsao, Hung-Yi Lee, Hsin-Min Wang

Environment-Dependent Attention-Driven Recurrent Convolutional Neural Network for Robust Speech Enhancement
Meng Ge, Longbiao Wang, Nan Li, Hao Shi, Jianwu Dang, Xiangang Li

A Statistically Principled and Computationally Efficient Approach to Speech Enhancement Using Variational Autoencoders
Manuel Pariente, Antoine Deleforge, Emmanuel Vincent

Speech Enhancement Using Forked Generative Adversarial Networks with Spectral Subtraction
Ju Lin, Sufeng Niu, Zice Wei, Xiang Lan, Adriaan J. van Wijngaarden, Melissa C. Smith, Kuang-Ching Wang

Specialized Speech Enhancement Model Selection Based on Learned Non-Intrusive Quality Assessment Metric
Ryandhimas E. Zezario, Szu-Wei Fu, Xugang Lu, Hsin-Min Wang, Yu Tsao

Speaker-Aware Deep Denoising Autoencoder with Embedded Speaker Identity for Speech Enhancement
Fu-Kai Chuang, Syu-Siang Wang, Jeih-weih Hung, Yu Tsao, Shih-Hau Fang

Investigation of Cost Function for Supervised Monaural Speech Separation
Yun Liu, Hui Zhang, Xueliang Zhang, Yuhang Cao

Deep Attention Gated Dilated Temporal Convolutional Networks with Intra-Parallel Convolutional Modules for End-to-End Monaural Speech Separation
Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Jiqing Han, Anyan Shi

Masking Estimation with Phase Restoration of Clean Speech for Monaural Speech Enhancement
Xianyun Wang, Changchun Bao

Progressive Speech Enhancement with Residual Connections
Jorge Llombart, Dayana Ribas, Antonio Miguel, Luis Vicente, Alfonso Ortega, Eduardo Lleida



Emotion Modeling and Analysis


Cross-Corpus Speech Emotion Recognition Using Semi-Supervised Transfer Non-Negative Matrix Factorization with Adaptation Regularization
Hui Luo, Jiqing Han

Modeling User Context for Valence Prediction from Narratives
Aniruddha Tammewar, Alessandra Cervone, Eva-Maria Messner, Giuseppe Riccardi

Front-End Feature Compensation and Denoising for Noise Robust Speech Emotion Recognition
Rupayan Chakraborty, Ashish Panda, Meghna Pandharipande, Sonal Joshi, Sunil Kumar Kopparapu

The Contribution of Acoustic Features Analysis to Model Emotion Perceptual Process for Language Diversity
Xingfeng Li, Masato Akagi

Design and Development of a Multi-Lingual Speech Corpora (TaMaR-EmoDB) for Emotion Analysis
Rajeev Rajan, Haritha U.G., Sujitha A.C., Rejisha T.M.

Speech Emotion Recognition with a Reject Option
Kusha Sridhar, Carlos Busso

Development of Emotion Rankers Based on Intended and Perceived Emotion Labels
Zhenghao Jin, Houwei Cao

Emotion Recognition from Natural Phone Conversations in Individuals with and without Recent Suicidal Ideation
John Gideon, Heather T. Schatten, Melvin G. McInnis, Emily Mower Provost

An Acoustic and Lexical Analysis of Emotional Valence in Spontaneous Speech: Autobiographical Memory Recall in Older Adults
Deniece S. Nazareth, Ellen Tournier, Sarah Leimkötter, Esther Janse, Dirk Heylen, Gerben J. Westerhof, Khiet P. Truong

Does the Lombard Effect Improve Emotional Communication in Noise? — Analysis of Emotional Speech Acted in Noise
Yi Zhao, Atsushi Ando, Shinji Takaki, Junichi Yamagishi, Satoshi Kobashikawa

Linear Discriminant Differential Evolution for Feature Selection in Emotional Speech Recognition
Soumaya Gharsellaoui, Sid Ahmed Selouani, Mohammed Sidi Yakoub

Multi-Modal Learning for Speech Emotion Recognition: An Analysis and Comparison of ASR Outputs with Ground Truth Transcription
Saurabh Sahu, Vikramjit Mitra, Nadee Seneviratne, Carol Espy-Wilson



Speech and Audio Classification 2


Residual + Capsule Networks (ResCap) for Simultaneous Single-Channel Overlapped Keyword Recognition
Yan Xiong, Visar Berisha, Chaitali Chakrabarti

A Study for Improving Device-Directed Speech Detection Toward Frictionless Human-Machine Interaction
Che-Wei Huang, Roland Maas, Sri Harish Mallidi, Björn Hoffmeister

Unsupervised Methods for Audio Classification from Lecture Discussion Recordings
Hang Su, Borislav Dzodzo, Xixin Wu, Xunying Liu, Helen Meng

Neural Whispered Speech Detection with Imbalanced Learning
Takanori Ashihara, Yusuke Shinohara, Hiroshi Sato, Takafumi Moriya, Kiyoaki Matsui, Takaaki Fukutomi, Yoshikazu Yamaguchi, Yushi Aono

Deep Learning for Orca Call Type Identification — A Fully Unsupervised Approach
Christian Bergler, Manuel Schmitt, Rachael Xi Cheng, Andreas Maier, Volker Barth, Elmar Nöth

Open-Vocabulary Keyword Spotting with Audio and Text Embeddings
Niccolò Sacchi, Alexandre Nanchen, Martin Jaggi, Milos Cernak

ToneNet: A CNN Model of Tone Classification of Mandarin Chinese
Qiang Gao, Shutao Sun, Yaping Yang

Temporal Convolution for Real-Time Keyword Spotting on Mobile Devices
Seungwoo Choi, Seokjun Seo, Beomjun Shin, Hyeongmin Byun, Martin Kersner, Beomsu Kim, Dongyoung Kim, Sungjoo Ha

Audio Tagging with Compact Feedforward Sequential Memory Network and Audio-to-Audio Ratio Based Data Augmentation
Zhiying Huang, Shiliang Zhang, Ming Lei

Music Genre Classification Using Duplicated Convolutional Layers in Neural Networks
Hansi Yang, Wei-Qiang Zhang

A Storyteller’s Tale: Literature Audiobooks Genre Classification Using CNN and RNN Architectures
Nehory Carmi, Azaria Cohen, Mireille Avigal, Anat Lerner




Lexicon and Language Model for Speech Recognition


Reverse Transfer Learning: Can Word Embeddings Trained for Different NLP Tasks Improve Neural Language Models?
Lyan Verwimp, Jerome R. Bellegarda

Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR
Zhehuai Chen, Mahaveer Jain, Yongqiang Wang, Michael L. Seltzer, Christian Fuegen

Character-Aware Sub-Word Level Language Modeling for Uyghur and Turkish ASR
Chang Liu, Zhen Zhang, Pengyuan Zhang, Yonghong Yan

Connecting and Comparing Language Model Interpolation Techniques
Ernest Pusateri, Christophe Van Gysel, Rami Botros, Sameer Badaskar, Mirko Hannemann, Youssef Oualil, Ilya Oparin

Enriching Rare Word Representations in Neural Language Models by Embedding Matrix Augmentation
Yerbolat Khassanov, Zhiping Zeng, Van Tung Pham, Haihua Xu, Eng Siong Chng

Comparative Study of Parametric and Representation Uncertainty Modeling for Recurrent Neural Network Language Models
Jianwei Yu, Max W.Y. Lam, Shoukang Hu, Xixin Wu, Xu Li, Yuewen Cao, Xunying Liu, Helen Meng

Improving Automatically Induced Lexicons for Highly Agglutinating Languages Using Data-Driven Morphological Segmentation
Wiehan Agenbag, Thomas Niesler

Attention-Based Word Vector Prediction with LSTMs and its Application to the OOV Problem in ASR
Alejandro Coucheiro-Limeres, Fernando Fernández-Martínez, Rubén San-Segundo, Javier Ferreiros-López

Code-Switching Sentence Generation by Bert and Generative Adversarial Networks
Yingying Gao, Junlan Feng, Ying Liu, Leijing Hou, Xin Pan, Yong Ma

Unified Verbalization for Speech Recognition & Synthesis Across Languages
Sandy Ritchie, Richard Sproat, Kyle Gorman, Daan van Esch, Christian Schallhart, Nikos Bampounis, Benoît Brard, Jonas Fromseier Mortensen, Millie Holt, Eoin Mahon

Better Morphology Prediction for Better Speech Systems
Dravyansh Sharma, Melissa Wilson, Antoine Bruguier


First and Second Language Acquisition


Vietnamese Learners Tackling the German /ʃt/ in Perception
Anke Sennema, Silke Hamann

An Articulatory-Acoustic Investigation into GOOSE-Fronting in German-English Bilinguals Residing in London, UK
Scott Lewis, Adib Mehrabi, Esther de Leeuw

Multimodal Articulation-Based Pronunciation Error Detection with Spectrogram and Acoustic Features
Sabrina Jenne, Ngoc Thang Vu

Using Prosody to Discover Word Order Alternations in a Novel Language
Anouschka Foltz, Sarah Cooper, Tamsin M. McKelvey

Speaking Rate, Information Density, and Information Rate in First-Language and Second-Language Speech
Ann R. Bradlow

Articulation Rate as a Metric in Spoken Language Assessment
Calbert Graham, Francis Nolan

Learning Alignment for Multimodal Emotion Recognition from Speech
Haiyang Xu, Hui Zhang, Kun Han, Yun Wang, Yiping Peng, Xiangang Li

Liquid Deletion in French Child-Directed Speech
Sharon Peperkamp, Monica Hegde, Maria Julia Carbajal

Towards Detection of Canonical Babbling by Citizen Scientists: Performance as a Function of Clip Length
Amanda Seidl, Anne S. Warlaumont, Alejandrina Cristia

Nasal Consonant Discrimination in Infant- and Adult-Directed Speech
Bogdan Ludusan, Annett Jorschick, Reiko Mazuka

No Distributional Learning in Adults from Attended Listening to Non-Speech
Ellen Marklund, Johan Sjons, Lisa Gustavsson, Elísabet Eir Cortes

A Computational Model of Early Language Acquisition from Audiovisual Experiences of Young Infants
Okko Räsänen, Khazar Khorrami

The Production of Chinese Affricates /ts/ and /tsʰ/ by Native Urdu Speakers
Dan Du, Jinsong Zhang


Speech and Audio Classification 3


Multi-Stream Network with Temporal Attention for Environmental Sound Classification
Xinyu Li, Venkata Chebiyyam, Katrin Kirchhoff

Neural Network Distillation on IoT Platforms for Sound Event Detection
Gianmarco Cerutti, Rahul Prasad, Alessio Brutti, Elisabetta Farella

Class-Wise Centroid Distance Metric Learning for Acoustic Event Detection
Xugang Lu, Peng Shen, Sheng Li, Yu Tsao, Hisashi Kawai

A Hybrid Approach to Acoustic Scene Classification Based on Universal Acoustic Models
Xue Bai, Jun Du, Zi-Rui Wang, Chin-Hui Lee

Hierarchical Pooling Structure for Weakly Labeled Sound Event Detection
Ke-Xin He, Yu-Han Shen, Wei-Qiang Zhang

Sound Event Detection in Multichannel Audio Using Convolutional Time-Frequency-Channel Squeeze and Excitation
Wei Xia, Kazuhito Koishida

A Robust Framework for Acoustic Scene Classification
Lam Pham, Ian McLoughlin, Huy Phan, Ramaswamy Palaniappan

Compression of Acoustic Event Detection Models with Quantized Distillation
Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang

An End-to-End Audio Classification System Based on Raw Waveforms and Mix-Training Strategy
Jiaxu Chen, Jing Hao, Kai Chen, Di Xie, Shicai Yang, Shiliang Pu

Few-Shot Audio Classification with Attentional Graph Neural Networks
Shilei Zhang, Yong Qin, Kewei Sun, Yonghua Lin

Semi-Supervised Audio Classification with Consistency-Based Regularization
Kangkang Lu, Chuan-Sheng Foo, Kah Kuan Teh, Huy Dat Tran, Vijay Ramaseshan Chandrasekhar


Speaker and Language Recognition 2


Adversarial Regularization for End-to-End Robust Speaker Verification
Qing Wang, Pengcheng Guo, Sining Sun, Lei Xie, John H.L. Hansen

Combining Speaker Recognition and Metric Learning for Speaker-Dependent Representation Learning
João Monteiro, Jahangir Alam, Tiago H. Falk

VAE-Based Regularization for Deep Speaker Embedding
Yang Zhang, Lantian Li, Dong Wang

Language Recognition Using Triplet Neural Networks
Victoria Mingote, Diego Castan, Mitchell McLaren, Mahesh Kumar Nandwana, Alfonso Ortega, Eduardo Lleida, Antonio Miguel

Spatial Pyramid Encoding with Convex Length Normalization for Text-Independent Speaker Verification
Youngmoon Jung, Younggwan Kim, Hyungjun Lim, Yeunju Choi, Hoirin Kim

End-to-End Losses Based on Speaker Basis Vectors and All-Speaker Hard Negative Mining for Speaker Verification
Hee-Soo Heo, Jee-weon Jung, IL-Ho Yang, Sung-Hyun Yoon, Hye-jin Shim, Ha-Jin Yu

An Effective Deep Embedding Learning Architecture for Speaker Verification
Yiheng Jiang, Yan Song, Ian McLoughlin, Zhifu Gao, Li-Rong Dai

Far-Field End-to-End Text-Dependent Speaker Verification Based on Mixed Training Data with Transfer Learning and Enrollment Data Augmentation
Xiaoyi Qin, Danwei Cai, Ming Li

Two-Stage Training for Chinese Dialect Recognition
Zongze Ren, Guofu Yang, Shugong Xu

Investigation on Blind Bandwidth Extension with a Non-Linear Function and its Evaluation of x-Vector-Based Speaker Verification
Ryota Kaminishi, Haruna Miyamoto, Sayaka Shiota, Hitoshi Kiya

Auto-Encoding Nearest Neighbor i-Vectors for Speaker Verification
Umair Khan, Miquel India, Javier Hernando

Towards a Fault-Tolerant Speaker Verification System: A Regularization Approach to Reduce the Condition Number
Siqi Zheng, Gang Liu, Hongbin Suo, Yun Lei

Deep Learning Based Multi-Channel Speaker Recognition in Noisy and Reverberant Environments
Hassan Taherian, Zhong-Qiu Wang, DeLiang Wang

Joint Optimization of Neural Acoustic Beamforming and Dereverberation with x-Vectors for Robust Speaker Verification
Joon-Young Yang, Joon-Hyuk Chang

A New Time-Frequency Attention Mechanism for TDNN and CNN-LSTM-TDNN, with Application to Language Identification
Xiaoxiao Miao, Ian McLoughlin, Yonghong Yan


Medical Applications and Visual ASR


An Attention-Based Hybrid Network for Automatic Detection of Alzheimer’s Disease from Narrative Speech
Jun Chen, Ji Zhu, Jieping Ye

Investigating the Lombard Effect Influence on End-to-End Audio-Visual Speech Recognition
Pingchuan Ma, Stavros Petridis, Maja Pantic

“Computer, Test My Hearing”: Accurate Speech Audiometry with Smart Speakers
Jasper Ooster, Pia Nancy Porysek Moreta, Jörg-Hendrik Bach, Inga Holube, Bernd T. Meyer

Synchronising Audio and Ultrasound by Learning Cross-Modal Embeddings
Aciel Eshky, Manuel Sam Ribeiro, Korin Richmond, Steve Renals

Automatic Hierarchical Attention Neural Network for Detecting AD
Yilin Pan, Bahman Mirheidari, Markus Reuber, Annalena Venneri, Daniel Blackburn, Heidi Christensen

Deep Sensing of Breathing Signal During Conversational Speech
Venkata Srikanth Nallanthighal, Aki Härmä, Helmer Strik

Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation
Fadi Biadsy, Ron J. Weiss, Pedro J. Moreno, Dimitri Kanevsky, Ye Jia

Exploiting Visual Features Using Bayesian Gated Neural Networks for Disordered Speech Recognition
Shansong Liu, Shoukang Hu, Yi Wang, Jianwei Yu, Rongfeng Su, Xunying Liu, Helen Meng

Video-Driven Speech Reconstruction Using Generative Adversarial Networks
Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic

On the Use of Pitch Features for Disordered Speech Recognition
Shansong Liu, Shoukang Hu, Xunying Liu, Helen Meng

Large-Scale Visual Speech Recognition
Brendan Shillingford, Yannis Assael, Matthew W. Hoffman, Thomas Paine, Cían Hughes, Utsav Prabhu, Hank Liao, Hasim Sak, Kanishka Rao, Lorrayne Bennett, Marie Mulville, Misha Denil, Ben Coppin, Ben Laurie, Andrew Senior, Nando de Freitas


Speech Enhancement: Multi-Channel and Intelligibility


On Mitigating Acoustic Feedback in Hearing Aids with Frequency Warping by All-Pass Networks
Ching-Hua Lee, Kuan-Lin Chen, Fred Harris, Bhaskar D. Rao, Harinath Garudadri

Deep Multitask Acoustic Echo Cancellation
Amin Fazel, Mostafa El-Khamy, Jungwon Lee

Deep Learning for Joint Acoustic Echo and Noise Cancellation with Nonlinear Distortions
Hao Zhang, Ke Tan, DeLiang Wang

Harmonic Beamformers for Non-Intrusive Speech Intelligibility Prediction
Charlotte Sørensen, Jesper B. Boldt, Mads G. Christensen

Convolutional Neural Network-Based Speech Enhancement for Cochlear Implant Recipients
Nursadul Mamun, Soheil Khorram, John H.L. Hansen

Validation of the Non-Intrusive Codebook-Based Short Time Objective Intelligibility Metric for Processed Speech
Charlotte Sørensen, Jesper B. Boldt, Mads G. Christensen

Predicting Speech Intelligibility of Enhanced Speech Using Phone Accuracy of DNN-Based ASR System
Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Katsuhiko Yamamoto, Toshio Irino

A Novel Method to Correct Steering Vectors in MVDR Beamformer for Noise Robust ASR
Suliang Bu, Yunxin Zhao, Mei-Yuh Hwang

End-to-End Multi-Channel Speech Enhancement Using Inter-Channel Time-Restricted Attention on Raw Waveform
Hyeonseung Lee, Hyung Yong Kim, Woo Hyun Kang, Jeunghun Kim, Nam Soo Kim

Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information
Rongzhi Gu, Lianwu Chen, Shi-Xiong Zhang, Jimeng Zheng, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu

My Lips Are Concealed: Audio-Visual Speech Enhancement Through Obstructions
Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman


Speaker Recognition 3


End-to-End Neural Speaker Diarization with Permutation-Free Objectives
Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe

Self Multi-Head Attention for Speaker Recognition
Miquel India, Pooyan Safari, Javier Hernando

Phonetically-Aware Embeddings, Wide Residual Networks with Time-Delay Neural Networks and Self Attention Models for the 2018 NIST Speaker Recognition Evaluation
Ignacio Viñals, Dayana Ribas, Victoria Mingote, Jorge Llombart, Pablo Gimeno, Antonio Miguel, Alfonso Ortega, Eduardo Lleida

Variational Domain Adversarial Learning for Speaker Verification
Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien

A Unified Framework for Speaker and Utterance Verification
Tianchi Liu, Maulik Madhavi, Rohan Kumar Das, Haizhou Li

Analysis of Critical Metadata Factors for the Calibration of Speaker Recognition Systems
Mahesh Kumar Nandwana, Luciana Ferrer, Mitchell McLaren, Diego Castan, Aaron Lawson

Factorization of Discriminatively Trained i-Vector Extractor for Speaker Recognition
Ondřej Novotný, Oldřich Plchot, Ondřej Glembek, Lukáš Burget

End-to-End Speaker Identification in Noisy and Reverberant Environments Using Raw Waveform Convolutional Neural Networks
Daniele Salvati, Carlo Drioli, Gian Luca Foresti

Whisper to Neutral Mapping Using Cosine Similarity Maximization in i-Vector Space for Speaker Verification
Abinay Reddy Naini, Achuth Rao M.V., Prasanta Kumar Ghosh

Mixup Learning Strategies for Text-Independent Speaker Verification
Yingke Zhu, Tom Ko, Brian Mak

Optimizing a Speaker Embedding Extractor Through Backend-Driven Regularization
Luciana Ferrer, Mitchell McLaren

The NEC-TT 2018 Speaker Verification System
Kong Aik Lee, Hitoshi Yamamoto, Koji Okabe, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda

Autoencoder-Based Semi-Supervised Curriculum Learning for Out-of-Domain Speaker Verification
Siqi Zheng, Gang Liu, Hongbin Suo, Yun Lei

Multi-Channel Training for End-to-End Speaker Recognition Under Reverberant and Noisy Environment
Danwei Cai, Xiaoyi Qin, Ming Li

The DKU-SMIIP System for NIST 2018 Speaker Recognition Evaluation
Danwei Cai, Weicheng Cai, Ming Li


NN Architectures for ASR


Pretraining by Backtranslation for End-to-End ASR in Low-Resource Settings
Matthew Wiesner, Adithya Renduchintala, Shinji Watanabe, Chunxi Liu, Najim Dehak, Sanjeev Khudanpur

Cross-Attention End-to-End ASR for Two-Party Conversations
Suyoun Kim, Siddharth Dalmia, Florian Metze

Towards Using Context-Dependent Symbols in CTC Without State-Tying Decision Trees
Jan Chorowski, Adrian Łańcucki, Bartosz Kostka, Michał Zapotoczny

An Online Attention-Based Model for Speech Recognition
Ruchao Fan, Pan Zhou, Wei Chen, Jia Jia, Gang Liu

Self-Attention Transducers for End-to-End Speech Recognition
Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Ye Bai, Zhengqi Wen

Improving Transformer-Based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation
Sheng Li, Dabre Raj, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai

Extending an Acoustic Data-Driven Phone Set for Spontaneous Speech Recognition
Jeong-Uk Bang, Mu-Yeol Choi, Sang-Hun Kim, Oh-Wook Kwon

Joint Maximization Decoder with Neural Converters for Fully Neural Network-Based Japanese Speech Recognition
Takafumi Moriya, Jian Wang, Tomohiro Tanaka, Ryo Masumura, Yusuke Shinohara, Yoshikazu Yamaguchi, Yushi Aono

Real to H-Space Encoder for Speech Recognition
Titouan Parcollet, Mohamed Morchid, Georges Linarès, Renato De Mori

ECTC-DOCD: An End-to-End Structure with CTC Encoder and OCD Decoder for Speech Recognition
Cheng Yi, Feng Wang, Bo Xu

End-to-End Multi-Speaker Speech Recognition Using Speaker Embeddings and Transfer Learning
Pavel Denisov, Ngoc Thang Vu


Speech Synthesis: Text Processing, Prosody, and Emotion


Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis
Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Shubham Toshniwal, Karen Livescu

Spontaneous Conversational Speech Synthesis from Found Data
Éva Székely, Gustav Eje Henter, Jonas Beskow, Joakim Gustafson

Fine-Grained Robust Prosody Transfer for Single-Speaker Neural Text-To-Speech
Viacheslav Klimkov, Srikanth Ronanki, Jonas Rohnke, Thomas Drugman

Speech Driven Backchannel Generation Using Deep Q-Network for Enhancing Engagement in Human-Robot Interaction
Nusrah Hussain, Engin Erzin, T. Metin Sezgin, Yücel Yemez

Semi-Supervised Prosody Modeling Using Deep Gaussian Process Latent Variable Model
Tomoki Koriyama, Takao Kobayashi

Bootstrapping a Text Normalization System for an Inflected Language. Numbers as a Test Case
Anna Björk Nikulásdóttir, Jón Guðnason

Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS
Haohan Guo, Frank K. Soong, Lei He, Lei Xie

Duration Modeling with Global Phoneme-Duration Vectors
Jinfu Ni, Yoshinori Shiga, Hisashi Kawai

Improving Speech Synthesis with Discourse Relations
Adèle Aubin, Alessandra Cervone, Oliver Watts, Simon King

Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis Through Audio Analysis
Noé Tits, Fengna Wang, Kevin El Haddad, Vincent Pagel, Thierry Dutoit

Pre-Trained Text Representations for Improving Front-End Text Processing in Mandarin Text-to-Speech Synthesis
Bing Yang, Jiaqi Zhong, Shan Liu

A Mandarin Prosodic Boundary Prediction Model Based on Multi-Task Learning
Huashan Pan, Xiulin Li, Zhiqiang Huang

Dual Encoder Classifier Models as Constraints in Neural Text Normalization
Ajda Gokcen, Hao Zhang, Richard Sproat

Knowledge-Based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis
Jingbei Li, Zhiyong Wu, Runnan Li, Pengpeng Zhi, Song Yang, Helen Meng

Automated Emotion Morphing in Speech Based on Diffeomorphic Curve Registration and Highway Networks
Ravi Shankar, Hsi-Wei Hsieh, Nicolas Charon, Archana Venkataraman


Speech and Voice Disorders 2


Use of Beiwe Smartphone App to Identify and Track Speech Decline in Amyotrophic Lateral Sclerosis (ALS)
Kathryn P. Connaghan, Jordan R. Green, Sabrina Paganoni, James Chan, Harli Weber, Ella Collins, Brian Richburg, Marziye Eshghi, J.P. Onnela, James D. Berry

Profiling Speech Motor Impairments in Persons with Amyotrophic Lateral Sclerosis: An Acoustic-Based Approach
Hannah P. Rowe, Jordan R. Green

Diagnosing Dysarthria with Long Short-Term Memory Networks
Alex Mayle, Zhiwei Mou, Razvan Bunescu, Sadegh Mirshekarian, Li Xu, Chang Liu

Modification of Devoicing Error in Cleft Lip and Palate Speech
Protima Nomo Sudro, S.R. Mahadeva Prasanna

Reduced Task Adaptation in Alternating Motion Rate Tasks as an Early Marker of Bulbar Involvement in Amyotrophic Lateral Sclerosis
Marziye Eshghi, Panying Rong, Antje S. Mefferd, Kaila L. Stipancic, Yana Yunusova, Jordan R. Green

Towards the Speech Features of Early-Stage Dementia: Design and Application of the Mandarin Elderly Cognitive Speech Database
Tianqi Wang, Quanlei Yan, Jingshen Pan, Feiqi Zhu, Rongfeng Su, Yi Guo, Lan Wang, Nan Yan

Acoustic Characteristics of Lexical Tone Disruption in Mandarin Speakers After Brain Damage
Wenjun Chen, Jeroen van de Weijer, Shuangshuang Zhu, Qian Qian, Manna Wang

Intragestural Variation in Natural Sentence Production: Essential Tremor Patients Treated with DBS
Anne Hermes, Doris Mücke, Tabea Thies, Michael T. Barbe

Nasal Air Emission in Sibilant Fricatives of Cleft Lip and Palate Speech
Sishir Kalita, Protima Nomo Sudro, S.R. Mahadeva Prasanna, S. Dandapat

Parallel vs. Non-Parallel Voice Conversion for Esophageal Speech
Luis Serrano, Sneha Raman, David Tavarez, Eva Navas, Inma Hernaez

Hypernasality Severity Detection Using Constant Q Cepstral Coefficients
Akhilesh Kumar Dubey, S.R. Mahadeva Prasanna, S. Dandapat

Automatic Depression Level Detection via ℓp-Norm Pooling
Mingyue Niu, Jianhua Tao, Bin Liu, Cunhang Fan

Comparison of Speech Tasks and Recording Devices for Voice Based Automatic Classification of Healthy Subjects and Patients with Amyotrophic Lateral Sclerosis
Suhas B.N., Deep Patel, Nithin Rao, Yamini Belur, Pradeep Reddy, Nalini Atchayaram, Ravi Yadav, Dipanjan Gope, Prasanta Kumar Ghosh


Speech and Audio Source Separation and Scene Analysis 3


A Modified Algorithm for Multiple Input Spectrogram Inversion
Dongxiao Wang, Hirokazu Kameoka, Koichi Shinoda

A Comprehensive Study of Speech Separation: Spectrogram vs Waveform Separation
Fahimeh Bahmaninezhad, Jian Wu, Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu

Evaluating Audiovisual Source Separation in the Context of Video Conferencing
Berkay İnan, Milos Cernak, Helmut Grabner, Helena Peic Tukuljac, Rodrigo C.G. Pena, Benjamin Ricaud

Influence of Speaker-Specific Parameters on Speech Separation Systems
David Ditter, Timo Gerkmann

CNN-LSTM Models for Multi-Speaker Source Separation Using Bayesian Hyper Parameter Optimization
Jeroen Zegers, Hugo Van hamme

Towards Joint Sound Scene and Polyphonic Sound Event Recognition
Helen L. Bear, Inês Nolasco, Emmanouil Benetos

Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features
Cunhang Fan, Bin Liu, Jianhua Tao, Jiangyan Yi, Zhengqi Wen

Probabilistic Permutation Invariant Training for Speech Separation
Midia Yousefi, Soheil Khorram, John H.L. Hansen

Which Ones Are Speaking? Speaker-Inferred Model for Multi-Talker Speech Separation
Jing Shi, Jiaming Xu, Bo Xu

End-to-End Monaural Speech Separation with Multi-Scale Dynamic Weighted Gated Dilated Convolutional Pyramid Network
Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Shoji Hayakawa, Shouji Harada, Jiqing Han

End-to-End Music Source Separation: Is it Possible in the Waveform Domain?
Francesc Lluís, Jordi Pons, Xavier Serra



