doi: 10.21437/Interspeech.2017
ISSN: 2958-1796
The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection
Tomi Kinnunen, Md. Sahidullah, Héctor Delgado, Massimiliano Todisco, Nicholas Evans, Junichi Yamagishi, Kong Aik Lee
Experimental Analysis of Features for Replay Attack Detection — Results on the ASVspoof 2017 Challenge
Roberto Font, Juan M. Espín, María José Cano
Novel Variable Length Teager Energy Separation Based Instantaneous Frequency Features for Replay Detection
Hemant A. Patil, Madhu R. Kamble, Tanvina B. Patel, Meet H. Soni
Countermeasures for Automatic Speaker Verification Replay Spoofing Attack: On Data Augmentation, Feature Representation, Classification and Fusion
Weicheng Cai, Danwei Cai, Wenbo Liu, Gang Li, Ming Li
Spoof Detection Using Source, Instantaneous Frequency and Cepstral Features
Sarfaraz Jelil, Rohan Kumar Das, S.R. Mahadeva Prasanna, Rohit Sinha
Audio Replay Attack Detection Using High-Frequency Features
Marcin Witkowski, Stanisław Kacprzak, Piotr Żelasko, Konrad Kowalczyk, Jakub Gałka
Feature Selection Based on CQCCs for Automatic Speaker Verification Spoofing
Xianliang Wang, Yanhong Xiao, Xuan Zhu
Longitudinal Speaker Clustering and Verification Corpus with Code-Switching Frisian-Dutch Speech
Emre Yılmaz, Jelske Dijkstra, Hans Van de Velde, Frederik Kampstra, Jouke Algra, Henk van den Heuvel, David Van Leeuwen
Exploiting Untranscribed Broadcast Data for Improved Code-Switching Detection
Emre Yılmaz, Henk van den Heuvel, David Van Leeuwen
Jee haan, I’d like both, por favor: Elicitation of a Code-Switched Corpus of Hindi–English and Spanish–English Human–Machine Dialog
Vikram Ramanarayanan, David Suendermann-Oeft
On Building Mixed Lingual Speech Synthesis Systems
SaiKrishna Rallabandi, Alan W. Black
Speech Synthesis for Mixed-Language Navigation Instructions
Khyathi Raghavi Chandu, SaiKrishna Rallabandi, Sunayana Sitaram, Alan W. Black
Addressing Code-Switching in French/Algerian Arabic Speech
Djegdjiga Amazouz, Martine Adda-Decker, Lori Lamel
Metrics for Modeling Code-Switching Across Corpora
Gualberto Guzmán, Joseph Ricard, Jacqueline Serigos, Barbara E. Bullock, Almeida Jacqueline Toribio
Synthesising isiZulu-English Code-Switch Bigrams Using Word Embeddings
Ewald van der Westhuizen, Thomas Niesler
Crowdsourcing Universal Part-of-Speech Tags for Code-Switching
Victor Soto, Julia Hirschberg
Audio Replay Attack Detection with Deep Learning Frameworks
Galina Lavrentyeva, Sergey Novoselov, Egor Malykh, Alexander Kozlov, Oleg Kudashev, Vadim Shchemelinin
Ensemble Learning for Countermeasure of Audio Replay Spoofing Attack in ASVspoof2017
Zhe Ji, Zhi-Yi Li, Peng Li, Maobo An, Shengxiang Gao, Dan Wu, Faru Zhao
A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification
Lantian Li, Yixiang Chen, Dong Wang, Thomas Fang Zheng
Replay Attack Detection Using DNN for Channel Discrimination
Parav Nagarsheth, Elie Khoury, Kailash Patil, Matt Garland
ResNet and Model Fusion for Automatic Spoofing Detection
Zhuxin Chen, Zhifeng Xie, Weibin Zhang, Xiangmin Xu
SFF Anti-Spoofer: IIIT-H Submission for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2017
K.N.R.K. Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, Anil Kumar Vuppala
Improved Single System Conversational Telephone Speech Recognition with VGG Bottleneck Features
William Hartmann, Roger Hsiao, Tim Ng, Jeff Ma, Francis Keith, Man-Hung Siu
Student-Teacher Training with Diverse Decision Tree Ensembles
Jeremy H.M. Wong, Mark J.F. Gales
Embedding-Based Speaker Adaptive Training of Deep Neural Networks
Xiaodong Cui, Vaibhava Goel, George Saon
Improving Deliverable Speech-to-Text Systems with Multilingual Knowledge Transfer
Jeff Ma, Francis Keith, Tim Ng, Man-Hung Siu, Owen Kimball
English Conversational Telephone Speech Recognition by Humans and Machines
George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall
Comparing Human and Machine Errors in Conversational Speech Transcription
Andreas Stolcke, Jasha Droppo
Multimodal Markers of Persuasive Speech: Designing a Virtual Debate Coach
Volha Petukhova, Manoj Raju, Harry Bunt
Acoustic-Prosodic and Physiological Response to Stressful Interactions in Children with Autism Spectrum Disorder
Daniel Bone, Julia Mertens, Emily Zane, Sungbok Lee, Shrikanth S. Narayanan, Ruth Grossman
A Stepwise Analysis of Aggregated Crowdsourced Labels Describing Multimodal Emotional Behaviors
Alec Burmania, Carlos Busso
An Information Theoretic Analysis of the Temporal Synchrony Between Head Gestures and Prosodic Patterns in Spontaneous Speech
Gaurav Fotedar, Prasanta Kumar Ghosh
Multimodal Prediction of Affective Dimensions via Fusing Multiple Regression Techniques
D.-Y. Huang, Wan Ding, Mingyu Xu, Huaiping Ming, Minghui Dong, Xinguo Yu, Haizhou Li
Co-Production of Speech and Pointing Gestures in Clear and Perturbed Interactive Tasks: Multimodal Designation Strategies
Marion Dohen, Benjamin Roustan
Improving Speaker Verification for Reverberant Conditions with Deep Neural Network Dereverberation Processing
Peter Guzewich, Stephen A. Zahorian
Stepsize Control for Acoustic Feedback Cancellation Based on the Detection of Reverberant Signal Periods and the Estimated System Distance
Philipp Bulling, Klaus Linhard, Arthur Wolf, Gerhard Schmidt
A Delay-Flexible Stereo Acoustic Echo Cancellation for DFT-Based In-Car Communication (ICC) Systems
Jan Franzen, Tim Fingscheidt
Speech Enhancement Based on Harmonic Estimation Combined with MMSE to Improve Speech Intelligibility for Cochlear Implant Recipients
Dongmei Wang, John H.L. Hansen
Improving Speech Intelligibility in Binaural Hearing Aids by Estimating a Time-Frequency Mask with a Weighted Least Squares Classifier
David Ayllón, Roberto Gil-Pita, Manuel Rosa-Zurera
Simulations of High-Frequency Vocoder on Mandarin Speech Recognition for Acoustic Hearing Preserved Cochlear Implant
Tsung-Chen Wu, Tai-Shih Chi, Chia-Fone Lee
Phonetic Correlates of Pharyngeal and Pharyngealized Consonants in Saudi, Lebanese, and Jordanian Arabic: An rt-MRI Study
Zainab Hermes, Marissa Barlaz, Ryan Shosted, Zhi-Pei Liang, Brad Sutton
Glottal Opening and Strategies of Production of Fricatives
Benjamin Elie, Yves Laprie
Acoustics and Articulation of Medial versus Final Coronal Stop Gemination Contrasts in Moroccan Arabic
Mohamed Yassine Frej, Christopher Carignan, Catherine T. Best
How are Four-Level Length Distinctions Produced? Evidence from Moroccan Arabic
Giuseppina Turco, Karim Shoul, Rachid Ridouane
Vowels in the Barunga Variety of North Australian Kriol
Caroline Jones, Katherine Demuth, Weicong Li, Andre Almeida
Nature of Contrast and Coarticulation: Evidence from Mizo Tones and Assamese Vowel Harmony
Indranil Dutta, Irfan S., Pamir Gogoi, Priyankoo Sarmah
The Influence of Synthetic Voice on the Evaluation of a Virtual Character
João Paulo Cabral, Benjamin R. Cowan, Katja Zibrek, Rachel McDonnell
Articulatory Text-to-Speech Synthesis Using the Digital Waveguide Mesh Driven by a Deep Neural Network
Amelia J. Gully, Takenori Yoshimura, Damian T. Murphy, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
An HMM/DNN Comparison for Synchronized Text-to-Speech and Tongue Motion Synthesis
Sébastien Le Maguer, Ingmar Steiner, Alexander Hewer
VCV Synthesis Using Task Dynamics to Animate a Factor-Based Articulatory Model
Rachel Alexander, Tanner Sorensen, Asterios Toutios, Shrikanth S. Narayanan
Beyond the Listening Test: An Interactive Approach to TTS Evaluation
Joseph Mendelson, Matthew P. Aylett
Integrating Articulatory Information in Deep Learning-Based Text-to-Speech Synthesis
Beiming Cao, Myungjong Kim, Jan van Santen, Ted Mau, Jun Wang
Approaches for Neural-Network Language Model Adaptation
Min Ma, Michael Nirschl, Fadi Biadsy, Shankar Kumar
A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models
Youssef Oualil, Dietrich Klakow
Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition
X. Chen, A. Ragni, X. Liu, Mark J.F. Gales
Fast Neural Network Language Model Lookups at N-Gram Speeds
Yinghui Huang, Abhinav Sethy, Bhuvana Ramabhadran
Empirical Exploration of Novel Architectures and Objectives for Language Models
Gakuto Kurata, Abhinav Sethy, Bhuvana Ramabhadran, George Saon
Residual Memory Networks in Language Modeling: Improving the Reputation of Feed-Forward Networks
Karel Beneš, Murali Karthick Baskar, Lukáš Burget
Dominant Distortion Classification for Pre-Processing of Vowels in Remote Biomedical Voice Analysis
Amir Hossein Poorjam, Jesper Rindom Jensen, Max A. Little, Mads Græsbøll Christensen
Automatic Paraphasia Detection from Aphasic Speech: A Preliminary Study
Duc Le, Keli Licata, Emily Mower Provost
Evaluation of the Neurological State of People with Parkinson’s Disease Using i-Vectors
N. Garcia, Juan Rafael Orozco-Arroyave, L.F. D’Haro, Najim Dehak, Elmar Nöth
Objective Severity Assessment from Disordered Voice Using Estimated Glottal Airflow
Yu-Ren Chien, Michal Borský, Jón Guðnason
Earlier Identification of Children with Autism Spectrum Disorder: An Automatic Vocalisation-Based Approach
Florian B. Pokorny, Björn Schuller, Peter B. Marschik, Raymond Brueckner, Pär Nyström, Nicholas Cummins, Sven Bölte, Christa Einspieler, Terje Falck-Ytter
Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson’s Disease
J.C. Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth
Phone Classification Using a Non-Linear Manifold with Broad Phone Class Dependent DNNs
Linxue Bai, Peter Jančovič, Martin Russell, Philip Weber, Steve Houghton
An Investigation of Crowd Speech for Room Occupancy Estimation
Siyuan Chen, Julien Epps, Eliathamby Ambikairajah, Phu Ngoc Le
Time-Frequency Coherence for Periodic-Aperiodic Decomposition of Speech Signals
Karthika Vijayan, Jitendra Kumar Dhiman, Chandra Sekhar Seelamantula
Musical Speech: A New Methodology for Transcribing Speech Prosody
Alexsandro R. Meireles, Antônio R.M. Simões, Antonio Celso Ribeiro, Beatriz Raposo de Medeiros
Estimation of Place of Articulation of Fricatives from Spectral Characteristics for Speech Training
K.S. Nataraj, Prem C. Pandey, Hirak Dasgupta
Estimation of the Probability Distribution of Spectral Fine Structure in the Speech Source
Tom Bäckström
End-to-End Acoustic Feedback in Language Learning for Correcting Devoiced French Final-Fricatives
Sucheta Ghosh, Camille Fauth, Yves Laprie, Aghilas Sini
Dialect Perception by Older Children
Ewa Jacewicz, Robert A. Fox
Perception of Non-Contrastive Variations in American English by Japanese Learners: Flaps are Less Favored Than Stops
Kiyoko Yoneyama, Mafuyu Kitahara, Keiichi Tajima
L1 Perceptions of L2 Prosody: The Interplay Between Intonation, Rhythm, and Speech Rate and Their Contribution to Accentedness and Comprehensibility
Lieke van Maastricht, Tim Zee, Emiel Krahmer, Marc Swerts
Effects of Pitch Fall and L1 on Vowel Length Identification in L2 Japanese
Izumi Takiguchi
A Preliminary Study of Prosodic Disambiguation by Chinese EFL Learners
Yuanyuan Zhang, Hongwei Ding
Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home
Chanwoo Kim, Ananya Misra, Kean Chin, Thad Hughes, Arun Narayanan, Tara N. Sainath, Michiel Bacchiani
Neural Network-Based Spectrum Estimation for Online WPE Dereverberation
Keisuke Kinoshita, Marc Delcroix, Haeyong Kwon, Takuma Mori, Tomohiro Nakatani
Factorial Modeling for Effective Suppression of Directional Noise
Osamu Ichikawa, Takashi Fukuda, Gakuto Kurata, Steven J. Rennie
On Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones
Yan-Hui Tu, Jun Du, Lei Sun, Feng Ma, Chin-Hui Lee
Acoustic Modeling for Google Home
Bo Li, Tara N. Sainath, Arun Narayanan, Joe Caroselli, Michiel Bacchiani, Ananya Misra, Izhak Shafran, Haşim Sak, Golan Pundak, Kean Chin, Khe Chai Sim, Ron J. Weiss, Kevin W. Wilson, Ehsan Variani, Chanwoo Kim, Olivier Siohan, Mitchel Weintraub, Erik McDermott, Richard Rose, Matt Shannon
On Multi-Domain Training and Adaptation of End-to-End RNN Acoustic Models for Distant Speech Recognition
Seyedmahdad Mirsamadi, John H.L. Hansen
Low-Dimensional Representation of Spectral Envelope Without Deterioration for Full-Band Speech Analysis/Synthesis System
Masanori Morise, Genta Miyashita, Kenji Ozawa
Robust Source-Filter Separation of Speech Signal in the Phase Domain
Erfan Loweimi, Jon Barker, Oscar Saz Torralba, Thomas Hain
A Time-Warping Pitch Tracking Algorithm Considering Fast f0 Changes
Simon Stone, Peter Steiner, Peter Birkholz
A Modulation Property of Time-Frequency Derivatives of Filtered Phase and its Application to Aperiodicity and fo Estimation
Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda
Non-Local Estimation of Speech Signal for Vowel Onset Point Detection in Varied Environments
Avinash Kumar, S. Shahnawazuddin, Gayadhar Pradhan
Time-Domain Envelope Modulating the Noise Component of Excitation in a Continuous Residual-Based Vocoder for Statistical Parametric Speech Synthesis
Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh
Wavelet Speech Enhancement Based on Robust Principal Component Analysis
Chia-Lung Wu, Hsiang-Ping Hsu, Syu-Siang Wang, Jeih-Weih Hung, Ying-Hui Lai, Hsin-Min Wang, Yu Tsao
Vowel Onset Point Detection Using Sonority Information
Bidisha Sharma, S.R. Mahadeva Prasanna
Analytic Filter Bank for Speech Analysis, Feature Extraction and Perceptual Studies
Unto K. Laine
Learning the Mapping Function from Voltage Amplitudes to Sensor Positions in 3D-EMA Using Deep Neural Networks
Christian Kroos, Mark D. Plumbley
Multilingual i-Vector Based Statistical Modeling for Music Genre Classification
Jia Dai, Wei Xue, Wenju Liu
Indoor/Outdoor Audio Classification Using Foreground Speech Segmentation
Banriskhem K. Khonglah, K.T. Deepak, S.R. Mahadeva Prasanna
Attention Based CLDNNs for Short-Duration Acoustic Scene Classification
Jinxi Guo, Ning Xu, Li-Jia Li, Abeer Alwan
Frame-Wise Dynamic Threshold Based Polyphonic Acoustic Event Detection
Xianjun Xia, Roberto Togneri, Ferdous Sohel, David Huang
Enhanced Feature Extraction for Speech Detection in Media Audio
Inseon Jang, ChungHyun Ahn, Jeongil Seo, Younseon Jang
Audio Classification Using Class-Specific Learned Descriptors
Sukanya Sonowal, Tushar Sandhan, Inkyu Choi, Nam Soo Kim
Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery
Janek Ebbers, Jahn Heymann, Lukas Drude, Thomas Glarner, Reinhold Haeb-Umbach, Bhiksha Raj
Virtual Adversarial Training and Data Augmentation for Acoustic Event Detection with Gated Recurrent Neural Networks
Matthias Zöhrer, Franz Pernkopf
Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi
Michael McAuliffe, Michaela Socolof, Sarah Mihuc, Michael Wagner, Morgan Sonderegger
A Robust Voiced/Unvoiced Phoneme Classification from Whispered Speech Using the ‘Color’ of Whispered Phonemes and Deep Neural Network
G. Nisha Meenakshi, Prasanta Kumar Ghosh
Rescoring-Aware Beam Search for Reduced Search Errors in Contextual Automatic Speech Recognition
Ian Williams, Petar Aleksic
Comparison of Decoding Strategies for CTC Acoustic Models
Thomas Zenkel, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias Sperber, Sebastian Stüker, Alex Waibel
Phone Duration Modeling for LVCSR Using Neural Networks
Hossein Hadian, Daniel Povey, Hossein Sameti, Sanjeev Khudanpur
Towards Better Decoding and Language Model Integration in Sequence to Sequence Models
Jan Chorowski, Navdeep Jaitly
Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling
Wenpeng Li, Binbin Zhang, Lei Xie, Dong Yu
Binary Deep Neural Networks for Speech Recognition
Xu Xiang, Yanmin Qian, Kai Yu
Hierarchical Constrained Bayesian Optimization for Feature, Acoustic Model and Decoder Parameter Optimization
Akshay Chandrashekaran, Ian Lane
Use of Global and Acoustic Features Associated with Contextual Factors to Adapt Language Models for Spontaneous Speech Recognition
Shohei Toyama, Daisuke Saito, Nobuaki Minematsu
Joint Learning of Correlated Sequence Labeling Tasks Using Bidirectional Recurrent Neural Networks
Vardaan Pahuja, Anirban Laha, Shachar Mirkin, Vikas Raykar, Lili Kotlerman, Guy Lev
Estimation of Gap Between Current Language Models and Human Performance
Xiaoyu Shen, Youssef Oualil, Clayton Greenberg, Mittul Singh, Dietrich Klakow
A Phonological Phrase Sequence Modelling Approach for Resource Efficient and Robust Real-Time Punctuation Recovery
Anna Moró, György Szaszák
Factors Affecting the Intelligibility of Low-Pass Filtered Speech
Lei Wang, Fei Chen
Phonetic Restoration of Temporally Reversed Speech
Shi-yu Wang, Fei Chen
Simultaneous Articulatory and Acoustic Distortion in L1 and L2 Listening: Locally Time-Reversed “Fast” Speech
Mako Ishida
Lexically Guided Perceptual Learning in Mandarin Chinese
L. Ann Burchfield, San-hei Kenny Luk, Mark Antoniou, Anne Cutler
The Effect of Spectral Profile on the Intelligibility of Emotional Speech in Noise
Chris Davis, Chee Seng Chong, Jeesun Kim
Whether Long-Term Tracking of Speech Rate Affects Perception Depends on Who is Talking
Merel Maslowski, Antje S. Meyer, Hans Rutger Bosker
Emotional Thin-Slicing: A Proposal for a Short- and Long-Term Division of Emotional Speech
Daniel Oliveira Peres, Dominic Watt, Waldemar Ferreira Netto
Predicting Epenthetic Vowel Quality from Acoustics
Adriana Guevara-Rukoz, Erika Parlato-Oliveira, Shi Yu, Yuki Hirose, Sharon Peperkamp, Emmanuel Dupoux
The Effect of Spectral Tilt on Size Discrimination of Voiced Speech Sounds
Toshie Matsui, Toshio Irino, Kodai Yamamoto, Hideki Kawahara, Roy D. Patterson
Misperceptions of the Emotional Content of Natural and Vocoded Speech in a Car
Jaime Lorenzo-Trueba, Cassia Valentini Botinhao, Gustav Eje Henter, Junichi Yamagishi
The Relative Cueing Power of F0 and Duration in German Prominence Perception
Oliver Niebuhr, Jana Winkler
Perception and Acoustics of Vowel Nasality in Brazilian Portuguese
Luciana Marques, Rebecca Scarborough
Sociophonetic Realizations Guide Subsequent Lexical Access
Jonny Kim, Katie Drager
Critical Articulators Identification from RT-MRI of the Vocal Tract
Samuel Silva, António Teixeira
Semantic Edge Detection for Tracking Vocal Tract Air-Tissue Boundaries in Real-Time Magnetic Resonance Images
Krishna Somandepalli, Asterios Toutios, Shrikanth S. Narayanan
Vocal Tract Airway Tissue Boundary Tracking for rtMRI Using Shape and Appearance Priors
Sasan Asadiabadi, Engin Erzin
An Objective Critical Distance Measure Based on the Relative Level of Spectral Valley
T.V. Ananthapadmanabha, A.G. Ramakrishnan, Shubham Sharma
Database of Volumetric and Real-Time Vocal Tract MRI for Speech Science
Tanner Sorensen, Zisis Skordilis, Asterios Toutios, Yoon-Chul Kim, Yinghua Zhu, Jangwon Kim, Adam Lammert, Vikram Ramanarayanan, Louis Goldstein, Dani Byrd, Krishna Nayak, Shrikanth S. Narayanan
The Influence on Realization and Perception of Lexical Tones from Affricate’s Aspiration
Chong Cao, Yanlu Xie, Qi Zhang, Jinsong Zhang
Audiovisual Recalibration of Vowel Categories
Matthias K. Franken, Frank Eisner, Jan-Mathijs Schoffelen, Daniel J. Acheson, Peter Hagoort, James M. McQueen
The Effect of Gesture on Persuasive Speech
Judith Peters, Marieke Hoetjes
Auditory-Visual Integration of Talker Gender in Cantonese Tone Perception
Wei Lai
Event-Related Potentials Associated with Somatosensory Effect in Audio-Visual Speech Perception
Takayuki Ito, Hiroki Ohashi, Eva Montas, Vincent L. Gracco
When a Dog is a Cat and How it Changes Your Pupil Size: Pupil Dilation in Response to Information Mismatch
Lena F. Renner, Marcin Włodarczak
Cross-Modal Analysis Between Phonation Differences and Texture Images Based on Sentiment Correlations
Win Thuzar Kyaw, Yoshinori Sagisaka
Wireless Neck-Surface Accelerometer and Microphone on Flex Circuit with Application to Noise-Robust Monitoring of Lombard Speech
Daryush D. Mehta, Patrick C. Chwalek, Thomas F. Quatieri, Laura J. Brattain
Video-Based Tracking of Jaw Movements During Speech: Preliminary Results and Future Directions
Andrea Bandini, Aravind Namasivayam, Yana Yunusova
Accurate Synchronization of Speech and EGG Signal Using Phase Information
Sunil Kumar S.B., K. Sreenivasa Rao, Tanumay Mandal
The Acquisition of Focal Lengthening in Stockholm Swedish
Anna Sara H. Romøren, Aoju Chen
Multilingual Recurrent Neural Networks with Residual Learning for Low-Resource Speech Recognition
Shiyu Zhou, Yuanyuan Zhao, Shuang Xu, Bo Xu
CTC Training of Multi-Phone Acoustic Models for Speech Recognition
Olivier Siohan
An Investigation of Deep Neural Networks for Multilingual Speech Recognition Training and Adaptation
Sibo Tong, Philip N. Garner, Hervé Bourlard
2016 BUT Babel System: Multilingual BLSTM Acoustic Model with i-Vector Based Adaptation
Martin Karafiát, Murali Karthick Baskar, Pavel Matějka, Karel Veselý, František Grézl, Lukáš Burget, Jan Černocký
Optimizing DNN Adaptation for Recognition of Enhanced Speech
Marco Matassoni, Alessio Brutti, Daniele Falavigna
Deep Least Squares Regression for Speaker Adaptation
Younggwan Kim, Hyungjun Lim, Jahyun Goo, Hoirin Kim
Multi-Task Learning Using Mismatched Transcription for Under-Resourced Speech Recognition
Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark Hasegawa-Johnson
Generalized Distillation Framework for Speaker Normalization
Neethu Mariam Joy, Sandeep Reddy Kothinti, S. Umesh, Basil Abraham
Learning Factorized Transforms for Unsupervised Adaptation of LSTM-RNN Acoustic Models
Lahiru Samarakoon, Brian Mak, Khe Chai Sim
Factorised Representations for Neural Network Adaptation to Diverse Acoustic Environments
Joachim Fainberg, Steve Renals, Peter Bell
An RNN Model of Text Normalization
Richard Sproat, Navdeep Jaitly
Weakly-Supervised Phrase Assignment from Text in a Speech-Synthesis System Using Noisy Labels
Asaf Rendel, Raul Fernandez, Zvi Kons, Andrew Rosenberg, Ron Hoory, Bhuvana Ramabhadran
Prosody Aware Word-Level Encoder Based on BLSTM-RNNs for DNN-Based Speech Synthesis
Yusuke Ijima, Nobukatsu Hojo, Ryo Masumura, Taichi Asami
Global Syllable Vectors for Building TTS Front-End with Deep Learning
Jinfu Ni, Yoshinori Shiga, Hisashi Kawai
Prosody Control of Utterance Sequence for Information Delivering
Ishin Fukuoka, Kazuhiko Iwata, Tetsunori Kobayashi
Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer
Yuchen Huang, Zhiyong Wu, Runnan Li, Helen Meng, Lianhong Cai
Investigating Efficient Feature Representation Methods and Training Objective for BLSTM-Based Phone Duration Prediction
Yibin Zheng, Jianhua Tao, Zhengqi Wen, Ya Li, Bin Liu
Discrete Duration Model for Speech Synthesis
Bo Chen, Tianling Bian, Kai Yu
Comparison of Modeling Target in LSTM-RNN Duration Model
Bo Chen, Jiahao Lai, Kai Yu
Learning Word Vector Representations Based on Acoustic Counts
M. Sam Ribeiro, Oliver Watts, Junichi Yamagishi
Synthesising Uncertainty: The Interplay of Vocal Effort and Hesitation Disfluencies
Éva Székely, Joseph Mendelson, Joakim Gustafson
Prosograph: A Tool for Prosody Visualisation of Large Speech Corpora
Alp Öktem, Mireia Farrús, Leo Wanner
ChunkitApp: Investigating the Relevant Units of Online Speech Processing
Svetlana Vetchinnikova, Anna Mauranen, Nina Mikušová
Extending the EMU Speech Database Management System: Cloud Hosting, Team Collaboration, Automatic Revision Control
Markus Jochim
HomeBank: A Repository for Long-Form Real-World Audio Recordings of Children
Anne S. Warlaumont, Mark VanDam, Elika Bergelson, Alejandrina Cristia
A System for Real Time Collaborative Transcription Correction
Peter Bell, Joachim Fainberg, Catherine Lai, Mark Sinclair
MoPAReST — Mobile Phone Assisted Remote Speech Therapy Platform
Chitralekha Bhat, Anjali Kant, Bhavik Vachhani, Sarita Rautara, Ashok Kumar Sinha, Sunil Kumar Kopparapu
An Apparatus to Investigate Western Opera Singing Skill Learning Using Performance and Result Biofeedback, and Measuring its Neural Correlates
Aurore Jaumard-Hakoun, Samy Chikhi, Takfarinas Medani, Angelika Nair, Gérard Dreyfus, François-Benoît Vialatte
PercyConfigurator — Perception Experiments as a Service
Christoph Draxler
System for Speech Transcription and Post-Editing in Microsoft Word
Askars Salimbajevs, Indra Ikauniece
Emojive! Collecting Emotion Data from Speech and Facial Expression Using Mobile Game App
Ji Ho Park, Nayeon Lee, Dario Bertero, Anik Dey, Pascale Fung
Mylly — The Mill: A New Platform for Processing Speech and Text Corpora Easily and Efficiently
Mietta Lennes, Jussi Piitulainen, Martin Matthiesen
Visual Learning 2: Pronunciation App Using Ultrasound, Video, and MRI
Kyori Suzuki, Ian Wilson, Hayato Watanabe
Elicitation Design for Acoustic Depression Classification: An Investigation of Articulation Effort, Linguistic Complexity, and Word Affect
Brian Stasak, Julien Epps, Roland Goecke
Robustness Over Time-Varying Channels in DNN-HMM ASR Based Human-Robot Interaction
José Novoa, Jorge Wuth, Juan Pablo Escudero, Josué Fredes, Rodrigo Mahu, Richard M. Stern, Nestor Becerra Yoma
Analysis of Engagement and User Experience with a Laughter Responsive Social Robot
Bekir Berker Türker, Zana Buçinca, Engin Erzin, Yücel Yemez, Metin Sezgin
Automatic Classification of Autistic Child Vocalisations: A Novel Database and Results
Alice Baird, Shahin Amiriparian, Nicholas Cummins, Alyssa M. Alcorn, Anton Batliner, Sergey Pugachevskiy, Michael Freitag, Maurice Gerczuk, Björn Schuller
Crowd-Sourced Design of Artificial Attentive Listeners
Catharine Oertel, Patrik Jonell, Dimosthenis Kontogiorgos, Joseph Mendelson, Jonas Beskow, Joakim Gustafson
Studying the Link Between Inter-Speaker Coordination and Speech Imitation Through Human-Machine Interactions
Leonardo Lancia, Thierry Chaminade, Noël Nguyen, Laurent Prévot
Adjusting the Frame: Biphasic Performative Control of Speech Rhythm
Samuel Delalez, Christophe d’Alessandro
Attentional Factors in Listeners’ Uptake of Gesture Cues During Speech Processing
Raheleh Saryazdi, Craig G. Chambers
Motion Analysis in Vocalized Surprise Expressions
Carlos Ishi, Takashi Minato, Hiroshi Ishiguro
Enhancing Backchannel Prediction Using Word Embeddings
Robin Ruede, Markus Müller, Sebastian Stüker, Alex Waibel
A Computational Model for Phonetically Responsive Spoken Dialogue Systems
Eran Raveh, Ingmar Steiner, Bernd Möbius
Incremental Dialogue Act Recognition: Token- vs Chunk-Based Classification
Eustace Ebhotemhen, Volha Petukhova, Dietrich Klakow
Clear Speech — Mere Speech? How Segmental and Prosodic Speech Reduction Shape the Impression That Speakers Create on Listeners
Oliver Niebuhr
Relationships Between Speech Timing and Perceived Hostility in a French Corpus of Political Debates
Charlotte Kouklia, Nicolas Audibert
Towards Speaker Characterization: Identifying and Predicting Dimensions of Person Attribution
Laura Fernández Gallardo, Benjamin Weiss
Prosodic Analysis of Attention-Drawing Speech
Carlos Ishi, Jun Arai, Norihiro Hagita
Perceptual and Acoustic Correlates of Gender in the Prepubertal Voice
Adrian P. Simpson, Riccarda Funk, Frederik Palmer
To See or not to See: Interlocutor Visibility and Likeability Influence Convergence in Intonation
Katrin Schweitzer, Michael Walsh, Antje Schweitzer
Acoustic Correlates of Parental Role and Gender Identity in the Speech of Expecting Parents
Melanie Weirich, Adrian P. Simpson
A Semi-Supervised Learning Approach for Acoustic-Prosodic Personality Perception in Under-Resourced Domains
Rubén Solera-Ureña, Helena Moniz, Fernando Batista, Vera Cabarrão, Anna Pompili, Ramon Fernandez Astudillo, Joana Campos, Ana Paiva, Isabel Trancoso
Effects of Talker Dialect, Gender & Race on Accuracy of Bing Speech and YouTube Automatic Captions
Rachael Tatman, Conner Kasten
A Comparison of Sequence-to-Sequence Models for Speech Recognition
Rohit Prabhavalkar, Kanishka Rao, Tara N. Sainath, Bo Li, Leif Johnson, Navdeep Jaitly
CTC in the Context of Generalized Full-Sum HMM Training
Albert Zeyer, Eugen Beck, Ralf Schlüter, Hermann Ney
Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM
Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan
Multitask Learning with CTC and Segmental CRF for Speech Recognition
Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith
Direct Acoustics-to-Word Models for English Conversational Speech Recognition
Kartik Audhkhasi, Bhuvana Ramabhadran, George Saon, Michael Picheny, David Nahamoo
Reducing the Computational Complexity of Two-Dimensional LSTMs
Bo Li, Tara N. Sainath
Functional Principal Component Analysis of Vocal Tract Area Functions
Jorge C. Lucero
Analysis of Acoustic-to-Articulatory Speech Inversion Across Different Accents and Languages
Ganesh Sivaraman, Carol Espy-Wilson, Martijn Wieling
Integrated Mechanical Model of [r]-[l] and [b]-[m]-[w] Producing Consonant Cluster [br]
Takayuki Arai
A Speaker Adaptive DNN Training Approach for Speaker-Independent Acoustic Inversion
Leonardo Badino, Luca Franceschi, Raman Arora, Michele Donini, Massimiliano Pontil
Acoustic-to-Articulatory Mapping Based on Mixture of Probabilistic Canonical Correlation Analysis
Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu
Test-Retest Repeatability of Articulatory Strategies Using Real-Time Magnetic Resonance Imaging
Tanner Sorensen, Asterios Toutios, Johannes Töger, Louis Goldstein, Shrikanth S. Narayanan
Deep Neural Network Embeddings for Text-Independent Speaker Verification
David Snyder, Daniel Garcia-Romero, Daniel Povey, Sanjeev Khudanpur
Tied Variational Autoencoder Backends for i-Vector Speaker Recognition
Jesús Villalba, Niko Brümmer, Najim Dehak
Improved Gender Independent Speaker Recognition Using Convolutional Neural Network Based Bottleneck Features
Shivesh Ranjan, John H.L. Hansen
Autoencoder Based Domain Adaptation for Speaker Recognition Under Insufficient Channel Information
Suwon Shon, Seongkyu Mun, Wooil Kim, Hanseok Ko
Nonparametrically Trained Probabilistic Linear Discriminant Analysis for i-Vector Speaker Verification
Abbas Khosravani, Mohammad Mehdi Homayounpour
DNN Bottleneck Features for Speaker Clustering
Jesús Jorrín, Paola García, Luis Buera
Creak as a Feature of Lexical Stress in Estonian
Kätlin Aare, Pärtel Lippus, Juraj Šimko
Cross-Speaker Variation in Voice Source Correlates of Focus and Deaccentuation
Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl
Acoustic Characterization of Word-Final Glottal Stops in Mizo and Assam Sora
Sishir Kalita, Wendy Lalhminghlui, Luke Horo, Priyankoo Sarmah, S.R. Mahadeva Prasanna, Samarendra Dandapat
Iterative Optimal Preemphasis for Improved Glottal-Flow Estimation by Iterative Adaptive Inverse Filtering
Parham Mokhtari, Hiroshi Ando
Automatic Measurement of Pre-Aspiration
Yaniv Sheena, Míša Hejná, Yossi Adi, Joseph Keshet
Acoustic and Electroglottographic Study of Breathy and Modal Vowels as Produced by Heritage and Native Gujarati Speakers
Kiranpreet Nara
An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis
Xin Wang, Shinji Takaki, Junichi Yamagishi
Phrase Break Prediction for Long-Form Reading TTS: Exploiting Text Structure Information
Viacheslav Klimkov, Adam Nadolski, Alexis Moinet, Bartosz Putrycz, Roberto Barra-Chicote, Thomas Merritt, Thomas Drugman
Physically Constrained Statistical F0 Prediction for Electrolaryngeal Speech Enhancement
Kou Tanaka, Hirokazu Kameoka, Tomoki Toda, Satoshi Nakamura
DNN-SPACE: DNN-HMM-Based Generative Model of Voice F0 Contours for Statistical Phrase/Accent Command Estimation
Nobukatsu Hojo, Yasuhito Ohsugi, Yusuke Ijima, Hirokazu Kameoka
Controlling Prominence Realisation in Parametric DNN-Based Speech Synthesis
Zofia Malisz, Harald Berthelsen, Jonas Beskow, Joakim Gustafson
Increasing Recall of Lengthening Detection via Semi-Automatic Classification
Simon Betz, Jana Voße, Sina Zarrieß, Petra Wagner
Efficient Emotion Recognition from Speech Using Deep Learning on Spectrograms
Aharon Satt, Shai Rozenberg, Ron Hoory
Interaction and Transition Model for Speech Emotion Recognition in Dialogue
Ruo Zhang, Atsushi Ando, Satoshi Kobashikawa, Yushi Aono
Progressive Neural Networks for Transfer Learning in Emotion Recognition
John Gideon, Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Emily Mower Provost
Jointly Predicting Arousal, Valence and Dominance with Multi-Task Learning
Srinivas Parthasarathy, Carlos Busso
Discretized Continuous Speech Emotion Recognition with Multi-Task Deep Recurrent Neural Network
Duc Le, Zakaria Aldeneh, Emily Mower Provost
Towards Speech Emotion Recognition “in the Wild” Using Aggregated Corpora and Deep Multi-Task Learning
Jaebok Kim, Gwenn Englebienne, Khiet P. Truong, Vanessa Evers
Speaker-Dependent WaveNet Vocoder
Akira Tamamori, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda
Waveform Modeling Using Stacked Dilated Convolutional Neural Networks for Speech Bandwidth Extension
Yu Gu, Zhen-Hua Ling
Direct Modeling of Frequency Spectra and Waveform Generation Based on Phase Recovery for DNN-Based Speech Synthesis
Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi
A Hierarchical Encoder-Decoder Model for Statistical Parametric Speech Synthesis
Srikanth Ronanki, Oliver Watts, Simon King
Statistical Voice Conversion with WaveNet-Based Waveform Generation
Kazuhiro Kobayashi, Tomoki Hayashi, Akira Tamamori, Tomoki Toda
Google’s Next-Generation Real-Time Unit-Selection Synthesizer Using Sequence-to-Sequence LSTM-Based Autoencoders
Vincent Wan, Yannis Agiomyrgiannakis, Hanna Silen, Jakub Vít
A Comparison of Sentence-Level Speech Intelligibility Metrics
Alexander Kain, Max Del Giudice, Kris Tjaden
An Auditory Model of Speaker Size Perception for Voiced Speech Sounds
Toshio Irino, Eri Takimoto, Toshie Matsui, Roy D. Patterson
The Recognition of Compounds: A Computational Account
L. ten Bosch, L. Boves, M. Ernestus
Humans do not Maximize the Probability of Correct Decision When Recognizing DANTALE Words in Noise
Mohsen Zareian Jahromi, Jan Østergaard, Jesper Jensen
Single-Ended Prediction of Listening Effort Based on Automatic Speech Recognition
Rainer Huber, Constantin Spille, Bernd T. Meyer
Modeling Categorical Perception with the Receptive Fields of Auditory Neurons
Chris Neufeld
A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech Separation
Yannan Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee
Deep Clustering-Based Beamforming for Separation with Unknown Number of Sources
Takuya Higuchi, Keisuke Kinoshita, Marc Delcroix, Kateřina Žmolíková, Tomohiro Nakatani
Time-Frequency Masking for Blind Source Separation with Preserved Spatial Cues
Shadi Pirhosseinloo, Kostas Kokkinakis
Variational Recurrent Neural Networks for Speech Separation
Jen-Tzung Chien, Kuan-Ting Kuo
Detecting Overlapped Speech on Short Timeframes Using Deep Learning
Valentin Andrei, Horia Cucu, Corneliu Burileanu
Ideal Ratio Mask Estimation Using Deep Neural Networks for Monaural Speech Segregation in Noisy Reverberant Conditions
Xu Li, Junfeng Li, Yonghong Yan
The Vocative Chant and Beyond: German Calling Melodies Under Routine and Urgent Contexts
Sergio I. Quiroz, Marzena Żygis
Comparing Languages Using Hierarchical Prosodic Analysis
Juraj Šimko, Antti Suni, Katri Hiovain, Martti Vainio
Intonation Facilitates Prediction of Focus Even in the Presence of Lexical Tones
Martin Ho Kwan Ip, Anne Cutler
Mind the Peak: When Museum is Temporarily Understood as Musical in Australian English
Katharina Zahner, Heather Kember, Bettina Braun
Pashto Intonation Patterns
Luca Rognoni, Judith Bishop, Miriam Corris
A New Model of Final Lowering in Spontaneous Monologue
Kikuo Maekawa
Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space
Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai
Adversarial Auto-Encoders for Speech Based Emotion Recognition
Saurabh Sahu, Rahul Gupta, Ganesh Sivaraman, Wael AbdAlmageed, Carol Espy-Wilson
An Investigation of Emotion Prediction Uncertainty Using Gaussian Mixture Regression
Ting Dang, Vidhyasaharan Sethu, Julien Epps, Eliathamby Ambikairajah
Capturing Long-Term Temporal Dependencies with Convolutional Networks for Continuous Emotion Recognition
Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Melvin McInnis, Emily Mower Provost
Voice-to-Affect Mapping: Inferences on Language Voice Baseline Settings
Ailbhe Ní Chasaide, Irena Yanushevskaya, Christer Gobl
Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech
Michael Neumann, Ngoc Thang Vu
Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities
Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
Learning Latent Representations for Speech Generation and Transformation
Wei-Ning Hsu, Yu Zhang, James Glass
Parallel-Data-Free Many-to-Many Voice Conversion Based on DNN Integrated with Eigenspace Using a Non-Parallel Speech Corpus
Tetsuya Hashimoto, Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu
Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks
Takuhiro Kaneko, Hirokazu Kameoka, Kaoru Hiramatsu, Kunio Kashino
A Mouth Opening Effect Based on Pole Modification for Expressive Singing Voice Transformation
Luc Ardaillon, Axel Roebel
Siamese Autoencoders for Speech Style Extraction and Switching Applied to Voice Identification and Conversion
Seyed Hamidreza Mohammadi, Alexander Kain
Recurrent Neural Aligner: An Encoder-Decoder Neural Network Model for Sequence to Sequence Mapping
Haşim Sak, Matt Shannon, Kanishka Rao, Françoise Beaufays
Highway-LSTM and Recurrent Highway Networks for Speech Recognition
Golan Pundak, Tara N. Sainath
Improving Speech Recognition by Revising Gated Recurrent Units
Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
Stochastic Recurrent Neural Network for Speech Recognition
Jen-Tzung Chien, Chen Shen
Frame and Segment Level Recurrent Neural Networks for Phone Classification
Martin Ratajczak, Sebastian Tschiatschek, Franz Pernkopf
Deep Learning-Based Telephony Speech Recognition in the Wild
Kyu J. Han, Seongjun Hahm, Byung-Hak Kim, Jungsuk Kim, Ian Lane
The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016
Kong Aik Lee, SRE’16 I4U Group
The MIT-LL, JHU and LRDE NIST 2016 Speaker Recognition Evaluation System
Pedro A. Torres-Carrasquillo, Fred Richardson, Shahan Nercessian, Douglas Sturim, William Campbell, Youngjune Gwon, Swaroop Vattam, Najim Dehak, Harish Mallidi, Phani Sankar Nidadavolu, Ruizhi Li, Reda Dehak
Nuance - Politecnico di Torino’s 2016 NIST Speaker Recognition Evaluation System
Daniele Colibro, Claudio Vair, Emanuele Dalmasso, Kevin Farrell, Gennady Karvitsky, Sandro Cumani, Pietro Laface
UTD-CRSS Systems for 2016 NIST Speaker Recognition Evaluation
Chunlei Zhang, Fahimeh Bahmaninezhad, Shivesh Ranjan, Chengzhu Yu, Navid Shokouhi, John H.L. Hansen
Analysis and Description of ABC Submission to NIST SRE 2016
Oldřich Plchot, Pavel Matějka, Anna Silnova, Ondřej Novotný, Mireia Diez Sánchez, Johan Rohdin, Ondřej Glembek, Niko Brümmer, Albert Swart, Jesús Jorrín-Prieto, Paola García, Luis Buera, Patrick Kenny, Jahangir Alam, Gautam Bhattacharya
The 2016 NIST Speaker Recognition Evaluation
Seyed Omid Sadjadi, Timothée Kheyrkhah, Audrey Tong, Craig Greenberg, Douglas Reynolds, Elliot Singer, Lisa Mason, Jaime Hernandez-Cordero
A New Cosine Series Antialiasing Function and its Application to Aliasing-Free Glottal Source Models for Speech and Singing Synthesis
Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda, Toshio Irino
Speaking Style Conversion from Normal to Lombard Speech Using a Glottal Vocoder and Bayesian GMMs
Ana Ramírez López, Shreyas Seshadri, Lauri Juvela, Okko Räsänen, Paavo Alku
Reducing Mismatch in Training of DNN-Based Glottal Excitation Models in a Statistical Parametric Text-to-Speech System
Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
Semi Parametric Concatenative TTS with Instant Voice Modification Capabilities
Alexander Sorin, Slava Shechtman, Asaf Rendel
Modeling Laryngeal Muscle Activation Noise for Low-Order Physiological Based Speech Synthesis
Rodrigo Manríquez, Sean D. Peterson, Pavel Prado, Patricio Orio, Matías Zañartu
Direct Modelling of Magnitude and Phase Spectra for Statistical Parametric Speech Synthesis
Felipe Espic, Cassia Valentini Botinhao, Simon King
Similar Prosodic Structure Perceived Differently in German and English
Heather Kember, Ann-Kathrin Grohe, Katharina Zahner, Bettina Braun, Andrea Weber, Anne Cutler
Disambiguate or not? — The Role of Prosody in Unambiguous and Potentially Ambiguous Anaphora Production in Strictly Mandarin Parallel Structures
Luying Hou, Bert Le Bruyn, René Kager
Acoustic Properties of Canonical and Non-Canonical Stress in French, Turkish, Armenian and Brazilian Portuguese
Angeliki Athanasopoulou, Irene Vogel, Hossep Dolatian
Phonological Complexity, Segment Rate and Speech Tempo Perception
Leendert Plug, Rachel Smith
On the Duration of Mandarin Tones
Jing Yang, Yu Zhang, Aijun Li, Li Xu
The Formant Dynamics of Long Close Vowels in Three Varieties of Swedish
Otto Ewald, Eva Liina Asu, Susanne Schötz
Bidirectional LSTM-RNN for Improving Automated Assessment of Non-Native Children’s Speech
Yao Qian, Keelan Evanini, Xinhao Wang, Chong Min Lee, Matthew Mulholland
Automatic Scoring of Shadowing Speech Based on DNN Posteriors and Their DTW
Junwei Yue, Fumiya Shiozawa, Shohei Toyama, Yutaka Yamauchi, Kayoko Ito, Daisuke Saito, Nobuaki Minematsu
Off-Topic Spoken Response Detection Using Siamese Convolutional Neural Networks
Chong Min Lee, Su-Youn Yoon, Xinhao Wang, Matthew Mulholland, Ikkyu Choi, Keelan Evanini
Phonological Feature Based Mispronunciation Detection and Diagnosis Using Multi-Task DNNs and Active Learning
Vipul Arora, Aditi Lahiri, Henning Reetz
Detection of Mispronunciations and Disfluencies in Children Reading Aloud
Jorge Proença, Carla Lopes, Michael Tjalve, Andreas Stolcke, Sara Candeias, Fernando Perdigão
Automatic Assessment of Non-Native Prosody by Measuring Distances on Prosodic Label Sequences
David Escudero-Mancebo, César González-Ferreras, Lourdes Aguilar, Eva Estebas-Vilaplana
Inferring Stance from Prosody
Nigel G. Ward, Jason C. Carlson, Olac Fuentes, Diego Castan, Elizabeth E. Shriberg, Andreas Tsiartas
Exploring Dynamic Measures of Stance in Spoken Interaction
Gina-Anne Levow, Richard A. Wright
Opinion Dynamics Modeling for Movie Review Transcripts Classification with Hidden Conditional Random Fields
Valentin Barriere, Chloé Clavel, Slim Essid
Transfer Learning Between Concepts for Human Behavior Modeling: An Application to Sincerity and Deception Prediction
Qinyi Luo, Rahul Gupta, Shrikanth S. Narayanan
The Sound of Deception — What Makes a Speaker Credible?
Anne Schröder, Simon Stone, Peter Birkholz
Hybrid Acoustic-Lexical Deep Learning Approach for Deception Detection
Gideon Mendels, Sarah Ita Levitan, Kai-Zhan Lee, Julia Hirschberg
A Generative Model for Score Normalization in Speaker Recognition
Albert Swart, Niko Brümmer
Content Normalization for Text-Dependent Speaker Verification
Subhadeep Dey, Srikanth Madikeri, Petr Motlicek, Marc Ferras
End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances
Chunlei Zhang, Kazuhito Koishida
Adversarial Network Bottleneck Features for Noise Robust Speaker Verification
Hong Yu, Zheng-Hua Tan, Zhanyu Ma, Jun Guo
What Does the Speaker Embedding Encode?
Shuai Wang, Yanmin Qian, Kai Yu
Incorporating Local Acoustic Variability Information into Short Duration Speaker Verification
Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong Aik Lee
DNN i-Vector Speaker Verification with Short, Text-Constrained Test Utterances
Jinghua Zhong, Wenping Hu, Frank K. Soong, Helen Meng
Time-Varying Autoregressions for Speaker Verification in Reverberant Conditions
Ville Vestman, Dhananjaya Gowda, Md. Sahidullah, Paavo Alku, Tomi Kinnunen
Deep Speaker Embeddings for Short-Duration Speaker Verification
Gautam Bhattacharya, Jahangir Alam, Patrick Kenny
Using Voice Quality Features to Improve Short-Utterance, Text-Independent Speaker Verification Systems
Soo Jin Park, Gary Yeung, Jody Kreiman, Patricia A. Keating, Abeer Alwan
Gain Compensation for Fast i-Vector Extraction Over Short Duration
Kong Aik Lee, Haizhou Li
Joint Training of Expanded End-to-End DNN for Text-Dependent Speaker Verification
Hee-soo Heo, Jee-weon Jung, Il-ho Yang, Sung-hyun Yoon, Ha-jin Yu
Speaker Verification via Estimating Total Variability Space Using Probabilistic Partial Least Squares
Chen Chen, Jiqing Han, Yilin Pan
Deep Speaker Feature Learning for Text-Independent Speaker Verification
Lantian Li, Yixiang Chen, Ying Shi, Zhiyuan Tang, Dong Wang
Duration Mismatch Compensation Using Four-Covariance Model and Deep Neural Network for Speaker Verification
Pierre-Michel Bousquet, Mickael Rouvier
Extended Variability Modeling and Unsupervised Adaptation for PLDA Speaker Recognition
Alan McCree, Gregory Sell, Daniel Garcia-Romero
Improving the Effectiveness of Speaker Verification Domain Adaptation with Inadequate In-Domain Data
Bengt J. Borgström, Elliot Singer, Douglas Reynolds, Seyed Omid Sadjadi
i-Vector DNN Scoring and Calibration for Noise Robust Speaker Verification
Zhili Tan, Man-Wai Mak
Analysis of Score Normalization in Multilingual Speaker Recognition
Pavel Matějka, Ondřej Novotný, Oldřich Plchot, Lukáš Burget, Mireia Diez Sánchez, Jan Černocký
Alternative Approaches to Neural Network Based Speaker Verification
Anna Silnova, Lukáš Burget, Jan Černocký
A Distribution Free Formulation of the Total Variability Model
Ruchir Travadi, Shrikanth S. Narayanan
Domain Mismatch Modeling of Out-Domain i-Vectors for PLDA Speaker Verification
Md. Hafizur Rahman, Ivan Himawan, David Dean, Sridha Sridharan
An Exploration of Dropout with LSTMs
Gaofeng Cheng, Vijayaditya Peddinti, Daniel Povey, Vimal Manohar, Sanjeev Khudanpur, Yonghong Yan
Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition
Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee
Unfolded Deep Recurrent Convolutional Neural Network with Jump Ahead Connections for Acoustic Modeling
Dung T. Tran, Marc Delcroix, Shigeki Karita, Michael Hentschel, Atsunori Ogawa, Tomohiro Nakatani
Forward-Backward Convolutional LSTM for Acoustic Modeling
Shigeki Karita, Atsunori Ogawa, Marc Delcroix, Tomohiro Nakatani
Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting
Sercan Ö. Arık, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Chris Fougner, Ryan Prenger, Adam Coates
Deep Activation Mixture Model for Speech Recognition
Chunyang Wu, Mark J.F. Gales
Ensembles of Multi-Scale VGG Acoustic Models
Michael Heck, Masayuki Suzuki, Takashi Fukuda, Gakuto Kurata, Satoshi Nakamura
Training Context-Dependent DNN Acoustic Models Using Probabilistic Sampling
Tamás Grósz, Gábor Gosztolya, László Tóth
A Comparative Evaluation of GMM-Free State Tying Methods for ASR
Tamás Grósz, Gábor Gosztolya, László Tóth
Backstitch: Counteracting Finite-Sample Bias via Negative Steps
Yiming Wang, Vijayaditya Peddinti, Hainan Xu, Xiaohui Zhang, Daniel Povey, Sanjeev Khudanpur
Node Pruning Based on Entropy of Weights and Node Activity for Small-Footprint Acoustic Model Based on Deep Neural Networks
Ryu Takeda, Kazuhiro Nakadai, Kazunori Komatani
End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlow
Ehsan Variani, Tom Bagby, Erik McDermott, Michiel Bacchiani
An Efficient Phone N-Gram Forward-Backward Computation Using Dense Matrix Multiplication
Khe Chai Sim, Arun Narayanan
Parallel Neural Network Features for Improved Tandem Acoustic Modeling
Zoltán Tüske, Wilfried Michel, Ralf Schlüter, Hermann Ney
Acoustic Feature Learning via Deep Variational Canonical Correlation Analysis
Qingming Tang, Weiran Wang, Karen Livescu
Online End-of-Turn Detection from Speech Based on Stacked Time-Asynchronous Sequential Networks
Ryo Masumura, Taichi Asami, Hirokazu Masataki, Ryo Ishii, Ryuichiro Higashinaka
Improving Prediction of Speech Activity Using Multi-Participant Respiratory State
Marcin Włodarczak, Kornel Laskowski, Mattias Heldner, Kätlin Aare
Turn-Taking Offsets and Dialogue Context
Peter A. Heeman, Rebecca Lunsford
Towards Deep End-of-Turn Prediction for Situated Spoken Dialogue Systems
Angelika Maier, Julian Hough, David Schlangen
End-of-Utterance Prediction by Prosodic Features and Phrase-Dependency Structure in Spontaneous Japanese Speech
Yuichi Ishimoto, Takehiro Teraoka, Mika Enomoto
Turn-Taking Estimation Model Based on Joint Embedding of Lexical and Prosodic Contents
Chaoran Liu, Carlos Ishi, Hiroshi Ishiguro
Social Signal Detection in Spontaneous Dialogue Using Bidirectional LSTM-CTC
Hirofumi Inaguma, Koji Inoue, Masato Mimura, Tatsuya Kawahara
Entrainment in Multi-Party Spoken Dialogues at Multiple Linguistic Levels
Zahra Rahimi, Anish Kumar, Diane Litman, Susannah Paletz, Mingzhi Yu
Measuring Synchrony in Task-Based Dialogues
Justine Reverdy, Carl Vogel
Sequence to Sequence Modeling for User Simulation in Dialog Systems
Paul Crook, Alex Marin
Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human–Machine Spoken Dialog Interactions
Vikram Ramanarayanan, Patrick L. Lange, Keelan Evanini, Hillary R. Molloy, David Suendermann-Oeft
Hierarchical LSTMs with Joint Learning for Estimating Customer Satisfaction from Contact Center Calls
Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono
Domain-Independent User Satisfaction Reward Estimation for Dialogue Policy Learning
Stefan Ultes, Paweł Budzianowski, Iñigo Casanueva, Nikola Mrkšić, Lina Rojas-Barahona, Pei-Hao Su, Tsung-Hsien Wen, Milica Gašić, Steve Young
Analysis of the Relationship Between Prosodic Features of Fillers and its Forms or Occurrence Positions
Shizuka Nakamura, Ryosuke Nakanishi, Katsuya Takanashi, Tatsuya Kawahara
Cross-Subject Continuous Emotion Recognition Using Speech and Body Motion in Dyadic Interactions
Syeda Narjis Fatima, Engin Erzin
An Automatically Aligned Corpus of Child-Directed Speech
Micha Elsner, Kiwako Ito
A Comparison of Danish Listeners’ Processing Cost in Judging the Truth Value of Norwegian, Swedish, and English Sentences
Ocke-Schwen Bohn, Trine Askjær-Jørgensen
On the Role of Temporal Variability in the Acquisition of the German Vowel Length Contrast
Felicitas Kleber
A Data-Driven Approach for Perceptually Validated Acoustic Features for Children’s Sibilant Fricative Productions
Patrick F. Reidy, Mary E. Beckman, Jan Edwards, Benjamin Munson
Proficiency Assessment of ESL Learner’s Sentence Prosody with TTS Synthesized Voice as Reference
Yujia Xiao, Frank K. Soong
Mechanisms of Tone Sandhi Rule Application by Non-Native Speakers
Si Chen, Yunjuan He, Chun Wah Yuen, Bei Li, Yike Yang
Changes in Early L2 Cue-Weighting of Non-Native Speech: Evidence from Learners of Mandarin Chinese
Seth Wiener
Directing Attention During Perceptual Training: A Preliminary Study of Phonetic Learning in Southern Min by Mandarin Speakers
Ying Chen, Eric Pederson
Prosody Analysis of L2 English for Naturalness Evaluation Through Speech Modification
Dean Luo, Ruxin Luo, Lixin Wang
Measuring Encoding Efficiency in Swedish and English Language Learner Speech Production
Gintarė Grigonytė, Gerold Schneider
Lexical Adaptation to a Novel Accent in German: A Comparison Between German, Swedish, and Finnish Listeners
Adriana Hanulíková, Jenny Ekström
Qualitative Differences in L3 Learners’ Neurophysiological Response to L1 versus L2 Transfer
Alejandra Keidel Fernández, Thomas Hörberg
Articulation Rate in Swedish Child-Directed Speech Increases as a Function of the Age of the Child Even When Surprisal is Controlled for
Johan Sjons, Thomas Hörberg, Robert Östling, Johannes Bjerva
The Relationship Between the Perception and Production of Non-Native Tones
Kaile Zhang, Gang Peng
MMN Responses in Adults After Exposure to Bimodal and Unimodal Frequency Distributions of Rotated Speech
Ellen Marklund, Elísabet Eir Cortes, Johan Sjons
Float Like a Butterfly Sting Like a Bee: Changes in Speech Preceded Parkinsonism Diagnosis for Muhammad Ali
Visar Berisha, Julie Liss, Timothy Huston, Alan Wisler, Yishan Jiao, Jonathan Eig
Cepstral and Entropy Analyses in Vowels Excerpted from Continuous Speech of Dysphonic and Control Speakers
Antonella Castellana, Andreas Selamtzis, Giampiero Salvi, Alessio Carullo, Arianna Astolfi
Classification of Bulbar ALS from Kinematic Features of the Jaw and Lips: Towards Computer-Mediated Assessment
Andrea Bandini, Jordan R. Green, Lorne Zinman, Yana Yunusova
Zero Frequency Filter Based Analysis of Voice Disorders
Nagaraj Adiga, C.M. Vikram, Keerthi Pullela, S.R. Mahadeva Prasanna
Hypernasality Severity Analysis in Cleft Lip and Palate Speech Using Vowel Space Area
Nikitha K., Sishir Kalita, C.M. Vikram, M. Pushpavathi, S.R. Mahadeva Prasanna
Automatic Prediction of Speech Evaluation Metrics for Dysarthric Speech
Imed Laaridh, Waad Ben Kheder, Corinne Fredouille, Christine Meunier
Apkinson — A Mobile Monitoring Solution for Parkinson’s Disease
Philipp Klumpp, Thomas Janu, Tomás Arias-Vergara, J.C. Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth
Dysprosody Differentiate Between Parkinson’s Disease, Progressive Supranuclear Palsy, and Multiple System Atrophy
Jan Hlavnička, Tereza Tykalová, Roman Čmejla, Jiří Klempíř, Evžen Růžička, Jan Rusz
Interpretable Objective Assessment of Dysarthric Speech Based on Deep Neural Networks
Ming Tu, Visar Berisha, Julie Liss
Deep Autoencoder Based Speech Features for Improved Dysarthric Speech Recognition
Bhavik Vachhani, Chitralekha Bhat, Biswajit Das, Sunil Kumar Kopparapu
Prediction of Speech Delay from Acoustic Measurements
Jason Lilley, Madhavi Ratnagiri, H. Timothy Bunnell
The Frequency Range of “The Ling Six Sounds” in Standard Chinese
Aijun Li, Hua Zhang, Wen Sun
Production of Sustained Vowels and Categorical Perception of Tones in Mandarin Among Cochlear-Implanted Children
Wentao Gu, Jiao Yin, James Mahshie
Audio Content Based Geotagging in Multimedia
Anurag Kumar, Benjamin Elizalde, Bhiksha Raj
Time Delay Histogram Based Speech Source Separation Using a Planar Array
Zhaoqiong Huang, Zhanzhong Cao, Dongwen Ying, Jielin Pan, Yonghong Yan
Excitation Source Features for Improving the Detection of Vowel Onset and Offset Points in a Speech Sequence
Gayadhar Pradhan, Avinash Kumar, S. Shahnawazuddin
A Contrast Function and Algorithm for Blind Separation of Audio Signals
Wei Gao, Roberto Togneri, Victor Sreeram
Weighted Spatial Covariance Matrix Estimation for MUSIC Based TDOA Estimation of Speech Source
Chenglin Xu, Xiong Xiao, Sining Sun, Wei Rao, Eng Siong Chng, Haizhou Li
Speaker Direction-of-Arrival Estimation Based on Frequency-Independent Beampattern
Feng Guo, Yuhang Cao, Zheng Liu, Jiaen Liang, Baoqing Li, Xiaobing Yuan
A Mask Estimation Method Integrating Data Field Model for Speech Enhancement
Xianyun Wang, Changchun Bao, Feng Bao
Improved End-of-Query Detection for Streaming Speech Recognition
Matt Shannon, Gabor Simko, Shuo-Yiin Chang, Carolina Parada
Using Approximated Auditory Roughness as a Pre-Filtering Feature for Human Screaming and Affective Speech AED
Di He, Zuofu Cheng, Mark Hasegawa-Johnson, Deming Chen
Improving Source Separation via Multi-Speaker Representations
Jeroen Zegers, Hugo Van hamme
Multiple Sound Source Counting and Localization Based on Spatial Principal Eigenvector
Bing Yang, Hong Liu, Cheng Pang
Subband Selection for Binaural Speech Source Localization
Girija Ramesan Karthik, Prasanta Kumar Ghosh
Unmixing Convolutive Mixtures by Exploiting Amplitude Co-Modulation: Methods and Evaluation on Mandarin Speech Recordings
Bo-Rui Chen, Huang-Yi Lee, Yi-Wen Liu
Bimodal Recurrent Neural Network for Audiovisual Voice Activity Detection
Fei Tao, Carlos Busso
Domain-Specific Utterance End-Point Detection for Speech Recognition
Roland Maas, Ariya Rastrow, Kyle Goehner, Gautam Tiwari, Shaun Joseph, Björn Hoffmeister
Speech Detection and Enhancement Using Single Microphone for Distant Speech Applications in Reverberant Environments
Vinay Kothapally, John H.L. Hansen
A Post-Filtering Approach Based on Locally Linear Embedding Difference Compensation for Speech Enhancement
Yi-Chiao Wu, Hsin-Te Hwang, Syu-Siang Wang, Chin-Cheng Hsu, Yu Tsao, Hsin-Min Wang
Multi-Target Ensemble Learning for Monaural Speech Separation
Hui Zhang, Xueliang Zhang, Guanglai Gao
Improved Example-Based Speech Enhancement by Using Deep Neural Network Acoustic Model for Noise Robust Example Search
Atsunori Ogawa, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani
Subjective Intelligibility of Deep Neural Network-Based Speech Enhancement
Femke B. Gelderblom, Tron V. Tronstad, Erlend Magnus Viggen
Real-Time Modulation Enhancement of Temporal Envelopes for Increasing Speech Intelligibility
Maria Koutsogiannaki, Holly Francois, Kihyun Choo, Eunmi Oh
On the Influence of Modifying Magnitude and Phase Spectrum to Enhance Noisy Speech Signals
Hans-Günter Hirsch, Michael Gref
MixMax Approximation as a Super-Gaussian Log-Spectral Amplitude Estimator for Speech Enhancement
Robert Rehr, Timo Gerkmann
Binary Mask Estimation Strategies for Constrained Imputation-Based Speech Enhancement
Ricard Marxer, Jon Barker
A Fully Convolutional Neural Network for Speech Enhancement
Se Rim Park, Jin Won Lee
Speech Enhancement Using Non-Negative Spectrogram Models with Mel-Generalized Cepstral Regularization
Li Li, Hirokazu Kameoka, Tomoki Toda, Shoji Makino
A Comparison of Perceptually Motivated Loss Functions for Binary Mask Estimation in Speech Separation
Danny Websdale, Ben Milner
Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification
Daniel Michelsanti, Zheng-Hua Tan
Speech Enhancement Using Bayesian Wavenet
Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florêncio, Mark Hasegawa-Johnson
Binaural Reverberant Speech Separation Based on Deep Neural Networks
Xueliang Zhang, DeLiang Wang
On the Quality and Intelligibility of Noisy Speech Processed for Near-End Listening Enhancement
Tudor-Cătălin Zorilă, Yannis Stylianou
Applications of the BBN Sage Speech Processing Platform
Ralf Meermeier, Sean Colbath
Bob Speaks Kaldi
Milos Cernak, Alain Komaty, Amir Mohammadi, André Anjos, Sébastien Marcel
Real Time Pitch Shifting with Formant Structure Preservation Using the Phase Vocoder
Michał Lenarczyk
A Signal Processing Approach for Speaker Separation Using SFF Analysis
Nivedita Chennupati, B.H.V.S. Narayana Murthy, B. Yegnanarayana
Speech Recognition and Understanding on Hardware-Accelerated DSP
Georg Stemmer, Munir Georges, Joachim Hofer, Piotr Rozen, Josef Bauer, Jakub Nowicki, Tobias Bocklet, Hannah R. Colett, Ohad Falik, Michael Deisher, Sylvia J. Downing
MetaLab: A Repository for Meta-Analyses on Language Development, and More
Sho Tsuji, Christina Bergmann, Molly Lewis, Mika Braginsky, Page Piccinini, Michael C. Frank, Alejandrina Cristia
Evolving Recurrent Neural Networks That Process and Classify Raw Audio in a Streaming Fashion
Adrien Daniel
Combining Gaussian Mixture Models and Segmental Feature Models for Speaker Recognition
Milana Milošević, Ulrike Glavitsch
“Did you laugh enough today?” — Deep Neural Networks for Mobile and Wearable Laughter Trackers
Gerhard Hagerer, Nicholas Cummins, Florian Eyben, Björn Schuller
Low-Frequency Ultrasonic Communication for Speech Broadcasting in Public Transportation
Kwang Myung Jeon, Nam Kyun Kim, Chan Woong Kwak, Jung Min Moon, Hong Kook Kim
Real-Time Speech Enhancement with GCC-NMF: Demonstration on the Raspberry Pi and NVIDIA Jetson
Sean U.N. Wood, Jean Rouat
Reading Validation for Pronunciation Evaluation in the Digitala Project
Aku Rouhe, Reima Karhila, Peter Smit, Mikko Kurimo
Conversing with Social Agents That Smile and Laugh
Catherine Pelachaud
Team ELISA System for DARPA LORELEI Speech Evaluation 2016
Pavlos Papadopoulos, Ruchir Travadi, Colin Vaz, Nikolaos Malandrakis, Ulf Hermjakob, Nima Pourdamghani, Michael Pust, Boliang Zhang, Xiaoman Pan, Di Lu, Ying Lin, Ondřej Glembek, Murali Karthick Baskar, Martin Karafiát, Lukáš Burget, Mark Hasegawa-Johnson, Heng Ji, Jonathan May, Kevin Knight, Shrikanth S. Narayanan
First Results in Developing a Medieval Latin Language Charter Dictation System for the East-Central Europe Region
Péter Mihajlik, Lili Szabó, Balázs Tarján, András Balog, Krisztina Rábai
The Motivation and Development of MPAi, a Māori Pronunciation Aid
C.I. Watson, P.J. Keegan, M.A. Maclagan, R. Harlow, J. King
On the Linguistic Relevance of Speech Units Learned by Unsupervised Acoustic Modeling
Siyuan Feng, Tan Lee
Deep Auto-Encoder Based Multi-Task Learning Using Probabilistic Transcriptions
Amit Das, Mark Hasegawa-Johnson, Karel Veselý
Areal and Phylogenetic Features for Multilingual Speech Synthesis
Alexander Gutkin, Richard Sproat
SLPAnnotator: Tools for Implementing Sign Language Phonetic Annotation
Kathleen Currie Hall, Scott Mackie, Michael Fry, Oksana Tkachman
The LENA System Applied to Swedish: Reliability of the Adult Word Count Estimate
Iris-Corinna Schwarz, Noor Botros, Alekzandra Lord, Amelie Marcusson, Henrik Tidelius, Ellen Marklund
What do Babies Hear? Analyses of Child- and Adult-Directed Speech
Marisa Casillas, Andrei Amatuni, Amanda Seidl, Melanie Soderstrom, Anne S. Warlaumont, Elika Bergelson
A New Workflow for Semi-Automatized Annotations: Tests with Long-Form Naturalistic Recordings of Children’s Language Environments
Marisa Casillas, Elika Bergelson, Anne S. Warlaumont, Alejandrina Cristia, Melanie Soderstrom, Mark VanDam, Han Sloetjes
Top-Down versus Bottom-Up Theories of Phonological Acquisition: A Big Data Approach
Christina Bergmann, Sho Tsuji, Alejandrina Cristia
Which Acoustic and Phonological Factors Shape Infants’ Vowel Discrimination? Exploiting Natural Variation in InPhonDB
Sho Tsuji, Alejandrina Cristia
The ABAIR Initiative: Bringing Spoken Irish into the Digital Space
Ailbhe Ní Chasaide, Neasa Ní Chiaráin, Christoph Wendler, Harald Berthelsen, Andy Murphy, Christer Gobl
Very Low Resource Radio Browsing for Agile Developmental and Humanitarian Monitoring
Armin Saeb, Raghav Menon, Hugh Cameron, William Kibira, John Quinn, Thomas Niesler
Extracting Situation Frames from Non-English Speech: Evaluation Framework and Pilot Results
Nikolaos Malandrakis, Ondřej Glembek, Shrikanth S. Narayanan
Eliciting Meaningful Units from Speech
Daniil Kocharov, Tatiana Kachkovskaia, Pavel Skrelin
Unsupervised Speech Signal to Symbol Transformation for Zero Resource Speech Applications
Saurabhchand Bhati, Shekhar Nayak, K. Sri Rama Murty
Machine Assisted Analysis of Vowel Length Contrasts in Wolof
Elodie Gauthier, Laurent Besacier, Sylvie Voisin
Leveraging Text Data for Word Segmentation for Underresourced Languages
Thomas Glarner, Benedikt Boenninghoff, Oliver Walter, Reinhold Haeb-Umbach
Improving DNN Bluetooth Narrowband Acoustic Models by Cross-Bandwidth and Cross-Lingual Initialization
Xiaodan Zhuang, Arnab Ghoshal, Antti-Veikko Rosti, Matthias Paulik, Daben Liu
Joint Estimation of Articulatory Features and Acoustic Models for Low-Resource Languages
Basil Abraham, S. Umesh, Neethu Mariam Joy
Transfer Learning and Distillation Techniques to Improve the Acoustic Modeling of Low Resource Languages
Basil Abraham, Tejaswi Seeram, S. Umesh
Building an ASR Corpus Using Althingi’s Parliamentary Speeches
Inga Rún Helgadóttir, Róbert Kjaran, Anna Björk Nikulásdóttir, Jón Guðnason
Implementation of a Radiology Speech Recognition System for Estonian Using Open Source Software
Tanel Alumäe, Andrus Paats, Ivo Fridolin, Einar Meister
Building ASR Corpora Using Eyra
Jón Guðnason, Matthías Pétursson, Róbert Kjaran, Simon Klüpfel, Anna Björk Nikulásdóttir
Rapid Development of TTS Corpora for Four South African Languages
Daniel van Niekerk, Charl van Heerden, Marelie Davel, Neil Kleynhans, Oddur Kjartansson, Martin Jansche, Linne Ha
Uniform Multilingual Multi-Speaker Acoustic Model for Statistical Parametric Speech Synthesis of Low-Resourced Languages
Alexander Gutkin
Nativization of Foreign Names in TTS for Automatic Reading of World News in Swahili
Joseph Mendelson, Pilar Oplustil, Oliver Watts, Simon King
Multi-Task Learning for Mispronunciation Detection on Singapore Children’s Mandarin Speech
Rong Tong, Nancy F. Chen, Bin Ma
Relating Unsupervised Word Segmentation to Reported Vocabulary Acquisition
Elin Larsen, Alejandrina Cristia, Emmanuel Dupoux
Modelling the Informativeness of Non-Verbal Cues in Parent-Child Interaction
Mats Wirén, Kristina N. Björkenstam, Robert Östling
Computational Simulations of Temporal Vocalization Behavior in Adult-Child Interaction
Ellen Marklund, David Pagmar, Tove Gerholm, Lisa Gustavsson
Approximating Phonotactic Input in Children’s Linguistic Environments from Orthographic Transcripts
Sofia Strömbergsson, Jens Edlund, Jana Götze, Kristina Nilsson Björkenstam
Learning Weakly Supervised Multimodal Phoneme Embeddings
Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, Emmanuel Dupoux
Personalized Quantification of Voice Attractiveness in Multidimensional Merit Space
Yasunari Obuchi
The Role of Temporal Amplitude Modulations in the Political Arena: Hillary Clinton vs. Donald Trump
Hans Rutger Bosker
Perceptual Ratings of Voice Likability Collected Through In-Lab Listening Tests vs. Mobile-Based Crowdsourcing
Laura Fernández Gallardo, Rafael Zequeira Jiménez, Sebastian Möller
Attractiveness of French Voices for German Listeners — Results from Native and Non-Native Read Speech
Jürgen Trouvain, Frank Zimmerer
Social Attractiveness in Dialogs
Antje Schweitzer, Natalie Lewandowski, Daniel Duran
A Gender Bias in the Acoustic-Melodic Features of Charismatic Speech?
Eszter Novák-Tót, Oliver Niebuhr, Aoju Chen
Pitch Convergence as an Effect of Perceived Attractiveness and Likability
Jan Michalsky, Heike Schoormann
Does Posh English Sound Attractive?
Li Jiao, Chengxia Wang, Cristiane Hsu, Peter Birkholz, Yi Xu
Large-Scale Speaker Ranking from Crowdsourced Pairwise Listener Ratings
Timo Baumann
Aerodynamic Features of French Fricatives
Rosario Signorello, Sergio Hassid, Didier Demolin
Inter-Speaker Variability: Speaker Normalisation and Quantitative Estimation of Articulatory Invariants in Speech Production for French
Antoine Serrurier, Pierre Badin, Louis-Jean Boë, Laurent Lamalle, Christiane Neuschaefer-Rube
Comparison of Basic Beatboxing Articulations Between Expert and Novice Artists Using Real-Time Magnetic Resonance Imaging
Nimisha Patil, Timothy Greer, Reed Blaylock, Shrikanth S. Narayanan
Speaker-Specific Biomechanical Model-Based Investigation of a Simple Speech Task Based on Tagged-MRI
Keyi Tang, Negar M. Harandi, Jonghye Woo, Georges El Fakhri, Maureen Stone, Sidney Fels
Sounds of the Human Vocal Tract
Reed Blaylock, Nimisha Patil, Timothy Greer, Shrikanth S. Narayanan
A Simulation Study on the Effect of Glottal Boundary Conditions on Vocal Tract Formants
Yasufumi Uezu, Tokihiko Kaburagi
A Robust and Alternative Approach to Zero Frequency Filtering Method for Epoch Extraction
P. Gangamohan, B. Yegnanarayana
Improving YANGsaf F0 Estimator with Adaptive Kalman Filter
Kanru Hua
A Spectro-Temporal Demodulation Technique for Pitch Estimation
Jitendra Kumar Dhiman, Nagaraj Adiga, Chandra Sekhar Seelamantula
Robust Method for Estimating F0 of Complex Tone Based on Pitch Perception of Amplitude Modulated Signal
Kenichiro Miwa, Masashi Unoki
Low-Complexity Pitch Estimation Based on Phase Differences Between Low-Resolution Spectra
Simon Graf, Tobias Herbig, Markus Buck, Gerhard Schmidt
Harvest: A High-Performance Fundamental Frequency Estimator from Speech Signals
Masanori Morise
Prosodic Event Recognition Using Convolutional Neural Networks with Context Information
Sabrina Stehwien, Ngoc Thang Vu
Prosodic Facilitation and Interference While Judging on the Veracity of Synthesized Statements
Ramiro H. Gálvez, Štefan Beňuš, Agustín Gravano, Marian Trnka
An Investigation of Pitch Matching Across Adjacent Turns in a Corpus of Spontaneous German
Margaret Zellers, Antje Schweitzer
The Relationship Between F0 Synchrony and Speech Convergence in Dyadic Interaction
Sankar Mukherjee, Alessandro D’Ausilio, Noël Nguyen, Luciano Fadiga, Leonardo Badino
The Role of Linguistic and Prosodic Cues on the Prediction of Self-Reported Satisfaction in Contact Centre Phone Calls
Jordi Luque, Carlos Segura, Ariadna Sánchez, Martí Umbert, Luis Angel Galindo
Cross-Linguistic Study of the Production of Turn-Taking Cues in American English and Argentine Spanish
Pablo Brusco, Juan Manuel Pérez, Agustín Gravano
Emotional Features for Speech Overlaps Classification
Olga Egorow, Andreas Wendemuth
Computing Multimodal Dyadic Behaviors During Spontaneous Diagnosis Interviews Toward Automatic Categorization of Autism Spectrum Disorder
Chin-Po Chen, Xian-Hong Tseng, Susan Shur-Fen Gau, Chi-Chun Lee
Deriving Dyad-Level Interaction Representation Using Interlocutors Structural and Expressive Multimodal Behavior Features
Yun-Shao Lin, Chi-Chun Lee
Spotting Social Signals in Conversational Speech over IP: A Deep Learning Perspective
Raymond Brueckner, Maximilian Schmitt, Maja Pantic, Björn Schuller
Optimized Time Series Filters for Detecting Laughter and Filler Events
Gábor Gosztolya
Visual, Laughter, Applause and Spoken Expression Features for Predicting Engagement Within TED Talks
Fasih Haider, Fahim A. Salim, Saturnino Luz, Carl Vogel, Owen Conlan, Nick Campbell
Large-Scale Domain Adaptation via Teacher-Student Learning
Jinyu Li, Michael L. Seltzer, Xi Wang, Rui Zhao, Yifan Gong
Improving Children’s Speech Recognition Through Explicit Pitch Scaling Based on Iterative Spectrogram Inversion
W. Ahmad, S. Shahnawazuddin, H.K. Kathania, Gayadhar Pradhan, A.B. Samaddar
RNN-LDA Clustering for Feature Based DNN Adaptation
Xurong Xie, Xunying Liu, Tan Lee, Lan Wang
Robust Online i-Vectors for Unsupervised Adaptation of DNN Acoustic Models: A Study in the Context of Digital Voice Assistants
Harish Arsikere, Sri Garimella
Semi-Supervised Learning with Semantic Knowledge Extraction for Improved Speech Recognition in Air Traffic Control
Ajay Srinivasamurthy, Petr Motlicek, Ivan Himawan, György Szaszák, Youssef Oualil, Hartmut Helmke
Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition
Taesup Kim, Inchul Song, Yoshua Bengio
An Entrained Rhythm’s Frequency, Not Phase, Influences Temporal Sampling of Speech
Hans Rutger Bosker, Anne Kösem
Context Regularity Indexed by Auditory N1 and P2 Event-Related Potentials
Xiao Wang, Yanhui Zhang, Gang Peng
Discovering Language in Marmoset Vocalization
Sakshi Verma, K.L. Prateek, Karthik Pandia, Nauman Dawalatabad, Rogier Landman, Jitendra Sharma, Mriganka Sur, Hema A. Murthy
Subject-Independent Classification of Japanese Spoken Sentences by Multiple Frequency Bands Phase Pattern of EEG Response During Speech Perception
Hiroki Watanabe, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura
The Phonological Status of the French Initial Accent and its Role in Semantic Processing: An Event-Related Potentials Study
Noémie te Rietmolen, Radouane El Yagoubi, Alain Ghio, Corine Astésano
A Neuro-Experimental Evidence for the Motor Theory of Speech Perception
Bin Zhao, Jianwu Dang, Gaoyan Zhang
Speech Representation Learning Using Unsupervised Data-Driven Modulation Filtering for Robust ASR
Purvi Agrawal, Sriram Ganapathy
Combined Multi-Channel NMF-Based Robust Beamforming for Noisy Speech Recognition
Masato Mimura, Yoshiaki Bando, Kazuki Shimada, Shinsuke Sakai, Kazuyoshi Yoshii, Tatsuya Kawahara
Recognizing Multi-Talker Speech with Permutation Invariant Training
Dong Yu, Xuankai Chang, Yanmin Qian
Coupled Initialization of Multi-Channel Non-Negative Matrix Factorization Based on Spatial and Spectral Information
Yuuki Tachioka, Tomohiro Narita, Iori Miura, Takanobu Uramoto, Natsuki Monta, Shingo Uenohara, Ken’ichi Furuya, Shinji Watanabe, Jonathan Le Roux
Channel Compensation in the Generalised Vector Taylor Series Approach to Robust ASR
Erfan Loweimi, Jon Barker, Thomas Hain
Robust Speech Recognition via Anchor Word Representations
Brian King, I-Fan Chen, Yonatan Vaizman, Yuzong Liu, Roland Maas, Sree Hari Krishnan Parthasarathi, Björn Hoffmeister
Towards Zero-Shot Frame Semantic Parsing for Domain Scaling
Ankur Bapna, Gokhan Tür, Dilek Hakkani-Tür, Larry Heck
ClockWork-RNN Based Architectures for Slot Filling
Despoina Georgiadou, Vassilios Diakoloukas, Vassilios Tsiaras, Vassilios Digalakis
Investigating the Effect of ASR Tuning on Named Entity Recognition
Mohamed Ameur Ben Jannet, Olivier Galibert, Martine Adda-Decker, Sophie Rosset
Label-Dependency Coding in Simple Recurrent Networks for Spoken Language Understanding
Marco Dinarelli, Vedran Vukotic, Christian Raymond
Minimum Semantic Error Cost Training of Deep Long Short-Term Memory Networks for Topic Spotting on Conversational Speech
Zhong Meng, Biing-Hwang Juang
Topic Identification for Speech Without ASR
Chunxi Liu, Jan Trmal, Matthew Wiesner, Craig Harman, Sanjeev Khudanpur
An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented Dialog
Bing Liu, Ian Lane
Deep Reinforcement Learning of Dialogue Policies with Less Weight Updates
Heriberto Cuayáhuitl, Seunghak Yu
Towards End-to-End Spoken Dialogue Systems with Turn Embeddings
Ali Orkan Bayer, Evgeny A. Stepanov, Giuseppe Riccardi
Speech and Text Analysis for Multimodal Addressee Detection in Human-Human-Computer Interaction
Oleg Akhtiamov, Maxim Sidorov, Alexey A. Karpov, Wolfgang Minker
Rushing to Judgement: How do Laypeople Rate Caller Engagement in Thin-Slice Videos of Human–Machine Dialog?
Vikram Ramanarayanan, Chee Wee Leong, David Suendermann-Oeft
Hyperarticulation of Corrections in Multilingual Dialogue Systems
Ivan Kraljevski, Diane Hirschfeld
Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion
Benjamin Milde, Christoph Schmidt, Joachim Köhler
Acoustic Data-Driven Lexicon Learning Based on a Greedy Pronunciation Selection Framework
Xiaohui Zhang, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur
Semi-Supervised Learning of a Pronunciation Dictionary from Disjoint Phonemic Transcripts and Text
Takahiro Shinozaki, Shinji Watanabe, Daichi Mochihashi, Graham Neubig
Improved Subword Modeling for WFST-Based Speech Recognition
Peter Smit, Sami Virpioja, Mikko Kurimo
Pronunciation Learning with RNN-Transducers
Antoine Bruguier, Danushen Gnanapragasam, Leif Johnson, Kanishka Rao, Françoise Beaufays
Learning Similarity Functions for Pronunciation Variations
Einat Naaman, Yossi Adi, Joseph Keshet
Spoken Language Identification Using LSTM-Based Angular Proximity
G. Gelly, J.L. Gauvain
End-to-End Language Identification Using High-Order Utterance Representation with Bilinear Pooling
Ma Jin, Yan Song, Ian McLoughlin, Wu Guo, Li-Rong Dai
Dialect Recognition Based on Unsupervised Bottleneck Features
Qian Zhang, John H.L. Hansen
Investigating Scalability in Hierarchical Language Identification System
Saad Irtza, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Haizhou Li
Improving Sub-Phone Modeling for Better Native Language Identification with Non-Native English Speech
Yao Qian, Keelan Evanini, Xinhao Wang, David Suendermann-Oeft, Robert A. Pugh, Patrick L. Lange, Hillary R. Molloy, Frank K. Soong
QMDIS: QCRI-MIT Advanced Dialect Identification System
Sameer Khurana, Maryam Najafian, Ahmed Ali, Tuka Al Hanai, Yonatan Belinkov, James Glass
Detection of Replay Attacks Using Single Frequency Filtering Cepstral Coefficients
K.N.R.K. Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, Anil Kumar Vuppala
Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection
Hardik B. Sailor, Madhu R. Kamble, Hemant A. Patil
Independent Modelling of High and Low Energy Speech Frames for Spoofing Detection
Gajan Suthokumar, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Chamith Wijenayake, Eliathamby Ambikairajah
Improving Speaker Verification Performance in Presence of Spoofing Attacks Using Out-of-Domain Spoofed Data
Achintya Kr. Sarkar, Md. Sahidullah, Zheng-Hua Tan, Tomi Kinnunen
VoxCeleb: A Large-Scale Speaker Identification Dataset
Arsha Nagrani, Joon Son Chung, Andrew Zisserman
Call My Net Corpus: A Multilingual Corpus for Evaluation of Speaker Recognition Technology
Karen Jones, Stephanie Strassel, Kevin Walker, David Graff, Jonathan Wright
Sequence-to-Sequence Models Can Directly Translate Foreign Speech
Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen
Structured-Based Curriculum Learning for End-to-End English-Japanese Speech Translation
Takatomo Kano, Sakriani Sakti, Satoshi Nakamura
Assessing the Tolerance of Neural Machine Translation Systems Against Speech Recognition Errors
Nicholas Ruiz, Mattia Antonino Di Gangi, Nicola Bertoldi, Marcello Federico
Toward Expressive Speech Translation: A Unified Sequence-to-Sequence LSTMs Approach for Translating Words and Emphasis
Quoc Truong Do, Sakriani Sakti, Satoshi Nakamura
NMT-Based Segmentation and Punctuation Insertion for Real-Time Spoken Language Translation
Eunah Cho, Jan Niehues, Alex Waibel
Tight Integration of Spatial and Spectral Features for BSS with Deep Clustering Embeddings
Lukas Drude, Reinhold Haeb-Umbach
Speaker-Aware Neural Network Based Beamformer for Speaker Extraction in Speech Mixtures
Kateřina Žmolíková, Marc Delcroix, Keisuke Kinoshita, Takuya Higuchi, Atsunori Ogawa, Tomohiro Nakatani
Eigenvector-Based Speech Mask Estimation Using Logistic Regression
Lukas Pfeifenberger, Matthias Zöhrer, Franz Pernkopf
Real-Time Speech Enhancement with GCC-NMF
Sean U.N. Wood, Jean Rouat
Coherence-Based Dual-Channel Noise Reduction Algorithm in a Complex Noisy Environment
Youna Ji, Jun Byun, Young-cheol Park
Glottal Model Based Speech Beamforming for ad-hoc Microphone Arrays
Yang Zhang, Dinei Florêncio, Mark Hasegawa-Johnson
Acoustic Assessment of Disordered Voice with Continuous Speech Based on Utterance-Level ASR Posterior Features
Yuanyuan Liu, Tan Lee, P.C. Ching, Thomas K.T. Law, Kathy Y.S. Lee
Multi-Stage DNN Training for Automatic Recognition of Dysarthric Speech
Emre Yılmaz, Mario Ganzeboom, Catia Cucchiarini, Helmer Strik
Improving Child Speech Disorder Assessment by Incorporating Out-of-Domain Adult Speech
Daniel Smith, Alex Sneddon, Lauren Ward, Andreas Duenser, Jill Freyne, David Silvera-Tawil, Angela Morgan
On Improving Acoustic Models for TORGO Dysarthric Speech Database
Neethu Mariam Joy, S. Umesh, Basil Abraham
Glottal Source Features for Automatic Speech-Based Depression Assessment
Olympia Simantiraki, Paulos Charonyktakis, Anastasia Pampouchidou, Manolis Tsiknakis, Martin Cooke
Speech Processing Approach for Diagnosing Dementia in an Early Stage
Roozbeh Sadeghian, J. David Schaffer, Stephen A. Zahorian
Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic Signals
Fadi Biadsy, Mohammadreza Ghodsi, Diamantino Caseiro
Semi-Supervised Adaptation of RNNLMs by Fine-Tuning with Domain-Specific Auxiliary Features
Salil Deena, Raymond W.M. Ng, Pranava Madhyastha, Lucia Specia, Thomas Hain
Approximated and Domain-Adapted LSTM Language Models for First-Pass Decoding in Speech Recognition
Mittul Singh, Youssef Oualil, Dietrich Klakow
Sparse Non-Negative Matrix Language Modeling: Maximum Entropy Flexibility on the Cheap
Ciprian Chelba, Diamantino Caseiro, Fadi Biadsy
Multi-Scale Context Adaptation for Improving Child Automatic Speech Recognition in Child-Adult Spoken Interactions
Manoj Kumar, Daniel Bone, Kelly McWilliams, Shanna Williams, Thomas D. Lyon, Shrikanth S. Narayanan
Using Knowledge Graph and Search Query Click Logs in Statistical Language Model for Speech Recognition
Weiwu Zhu
Developing On-Line Speaker Diarization System
Dimitrios Dimitriadis, Petr Fousek
Comparison of Non-Parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech Processing
Shreyas Seshadri, Ulpu Remes, Okko Räsänen
Automatic Evaluation of Children Reading Aloud on Sentences and Pseudowords
Jorge Proença, Carla Lopes, Michael Tjalve, Andreas Stolcke, Sara Candeias, Fernando Perdigão
Off-Topic Spoken Response Detection with Word Embeddings
Su-Youn Yoon, Chong Min Lee, Ikkyu Choi, Xinhao Wang, Matthew Mulholland, Keelan Evanini
Improving Mispronunciation Detection for Non-Native Learners with Multisource Information and LSTM-Based Deep Models
Wei Li, Nancy F. Chen, Sabato Marco Siniscalchi, Chin-Hui Lee
Automatic Explanation Spot Estimation Method Targeted at Text and Figures in Lecture Slides
Shoko Tsujimura, Kazumasa Yamamoto, Seiichi Nakagawa
Multiview Representation Learning via Deep CCA for Silent Speech Recognition
Myungjong Kim, Beiming Cao, Ted Mau, Jun Wang
Use of Graphemic Lexicons for Spoken Language Assessment
K.M. Knill, Mark J.F. Gales, K. Kyriakopoulos, A. Ragni, Y. Wang
Distilling Knowledge from an Ensemble of Models for Punctuation Prediction
Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Ya Li
A Mostly Data-Driven Approach to Inverse Text Normalization
Ernest Pusateri, Bharat Ram Ambati, Elizabeth Brooks, Ondrej Platek, Donald McAllaster, Venki Nagesha
Mismatched Crowdsourcing from Multiple Annotator Languages for Recognizing Zero-Resourced Languages: A Nullspace Clustering Approach
Wenda Chen, Mark Hasegawa-Johnson, Nancy F. Chen, Boon Pang Lim
Experiments in Character-Level Neural Network Models for Punctuation
William Gale, Sarangarajan Parthasarathy
Multi-Channel Apollo Mission Speech Transcripts Calibration
Lakshmish Kaushik, Abhijeet Sangwan, John H.L. Hansen
Calibration Approaches for Language Detection
Mitchell McLaren, Luciana Ferrer, Diego Castan, Aaron Lawson
Bidirectional Modelling for Short Duration Language Identification
Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Julien Epps
Conditional Generative Adversarial Nets Classifier for Spoken Language Identification
Peng Shen, Xugang Lu, Sheng Li, Hisashi Kawai
Tied Hidden Factors in Neural Networks for End-to-End Speaker Recognition
Antonio Miguel, Jorge Llombart, Alfonso Ortega, Eduardo Lleida
Speaker Clustering by Iteratively Finding Discriminative Feature Space and Cluster Labels
Sungrack Yun, Hye Jin Jang, Taesu Kim
Domain Adaptation of PLDA Models in Broadcast Diarization by Means of Unsupervised Speaker Clustering
Ignacio Viñals, Alfonso Ortega, Jesús Villalba, Antonio Miguel, Eduardo Lleida
LSTM Neural Network-Based Speaker Segmentation Using Acoustic and Language Modelling
Miquel India, José A.R. Fonollosa, Javier Hernando
Acoustic Pairing of Original and Dubbed Voices in the Context of Video Game Localization
Adrien Gresse, Mickael Rouvier, Richard Dufour, Vincent Labatut, Jean-François Bonastre
Homogeneity Measure Impact on Target and Non-Target Trials in Forensic Voice Comparison
Moez Ajili, Jean-François Bonastre, Waad Ben Kheder, Solange Rossato, Juliette Kahn
Null-Hypothesis LLR: A Proposal for Forensic Automatic Speaker Recognition
Yosef A. Solewicz, Michael Jessen, David van der Vloed
The Opensesame NIST 2016 Speaker Recognition Evaluation System
Gang Liu, Qi Qian, Zhibin Wang, Qingen Zhao, Tianzhou Wang, Hao Li, Jian Xue, Shenghuo Zhu, Rong Jin, Tuo Zhao
IITG-Indigo System for NIST 2016 SRE Challenge
Nagendra Kumar, Rohan Kumar Das, Sarfaraz Jelil, Dhanush B.K., H. Kashyap, K. Sri Rama Murty, Sriram Ganapathy, Rohit Sinha, S.R. Mahadeva Prasanna
Locally Weighted Linear Discriminant Analysis for Robust Speaker Verification
Abhinav Misra, Shivesh Ranjan, John H.L. Hansen
Recursive Whitening Transformation for Speaker Recognition on Language Mismatched Condition
Suwon Shon, Seongkyu Mun, Hanseok Ko
Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings
Shane Settle, Keith Levin, Herman Kamper, Karen Livescu
Constructing Acoustic Distances Between Subwords and States Obtained from a Deep Neural Network for Spoken Term Detection
Daisuke Kaneko, Ryota Konno, Kazunori Kojima, Kazuyo Tanaka, Shi-wook Lee, Yoshiaki Itoh
Fast and Accurate OOV Decoder on High-Level Features
Yuri Khokhlov, Natalia Tomashenko, Ivan Medennikov, Aleksei Romanenko
Exploring the Use of Significant Words Language Modeling for Spoken Document Retrieval
Ying-Wen Chen, Kuan-Yu Chen, Hsin-Min Wang, Berlin Chen
Incorporating Acoustic Features for Spontaneous Speech Driven Content Retrieval
Hiroto Tasaki, Tomoyosi Akiba
Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification
Bo-Ru Lu, Frank Shyu, Yun-Nung Chen, Hung-Yi Lee, Lin-Shan Lee
Automatic Alignment Between Classroom Lecture Utterances and Slide Components
Masatoshi Tsuchiya, Ryo Minamiguchi
Compensating Gender Variability in Query-by-Example Search on Speech Using Voice Conversion
Paula Lopez-Otero, Laura Docio-Fernandez, Carmen Garcia-Mateo
Zero-Shot Learning Across Heterogeneous Overlapping Domains
Anjishnu Kumar, Pavankumar Reddy Muddireddy, Markus Dreyer, Björn Hoffmeister
Hierarchical Recurrent Neural Network for Story Segmentation
Emiru Tsunoo, Peter Bell, Steve Renals
Evaluating Automatic Topic Segmentation as a Segment Retrieval Task
Abdessalam Bouchekif, Delphine Charlet, Géraldine Damnati, Nathalie Camelin, Yannick Estève
Improving Speech Recognizers by Refining Broadcast Data with Inaccurate Subtitle Timestamps
Jeong-Uk Bang, Mu-Yeol Choi, Sang-Hun Kim, Oh-Wook Kwon
A Relevance Score Estimation for Spoken Term Detection Based on RNN-Generated Pronunciation Embeddings
Jan Švec, Josef V. Psutka, Luboš Šmídl, Jan Trmal
Predicting Automatic Speech Recognition Performance Over Communication Channels from Instrumental Speech Quality and Intelligibility Scores
Laura Fernández Gallardo, Sebastian Möller, John Beerends
Speech Intelligibility in Cars: The Effect of Speaking Style, Noise and Listener Age
Cassia Valentini Botinhao, Junichi Yamagishi
Predicting Speech Intelligibility Using a Gammachirp Envelope Distortion Index Based on the Signal-to-Distortion Ratio
Katsuhiko Yamamoto, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani
Intelligibilities of Mandarin Chinese Sentences with Spectral “Holes”
Yafan Chen, Yong Xu, Jun Yang
The Effect of Situation-Specific Non-Speech Acoustic Cues on the Intelligibility of Speech in Noise
Lauren Ward, Ben Shirley, Yan Tang, William J. Davies
On the Use of Band Importance Weighting in the Short-Time Objective Intelligibility Measure
Asger Heidemann Andersen, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen
Listening in the Dips: Comparing Relevant Features for Speech Recognition in Humans and Machines
Constantin Spille, Bernd T. Meyer
Mental Representation of Japanese Mora; Focusing on its Intrinsic Duration
Kosuke Sugai
Temporal Dynamics of Lateral Channel Formation in /l/: 3D EMA Data from Australian English
Jia Ying, Christopher Carignan, Jason A. Shaw, Michael Proctor, Donald Derrick, Catherine T. Best
Vowel and Consonant Sequences in three Bavarian Dialects of Austria
Nicola Klingler, Sylvia Moosmüller, Hannes Scheutz
Acoustic Cues to the Singleton-Geminate Contrast: The Case of Libyan Arabic Sonorants
Amel Issa
Mel-Cepstral Distortion of German Vowels in Different Information Density Contexts
Erika Brandt, Frank Zimmerer, Bistra Andreeva, Bernd Möbius
Effect of Formant and F0 Discontinuity on Perceived Vowel Duration: Impacts for Concatenative Speech Synthesis
Tomáš Bořil, Pavel Šturm, Radek Skarnitzl, Jan Volín
An Ultrasound Study of Alveolar and Retroflex Consonants in Arrernte: Stressed and Unstressed Syllables
Marija Tabain, Richard Beare
Reshaping the Transformed LF Model: Generating the Glottal Source from the Waveshape Parameter Rd
Christer Gobl
Kinematic Signatures of Prosody in Lombard Speech
Štefan Beňuš, Juraj Šimko, Mona Lehtinen
What do Finnish and Central Bavarian Have in Common? Towards an Acoustically Based Quantity Typology
Markus Jochim, Felicitas Kleber
Locating Burst Onsets Using SFF Envelope and Phase Information
Bhanu Teja Nellore, RaviShankar Prasad, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, B. Yegnanarayana
A Preliminary Phonetic Investigation of Alphabetic Words in Mandarin Chinese
Hongwei Ding, Yuanyuan Zhang, Hongchao Liu, Chu-Ren Huang
A Quantitative Measure of the Impact of Coarticulation on Phone Discriminability
Thomas Schatz, Rory Turnbull, Francis Bach, Emmanuel Dupoux
Sinusoidal Partials Tracking for Singing Analysis Using the Heuristic of the Minimal Frequency and Magnitude Difference
Kin Wah Edward Lin, Hans Anderson, Clifford So, Simon Lui
Audio Scene Classification with Deep Recurrent Neural Networks
Huy Phan, Philipp Koch, Fabrice Katzberg, Marco Maass, Radoslaw Mazur, Alfred Mertins
Automatic Time-Frequency Analysis of Echolocation Signals Using the Matched Gaussian Multitaper Spectrogram
Maria Sandsten, Isabella Reinhold, Josefin Starkhammar
Classification-Based Detection of Glottal Closure Instants from Speech Signals
Jindřich Matoušek, Daniel Tihelka
A Domain Knowledge-Assisted Nonlinear Model for Head-Related Transfer Functions Based on Bottleneck Deep Neural Network
Xiaoke Qi, Jianhua Tao
Laryngeal Articulation During Trumpet Performance: An Exploratory Study
Luis M.T. Jesus, Bruno Rocha, Andreia Hall
Matrix of Polynomials Model Based Polynomial Dictionary Learning Method for Acoustic Impulse Response Modeling
Jian Guan, Xuan Wang, Pengming Feng, Jing Dong, Wenwu Wang
Acoustic Scene Classification Using a CNN-SuperVector System Trained with Auditory and Spectrogram Image Features
Rakib Hyder, Shabnam Ghaffarzadegan, Zhe Feng, John H.L. Hansen, Taufiq Hasan
An Environmental Feature Representation for Robust Speech Recognition and for Environment Identification
Xue Feng, Brigitte Richardson, Scott Amman, James Glass
Attention and Localization Based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging
Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley
An Audio Based Piano Performance Evaluation Method Using Deep Neural Network Based Acoustic Modeling
Jing Pan, Ming Li, Zhanmei Song, Xin Li, Xiaolin Liu, Hua Yi, Manman Zhu
Music Tempo Estimation Using Sub-Band Synchrony
Shreyan Chowdhury, Tanaya Guha, Rajesh M. Hegde
A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal Classification
Yun Wang, Florian Metze
A Note Based Query By Humming System Using Convolutional Neural Network
Naziba Mostafa, Pascale Fung
Unsupervised Filterbank Learning Using Convolutional Restricted Boltzmann Machine for Environmental Sound Classification
Hardik B. Sailor, Dharmesh M. Agrawal, Hemant A. Patil
Novel Shifted Real Spectrum for Exact Signal Reconstruction
Meet H. Soni, Rishabh Tak, Hemant A. Patil
Manual and Automatic Transcriptions in Dementia Detection from Speech
Jochen Weiner, Mathis Engelbart, Tanja Schultz
An Affect Prediction Approach Through Depression Severity Parameter Incorporation in Neural Networks
Rahul Gupta, Saurabh Sahu, Carol Espy-Wilson, Shrikanth S. Narayanan
Cross-Database Models for the Classification of Dysarthria Presence
Stephanie Gillespie, Yash-Yee Logan, Elliot Moore, Jacqueline Laures-Gore, Scott Russell, Rupal Patel
Acoustic Evaluation of Nasality in Cerebellar Syndromes
M. Novotný, Jan Rusz, K. Spálenka, Jiří Klempíř, D. Horáková, Evžen Růžička
Emotional Speech of Mentally and Physically Disabled Individuals: Introducing the EmotAsS Database and First Findings
Simone Hantke, Hesam Sagha, Nicholas Cummins, Björn Schuller
Phonological Markers of Oxytocin and MDMA Ingestion
Carla Agurto, Raquel Norel, Rachel Ostrand, Gillinder Bedi, Harriet de Wit, Matthew J. Baggott, Matthew G. Kirkpatrick, Margaret Wardle, Guillermo A. Cecchi
An Avatar-Based System for Identifying Individuals Likely to Develop Dementia
Bahman Mirheidari, Daniel Blackburn, Kirsty Harkness, Traci Walker, Annalena Venneri, Markus Reuber, Heidi Christensen
Cross-Domain Classification of Drowsiness in Speech: The Case of Alcohol Intoxication and Sleep Deprivation
Yue Zhang, Felix Weninger, Björn Schuller
Depression Detection Using Automatic Transcriptions of De-Identified Speech
Paula Lopez-Otero, Laura Docio-Fernandez, Alberto Abad, Carmen Garcia-Mateo
An N-Gram Based Approach to the Automatic Diagnosis of Alzheimer’s Disease from Spoken Language
Sebastian Wankerl, Elmar Nöth, Stefan Evert
Exploiting Intra-Annotator Rating Consistency Through Copeland’s Method for Estimation of Ground Truth Labels in Couples’ Therapy
Karel Mundnich, Md. Nasir, Panayiotis Georgiou, Shrikanth S. Narayanan
Rhythmic Characteristics of Parkinsonian Speech: A Study on Mandarin and Polish
Massimo Pettorino, Wentao Gu, Paweł Półrola, Ping Fan
Trisyllabic Tone 3 Sandhi Patterns in Mandarin Produced by Cantonese Speakers
Jung-Yueh Tu, Janice Wing-Sze Wong, Jih-Ho Cha
Intonation of Contrastive Topic in Estonian
Heete Sahkai, Meelis Mihkla
Reanalyze Fundamental Frequency Peak Delay in Mandarin
Lixia Hao, Wei Zhang, Yanlu Xie, Jinsong Zhang
How Does the Absence of Shared Knowledge Between Interlocutors Affect the Production of French Prosodic Forms?
Amandine Michelas, Cecile Cau, Maud Champagne-Lavau
Three Dimensions of Sentence Prosody and Their (Non-)Interactions
Michael Wagner, Michael McAuliffe
Using Prosody to Classify Discourse Relations
Janine Kleinhans, Mireia Farrús, Agustín Gravano, Juan Manuel Pérez, Catherine Lai, Leo Wanner
Canonical Correlation Analysis and Prediction of Perceived Rhythmic Prominences and Pitch Tones in Speech
Elizabeth Godoy, James R. Williamson, Thomas F. Quatieri
Evaluation of Spectral Tilt Measures for Sentence Prominence Under Different Noise Conditions
Sofoklis Kakouros, Okko Räsänen, Paavo Alku
Creaky Voice as a Function of Tonal Categories and Prosodic Boundaries
Jianjing Kuang
The Acoustics of Word Stress in Czech as a Function of Speaking Style
Radek Skarnitzl, Anders Eriksson
What You See is What You Get Prosodically Less — Visibility Shapes Prosodic Prominence Production in Spontaneous Interaction
Petra Wagner, Nataliya Bryhadyr
Focus Acoustics in Mandarin Nominals
Yu-Yin Hsu, Anqi Xu
Exploring Multidimensionality: Acoustic and Articulatory Correlates of Swedish Word Accents
Malin Svensson Lundmark, Gilbert Ambrazaitis, Otto Ewald
The Perception of English Intonation Patterns by German L2 Speakers of English
Karin Puga, Robert Fuchs, Jane Setter, Peggy Mok
The Perception of Emotions in Noisified Nonsense Speech
Emilia Parada-Cabaleiro, Alice Baird, Anton Batliner, Nicholas Cummins, Simone Hantke, Björn Schuller
Attention Networks for Modeling Behaviors in Addiction Counseling
James Gibson, Doğan Can, Panayiotis Georgiou, David C. Atkins, Shrikanth S. Narayanan
Computational Analysis of Acoustic Descriptors in Psychotic Patients
Torsten Wörtwein, Tadas Baltrušaitis, Eugene Laksana, Luciana Pennant, Elizabeth S. Liebson, Dost Öngür, Justin T. Baker, Louis-Philippe Morency
Modeling Perceivers Neural-Responses Using Lobe-Dependent Convolutional Neural Network to Improve Speech Emotion Recognition
Ya-Tse Wu, Hsuan-Yu Chen, Yu-Hsien Liao, Li-Wei Kuo, Chi-Chun Lee
Implementing Gender-Dependent Vowel-Level Analysis for Boosting Speech-Based Depression Recognition
Bogdan Vlasenko, Hesam Sagha, Nicholas Cummins, Björn Schuller
Bilingual Word Embeddings for Cross-Lingual Personality Recognition Using Convolutional Neural Nets
Farhad Bin Siddique, Pascale Fung
Emotion Category Mapping to Emotional Space by Cross-Corpus Emotion Labeling
Yoshiko Arimoto, Hiroki Mori
Big Five vs. Prosodic Features as Cues to Detect Abnormality in SSPNET-Personality Corpus
Cedric Fayet, Arnaud Delhay, Damien Lolive, Pierre-François Marteau
Speech Rate Comparison When Talking to a System and Talking to a Human: A Study from a Speech-to-Speech, Machine Translation Mediated Map Task
Hayakawa Akira, Carl Vogel, Saturnino Luz, Nick Campbell
Approaching Human Performance in Behavior Estimation in Couples Therapy Using Deep Sentence Embeddings
Shao-Yen Tseng, Brian Baucom, Panayiotis Georgiou
Complexity in Speech and its Relation to Emotional Bond in Therapist-Patient Interactions During Suicide Risk Assessment Interviews
Md. Nasir, Brian Baucom, Craig J. Bryan, Shrikanth S. Narayanan, Panayiotis Georgiou
An Investigation of Emotion Dynamics and Kalman Filtering for Speech-Based Emotion Prediction
Zhaocheng Huang, Julien Epps
Zero-Shot Learning for Natural Language Understanding Using Domain-Independent Sequential Structure and Question Types
Kugatsu Sadamitsu, Yukinori Homma, Ryuichiro Higashinaka, Yoshihiro Matsuo
Parallel Hierarchical Attention Networks with Shared Memory Reader for Multi-Stream Conversational Document Classification
Naoki Sawada, Ryo Masumura, Hiromitsu Nishizaki
Internal Memory Gate for Recurrent Neural Networks with Application to Spoken Language Understanding
Mohamed Morchid
Character-Based Embedding Models and Reranking Strategies for Understanding Natural Language Meal Descriptions
Mandy Korpusik, Zachary Collins, James Glass
Quaternion Denoising Encoder-Decoder for Theme Identification of Telephone Conversations
Titouan Parcollet, Mohamed Morchid, Georges Linarès
ASR Error Management for Improving Spoken Language Understanding
Edwin Simonnet, Sahar Ghannay, Nathalie Camelin, Yannick Estève, Renato De Mori
Jointly Trained Sequential Labeling and Classification by Sparse Attention Neural Networks
Mingbo Ma, Kai Zhao, Liang Huang, Bing Xiang, Bowen Zhou
To Plan or not to Plan? Discourse Planning in Slot-Value Informed Sequence to Sequence Models for Language Generation
Neha Nayak, Dilek Hakkani-Tür, Marilyn Walker, Larry Heck
Online Adaptation of an Attention-Based Neural Network for Natural Language Generation
Matthieu Riou, Bassam Jabaian, Stéphane Huet, Fabrice Lefèvre
Spanish Sign Language Recognition with Different Topology Hidden Markov Models
Carlos-D. Martínez-Hinarejos, Zuzanna Parcheta
OpenMM: An Open-Source Multimodal Feature Extraction Tool
Michelle Renee Morales, Stefan Scherer, Rivka Levitan
Speaker Dependency Analysis, Audiovisual Fusion Cues and a Multimodal BLSTM for Conversational Engagement Recognition
Yuyun Huang, Emer Gilmartin, Nick Campbell
Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks
Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang
CAB: An Energy-Based Speaker Clustering Model for Rapid Adaptation in Non-Parallel Voice Conversion
Toru Nakashika
Phoneme-Discriminative Features for Dysarthric Speech Conversion
Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki
Denoising Recurrent Neural Network for Deep Bidirectional LSTM Based Voice Conversion
Jie Wu, D.-Y. Huang, Lei Xie, Haizhou Li
Speaker Dependent Approach for Enhancing a Glossectomy Patient’s Speech via GMM-Based Voice Conversion
Kei Tanaka, Sunao Hara, Masanobu Abe, Masaaki Sato, Shogo Minagi
Generative Adversarial Network-Based Postfilter for STFT Spectrograms
Takuhiro Kaneko, Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi
Generative Adversarial Network-Based Glottal Waveform Model for Statistical Parametric Speech Synthesis
Bajibabu Bollepalli, Lauri Juvela, Paavo Alku
Emotional Voice Conversion with Adaptive Scales F0 Based on Wavelet Transform Using Limited Amount of Emotional Data
Zhaojie Luo, Jinhui Chen, Tetsuya Takiguchi, Yasuo Ariki
Speaker Adaptation in DNN-Based Speech Synthesis Using d-Vectors
Rama Doddipatla, Norbert Braunschweiler, Ranniery Maia
Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion
Runnan Li, Zhiyong Wu, Yishuang Ning, Lifa Sun, Helen Meng, Lianhong Cai
Segment Level Voice Conversion with Recurrent Neural Networks
Miguel Varela Ramos, Alan W. Black, Ramon Fernandez Astudillo, Isabel Trancoso, Nuno Fonseca
Creating a Voice for MiRo, the World’s First Commercial Biomimetic Robot
Roger K. Moore, Ben Mitchinson
A Thematicity-Based Prosody Enrichment Tool for CTS
Mónica Domínguez, Mireia Farrús, Leo Wanner
WebSubDub — Experimental System for Creating High-Quality Alternative Audio Track for TV Broadcasting
Martin Grůber, Jindřich Matoušek, Zdeněk Hanzlíček, Jakub Vít, Daniel Tihelka
Voice Conservation and TTS System for People Facing Total Laryngectomy
Markéta Jůzová, Daniel Tihelka, Jindřich Matoušek, Zdeněk Hanzlíček
TBT (Toolkit to Build TTS): A High Performance Framework to Build Multiple Language HTS Voice
Atish Shankar Ghone, Rachana Nerpagar, Pranaw Kumar, Arun Baby, Aswin Shanmugam, Sasikumar M., Hema A. Murthy
SIAK — A Game for Foreign Language Pronunciation Learning
Reima Karhila, Sari Ylinen, Seppo Enarvi, Kalle Palomäki, Aleksander Nikulin, Olli Rantula, Vertti Viitanen, Krupakar Dhinakaran, Anna-Riikka Smolander, Heini Kallio, Katja Junttila, Maria Uther, Perttu Hämäläinen, Mikko Kurimo
Integrating the Talkamatic Dialogue Manager with Alexa
Staffan Larsson, Alex Berman, Andreas Krona, Fredrik Kronlid
A Robust Medical Speech-to-Speech/Speech-to-Sign Phraselator
Farhia Ahmed, Pierrette Bouillon, Chelle Destefano, Johanna Gerlach, Sonia Halimi, Angela Hooper, Manny Rayner, Hervé Spechbach, Irene Strasly, Nikos Tsourakis
Towards an Autarkic Embedded Cognitive User Interface
Frank Duckhorn, Markus Huber, Werner Meyer, Oliver Jokisch, Constanze Tschöpe, Matthias Wolff
Nora the Empathetic Psychologist
Genta Indra Winata, Onno Kampman, Yang Yang, Anik Dey, Pascale Fung
Modifying Amazon’s Alexa ASR Grammar and Lexicon — A Case Study
Hassan Alam, Aman Kumar, Manan Vyas, Tina Werner, Rachmat Hartono
The INTERSPEECH 2017 Computational Paralinguistics Challenge: Addressee, Cold & Snoring
Björn Schuller, Stefan Steidl, Anton Batliner, Elika Bergelson, Jarek Krajewski, Christoph Janott, Andrei Amatuni, Marisa Casillas, Amanda Seidl, Melanie Soderstrom, Anne S. Warlaumont, Guillermo Hidalgo, Sebastian Schnieder, Clemens Heiser, Winfried Hohenhorst, Michael Herzog, Maximilian Schmitt, Kun Qian, Yue Zhang, George Trigeorgis, Panagiotis Tzirakis, Stefanos Zafeiriou
Description of the Upper Respiratory Tract Infection Corpus (URTIC)
Jarek Krajewski, Sebastian Schnieder, Anton Batliner
Description of the Munich-Passau Snore Sound Corpus (MPSSC)
Christoph Janott, Anton Batliner
Description of the Homebank Child/Adult Addressee Corpus (HB-CHAAC)
Elika Bergelson, Andrei Amatuni, Marisa Casillas, Amanda Seidl, Melanie Soderstrom, Anne S. Warlaumont
It Sounds Like You Have a Cold! Testing Voice Features for the Interspeech 2017 Computational Paralinguistics Cold Challenge
Mark Huckvale, András Beke
End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum
Danwei Cai, Zhidong Ni, Wenbo Liu, Weicheng Cai, Gang Li, Ming Li
Infected Phonemes: How a Cold Impairs Speech on a Phonetic Level
Johannes Wagner, Thiago Fraga-Silva, Yvan Josse, Dominik Schiller, Andreas Seiderer, Elisabeth André
Phoneme State Posteriorgram Features for Speech Based Automatic Classification of Speakers in Cold and Healthy Condition
Akshay Kalkunte Suresh, Srinivasa Raghavan K.M., Prasanta Kumar Ghosh
An Integrated Solution for Snoring Sound Classification Using Bhattacharyya Distance Based GMM Supervectors with SVM, Feature Selection with Random Forest and Spectrogram with CNN
Tin Lay Nwe, Huy Dat Tran, Wen Zheng Terence Ng, Bin Ma
Acoustic Analysis of Detailed Three-Dimensional Shape of the Human Nasal Cavity and Paranasal Sinuses
Tatsuya Kitamura, Hironori Takemoto, Hisanori Makinae, Tetsutaro Yamaguchi, Kotaro Maki
A Semi-Polar Grid Strategy for the Three-Dimensional Finite Element Simulation of Vowel-Vowel Sequences
Marc Arnela, Saeed Dabbaghchian, Oriol Guasch, Olov Engwall
A Fast Robust 1D Flow Model for a Self-Oscillating Coupled 2D FEM Vocal Fold Simulation
Arvind Vasudevan, Victor Zappi, Peter Anderson, Sidney Fels
Waveform Patterns in Pitch Glides Near a Vocal Tract Resonance
Tiina Murtola, Jarmo Malinen
A Unified Numerical Simulation of Vowel Production That Comprises Phonation and the Emitted Sound
Niyazi Cem Degirmenci, Johan Jansson, Johan Hoffman, Marc Arnela, Patricia Sánchez-Martín, Oriol Guasch, Sten Ternström
Synthesis of VV Utterances from Muscle Activation to Sound with a 3D Model
Saeed Dabbaghchian, Marc Arnela, Olov Engwall, Oriol Guasch
A Dual Source-Filter Model of Snore Audio for Snorer Group Classification
Achuth Rao M.V., Shivani Yadav, Prasanta Kumar Ghosh
An ‘End-to-Evolution’ Hybrid Approach for Snore Sound Classification
Michael Freitag, Shahin Amiriparian, Nicholas Cummins, Maurice Gerczuk, Björn Schuller
Snore Sound Classification Using Image-Based Deep Spectrum Features
Shahin Amiriparian, Maurice Gerczuk, Sandra Ottl, Nicholas Cummins, Michael Freitag, Sergey Pugachevskiy, Alice Baird, Björn Schuller
Exploring Fusion Methods and Feature Space for the Classification of Paralinguistic Information
David Tavarez, Xabier Sarasola, Agustin Alonso, Jon Sanchez, Luis Serrano, Eva Navas, Inma Hernáez
DNN-Based Feature Extraction and Classifier Combination for Child-Directed Speech, Cold and Snoring Identification
Gábor Gosztolya, Róbert Busa-Fekete, Tamás Grósz, László Tóth
Introducing Weighted Kernel Classifiers for Handling Imbalanced Paralinguistic Corpora: Snoring, Addressee and Cold
Heysem Kaya, Alexey A. Karpov
The INTERSPEECH 2017 Computational Paralinguistics Challenge: A Summary of Results
Stefan Steidl
Discussion
Björn Schuller, Anton Batliner
Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition
Shubham Toshniwal, Hao Tang, Liang Lu, Karen Livescu
Optimizing Expected Word Error Rate via Sampling for Speech Recognition
Matt Shannon
Annealed f-Smoothing as a Mechanism to Speed up Neural Network Training
Tara N. Sainath, Vijayaditya Peddinti, Olivier Siohan, Arun Narayanan
Non-Uniform MCE Training of Deep Long Short-Term Memory Recurrent Neural Networks for Keyword Spotting
Zhong Meng, Biing-Hwang Juang
Exploiting Eigenposteriors for Semi-Supervised Training of DNN Acoustic Models with Sequence Discrimination
Pranay Dighe, Afsaneh Asaei, Hervé Bourlard
Discriminative Autoencoders for Acoustic Modeling
Ming-Han Yang, Hung-Shin Lee, Yu-Ding Lu, Kuan-Yu Chen, Yu Tsao, Berlin Chen, Hsin-Min Wang
Speaker Diarization Using Convolutional Neural Network for Statistics Accumulation Refinement
Zbyněk Zajíc, Marek Hrúz, Luděk Müller
Speaker2Vec: Unsupervised Learning and Adaptation of a Speaker Manifold Using Deep Neural Networks with an Evaluation on Speaker Segmentation
Arindam Jati, Panayiotis Georgiou
A Triplet Ranking-Based Neural Network for Speaker Diarization and Linking
Gaël Le Lan, Delphine Charlet, Anthony Larcher, Sylvain Meignier
Estimating Speaker Clustering Quality Using Logistic Regression
Yishai Cohen, Itshak Lapidot
Combining Speaker Turn Embedding and Incremental Structure Prediction for Low-Latency Speaker Diarization
Guillaume Wisniewski, Hervé Bredin, G. Gelly, Claude Barras
pyannote.metrics: A Toolkit for Reproducible Evaluation, Diagnostic, and Error Analysis of Speaker Diarization Systems
Hervé Bredin
A Rescoring Approach for Keyword Search Using Lattice Context Information
Zhipeng Chen, Ji Wu
The Kaldi OpenKWS System: Improving Low Resource Keyword Search
Jan Trmal, Matthew Wiesner, Vijayaditya Peddinti, Xiaohui Zhang, Pegah Ghahremani, Yiming Wang, Vimal Manohar, Hainan Xu, Daniel Povey, Sanjeev Khudanpur
The STC Keyword Search System for OpenKWS 2016 Evaluation
Yuri Khokhlov, Ivan Medennikov, Aleksei Romanenko, Valentin Mendelev, Maxim Korenevsky, Alexey Prudnikov, Natalia Tomashenko, Alexander Zatvornitsky
Compressed Time Delay Neural Network for Small-Footprint Keyword Spotting
Ming Sun, David Snyder, Yixin Gao, Varun Nagaraja, Mike Rodehorst, Sankaran Panchapagesan, Nikko Strom, Spyros Matsoukas, Shiv Vitaladevuni
Symbol Sequence Search from Telephone Conversation
Masayuki Suzuki, Gakuto Kurata, Abhinav Sethy, Bhuvana Ramabhadran, Kenneth W. Church, Mark Drake
Similarity Learning Based Query Modeling for Keyword Search
Batuhan Gundogdu, Murat Saraclar
Deep Recurrent Neural Network Based Monaural Speech Separation Using Recurrent Temporal Restricted Boltzmann Machines
Suman Samui, Indrajit Chakrabarti, Soumya K. Ghosh
Improved Codebook-Based Speech Enhancement Based on MBE Model
Qizheng Huang, Changchun Bao, Xianyun Wang
Improving Mask Learning Based Speech Enhancement System with Restoration Layers and Residual Connection
Zhuo Chen, Yan Huang, Jinyu Li, Yifan Gong
Exploring Low-Dimensional Structures of Modulation Spectra for Robust Speech Recognition
Bi-Cheng Yan, Chin-Hong Shih, Shih-Hung Liu, Berlin Chen
SEGAN: Speech Enhancement Generative Adversarial Network
Santiago Pascual, Antonio Bonafonte, Joan Serrà
Concatenative Resynthesis Using Twin Networks
Soumi Maiti, Michael I. Mandel
Combining Residual Networks with LSTMs for Lipreading
Themos Stafylakis, Georgios Tzimiropoulos
Improving Computer Lipreading via DNN Sequence Discriminative Training Techniques
Kwanchiva Thangthai, Richard Harvey
Improving Speaker-Independent Lipreading with Domain-Adversarial Training
Michael Wand, Jürgen Schmidhuber
Turbo Decoders for Audio-Visual Continuous Speech Recognition
Ahmed Hussen Abdelaziz
DNN-Based Ultrasound-to-Speech Conversion for a Silent Speech Interface
Tamás Gábor Csapó, Tamás Grósz, Gábor Gosztolya, László Tóth, Alexandra Markó
Visually Grounded Learning of Keyword Prediction from Untranscribed Speech
Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu
Deep Neural Factorization for Speech Recognition
Jen-Tzung Chien, Chen Shen
Semi-Supervised DNN Training with Word Selection for ASR
Karel Veselý, Lukáš Burget, Jan Černocký
Gaussian Prediction Based Attention for Online End-to-End Speech Recognition
Junfeng Hou, Shiliang Zhang, Li-Rong Dai
Efficient Knowledge Distillation from an Ensemble of Teachers
Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, Samuel Thomas, Jia Cui, Bhuvana Ramabhadran
An Analysis of “Attention” in Sequence-to-Sequence Models
Rohit Prabhavalkar, Tara N. Sainath, Bo Li, Kanishka Rao, Navdeep Jaitly
Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition
Hagen Soltau, Hank Liao, Haşim Sak
CNN-Based Joint Mapping of Short and Long Utterance i-Vectors for Speaker Verification Using Short Utterances
Jinxi Guo, Usha Amrutha Nookala, Abeer Alwan
Curriculum Learning Based Probabilistic Linear Discriminant Analysis for Noise Robust Speaker Recognition
Shivesh Ranjan, Abhinav Misra, John H.L. Hansen
i-Vector Transformation Using a Novel Discriminative Denoising Autoencoder for Noise-Robust Speaker Recognition
Shivangi Mahto, Hitoshi Yamamoto, Takafumi Koshinaka
Unsupervised Discriminative Training of PLDA for Domain Adaptation in Speaker Verification
Qiongqiong Wang, Takafumi Koshinaka
Speaker Verification Under Adverse Conditions Using i-Vector Adaptation and Neural Networks
Jahangir Alam, Patrick Kenny, Gautam Bhattacharya, Marcel Kockmann
Improving Robustness of Speaker Recognition to New Conditions Using Unlabeled Data
Diego Castan, Mitchell McLaren, Luciana Ferrer, Aaron Lawson, Alicia Lozano-Diez
CALYOU: A Comparable Spoken Algerian Corpus Harvested from YouTube
K. Abidi, M.A. Menacer, Kamel Smaïli
PRAV: A Phonetically Rich Audio Visual Corpus
Abhishek Narwekar, Prasanta Kumar Ghosh
NTCD-TIMIT: A New Database and Baseline for Noise-Robust Audio-Visual Speech Recognition
Ahmed Hussen Abdelaziz
The Extended SPaRKy Restaurant Corpus: Designing a Corpus with Variable Information Density
David M. Howcroft, Dietrich Klakow, Vera Demberg
Automatic Construction of the Finnish Parliament Speech Corpus
André Mansikkaniemi, Peter Smit, Mikko Kurimo
Building Audio-Visual Phonetically Annotated Arabic Corpus for Expressive Text to Speech
Omnia Abdo, Sherif Abdou, Mervat Fashal
What is the Relevant Population? Considerations for the Computation of Likelihood Ratios in Forensic Voice Comparison
Vincent Hughes, Paul Foulkes
Voice Disguise vs. Impersonation: Acoustic and Perceptual Measurements of Vocal Flexibility in Non Experts
Véronique Delvaux, Lise Caucheteux, Kathy Huet, Myriam Piccaluga, Bernard Harmegnies
Schwa Realization in French: Using Automatic Speech Processing to Study Phonological and Socio-Linguistic Factors in Large Corpora
Yaru Wu, Martine Adda-Decker, Cécile Fougeron, Lori Lamel
The Social Life of Setswana Ejectives
Daniel Duran, Jagoda Bruni, Grzegorz Dogil, Justus Roux
How Long is Too Long? How Pause Features After Requests Affect the Perceived Willingness of Affirmative Answers
Lea S. Kohtz, Oliver Niebuhr
Shadowing Synthesized Speech — Segmental Analysis of Phonetic Convergence
Iona Gessinger, Eran Raveh, Sébastien Le Maguer, Bernd Möbius, Ingmar Steiner
Occupancy Detection in Commercial and Residential Environments Using Audio Signal
Shabnam Ghaffarzadegan, Attila Reiss, Mirko Ruhs, Robert Duerichen, Zhe Feng
Data Augmentation, Missing Feature Mask and Kernel Classification for Through-the-Wall Acoustic Surveillance
Huy Dat Tran, Wen Zheng Terence Ng, Yi Ren Leng
Endpoint Detection Using Grid Long Short-Term Memory Networks for Streaming Speech Recognition
Shuo-Yiin Chang, Bo Li, Tara N. Sainath, Gabor Simko, Carolina Parada
Deep Learning Techniques in Tandem with Signal Processing Cues for Phonetic Segmentation for Text to Speech Synthesis in Indian Languages
Arun Baby, Jeena J. Prakash, Rupak Vignesh, Hema A. Murthy
Gate Activation Signal Analysis for Gated Recurrent Neural Networks and its Correlation with Phoneme Boundaries
Yu-Hsuan Wang, Cheng-Tao Chung, Hung-Yi Lee
Speaker Change Detection in Broadcast TV Using Bidirectional Long Short-Term Memory Networks
Ruiqing Yin, Hervé Bredin, Claude Barras
Improved Automatic Speech Recognition Using Subband Temporal Envelope Features and Time-Delay Neural Network Denoising Autoencoder
Cong-Thanh Do, Yannis Stylianou
Factored Deep Convolutional Neural Networks for Noise Robust Speech Recognition
Masakiyo Fujimoto
Global SNR Estimation of Speech Signals for Unknown Noise Conditions Using Noise Adapted Non-Linear Regression
Pavlos Papadopoulos, Ruchir Travadi, Shrikanth S. Narayanan
Joint Training of Multi-Channel-Condition Dereverberation and Acoustic Modeling of Microphone Array Speech for Robust Distant Speech Recognition
Fengpei Ge, Kehuang Li, Bo Wu, Sabato Marco Siniscalchi, Yonghong Yan, Chin-Hui Lee
Uncertainty Decoding with Adaptive Sampling for Noise Robust DNN-Based Acoustic Modeling
Dung T. Tran, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani
Attention-Based LSTM with Multi-Task Learning for Distant Speech Recognition
Yu Zhang, Pengyuan Zhang, Yonghong Yan
To Improve the Robustness of LSTM-RNN Acoustic Models Using Higher-Order Feedback from Multiple Histories
Hengguan Huang, Brian Mak
End-to-End Speech Recognition with Auditory Attention for Multi-Microphone Distance Speech Recognition
Suyoun Kim, Ian Lane
Robust Speech Recognition Based on Binaural Auditory Processing
Anjali Menon, Chanwoo Kim, Richard M. Stern
Adaptive Multichannel Dereverberation for Automatic Speech Recognition
Joe Caroselli, Izhak Shafran, Arun Narayanan, Richard Rose
The Effects of Real and Placebo Alcohol on Deaffrication
Urban Zihlmann
Polyglot and Speech Corpus Tools: A System for Representing, Integrating, and Querying Speech Corpora
Michael McAuliffe, Elias Stengel-Eskin, Michaela Socolof, Morgan Sonderegger
Mapping Across Feature Spaces in Forensic Voice Comparison: The Contribution of Auditory-Based Voice Quality to (Semi-)Automatic System Testing
Vincent Hughes, Philip Harrison, Paul Foulkes, Peter French, Colleen Kavanagh, Eugenia San Segundo
Effect of Language, Speaking Style and Speaker on Long-Term F0 Estimation
Pablo Arantes, Anders Eriksson, Suska Gutzeit
Stability of Prosodic Characteristics Across Age and Gender Groups
Jan Volín, Tereza Tykalová, Tomáš Bořil
Electrophysiological Correlates of Familiar Voice Recognition
Julien Plante-Hébert, Victor J. Boucher, Boutheina Jemel
Developing an Embosi (Bantu C25) Speech Variant Dictionary to Model Vowel Elision and Morpheme Deletion
Jamison Cooper-Leavitt, Lori Lamel, Annie Rialland, Martine Adda-Decker, Gilles Adda
Rd as a Control Parameter to Explore Affective Correlates of the Tense-Lax Continuum
Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl
Cross-Linguistic Distinctions Between Professional and Non-Professional Speaking Styles
Plínio A. Barbosa, Sandra Madureira, Philippe Boula de Mareüil
Perception and Production of Word-Final /ʁ/ in French
Cedric Gendrot
Glottal Source Estimation from Coded Telephone Speech Using a Deep Neural Network
N.P. Narendra, Manu Airaksinen, Paavo Alku
Automatic Labelling of Prosodic Prominence, Phrasing and Disfluencies in French Speech by Simulating the Perception of Naïve and Expert Listeners
George Christodoulides, Mathieu Avanzi, Anne Catherine Simon
Don’t Count on ASR to Transcribe for You: Breaking Bias with Two Crowds
Michael Levit, Yan Huang, Shuangyu Chang, Yifan Gong
Effects of Training Data Variety in Generating Glottal Pulses from Acoustic Features with DNNs
Manu Airaksinen, Paavo Alku
Towards Intelligent Crowdsourcing for Audio Data Annotation: Integrating Active Learning in the Real World
Simone Hantke, Zixing Zhang, Björn Schuller
Principles for Learning Controllable TTS from Annotated and Latent Variation
Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Junichi Yamagishi
Sampling-Based Speech Parameter Generation Using Moment-Matching Networks
Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari
Unit Selection with Hierarchical Cascaded Long Short Term Memory Bidirectional Recurrent Neural Nets
Vincent Pollet, Enrico Zovato, Sufian Irhimeh, Pier Batzu
Utterance Selection for Optimizing Intelligibility of TTS Voices Trained on ASR Data
Erica Cooper, Xinyue Wang, Alison Chang, Yocheved Levitan, Julia Hirschberg
Bias and Statistical Significance in Evaluating Speech Synthesis with Mean Opinion Scores
Andrew Rosenberg, Bhuvana Ramabhadran
Phase Modeling Using Integrated Linear Prediction Residual for Statistical Parametric Speech Synthesis
Nagaraj Adiga, S.R. Mahadeva Prasanna
Evaluation of a Silent Speech Interface Based on Magnetic Sensing and Deep Learning for a Phonetically Rich Vocabulary
Jose A. Gonzalez, Lam A. Cheah, Phil D. Green, James M. Gilbert, Stephen R. Ell, Roger K. Moore, Ed Holdsworth
Predicting Head Pose from Speech with a Conditional Variational Autoencoder
David Greenwood, Stephen Laycock, Iain Matthews
Real-Time Reactive Speech Synthesis: Incorporating Interruptions
Mirjam Wester, David A. Braude, Blaise Potard, Matthew P. Aylett, Francesca Shaw
A Neural Parametric Singing Synthesizer
Merlijn Blaauw, Jordi Bonada
Tacotron: Towards End-to-End Speech Synthesis
Yuxuan Wang, R.J. Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous
Siri On-Device Deep Learning-Guided Unit Selection Text-to-Speech System
Tim Capes, Paul Coles, Alistair Conkie, Ladan Golipour, Abie Hadjitarkhani, Qiong Hu, Nancy Huddleston, Melvyn Hunt, Jiangchuan Li, Matthias Neeracher, Kishore Prahallad, Tuomo Raitio, Ramya Rasipuram, Greg Townsend, Becci Williamson, David Winarsky, Zhizheng Wu, Hepeng Zhang
An Expanded Taxonomy of Semiotic Classes for Text Normalization
Daan van Esch, Richard Sproat
Complex-Valued Restricted Boltzmann Machine for Direct Learning of Frequency Spectra
Toru Nakashika, Shinji Takaki, Junichi Yamagishi
Soundtracing for Realtime Speech Adjustment to Environmental Conditions in 3D Simulations
Bartosz Ziółko, Tomasz Pędzimąż, Szymon Pałka
Vocal-Tract Model with Static Articulators: Lips, Teeth, Tongue, and More
Takayuki Arai
Remote Articulation Test System Based on WebRTC
Ikuyo Masuda-Katsuse
The ModelTalker Project: A Web-Based Voice Banking Pipeline for ALS/MND Patients
H. Timothy Bunnell, Jason Lilley, Kathleen McGrath
Visible Vowels: A Tool for the Visualization of Vowel Variation
Wilbert Heeringa, Hans Van de Velde