doi: 10.21437/Interspeech.2016
ISSN: 2958-1796
A 50-Year Retrospective on Speech and Language Processing
John Makhoul
Improving English Conversational Telephone Speech Recognition
Ivan Medennikov, Alexey Prudnikov, Alexander Zatvornitskiy
The IBM 2016 English Conversational Telephone Speech Recognition System
George Saon, Tom Sercu, Steven Rennie, Hong-Kwang J. Kuo
Small-Footprint Deep Neural Networks with Highway Connections for Speech Recognition
Liang Lu, Steve Renals
Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention
Dong Yu, Wayne Xiong, Jasha Droppo, Andreas Stolcke, Guoli Ye, Jinyu Li, Geoffrey Zweig
Lower Frame Rate Neural Network Acoustic Models
Golan Pundak, Tara N. Sainath
Improved Neural Network Initialization by Grouping Context-Dependent Targets for Acoustic Modeling
Gakuto Kurata, Brian Kingsbury
Automatic Scoring of Monologue Video Interviews Using Multimodal Cues
Lei Chen, Gary Feng, Michelle Martin-Raugh, Chee Wee Leong, Christopher Kitchen, Su-Youn Yoon, Blair Lehman, Harrison Kell, Chong Min Lee
The Sound of Disgust: How Facial Expression May Influence Speech Production
Chee Seng Chong, Jeesun Kim, Chris Davis
Analyzing Temporal Dynamics of Dyadic Synchrony in Affective Interactions
Zhaojun Yang, Shrikanth S. Narayanan
Audiovisual Speech Scene Analysis in the Context of Competing Sources
Attigodu C. Ganesh, Frédéric Berthommier, Jean-Luc Schwartz
Head Motion Generation with Synthetic Speech: A Data Driven Approach
Najmeh Sadoughi, Carlos Busso
The Consistency and Stability of Acoustic and Visual Cues for Different Prosodic Attitudes
Jeesun Kim, Chris Davis
Introduction to Poster Presentation of Part II
Jeesun Kim, Gérard Bailly
The Unit of Speech Encoding: The Case of Romanian
Irene Vogel, Laura Spinu
The Perceptual Effect of L1 Prosody Transplantation on L2 Speech: The Case of French Accented German
Jeanin Jügler, Frank Zimmerer, Jürgen Trouvain, Bernd Möbius
Organizing Syllables into Sandhi Domains — Evidence from F0 and Duration Patterns in Shanghai Chinese
Bijun Ling, Jie Liang
Automatic Analysis of Phonetic Speech Style Dimensions
Neville Ryant, Mark Liberman
The Acoustic Manifestation of Prominence in Stressless Languages
Angeliki Athanasopoulou, Irene Vogel
The Rhythmic Constraint on Prosodic Boundaries in Mandarin Chinese Based on Corpora of Silent Reading and Speech Perception
Wei Lai, Jiahong Yuan, Ya Li, Xiaoying Xu, Mark Liberman
Toward Development and Evaluation of Pain Level-Rating Scale for Emergency Triage based on Vocal Characteristics and Facial Expressions
Fu-Sheng Tsai, Ya-Ling Hsu, Wei-Chen Chen, Yi-Ming Weng, Chip-Jin Ng, Chi-Chun Lee
Predicting Severity of Voice Disorder from DNN-HMM Acoustic Posteriors
Tan Lee, Yuanyuan Liu, Yu Ting Yeung, Thomas K.T. Law, Kathy Y.S. Lee
Long-Term Stability of Tracheoesophageal Voices
Klaske E. van Sluis, Michiel W.M. van den Brekel, Frans J.M. Hilgers, Rob J.J.H. van Son
Detecting Mild Cognitive Impairment from Spontaneous Speech by Correlation-Based Phonetic Feature Selection
Gábor Gosztolya, László Tóth, Tamás Grósz, Veronika Vincze, Ildikó Hoffmann, Gréta Szatlóczki, Magdolna Pákáski, János Kálmán
Towards an Automated Screening Tool for Developmental Speech and Language Impairments
Jen J. Gong, Maryann Gong, Dina Levy-Lambert, Jordan R. Green, Tiffany P. Hogan, John V. Guttag
Spectral Enhancement of Cleft Lip and Palate Speech
Vikram C.M., Nagaraj Adiga, S.R. Mahadeva Prasanna
Assessing Level-Dependent Segmental Contribution to the Intelligibility of Speech Processed by Single-Channel Noise-Suppression Algorithms
Tian Guan, Guangxing Chu, Fei Chen, Feng Yang
Effectiveness of Near-End Speech Enhancement Under Equal-Loudness and Equal-Level Constraints
Tudor-Cătălin Zorilă, Sheila Flanagan, Brian C.J. Moore, Yannis Stylianou
Speech Synthesis in Noisy Environment by Enhancing Strength of Excitation and Formant Prominence
Bidisha Sharma, S.R. Mahadeva Prasanna
Relative Contributions of Amplitude and Phase to the Intelligibility Advantage of Ideal Binary Masked Sentences
Lei Wang, Shufeng Zhu, Diliang Chen, Yong Feng, Fei Chen
Predicting Binaural Speech Intelligibility from Signals Estimated by a Blind Source Separation Algorithm
Qingju Liu, Yan Tang, Philip J.B. Jackson, Wenwu Wang
Automated Pause Insertion for Improved Intelligibility Under Reverberation
Petko N. Petkov, Norbert Braunschweiler, Yannis Stylianou
Automatic Classification of Phonation Modes in Singing Voice: Towards Singing Style Characterisation and Application to Ethnomusicological Recordings
Jean-Luc Rouas, Leonidas Ioannidis
Novel Nonlinear Prediction Based Features for Spoofed Speech Detection
Himanshu N. Bhavsar, Tanvina B. Patel, Hemant A. Patil
Robust Vowel Landmark Detection Using Epoch-Based Features
Sri Harsha Dumpala, Bhanu Teja Nellore, Raghu Ram Nevali, Suryakanth V. Gangashetty, B. Yegnanarayana
Sensitivity of Quantitative RT-MRI Metrics of Vocal Tract Dynamics to Image Reconstruction Settings
Johannes Töger, Yongwan Lim, Sajan Goud Lingala, Shrikanth S. Narayanan, Krishna S. Nayak
Sound Pattern Matching for Automatic Prosodic Event Detection
Milos Cernak, Afsaneh Asaei, Pierre-Edouard Honnet, Philip N. Garner, Hervé Bourlard
Automatic Classification of Lexical Stress in English and Arabic Languages Using Deep Learning
Mostafa Shahin, Julien Epps, Beena Ahmed
Development of Mandarin Onset-Rime Detection in Relation to Age and Pinyin Instruction
Fei Chen, Nan Yan, Xunan Huang, Hao Zhang, Lan Wang, Gang Peng
Joint Effect of Dialect and Mandarin on English Vowel Production: A Case Study in Changsha EFL Learners
Xinyi Wen, Yuan Jia
Effects of L1 Phonotactic Constraints on L2 Word Segmentation Strategies
Tamami Katayama
Putting German [ʃ] and [ç] in Two Different Boxes: Native German vs L2 German of French Learners
Jane Wottawa, Martine Adda-Decker, Frédéric Isel
Naturalness Judgement of L2 English Through Dubbing Practice
Dean Luo, Ruxin Luo, Lixin Wang
Audiovisual Training Effects for Japanese Children Learning English /r/-/l/
Yasuaki Shinohara
L2 Acquisition and Production of the English Rhotic Pharyngeal Gesture
Sarah Harper, Louis Goldstein, Shrikanth S. Narayanan
Auditory-Visual Perception of VCVs Produced by People with Down Syndrome: Preliminary Results
Alexandre Hennequin, Amélie Rochet-Capellan, Marion Dohen
Combining Non-Pathological Data of Different Language Varieties to Improve DNN-HMM Performance on Pathological Speech
Emre Yılmaz, Mario Ganzeboom, Catia Cucchiarini, Helmer Strik
Evaluation of a Phone-Based Anomaly Detection Approach for Dysarthric Speech
Imed Laaridh, Corinne Fredouille, Christine Meunier
Recognition of Dysarthric Speech Using Voice Parameters for Speaker Adaptation and Multi-Taper Spectral Estimation
Chitralekha Bhat, Bhavik Vachhani, Sunil Kopparapu
Impaired Categorical Perception of Mandarin Tones and its Relationship to Language Ability in Autism Spectrum Disorders
Fei Chen, Nan Yan, Xiaojie Pan, Feng Yang, Zhuanzhuan Ji, Lan Wang, Gang Peng
Perceived Naturalness of Electrolaryngeal Speech Produced Using sEMG-Controlled vs. Manual Pitch Modulation
K.F. Nagle, J.T. Heaton
Identifying Hearing Loss from Learned Speech Kernels
Shamima Najnin, Bonny Banerjee, Lisa Lucks Mendel, Masoumeh Heidari Kapourchali, Jayanta Kumar Dutta, Sungmin Lee, Chhayakanta Patro, Monique Pousson
Differential Effects of Velopharyngeal Dysfunction on Speech Intelligibility During Early and Late Stages of Amyotrophic Lateral Sclerosis
Panying Rong, Yana Yunusova, Jordan R. Green
The Production of Intervocalic Glides in Non Dysarthric Parkinsonian Speech
V. Delvaux, V. Roland, K. Huet, M. Piccaluga, M.C. Haelewyck, B. Harmegnies
Auditory Processing Impairments Under Background Noise in Children with Non-Syndromic Cleft Lip and/or Palate
Yang Feng, Zhang Lu
Modulation Spectral Features for Predicting Vocal Emotion Recognition by Simulated Cochlear Implants
Zhi Zhu, Ryota Miyauchi, Yukiko Araki, Masashi Unoki
Automatic Discrimination of Soft Voice Onset Using Acoustic Features of Breathy Voicing
Keiko Ochi, Koichi Mori, Naomi Sakai, Nobutaka Ono
Effect of Noise on Lexical Tone Perception in Cantonese-Speaking Amusics
Jing Shao, Caicai Zhang, Gang Peng, Yike Yang, William S.-Y. Wang
Audio-Visual Speech Recognition Using Bimodal-Trained Bottleneck Features for a Person with Severe Hearing Loss
Yuki Takashima, Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki, Nobuyuki Mitani, Kiyohiro Omori, Kaoru Nakazono
Perception of Tone in Whispered Mandarin Sentences: The Case for Singapore Mandarin
Yuling Gu, Boon Pang Lim, Nancy F. Chen
A KL Divergence and DNN-Based Approach to Voice Conversion without Parallel Training Sentences
Feng-Long Xie, Frank K. Soong, Haifeng Li
Parallel Dictionary Learning for Voice Conversion Using Discriminative Graph-Embedded Non-Negative Matrix Factorization
Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki
Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks
Yu Gu, Zhen-Hua Ling, Li-Rong Dai
Voice Conversion Based on Matrix Variate Gaussian Mixture Model Using Multiple Frame Features
Yi Yang, Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu
Voice Conversion Based on Trajectory Model Training of Neural Networks Considering Global Variance
Naoki Hosaka, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
Comparing Articulatory and Acoustic Strategies for Reducing Non-Native Accents
Sandesh Aryal, Ricardo Gutierrez-Osuna
Cross-Lingual Speaker Adaptation for Statistical Speech Synthesis Using Limited Data
Seyyed Saeed Sarfjoo, Cenk Demiroglu
Personalized, Cross-Lingual TTS Using Phonetic Posteriorgrams
Lifa Sun, Hao Wang, Shiyin Kang, Kun Li, Helen Meng
Acoustic Analysis of Syllables Across Indian Languages
Anusha Prakash, Jeena J. Prakash, Hema A. Murthy
Objective Evaluation Methods for Chinese Text-To-Speech Systems
Teng Zhang, Zhipeng Chen, Ji Wu, Sam Lai, Wenhui Lei, Carsten Isert
Objective Evaluation Using Association Between Dimensions Within Spectral Features for Statistical Parametric Speech Synthesis
Yusuke Ijima, Taichi Asami, Hideyuki Mizuno
A Hierarchical Predictor of Synthetic Speech Naturalness Using Neural Networks
Takenori Yoshimura, Gustav Eje Henter, Oliver Watts, Mirjam Wester, Junichi Yamagishi, Keiichi Tokuda
Text-to-Speech for Individuals with Vision Loss: A User Study
Monika Podsiadło, Shweta Chahar
Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System Using Deep Recurrent Neural Networks
Cassia Valentini-Botinhao, Xin Wang, Shinji Takaki, Junichi Yamagishi
Data Selection and Adaptation for Naturalness in HMM-Based Speech Synthesis
Erica Cooper, Alison Chang, Yocheved Levitan, Julia Hirschberg
A Portable Automatic PA-TA-KA Syllable Detection System to Derive Biomarkers for Neurological Disorders
Fei Tao, Louis Daudet, Christian Poellabauer, Sandra L. Schneider, Carlos Busso
Deep Neural Networks for i-Vector Language Identification of Short Utterances in Cars
Omid Ghahabi, Antonio Bonafonte, Javier Hernando, Asunción Moreno
Improving i-Vector and PLDA Based Speaker Clustering with Long-Term Features
Abraham Woubie, Jordi Luque, Javier Hernando
Open Language Interface for Voice Exploitation (OLIVE)
Aaron Lawson, Mitchell McLaren, Harry Bratt, Martin Graciarena, Horacio Franco, Christopher George, Allen Stauffer, Chris Bartels, Julien VanHout
A Multimodal Dialogue System for Air Traffic Control Trainees Based on Discrete-Event Simulation
Luboš Šmídl, Adam Chýlek, Jan Švec
Lig-Aikuma: A Mobile App to Collect Parallel Speech for Under-Resourced Language Studies
Elodie Gauthier, David Blachon, Laurent Besacier, Guy-Noël Kouarata, Martine Adda-Decker, Annie Rialland, Gilles Adda, Grégoire Bachman
ARET — Automatic Reading of Educational Texts for Visually Impaired Students
Martin Grůber, Jindřich Matoušek, Zdeněk Hanzlíček, Zdeněk Krňoul, Zbyněk Zajíc
Segmental Recurrent Neural Networks for End-to-End Speech Recognition
Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith, Steve Renals
Acoustic Modeling Using Bidirectional Gated Recurrent Convolutional Units
Markus Nussbaum-Thom, Jia Cui, Bhuvana Ramabhadran, Vaibhava Goel
Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition
Wei-Ning Hsu, Yu Zhang, Ann Lee, James Glass
Stimulated Deep Neural Network for Speech Recognition
Chunyang Wu, Penny Karanasou, Mark J.F. Gales, Khe Chai Sim
Phonetic Context Embeddings for DNN-HMM Phone Recognition
Leonardo Badino
Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks
Ying Zhang, Mohammad Pezeshki, Philémon Brakel, Saizheng Zhang, César Laurent, Yoshua Bengio, Aaron Courville
Joint Speaker and Lexical Modeling for Short-Term Characterization of Speaker
Guangsen Wang, Kong Aik Lee, Trung Hieu Nguyen, Hanwu Sun, Bin Ma
Tandem Features for Text-Dependent Speaker Verification on the RedDots Corpus
Md Jahangir Alam, Patrick Kenny, Vishwa Gupta
Text Dependent Speaker Verification Using Un-Supervised HMM-UBM and Temporal GMM-UBM
Achintya Kr. Sarkar, Zheng-Hua Tan
Utterance Verification for Text-Dependent Speaker Recognition: A Comparative Assessment Using the RedDots Corpus
Tomi Kinnunen, Md. Sahidullah, Ivan Kukanov, Héctor Delgado, Massimiliano Todisco, Achintya Kr. Sarkar, Nicolai Bæk Thomsen, Ville Hautamäki, Nicholas Evans, Zheng-Hua Tan
Parallel Speaker and Content Modelling for Text-Dependent Speaker Verification
Jianbo Ma, Saad Irtza, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Eliathamby Ambikairajah
i-Vector/HMM Based Text-Dependent Speaker Verification System for RedDots Challenge
Hossein Zeinali, Hossein Sameti, Lukáš Burget, Jan Černocký, Nooshin Maghsoodi, Pavel Matějka
Exploring Session Variability and Template Aging in Speaker Verification for Fixed Phrase Short Utterances
Rohan Kumar Das, Sarfaraz Jelil, S.R. Mahadeva Prasanna
Prediction of the Articulatory Movements of Unseen Phonemes of a Speaker Using the Speech Structure of Another Speaker
Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu
Vocal Tract Length Normalization for Speaker Independent Acoustic-to-Articulatory Speech Inversion
Ganesh Sivaraman, Vikramjit Mitra, Hosung Nam, Mark Tiede, Carol Espy-Wilson
Investigation of Speed-Accuracy Tradeoffs in Speech Production Using Real-Time Magnetic Resonance Imaging
Adam C. Lammert, Christine H. Shadle, Shrikanth S. Narayanan, Thomas F. Quatieri
Characterizing Vocal Tract Dynamics Across Speakers Using Real-Time MRI
Tanner Sorensen, Asterios Toutios, Louis Goldstein, Shrikanth S. Narayanan
Tracking Contours of Orofacial Articulators from Real-Time MRI of Speech
Mathieu Labrunie, Pierre Badin, Dirk Voit, Arun A. Joseph, Laurent Lamalle, Coriandre Vilain, Louis-Jean Boë, Jens Frahm
State-of-the-Art MRI Protocol for Comprehensive Assessment of Vocal Tract Structure and Function
Sajan Goud Lingala, Asterios Toutios, Johannes Töger, Yongwan Lim, Yinghua Zhu, Yoon-Chul Kim, Colin Vaz, Shrikanth S. Narayanan, Krishna S. Nayak
DBN-ivector Framework for Acoustic Emotion Recognition
Rui Xia, Yang Liu
An Investigation of Emotional Speech in Depression Classification
Brian Stasak, Julien Epps, Nicholas Cummins, Roland Goecke
Retrieving Categorical Emotions Using a Probabilistic Framework to Define Preference Learning Samples
Reza Lotfian, Carlos Busso
At the Border of Acoustics and Linguistics: Bag-of-Audio-Words for the Recognition of Emotions in Speech
Maximilian Schmitt, Fabien Ringeval, Björn Schuller
Speech Emotion Recognition Using Affective Saliency
Arodami Chorianopoulou, Polychronis Koutsakis, Alexandros Potamianos
Laughter Valence Prediction in Motivational Interviewing Based on Lexical and Acoustic Cues
Rahul Gupta, Nishant Nath, Taruna Agrawal, Panayiotis Georgiou, David C. Atkins, Shrikanth S. Narayanan
Respiratory Belts and Whistles: A Preliminary Study of Breathing Acoustics for Turn-Taking
Marcin Włodarczak, Mattias Heldner
/r/ as Language Marker in Bilingual Speech Production and Perception
Constantijn Kaland, Vincenzo Galatà, Lorenzo Spreafico, Alessandro Vietti
Evaluation of Phonatory Behavior of German and French Speakers in Native and Non-Native Speech
Manfred Pützer, Frank Zimmerer, Wolfgang Wokurek, Jeanin Jügler
Today’s Most Frequently Used F0 Estimation Methods, and Their Accuracy in Estimating Male and Female Pitch in Clean Speech
Sofia Strömbergsson
A Praat-Based Algorithm to Extract the Amplitude Envelope and Temporal Fine Structure Using the Hilbert Transform
Lei He, Volker Dellwo
Likelihood Ratio Calculation in Acoustic-Phonetic Forensic Voice Comparison: Comparison of Three Statistical Modelling Approaches
Ewald Enzinger
A Sparse Spherical Harmonic-Based Model in Subbands for Head-Related Transfer Functions
Xiaoke Qi, Jianhua Tao
Single-Channel Multi-Speaker Separation Using Deep Clustering
Yusuf Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey
Jointly Optimizing Activation Coefficients of Convolutive NMF Using DNN for Speech Separation
Hao Li, Shuai Nie, Xueliang Zhang, Hui Zhang
A Feature Study for Masking-Based Reverberant Speech Separation
Masood Delfarah, DeLiang Wang
Discriminative Layered Nonnegative Matrix Factorization for Speech Separation
Chung-Chien Hsu, Tai-Shih Chi, Jen-Tzung Chien
On Discriminative Framework for Single Channel Audio Source Separation
Arpita Gang, Pravesh Biyani
Generating Natural Video Descriptions via Multimodal Processing
Qin Jin, Junwei Liang, Xiaozhu Lin
Feature-Level Decision Fusion for Audio-Visual Word Prominence Detection
Martin Heckmann
Acoustic and Visual Analysis of Expressive Speech: A Case Study of French Acted Speech
Slim Ouni, Vincent Colotte, Sara Dahmani, Soumaya Azzi
Characterization of Audiovisual Dramatic Attitudes
Adela Barbulescu, Rémi Ronfard, Gérard Bailly
Conversational Engagement Recognition Using Auditory and Visual Cues
Yuyun Huang, Emer Gilmartin, Nick Campbell
An Acoustic Analysis of Child-Child and Child-Robot Interactions for Understanding Engagement during Speech-Controlled Computer Games
Theodora Chaspari, Jill Fain Lehman
Auditory-Visual Lexical Tone Perception in Thai Elderly Listeners with and without Hearing Impairment
Benjawan Kasisopa, Chutamanee Onsuwan, Charturong Tantibundhit, Nittayapa Klangpornkun, Suparak Techacharoenrungrueang, Sudaporn Luksaneeyanawin, Denis Burnham
Use of Agreement/Disagreement Classification in Dyadic Interactions for Continuous Emotion Recognition
Hossein Khaki, Engin Erzin
Microscopic Multilingual Matrix Test Predictions Using an ASR-Based Speech Recognition Model
Marc René Schädler, David Hülsmeier, Anna Warzybok, Sabine Hochmuth, Birger Kollmeier
DNN-Based Automatic Speech Recognition as a Model for Human Phoneme Perception
Mats Exter, Bernd T. Meyer
Undoing Misperceptions: A Microscopic Analysis of Consistent Confusions Through Signal Modifications
Attila Máté Tóth, Martin Cooke
Blind Non-Intrusive Speech Intelligibility Prediction Using Twin-HMMs
Mahdie Karbasi, Ahmed Hussen Abdelaziz, Hendrik Meutzner, Dorothea Kolossa
Misperceptions Arising from Speech-in-Babble Interactions
Attila Máté Tóth, Martin Cooke, Jon Barker
Introducing Temporal Rate Coding for Speech in Cochlear Implants: A Microscopic Evaluation in Humans and Models
Anja Eichenauer, Mathias Dietz, Bernd T. Meyer, Tim Jürgens
Language Effects in Noise-Induced Word Misperceptions
Maria Luisa Garcia Lecumberri, Jon Barker, Ricard Marxer, Martin Cooke
Speech Reductions Cause a De-Weighting of Secondary Acoustic Cues
Léo Varnet, Fanny Meunier, Michel Hoen
Using Phonologically Weighted Levenshtein Distances for the Prediction of Microscopic Intelligibility
Lionel Fontan, Isabelle Ferrané, Jérôme Farinas, Julien Pinquier, Xavier Aumont
The Impact of Manner of Articulation on the Intelligibility of Voicing Contrast in Noise: Cross-Linguistic Implications
Mayuki Matsui
Directly Comparing the Listening Strategies of Humans and Machines
Michael I. Mandel
LSTM-Based NeuroCRFs for Named Entity Recognition
Marc-Antoine Rondeau, Yi Su
Exploring Word Mover’s Distance and Semantic-Aware Embedding Techniques for Extractive Broadcast News Summarization
Shih-Hung Liu, Kuan-Yu Chen, Yu-Lun Hsieh, Berlin Chen, Hsin-Min Wang, Hsu-Chun Yen, Wen-Lian Hsu
Improved Neural Bag-of-Words Model to Retrieve Out-of-Vocabulary Words in Speech Recognition
Imran Sheikh, Irina Illina, Dominique Fohr, Georges Linarès
Beyond Utterance Extraction: Summary Recombination for Speech Summarization
Jérémy Trione, Benoit Favre, Frederic Bechet
Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling
Bing Liu, Ian Lane
Domain Adaptation of Recurrent Neural Networks for Natural Language Understanding
Aaron Jaech, Larry Heck, Mari Ostendorf
LatticeRnn: Recurrent Neural Networks Over Lattices
Faisal Ladhak, Ankur Gandhe, Markus Dreyer, Lambert Mathias, Ariya Rastrow, Björn Hoffmeister
Learning Document Representations Using Subspace Multinomial Model
Santosh Kesiraju, Lukáš Burget, Igor Szőke, Jan Černocký
Attention-Based Convolutional Neural Networks for Sentence Classification
Zhiwei Zhao, Youzheng Wu
Spoken Language Understanding in a Latent Topic-Based Subspace
Mohamed Morchid, Mohamed Bouaziz, Waad Ben Kheder, Killian Janod, Pierre-Michel Bousquet, Richard Dufour, Georges Linarès
Multi-Domain Joint Semantic Frame Parsing Using Bi-Directional RNN-LSTM
Dilek Hakkani-Tür, Gokhan Tur, Asli Celikyilmaz, Yun-Nung Chen, Jianfeng Gao, Li Deng, Ye-Yi Wang
Deep Stacked Autoencoders for Spoken Language Understanding
Killian Janod, Mohamed Morchid, Richard Dufour, Georges Linarès, Renato De Mori
Labeled Data Generation with Encoder-Decoder LSTM for Semantic Slot Filling
Gakuto Kurata, Bing Xiang, Bowen Zhou
Exploring the Correlation of Pitch Accents and Semantic Slots for Spoken Language Understanding
Sabrina Stehwien, Ngoc Thang Vu
Analysis on Gated Recurrent Unit Based Question Detection Approach
Yaodong Tang, Zhiyong Wu, Helen Meng, Mingxing Xu, Lianhong Cai
Combining State-Level Spotting and Posterior-Based Acoustic Match for Improved Query-by-Example Spoken Term Detection
Shuji Oishi, Tatsuya Matsuba, Mitsuaki Makino, Atsuhiko Kai
A Novel Discriminative Score Calibration Method for Keyword Search
Zhiqiang Lv, Meng Cai, Wei-Qiang Zhang, Jia Liu
Segmented Dynamic Time Warping for Spoken Query-by-Example Search
Jorge Proença, Fernando Perdigão
Generating Complementary Acoustic Model Spaces in DNN-Based Sequence-to-Frame DTW Scheme for Out-of-Vocabulary Spoken Term Detection
Shi-wook Lee, Kazuyo Tanaka, Yoshiaki Itoh
Multi-Task Learning and Weighted Cross-Entropy for DNN-Based Keyword Spotting
Sankaran Panchapagesan, Ming Sun, Aparna Khare, Spyros Matsoukas, Arindam Mandal, Björn Hoffmeister, Shiv Vitaladevuni
Audio Word2Vec: Unsupervised Learning of Audio Segment Representations Using Sequence-to-Sequence Autoencoder
Yu-An Chung, Chao-Chung Wu, Chia-Hao Shen, Hung-Yi Lee, Lin-Shan Lee
Non-Uniform Boosted MCE Training of Deep Neural Networks for Keyword Spotting
Zhong Meng, Biing-Hwang Juang
Language Model Data Augmentation for Keyword Spotting in Low-Resourced Training Conditions
Arseniy Gorin, Rasa Lileikytė, Guangpu Huang, Lori Lamel, Jean-Luc Gauvain, Antoine Laurent
STON: Efficient Subtitling in Dutch Using State-of-the-Art Tools
Lyan Verwimp, Brecht Desplanques, Kris Demuynck, Joris Pelemans, Marieke Lycke, Patrick Wambacq
An Automatic Training Tool for Air Traffic Control Training
Petr Stanislav, Luboš Šmídl, Jan Švec
Digitala: An Augmented Test and Review Process Prototype for High-Stakes Spoken Foreign Language Examination
Reima Karhila, Aku Rouhe, Peter Smit, André Mansikkaniemi, Heini Kallio, Erik Lindroos, Raili Hildén, Martti Vainio, Mikko Kurimo
Exploring Collections of Multimedia Archives Through Innovative Interfaces in the Context of Digital Humanities
Géraldine Damnati, Delphine Charlet, Marc Denjean
Learning Neural Network Representations Using Cross-Lingual Bottleneck Features with Word-Pair Information
Yougen Yuan, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li
Novel Front-End Features Based on Neural Graph Embeddings for DNN-HMM and LSTM-CTC Acoustic Modeling
Yuzong Liu, Katrin Kirchhoff
Articulatory Feature Extraction Using CTC to Build Articulatory Classifiers Without Forced Frame Alignments for Speech Recognition
Basil Abraham, S. Umesh, Neethu Mariam Joy
On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models
Tasha Nagamine, Michael L. Seltzer, Nima Mesgarani
Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic Modeling
Ehsan Variani, Tara N. Sainath, Izhak Shafran, Michiel Bacchiani
Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks
Tara N. Sainath, Bo Li
The Speakers in the Wild (SITW) Speaker Recognition Database
Mitchell McLaren, Luciana Ferrer, Diego Castan, Aaron Lawson
The 2016 Speakers in the Wild Speaker Recognition Evaluation
Mitchell McLaren, Luciana Ferrer, Diego Castan, Aaron Lawson
Analysis of Speaker Recognition Systems in Realistic Scenarios of the SITW 2016 Challenge
Ondřej Novotný, Pavel Matějka, Oldřich Plchot, Ondřej Glembek, Lukáš Burget, Jan Černocký
A Speaker Recognition System for the SITW Challenge
Oleg Kudashev, Sergey Novoselov, Konstantin Simonchik, Alexandr Kozlov
Speakers In The Wild (SITW): The QUT Speaker Recognition System
H. Ghaemmaghami, M.H. Rahman, Ivan Himawan, David Dean, Ahilan Kanagasundaram, Sridha Sridharan, Clinton Fookes
AUT System for SITW Speaker Recognition Challenge
Abbas Khosravani, Mohammad Mehdi Homayounpour
LIA System for the SITW Speaker Recognition Challenge
Waad Ben Kheder, Moez Ajili, Pierre-Michel Bousquet, Driss Matrouf, Jean-François Bonastre
Investigating Various Diarization Algorithms for Speaker in the Wild (SITW) Speaker Recognition Challenge
Yi Liu, Yao Tian, Liang He, Jia Liu
Does the Importance of Word-Initial and Word-Final Information Differ in Native versus Non-Native Spoken-Word Recognition?
Odette Scharenborg, Juul Coumans, Sofoklis Kakouros, Roeland van Hout
The Effect of Sentence Accent on Non-Native Speech Perception in Noise
Odette Scharenborg, Elea Kolkman, Sofoklis Kakouros, Brechtje Post
The Effects of Modified Speech Styles on Intelligibility for Non-Native Listeners
Martin Cooke, Maria Luisa Garcia Lecumberri
The Influence of Language Experience on the Categorical Perception of Vowels: Evidence from Mandarin and Korean
Hao Zhang, Fei Chen, Nan Yan, Lan Wang, Feng Shi, Manwa L. Ng
Multiple Influences on Vocabulary Acquisition: Parental Input Dominates
Dominic W. Massaro
Can Intensive Exposure to Foreign Language Sounds Affect the Perception of Native Sounds?
Jian Gong, Maria Luisa Garcia Lecumberri, Martin Cooke
Privacy-Preserving Speech Analytics for Automatic Assessment of Student Collaboration
Nikoletta Bassiou, Andreas Tsiartas, Jennifer Smith, Harry Bratt, Colleen Richey, Elizabeth Shriberg, Cynthia D’Angelo, Nonye Alozie
Complexity in Prosody: A Nonlinear Dynamical Systems Approach for Dyadic Conversations; Behavior and Outcomes in Couples Therapy
Md. Nasir, Brian Baucom, Shrikanth S. Narayanan, Panayiotis Georgiou
Couples Behavior Modeling and Annotation Using Low-Resource LSTM Language Models
Shao-Yen Tseng, Sandeep Nallan Chakravarthula, Brian Baucom, Panayiotis Georgiou
Speech Likability and Personality-Based Social Relations: A Round-Robin Analysis over Communication Channels
Laura Fernández Gallardo, Benjamin Weiss
Behavioral Coding of Therapist Language in Addiction Counseling Using Recurrent Neural Networks
Bo Xiao, Doğan Can, James Gibson, Zac E. Imel, David C. Atkins, Panayiotis Georgiou, Shrikanth S. Narayanan
Factor Analysis Based Speaker Normalisation for Continuous Emotion Prediction
Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah
Subspace Detection of DNN Posterior Probabilities via Sparse Representation for Query by Example Spoken Term Detection
Dhananjay Ram, Afsaneh Asaei, Hervé Bourlard
Unsupervised Bottleneck Features for Low-Resource Query-by-Example Spoken Term Detection
Hongjie Chen, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li
A Nonparametric Bayesian Approach for Spoken Term Detection by Example Query
Amir Hossein Harati Nejad Torbati, Joseph Picone
Rescoring Hypothesized Detections of Out-of-Vocabulary Keywords Using Subword Samples
Van Tung Pham, Haihua Xu, Xiong Xiao, Nancy F. Chen, Eng Siong Chng, Haizhou Li
Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC
Yimeng Zhuang, Xuankai Chang, Yanmin Qian, Kai Yu
Interactive Spoken Content Retrieval by Deep Reinforcement Learning
Yen-Chen Wu, Tzu-Hsiang Lin, Yang-De Chen, Hung-Yi Lee, Lin-Shan Lee
Relating Estimated Cyclic Spectral Peak Frequency to Measured Epilarynx Length Using Magnetic Resonance Imaging
Elizabeth Godoy, Andrew Dumas, Jennifer Melot, Nicolas Malyska, Thomas F. Quatieri
Acoustic-to-Articulatory Inversion Mapping Based on Latent Trajectory Gaussian Mixture Model
Patrick Lumban Tobing, Tomoki Toda, Hirokazu Kameoka, Satoshi Nakamura
Formant Estimation and Tracking Using Deep Learning
Yehoshua Dissen, Joseph Keshet
Convex Hull Convolutive Non-Negative Matrix Factorization for Uncovering Temporal Patterns in Multivariate Time-Series Data
Colin Vaz, Asterios Toutios, Shrikanth S. Narayanan
Majorisation-Minimisation Based Optimisation of the Composite Autoregressive System with Application to Glottal Inverse Filtering
Lauri Juvela, Hirokazu Kameoka, Manu Airaksinen, Junichi Yamagishi, Paavo Alku
F0 Contour Analysis Based on Empirical Mode Decomposition for DNN Acoustic Modeling in Mandarin Speech Recognition
Xiaoyun Wang, Xugang Lu, Hisashi Kawai, Seiichi Yamamoto
Vowels and Diphthongs in Cangnan Southern Min Chinese Dialect
Fang Hu, Chunyu Ge
Diphthongization of Nuclear Vowels and the Emergence of a Tetraphthong in Hetang Cantonese
Wenqi Hu, Fang Hu, Jian Jin
PhonVoc: A Phonetic and Phonological Vocoding Toolkit
Milos Cernak, Philip N. Garner
Vowels and Diphthongs in the Taiyuan Jin Chinese Dialect
Liping Xia, Fang Hu
The Effects of Prosody on French V-to-V Coarticulation: A Corpus-Based Study
Giuseppina Turco, Cécile Fougeron, Nicolas Audibert
An Acoustic Analysis of /r/ in Tyrolean
Vincenzo Galatà, Lorenzo Spreafico, Alessandro Vietti, Constantijn Kaland
Hyperarticulated Production of Korean Glides by Age Group
Seung-Eun Chang, Minsook Kim
Coda Stop and Taiwan Min Checked Tone Sound Changes
Ho-hsien Pan, Hsiao-tung Huang, Shao-ren Lyu
The Influence of Modality and Speaking Style on the Assimilation Type and Categorization Consistency of Non-Native Speech
Sarah E. Fenwick, Catherine T. Best, Chris Davis, Michael D. Tyler
Prosodic Convergence with Spoken Stimuli in Laboratory Data
Margaret Zellers
Effects of Stress on Fricatives: Evidence from Standard Modern Greek
Charalambos Themistocleous, Angelandria Savva, Andrie Aristodemou
Analysis of Chinese Syllable Durations in Running Speech of Japanese L2 Learners
Yue Sun, Shudon Hsiao, Yoshinori Sagisaka, Jinsong Zhang
Automatic Paragraph Segmentation with Lexical and Prosodic Features
Catherine Lai, Mireia Farrús, Johanna D. Moore
Automatic Glottal Inverse Filtering with Non-Negative Matrix Factorization
Manu Airaksinen, Lauri Juvela, Tom Bäckström, Paavo Alku
Speaker Identity and Voice Quality: Modeling Human Responses and Automatic Speaker Recognition
Soo Jin Park, Caroline Sigouin, Jody Kreiman, Patricia Keating, Jinxi Guo, Gary Yeung, Fang-Yu Kuo, Abeer Alwan
Analysis of Glottal Stop in Assam Sora Language
Sishir Kalita, Luke Horo, Priyankoo Sarmah, S.R. Mahadeva Prasanna, S. Dandapat
Acoustic Differences Between English /t/ Glottalization and Phrasal Creak
Marc Garellek, Scott Seyfarth
The Acoustics of Lexical Stress in Italian as a Function of Stress Level and Speaking Style
Anders Eriksson, Pier Marco Bertinetto, Mattias Heldner, Rosalba Nodari, Giovanna Lenoci
Cross-Gender and Cross-Dialect Tone Recognition for Vietnamese
Antje Schweitzer, Ngoc Thang Vu
Prosody Modification Using Allpass Residual of Speech Signals
Karthika Vijayan, K. Sri Rama Murty
Analyzing the Contribution of Top-Down Lexical and Bottom-Up Acoustic Cues in the Detection of Sentence Prominence
Sofoklis Kakouros, Joris Pelemans, Lyan Verwimp, Patrick Wambacq, Okko Räsänen
A Longitudinal Study of Children’s Intonation in Narrative Speech
Jeffrey Kallay, Melissa A. Redford
Velum Control for Oral Sounds
Reed Blaylock, Louis Goldstein, Shrikanth S. Narayanan
F0 Development in Acquiring Korean Stop Distinction
Gayeon Son
Phonetic Reduction Can Lead to Lengthening, and Enhancement Can Lead to Shortening
Clara Cohen, Matt Carlson
Mechanical Production of [b], [m] and [w] Using Controlled Labial and Velopharyngeal Gestures
Takayuki Arai
An Improved 3D Geometric Tongue Model
Qiang Fang, Yun Chen, Haibo Wang, Jianguo Wei, Jianrong Wang, Xiyu Wu, Aijun Li
Congruency Effect Between Articulation and Grasping in Native English Speakers
Mikko Tiainen, Fatima M. Felisberti, Kaisa Tiippana, Martti Vainio, Juraj Simko, Jiri Lukavsky, Lari Vainio
Emergence of Vocal Developmental Sequences in a Predictive Coding Model of Speech Acquisition
Shamima Najnin, Bonny Banerjee
Categorization of Natural Spanish Whistled Vowels by Naïve Spanish Listeners
Julien Meyer, Laure Dentel, Fanny Meunier
Between- and Within-Speaker Effects of Bilingualism on F0 Variation
Rob Voigt, Dan Jurafsky, Meghan Sumner
Vowel Characteristics in the Assessment of L2 English Pronunciation
Calbert Graham, Paula Buttery, Francis Nolan
Kulning (Swedish Cattle Calls): Acoustic, EGG, Stroboscopic and High-Speed Video Analyses of an Unusual Singing Style
Ahmed Geneid, Anne-Maria Laukkanen, Anita McAllister, Robert Eklund
Glottal Squeaks in VC Sequences
Míša Hejná, Pertti Palo, Scott Moisik
Automatic Pronunciation Generation by Utilizing a Semi-Supervised Deep Neural Networks
Naoya Takahashi, Tofigh Naghibi, Beat Pfister
Personalized Natural Language Understanding
Xiaohu Liu, Ruhi Sarikaya, Liang Zhao, Yong Ni, Yi-Cheng Pan
A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems
Layla El Asri, Jing He, Kaheer Suleman
Root Cause Analysis of Miscommunication Hotspots in Spoken Dialogue Systems
Spiros Georgiladakis, Georgia Athanasopoulou, Raveesh Meena, José Lopes, Arodami Chorianopoulou, Elisavet Palogiannidi, Elias Iosif, Gabriel Skantze, Alexandros Potamianos
Making Personal Digital Assistants Aware of What They Do Not Know
Omar Zia Khan, Ruhi Sarikaya
Implementing Acoustic-Prosodic Entrainment in a Conversational Avatar
Rivka Levitan, Štefan Beňuš, Ramiro H. Gálvez, Agustín Gravano, Florencia Savoretti, Marian Trnka, Andreas Weise, Julia Hirschberg
Perceived Usability and Cognitive Demand of Secondary Tasks in Spoken Versus Visual-Manual Automotive Interaction
Annika Silvervarg, Sofia Lindvall, Jonatan Andersson, Ida Esberg, Christian Jernberg, Filip Frumerie, Arne Jönsson
Zara: An Empathetic Interactive Virtual Agent
Pascale Fung, Anik Dey, Farhad Bin Siddique, Ruixi Lin, Yang Yang, Wan Yan, Ricky Ho Yin Chan
Measuring Pronunciation Improvement in Users of CAPT Tool TipTopTalk!
Cristian Tejedor-García, David Escudero-Mancebo, Enrique Cámara-Arenas, César González-Ferreras, Valentín Cardeñoso-Payo
SparkNG: Interactive MATLAB Tools for Introduction to Speech Production, Perception and Processing Fundamentals and Application of the Aliasing-Free L-F Model Component
Hideki Kawahara
Real-Time Tracking of Speakers’ Emotions, States, and Traits on Mobile Platforms
Erik Marchi, Florian Eyben, Gerhard Hagerer, Björn Schuller
Speaker Comparison for Forensic and Investigative Applications II
Jean-François Bonastre, Joseph P. Campbell, Anders Eriksson, Hiro Nakasone, Reva Schwartz
Acoustic-Prosodic and Turn-Taking Features in Interactions with Children with Neurodevelopmental Disorders
Daniel Bone, Somer Bishop, Rahul Gupta, Sungbok Lee, Shrikanth S. Narayanan
Automatic Detection of Parkinson’s Disease Based on Modulated Vowels
Daria Hemmerling, Juan Rafael Orozco-Arroyave, Andrzej Skalski, Janusz Gajda, Elmar Nöth
Towards Automatic Detection of Amyotrophic Lateral Sclerosis from Speech Acoustic and Articulatory Samples
Jun Wang, Prasanna V. Kothalkar, Beiming Cao, Daragh Heitzman
Neurophysiological Vocal Source Modeling for Biomarkers of Disease
Gregory Ciccarelli, Thomas F. Quatieri, Satrajit S. Ghosh
Relation of Automatically Extracted Formant Trajectories with Intelligibility Loss and Speaking Rate Decline in Amyotrophic Lateral Sclerosis
Rachelle L. Horwitz-Martin, Thomas F. Quatieri, Adam C. Lammert, James R. Williamson, Yana Yunusova, Elizabeth Godoy, Daryush D. Mehta, Jordan R. Green
Automatic Analysis of Typical and Atypical Encoding of Spontaneous Emotion in the Voice of Children
Fabien Ringeval, Erik Marchi, Charline Grossard, Jean Xavier, Mohamed Chetouani, David Cohen, Björn Schuller
Recognition of Depression in Bipolar Disorder: Leveraging Cohort and Person-Specific Knowledge
Soheil Khorram, John Gideon, Melvin McInnis, Emily Mower Provost
Diagnosing People with Dementia Using Automatic Conversation Analysis
Bahman Mirheidari, Daniel Blackburn, Markus Reuber, Traci Walker, Heidi Christensen
SERAPHIM: A Wavetable Synthesis System with 3D Lip Animation for Real-Time Speech and Singing Applications on Mobile Platforms
Paul Yaozhu Chan, Minghui Dong, Grace Xue Hui Ho, Haizhou Li
Expressive Singing Synthesis Based on Unit Selection for the Singing Synthesis Challenge 2016
Jordi Bonada, Martí Umbert, Merlijn Blaauw
Vocal Effort Modification for Singing Synthesis
Olivier Perrotin, Christophe d’Alessandro
Bertsokantari: a TTS Based Singing Synthesis System
Eder del Blanco, Inma Hernaez, Eva Navas, Xabier Sarasola, D. Erro
Evaluation of Singing Synthesis: Methodology and Case Study with Concatenative and Performative Systems
Lionel Feugère, Christophe d’Alessandro, Samuel Delalez, Luc Ardaillon, Axel Roebel
Expressive Control of Singing Voice Synthesis Using Musical Contexts and a Parametric F0 Model
Luc Ardaillon, Celine Chabot-Canet, Axel Roebel
Optimal Unit Stitching in a Unit Selection Singing Synthesis System
Marius Cotescu
The Perception of Overlapping Speech: Effects of Speaker Prosody and Listener Attitudes
Katherine Hilton
Who Do You Think Will Speak Next? Perception of Turn-Taking Cues in Slovak and Argentine Spanish
Agustín Gravano, Pablo Brusco, Štefan Beňuš
Disentrainment may be a Positive Thing: A Novel Measure of Unsigned Acoustic-Prosodic Synchrony, and its Relation to Speaker Engagement
Juan M. Pérez, Ramiro H. Gálvez, Agustín Gravano
Respiratory Turn-Taking Cues
Marcin Włodarczak, Mattias Heldner
The Discourse Marker “so” in Turn-Taking and Turn-Releasing Behavior
Emma Rennie, Rebecca Lunsford, Peter A. Heeman
Acoustic Properties of Formality in Conversational Japanese
Ethan Sherr-Ziarko
Inferring Phonemic Classes from CNN Activation Maps Using Clustering Techniques
Thomas Pellegrini, Sandrine Mouysset
Joint Learning of Speaker and Phonetic Similarities with Siamese Networks
Neil Zeghidour, Gabriel Synnaeve, Nicolas Usunier, Emmanuel Dupoux
Unsupervised Learning of Acoustic Units Using Autoencoders and Kohonen Nets
Vikramjit Mitra, Dimitra Vergyri, Horacio Franco
Learning Multiscale Features Directly from Waveforms
Zhenyao Zhu, Jesse H. Engel, Awni Hannun
Supervised Learning of Acoustic Models in a Zero Resource Setting to Improve DPGMM Clustering
Michael Heck, Sakriani Sakti, Satoshi Nakamura
Semi-Supervised and Cross-Lingual Knowledge Transfer Learnings for DNN Hybrid Acoustic Models Under Low-Resource Conditions
Haihua Xu, Hang Su, Chongjia Ni, Xiong Xiao, Hao Huang, Eng Siong Chng, Haizhou Li
Recurrent Out-of-Vocabulary Word Detection Using Distribution of Features
Taichi Asami, Ryo Masumura, Yushi Aono, Koichi Shinoda
Investigation of Semi-Supervised Acoustic Model Training Based on the Committee of Heterogeneous Neural Networks
Naoyuki Kanda, Shoji Harada, Xugang Lu, Hisashi Kawai
Acoustic Word Embeddings for ASR Error Detection
Sahar Ghannay, Yannick Estève, Nathalie Camelin, Paul Deléglise
Combining Semantic Word Classes and Sub-Word Unit Speech Recognition for Robust OOV Detection
Axel Horndasch, Anton Batliner, Caroline Kaufhold, Elmar Nöth
Web Data Selection Based on Word Embedding for Low-Resource Speech Recognition
Chuandong Xie, Wu Guo, Guoping Hu, Junhua Liu
Colloquialising Modern Standard Arabic Text for Improved Speech Recognition
Sarah Al-Shareef, Thomas Hain
Pitch-Range Perception: The Dynamic Interaction Between Voice Quality and Fundamental Frequency
Jianjing Kuang, Mark Liberman
Comparing the Contributions of Amplitude and Phase to Speech Intelligibility in a Vocoder-Based Speech Synthesis Model
Fei Chen, Benson C.L. Chiao
Modeling Noise Influence to Speech Intelligibility Non-Intrusively by Reduced Speech Dynamic Range
Fei Chen
Do GMM Phoneme Classifiers Perceive Synthetic Sibilants as Humans Do?
Gábor Pintér, Hiroki Watanabe
Neural Responses to Speech-Specific Modulations Derived from a Spectro-Temporal Filter Bank
Marina Frye, Cristiano Micheli, Inga M. Schepers, Gerwin Schalk, Jochem W. Rieger, Bernd T. Meyer
Comparing Different Methods for Analyzing ERP Signals
Kimberley Mulder, Louis ten Bosch, Lou Boves
Supplementary Motor Area Activation in Disfluency Perception: An fMRI Study of Listener Neural Responses to Spontaneously Produced Unfilled and Filled Pauses
Robert Eklund, Martin Ingvar
Vowel Fundamental and Formant Frequency Contributions to English and Mandarin Sentence Intelligibility
Daniel Fogerty, Fei Chen
Attention Assisted Discovery of Sub-Utterance Structure in Speech Emotion Recognition
Che-Wei Huang, Shrikanth S. Narayanan
Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate Competition
Linchuan Li, Zhiyong Wu, Mingxing Xu, Helen Meng, Lianhong Cai
Inter-Speech Clicks in an Interspeech Keynote
Jürgen Trouvain, Zofia Malisz
Speaker Age Classification and Regression Using i-Vectors
Joanna Grzybowska, Stanisław Kacprzak
Sparsely Connected and Disjointly Trained Deep Neural Networks for Low Resource Behavioral Annotation: Acoustic Classification in Couples’ Therapy
Haoqi Li, Brian Baucom, Panayiotis Georgiou
Automatically Classifying Self-Rated Personality Scores from Speech
Guozhen An, Sarah Ita Levitan, Rivka Levitan, Andrew Rosenberg, Michelle Levine, Julia Hirschberg
Estimation of Children’s Physical Characteristics from Their Voices
Jill Fain Lehman, Rita Singh
Talking to a System and Talking to a Human: A Study from a Speech-to-Speech, Machine Translation Mediated Map Task
Hayakawa Akira, Saturnino Luz, Nick Campbell
Predicting Affective Dimensions Based on Self Assessed Depression Severity
Rahul Gupta, Shrikanth S. Narayanan
Enhancement of Automatic Oral Presentation Assessment System Using Latent N-Grams Word Representation and Part-of-Speech Information
Wen-Yu Huang, Shan-Wen Hsiao, Hung-Ching Sun, Ming-Chuan Hsieh, Ming-Hsueh Tsai, Chi-Chun Lee
Use of Vowels in Discriminating Speech-Laugh from Laughter and Neutral Speech
Sri Harsha Dumpala, P. Gangamohan, Suryakanth V. Gangashetty, B. Yegnanarayana
A Convex Model for Linguistic Influence in Group Conversations
Kan Kawabata, Visar Berisha, Anna Scaglione, Amy LaCross
A Deep Learning Approach to Modeling Empathy in Addiction Counseling
James Gibson, Doğan Can, Bo Xiao, Zac E. Imel, David C. Atkins, Panayiotis Georgiou, Shrikanth S. Narayanan
Unipolar Depression vs. Bipolar Disorder: An Elicitation-Based Approach to Short-Term Detection of Mood Disorder
Kun-Yi Huang, Chung-Hsien Wu, Yu-Ting Kuo, Fong-Lin Jang
Conditional Random Fields for the Tunisian Dialect Grapheme-to-Phoneme Conversion
Abir Masmoudi, Mariem Ellouze, Fethi Bougares, Yannick Estève, Lamia Belguith
Efficient Thai Grapheme-to-Phoneme Conversion Using CRF-Based Joint Sequence Modeling
Sittipong Saychum, Sarawoot Kongyoung, Anocha Rugchatjaroen, Patcharika Chootrakool, Sawit Kasuriya, Chai Wutiwiwatchai
An Articulatory-Based Singing Voice Synthesis Using Tongue and Lips Imaging
Aurore Jaumard-Hakoun, Kele Xu, Clémence Leboullenger, Pierre Roussel-Ragot, Bruce Denby
Phoneme Embedding and its Application to Speech Driven Talking Avatar Synthesis
Xu Li, Zhiyong Wu, Helen Meng, Jia Jia, Xiaoyan Lou, Lianhong Cai
Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data
Xu Li, Zhiyong Wu, Helen Meng, Jia Jia, Xiaoyan Lou, Lianhong Cai
Audio-to-Visual Speech Conversion Using Deep Neural Networks
Sarah Taylor, Akihiro Kato, Iain Matthews, Ben Milner
Generative Acoustic-Phonemic-Speaker Model Based on Three-Way Restricted Boltzmann Machine
Toru Nakashika, Yasuhiro Minami
Articulatory Synthesis Based on Real-Time Magnetic Resonance Imaging Data
Asterios Toutios, Tanner Sorensen, Krishna Somandepalli, Rachel Alexander, Shrikanth S. Narayanan
Deep Neural Network Based Acoustic-to-Articulatory Inversion Using Phone Sequence Information
Xurong Xie, Xunying Liu, Lan Wang
Articulatory-to-Acoustic Conversion with Cascaded Prediction of Spectral and Excitation Features Using Neural Networks
Zheng-Chen Liu, Zhen-Hua Ling, Li-Rong Dai
Generating Gestural Scores from Acoustics Through a Sparse Anchor-Based Representation of Speech
Christopher Liberatore, Ricardo Gutierrez-Osuna
On the Suitability of Vocalic Sandwiches in a Corpus-Based TTS Engine
David Guennec, Damien Lolive
Unsupervised Stress Information Labeling Using Gaussian Process Latent Variable Model for Statistical Speech Synthesis
Decha Moungsri, Tomoki Koriyama, Takao Kobayashi
Using Zero-Frequency Resonator to Extract Multilingual Intonation Structure
Jinfu Ni, Yoshinori Shiga, Hisashi Kawai
A DNN-HMM Approach to Story Segmentation
Jia Yu, Xiong Xiao, Lei Xie, Eng Siong Chng, Haizhou Li
The SIWIS Database: A Multilingual Speech Database with Acted Emphasis
Jean-Philippe Goldman, Pierre-Edouard Honnet, Rob Clark, Philip N. Garner, Maria Ivanova, Alexandros Lazaridis, Hui Liang, Tiago Macedo, Beat Pfister, Manuel Sam Ribeiro, Eric Wehrli, Junichi Yamagishi
Open Source Speech and Language Resources for Frisian
Emre Yılmaz, Henk van den Heuvel, Jelske Dijkstra, Hans Van de Velde, Frederik Kampstra, Jouke Algra, David Van Leeuwen
The SRI CLEO Speaker-State Corpus
Andreas Kathol, Elizabeth Shriberg, Massimiliano de Zambotti
SingaKids-Mandarin: Speech Corpus of Singaporean Children Speaking Mandarin Chinese
Nancy F. Chen, Rong Tong, Darren Wee, Peixuan Lee, Bin Ma, Haizhou Li
The SRI Speech-Based Collaborative Learning Corpus
Colleen Richey, Cynthia D’Angelo, Nonye Alozie, Harry Bratt, Elizabeth Shriberg
An Expectation Maximization Approach to Joint Modeling of Multidimensional Ratings Derived from Multiple Annotators
Anil Ramakrishna, Rahul Gupta, Ruth B. Grossman, Shrikanth S. Narayanan
Voting Detector: A Combination of Anomaly Detectors to Reveal Annotation Errors in TTS Corpora
Jindřich Matoušek, Daniel Tihelka
The Magic Stone: A Video Game to Improve Communication Skills of People with Intellectual Disabilities
Mario Corrales-Astorgano, David Escudero-Mancebo, César González-Ferreras, Yurena Gutiérrez-González, Valle Flores-Lucas, Valentín Cardeñoso-Payo, Lourdes Aguilar-Cuevas
Identifying Perceptually Similar Voices with a Speaker Recognition System Using Auto-Phonetic Features
Finnian Kelly, Anil Alexander, Oscar Forth, Samuel Kent, Jonas Lindh, Joel Åkesson
A Real-Time Framework for Visual Feedback of Articulatory Data Using Statistical Shape Models
Kristy James, Alexander Hewer, Ingmar Steiner, Stefanie Wuhrer
Flexible, Rapid Authoring of Goal-Orientated, Multi-Turn Dialogues Using the Task Completion Platform
Alex Marin, Paul Crook, Omar Zia Khan, Vasiliy Radostev, Khushboo Aggarwal, Ruhi Sarikaya
Context Adaptive Neural Network for Rapid Adaptation of Deep CNN Based Acoustic Models
Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Takuya Yoshioka, Dung T. Tran, Tomohiro Nakatani
Transfer Learning with Bottleneck Feature Networks for Whispered Speech Recognition
Boon Pang Lim, Faith Wong, Yuyao Li, Jia Wei Bay
Adaptation of Neural Networks Constrained by Prior Statistics of Node Co-Activations
Tasha Nagamine, Zhuo Chen, Nima Mesgarani
Domain Adaptation of CNN Based Acoustic Models Under Limited Resource Settings
Masayuki Suzuki, Ryuki Tachibana, Samuel Thomas, Bhuvana Ramabhadran, George Saon
Subspace LHUC for Fast Adaptation of Deep Neural Network Acoustic Models
Lahiru Samarakoon, Khe Chai Sim
Improving Children’s Speech Recognition Through Out-of-Domain Data Augmentation
Joachim Fainberg, Peter Bell, Mike Lincoln, Steve Renals
Virtual Machines and Containers as a Platform for Experimentation
Florian Metze, Eric Riebling, Anne S. Warlaumont, Elika Bergelson
CloudCAST — Remote Speech Technology for Speech Professionals
Phil Green, Ricard Marxer, Stuart Cunningham, Heidi Christensen, Frank Rudzicz, Maria Yancheva, André Coy, Massimiliano Malavasi, Lorenzo Desideri, Fabio Tamburini
webASR 2 — Improved Cloud Based Speech Technology
Thomas Hain, Jeremy Christian, Oscar Saz, Salil Deena, Madina Hasan, Raymond W.M. Ng, Rosanna Milner, Mortaza Doulaty, Yulan Liu
Sharing Speech Synthesis Software for Research and Education Within Low-Tech and Low-Resource Communities
Andrew R. Plummer, Mary E. Beckman
The Berkeley Phonetics Machine
Ronald L. Sprouse, Keith Johnson
Experiences with Shared Resources for Research and Education in Speech and Language Processing
Rebecca Bates, Eric Fosler-Lussier, Florian Metze, Martha Larson, Gina-Anne Levow, Emily Mower Provost
The Voice Conversion Challenge 2016
Tomoki Toda, Ling-Hui Chen, Daisuke Saito, Fernando Villavicencio, Mirjam Wester, Zhizheng Wu, Junichi Yamagishi
Analysis of the Voice Conversion Challenge 2016 Evaluation Results
Mirjam Wester, Zhizheng Wu, Junichi Yamagishi
The USTC System for Voice Conversion Challenge 2016: Neural Network Based Approaches for Spectrum, Aperiodicity and F0 Conversion
Ling-Hui Chen, Li-Juan Liu, Zhen-Hua Ling, Yuan Jiang, Li-Rong Dai
A Voice Conversion Mapping Function Based on a Stacked Joint-Autoencoder
Seyed Hamidreza Mohammadi, Alexander Kain
Locally Linear Embedding for Exemplar-Based Spectral Conversion
Yi-Chiao Wu, Hsin-Te Hwang, Chin-Cheng Hsu, Yu Tsao, Hsin-Min Wang
Applying Spectral Normalisation and Efficient Envelope Estimation and Statistical Transformation for the Voice Conversion Challenge 2016
Fernando Villavicencio, Junichi Yamagishi, Jordi Bonada, Felipe Espic
ML Parameter Generation with a Reformulated MGE Training Criterion — Participation in the Voice Conversion Challenge 2016
D. Erro, A. Alonso, L. Serrano, D. Tavarez, I. Odriozola, Xabier Sarasola, Eder del Blanco, J. Sanchez, I. Saratxaga, Eva Navas, Inma Hernaez
The NU-NAIST Voice Conversion System for the Voice Conversion Challenge 2016
Kazuhiro Kobayashi, Shinnosuke Takamichi, Satoshi Nakamura, Tomoki Toda
Release from Energetic Masking Caused by Repeated Patterns of Glimpsing Windows
Maury Lander-Portnoy
Glimpsing Predictions for Natural and Vocoded Sentence Intelligibility During Modulation Masking: Effect of the Glimpse Cutoff Criterion
Bobby Gibbs, Daniel Fogerty
Temporal Envelopes in Sine-Wave Speech Recognition
Li Xu
Understanding Periodically Interrupted Mandarin Speech
Jing Liu, Rosanna H.N. Tong, Fei Chen
Factors Affecting the Intelligibility of Sine-Wave Speech
Fei Chen, Daniel Fogerty
Effects of Urgent Speech and Preceding Sounds on Speech Intelligibility in Noisy and Reverberant Environments
Nao Hodoshima
Integrated Spoofing Countermeasures and Automatic Speaker Verification: An Evaluation on ASVspoof 2015
Md. Sahidullah, Héctor Delgado, Massimiliano Todisco, Hong Yu, Tomi Kinnunen, Nicholas Evans, Zheng-Hua Tan
Cross-Database Evaluation of Audio-Based Spoofing Detection Systems
Pavel Korshunov, Sébastien Marcel
Investigation of Sub-Band Discriminative Information Between Spoofed and Genuine Speech
Kaavya Sriskandaraja, Vidhyasaharan Sethu, Phu Ngoc Le, Eliathamby Ambikairajah
An Investigation of Spoofing Speech Detection Under Additive Noise and Reverberant Conditions
Xiaohai Tian, Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li
Robust Speaker Recognition with Combined Use of Acoustic and Throat Microphone Speech
Md. Sahidullah, Rosa Gonzalez Hautamäki, Dennis Alexander Lehmann Thomsen, Tomi Kinnunen, Zheng-Hua Tan, Ville Hautamäki, Robert Parts, Martti Pitkänen
Statistical Modeling of Speaker’s Voice with Temporal Co-Location for Active Voice Authentication
Zhong Meng, Biing-Hwang Juang
Joint Enhancement and Coding of Speech by Incorporating Wiener Filtering in a CELP Codec
Johannes Fischer, Tom Bäckström
Multi-Channel Linear Prediction Based on Binaural Coherence for Speech Dereverberation
Hong Liu, Xiuling Wang, Miao Sun, Cheng Pang
Single-Channel Speech Enhancement Using Double Spectrum
Martin Blass, Pejman Mowlaee, W. Bastiaan Kleijn
On the Appropriateness of Complex-Valued Neural Networks for Speech Enhancement
Lukas Drude, Bhiksha Raj, Reinhold Haeb-Umbach
Introducing the Turbo-Twin-HMM for Audio-Visual Speech Enhancement
Steffen Zeiler, Hendrik Meutzner, Ahmed Hussen Abdelaziz, Dorothea Kolossa
Assessing Speech Quality in Speech-Aware Hearing Aids Based on Phoneme Posteriorgrams
Constantin Spille, Hendrik Kayser, Hynek Hermansky, Bernd T. Meyer
Time-Varying Quasi-Closed-Phase Weighted Linear Prediction Analysis of Speech for Accurate Formant Detection and Tracking
Dhananjaya Gowda, Paavo Alku
Improved Depiction of Tissue Boundaries in Vocal Tract Real-Time MRI Using Automatic Off-Resonance Correction
Yongwan Lim, Sajan Goud Lingala, Asterios Toutios, Shrikanth S. Narayanan, Krishna S. Nayak
Modeling and Transforming Speech Using Variational Autoencoders
Merlijn Blaauw, Jordi Bonada
Phase-Encoded Speech Spectrograms
Chandra Sekhar Seelamantula
Towards Minimally Invasive Velar State Detection in Normal and Silent Speech
Peter Birkholz, Petko Bakardjiev, Steffen Kürbis, Rico Petrick
RNN-BLSTM Based Multi-Pitch Estimation
Jianshu Zhang, Jian Tang, Li-Rong Dai
TUSK: A Framework for Overviewing the Performance of F0 Estimators
Masanori Morise, Hideki Kawahara
A Robust Non-Parametric and Filtering Based Approach for Glottal Closure Instant Detection
Pradeep Rengaswamy, Gurunath Reddy M., K. Sreenivasa Rao, Pallab Dasgupta
Analysis of Face Mask Effect on Speaker Recognition
Rahim Saeidi, Ilkka Huhtakallio, Paavo Alku
Data Selection for Within-Class Covariance Estimation
Elliot Singer, Tyler Campbell, Douglas Reynolds
Inter-Task System Fusion for Speaker Recognition
M. Ferras, Srikanth Madikeri, S. Dey, Petr Motlicek, Hervé Bourlard
Mahalanobis Metric Scoring Learned from Weighted Pairwise Constraints in I-Vector Speaker Recognition System
Zhenchun Lei, Yanhong Wan, Jian Luo, Yingen Yang
Novel Subband Autoencoder Features for Detection of Spoofed Speech
Meet H. Soni, Tanvina B. Patel, Hemant A. Patil
On the Issue of Calibration in DNN-Based Speaker Recognition Systems
Mitchell McLaren, Diego Castan, Luciana Ferrer, Aaron Lawson
Probabilistic Approach Using Joint Long and Short Session i-Vectors Modeling to Deal with Short Utterances for Speaker Recognition
Waad Ben Kheder, Driss Matrouf, Moez Ajili, Jean-François Bonastre
Short Utterance Variance Modelling and Utterance Partitioning for PLDA Speaker Verification
Ahilan Kanagasundaram, David Dean, Sridha Sridharan, Clinton Fookes, Ivan Himawan
Speaker-Dependent Dictionary-Based Speech Enhancement for Text-Dependent Speaker Verification
Nicolai Bæk Thomsen, Dennis Alexander Lehmann Thomsen, Zheng-Hua Tan, Børge Lindberg, Søren Holdt Jensen
Text-Available Speaker Recognition System for Forensic Applications
Chengzhu Yu, Chunlei Zhang, Finnian Kelly, Abhijeet Sangwan, John H.L. Hansen
Transfer Learning for Speaker Verification on Short Utterances
Qingyang Hong, Lin Li, Lihong Wan, Jun Zhang, Feng Tong
Twin Model G-PLDA for Duration Mismatch Compensation in Text-Independent Speaker Verification
Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong Aik Lee
Universal Background Sparse Coding and Multilayer Bootstrap Network for Speaker Clustering
Xiao-Lei Zhang
Improving Deep Neural Networks Based Speaker Verification Using Unlabeled Data
Yao Tian, Meng Cai, Liang He, Wei-Qiang Zhang, Jia Liu
Maximum a posteriori Based Decoding for CTC Acoustic Models
Naoyuki Kanda, Xugang Lu, Hisashi Kawai
Phonetic and Phonological Posterior Search Space Hashing Exploiting Class-Specific Sparsity Structures
Afsaneh Asaei, Gil Luyet, Milos Cernak, Hervé Bourlard
Model Compression Applied to Small-Footprint Keyword Spotting
George Tucker, Minhua Wu, Ming Sun, Sankaran Panchapagesan, Gengshen Fu, Shiv Vitaladevuni
Why do ASR Systems Despite Neural Nets Still Depend on Robust Features
Angel Mario Castro Martinez, Marc René Schädler
An Adaptive Multi-Band System for Low Power Voice Command Recognition
Qing He, Gregory W. Wornell, Wei Ma
Memory-Efficient Modeling and Search Techniques for Hardware ASR Decoders
Michael Price, Anantha Chandrakasan, James Glass
Log-Linear System Combination Using Structured Support Vector Machines
J. Yang, Anton Ragni, Mark J.F. Gales, Kate M. Knill
Efficient Segmental Cascades for Speech Recognition
Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu
A WFST Framework for Single-Pass Multi-Stream Decoding
Sirui Xu, Eric Fosler-Lussier
Comparison of Multiple System Combination Techniques for Keyword Spotting
William Hartmann, Le Zhang, Kerri Barnes, Roger Hsiao, Stavros Tsakalidis, Richard Schwartz
Rescoring by Combination of Posteriorgram Score and Subword-Matching Score for Use in Query-by-Example
Masato Obara, Kazunori Kojima, Kazuyo Tanaka, Shi-wook Lee, Yoshiaki Itoh
Phone Synchronous Decoding with CTC Lattice
Zhehuai Chen, Wei Deng, Tao Xu, Kai Yu
Speech Features for Depression Detection
Saurabh Sahu, Carol Espy-Wilson
Parkinson’s Disease Progression Assessment from Speech Using GMM-UBM
T. Arias-Vergara, J.C. Vasquez-Correa, Juan Rafael Orozco-Arroyave, J.F. Vargas-Bonilla, Elmar Nöth
Speech-Based Detection of Alzheimer’s Disease in Conversational German
Jochen Weiner, Christian Herff, Tanja Schultz
Cross-Cultural Depression Recognition from Vocal Biomarkers
Sharifa Alghowinem, Roland Goecke, Julien Epps, Michael Wagner, Jeffrey Cohn
Speech Recognition in Alzheimer’s Disease and in its Assessment
Luke Zhou, Kathleen C. Fraser, Frank Rudzicz
Does She Speak RTT? Towards an Earlier Identification of Rett Syndrome Through Intelligent Pre-Linguistic Vocalisation Analysis
Florian B. Pokorny, Peter B. Marschik, Christa Einspieler, Björn Schuller
Speech Rhythm in Parkinson’s Disease: A Study on Italian
Massimo Pettorino, Maria Grazia Busà, Elisa Pellegrino
English Language Speech Assistant
Xavier Anguera, Vu Van
Remeeting — Deep Insights to Conversations
Allen Guo, Arlo Faria, Korbinian Riedhammer
SERAPHIM Live! — Singing Synthesis for the Performer, the Composer, and the 3D Game Developer
Paul Yaozhu Chan, Minghui Dong, Grace Xue Hui Ho, Haizhou Li
My-Own-Voice: A Web Service That Allows You to Create a Text-to-Speech Voice From Your Own Voice
Fabrice Malfrere, Olivier Deroo, Emmanuelle Franques, Jonathan Hourez, Nicolas Mazars, Vincent Pagel, Geoffrey Wilfart
Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction
Tara N. Sainath, Arun Narayanan, Ron J. Weiss, Ehsan Variani, Kevin W. Wilson, Michiel Bacchiani, Izhak Shafran
Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition
Bo Li, Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Michiel Bacchiani
Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks
Hakan Erdogan, John R. Hershey, Shinji Watanabe, Michael I. Mandel, Jonathan Le Roux
Channel Selection for Distant Speech Recognition Exploiting Cepstral Distance
Cristina Guerrero, Georgina Tryfou, Maurizio Omologo
Multichannel Spatial Clustering for Robust Far-Field Automatic Speech Recognition in Mismatched Conditions
Michael I. Mandel, Jon Barker
Far-Field ASR Without Parallel Data
Vijayaditya Peddinti, Vimal Manohar, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
The INTERSPEECH 2016 Computational Paralinguistics Challenge: Deception, Sincerity & Native Language
Björn Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini
The Deception Sub-Challenge: The Data
Björn Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini
Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection
Sarah Ita Levitan, Guozhen An, Min Ma, Rivka Levitan, Andrew Rosenberg, Julia Hirschberg
Is Deception Emotional? An Emotion-Driven Predictive Approach
Shahin Amiriparian, Jouni Pohjalainen, Erik Marchi, Sergey Pugachevskiy, Björn Schuller
Prosodic Cues and Answer Type Detection for the Deception Sub-Challenge
Claude Montacié, Marie-José Caraty
The Sincerity Sub-Challenge: The Data
Björn Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini
Automatic Estimation of Perceived Sincerity from Spoken Language
Brandon M. Booth, Rahul Gupta, Pavlos Papadopoulos, Ruchir Travadi, Shrikanth S. Narayanan
Estimating the Sincerity of Apologies in Speech by DNN Rank Learning and Prosodic Analysis
Gábor Gosztolya, Tamás Grósz, György Szaszák, László Tóth
Minimization of Regression and Ranking Losses with Shallow Neural Networks on Automatic Sincerity Evaluation
Hung-Shin Lee, Yu Tsao, Chi-Chun Lee, Hsin-Min Wang, Wei-Cheng Lin, Wei-Chen Chen, Shan-Wen Hsiao, Shyh-Kang Jeng
Prediction of Deception and Sincerity from Speech Using Automatic Phone Recognition-Based Features
Robert Herms
Sincerity and Deception in Speech: Two Sides of the Same Coin? A Transfer- and Multi-Task Learning Perspective
Yue Zhang, Felix Weninger, Zhao Ren, Björn Schuller
Fusing Acoustic Feature Representations for Computational Paralinguistics Tasks
Heysem Kaya, Alexey A. Karpov
Introduction
Naomi Harte, Peter Jančovič, Karl-L. Schuchmann
Poster Overview Presentations
Naomi Harte, Peter Jančovič, Karl-L. Schuchmann
Discussion
Naomi Harte, Peter Jančovič, Karl-L. Schuchmann
Closing Remarks
Naomi Harte, Peter Jančovič, Karl-L. Schuchmann
A Stochastic Model for Computer-Aided Human-Human Dialogue
Merwan Barlier, Romain Laroche, Olivier Pietquin
Highlighting Psychological Features for Predicting Child Interjections During Story Telling
Gaël Lejeune, François Rioult, Bruno Crémilleux
Hybrid Dialogue State Tracking for Real World Human-to-Human Dialogues
Kai Sun, Su Zhu, Lu Chen, Siqiu Yao, Xueyang Wu, Kai Yu
Automatic Recognition of Social Roles Using Long Term Role Transitions in Small Group Interactions
Gaurav Fotedar, Aditya Gaonkar P., Saikat Chatterjee, Prasanta Kumar Ghosh
On the Influence of Gender on Interruptions in Multiparty Dialogue
Paul Van Eecke, Raquel Fernández
Detection of User Escalation in Human-Computer Interactions
Ian Beaver, Cynthia Freeman
Assessing Idiosyncrasies in a Bayesian Model of Speech Communication
Marie-Lou Barnaud, Julien Diard, Pierre Bessière, Jean-Luc Schwartz
Prosodic and Linguistic Analysis of Semantic Fluency Data: A Window into Speech Production and Cognition
Maria K. Wolters, Najoung Kim, Jung-Ho Kim, Sarah E. MacPherson, Jong C. Park
Sensorimotor Response to Visual Imagery of Tongue Displacement
William F. Katz, Divya Prabhakaran
Does Auditory-Motor Learning of Speech Transfer from the CV Syllable to the CVCV Word?
Tiphaine Caudrelier, Pascal Perrier, Jean-Luc Schwartz, Amélie Rochet-Capellan
Exemplar Dynamics in Phonetic Convergence of Speech Rate
Antje Schweitzer, Michael Walsh
Articulation Rate in Adverse Listening Conditions in Younger and Older Adults
Outi Tuomainen, Valerie Hazan
Error Correction in Lightly Supervised Alignment of Broadcast Subtitles
Julia Olcoz, Oscar Saz, Thomas Hain
Automatic Genre and Show Identification of Broadcast Media
Mortaza Doulaty, Oscar Saz, Raymond W.M. Ng, Thomas Hain
Speaker-Targeted Audio-Visual Models for Speech Recognition in Cocktail-Party Environments
Guan-Lin Chao, William Chan, Ian Lane
Text-Dependent Audiovisual Synchrony Detection for Spoofing Detection in Mobile Person Recognition
Amit Aides, Hagai Aronowitz
Improving Boundary Estimation in Audiovisual Speech Activity Detection Using Bayesian Information Criterion
Fei Tao, John H.L. Hansen, Carlos Busso
Dynamic Stream Weighting for Turbo-Decoding-Based Audiovisual ASR
Sebastian Gergen, Steffen Zeiler, Ahmed Hussen Abdelaziz, Robert Nickel, Dorothea Kolossa
Retrieval of Textual Song Lyrics from Sung Inputs
Anna M. Kruspe
Phoneme, Phone Boundary, and Tone in Automatic Scoring of Mandarin Proficiency
Jiahong Yuan, Mark Liberman
Tone Classification in Mandarin Chinese Using Convolutional Neural Networks
Charles Chen, Razvan Bunescu, Li Xu, Chang Liu
Robust Estimation of Fundamental Frequency Using Single Frequency Filtering Approach
Vishala Pannala, G. Aneeja, Sudarsana Reddy Kadiri, B. Yegnanarayana
A Fast and Accurate Fundamental Frequency Estimator Using Recursive Moving Average Filters
Ryunosuke Daido, Yuji Hisaminato
Frequency Estimation from Waveforms Using Multi-Layered Neural Networks
Prateek Verma, Ronald W. Schafer
Speaker Linking and Applications Using Non-Parametric Hashing Methods
Douglas E. Sturim, William M. Campbell
Iterative PLDA Adaptation for Speaker Diarization
Gaël Le Lan, Delphine Charlet, Anthony Larcher, Sylvain Meignier
A Speaker Diarization System for Studying Peer-Led Team Learning Groups
Harishchandra Dubey, Lakshmish Kaushik, Abhijeet Sangwan, John H.L. Hansen
DNN-Based Speaker Clustering for Speaker Diarisation
Rosanna Milner, Thomas Hain
On the Importance of Efficient Transition Modeling for Speaker Diarization
Itshak Lapidot, Jean-François Bonastre
Priors for Speaker Counting and Diarization with AHC
Gregory Sell, Alan McCree, Daniel Garcia-Romero
Two-Pass IB Based Speaker Diarization System Using Meeting-Specific ANN Based Features
Nauman Dawalatabad, Srikanth Madikeri, C Chandra Sekhar, Hema A. Murthy
DNN-Based Amplitude and Phase Feature Enhancement for Noise Robust Speaker Identification
Zeyan Oo, Yuta Kawakami, Longbiao Wang, Seiichi Nakagawa, Xiong Xiao, Masahiro Iwahashi
Unit-Selection Attack Detection Based on Unfiltered Frequency-Domain Features
Ulrich Scherhag, Andreas Nautsch, Christian Rathgeb, Christoph Busch
Investigating the Impact of Dialect Prestige on Lexical Decision
Mairym Lloréns Monteserín, Jason Zevin
Speaker Verification Using Short Utterances with DNN-Based Estimation of Subglottal Acoustic Features
Jinxi Guo, Gary Yeung, Deepak Muralidharan, Harish Arsikere, Amber Afshan, Abeer Alwan
Factor Analysis Based Speaker Verification Using ASR
Hang Su, Steven Wegmann
Joint Sound Source Separation and Speaker Recognition
Jeroen Zegers, Hugo Van hamme
Robust Multichannel Gender Classification from Speech in Movie Audio
Naveen Kumar, Md. Nasir, Panayiotis Georgiou, Shrikanth S. Narayanan
Recent Advances in Google Real-Time HMM-Driven Unit Selection Synthesizer
Xavi Gonzalvo, Siamak Tazari, Chun-an Chan, Markus Becker, Alexander Gutkin, Hanna Silen
First Step Towards End-to-End Parametric TTS Synthesis: Generating Spectral Parameters with Neural Attention
Wenfu Wang, Shuang Xu, Bo Xu
The Parameterized Phoneme Identity Feature as a Continuous Real-Valued Vector for Neural Network Based Speech Synthesis
Zhengqi Wen, Ya Li, Jianhua Tao
Improved Time-Frequency Trajectory Excitation Vocoder for DNN-Based Speech Synthesis
Eunwoo Song, Frank K. Soong, Hong-Goo Kang
Voice Quality Control Using Perceptual Expressions for Statistical Parametric Speech Synthesis Based on Cluster Adaptive Training
Yamato Ohtani, Koichiro Mori, Masahiro Morita
Waveform Generation Based on Signal Reshaping for Statistical Parametric Speech Synthesis
Felipe Espic, Cassia Valentini-Botinhao, Zhizheng Wu, Simon King
Speaker Representations for Speaker Adaptation in Multiple Speakers’ BLSTM-RNN-Based Speech Synthesis
Yi Zhao, Daisuke Saito, Nobuaki Minematsu
Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices
Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, Fergus Henderson, Przemysław Szczepaniak
An Investigation of DNN-Based Speech Synthesis Using Speaker Codes
Nobukatsu Hojo, Yusuke Ijima, Hideyuki Mizuno
Using Text and Acoustic Features in Predicting Glottal Excitation Waveforms for Parametric Speech Synthesis with Recurrent Neural Networks
Lauri Juvela, Xin Wang, Shinji Takaki, Manu Airaksinen, Junichi Yamagishi, Paavo Alku
Model Integration for HMM- and DNN-Based Speech Synthesis Using Product-of-Experts Framework
Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai
Idlak Tangle: An Open Source Kaldi Based Parametric Speech Synthesiser Based on DNN
Blaise Potard, Matthew P. Aylett, David A. Baude, Petr Motlicek
Probabilistic Amplitude Demodulation Features in Speech Synthesis for Improving Prosody
Alexandros Lazaridis, Milos Cernak, Philip N. Garner
On Smoothing and Enhancing Dynamics of Pitch Contours Represented by Discrete Orthogonal Polynomials for Prosody Generation
Chen-Yu Chiang
An Investigation of Recurrent Neural Network Architectures Using Word Embeddings for Phrase Break Prediction
Anandaswarup Vadapalli, Suryakanth V. Gangashetty
Model-Based Parametric Prosody Synthesis with Deep Neural Network
Hao Liu, Heng Lu, Xu Shao, Yi Xu
Active and Semi-Supervised Learning in ASR: Benefits on the Acoustic and Language Models
Thomas Drugman, Janne Pylkkönen, Reinhard Kneser
Learning N-Gram Language Models from Uncertain Data
Vitaly Kuznetsov, Hank Liao, Mehryar Mohri, Michael Riley, Brian Roark
Entropy Based Pruning for Non-Negative Matrix Based Language Models with Contextual Features
Barlas Oğuz, Issac Alphonso, Shuangyu Chang
Unsupervised Adaptation of Recurrent Neural Network Language Models
Siva Reddy Gangireddy, Pawel Swietojanski, Peter Bell, Steve Renals
Contextual Prediction Models for Speech Recognition
Yoni Halpern, Keith Hall, Vlad Schogol, Michael Riley, Brian Roark, Gleb Skobeltsyn, Martin Bäuml
Combining Feature and Model-Based Adaptation of RNNLMs for Multi-Genre Broadcast Speech Recognition
Salil Deena, Madina Hasan, Mortaza Doulaty, Oscar Saz, Thomas Hain
A Low Cost Desktop Robot and Tele-Presence Device for Interactive Speech Research
Michael C. Brady
Silent-Speech Command Word Recognition Using Electro-Optical Stomatography
Simon Stone, Peter Birkholz
An Engine for Online Video Search in Large Archives of the Holocaust Testimonies
Petr Stanislav, Jan Švec, Pavel Ircing
Data Selection by Sequence Summarizing Neural Network in Mismatch Condition Training
Kateřina Žmolíková, Martin Karafiát, Karel Veselý, Marc Delcroix, Shinji Watanabe, Lukáš Burget, Jan Černocký
Incorporating a Generative Front-End Layer to Deep Neural Network for Noise Robust Automatic Speech Recognition
Souvik Kundu, Khe Chai Sim, Mark J.F. Gales
Robust Speech Recognition Using Generalized Distillation Framework
Konstantin Markov, Tomoko Matsui
Adversarial Multi-Task Learning of Deep Neural Networks for Robust Speech Recognition
Yusuke Shinohara
The Use of Locally Normalized Cepstral Coefficients (LNCC) to Improve Speaker Recognition Accuracy in Highly Reverberant Rooms
Víctor Poblete, Juan Pablo Escudero, Josué Fredes, José Novoa, Richard M. Stern, Simon King, Néstor Becerra Yoma
Two-Stage Data Augmentation for Low-Resourced Speech Recognition
William Hartmann, Tim Ng, Roger Hsiao, Stavros Tsakalidis, Richard Schwartz
The Native Language Sub-Challenge: The Data
Björn Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini
Native Language Identification Using Spectral and Source-Based Features
Avni Rajpal, Tanvina B. Patel, Hardik B. Sailor, Maulik C. Madhavi, Hemant A. Patil, Hiroya Fujisaki
Accent Identification by Combining Deep Neural Networks and Recurrent Neural Networks Trained on Long and Short Term Features
Yishan Jiao, Ming Tu, Visar Berisha, Julie Liss
Convolutional Neural Networks with Data Augmentation for Classifying Speakers’ Native Language
Gil Keren, Jun Deng, Jouni Pohjalainen, Björn Schuller
Native Language Detection Using the I-Vector Framework
Mohammed Senoussaoui, Patrick Cardinal, Najim Dehak, Alessandro L. Koerich
Within-Speaker Features for Native Language Recognition in the Interspeech 2016 Computational Paralinguistics Challenge
Mark Huckvale
Multimodal Fusion of Multirate Acoustic, Prosodic, and Lexical Speaker Characteristics for Native Language Identification
Prashanth Gurunath Shivakumar, Sandeep Nallan Chakravarthula, Panayiotis Georgiou
Exploiting Phone Log-Likelihood Ratio Features for the Detection of the Native Language of Non-Native English Speakers
Alberto Abad, Eugénio Ribeiro, Fábio Kepler, Ramon Astudillo, Isabel Trancoso
Determining Native Language and Deception Using Phonetic Features and Classifier Combination
Gábor Gosztolya, Tamás Grósz, Róbert Busa-Fekete, László Tóth
The INTERSPEECH 2016 Computational Paralinguistics Challenge: A Summary of Results
Björn Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini
Discussion
Björn Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini
A Preliminary Ultrasound Study of Nasal and Lateral Coronals in Arrernte
Marija Tabain, Richard Beare
Illustrating the Production of the International Phonetic Alphabet Sounds Using Fast Real-Time Magnetic Resonance Imaging
Asterios Toutios, Sajan Goud Lingala, Colin Vaz, Jangwon Kim, John Esling, Patricia Keating, Matthew Gordon, Dani Byrd, Louis Goldstein, Krishna S. Nayak, Shrikanth S. Narayanan
Marginal Contrast Among Romanian Vowels: Evidence from ASR and Functional Load
Margaret E.L. Renwick, Ioana Vasilescu, Camille Dutrey, Lori Lamel, Bianca Vieru
Effects of Subglottal-Coupling and Interdental-Space on Formant Trajectories During Front-to-Back Vowel Transitions in Chinese
Shuanglin Fan, Kiyoshi Honda, Jianwu Dang, Hui Feng
Perceptual Lateralization of Coda Rhotic Production in Puerto Rican Spanish
Mairym Lloréns Monteserín, Shrikanth S. Narayanan, Louis Goldstein
Interaction Between Lexical Tone and Intonation: An EMA Study
Hao Yi, Sam Tilsen
Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion
Huaiping Ming, Dongyan Huang, Lei Xie, Jie Wu, Minghui Dong, Haizhou Li
Visual Speech Synthesis Using Dynamic Visemes, Contextual Features and DNNs
Ausdang Thangthai, Ben Milner, Sarah Taylor
A Template-Based Approach for Speech Synthesis Intonation Generation Using LSTMs
Srikanth Ronanki, Gustav Eje Henter, Zhizheng Wu, Simon King
Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN Based Statistical Parametric Speech Synthesis
Bo Li, Heiga Zen
GlottDNN — A Full-Band Glottal Vocoder for Statistical Parametric Speech Synthesis
Manu Airaksinen, Bajibabu Bollepalli, Lauri Juvela, Zhizheng Wu, Simon King, Paavo Alku
Singing Voice Synthesis Based on Deep Neural Networks
Masanari Nishimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
Blind Recovery of Perceptual Models in Distributed Speech and Audio Coding
Tom Bäckström, Florin Ghido, Johannes Fischer
Glimpse-Based Metrics for Predicting Speech Intelligibility in Additive Noise Conditions
Yan Tang, Martin Cooke
Analyzing the Relation Between Overall Quality and the Quality of Individual Phases in a Telephone Conversation
Friedemann Köster, Sebastian Möller
Intelligibility Enhancement at the Receiving End of the Speech Transmission System — Effects of Far-End Noise Reduction
Emma Jokinen, Paavo Alku
Intelligibility of Disordered Speech: Global and Detailed Scores
Mario Ganzeboom, Marjoke Bakker, Catia Cucchiarini, Helmer Strik
Modulation Enhancement of Temporal Envelopes for Increasing Speech Intelligibility in Noise
Maria Koutsogiannaki, Yannis Stylianou
Dynamic Transcription for Low-Latency Speech Translation
Jan Niehues, Thai Son Nguyen, Eunah Cho, Thanh-Le Ha, Kevin Kilgour, Markus Müller, Matthias Sperber, Sebastian Stüker, Alex Waibel
Learning a Translation Model from Word Lattices
Oliver Adams, Graham Neubig, Trevor Cohn, Steven Bird
Disfluency Detection Using a Bidirectional LSTM
Vicky Zayats, Mari Ostendorf, Hannaneh Hajishirzi
Sentence Boundary Detection Based on Parallel Lexical and Acoustic Models
Xiaoyin Che, Sheng Luo, Haojin Yang, Christoph Meinel
Transferring Emphasis in Speech Translation Using Hard-Attentional Neural Network Models
Quoc Truong Do, Sakriani Sakti, Graham Neubig, Satoshi Nakamura
Better Evaluation of ASR in Speech Translation Context Using Word Embeddings
Ngoc-Tien Le, Christophe Servan, Benjamin Lecouteux, Laurent Besacier
Entropy Coding of Spectral Envelopes for Speech and Audio Coding Using Distribution Quantization
Srikanth Korse, Tobias Jähnel, Tom Bäckström
An Objective Evaluation Methodology for Blind Bandwidth Extension
Stéphane Villette, Sen Li, Pravin Ramadas, Daniel J. Sinder
EVS Channel Aware Mode Robustness to Frame Erasures
Anssi Rämö, Antti Kurittu, Henri Toukomaa
An Interaural Magnification Algorithm for Enhancement of Naturally-Occurring Level Differences
Shadi Pirhosseinloo, Kostas Kokkinakis
Probabilistic Spatial Filter Estimation for Signal Enhancement in Multi-Channel Automatic Speech Recognition
Hendrik Kayser, Niko Moritz, Jörn Anemüller
Improved a priori SAP Estimator in Complex Noisy Environment for Dual Channel Microphone System
Youna Ji, Young-cheol Park
A Spectral Modulation Sensitivity Weighted Pre-Emphasis Filter for Active Noise Control System
Kah-Meng Cheong, Yuh-Yuan Wang, Tai-Shih Chi
Semi-Coupled Dictionary Based Automatic Bandwidth Extension Approach for Enhancing Children’s ASR
Ganji Sreeram, Rohit Sinha
Bird Song Synthesis Based on Hidden Markov Models
Jordi Bonada, Robert Lachlan, Merlijn Blaauw
Noise-Robust Hidden Markov Models for Limited Training Data for Within-Species Bird Phrase Classification
Kantapon Kaewtip, Charles Taylor, Abeer Alwan
A Framework for Automated Marmoset Vocalization Detection and Classification
Alan Wisler, Laura J. Brattain, Rogier Landman, Thomas F. Quatieri
Call Alternation Between Specific Pairs of Male Frogs Revealed by a Sound-Imaging Method in Their Natural Habitat
Ikkyu Aihara, Takeshi Mizumoto, Hiromitsu Awano, Hiroshi G. Okuno
Sinusoidal Modelling for Ecoacoustics
Patrice Guyot, Alice Eldridge, Ying Chen Eyre-Walker, Alison Johnston, Thomas Pellegrini, Mika Peck
Individual Identity in Songbirds: Signal Representations and Metric Learning for Locating the Information in Complex Corvid Calls
Dan Stowell, Veronica Morfi, Lisa F. Gill
Recognition of Multiple Bird Species Based on Penalised Maximum Likelihood and HMM-Based Modelling of Individual Vocalisation Elements
Peter Jančovič, Münevver Köküer
Cost Effective Acoustic Monitoring of Bird Species
Ciira wa Maina
Feature Learning and Automatic Segmentation for Dolphin Communication Analysis
Daniel Kohlsdorf, Denise Herzing, Thad Starner
Localizing Bird Songs Using an Open Source Robot Audition System with a Microphone Array
Reiji Suzuki, Shiho Matsubayashi, Kazuhiro Nakadai, Hiroshi G. Okuno
Robust Detection of Multiple Bioacoustic Events with Repetitive Structures
Frank Kurth
A Real-Time Parametric General-Purpose Mammalian Vocal Synthesiser
Roger K. Moore
YIN-Bird: Improved Pitch Tracking for Bird Vocalisations
Colm O’Reilly, Nicola M. Marples, David J. Kelly, Naomi Harte
Mispronunciation Detection Leveraging Maximum Performance Criterion Training of Acoustic Models and Decision Functions
Yao-Chi Hsu, Ming-Han Yang, Hsiao-Tsung Hung, Berlin Chen
Using Clinician Annotations to Improve Automatic Speech Recognition of Stuttered Speech
Peter A. Heeman, Rebecca Lunsford, Andy McMillin, J. Scott Yaruss
Deep Neural Networks for Voice Quality Assessment Based on the GRBAS Scale
Simin Xie, Nan Yan, Ping Yu, Manwa L. Ng, Lan Wang, Zhuanzhuan Ji
Automated Screening of Speech Development Issues in Children by Identifying Phonological Error Patterns
Lauren Ward, Alessandro Stefani, Daniel Smith, Andreas Duenser, Jill Freyne, Barbara Dodd, Angela Morgan
Automatic Pronunciation Evaluation of Non-Native Mandarin Tone by Using Multi-Level Confidence Measures
Ju Lin, Yanlu Xie, Jinsong Zhang
Dysarthric Speech Recognition Using Kullback-Leibler Divergence-Based Hidden Markov Model
Myungjong Kim, Jun Wang, Hoirin Kim
Detection of Total Syllables and Canonical Syllables in Infant Vocalizations
Anne S. Warlaumont, Heather L. Ramsdell-Hudock
Improving Automatic Recognition of Aphasic Speech with AphasiaBank
Duc Le, Emily Mower Provost
Pronunciation Assessment of Japanese Learners of French with GOP Scores and Phonetic Information
Vincent Laborde, Thomas Pellegrini, Lionel Fontan, Julie Mauclair, Halima Sahraoui, Jérôme Farinas
Pronunciation Error Detection for New Language Learners
Sean Robertson, Cosmin Munteanu, Gerald Penn
L2 English Rhythm in Read Speech by Chinese Students
Hongwei Ding, Xinping Xu
Improving the Probabilistic Framework for Representing Dialogue Systems with User Response Model
Miao Li, Zhipeng Chen, Ji Wu
Dialogue Session Segmentation by Embedding-Enhanced TextTiling
Yiping Song, Lili Mou, Rui Yan, Li Yi, Zinan Zhu, Xiaohua Hu, Ming Zhang
Target-Based State and Tracking Algorithm for Spoken Dialogue System
Miao Li, Zhiyang He, Ji Wu
Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection
Sheng-syun Shen, Hung-Yi Lee
Objective Language Feature Analysis in Children with Neurodevelopmental Disorders During Autism Assessment
Manoj Kumar, Rahul Gupta, Daniel Bone, Nikolaos Malandrakis, Somer Bishop, Shrikanth S. Narayanan
Improving Generalisation to New Speakers in Spoken Dialogue State Tracking
Iñigo Casanueva, Thomas Hain, Phil Green
Towards Machine Comprehension of Spoken Content: Initial TOEFL Listening Comprehension Test by Machine
Bo-Hsiang Tseng, Sheng-syun Shen, Hung-Yi Lee, Lin-Shan Lee
How Neural Network Depth Compensates for HMM Conditional Independence Assumptions in DNN-HMM Acoustic Models
Suman Ravuri, Steven Wegmann
Jointly Learning to Locate and Classify Words Using Convolutional Networks
Dimitri Palaz, Gabriel Synnaeve, Ronan Collobert
On the Efficient Representation and Execution of Deep Acoustic Models
Raziel Alvarez, Rohit Prabhavalkar, Anton Bakhtin
Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI
Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahremani, Vimal Manohar, Xingyu Na, Yiming Wang, Sanjeev Khudanpur
Virtual Adversarial Training Applied to Neural Higher-Order Factors for Phone Classification
Martin Ratajczak, Sebastian Tschiatschek, Franz Pernkopf
Sequence Student-Teacher Training of Deep Neural Networks
Jeremy H.M. Wong, Mark J.F. Gales
Robustness in Speech, Speaker, and Language Recognition: “You’ve Got to Know Your Limitations”
John H.L. Hansen, Hynek Bořil
The Use of Read versus Conversational Lombard Speech in Spectral Tilt Modeling for Intelligibility Enhancement in Near-End Noise Conditions
Emma Jokinen, Ulpu Remes, Paavo Alku
Corpora for the Evaluation of Robust Speaker Recognition Systems
Douglas E. Sturim, Pedro A. Torres-Carrasquillo, Joseph P. Campbell
A French Corpus for Distant-Microphone Speech Processing in Real Homes
Nancy Bertin, Ewen Camberlein, Emmanuel Vincent, Romain Lebarbenchon, Stéphane Peillon, Éric Lamande, Sunit Sivasankaran, Frédéric Bimbot, Irina Illina, Ariane Tom, Sylvain Fleury, Éric Jamet
Realistic Multi-Microphone Data Simulation for Distant Speech Recognition
Mirco Ravanelli, Piergiorgio Svaizer, Maurizio Omologo
Synthesis of Device-Independent Noise Corpora for Realistic ASR Evaluation
Hannes Gamper, Mark R.P. Thomas, Lyle Corbin, Ivan Tashev
Speaker Recognition Using Real vs Synthetic Parallel Data for DNN Channel Compensation
Fred Richardson, Michael Brandstein, Jennifer Melot, Douglas Reynolds
Discussion
Dayana Ribas, Emmanuel Vincent, John H.L. Hansen, Emma Jokinen, Mirco Ravanelli, Hannes Gamper, Fred Richardson
Combining Data-Oriented and Process-Oriented Approaches to Modeling Reaction Time Data
Louis ten Bosch, Lou Boves, M. Ernestus
Do Listeners Learn Better from Natural Speech?
Michael McAuliffe, Molly Babel, Charlotte Vaughn
Processing and Adaptation to Ambiguous Sounds during the Course of Perceptual Learning
Polina Drozdova, Roeland van Hout, Odette Scharenborg
The Effect of Background Noise on the Activation of Phonological and Semantic Information During Spoken-Word Recognition
Florian Hintz, Odette Scharenborg
Relationships Between Functional Load and Auditory Confusability Under Different Speech Environments
Shinae Kang, Clara Cohen
The Role of Pitch in Punjabi Word Identification
Jasmeen Kanwal, Amanda Ritchart
Improving TTS with Corpus-Specific Pronunciation Adaptation
Marie Tahon, Raheel Qader, Gwénolé Lecorvé, Damien Lolive
Deep Bidirectional Long Short-Term Memory Recurrent Neural Networks for Grapheme-to-Phoneme Conversion Utilizing Complex Many-to-Many Alignments
Amr El-Desoky Mousa, Björn Schuller
Predicting Pronunciations with Syllabification and Stress with Recurrent Neural Networks
Daan van Esch, Mason Chua, Kanishka Rao
Adaptive Latency for Part-of-Speech Tagging in Incremental Text-to-Speech Synthesis
Maël Pouget, Olha Nahorna, Thomas Hueber, Gérard Bailly
Redefining the Linguistic Context Feature Set for HMM and DNN TTS Through Position and Parsing
Rasmus Dall, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
Enhance the Word Vector with Prosodic Information for the Recurrent Neural Network Based TTS System
Xin Wang, Shinji Takaki, Junichi Yamagishi
Local Sparsity Based Online Dictionary Learning for Environment-Adaptive Speech Enhancement with Nonnegative Matrix Factorization
Kwang Myung Jeon, Hong Kook Kim
Noise Aware and Combined Noise Models for Speech Denoising in Unknown Noise Conditions
Pavlos Papadopoulos, Colin Vaz, Shrikanth S. Narayanan
Causal Speech Enhancement Combining Data-Driven Learning and Suppression Rule Estimation
Seyedmahdad Mirsamadi, Ivan Tashev
A Phase-Based Time-Frequency Masking for Multi-Channel Speech Enhancement in Domestic Environments
Alessio Brutti, Antigoni Tsiami, Athanasios Katsamanis, Petros Maragos
Generalizing Steady State Suppression for Enhanced Intelligibility Under Reverberation
Petko N. Petkov, Yannis Stylianou
Speech Intelligibility Prediction Based on the Envelope Power Spectrum Model with the Dynamic Compressive Gammachirp Auditory Filterbank
Katsuhiko Yamamoto, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani
Prediction and Generation of Backchannel Form for Attentive Listening Systems
Tatsuya Kawahara, Takashi Yamaguchi, Koji Inoue, Katsuya Takanashi, Nigel Ward
Measuring Turn-Taking Offsets in Human-Human Dialogues
Rebecca Lunsford, Peter A. Heeman, Emma Rennie
Using Past Speaker Behavior to Better Predict Turn Transitions
Tomer Meshorer, Peter A. Heeman
Quantitative Analysis of Backchannels Uttered by an Interviewer During Neuropsychological Tests
Gérard Bailly, Frédéric Elisei, Alexandra Juphard, Olivier Moreaud
Predicting User Satisfaction from Turn-Taking in Spoken Conversations
Shammur Absar Chowdhury, Evgeny A. Stepanov, Giuseppe Riccardi
Towards Building an Attentive Artificial Listener: On the Perception of Attentiveness in Feedback Utterances
Catharine Oertel, Joakim Gustafson, Alan W. Black
Language Recognition via Sparse Coding
Youngjune L. Gwon, William M. Campbell, Douglas E. Sturim, H.T. Kung
A Feature Normalisation Technique for PLLR Based Language Identification Systems
Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah
An Investigation of Deep Neural Network Architectures for Language Recognition in Indian Languages
Mounika K.V., Sivanand Achanta, Lakshmi H. R., Suryakanth V. Gangashetty, Anil Kumar Vuppala
Automatic Dialect Detection in Arabic Broadcast Speech
Ahmed Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha Yella, James Glass, Peter Bell, Steve Renals
Combining Weak Tokenisers for Phonotactic Language Recognition in a Resource-Constrained Setting
Raymond W.M. Ng, Bhusan Chettri, Thomas Hain
End-to-End Language Identification Using Attention-Based Recurrent Neural Networks
Wang Geng, Wenfu Wang, Yuanyuan Zhao, Xinyuan Cai, Bo Xu
Enhancing Multilingual Recognition of Emotion in Speech by Language Identification
Hesam Sagha, Pavel Matějka, Maryna Gavryukova, Filip Povolny, Erik Marchi, Björn Schuller
Deep Neural Network Bottleneck Features for Acoustic Event Recognition
Seongkyu Mun, Suwon Shon, Wooil Kim, Hanseok Ko
Combining Energy and Cross-Entropy Analysis for Nuclear Segments Detection
Antonio Origlia, Francesco Cutugno
Anchored Speech Detection
Roland Maas, Sree Hari Krishnan Parthasarathi, Brian King, Ruitong Huang, Björn Hoffmeister
Towards Smart-Cars That Can Listen: Abnormal Acoustic Event Detection on the Road
Mahesh Kumar Nandwana, Taufiq Hasan
Hierarchical Classification of Speaker and Background Noise and Estimation of SNR Using Sparse Representation
K.V. Vijay Girish, A.G. Ramakrishnan, T.V. Ananthapadmanabha
Robust Sound Event Detection in Continuous Audio Environments
Haomin Zhang, Ian McLoughlin, Yan Song
Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Recognition
Naoya Takahashi, Michael Gygli, Beat Pfister, Luc Van Gool
Artificial Neural Network-Based Feature Combination for Spatial Voice Activity Detection
Stefan Meier, Walter Kellermann
HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors
Tomi Kinnunen, Alexey Sholokhov, Elie Khoury, Dennis Alexander Lehmann Thomsen, Md. Sahidullah, Zheng-Hua Tan
Manual versus Automated: The Challenging Routine of Infant Vocalisation Segmentation in Home Videos to Study Neuro(mal)development
Florian B. Pokorny, Robert Peharz, Wolfgang Roth, Matthias Zöhrer, Franz Pernkopf, Peter B. Marschik, Björn Schuller
Minimizing Annotation Effort for Adaptation of Speech-Activity Detection Systems
Luciana Ferrer, Martin Graciarena
Progress and Prospects for Spoken Language Technology: What Ordinary People Think
Roger K. Moore, Hui Li, Shih-Hao Liao
Progress and Prospects for Spoken Language Technology: Results from Four Sexennial Surveys
Roger K. Moore, Ricard Marxer
On Employing a Highly Mismatched Crowd for Speech Transcription
Purushotam Radadia, Rahul Kumar, Kanika Kalra, Shirish Karande, Sachin Lodha
Sage: The New BBN Speech Processing Platform
Roger Hsiao, Ralf Meermeier, Tim Ng, Zhongqiang Huang, Maxwell Jordan, Enoch Kan, Tanel Alumäe, Jan Silovsky, William Hartmann, Francis Keith, Omer Lang, Manhung Siu, Owen Kimball
DNN-Based Feature Enhancement Using Joint Training Framework for Robust Multichannel Speech Recognition
Kang Hyun Lee, Tae Gyoon Kang, Woo Hyun Kang, Nam Soo Kim
Deep Neural Network Frontend for Continuous EMG-Based Speech Recognition
Michael Wand, Jürgen Schmidhuber
Overcoming Data Sparsity in Acoustic Modeling of Low-Resource Language by Borrowing Data and Model Parameters from High-Resource Languages
Basil Abraham, S. Umesh, Neethu Mariam Joy
Multi-Language Neural Network Language Models
Anton Ragni, Edgar Dakin, Xie Chen, Mark J.F. Gales, Kate M. Knill
Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration
Ottokar Tilk, Tanel Alumäe
TheanoLM — An Extensible Toolkit for Neural Network Language Modeling
Seppo Enarvi, Mikko Kurimo
Selection of Multi-Genre Broadcast Data for the Training of Automatic Speech Recognition Systems
P. Lanchantin, Mark J.F. Gales, Penny Karanasou, X. Liu, Y. Qian, L. Wang, P.C. Woodland, C. Zhang
Manipulating Word Lattices to Incorporate Human Corrections
Yashesh Gaur, Florian Metze, Jeffrey P. Bigham
Context-Aware Restaurant Recommendation for Natural Language Queries: A Formative User Study in the Automotive Domain
Philipp Fischer, Cornelius Styp von Rekowski, Andreas Nürnberger
Teaming Up: Making the Most of Diverse Representations for a Novel Personalized Speech Retrieval Application
Stephanie Pancoast, Murat Akbacak
Automatic Speech Transcription for Low-Resource Languages — The Case of Yoloxóchitl Mixtec (Mexico)
Vikramjit Mitra, Andreas Kathol, Jonathan D. Amith, Rey Castillo García
Real-Time Presentation Tracking Using Semantic Keyword Spotting
Reza Asadi, Harriet J. Fell, Timothy Bickmore, Ha Trinh
Deriving Phonetic Transcriptions and Discovering Word Segmentations for Speech-to-Speech Translation in Low-Resource Settings
Andrew Wilkinson, Tiancheng Zhao, Alan W. Black
Unsupervised Joint Estimation of Grapheme-to-Phoneme Conversion Systems and Acoustic Model Adaptation for Non-Native Speech Recognition
Satoshi Tsujioka, Sakriani Sakti, Koichiro Yoshino, Graham Neubig, Satoshi Nakamura
Learning Personalized Pronunciations for Contact Name Recognition
Antoine Bruguier, Fuchun Peng, Françoise Beaufays
Generation and Pruning of Pronunciation Variants to Improve ASR Accuracy
Zhenhao Ge, Aravind Ganapathiraju, Ananth N. Iyer, Scott A. Randal, Felix I. Wyss
Optimizing Speech Recognition Evaluation Using Stratified Sampling
Janne Pylkkönen, Thomas Drugman, Max Bisani
Speech Ventures
Nicolas Scheffer, Korbinian Riedhammer, Alexandre Lebrun, David Suendermann-Oeft
Context Aware Mispronunciation Detection for Mandarin Pronunciation Training
Rong Tong, Nancy F. Chen, Bin Ma, Haizhou Li
DNN Online with iVectors Acoustic Modeling and Doc2Vec Distributed Representations for Improving Automated Speech Scoring
Jidong Tao, Lei Chen, Chong Min Lee
Self-Adaptive DNN for Improving Spoken Language Proficiency Assessment
Yao Qian, Xinhao Wang, Keelan Evanini, David Suendermann-Oeft
Detecting Mispronunciations of L2 Learners and Providing Corrective Feedback Using Knowledge-Guided and Data-Driven Decision Trees
Wei Li, Kehuang Li, Sabato Marco Siniscalchi, Nancy F. Chen, Chin-Hui Lee
Phoneme Set Design Considering Integrated Acoustic and Linguistic Features of Second Language Speech
Xiaoyun Wang, Tsuneo Kato, Seiichi Yamamoto
HMM-Based Non-Native Accent Assessment Using Posterior Features
Ramya Rasipuram, Milos Cernak, Mathew Magimai-Doss
Automatic Assessment and Error Detection of Shadowing Speech: Case of English Spoken by Japanese Learners
Shuju Shi, Yosuke Kashiwagi, Shohei Toyama, Junwei Yue, Yutaka Yamauchi, Daisuke Saito, Nobuaki Minematsu
Multiplicity of the Acoustic Correlates of the Fortis-Lenis Contrast: Plosives in Aberystwyth English
Míša Hejná
Automatic Measurement of Voice Onset Time and Prevoicing Using Recurrent Neural Networks
Yossi Adi, Joseph Keshet, Olga Dmitrieva, Matt Goldrick
L1-L2 Interference: The Case of Final Devoicing of French Voiced Fricatives in Final Position by German Learners
Sucheta Ghosh, Camille Fauth, Aghilas Sini, Yves Laprie
Perceptual Salience of Voice Source Parameters in Signaling Focal Prominence
Irena Yanushevskaya, Andy Murphy, Christer Gobl, Ailbhe Ní Chasaide
Classification of Voice Modality Using Electroglottogram Waveforms
Michal Borsky, Daryush D. Mehta, Julius P. Gudjohnsen, Jon Gudnason
Voice-Quality Difference Between the Vowels in Filled Pauses and Ordinary Lexical Items
Kikuo Maekawa, Hiroki Mori
Generation of Emotion Control Vector Using MDS-Based Space Transformation for Expressive Speech Synthesis
Yan-You Chen, Chung-Hsien Wu, Yu-Fong Huang
Direct Expressive Voice Training Based on Semantic Selection
Igor Jauk, Antonio Bonafonte
Syllable-Level Representations of Suprasegmental Features for DNN-Based Text-to-Speech Synthesis
Manuel Sam Ribeiro, Oliver Watts, Junichi Yamagishi
Pause Prediction from Text for Speech Synthesis with User-Definable Pause Insertion Likelihood Threshold
Norbert Braunschweiler, Ranniery Maia
A Hybrid System for Continuous Word-Level Emphasis Modeling Based on HMM State Clustering and Adaptive Training
Quoc Truong Do, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura
Improving Prosodic Boundaries Prediction for Mandarin Speech Synthesis by Using Enhanced Embedding Feature and Model Fusion Approach
Yibin Zheng, Ya Li, Zhengqi Wen, Xingguang Ding, Jianhua Tao
Results of The 2015 NIST Language Recognition Evaluation
Hui Zhao, Désiré Bansé, George Doddington, Craig Greenberg, Jaime Hernández-Cordero, John Howard, Lisa Mason, Alvin Martin, Douglas Reynolds, Elliot Singer, Audrey Tong
The 2015 NIST Language Recognition Evaluation: The Shared View of I2R, Fantastic4 and SingaMS
Kong Aik Lee, Haizhou Li, Li Deng, Ville Hautamäki, Wei Rao, Xiong Xiao, Anthony Larcher, Hanwu Sun, Trung Hieu Nguyen, Guangsen Wang, Aleksandr Sizov, Jianshu Chen, Ivan Kukanov, Amir Hossein Poorjam, Trung Ngo Trong, Cheng-Lin Xu, Haihua Xu, Bin Ma, Eng Siong Chng, Sylvain Meignier
Pair-Wise Distance Metric Learning of Neural Network Model for Spoken Language Identification
Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
Non-Iterative Parameter Estimation for Total Variability Model Using Randomized Singular Value Decomposition
Ruchir Travadi, Shrikanth S. Narayanan
Stacked Long-Term TDNN for Spoken Language Recognition
Daniel Garcia-Romero, Alan McCree
A Divide-and-Conquer Approach for Language Identification Based on Recurrent Neural Networks
G. Gelly, Jean-Luc Gauvain, V.B. Le, A. Messaoudi
Context-Sensitive and Role-Dependent Spoken Language Understanding Using Bidirectional and Attention LSTMs
Chiori Hori, Takaaki Hori, Shinji Watanabe, John R. Hershey
A Step Beyond Local Observations with a Dialog Aware Bidirectional GRU Network for Spoken Language Understanding
Vedran Vukotić, Christian Raymond, Guillaume Gravier
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding
Yun-Nung Chen, Dilek Hakkani-Tür, Gokhan Tur, Jianfeng Gao, Li Deng
Sequential Convolutional Neural Networks for Slot Filling in Spoken Language Understanding
Ngoc Thang Vu
A New Pre-Training Method for Training Deep Learning Models with Application to Spoken Language Understanding
Asli Celikyilmaz, Ruhi Sarikaya, Dilek Hakkani-Tür, Xiaohu Liu, Nikhil Ramesh, Gokhan Tur
Joint Syntactic and Semantic Analysis with a Multitask Deep Learning Framework for Spoken Language Understanding
Jeremie Tafforeau, Frederic Bechet, Thierry Artiere, Benoit Favre
Exploiting Hidden-Layer Responses of Deep Neural Networks for Language Recognition
Ruizhi Li, Sri Harish Mallidi, Lukáš Burget, Oldřich Plchot, Najim Dehak
Out of Set Language Modelling in Hierarchical Language Identification
Saad Irtza, Vidhyasaharan Sethu, Sarith Fernando, Eliathamby Ambikairajah, Haizhou Li
Language Identification Based on Generative Modeling of Posteriorgram Sequences Extracted from Frame-by-Frame DNNs and LSTM-RNNs
Ryo Masumura, Taichi Asami, Hirokazu Masataki, Yushi Aono, Sumitaka Sakauchi
Gating Recurrent Enhanced Memory Neural Networks on Language Identification
Wang Geng, Yuanyuan Zhao, Wenfu Wang, Xinyuan Cai, Bo Xu
Sequence Summarizing Neural Networks for Spoken Language Recognition
Jan Pešán, Lukáš Burget, Jan Černocký
The Role of Spectral Resolution in Foreign-Accented Speech Perception
Michelle R. Kapolowicz, Vahid Montazeri, Peter F. Assmann
THU-EE System Description for NIST LRE 2015
Liang He, Yao Tian, Yi Liu, Jiaming Xu, Weiwei Liu, Cai Meng, Jia Liu
Variation in Spoken North Sami Language
Kristiina Jokinen, Trung Ngo Trong, Ville Hautamäki
Improved Music Genre Classification with Convolutional Neural Networks
Weibin Zhang, Wenkang Lei, Xiangmin Xu, Xiaofeng Xing
Enhanced Harmonic Content and Vocal Note Based Predominant Melody Extraction from Vocal Polyphonic Music Signals
Gurunath Reddy M., K. Sreenivasa Rao
Long Short-Term Memory for Speaker Generalization in Supervised Speech Separation
Jitong Chen, DeLiang Wang
Phonotactic Language Identification for Singing
Anna M. Kruspe
Comparing the Influence of Spectro-Temporal Integration in Computational Speech Segregation
Thomas Bentsen, Tobias May, Abigail A. Kressner, Torsten Dau
Blind Speech Separation with GCC-NMF
Sean U.N. Wood, Jean Rouat
Effects of Cochlear Hearing Loss on the Benefits of Ideal Binary Masking
Vahid Montazeri, Shaikat Hossain, Peter F. Assmann
Combining Mask Estimates for Single Channel Audio Source Separation Using Deep Neural Networks
Emad M. Grais, Gerard Roma, Andrew J.R. Simpson, Mark D. Plumbley
Monaural Source Separation Using a Random Forest Classifier
Cosimo Riday, Saurabh Bhargava, Richard H.R. Hahnloser, Shih-Chii Liu
Adaptive Group Sparsity for Non-Negative Matrix Factorization with Application to Unsupervised Source Separation
Xu Li, Ziteng Wang, Xiaofei Wang, Qiang Fu, Yonghong Yan
A Robust Dual-Microphone Speech Source Localization Algorithm for Reverberant Environments
Yanmeng Guo, Xiaofei Wang, Chao Wu, Qiang Fu, Ning Ma, Guy J. Brown
Speech Localisation in a Multitalker Mixture by Humans and Machines
Ning Ma, Guy J. Brown
Reverberation-Robust One-Bit TDOA Based Moving Source Localization for Automatic Camera Steering
Harshavardhan Sundar, Gokul Deepak Manavalan, T.V. Sreenivas, Chandra Sekhar Seelamantula
Multi-Talker Speech Recognition Based on Blind Source Separation with ad hoc Microphone Array Using Smartphones and Cloud Storage
Keiko Ochi, Nobutaka Ono, Shigeki Miyabe, Shoji Makino
Phase-Aware Signal Processing for Automatic Speech Recognition
Johannes Fahringer, Tobias Schrank, Johannes Stahl, Pejman Mowlaee, Franz Pernkopf
Unsupervised Deep Auditory Model Using Stack of Convolutional RBMs for Speech Recognition
Hardik B. Sailor, Hemant A. Patil
Interpretation of Low Dimensional Neural Network Bottleneck Features in Terms of Human Perception and Production
Philip Weber, Linxue Bai, Martin Russell, Peter Jančovič, Stephen Houghton
Compact Feedforward Sequential Memory Networks for Large Vocabulary Continuous Speech Recognition
Shiliang Zhang, Hui Jiang, Shifu Xiong, Si Wei, Li-Rong Dai
Future Context Attention for Unidirectional LSTM Based Acoustic Model
Jian Tang, Shiliang Zhang, Si Wei, Li-Rong Dai
Hybrid Accelerated Optimization for Speech Recognition
Jen-Tzung Chien, Pei-Wen Huang, Tan Lee
On Online Attention-Based Speech Recognition and Joint Mandarin Character-Pinyin Training
William Chan, Ian Lane
GMM-Free Flat Start Sequence-Discriminative DNN Training
Gábor Gosztolya, Tamás Grósz, László Tóth
Open-Domain Audio-Visual Speech Recognition: A Deep Learning Approach
Yajie Miao, Florian Metze
Multidimensional Residual Learning Based on Recurrent Neural Networks for Acoustic Modeling
Yuanyuan Zhao, Shuang Xu, Bo Xu
Towards Online-Recognition with Deep Bidirectional LSTM Acoustic Models
Albert Zeyer, Ralf Schlüter, Hermann Ney
Advances in Very Deep Convolutional Neural Networks for LVCSR
Tom Sercu, Vaibhava Goel
Acoustic Modelling from the Signal Domain Using CNNs
Pegah Ghahremani, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur
Distilling Knowledge from Ensembles of Neural Networks for Speech Recognition
Yevgen Chebotar, Austin Waters
Triphone State-Tying via Deep Canonical Correlation Analysis
Weiran Wang, Hao Tang, Karen Livescu
Low-Rank Representation of Nearest Neighbor Posterior Probabilities to Enhance DNN Based Acoustic Modeling
Gil Luyet, Pranay Dighe, Afsaneh Asaei, Hervé Bourlard
Improving Large Vocabulary Accented Mandarin Speech Recognition with Attribute-Based I-Vectors
Hao Zheng, Shanshan Zhang, Liwei Qiao, Jianping Li, Wenju Liu
Pitch-Adaptive Front-End Features for Robust Children’s ASR
S. Shahnawazuddin, Abhishek Dey, Rohit Sinha
ASR Confidence Estimation with Speaker-Adapted Recurrent Neural Networks
Miguel Ángel del-Agua, Santiago Piqueras, Adrià Giménez, Alberto Sanchis, Jorge Civera, Alfons Juan
Automatic Correction of ASR Outputs by Using Machine Translation
Luis Fernando D’Haro, Rafael E. Banchs
A Framework for Practical Multistream ASR
Sri Harish Mallidi, Hynek Hermansky
DNNs for Unsupervised Extraction of Pseudo FMLLR Features Without Explicit Adaptation Data
Neethu Mariam Joy, Murali Karthick Baskar, S. Umesh, Basil Abraham
Multi-Attribute Factorized Hidden Layer Adaptation for DNN Acoustic Models
Lahiru Samarakoon, Khe Chai Sim
Speaker Normalization Through Feature Shifting of Linearly Transformed i-Vector
Jahyun Goo, Younggwan Kim, Hyungjun Lim, Hoirin Kim
Computational Approaches to Linguistic Code Switching
Mona Diab, Pascale Fung, Julia Hirschberg, Thamar Solorio
Compositional Neural Network Language Models for Agglutinative Languages
Ebru Arisoy, Murat Saraclar
NN-Grams: Unifying Neural Network and n-Gram Language Models for Speech Recognition
Babak Damavandi, Shankar Kumar, Noam Shazeer, Antoine Bruguier
Recurrent Neural Network Language Model with Incremental Updated Context Information Generated Using Bag-of-Words Representation
Md. Akmal Haidar, Mikko Kurimo
Sequential Recurrent Neural Networks for Language Modeling
Youssef Oualil, Clayton Greenberg, Mittul Singh, Dietrich Klakow
Word-Phrase-Entity Recurrent Neural Networks for Language Modeling
Michael Levit, Sarangarajan Parthasarathy, Shuangyu Chang
LSTM, GRU, Highway and a Bit of Attention: An Empirical Overview for Language Modeling in Speech Recognition
Kazuki Irie, Zoltán Tüske, Tamer Alkhouli, Ralf Schlüter, Hermann Ney
Automatic Speech Recognition Using Probabilistic Transcriptions in Swahili, Amharic, and Dinka
Amit Das, Preethi Jyothi, Mark Hasegawa-Johnson
Speed Perturbation and Vowel Duration Modeling for ASR in Hausa and Wolof Languages
Elodie Gauthier, Laurent Besacier, Sylvie Voisin
Improving the Lwazi ASR Baseline
Charl van Heerden, Neil Kleynhans, Marelie Davel
Preliminary Experiments on Unsupervised Word Discovery in Mboshi
Pierre Godard, Gilles Adda, Martine Adda-Decker, Alexandre Allauzen, Laurent Besacier, Hélène Bonneau-Maynard, Guy-Noël Kouarata, Kevin Löser, Annie Rialland, François Yvon
Unsupervised Phoneme Segmentation of Previously Unseen Languages
Marco Vetter, Markus Müller, Fatima Hamlaoui, Graham Neubig, Satoshi Nakamura, Sebastian Stüker, Alex Waibel
CNN-Based Phone Segmentation Experiments in a Less-Represented Language
Céline Manenti, Thomas Pellegrini, Julien Pinquier
Part-of-Speech Tagging and Chunking in Text-to-Speech Synthesis for South African Languages
Georg I. Schlünz, Nkosikhona Dlamini, Rynhardt P. Kruger
The Effect of Postlexical Deletion on Automatic Speech Recognition in Fast Spontaneously Spoken Zulu
Ewald van der Westhuizen, Thomas Niesler
A New Model of Speech Motor Control Based on Task Dynamics and State Feedback
Vikram Ramanarayanan, Benjamin Parrell, Louis Goldstein, Srikantan Nagarajan, John Houde
Using a Biomechanical Model and Articulatory Data for the Numerical Production of Vowels
Saeed Dabbaghchian, Marc Arnela, Olov Engwall, Oriol Guasch, Ian Stavness, Pierre Badin
A New Model for Acoustic Wave Propagation and Scattering in the Vocal Tract
Jianguo Wei, Wendan Guan, Darcy Q. Hou, Dingyi Pan, Wenhuan Lu, Jianwu Dang
Uncontrolled Manifolds in Vowel Production: Assessment with a Biomechanical Model of the Tongue
Andrew Szabados, Pascal Perrier
Experimental Validation of Sound Generated from Flow in Simplified Vocal Tract Model of Sibilant /s/
Tsukasa Yoshinaga, Kazunori Nozaki, Shigeo Wada
Bayesian Modeling in Speech Motor Control: A Principled Structure for the Integration of Various Constraints
Jean-François Patri, Pascal Perrier, Julien Diard
Facing Realism in Spontaneous Emotion Recognition from Speech: Feature Enhancement by Autoencoder with LSTM Neural Networks
Zixing Zhang, Fabien Ringeval, Jing Han, Jun Deng, Erik Marchi, Björn Schuller
Defining Emotionally Salient Regions Using Qualitative Agreement Method
Srinivas Parthasarathy, Carlos Busso
Representation Learning for Speech Emotion Recognition
Sayan Ghosh, Eugene Laksana, Louis-Philippe Morency, Stefan Scherer
Multilingual Speech Emotion Recognition System Based on a Three-Layer Model
Xingfeng Li, Masato Akagi
Analysis of Multi-Lingual Emotion Recognition Using Auditory Attention Features
Ozlem Kalinli
On the Correlation and Transferability of Features Between Automatic Speech Recognition and Speech Emotion Recognition
Haytham M. Fayek, Margaret Lech, Lawrence Cavedon
On the Influence of Text Content on Pass-Phrase Strength for Short-Duration Text-Dependent Automatic Speaker Authentication
Giacomo Valenti, Adrien Daniel, Nicholas Evans
Articulation Rate Filtering of CQCC Features for Automatic Speaker Verification
Massimiliano Todisco, Héctor Delgado, Nicholas Evans
The IBM Speaker Recognition System: Recent Advances and Error Analysis
Seyed Omid Sadjadi, Jason W. Pelecanos, Sriram Ganapathy
Probabilistic Approach Using Joint Clean and Noisy i-Vectors Modeling for Speaker Recognition
Waad Ben Kheder, Driss Matrouf, Moez Ajili, Jean-François Bonastre
Generalized Discriminant Analysis (GDA) for Improved i-Vector Based Speaker Recognition
Fahimeh Bahmaninezhad, John H.L. Hansen
Noise and Metadata Sensitive Bottleneck Features for Improving Speaker Recognition with Non-Native Speech Input
Yao Qian, Jidong Tao, David Suendermann-Oeft, Keelan Evanini, Alexei V. Ivanov, Vikram Ramanarayanan
Robust Audio Event Recognition with 1-Max Pooling Convolutional Neural Networks
Huy Phan, Lars Hertel, Marco Maass, Alfred Mertins
Audio-Based Distributional Representations of Meaning Using a Fusion of Feature Encodings
Giannis Karamanolakis, Elias Iosif, Athanasia Zlatintsi, Aggelos Pikrakis, Alexandros Potamianos
Robust DNN-Based VAD Augmented with Phone Entropy Based Rejection of Background Speech
Yuya Fujita, Ken-ichi Iso
Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection
Ruben Zazo, Tara N. Sainath, Gabor Simko, Carolina Parada
The SRI System for the NIST OpenSAD 2015 Speech Activity Detection Evaluation
Martin Graciarena, Luciana Ferrer, Vikramjit Mitra
Model Adaptation and Active Learning in the BBN Speech Activity Detection System for the DARPA RATS Program
Damianos Karakos, Scott Novotney, Le Zhang, Richard Schwartz
Fusion Strategies for Robust Speech Recognition and Keyword Spotting for Channel- and Noise-Degraded Speech
Vikramjit Mitra, Julien VanHout, Wen Wang, Chris Bartels, Horacio Franco, Dimitra Vergyri, Abeer Alwan, Adam Janin, John H.L. Hansen, Richard M. Stern, Abhijeet Sangwan, Nelson Morgan
Recurrent Neural Network-Based Phoneme Sequence Estimation Using Multiple ASR Systems’ Outputs for Spoken Term Detection
Naoki Sawada, Hiromitsu Nishizaki
Enhancing Data-Driven Phone Confusions Using Restricted Recognition
Mark Kane, Julie Carson-Berndsen
Rapid Update of Multilingual Deep Neural Network for Low-Resource Keyword Search
Chongjia Ni, Lei Wang, Cheung-Chi Leung, Feng Rao, Li Lu, Bin Ma, Haizhou Li
Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis
Cheung-Chi Leung, Lei Wang, Haihua Xu, Jingyong Hou, Van Tung Pham, Hang Lv, Lei Xie, Xiong Xiao, Chongjia Ni, Bin Ma, Eng Siong Chng, Haizhou Li
Novel Subband Autoencoder Features for Non-Intrusive Quality Assessment of Noise Suppressed Speech
Meet H. Soni, Hemant A. Patil
SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement
Tian Gao, Jun Du, Li-Rong Dai, Chin-Hui Lee
A Novel Risk-Estimation-Theoretic Framework for Speech Enhancement in Nonstationary and Non-Gaussian Noise Conditions
Jishnu Sadasivan, Chandra Sekhar Seelamantula
Two-Stage Temporal Processing for Single-Channel Speech Enhancement
Suman Samui, Indrajit Chakrabarti, Soumya Kanti Ghosh
A Class-Specific Speech Enhancement for Phoneme Recognition: A Dictionary Learning Approach
Nazreen P.M., A.G. Ramakrishnan, Prasanta Kumar Ghosh
Robust Example Search Using Bottleneck Features for Example-Based Speech Enhancement
Atsunori Ogawa, Shogo Seki, Keisuke Kinoshita, Marc Delcroix, Takuya Yoshioka, Tomohiro Nakatani, Kazuya Takeda
Speech Enhancement in Multiple-Noise Conditions Using Deep Neural Networks
Anurag Kumar, Dinei Florencio
Perception Optimized Deep Denoising AutoEncoders for Speech Enhancement
Prashanth Gurunath Shivakumar, Panayiotis Georgiou
HMM-Based Speech Enhancement Using Sub-Word Models and Noise Adaptation
Akihiro Kato, Ben Milner
Semi-Supervised Joint Enhancement of Spectral and Cepstral Sequences of Noisy Speech
Li Li, Hirokazu Kameoka, Takuya Higuchi, Hiroshi Saruwatari
A priori SNR Estimation Using a Generalized Decision Directed Approach
Aleksej Chinaev, Reinhold Haeb-Umbach
A DNN-HMM Approach to Non-Negative Matrix Factorization Based Speech Enhancement
Ziteng Wang, Xu Li, Xiaofei Wang, Qiang Fu, Yonghong Yan
SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement
Szu-Wei Fu, Yu Tsao, Xugang Lu
An Iterative Phase Recovery Framework with Phase Mask for Spectral Mapping with an Application to Speech Enhancement
Kehuang Li, Bo Wu, Chin-Hui Lee
A Novel Research to Artificial Bandwidth Extension Based on Deep BLSTM Recurrent Neural Networks and Exemplar-Based Sparse Representation
Bin Liu, Jianhua Tao
Coping with Unseen Data Conditions: Investigating Neural Net Architectures, Robust Features, and Information Fusion for Robust Speech Recognition
Vikramjit Mitra, Horacio Franco
On the Use of Gaussian Mixture Model Framework to Improve Speaker Adaptation of Deep Neural Network Acoustic Models
Natalia Tomashenko, Yuri Khokhlov, Yannick Estève
Analytical Assessment of Dual-Stream Merging for Noise-Robust ASR
Louis ten Bosch, Bert Cranen, Yang Sun
Use of Generalised Nonlinearity in Vector Taylor Series Noise Compensation for Robust Speech Recognition
Erfan Loweimi, Jon Barker, Thomas Hain
Joint Optimization of Denoising Autoencoder and DNN Acoustic Model Based on Multi-Target Learning for Noisy Speech Recognition
Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Optimization of Speech Enhancement Front-End with Speech Recognition-Level Criterion
Takuya Higuchi, Takuya Yoshioka, Tomohiro Nakatani
Factorized Linear Input Network for Acoustic Model Adaptation in Noisy Conditions
Dung T. Tran, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani
Data Augmentation Using Multi-Input Multi-Output Source Separation for Deep Neural Network Based Acoustic Modeling
Yusuke Fujita, Ryoichi Takashima, Takeshi Homma, Masahito Togami
Microphone Distance Adaptation Using Cluster Adaptive Training for Robust Far Field Speech Recognition
Animesh Prasad, Khe Chai Sim
An Investigation on the Use of i-Vectors for Robust ASR
Dimitrios Dimitriadis, Samuel Thomas, Sriram Ganapathy
The Sheffield Wargame Corpus — Day Two and Day Three
Yulan Liu, Charles Fox, Madina Hasan, Thomas Hain
Recurrent Models for Auditory Attention in Multi-Microphone Distant Speech Recognition
Suyoun Kim, Ian Lane
Semi-Supervised Speaker Adaptation for In-Vehicle Speech Recognition with Deep Neural Networks
Wonkyum Lee, Kyu J. Han, Ian Lane
Semi-Supervised Training in Deep Learning Acoustic Model
Yan Huang, Yongqiang Wang, Yifan Gong
Multilingual Data Selection for Low Resource Speech Recognition
Samuel Thomas, Kartik Audhkhasi, Jia Cui, Brian Kingsbury, Bhuvana Ramabhadran
An Investigation on Training Deep Neural Networks Using Probabilistic Transcriptions
Amit Das, Mark Hasegawa-Johnson
Analysis of Mismatched Transcriptions Generated by Humans and Machines for Under-Resourced Languages
Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark Hasegawa-Johnson
ASR for South Slavic Languages Developed in Almost Automated Way
Jan Nouza, Radek Safarik, Petr Cerva
Improving Under-Resourced Language ASR Through Latent Subword Unit Space Discovery
Marzieh Razavi, Mathew Magimai-Doss
Language Adaptive DNNs for Improved Low Resource Speech Recognition
Markus Müller, Sebastian Stüker, Alex Waibel
Improved Multilingual Training of Stacked Neural Network Acoustic Models for Low Resource Languages
Tanel Alumäe, Stavros Tsakalidis, Richard Schwartz
Article |
---|