doi: 10.21437/Interspeech.2014
ISSN: 2958-1796
Learning about speech
Anne Cutler
Decision learning in data science: where John Nash meets social media
K. J. Ray Liu
Language diversity: speech processing in a multi-lingual context
Lori Lamel
Sound patterns in language
William S.-Y. Wang
Achievements and challenges of deep learning — from speech analysis and recognition to language and multimodal processing
Li Deng
Language ID-based training of multilingual stacked bottleneck features
Yu Zhang, Ekapol Chuangsuwanich, James R. Glass
Kernel density-based acoustic model with cross-lingual bottleneck features for resource limited LVCSR
Van Hai Do, Xiong Xiao, Eng Siong Chng, Haizhou Li
Improving ASR performance on non-native speech using multilingual and crosslingual information
Ngoc Thang Vu, Yuanfan Wang, Marten Klose, Zlatka Mihaylova, Tanja Schultz
Language independent and unsupervised acoustic models for speech recognition and keyword spotting
Kate M. Knill, Mark J. F. Gales, Anton Ragni, Shakti P. Rath
Cross-lingual adaptation with multi-task adaptive networks
Peter Bell, Joris Driesen, Steve Renals
On recognition of non-native speech using probabilistic lexical model
Marzieh Razavi, Mathew Magimai Doss
Direct F0 control of an electrolarynx based on statistical excitation feature prediction and its evaluation through simulation
Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura
A target approximation intonation model for Yorùbá TTS
Daniel R. van Niekerk, Etienne Barnard
Learning continuous-valued word representations for phrase break prediction
Anandaswarup Vadapalli, Kishore Prahallad
Improving Mandarin prosodic boundary prediction with rich syntactic features
Hao Che, Jianhua Tao, Ya Li
Investigating automatic & human filled pause insertion for speech synthesis
Rasmus Dall, Marcus Tomalin, Mirjam Wester, William Byrne, Simon King
The effect of filled pauses and speaking rate on speech comprehension in natural, vocoded and synthetic speech
Rasmus Dall, Mirjam Wester, Martin Corley
Introducing i-vectors for joint anti-spoofing and speaker verification
Elie Khoury, Tomi Kinnunen, Aleksandr Sizov, Zhizheng Wu, Sébastien Marcel
Random projections for large-scale speaker search
Ryan Leary, Walter Andrews
Analysis of i-vector framework for speaker identification in TV-shows
Corinne Fredouille, Delphine Charlet
Boosting bonsai trees for efficient features combination: application to speaker role identification
Antoine Laurent, Nathalie Camelin, Christian Raymond
Identifying contributors in the BBC world service archive
Yves Raimond, Thomas Nixon
Effect of long-term ageing on i-vector speaker verification
Finnian Kelly, Rahim Saeidi, Naomi Harte, David A. van Leeuwen
Acoustic correlates of phonological status
Maarten Versteegh, Amanda Seidl, Alejandrina Cristia
Parameterization of the glottal source with the phase plane plot
Manu Airaksinen, Paavo Alku
Transcribing tone — a likelihood-based quantitative evaluation of Chao's tone letters
Phil Rose
Intonational phonology and prosodic hierarchy in Malay
Diyana Hamzah, James Sneed German
Comparing parameterizations of pitch register and its discontinuities at prosodic boundaries for Hungarian
Uwe D. Reichel, Katalin Mády
An evaluation of machine learning methods for prominence detection in French
George Christodoulides, Mathieu Avanzi
Investigating the effect of F0 and vocal intensity on harmonic magnitudes: data from high-speed laryngeal videoendoscopy
Gang Chen, Soo Jin Park, Jody Kreiman, Abeer Alwan
Adapting prosodic chunking algorithm and synthesis system to specific style: the case of dictation
Elisabeth Delais-Roussarie, Damien Lolive, Hiyon Yoo, Nelly Barbot, Olivier Rosec
The articulation of lexical and post-lexical palatalization in Korean
Jae-Hyun Sung
Articulation and neutralization: a preliminary study of lenition in Scottish Gaelic
Diana Archangeli, Samuel Johnston, Jae-Hyun Sung, Muriel Fisher, Michael Hammond, Andrew Carnie
Nasality in speech and its contribution to speaker individuality
Kanae Amino, Hisanori Makinae, Tatsuya Kitamura
Is speech rhythm an intrinsic property of language?
Jason Brown, Eden Matene
Where /ar/ the /r/s in Standard Austrian German?
Anke Jackschina, Barbara Schuppler, Rudolf Muhr
Diphthongized vowels in the Yi County Hui Chinese dialect
Fang Hu, Minghui Zhang
Rhythmic variability between some Asian languages: results from an automatic analysis of temporal characteristics
Volker Dellwo, Peggy Mok, Mathias Jenny
Listener estimation of speaker age based on whispered speech
Angelika Braun, Daniela Decker
The Lombard effect with Thai lexical tones: an acoustic analysis of articulatory modifications in noise
Benjawan Kasisopa, Virginie Attina, Denis Burnham
Learning situated knowledge bases through dialog
Aasish Pappu, Alexander I. Rudnicky
Crowdsourcing for situated dialog systems in a moving car
Teruhisa Misu
Evaluating coherence in open domain conversational systems
Ryuichiro Higashinaka, Toyomi Meguro, Kenji Imamura, Hiroaki Sugiyama, Toshiro Makino, Yoshihiro Matsuo
Adapting dependency parsing to spontaneous speech for open domain spoken language understanding
Frederic Bechet, Alexis Nasr, Benoit Favre
Incremental on-line adaptation of POMDP-based dialogue managers to extended domains
M. Gašić, Dongho Kim, Pirros Tsiakoulis, Catherine Breslin, Matthew Henderson, M. Szummer, B. Thomson, Steve Young
Hypotheses ranking for robust domain classification and tracking in dialogue systems
Jean-Philippe Robichaud, Paul A. Crook, Puyang Xu, Omar Zia Khan, Ruhi Sarikaya
Motor control primitives arising from a learned dynamical systems model of speech articulation
Vikram Ramanarayanan, Louis Goldstein, Shrikanth S. Narayanan
Nonword repetition of Taiwanese disyllabic tonal sequences in adults with language attrition
Chia-Hsin Yeh, Chiung-Yao Wang, Jung-Yueh Tu
A unified account of prominence effects in an optimization-based model of speech timing
Andreas Windmann, Juraj Šimko, Petra Wagner
Estimation of the movement trajectories of non-crucial articulators based on the detection of crucial moments and physiological constraints
Jangwon Kim, Sungbok Lee, Shrikanth S. Narayanan
Sparse smoothing of articulatory features from Gaussian mixture model based acoustic-to-articulatory inversion: benefit to speech recognition
Prasad Sudhakar, Prasanta Kumar Ghosh
Contribution of tongue lateral to consonant production
Jun Wang, William Katz, Thomas F. Campbell
A preliminary study on acoustic correlates of tone2+tone2 disyllabic word stress in Mandarin
Min Liu, Shuju Shi, Jinsong Zhang
Vowel length impact on locus equation parameters: an investigation on Jordanian Arabic
Mohammad Abuoudeh, Olivier Crouzet
Corpus-testing a fricative discriminator; or, just how invariant is this invariant?
Philip J. Roberts, Henning Reetz, Aditi Lahiri
Modeling coarticulation in continuous speech
Brian O. Bush, Alexander Kain
On classification between normal and pathological voices using the MEEI-KayPENTAX database: issues and consequences
Khalid Daoudi, Blaise Bertrac
Synchronic variation in the articulation and the acoustics of the Polish three-way place distinction in sibilants and its implications for diachronic change
Véronique Bukmaier, Jonathan Harrington, Ulrich Reubold, Felicitas Kleber
Predicting client's inclination towards target behavior change in motivational interviewing and investigating the role of laughter
Rahul Gupta, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan
Modeling therapist empathy through prosody in drug addiction counseling
Bo Xiao, Daniel Bone, Maarten Van Segbroeck, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou, Shrikanth S. Narayanan
An investigation of vocal arousal dynamics in child-psychologist interactions using synchrony measures and a conversation-based model
Daniel Bone, Chi-Chun Lee, Alexandros Potamianos, Shrikanth S. Narayanan
Speech emotion recognition using deep neural network and extreme learning machine
Kun Han, Dong Yu, Ivan Tashev
An annotation scheme for sighs in spontaneous dialogue
Khiet P. Truong, Gerben J. Westerhof, Franciska de Jong, Dirk Heylen
Speaker idiosyncratic variability of intensity across syllables
Lei He, Volker Dellwo
Building a naturalistic emotional speech corpus by retrieving expressive behaviors from existing speech corpora
Soroosh Mariooryad, Reza Lotfian, Carlos Busso
Identification of age-group from children's speech by computers and humans
Saeid Safavi, Martin Russell, Peter Jančovič
Theme identification in human-human conversations with features from specific speaker type hidden spaces
Mohamed Morchid, Richard Dufour, Mohamed Bouallegue, Georges Linarès, Renato De Mori
Learning phrase patterns for text classification using a knowledge graph and unlabeled data
Alex Marin, Roman Holenstein, Ruhi Sarikaya, Mari Ostendorf
Targeted feature dropout for robust slot filling in natural language understanding
Puyang Xu, Ruhi Sarikaya
Spoken question answering using tree-structured conditional random fields and two-layer random walk
Sz-Rung Shiang, Hung-yi Lee, Lin-shan Lee
Shrinkage based features for slot tagging with conditional random fields
Ruhi Sarikaya, Asli Celikyilmaz, Anoop Deoras, Minwoo Jeong
Cluster based Chinese abbreviation modeling
Yangyang Shi, Yi-Cheng Pan, Mei-Yuh Hwang
Parsing named entity as syntactic structure
Xiantao Zhang, Dongchen Li, Xihong Wu
Detecting out-of-domain utterances addressed to a virtual personal assistant
Gokhan Tur, Anoop Deoras, Dilek Hakkani-Tür
Fusion of knowledge-based and data-driven approaches to grammar induction
Spiros Georgiladakis, Christina Unger, Elias Iosif, Sebastian Walter, Philipp Cimiano, Euripides Petrakis, Alexandros Potamianos
Improving named entity recognition with prosodic features
Denys Katerenchuk, Andrew Rosenberg
Neural network models for lexical addressee detection
Suman V. Ravuri, Andreas Stolcke
Manipulating stance and involvement using collaborative tasks: an exploratory comparison
Valerie Freeman, Julian Chan, Gina-Anne Levow, Richard Wright, Mari Ostendorf, Victoria Zayats
Incremental dialog processing in a task-oriented dialog
Fabrizio Ghigi, Maxine Eskenazi, M. Ines Torres, Sungjin Lee
Detecting incorrectly-segmented utterances for posteriori restoration of turn-taking and ASR results
Naoki Hotta, Kazunori Komatani, Satoshi Sato, Mikio Nakano
Segmentation and disfluency removal for conversational speech translation
Hany Hassan, Lee Schwartz, Dilek Hakkani-Tür, Gokhan Tur
Cost-level integration of statistical and rule-based dialog managers
Shinji Watanabe, John R. Hershey, Tim K. Marks, Youichi Fujii, Yusuke Koji
Inverse reinforcement learning for micro-turn management
Dongho Kim, Catherine Breslin, Pirros Tsiakoulis, M. Gašić, Matthew Henderson, Steve Young
Analysing the prosodic characteristics of speech-chunks preceding silences in task-based interactions
John Kane, Irena Yanushevskaya, Céline de Looze, Brian Vaughan, Ailbhe Ní Chasaide
Long short-term memory recurrent neural network architectures for large scale acoustic modeling
Haşim Sak, Andrew Senior, Françoise Beaufays
Unfolded recurrent neural networks for speech recognition
George Saon, Hagen Soltau, Ahmad Emami, Michael Picheny
Manifold regularized deep neural networks
Vikrant Singh Tomar, Richard C. Rose
Modeling long temporal contexts for robust DNN-based speech recognition
Bo Li, Khe Chai Sim
A long, deep and wide artificial neural net for robust speech recognition in unknown noise
Feipeng Li, Phani S. Nidadavolu, Hynek Hermansky
Investigation of deep neural networks for robust recognition of nonlinearly distorted speech
Ladislav Seps, Jiri Malek, Petr Cerva, Jan Nouza
Summary and initial results of the 2013-2014 speaker recognition i-vector machine learning challenge
Désiré Bansé, George R. Doddington, Daniel Garcia-Romero, John J. Godfrey, Craig S. Greenberg, Alvin F. Martin, Alan McCree, Mark Przybocki, Douglas A. Reynolds
Constrained speaker linking
David A. van Leeuwen, Niko Brümmer
RBM-PLDA subsystem for the NIST i-vector challenge
Sergey Novoselov, Timur Pekhovsky, Konstantin Simonchik, Andrey Shulipa
Limited labels for unlimited data: active learning for speaker recognition
Stephen H. Shum, Najim Dehak, James R. Glass
Bayesian calibration for forensic evidence reporting
Niko Brümmer, Albert Swart
Replicate mismatch between test/background and development databases: the impact on the performance of likelihood ratio-based forensic voice comparison
Shunichi Ishihara
Automatic estimation of the lip radiation effect in glottal inverse filtering
Manu Airaksinen, Tom Bäckström, Paavo Alku
Simulation of 3D larynges with asymmetric distribution of viscoelastic properties in their vocal folds
Marcelo de Oliveira Rosa
Comparison of vocal tract transfer functions calculated using one-dimensional and three-dimensional acoustic simulation methods
Hironori Takemoto, Parham Mokhtari, Tatsuya Kitamura
A study of invariant properties and variation patterns in the converter/distributor model for emotional speech
Jangwon Kim, Donna Erickson, Sungbok Lee, Shrikanth S. Narayanan
A hybrid approach to 3D tongue modeling from vocal tract MRI using unsupervised image segmentation and mesh deformation
Alexander Hewer, Ingmar Steiner, Stefanie Wuhrer
Estimation of vocal-tract shape from speech spectrum and speech resynthesis based on a generative model
Tokihiko Kaburagi
A real-time MRI study of articulatory setting in second language speech
Andrés Benítez, Vikram Ramanarayanan, Louis Goldstein, Shrikanth S. Narayanan
Retroflex and bunched English /r/ with physical models of the human vocal tract
Takayuki Arai
Parameterization of articulatory pattern in speakers with ALS
Panying Rong, Yana Yunusova, James D. Berry, Lorne Zinman, Jordan R. Green
Missing samples estimation in electromagnetic articulography data using equality constrained Kalman smoother
Sujith P, Prasanta Kumar Ghosh
Palate-referenced articulatory features for acoustic-to-articulator inversion
An Ji, Michael T. Johnson, Jeff Berry
A study on the improvement of measurement accuracy of the three-dimensional electromagnetic articulography
Hidetsugu Uchida, Kohei Wakamiya, Tokihiko Kaburagi
The INTERSPEECH 2014 computational paralinguistics challenge: cognitive & physical load
Björn Schuller, Stefan Steidl, Anton Batliner, Julien Epps, Florian Eyben, Fabien Ringeval, Erik Marchi, Yue Zhang
Filtering and subspace selection for spectral features in detecting speech under physical stress
Jouni Pohjalainen, Paavo Alku
Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens
Ming Li
Canonical correlation analysis and local Fisher discriminant analysis based multi-view acoustic feature reduction for physical load prediction
Heysem Kaya, Tuğçe Özkaptan, Albert Ali Salah, Sadık Fikret Gürgen
Ensemble of machine learning algorithms for cognitive and physical speaker load detection
How Jing, Ting-Yao Hu, Hung-Shin Lee, Wei-Chen Chen, Chi-Chun Lee, Yu Tsao, Hsin-Min Wang
Detecting the intensity of cognitive and physical load using AdaBoost and deep rectifier neural networks
Gábor Gosztolya, Tamás Grósz, Róbert Busa-Fekete, László Tóth
High-level speech event analysis for cognitive load classification
Claude Montacié, Marie-José Caraty
On the use of Bhattacharyya based GMM distance and neural net features for identification of cognitive load levels
Tin Lay Nwe, Trung Hieu Nguyen, Bin Ma
Prediction of cognitive load from speech with the VOQAL voice quality toolbox for the INTERSPEECH 2014 computational paralinguistics challenge
Mark Huckvale
The UNSW submission to INTERSPEECH 2014 ComParE cognitive load challenge
Jia Min Karen Kua, Vidhyasaharan Sethu, Phu Le, Eliathamby Ambikairajah
Classification of cognitive load from speech using an i-vector framework
Maarten Van Segbroeck, Ruchir Travadi, Colin Vaz, Jangwon Kim, Matthew P. Black, Alexandros Potamianos, Shrikanth S. Narayanan
Revisiting the right-ear advantage for speech: implications for speech displays
Nandini Iyer, Eric Thompson, Brian Simpson, Griffin Romigh
Comparing reaction time sequences from human participants and computational models
L. ten Bosch, Miriam Ernestus, Lou Boves
Detecting the number of competing speakers — human selective hearing versus spectrogram distance based estimator
Valentin Andrei, Horia Cucu, Andi Buzo, Corneliu Burileanu
The influence of sensory memory and attention on the context effect in talker normalization
Guo Li, Gang Peng
Automatic speech recognition with primarily temporal envelope information
Payton Lin, Fei Chen, Syu Siang Wang, Ying-Hui Lai, Yu Tsao
An adaptive envelope compression strategy for speech processing in cochlear implants
Ying-Hui Lai, Fei Chen, Yu Tsao
Articulatory dynamics and coordination in classifying cognitive change with preclinical mTBI
Brian S. Helfer, Thomas F. Quatieri, James R. Williamson, Laurel Keyes, Benjamin Evans, W. Nicholas Greene, Trina Vian, Joseph Lacirignola, Trey Shenk, Thomas Talavage, Jeff Palmer, Kristin Heaton
A hearing impairment simulation method using audiogram-based approximation of auditory characteristics
Nozomi Jinbo, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura
Investigation of the relative perceptual importance of temporal envelope and temporal fine structure between tonal and non-tonal languages
Dongmei Wang, James M. Kates, John H. L. Hansen
Vowel spectral contributions to English and Mandarin sentence intelligibility
Daniel Fogerty, Fei Chen
Significance of aperiodicity in the pitch perception of expressive voices
Vinay Kumar Mittal, B. Yegnanarayana
DIAPIX-FL: a symmetric corpus of problem-solving dialogues in first and second languages
Mirjam Wester, María Luisa García Lecumberri, Martin Cooke
Cross-linguistic investigations of oral and silent reading
Christophe Coupé, Yoon Mi Oh, François Pellegrino, Egidio Marsico
Non-native word recognition in noise: the role of word-initial and word-final information
Juul Coumans, Roeland van Hout, Odette Scharenborg
The effects of high and low variability phonetic training on the perception and production of English vowels /e/-/æ/ by Cantonese ESL learners with high and low L2 proficiency levels
Janice Wing Sze Wong
Dutch vowel production by Spanish learners: duration and spectral features
Pepi Burgos, Mátyás Jani, Catia Cucchiarini, Roeland van Hout, Helmer Strik
English consonant confusions by Greek listeners in quiet and noise and the role of phonological short-term memory
Angelos Lengeris, Katerina Nicolaidis
Corpus-based L2 phonological data and semi-automatic perceptual analysis: the case of nasal vowels produced by beginner Japanese learners of French
Sylvain Detey, Isabelle Racine, Julien Eychenne, Yuji Kawaguchi
Perception of prosodic prominence and boundaries by L1 and L2 speakers of English
Gábor Pintér, Shinobu Mizuguchi, Koichi Tateishi
Prosody perception, reading accuracy, nonliteral language comprehension, and music and tonal pitch discrimination in school aged children
Rose Thomas Kalathottukaren, Suzanne C. Purdy, Elaine Ballard
Phoneme category retuning in a non-native language
Polina Drozdova, Roeland van Hout, Odette Scharenborg
Speech emotion recognition with cross-lingual databases
Bo-Chang Chiou, Chia-Ping Chen
Speaker diarization using eye-gaze information in multi-party conversations
Koji Inoue, Yukoh Wakabayashi, Hiromasa Yoshimoto, Tatsuya Kawahara
Unsupervised speaker diarization using Riemannian manifold clustering
Che-Wei Huang, Bo Xiao, Panayiotis G. Georgiou, Shrikanth S. Narayanan
Towards a complete binary key system for the speaker diarization task
Héctor Delgado, Corinne Fredouille, Javier Serrano
An iterative speaker re-diarization scheme for improving speaker-based entity extraction in multimedia archives
Houman Ghaemmaghami, David Dean, Sridha Sridharan
Speaker diarization using gesture and speech
Binyam Gebrekidan Gebre, Peter Wittenburg, Sebastian Drude, Marijn Huijbregts, Tom Heskes
Is incremental cross-show speaker diarization efficient for processing large volumes of data?
Grégor Dupuy, Sylvain Meignier, Yannick Estève
Detecting and labeling speakers on overlapping speech using vector Taylor series
Pranay Dighe, Marc Ferràs, Hervé Bourlard
Phoneme background model for information bottleneck based speaker diarization
Sree Harsha Yella, Petr Motlicek, Hervé Bourlard
Diarizing large corpora using multi-modal speaker linking
Marc Ferràs, Stefano Masneri, Oliver Schreer, Hervé Bourlard
Multimodal understanding for person recognition in video broadcasts
Frederic Bechet, Meriem Bendris, Delphine Charlet, Géraldine Damnati, Benoit Favre, Mickael Rouvier, Remi Auguste, Benjamin Bigot, Richard Dufour, Corinne Fredouille, Georges Linarès, Jean Martinet, Gregory Senay, Pierre Tirilly
Comparing time-frequency representations for directional derivative features
James Gibson, Maarten Van Segbroeck, Shrikanth S. Narayanan
Robust speech recognition with speech enhanced deep neural networks
Jun Du, Qing Wang, Tian Gao, Yong Xu, Li-Rong Dai, Chin-Hui Lee
An investigation of likelihood normalization for robust ASR
Emmanuel Vincent, Aggelos Gkiokas, Dominik Schnitzer, Arthur Flexer
Identifying the human-machine differences in complex binaural scenes: what can be learned from our auditory system
Constantin Spille, Bernd T. Meyer
Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling
Jürgen T. Geiger, Zixing Zhang, Felix Weninger, Björn Schuller, Gerhard Rigoll
Joint adaptation and adaptive training of TVWR for robust automatic speech recognition
Shilin Liu, Khe Chai Sim
Robust speech recognition in reverberant environments using subband-based steady-state monaural and binaural suppression
Hyung-Min Park, Matthew Maciejewski, Chanwoo Kim, Richard M. Stern
Variable-component deep neural network for robust speech recognition
Rui Zhao, Jinyu Li, Yifan Gong
Effective modulation spectrum factorization for robust speech recognition
Yu-Chen Kao, Yi-Ting Wang, Berlin Chen
Hybrid MLP/structured-SVM tandem systems for large vocabulary and robust ASR
Suman V. Ravuri
Robust speech recognition using temporal masking and thresholding algorithm
Chanwoo Kim, Kean K. Chin, Michiel Bacchiani, Richard M. Stern
Deep neural network bottleneck features for generalized variable parameter HMMs
Xurong Xie, Rongfeng Su, Xunying Liu, Lan Wang
A novel dynamic parameters calculation approach for model compensation
Suliang Bu, Yanmin Qian, Kai Yu
Speech recognition based on Itakura-Saito divergence and dynamics/sparseness constraints from mixed sound of speech and music by non-negative matrix factorization
Naoaki Hashimoto, Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa
Noise robust speech recognition based on noise-adapted HMMs using speech feature compensation
Yong-Joo Chung
Noise spectrum estimation using Gaussian mixture model-based speech presence probability for robust speech recognition
M. J. Alam, Patrick Kenny, Pierre Dumouchel, Douglas O'Shaughnessy
Efficient GPU-based training of recurrent neural network language models using spliced sentence bunch
X. Chen, Y. Wang, X. Liu, Mark J. F. Gales, Philip C. Woodland
Word pair approximation for more efficient decoding with high-order language models
David Nolden, Ralf Schlüter, Hermann Ney
Comparing approaches to convert recurrent neural networks into backoff language models for efficient decoding
Heike Adel, Katrin Kirchhoff, Ngoc Thang Vu, Dominic Telaar, Tanja Schultz
Removing redundancy from lattices
David Nolden, Hagen Soltau, Daniel Povey, Pegah Ghahremani, Lidia Mangu, Hermann Ney
Lattice decoding and rescoring with long-span neural network language models
Martin Sundermeyer, Zoltán Tüske, Ralf Schlüter, Hermann Ney
Word-phrase-entity language models: getting more mileage out of n-grams
Michael Levit, Sarangarajan Parthasarathy, Shuangyu Chang, Andreas Stolcke, Benoît Dumoulin
A novel boosting algorithm for improved i-vector based speaker verification in noisy environments
Sourjya Sarkar, K. Sreenivasa Rao
Using deep belief networks for vector-based speaker recognition
W. M. Campbell
A deep neural network speaker verification system targeting microphone speech
Yun Lei, Luciana Ferrer, Mitchell McLaren, Nicolas Scheffer
Application of convolutional neural networks to speaker recognition in noisy conditions
Mitchell McLaren, Yun Lei, Nicolas Scheffer, Luciana Ferrer
SVM based speaker recognition: harnessing trials with multiple enrollment sessions
Jason Pelecanos, Weizhong Zhu, Sibel Yaman
I-vector speaker verification based on phonetic information under transmission channel effects
Laura Fernández Gallardo, Michael Wagner, Sebastian Möller
Using conditional random fields to predict focus word pair in spontaneous spoken English
Xiao Zang, Zhiyong Wu, Helen Meng, Jia Jia, Lianhong Cai
Applications of maximum entropy rankers to problems in spoken language processing
Richard Sproat, Keith Hall
Text-to-speech with cross-lingual neural network-based grapheme-to-phoneme models
Xavi Gonzalvo, Monika Podsiadło
Transform mapping using shared decision tree context clustering for HMM-based cross-lingual speech synthesis
Daiki Nagahama, Takashi Nose, Tomoki Koriyama, Takao Kobayashi
Cross-lingual voice conversion-based polyglot speech synthesizer for Indian languages
B. Ramani, M. P. Actlin Jeeva, P. Vijayalakshmi, T. Nagarajan
An investigation of the application of dynamic sinusoidal models to statistical parametric speech synthesis
Qiong Hu, Yannis Stylianou, Ranniery Maia, Korin Richmond, Junichi Yamagishi, Javier Latorre
Chaotic mixed excitation source for speech synthesis
Hemant A. Patil, Tanvina B. Patel
Refined inter-segment joining in multi-form speech synthesis
Alexander Sorin, Slava Shechtman, Vincent Pollet
A hierarchical Viterbi algorithm for Mandarin hybrid speech synthesis system
Ran Zhang, Zhengqi Wen, Jianhua Tao, Ya Li, Bing Liu, Xiaoyan Lou
Automatic animation of an articulatory tongue model from ultrasound images using Gaussian mixture regression
Diandra Fabre, Thomas Hueber, Pierre Badin
Articulatory controllable speech modification based on statistical feature mapping with Gaussian mixture models
Patrick Lumban Tobing, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura, Ayu Purwarianti
Speech-driven head motion synthesis using neural networks
Chuang Ding, Pengcheng Zhu, Lei Xie, Dongmei Jiang, Zhong-Hua Fu
Text-independent voice conversion using speaker model alignment method from non-parallel speech
Peng Song, Yun Jin, Wenming Zheng, Li Zhao
Voice conversion using generative trained deep neural networks with multiple frame spectral envelopes
Ling-Hui Chen, Zhen-Hua Ling, Li-Rong Dai
Hierarchical modeling of F0 contours for voice conversion
Gerard Sanchez, Hanna Silen, Jani Nurminen, Moncef Gabbouj
Speech prosody generation for text-to-speech synthesis based on generative model of F0 contours
Kento Kadowaki, Tatsuma Ishihara, Nobukatsu Hojo, Hirokazu Kameoka
An iterative approach to decision tree training for context dependent speech synthesis
Xiayu Chen, Yang Zhang, Mark Hasegawa-Johnson
Prosodic phrasing modeling for Vietnamese TTS using syntactic information
Thi Thu Trang Nguyen, Albert Rilliard, Do Dat Tran, Christophe d'Alessandro
Accent type and phrase boundary estimation using acoustic and language models for automatic prosodic labeling
Tomoki Koriyama, Hiroshi Suzuki, Takashi Nose, Takahiro Shinozaki, Takao Kobayashi
Reconstruction of mistracked articulatory trajectories
Qiang Fang, Jianguo Wei, Fang Hu
Enabling controllability for continuous expression space
Langzhou Chen, Norbert Braunschweiler
Analysis of spectral enhancement using global variance in HMM-based speech synthesis
Takashi Nose, Akinori Ito
Intelligibility analysis of fast synthesized speech
Cassia Valentini-Botinhao, Markus Toman, Michael Pucher, Dietmar Schabus, Junichi Yamagishi
Speech synthesis reactive to dynamic noise environmental conditions
Susana Palmaz López-Peláez, Robert A. J. Clark
Partial representations improve the prosody of incremental speech synthesis
Timo Baumann
Dialogue context sensitive speech synthesis using factorized decision trees
Pirros Tsiakoulis, Catherine Breslin, M. Gašić, Matthew Henderson, Dongho Kim, Steve Young
Concept-to-speech generation by integrating syntagmatic features into HMM-based speech synthesis
Xin Wang, Zhen-Hua Ling, Li-Rong Dai
On the role of missing data imputation and NMF feature enhancement in building synthetic voices using reverberant speech
Dhananjaya Gowda, Heikki Kallasjoki, Reima Karhila, Cristian Contan, Kalle Palomäki, Mircea Giurgiu, Mikko Kurimo
Objective evaluation of HMM-based speech synthesis system using Kullback-Leibler divergence
C.-T. Do, M. Evrard, A. Leman, Christophe d'Alessandro, Albert Rilliard, J.-L. Crebouw
Speech intonation for TTS: study on evaluation methodology
Javier Latorre, Kayoko Yanagisawa, Vincent Wan, BalaKrishna Kolluru, Mark J. F. Gales
Improving language-universal feature extraction with deep maxout and convolutional neural networks
Yajie Miao, Florian Metze
Exploiting vocal-source features to improve ASR accuracy for low-resource languages
Raul Fernandez, Jia Cui, Andrew Rosenberg, Bhuvana Ramabhadran, Xiaodong Cui
Data augmentation for low resource languages
Anton Ragni, Kate M. Knill, Shakti P. Rath, Mark J. F. Gales
About combining forward and backward-based decoders for selecting data for unsupervised training of acoustic models
Denis Jouvet, Dominique Fohr
Combination of multilingual and semi-supervised training for under-resourced languages
František Grézl, Martin Karafiát
Investigating the learning effect of multilingual bottle-neck features for ASR
Ngoc Thang Vu, Jochen Weiner, Tanja Schultz
Distributed learning of multilingual DNN feature extractors using GPUs
Yajie Miao, Hao Zhang, Florian Metze
Combining tandem and hybrid systems for improved speech recognition and keyword spotting on low resource languages
Shakti P. Rath, Kate M. Knill, Anton Ragni, Mark J. F. Gales
Recent improvements in neural network acoustic modeling for LVCSR in low resource languages
Jia Cui, Bhuvana Ramabhadran, Xiaodong Cui, Andrew Rosenberg, Brian Kingsbury, Abhinav Sethy
Towards better performance with heterogeneous training data in acoustic modeling using deep neural networks
Yan Huang, Malcolm Slaney, Michael L. Seltzer, Yifan Gong
A unified approach for underdetermined blind signal separation and source activity detection by multichannel factorial hidden Markov models
Takuya Higuchi, Hirofumi Takeda, Tomohiko Nakamura, Hirokazu Kameoka
Enhancing audio source separability using spectro-temporal regularization with NMF
Colin Vaz, Dimitrios Dimitriadis, Shrikanth S. Narayanan
Blind speech source localization, counting and separation for 2-channel convolutive mixtures in a reverberant environment
Sayeh Mirzaei, Hugo Van hamme, Yaser Norouzi
Discriminative NMF and its application to single-channel source separation
Felix Weninger, Jonathan Le Roux, John R. Hershey, Shinji Watanabe
Vocal tract length estimation based on vowels using a database consisting of 385 speakers and a database with MRI-based vocal tract shape information
Hideki Kawahara, Tatsuya Kitamura, Hironori Takemoto, Ryuichi Nisimura, Toshio Irino
A graph-based Gaussian component clustering approach to unsupervised acoustic modeling
Haipeng Wang, Tan Lee, Cheung-Chi Leung, Bin Ma, Haizhou Li
A speech system for estimating daily word counts
Ali Ziaei, Abhijeet Sangwan, John H. L. Hansen
Ensemble modeling of denoising autoencoder for speech spectrum restoration
Xugang Lu, Yu Tsao, Shigeki Matsuda, Chiori Hori
Acoustic modeling with deep neural networks using raw time signal for LVCSR
Zoltán Tüske, Pavel Golik, Ralf Schlüter, Hermann Ney
Evaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditions
Vikramjit Mitra, Wen Wang, Horacio Franco, Yun Lei, Chris Bartels, Martin Graciarena
Deep scattering spectra with deep neural networks for LVCSR tasks
Tara N. Sainath, Vijayaditya Peddinti, Brian Kingsbury, Petr Fousek, Bhuvana Ramabhadran, David Nahamoo
Robust CNN-based speech recognition with Gabor filter kernels
Shuo-Yiin Chang, Nelson Morgan
Probabilistic linear discriminant analysis with bottleneck features for speech recognition
Liang Lu, Steve Renals
Evaluating speech features with the minimal-pair ABX task (II): resistance to noise
Thomas Schatz, Vijayaditya Peddinti, Xuan-Nga Cao, Francis Bach, Hynek Hermansky, Emmanuel Dupoux
Investigating NMF speech enhancement for neural network based acoustic models
Jürgen T. Geiger, Jort F. Gemmeke, Björn Schuller, Gerhard Rigoll
Automatic speech feature classification for children with cochlear implants
Jason Lilley, James Mahshie, H. Timothy Bunnell
Sequential maximum mutual information linear discriminant analysis for speech recognition
Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey
Model and feature based compensation for whispered speech recognition
Shabnam Ghaffarzadegan, Hynek Bořil, John H. L. Hansen
Post-masking: a hybrid approach to array processing for speech recognition
Amir R. Moghimi, Bhiksha Raj, Richard M. Stern
ASR feature extraction with morphologically-filtered power-normalized cochleograms
F. de-la-Calle-Silos, F. J. Valverde-Albacete, A. Gallardo-Antolín, C. Peláez-Moreno
Should deep neural nets have ears? The role of auditory features in deep learning approaches
Angel Mario Castro Martinez, Niko Moritz, Bernd T. Meyer
Extending Limabeam with discrimination and coarse gradients
Charles Fox, Thomas Hain
Generation of F0 contour using deep Boltzmann machine and twin Gaussian process hybrid model for Bengali language
Sankar Mukherjee, Shyamal Kumar Das Mandal
Room localization for distant speech recognition
Juan A. Morales-Cordovilla, Hannes Pessentheiner, Martin Hagmüller, Gernot Kubin
Posterior-based sparse representation for automatic speech recognition
Sara Bahaadini, Afsaneh Asaei, David Imseng, Hervé Bourlard
Lateral formants in three Central Australian languages
Marija Tabain, Andrew Butcher, Gavan Breen, Richard Beare
Detecting articulatory compensation in acoustic data through linear regression modeling
Alina Khasanova, Jennifer Cole, Mark Hasegawa-Johnson
The relationship between the second subglottal resonance and vowel class, standing height, trunk length, and F0 variation for Mandarin speakers
Jinxi Guo, Angli Liu, Harish Arsikere, Abeer Alwan, Steven M. Lulich
Comparison of speech quality with and without sensors in electromagnetic articulograph AG 501 recording
Nisha Meenakshi, Chiranjeevi Yarra, B. K. Yamini, Prasanta Kumar Ghosh
Impact of age in the production of European Portuguese vowels
Luciana Albuquerque, Catarina Oliveira, António Teixeira, Pedro Sa-Couto, João Freitas, Miguel Sales Dias
'Houston, we have a solution': a case study of the analysis of astronaut speech during NASA Apollo 11 for long-term speaker modeling
Chengzhu Yu, John H. L. Hansen, Douglas W. Oard
Relating automatic vowel space estimates to talker intelligibility
Yi Luan, Richard Wright, Mari Ostendorf, Gina-Anne Levow
Excitation source analysis for high-quality speech manipulation systems based on an interference-free representation of group delay with minimum phase response compensation
Hideki Kawahara, Masanori Morise, Tomoki Toda, Hideki Banno, Ryuichi Nisimura, Toshio Irino
Sparse time-frequency representation of speech by the Vandermonde transform
Christian Fischer Pedersen, Tom Bäckström
Analysis and identification of human scream: implications for speaker recognition
Mahesh Kumar Nandwana, John H. L. Hansen
F0 estimation in noisy speech based on long-term harmonic feature analysis combined with neural network classification
Dongmei Wang, Philipos C. Loizou, John H. L. Hansen
The influence of pitch and noise on the discriminability of filterbank features
Malcolm Slaney, Michael L. Seltzer
Choosing useful word alternates for automatic speech recognition correction interfaces
David Harwath, Alexander Gruenstein, Ian McGraw
An initial investigation of long-term adaptation for meeting transcription
X. Chen, Mark J. F. Gales, Kate M. Knill, Catherine Breslin, Langzhou Chen, K. K. Chin, Vincent Wan
Progress in the BBN keyword search system for the DARPA RATS program
Tim Ng, Roger Hsiao, Le Zhang, Damianos Karakos, Sri Harish Mallidi, Martin Karafiát, Karel Veselý, Igor Szőke, Bing Zhang, Long Nguyen, Richard Schwartz
Speech-to-text technology to transcribe and disclose 100,000+ hours of bilingual documents from historical Czech and Czechoslovak radio archive
Jan Nouza, Petr Cerva, Jindrich Zdansky, Karel Blavka, Marek Bohac, Jan Silovsky, Josef Chaloupka, Michaela Kucharova, Ladislav Seps, Jiri Malek, Michal Rott
Automatic assessment of children's reading with the FLaVoR decoding using a phone confusion model
Emre Yılmaz, Joris Pelemans, Hugo Van hamme
RWTH LVCSR systems for Quaero and EU-Bridge: German, Polish, Spanish and Portuguese
M. Ali Basha Shaik, Zoltán Tüske, M. Ali Tahir, Markus Nußbaum-Thom, Ralf Schlüter, Hermann Ney
Single channel source separation with general stochastic networks
Matthias Zöhrer, Franz Pernkopf
Large-margin conditional random fields for single-microphone speech separation
Yu Ting Yeung, Tan Lee, Cheung-Chi Leung
On the use of the Watson mixture model for clustering-based under-determined blind source separation
Ingrid Jafari, Roberto Togneri, Sven Nordholm
Binary mask estimation based on frequency modulations
Chung-Chien Hsu, Jen-Tzung Chien, Tai-Shih Chi
Bayesian factorization and selection for speech and music separation
Po-Kai Yang, Chung-Chien Hsu, Jen-Tzung Chien
Self-adaption in single-channel source separation
Michael Wohlmayr, Ludwig Mohr, Franz Pernkopf
Multichannel automatic recognition of voice command in a multi-room smart home: an experiment involving seniors and users with visual impairment
Michel Vacher, Benjamin Lecouteux, François Portet
An evaluation of unsupervised acoustic model training for a dysarthric speech interface
Oliver Walter, Vladimir Despotovic, Reinhold Haeb-Umbach, Jort F. Gemmeke, Bart Ons, Hugo Van hamme
Analysis of phonetic similarity in a silent speech interface based on permanent magnetic articulography
Jose A. Gonzalez, Lam A. Cheah, Jie Bai, Stephen R. Ell, James M. Gilbert, Roger K. Moore, Phil D. Green
Audio-visual signal processing in a multimodal assisted living environment
Alexey Karpov, Lale Akarun, Hülya Yalçın, Alexander Ronzhin, Barış Evrim Demiröz, Aysun Çoban, Miloš Železný
On the selection of the impulse responses for distant-speech recognition based on contaminated speech training
Mirco Ravanelli, Maurizio Omologo
Adaptive speech recognition and dialogue management for users with speech disorders
I. Casanueva, H. Christensen, Thomas Hain, Phil D. Green
Prediction of cognitive performance in an animal fluency task based on rate and articulatory markers
Bea Yu, Thomas F. Quatieri, James R. Williamson, James C. Mundt
Analysis of laughter events in real science classes by using multiple environment sensor data
Carlos Ishi, Hiroaki Hatano, Norihiro Hagita
Parallel deep neural network training for LVCSR tasks using Blue Gene/Q
Tara N. Sainath, I-hsin Chung, Bhuvana Ramabhadran, Michael Picheny, John Gunnels, Brian Kingsbury, George Saon, Vernon Austel, Upendra Chaudhari
Word embeddings for speech recognition
Samy Bengio, Georg Heigold
1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs
Frank Seide, Hao Fu, Jasha Droppo, Gang Li, Dong Yu
Boundary contraction training for acoustic models based on discrete deep neural networks
Ryu Takeda, Naoyuki Kanda, Nobuo Nukaga
Restructuring output layers of deep neural networks using minimum risk parameter clustering
Yotaro Kubo, Jun Suzuki, Takaaki Hori, Atsushi Nakamura
Distributed asynchronous optimization of convolutional neural networks
William Chan, Ian Lane
Convolutional deep maxout networks for phone recognition
László Tóth
Joint sequence training of phone and grapheme acoustic model based on multi-task learning deep neural networks
Dongpeng Chen, Brian Mak, Sunil Sivadas
Improving semi-supervised deep neural network for keyword search in low resource languages
Roger Hsiao, Tim Ng, Le Zhang, Shivesh Ranjan, Stavros Tsakalidis, Long Nguyen, Richard Schwartz
Pruning deep neural networks by optimal brain damage
Chao Liu, Zhiyong Zhang, Dong Wang
Improving the performance of far-field speaker verification using multi-condition training: the case of GMM-UBM and i-vector systems
Anderson R. Avila, Milton Sarria-Paja, Francisco J. Fraga, Douglas O'Shaughnessy, Tiago H. Falk
Clustering-based i-vector formulation for speaker recognition
Hung-Shin Lee, Yu Tsao, Hsin-Min Wang, Shyh-Kang Jeng
Speaker recognition via fusion of subglottal features and MFCCs
Harish Arsikere, Hitesh Anand Gupta, Abeer Alwan
The NIST SRE summed channel speaker recognition system
Hanwu Sun, Bin Ma
Advantages of wideband over narrowband channels for speaker verification employing MFCCs and LFCCs
Laura Fernández Gallardo, Michael Wagner, Sebastian Möller
Speaker verification and spoken language identification using a generalized i-vector framework with phonetic tokenizations and tandem features
Ming Li, Wenbo Liu
Feature switching in the i-vector framework for speaker verification
T. Asha, M. S. Saranya, D. S. Karthik Pandia, Srikanth Madikeri, Hema A. Murthy
PLDA modeling in the Fishervoice subspace for speaker verification
Jinghua Zhong, Weiwu Jiang, Wei Rao, Man-Wai Mak, Helen Meng
Performance factor analysis for the 2012 NIST speaker recognition evaluation
Alvin F. Martin, Craig S. Greenberg, Vincent M. Stanford, John M. Howard, George R. Doddington, John J. Godfrey
Simultaneous gender classification and voice activity detection using deep neural networks
Hiroshi Fujimura
Dynamic stream weight estimation in coupled-HMM-based audio-visual speech recognition using multilayer perceptrons
Ahmed Hussen Abdelaziz, Dorothea Kolossa
Lipreading using convolutional neural network
Kuniaki Noda, Yuki Yamaguchi, Kazuhiro Nakadai, Hiroshi G. Okuno, Tetsuya Ogata
Lipreading approach for isolated digits recognition under whisper and neutral speech
Fei Tao, Carlos Busso
Multimodal exemplar-based voice conversion using lip features in noisy environments
Kenta Masaka, Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki
Towards a practical silent speech recognition system
Yunbin Deng, James T. Heaton, Geoffrey S. Meltzner
Enhancing multimodal silent speech interfaces with feature selection
João Freitas, Artur Ferreira, Mário Figueiredo, António Teixeira, Miguel Sales Dias
Opti-speech: a real-time, 3D visual feedback system for speech training
William Katz, Thomas F. Campbell, Jun Wang, Eric Farrar, J. Coleman Eubanks, Arvind Balasubramanian, Balakrishnan Prabhakaran, Rob Rennaker
Across-speaker articulatory normalization for speaker-independent silent speech recognition
Jun Wang, Ashok Samal, Jordan R. Green
Conversion from facial myoelectric signals to speech: a unit selection approach
Marlene Zahner, Matthias Janke, Michael Wand, Tanja Schultz
Towards real-life application of EMG-based speech recognition by using unsupervised adaptation
Michael Wand, Tanja Schultz
Simple gesture-based error correction interface for smartphone speech recognition
Yuan Liang, Koji Iwano, Koichi Shinoda
Normalization of ASR confidence classifier scores via confidence mapping
Kshitiz Kumar, Chaojun Liu, Yifan Gong
Neural network phone duration model for speech recognition
Tanel Alumäe
Sequence discriminative distributed training of long short-term memory recurrent neural networks
Haşim Sak, Oriol Vinyals, Georg Heigold, Andrew Senior, Erik McDermott, Rajat Monga, Mark Mao
Beyond cross-entropy: towards better frame-level objective functions for deep neural network training in automatic speech recognition
Zhen Huang, Jinyu Li, Chao Weng, Chin-Hui Lee
A comparison of training approaches for discriminative segmental models
Hao Tang, Kevin Gimpel, Karen Livescu
Asynchronous stochastic optimization for sequence training of deep neural networks: towards big data
Erik McDermott, Georg Heigold, Pedro J. Moreno, Andrew Senior, Michiel Bacchiani
Detection of children's paralinguistic events in interaction with caregivers
Hrishikesh Rao, Jonathan C. Kim, Mark A. Clements, Agata Rozga, Daniel S. Messinger
Age and rhythmic variations: a study on Italian
Massimo Pettorino, Elisa Pellegrino
Probabilistic acoustic volume analysis for speech affected by depression
Nicholas Cummins, Vidhyasaharan Sethu, Julien Epps, Jarek Krajewski
Exploring modulation spectrum features for speech-based depression level classification
Elif Bozkurt, Orith Toledo-Ronen, Alexander Sorin, Ron Hoory
Automatic modelling of depressed speech: relevant features and relevance of gender
Florian Hönig, Anton Batliner, Elmar Nöth, Sebastian Schnieder, Jarek Krajewski
Excitation source features for discrimination of anger and happy emotions
P. Gangamohan, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, B. Yegnanarayana
Encoding linear models as weighted finite-state transducers
Ke Wu, Cyril Allauzen, Keith Hall, Michael Riley, Brian Roark
Structured soft margin confidence weighted learning for grapheme-to-phoneme conversion
Keigo Kubo, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura
Unsupervised language filtering using the latent Dirichlet allocation
Wei Zhang, Robert A. J. Clark, Yongyuan Wang
Generating multiple-accent pronunciations for TTS using joint sequence model interpolation
BalaKrishna Kolluru, Vincent Wan, Javier Latorre, Kayoko Yanagisawa, Mark J. F. Gales
Using a hybrid approach to build a pronunciation dictionary for Brazilian Portuguese
Gustavo Mendonça, Sandra Aluisio
A flexible front-end for HTS
Matthew P. Aylett, Rasmus Dall, Arnab Ghoshal, Gustav Eje Henter, Thomas Merritt
Cross-language perception of Japanese singleton and geminate consonants: preliminary data from non-native learners of Japanese and native speakers of Italian and Australian English
Kimiko Tsukada, Felicity Cox, John Hajek
Difficulty in discriminating non-native vowels: are Dutch vowels easier for Australian English than Spanish listeners?
Samra Alispahic, Paola Escudero, Karen E. Mulak
Acoustic properties of shared vowels in bilingual Mandarin-English children
Jing Yang, Robert Allen Fox
Generating segmental foreign accent
María Luisa García Lecumberri, Roberto Barra-Chicote, Rubén Pérez Ramón, Junichi Yamagishi, Martin Cooke
Differences of pitch profiles in Germanic and Slavic languages
Bistra Andreeva, Grażyna Demenko, Bernd Möbius, Frank Zimmerer, Jeanin Jügler, Magdalena Oleskowicz-Popiel
The obligatory contour principle in African and European varieties of French
Mathieu Avanzi, Guri Bordal, Gélase Nimbona
Content matching for short duration speaker recognition
Nicolas Scheffer, Yun Lei
Extended RSR2015 for text-dependent speaker verification over VHF channel
Anthony Larcher, Kong Aik Lee, Pablo L. Sordo Martínez, Trung Hieu Nguyen, Bin Ma, Haizhou Li
Tandem deep features for text-dependent speaker verification
Tianfan Fu, Yanmin Qian, Yuan Liu, Kai Yu
In-domain versus out-of-domain training for text-dependent JFA
Patrick Kenny, Themos Stafylakis, M. J. Alam, Pierre Ouellet, Marcel Kockmann
Domain adaptation for text dependent speaker verification
Hagai Aronowitz, Asaf Rendel
Factor analysis with sampling methods for text dependent speaker recognition
Antonio Miguel, Jesús Villalba, Alfonso Ortega, Eduardo Lleida, Carlos Vaquero
Dictionary-based pitch tracking with dynamic programming
Ewout van den Berg, Bhuvana Ramabhadran
Acoustic features for robust classification of Mandarin tones
Hongbing Hu, Stephen A. Zahorian, Peter Guzewich, Jiang Wu
Preservation of lexical tones in singing in a tone language
Anastasia Karlsson, Håkan Lundström, Jan-Olof Svantesson
Emotional speech classification using adaptive sinusoidal modelling
Theodora Yakoumaki, George P. Kafentzis, Yannis Stylianou
Formant enhancement based speech watermarking for tampering detection
Shengbei Wang, Masashi Unoki, Nam Soo Kim
Modelling primitive streaming of simple tone sequences through factorisation of modulation pattern tensors
Tom Barker, Hugo Van hamme, Tuomas Virtanen
Detection of vowel onset points in voiced aspirated sounds of Indian languages
Biswajit Dev Sarma, S. R. M. Prasanna
Accuracy evaluation of esophageal voice analysis based on automatic topology generated-voicing source HMM
Akira Sasou
Audio watermarking based on multiple echoes hiding for FM radio
Xuejun Zhang, Xiang Xie
Development of bilingual ASR system for MediaParl corpus
Petr Motlicek, David Imseng, Milos Cernak, Namhoon Kim
Investigation of cross-lingual bottleneck features in hybrid ASR systems
Jie Li, Rong Zheng, Bo Xu
Language identification of individual words with joint sequence models
Oluwapelumi Giwa, Marelie H. Davel
Audio-to-text alignment for speech recognition with very limited resources
Xavier Anguera, Jordi Luque, Ciro Gracia
A minimal-resource transliteration framework for Vietnamese
Hoang Gia Ngo, Nancy F. Chen, Sunil Sivadas, Bin Ma, Haizhou Li
Combining recurrent neural networks and factored language models during decoding of code-switching speech
Heike Adel, Dominic Telaar, Ngoc Thang Vu, Katrin Kirchhoff, Tanja Schultz
Data augmentation, feature combination, and multilingual neural networks to improve ASR and KWS performance for low-resource languages
Zoltán Tüske, Pavel Golik, David Nolden, Ralf Schlüter, Hermann Ney
Mixture of latent words language models for domain adaptation
Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi
Improving spoken document retrieval by unsupervised language model adaptation using utterance-based web search
Robert Herms, Marc Ritter, Thomas Wilhelm-Stein, Maximilian Eibl
The nested Indian buffet process for flexible topic modeling
Jen-Tzung Chien, Ying-Lan Chang
Automated closed captioning for Russian live broadcasting
K. Levin, I. Ponomareva, A. Bulusheva, G. Chernykh, I. Medennikov, N. Merkin, A. Prudnikov, Natalia Tomashenko
Pronunciation modeling of foreign words for Mandarin ASR by considering the effect of language transfer
Lei Wang, Rong Tong
Pronunciation learning for named-entities through crowd-sourcing
Attapol T. Rutherford, Fuchun Peng, Françoise Beaufays
Pronunciation variation in read and conversational Austrian German
Barbara Schuppler, Martine Adda-Decker, Juan A. Morales-Cordovilla
Discriminative pronunciation modeling for dialectal speech recognition
Maider Lehr, Kyle Gorman, Izhak Shafran
The goodness of pronunciation algorithm applied to disordered speech
Thomas Pellegrini, Lionel Fontan, Julie Mauclair, Jérôme Farinas, Marina Robert
Using deep neural networks to improve proficiency assessment for children English language learners
Angeliki Metallinou, Jian Cheng
Alignment of spoken utterances with slide content for easier learning with recorded lectures using structured support vector machine (SVM)
Han Lu, Sheng-syun Shen, Sz-Rung Shiang, Hung-yi Lee, Lin-shan Lee
A preliminary study on ASR-based detection of Chinese mispronunciation by Japanese learners
Richeng Duan, Jinsong Zhang, Wen Cao, Yanlu Xie
3D tongue motion visualization based on ultrasound image sequences
Kele Xu, Yin Yang, A. Jaumard-Hakoun, Martine Adda-Decker, A. Amelot, S. K. Al Kork, L. Crevier-Buchman, P. Chawah, G. Dreyfus, T. Fux, C. Pillot-Loiseau, P. Roussel, M. Stone, B. Denby
Listen with your skin: Aerotak speech perception enhancement system
Donald Derrick, Tom De Rybel, Greg A. O'Beirne, Jennifer Hay
Speech assistant system
László Czap
Spoken dialogue system for restaurant recommendation and reservation
Rafael E. Banchs, Seokhwan Kim
Interlingual map task corpus collection
Hayakawa Akira, Nick Campbell, Saturnino Luz
A client mobile application for Chinese-Spanish statistical machine translation
Jordi Centelles, Marta R. Costa-jussà, Rafael E. Banchs
LuciaWebGL: a new WebGL-based talking head
Alberto Benin, Piero Cosi, Giuseppe Riccardo Leone, Giulio Paci
Crowdee: mobile crowdsourcing micro-task platform for celebrating the diversity of languages
Babak Naderi, Tim Polzehl, André Beyer, Tibor Pilz, Sebastian Möller
On the use of the 'Pure Data' programming language for teaching and public outreach in speech processing
Roger K. Moore
Syncwords: a platform for semi-automated closed captioning and subtitles
Aleksandr Dubinsky
Simple4all
Robert A. J. Clark
An educational platform to capture, visualize and analyze rare singing
P. Chawah, S. K. Al Kork, T. Fux, Martine Adda-Decker, A. Amelot, N. Audibert, B. Denby, G. Dreyfus, A. Jaumard-Hakoun, C. Pillot-Loiseau, P. Roussel, M. Stone, Kele Xu, L. Crevier-Buchman
Single-channel speech enhancement based on non-negative matrix factorization and online noise adaptation
Kwang Myung Jeon, Chan Jun Chun, Woo Kyeong Seong, Hong Kook Kim, Myung Kyu Choi
Intelligibility of high-pitched vowel sounds in the singing and speaking of a female Cantonese opera singer
Dieter Maurer, Peggy Mok, Daniel Friedrichs, Volker Dellwo
Iterative refinement of amplitude and phase in single-channel speech enhancement
Pejman Mowlaee, Mario Kaoru Watanabe, Rahim Saeidi
elite-HTS: a NLP tool for French HMM-based speech synthesis
Sophie Roekhaut, Sandrine Brognaux, Richard Beaufort, Thierry Dutoit
SARA — singapore's automated responsive assistant for the touristic domain
Andreea I. Niculescu, Rafael E. Banchs, Ridong Jiang, Seokhwan Kim, Kheng Hui Yeo, Arthur Niswar
The speech recognition virtual kitchen: launch party
Andrew Plummer, Eric Riebling, Anuj Kumar, Florian Metze, Eric Fosler-Lussier, Rebecca Bates
System for automated speech and language analysis (SALSA)
Kyle Marek-Spartz, Benjamin Knoll, Robert Bill, Thomas Christie, Serguei Pakhomov
Pronunciation practice support system for children who have difficulty correctly pronouncing words
Ikuyo Masuda-Katsuse
Automated production of true-cased punctuated subtitles for weather and news broadcasts
Joris Driesen, Alexandra Birch, Simon Grimsey, Saeid Safarfashandi, Juliet Gauthier, Matt Simpson, Steve Renals
I2R Speech2Singing perfects everyone's singing
Minghui Dong, S. W. Lee, Haizhou Li, Paul Chan, Xuejian Peng, Jochen Walter Ehnes, Dongyan Huang
Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech
Gustav Eje Henter, Thomas Merritt, Matt Shannon, Catherine Mayo, Simon King
Investigating source and filter contributions, and their interaction, to statistical parametric speech synthesis
Thomas Merritt, Tuomo Raitio, Simon King
Voice expression conversion with factorised HMM-TTS models
Javier Latorre, Vincent Wan, Kayoko Yanagisawa
Noise-robust TTS speaker adaptation with statistics smoothing
Kayoko Yanagisawa, Langzhou Chen, Mark J. F. Gales
Speech synthesis in various communicative situations: impact of pronunciation variations
Sandrine Brognaux, Benjamin Picart, Thomas Drugman
Formant-controlled speech synthesis using hidden trajectory model
Ming-Qi Cai, Zhen-Hua Ling, Li-Rong Dai
Boosted deep neural networks and multi-resolution cochleagram features for voice activity detection
Xiao-Lei Zhang, DeLiang Wang
Selection of optimal vocal tract regions using real-time magnetic resonance imaging for robust voice activity detection
Abhay Prasad, Prasanta Kumar Ghosh, Shrikanth S. Narayanan
Speech activity detection for NASA Apollo space missions: challenges and solutions
Ali Ziaei, Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen, Douglas W. Oard
Towards improving statistical model based voice activity detection
Ming Tu, Xiang Xie, Yishan Jiao
The use of low-frequency ultrasound for voice activity detection
Ian Vince McLoughlin
Improving the speech activity detection for the DARPA RATS phase-3 evaluation
Jeff Ma
Modeling pronunciation, rhythm, and intonation for automatic assessment of speech quality in aphasia rehabilitation
Duc Le, Emily Mower Provost
Ranking severity of speech errors by their phonological impact in context
Sofia Strömbergsson, Christina Tånnander, Jens Edlund
Automatic detection of Parkinson's disease from words uttered in three different languages
J. R. Orozco-Arroyave, Florian Hönig, J. D. Arias-Londoño, J. F. Vargas-Bonilla, S. Skodda, J. Rusz, Elmar Nöth
Automating an objective measure of pediatric speech intelligibility
Jason Lilley, Susan Nittrouer, H. Timothy Bunnell
A comparison of GMM-HMM and DNN-HMM based pronunciation verification techniques for use in the assessment of childhood apraxia of speech
Mostafa Shahin, Beena Ahmed, Jacqueline McKechnie, Kirrie Ballard, Ricardo Gutierrez-Osuna
Acoustic and kinematic characteristics of vowel production through a virtual vocal tract in dysarthria
Jeff Berry, Andrew Kolb, Cassandra North, Michael T. Johnson
The EMG-UKA corpus for electromyographic speech processing
Michael Wand, Matthias Janke, Tanja Schultz
A whispered Mandarin corpus for speech technology applications
Pei Xuan Lee, Darren Wee, Hilary Si Yin Toh, Boon Pang Lim, Nancy F. Chen, Bin Ma
Euronews: a multilingual benchmark for ASR and LID
Roberto Gretter
ATHENA: a Greek multi-sensory database for home automation control
Antigoni Tsiami, Isidoros Rodomagoulakis, Panagiotis Giannoulis, Athanasios Katsamanis, Gerasimos Potamianos, Petros Maragos
The DIRHA-GRID corpus: baseline and tools for multi-room distant speech recognition using distributed microphones
Marco Matassoni, Ramón Fernandez Astudillo, Athanasios Katsamanis, Mirco Ravanelli
Verbal description of LEGO blocks
Diogo Henriques, Isabel Trancoso, Daniel Mendes, Alfredo Ferreira
Phase importance in speech processing applications
Pejman Mowlaee, Rahim Saeidi, Yannis Stylianou
Phase-based harmonic/percussive separation
Estefanía Cano, Mark Plumbley, Christian Dittmar
Phase distortion statistics as a representation of the glottal source: application to the classification of voice qualities
Gilles Degottex, Nicolas Obin
A measure of phase randomness for the harmonic model in speech synthesis
Gilles Degottex, Daniel Erro
Enhancement of speech intelligibility in near-end noise conditions with phase modification
Emma Jokinen, Marko Takanen, Hannu Pulakka, Paavo Alku
A hybrid approach to segmentation of speech using group delay processing and HMM based embedded reestimation
S. Aswin Shanmugam, Hema Murthy
The importance of phase on voice quality assessment
Maria Koutsogiannaki, Olympia Simantiraki, Gilles Degottex, Yannis Stylianou
Feature extraction from analytic phase of speech signals for speaker verification
Karthika Vijayan, Vinay Kumar, K. Sri Rama Murty
A cross-vocoder study of speaker independent synthetic speech detection using phase information
Jon Sanchez, Ibon Saratxaga, Inma Hernaez, Eva Navas, Daniel Erro
Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection
Peng Yang, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li
Recent improvements in SRI's keyword detection system for noisy audio
Julien van Hout, Vikramjit Mitra, Yun Lei, Dimitra Vergyri, Martin Graciarena, Arindam Mandal, Horacio Franco
Utilizing state-level distance vector representation for improved spoken term detection by text and spoken queries
Mitsuaki Makino, Naoki Yamamoto, Atsuhiko Kai
Unsupervised spoken word retrieval using Gaussian-Bernoulli restricted Boltzmann machines
Raghavendra Reddy Pappagari, Shekhar Nayak, K. Sri Rama Murty
Unsupervised query-by-example spoken term detection using bag of acoustic words and non-segmental dynamic time warping
Basil George, Abhijeet Saxena, Gautam Mantena, Kishore Prahallad, B. Yegnanarayana
An empirical study of multilingual and low-resource spoken term detection using deep neural networks
Jie Li, Xiaorui Wang, Bo Xu
Diagnostic techniques for spoken keyword discovery
Peter Schulam, Murat Akbacak
Robust retrieval models for false positive errors in spoken documents
Sho Kawasaki, Tomoyosi Akiba
Semantic retrieval of personal photos using matrix factorization and two-layer random walk fusing sparse speech annotations with visual features
Yuan-ming Liou, Yi-sheng Fu, Hung-yi Lee, Lin-shan Lee
Audio thumbnails for spoken content without transcription based on a maximum motif coverage criterion
Guillaume Gravier, Nathan Souviraà-Labastie, Sébastien Campion, Frédéric Bimbot
Semantically based search in a social speech task
Fernando García, Emilio Sanchis, Ferran Pla
Study of changes in glottal vibration characteristics during laughter
Vinay Kumar Mittal, B. Yegnanarayana
On predicting the unpleasantness level of a sound event
Stavros Ntalampiras, Ilyas Potamitis
Predicting when to laugh with structured classification
Bilal Piot, Olivier Pietquin, Matthieu Geist
Conversational structures affecting auditory likeability
Benjamin Weiss, Katrin Schoenenberg
Towards the adaptation of prosodic models for expressive text-to-speech synthesis
Mathieu Avanzi, George Christodoulides, Damien Lolive, Elisabeth Delais-Roussarie, Nelly Barbot
Data-driven generation of text balloons based on linguistic and acoustic features of a comics-anime corpus
Sho Matsumiya, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura
Learning L2 prosody is more difficult than you realize — F0 characteristics and chunking size of L1 English, TW L2 English and TW L1 Mandarin
Chiu-yu Tseng, Chao-yu Su
Investigating prosodic relations between initiating and responding laughs
Khiet P. Truong, Jürgen Trouvain
Application of image processing methods to filled pauses detection from spontaneous speech
Dmytro Prylipko, Olga Egorow, Ingo Siegert, Andreas Wendemuth
Perception of sentence stress in English infant directed speech
Sofoklis Kakouros, Okko Räsänen
Automatic recognition of attitudes in video blogs — prosodic and visual feature analysis
Noor Alhusna Madzlan, JingGuang Han, Francesca Bonin, Nick Campbell
“Was that your mother on the phone?”: classifying interpersonal relationships between dialog participants with lexical and acoustic properties
Denys Katerenchuk, David Guy Brizan, Andrew Rosenberg
Combining source and system information for limited data speaker verification
Rohan Kumar Das, S. Abhiram, S. R. M. Prasanna, A. G. Ramakrishnan
New insight into the use of phone log-likelihood ratios as features for language recognition
Mireia Diez, Amparo Varona, Mikel Penagarikano, Luis Javier Rodriguez-Fuentes, German Bordel
Robust language identification using convolutional neural network features
Sriram Ganapathy, Kyu Han, Samuel Thomas, Mohamed Omar, Maarten Van Segbroeck, Shrikanth S. Narayanan
Acoustic feature transformation using UBM-based LDA for speaker recognition
Chengzhu Yu, Gang Liu, John H. L. Hansen
SNR-dependent mixture of PLDA for noise robust speaker verification
Man-Wai Mak
Nearest neighbor discriminant analysis for robust speaker recognition
Seyed Omid Sadjadi, Jason Pelecanos, Weizhong Zhu
Enhanced language modeling for extractive speech summarization with sentence relatedness information
Shih-Hung Liu, Kuan-Yu Chen, Yu-Lun Hsieh, Berlin Chen, Hsin-Min Wang, Hsu-Chun Yen, Wen-Lian Hsu
I-vector based representation of highly imperfect automatic transcriptions
Mohamed Morchid, Mohamed Bouallegue, Richard Dufour, Georges Linarès, Driss Matrouf, Renato De Mori
Incorporating lexical and prosodic information at different levels for meeting summarization
Catherine Lai, Steve Renals
Subspace Gaussian mixture models for dialogues classification
Mohamed Bouallegue, Mohamed Morchid, Richard Dufour, Driss Matrouf, Georges Linarès, Renato De Mori
Factor analysis based semantic variability compensation for automatic conversation representation
Mohamed Bouallegue, Mohamed Morchid, Richard Dufour, Driss Matrouf, Georges Linarès, Renato De Mori
Speech cohesion for topic segmentation of spoken contents
Abdessalam Bouchekif, Géraldine Damnati, Delphine Charlet
A comparative analytic study on the Gaussian mixture and context dependent deep neural network hidden Markov models
Yan Huang, Dong Yu, Chaojun Liu, Yifan Gong
Asynchronous, online, GMM-free training of a context dependent acoustic model for speech recognition
Michiel Bacchiani, Andrew Senior, Georg Heigold
Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models
Navdeep Jaitly, Vincent Vanhoucke, Geoffrey Hinton
Learning small-size DNN with output-distribution-based criteria
Jinyu Li, Rui Zhao, Jui-Ting Huang, Yifan Gong
Ensemble deep learning for speech recognition
Li Deng, John C. Platt
Learning conditional random field with hierarchical representations for dialogue act recognition
Yucan Zhou, Qinghua Hu, Jie Liu, Yuan Jia
Can adolescents with autism perceive emotional prosody?
Cristiane Hsu, Yi Xu
Age, hearing loss and the perception of affective utterances in conversational speech
Juliane Schmidt, Esther Janse, Odette Scharenborg
Analysis of emotional effect on speech-body gesture interplay
Zhaojun Yang, Shrikanth S. Narayanan
When voices get emotional: a study of emotion-enhanced memory and impairment during emotional prosody exposure
Cyrielle Chappuis, Didier Grandjean
Perception of pitch tails at potential turn boundaries in Swedish
Margaret Zellers
Towards a perceptual model of speech rhythm: integrating the influence of f0 on perceived duration
Robert Fuchs
DNN-based stochastic postfilter for HMM-based speech synthesis
Ling-Hui Chen, Tuomo Raitio, Cassia Valentini-Botinhao, Junichi Yamagishi, Zhen-Hua Ling
Statistical parametric speech synthesis using weighted multi-distribution deep belief network
Shiyin Kang, Helen Meng
TTS synthesis with bidirectional LSTM based recurrent neural networks
Yuchen Fan, Yao Qian, Feng-Long Xie, Frank K. Soong
Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort
Tuomo Raitio, Antti Suni, Lauri Juvela, Martti Vainio, Paavo Alku
An introduction to computational networks and the computational network toolkit (invited talk)
Dong Yu, Adam Eversole, Michael L. Seltzer, Kaisheng Yao, Brian Guenter, Oleksii Kuchaiev, Frank Seide, Huaming Wang, Jasha Droppo, Zhiheng Huang, Geoff Zweig, Chris Rossbach, Jon Currey
Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks
Raul Fernandez, Asaf Rendel, Bhuvana Ramabhadran, Ron Hoory
Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree
Xiang Yin, Ming Lei, Yao Qian, Frank K. Soong, Lei He, Zhen-Hua Ling, Li-Rong Dai
High-order sequence modeling using speaker-dependent recurrent temporal restricted Boltzmann machines for voice conversion
Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki
Sequence error (SE) minimization training of neural network for voice conversion
Feng-Long Xie, Yao Qian, Yuchen Fan, Frank K. Soong, Haifeng Li
Robust articulatory speech synthesis using deep neural networks for BCI applications
Florent Bocquelet, Thomas Hueber, Laurent Girin, Pierre Badin, Blaise Yvert
Acoustic investigation of /th/ lenition in Brunei Mandarin
Shufang Xu
Mapping emotions into acoustic space: the role of voice quality
Ting Wang, Hongwei Ding, Jianjing Kuang, Qiuwu Ma
Principal components of auditory spectro-temporal receptive fields
Nagaraj Mahajan, Nima Mesgarani, Hynek Hermansky
Segmentation in singer turns with the Bayesian information criterion
Marwa Thlithi, Thomas Pellegrini, Julien Pinquier, Régine André-Obrecht
Mappings between vocal tract area functions, vocal tract resonances and speech formants for multiple speakers
Catherine I. Watson
A next step towards measuring perceived quality of speech through physiology
Sebastian Arndt, Markus Wenzel, Jan-Niklas Antons, Friedemann Köster, Sebastian Möller, Gabriel Curio
Effect of spectral degradation to the intelligibility of vowel sentences
Fei Chen, Sharon W. K. Wong, Lena L. N. Wong
Consonant context effects on vowel sensorimotor adaptation
Jeff Berry, John Jaeger, Melissa Wiedenhoeft, Brittany Bernal, Michael T. Johnson
Assessing objective characterizations of phonetic convergence
Gérard Bailly, Amélie Martin
Generalizing time-frequency importance functions across noises, talkers, and phonemes
Michael I. Mandel, Sarah E. Yoho, Eric W. Healy
Does elderly speech recognition in noise benefit from spectral and visual cues?
Yatin Mahajan, Jeesun Kim, Chris Davis
On the conversant-specificity of stochastic turn-taking models
Kornel Laskowski
Single-ended estimation of speech intelligibility using the ITU P.563 feature set
Toshihiro Sakano, Yosuke Kobayashi, Kazuhiro Kondo
Spectral tilt modelling with GMMs for intelligibility enhancement of narrowband telephone speech
Emma Jokinen, Ulpu Remes, Marko Takanen, Kalle Palomäki, Mikko Kurimo, Paavo Alku
Analyzing perceptual dimensions of conversational speech quality
Friedemann Köster, Sebastian Möller
Interplay of informational content and energetic masking in speech perception in noise
Vincent Aubanel, Chris Davis, Jeesun Kim
On spectral and time domain energy reallocation for speech-in-noise intelligibility enhancement
Tudor-Cătălin Zorilă, Yannis Stylianou
Objective quality evaluation of noise-suppressed speech: effects of temporal envelope and fine-structure cues
Fei Chen, Yi Hu
Noisy speech enhancement based on long term harmonic model to improve speech intelligibility for hearing impaired listeners
Dongmei Wang, Philipos C. Loizou, John H. L. Hansen
Using linguistic predictability and the Lombard effect to increase the intelligibility of synthetic speech in noise
Cassia Valentini-Botinhao, Mirjam Wester
Speech pre-enhancement using a discriminative microscopic intelligibility model
Maryam Al Dabel, Jon Barker
Least squares signal declipping for robust speech recognition
Mark J. Harvilla, Richard M. Stern
Semi-supervised training for bottle-neck feature based DNN-HMM hybrid systems
Haihua Xu, Hang Su, Eng Siong Chng, Haizhou Li
A big data approach to acoustic model training corpus selection
Olga Kapralova, John Alex, Eugene Weinstein, Pedro J. Moreno, Olivier Siohan
Recent advances in ASR applied to an Arabic transcription system for Al-Jazeera
Patrick Cardinal, Ahmed Ali, Najim Dehak, Yu Zhang, Tuka Al Hanai, Yifan Zhang, James R. Glass, Stephan Vogel
rwthlm — the RWTH Aachen University neural network language modeling toolkit
Martin Sundermeyer, Ralf Schlüter, Hermann Ney
Language modeling with sum-product networks
Wei-Chen Cheng, Stanley Kok, Hoai Vu Pham, Hai Leong Chieu, Kian Ming A. Chai
Improving deep neural network acoustic modeling for audio corpus indexing under the IARPA Babel program
Xiaodong Cui, Brian Kingsbury, Jia Cui, Bhuvana Ramabhadran, Andrew Rosenberg, Mohammad Sadegh Rasooli, Owen Rambow, Nizar Habash, Vaibhava Goel
Cross-language transfer of semantic annotation via targeted crowdsourcing
Shammur Absar Chowdhury, Arindam Ghosh, Evgeny A. Stepanov, Ali Orkan Bayer, Giuseppe Riccardi, Ioannis Klasinas
Probabilistic enrichment of knowledge graph entities for relation detection in conversational understanding
Dilek Hakkani-Tür, Asli Celikyilmaz, Larry Heck, Gokhan Tur, Geoff Zweig
Automatic speech recognition and translation of a Swiss German dialect: Walliserdeutsch
Philip N. Garner, David Imseng, Thomas Meyer
Building resources for Algerian Arabic dialects
S. Harrat, K. Meftouh, M. Abbas, K. Smaili
Spoken language recognition based on senone posteriors
Luciana Ferrer, Yun Lei, Mitchell McLaren, Nicolas Scheffer
Automatic language identification using long short-term memory recurrent neural networks
Javier Gonzalez-Dominguez, Ignacio Lopez-Moreno, Haşim Sak, Joaquin Gonzalez-Rodriguez, Pedro J. Moreno
Robust language recognition via adaptive language factor extraction
Brecht Desplanques, Kris Demuynck, Jean-Pierre Martens
Dialect levelling in Finnish: a universal speech attribute approach
Hamid Behravan, Ville Hautamäki, Sabato Marco Siniscalchi, Elie Khoury, Tommi Kurki, Tomi Kinnunen, Chin-Hui Lee
Improving native accent identification using deep neural networks
Mingming Chen, Zhanlei Yang, Hao Zheng, Wenju Liu
Foreign accent recognition based on temporal information contained in lowpass-filtered speech
Marie-José Kolly, Adrian Leemann, Volker Dellwo
Adaptation of deep neural network acoustic models using factorised i-vectors
Penny Karanasou, Yongqiang Wang, Mark J. F. Gales, Philip C. Woodland
Regularized feature-space discriminative adaptation for robust ASR
Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura, Steven J. Rennie, Vaibhava Goel
Towards speaker adaptive training of deep neural network acoustic models
Yajie Miao, Hao Zhang, Florian Metze
Component structuring and trajectory modeling for speech recognition
Arseniy Gorin, Denis Jouvet
Speaker dependent bottleneck layer training for speaker adaptation in automatic speech recognition
Rama Doddipatla, Madina Hasan, Thomas Hain
Improving wideband acoustic models using mixed-bandwidth training data via DNN adaptation
Zhao You, Bo Xu
Speaker age estimation for elderly speech recognition in European Portuguese
Thomas Pellegrini, Vahid Hedayati, Isabel Trancoso, Annika Hämäläinen, Miguel Sales Dias
Unsupervised model selection for recognition of regional accented speech
Maryam Najafian, Andrea DeMarco, Stephen Cox, Martin Russell
Speaker adaptation based on sparse and low-rank eigenphone matrix estimation
Wen-Lin Zhang, Dan Qu, Wei-Qiang Zhang, Bi-Cheng Li
Multi-accent deep neural network acoustic model with accent-specific top layer using the KLD-regularized model adaptation
Yan Huang, Dong Yu, Chaojun Liu, Yifan Gong
A low complexity model adaptation approach involving sparse coding over multiple dictionaries
S. Shahnawazuddin, Rohit Sinha
Effect of frequency weighting on MLP-based speaker canonicalization
Yuichi Kubota, Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Tsuneo Nitta
Feature space maximum a posteriori linear regression for adaptation of deep neural networks
Zhen Huang, Jinyu Li, Sabato Marco Siniscalchi, I-Fan Chen, Chao Weng, Chin-Hui Lee
Speaker adaptation of context dependent deep neural networks based on MAP-adaptation and GMM-derived feature processing
Natalia Tomashenko, Yuri Khokhlov
BUT 2014 Babel system: analysis of adaptation in NN based systems
Martin Karafiát, František Grézl, Karel Veselý, Mirko Hannemann, Igor Szőke, Jan Černocký
Speaker adaptation of DNN-based ASR with i-vectors: does it actually adapt models to speakers?
Mickael Rouvier, Benoit Favre
A sparse reconstruction method for speech source localization using partial dictionaries over a spherical microphone array
Kushagra Singhal, Rajesh M. Hegde
A robust TDOA estimation method for in-car-noise environments
Weiwei Cui, Jaeyeon Cho, Seungyeol Lee
Robust low-resource sound localization in correlated noise
Lorin Netsch, Jacek Stachurski
Direction-of-arrival estimation of multiple speakers using a planar array
Dongwen Ying, Ruohua Zhou, Junfeng Li, Jielin Pan, Yonghong Yan
Weighted spatial bispectrum correlation matrix for DOA estimation in the presence of interferences
Wei Xue, Shan Liang, Wenju Liu
Multi-sources separation for sound source localization
Mariem Bouafif, Zied Lachiri
Phone classification by a hierarchy of invariant representation layers
Chiyuan Zhang, Stephen Voinea, Georgios Evangelopoulos, Lorenzo Rosasco, Tomaso Poggio
A semi-Markov model for speech segmentation with an utterance-break prior
Mark Sinclair, Peter Bell, Alexandra Birch, Fergus McInnes
Speech detection in transient noises
G. Aneeja, B. Yegnanarayana
Evaluation of dictionary for sparse coding in speech processing
Yongjun He, Guanglu Sun, Guibin Zheng, Jiqing Han
Joint filtering and factorization for recovering latent structure from noisy speech data
Colin Vaz, Vikram Ramanarayanan, Shrikanth S. Narayanan
A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis
A. Gallardo-Antolín, J. M. Montero, Simon King
Read and spontaneous speech classification based on variance of GMM supervectors
Taichi Asami, Ryo Masumura, Hirokazu Masataki, Sumitaka Sakauchi
Co-channel speech detection via spectral analysis of frequency modulated sub-bands
Navid Shokouhi, Seyed Omid Sadjadi, John H. L. Hansen
Word-level invariant representations from acoustic waveforms
Stephen Voinea, Chiyuan Zhang, Georgios Evangelopoulos, Lorenzo Rosasco, Tomaso Poggio
On closed form calculation of line spectral frequencies (LSF)
Paul Dalsgaard, Ove Andersen
Robust features for content-based audio copy detection
Chahid Ouali, Pierre Dumouchel, Vishwa Gupta
Binaural deep neural network classification for reverberant speech segregation
Yi Jiang, DeLiang Wang, RunSheng Liu
Query-by-example spoken term detection on multilingual unconstrained speech
Xavier Anguera, Luis Javier Rodriguez-Fuentes, Igor Szőke, Andi Buzo, Florian Metze, Mikel Penagarikano
A comparison of multiple methods for rescoring keyword search lists for low resource languages
Victor Soto, Lidia Mangu, Andrew Rosenberg, Julia Hirschberg
Subword and phonetic search for detecting out-of-vocabulary keywords
Damianos Karakos, Richard Schwartz
An in-depth comparison of keyword specific thresholding and sum-to-one score normalization
Yun Wang, Florian Metze
Graph-based re-ranking using acoustic feature similarity between search results for spoken term detection on low-resource languages
Hung-yi Lee, Yu Zhang, Ekapol Chuangsuwanich, James R. Glass
Developing STT and KWS systems using limited language resources
Viet-Bac Le, Lori Lamel, Abdel Messaoudi, William Hartmann, Jean-Luc Gauvain, Cécile Woehrling, Julien Despres, Anindya Roy
Comparing decoding strategies for subword-based keyword spotting in low-resourced languages
William Hartmann, Viet-Bac Le, Abdel Messaoudi, Lori Lamel, Jean-Luc Gauvain
Strategies for rescoring keyword search results using word-burst and acoustic features
Min Ma, Justin Richards, Victor Soto, Julia Hirschberg, Andrew Rosenberg
Word-based probabilistic phonetic retrieval for low-resource spoken term detection
Di Xu, Florian Metze
A keyword-boosted sMBR criterion to enhance keyword search performance in deep neural network based acoustic modeling
I-Fan Chen, Nancy F. Chen, Chin-Hui Lee
Combination of FST and CN search in spoken term detection
Justin Chiu, Yun Wang, Jan Trmal, Daniel Povey, Guoguo Chen, Alexander I. Rudnicky
Low-resource open vocabulary keyword search using point process models
Chunxi Liu, Aren Jansen, Guoguo Chen, Keith Kintzley, Jan Trmal, Sanjeev Khudanpur
GMM-based bandwidth extension using sub-band basis spectrum model
Yamato Ohtani, Masatsune Tamura, Masahiro Morita, Masami Akamine
A mel-cepstral analysis technique restoring high frequency components from low-sampling-rate speech
Kazuhiro Nakamura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
A comparative study of spectral transformation techniques for singing voice synthesis
S. W. Lee, Zhizheng Wu, Minghui Dong, Xiaohai Tian, Haizhou Li
Application of matrix variate Gaussian mixture model to statistical voice conversion
Daisuke Saito, Hidenobu Doi, Nobuaki Minematsu, Keikichi Hirose
Joint nonnegative matrix factorization for exemplar-based voice conversion
Zhizheng Wu, Eng Siong Chng, Haizhou Li
Statistical singing voice conversion with direct waveform modification based on the spectrum differential
Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura
Detecting proximity from personal audio recordings
Daniel P. W. Ellis, Hiroyuki Satoh, Zhuo Chen
Acoustic event detection and localization with regression forests
Huy Phan, Marco Maaß, Radoslaw Mazur, Alfred Mertins
Multi-source posteriors for speech activity detection on public talks
Marc Ferràs, Hervé Bourlard
Analysis of spectrogram image methods for sound event classification
Jonathan Dennis, Huy Dat Tran, Eng Siong Chng
Speech-based automatic and robust detection of very early dementia
Aharon Satt, Ron Hoory, Alexandra König, Pauline Aalten, Philippe H. Robert
On the acoustic environment of a neonatal intensive care unit: initial description, and detection of equipment alarms
Ganna Raboshchuk, Climent Nadeu, Omid Ghahabi, Sergi Solvez, Blanca Muñoz Mahamud, Ana Riverola de Veciana, Santiago Navarro Hervas
Non-native perception of regionally accented speech in a multitalker context
Robert Allen Fox, Ewa Jacewicz, Florence Hardjono
A crosslinguistic and acquisitional perspective on intonational rises in French
Giuseppina Turco, Elisabeth Delais-Roussarie
Error patterns of Mandarin disyllabic tones by Japanese learners
Jung-Yueh Tu, Yuwen Hsiung, Min-Da Wu, Yao-Ting Sung
Infant-directed speech enhances temporal rhythmic structure in the envelope
Victoria Leong, Marina Kalashnikova, Denis Burnham, Usha Goswami
Influences of tone sandhi on word recognition in preschool children
Dilu Wewalaarachchi, Leher Singh
Lexical representation of consonant, vowels and tones in early childhood
Hwee Hwee Goh, Charlene Hu, Kheng Hui Yeo, Leher Singh
Audiovisual temporal sensitivity in typical and dyslexic adult readers
Ana A. Francisco, Alexandra Jesse, Margriet A. Groen, James M. McQueen
Aero-tactile integration in fricatives: converting audio to air flow information for speech perception enhancement
Donald Derrick, Greg A. O'Beirne, Tom De Rybel, Jennifer Hay
Relative importance of AM and FM cues for speech comprehension: effects of speaking rate and their implications for neurophysiological processing of speech
Guangting Mai
The effect of regional and non-native accents on word recognition processes: a comparison of EEG responses in quiet to speech recognition in noise
Louise Stringer, Paul Iverson
Towards a neural measure of perceptual distance — classification of electroencephalographic responses to synthetic vowels
Manson C.-M. Fong, James W. Minett, Thierry Blu, William S.-Y. Wang
Collecting a corpus of Dutch noise-induced 'slips of the ear'
Odette Scharenborg, Eric Sanders, Bert Cranen
Lexical modeling for Arabic ASR: a systematic approach
Tuka Al Hanai, James R. Glass
Hybrid language models for speech transcription
Luiza Orosanu, Denis Jouvet
Neural network language models for low resource languages
Ankur Gandhe, Florian Metze, Ian Lane
Feed forward pre-training for recurrent neural network language models
Siva Reddy Gangireddy, Fergus McInnes, Steve Renals
Grounding language models in spatiotemporal context
Brandon C. Roy, Soroush Vosoughi, Deb Roy
Direct word graph rescoring using A* search and RNNLM
Shahab Jalalvand, Daniele Falavigna
One billion word benchmark for measuring progress in statistical language modeling
Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, Tony Robinson
Integrating sequence information in the audio-visual detection of word prominence in a human-machine interaction scenario
Andrea Schnall, Martin Heckmann
Backoff inspired features for maximum entropy language models
Fadi Biadsy, Keith Hall, Pedro J. Moreno, Brian Roark
BioKIT — real-time decoder for biosignal processing
Dominic Telaar, Michael Wand, Dirk Gehrig, Felix Putze, Christoph Amma, Dominic Heger, Ngoc Thang Vu, Mark Erhardt, Tim Schlippe, Matthias Janke, Christian Herff, Tanja Schultz
Speech recognition without a lexicon — bridging the gap between graphemic and phonetic systems
David Harwath, James R. Glass
A new auxiliary-vector algorithm with conjugate orthogonality for speech enhancement
Shengkui Zhao, Douglas L. Jones
Acoustic characteristics of critical message utterances in noise applied to speech intelligibility enhancement
Neehar Jathar, Preeti Rao
Dynamic noise aware training for speech enhancement based on deep neural networks
Yong Xu, Jun Du, Li-Rong Dai, Chin-Hui Lee
Microphone array post-filtering using supervised machine learning for speech enhancement
Pasi Pertilä, Joonas Nikunen
Novel speech duration modifier for packet based communication system
Senthil Kumar Mani, Jitendra Kumar Dhiman, K. Sri Rama Murty
Experiments on deep learning for speech denoising
Ding Liu, Paris Smaragdis, Minje Kim
Single-channel dynamic exemplar-based speech enhancement
Nasser Mohammadiha, Simon Doclo
Using hidden Markov models for speech enhancement
Akihiro Kato, Ben Milner
Blind source extraction based on a direction-dependent a-priori SNR
Lukas Pfeifenberger, Franz Pernkopf
Least squares phase estimation of mixed signals
Carlos Eduardo Cancino Chacón, Pejman Mowlaee
Speech enhancement from additive noise and channel distortion — a corpus-based approach
Ji Ming, Danny Crookes
Multi-channel speech enhancement using sparse coding on local time-frequency structures
Zhiyuan Zhou, Zhaogui Ding, Weifeng Li, Zhiyong Wu, Longbiao Wang, Qingmin Liao
Multichannel speech dereverberation based on convolutive nonnegative tensor factorization for ASR applications
Seyedmahdad Mirsamadi, John H. L. Hansen
Speech enhancement by low-rank and convolutive dictionary spectrogram decomposition
Zhuo Chen, Brian McFee, Daniel P. W. Ellis
Multiple-order non-negative matrix factorization for speech enhancement
Xabier Jaureguiberry, Emmanuel Vincent, Gaël Richard
NMF-based speech enhancement incorporating deep neural network
Tae Gyoon Kang, Kisoo Kwon, Jong Won Shin, Nam Soo Kim
A data-driven approach to speech enhancement using Gaussian process
Sukanya Sonowal, Kisoo Kwon, Nam Soo Kim, Jong Won Shin
Decorrelated innovative codebooks for ACELP using factorization of autocorrelation matrix
Tom Bäckström, Christian R. Helmrich
Stress and accent transmission in HMM-based syllable-context very low bit rate speech coding
Milos Cernak, Alexandros Lazaridis, Philip N. Garner, Petr Motlicek
Subjective voice quality evaluation of artificial bandwidth extension: comparing different audio bandwidths and speech codecs
Hannu Pulakka, Anssi Rämö, Ville Myllylä, Henri Toukomaa, Paavo Alku
Stereo acoustic echo suppression using widely linear filtering in the frequency domain
Zhong-Hua Fu, Lei Xie
Enhanced muting method in packet loss concealment of ITU-T G.722 using sigmoid function with on-line optimized parameters
Bong-Ki Lee, Inyoung Hwang, Jihwan Park, Joon-Hyuk Chang
A robust step-size control algorithm for frequency domain acoustic echo cancellation
Chao Wu, Kaiyu Jiang, Yanmeng Guo, Qiang Fu, Yonghong Yan
Error correction of automatic speech recognition based on normalized web distance
E. Byambakhishig, K. Tanaka, Ryo Aihara, Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki
Unsupervised training methods for discriminative language modeling
Erinç Dikici, Murat Saraçlar
Building a vocabulary self-learning speech recognition system
Long Qin, Alexander I. Rudnicky
Methods for efficient semi-automatic pronunciation dictionary bootstrapping
Tim Schlippe, Matthias Merz, Tanja Schultz
Rapidly building domain-specific entity-centric language models using semantic web knowledge sources
Murat Akbacak, Dilek Hakkani-Tür, Gokhan Tur
Context-dependent pronunciation error pattern discovery with limited annotations
Ann Lee, James R. Glass
Detecting speaker roles and topic changes in multiparty conversations using latent topic models
Ashtosh Sapru, Hervé Bourlard
A deep neural network approach for sentence boundary detection in broadcast news
Chenglin Xu, Lei Xie, Guangpu Huang, Xiong Xiao, Eng Siong Chng, Haizhou Li
Variable span disfluency detection in ASR transcripts
Rahul Gupta, Sankaranarayanan Ananthakrishnan, Zhaojun Yang, Shrikanth S. Narayanan
A CRF-based approach to automatic disfluency detection in a French call-centre corpus
Camille Dutrey, Chloé Clavel, Sophie Rosset, Ioana Vasilescu, Martine Adda-Decker
Multi-pass sentence-end detection of lecture speech
Madina Hasan, Rama Doddipatla, Thomas Hain
Multi-domain disfluency and repair detection
Victoria Zayats, Mari Ostendorf, Hannaneh Hajishirzi
Task-aware deep bottleneck features for spoken language identification
Bing Jiang, Yan Song, Si Wei, Ian Vince McLoughlin, Li-Rong Dai
Virtual example for phonotactic language recognition
Rong Tong, Bin Ma, Haizhou Li
Phonotactic language recognition based on time-gap-weighted lattice kernels
Wei-Wei Liu, Wei-Qiang Zhang, Jia Liu
UBM fused total variability modeling for language identification
Maarten Van Segbroeck, Ruchir Travadi, Shrikanth S. Narayanan
On the complementarity of short-time Fourier analysis windows of different lengths for improved language recognition
Mireia Diez, Mikel Penagarikano, German Bordel, Amparo Varona, Luis Javier Rodriguez-Fuentes
Modified-prior i-vector estimation for language identification of short duration utterances
Ruchir Travadi, Maarten Van Segbroeck, Shrikanth S. Narayanan
Language recognition using phonotactic-based shifted delta coefficients and multiple phone recognizers
Luis Fernando D'Haro, Ricardo Cordoba, Christian Salamea, Javier Ferreiros
PLLR features in language recognition system for RATS
Oldřich Plchot, Mireia Diez, Mehdi Soufifar, Lukáš Burget
Language identification of code switching sentences and multilingual sentences of under-resourced languages by using multi structural word information
Yin-Lai Yeong, Tien-Ping Tan