doi: 10.21437/Interspeech.2009
ISSN: 2958-1796
Selected topics from 40 years of research on speech and speaker recognition
Sadaoki Furui
Connecting human and machine learning via probabilistic models of cognition
Thomas L. Griffiths
New horizons in the study of child language acquisition
Deb Roy
Transcribing human-directed speech for spoken language processing
Mari Ostendorf
Feature extraction for robust speech recognition using a power-law nonlinearity and power-bias subtraction
Chanwoo Kim, Richard M. Stern
Towards fusion of feature extraction and acoustic model training: a top down process for robust speech recognition
Yu-Hsiang Bosco Chiu, Bhiksha Raj, Richard M. Stern
Temporal modulation processing of speech signals for noise robust ASR
Hong You, Abeer Alwan
Progressive memory-based parametric non-linear feature equalization
Luz Garcia, Roberto Gemello, Franco Mana, Jose Carlos Segura
Dynamic features in the linear domain for robust automatic speech recognition in a reverberant environment
Osamu Ichikawa, Takashi Fukuda, Ryuki Tachibana, Masafumi Nishimura
Local projections and support vector based feature selection in speech recognition
Antonio Miguel, Alfonso Ortega, L. Buera, Eduardo Lleida
Feedforward control of a 3D physiological articulatory model for vowel production
Qiang Fang, Akikazu Nishikido, Jianwu Dang, Aijun Li
Articulatory modeling based on semi-polar coordinates and guided PCA technique
Jun Cai, Yves Laprie, Julie Busset, Fabrice Hirsch
Sequencing of articulatory gestures using cost optimization
Juraj Simko, Fred Cummins
From experiments to articulatory motion - a three dimensional talking head model
Xiao Bo Lu, William Thorpe, Kylie Foster, Peter Hunter
Towards robust glottal source modeling
Javier Pérez, Antonio Bonafonte
Sliding vocal-tract model and its application for vowel production
Takayuki Arai
Minimum hypothesis phone error as a decoding method for speech recognition
Haihua Xu, Daniel Povey, Jie Zhu, Guanyong Wu
Posterior-based out of vocabulary word detection in telephone speech
Stefan Kombrink, Lukáš Burget, Pavel Matějka, Martin Karafiát, Hynek Hermansky
Automatic transcription system for meetings of the Japanese national congress
Yuya Akita, Masato Mimura, Tatsuya Kawahara
Cross-language bootstrapping for unsupervised acoustic model training: rapid development of a Polish speech recognition system
Jonas Lööf, Christian Gollan, Hermann Ney
Porting an European Portuguese broadcast news recognition system to Brazilian Portuguese
Alberto Abad, Isabel Trancoso, Nelson Neto, M. Céu Viana
Modeling northern and southern varieties of Dutch for STT
Julien Despres, Petr Fousek, Jean-Luc Gauvain, Sandrine Gay, Yvan Josse, Lori Lamel, Abdel Messaoudi
Nearly perfect detection of continuous F0 contour and frame classification for TTS synthesis
Thomas Ewender, Sarah Hoffmann, Beat Pfister
AM-FM estimation for speech based on a time-varying sinusoidal model
Yannis Pantazis, Olivier Rosec, Yannis Stylianou
Voice source waveform analysis and synthesis using principal component analysis and Gaussian mixture modelling
Jon Gudnason, Mark R. P. Thomas, Patrick A. Naylor, Dan P. W. Ellis
Model-based estimation of instantaneous pitch in noisy speech
Jung Ook Hong, Patrick J. Wolfe
Complex cepstrum-based decomposition of speech for glottal source estimation
Thomas Drugman, Baris Bozkurt, Thierry Dutoit
Approximate intrinsic Fourier analysis of speech
Frank Tompkins, Patrick J. Wolfe
Spectral and temporal modulation features for phonetic recognition
Stephen A. Zahorian, Hongbing Hu, Zhengqing Chen, Jiang Wu
Use of harmonic phase information for polarity detection in speech signals
Ibon Saratxaga, Daniel Erro, Inmaculada Hernáez, Iñaki Sainz, Eva Navas
Finite mixture spectrogram modeling for multipitch tracking using a factorial hidden Markov model
Michael Wohlmayr, Franz Pernkopf
Group-delay-deviation based spectral analysis of speech
Anthony Stark, Kuldip Paliwal
Speaker dependent mapping for low bit rate coding of throat microphone speech
Joseph M. Anand, B. Yegnanarayana, Sanjeev Gupta, M. R. Kesheorey
Analysis of Lombard speech using excitation source information
G. Bapineedu, B. Avinash, Suryakanth V. Gangashetty, B. Yegnanarayana
A comparison of linear and nonlinear dimensionality reduction methods applied to synthetic speech
Andrew Errity, John McKenna
ZZT-domain immiscibility of the opening and closing phases of the LF GFM under frame length variations
C. F. Pedersen, O. Andersen, P. Dalsgaard
Dimension reducing of LSF parameters based on radial basis function neural network
Hongjun Sun, Jianhua Tao, Huibin Jia
Characterizing speaker variability using spectral envelopes of vowel sounds
A. N. Harish, D. R. Sanand, S. Umesh
Analysis of band structures for speaker-specific information in FM feature extraction
Tharmarajah Thiruvaran, Eliathamby Ambikairajah, Julien Epps
Artificial nasalization of speech sounds based on pole-zero models of spectral relations between mouth and nose signals
Karl Schnell, Arild Lacroix
Error metrics for impaired auditory nerve responses of different phoneme groups
Andrew Hines, Naomi Harte
Model-based automatic evaluation of L2 learner's English timing
Chatchawarn Hansakunbuntheung, Hiroaki Kato, Yoshinori Sagisaka
A Bayesian approach to non-intrusive quality assessment of speech
Petko N. Petkov, Iman S. Mossavat, W. Bastiaan Kleijn
Precision of phoneme boundaries derived using hidden Markov models
Ladan Baghai-Ravary, Greg Kochanski, John Coleman
A novel method for epoch extraction from speech signals
Lakshmish Kaushik, Douglas O'Shaughnessy
LS regularization of group delay features for speaker recognition
Jia Min Karen Kua, Julien Epps, Eliathamby Ambikairajah, Eric Choi
Glottal closure and opening instant detection from speech signals
Thomas Drugman, Thierry Dutoit
Relative importance of formant and whole-spectral cues for vowel perception
Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano
Influences of vowel duration on speaker-size estimation and discrimination
Chihiro Takeshima, Minoru Tsuzaki, Toshio Irino
High front vowels in Czech: a contrast in quantity or quality?
Václav Jonáš Podlipský, Radek Skarnitzl, Jan Volín
Effect of contralateral noise on energetic and informational masking on speech-in-speech intelligibility
Marjorie Dole, Michel Hoen, Fanny Meunier
Using location cues to track speaker changes from mobile, binaural microphones
Heidi Christensen, Jon Barker
A perceptual investigation of speech transcription errors involving frequent near-homophones in French and American English
Ioana Vasilescu, Martine Adda-Decker, Lori Lamel, Pierre Hallé
The role of glottal pulse rate and vocal tract length in the perception of speaker identity
Etienne Gaudrain, Su Li, Vin Shen Ban, Roy D. Patterson
Development of voicing categorization in deaf children with cochlear implant
Victoria Medina, Willy Serniclaes
Processing liaison-initial words in native and non-native French: evidence from eye movements
Annie Tremblay
Estimating the potential of signal and interlocutor-track information for language modeling
Nigel G. Ward, Benjamin H. Walker
Effect of r-resonance information on intelligibility
Antje Heinrich, Sarah Hawkins
Perception of temporal cues at discourse boundaries
Hsin-Yi Lin, Janice Fon
Human audio-visual consonant recognition analyzed with three bimodal integration models
Zhanyu Ma, Arne Leijon
Effects of tempo in radio commercials on young and elderly listeners
Hanny den Ouden, Hugo Quené
Self-voice recognition in 4 to 5-year-old children
Sofia Strömbergsson
Are real tongue movements easier to speech read than synthesized?
Olov Engwall, Preben Wik
Eliciting a hierarchical structure of human consonant perception task errors using formal concept analysis
Carmen Peláez-Moreno, Ana I. García-Moral, Francisco J. Valverde-Albacete
Acoustic and perceptual effects of vocal training in amateur male singing
Takeshi Saitou, Masataka Goto
Factor analysis and SVM for language recognition
Florian Verdet, Driss Matrouf, Jean-François Bonastre, Jean Hennebert
Exploring universal attribute characterization of spoken languages for spoken language recognition
Sabato Marco Siniscalchi, Jeremy Reed, Torbjørn Svendsen, Chin-Hui Lee
On the use of phonological features for automatic accent analysis
Abhijeet Sangwan, John H. L. Hansen
Language recognition using language factors
Fabio Castaldo, Sandro Cumani, Pietro Laface, Daniele Colibro
Automatic accent detection: effect of base units and boundary information
Je Hun Jeon, Yang Liu
Age verification using a hybrid speech processing approach
Ron M. Hecht, Omer Hezroni, Amit Manna, Ruth Aloni-Lavi, Gil Dobry, Amir Alfandary, Yaniv Zigel
Information bottleneck based age verification
Ron M. Hecht, Omer Hezroni, Amit Manna, Gil Dobry, Yaniv Zigel, Naftali Tishby
Discriminative n-gram selection for dialect recognition
F. S. Richardson, W. M. Campbell, P. A. Torres-Carrasquillo
Data-driven phonetic comparison and conversion between South African, British and American English pronunciations
Linsen Loots, Thomas Niesler
Target-aware language models for spoken language recognition
Rong Tong, Bin Ma, Haizhou Li, Eng Siong Chng, Kong-Aik Lee
Language identification for speech-to-speech translation
Daniel Chung Yong Lim, Ian Lane
Using prosody and phonotactics in Arabic dialect identification
Fadi Biadsy, Julia Hirschberg
Refactoring acoustic models using variational expectation-maximization
Pierre L. Dognin, John R. Hershey, Vaibhava Goel, Peder A. Olsen
Investigations on convex optimization using log-linear HMMs for digit string recognition
Georg Heigold, David Rybach, Ralf Schlüter, Hermann Ney
Investigations on discriminative training in large scale acoustic model estimation
Janne Pylkkönen
Margin-space integration of MPE loss via differencing of MMI functionals for generalized error-weighted discriminative training
Erik McDermott, Shinji Watanabe, Atsushi Nakamura
Compacting discriminative feature space transforms for embedded devices
Etienne Marcheret, Jia-Yu Chen, Petr Fousek, Peder A. Olsen, Vaibhava Goel
A back-off discriminative acoustic model for automatic speech recognition
Hung-An Chang, James R. Glass
Efficient generation and use of MLP features for Arabic speech recognition
J. Park, F. Diehl, M. J. F. Gales, M. Tomalin, P. C. Woodland
A study of bootstrapping with multiple acoustic features for improved automatic speech recognition
Xiaodong Cui, Jian Xue, Bing Xiang, Bowen Zhou
Analysis of low-resource acoustic model self-training
Scott Novotney, Richard Schwartz
Log-linear model combination with word-dependent scaling factors
Björn Hoffmeister, Ruoying Liang, Ralf Schlüter, Hermann Ney
Enabling a user to specify an item at any time during system enumeration - item identification for barge-in-able conversational dialogue systems
Kyoko Matsuyama, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
System request detection in human conversation based on multi-resolution Gabor wavelet features
Tomoyuki Yamagata, Tetsuya Takiguchi, Yasuo Ariki
Using graphical models for mixed-initiative dialog management systems with realtime policies
Stefan Schwärzler, Stefan Maier, Joachim Schenk, Frank Wallhoff, Gerhard Rigoll
Conversation robot participating in and activating a group communication
Shinya Fujie, Yoichi Matsuyama, Hikaru Taniyama, Tetsunori Kobayashi
Recent advances in WFST-based dialog system
Chiori Hori, Kiyonori Ohtake, Teruhisa Misu, Hideki Kashioka, Satoshi Nakamura
A statistical dialog manager for the LUNA project
David Griol, Giuseppe Riccardi, Emilio Sanchis
A policy-switching learning approach for adaptive spoken dialogue agents
Heriberto Cuayáhuitl, Juventino Montiel-Hernández
Strategies for accelerating the design of dialogue applications using heuristic information from the backend database
L. F. D'Haro, R. Cordoba, R. San-Segundo, J. Macias-Guarasa, J. M. Pardo
Feature-based summary space for stochastic dialogue modeling with hierarchical semantic frames
Florian Pinault, Fabrice Lefèvre, Renato De Mori
Language modeling and dialog management for address recognition
Rajesh Balchandran, Leonid Rachevsky, Larry Sansone
A framework for rapid development of conversational natural language call routing systems for call centers
Ea-Ee Jan, Hong-Kwang Kuo, Osamuyimen Stewart, David Lubensky
The MonAMI reminder: a spoken dialogue system for face-to-face interaction
Jonas Beskow, Jens Edlund, Björn Granström, Joakim Gustafson, Gabriel Skantze, Helena Tobiasson
Influence of training on direct and indirect measures for the evaluation of multimodal systems
Julia Seebode, Stefan Schaffer, Ina Wechsung, Florian Metze
Talking heads for interacting with spoken dialog smart-home systems
Christine Kühnel, Benjamin Weiss, Sebastian Möller
Speech generation from hand gestures based on space mapping
Aki Kunikoshi, Yu Qiao, Nobuaki Minematsu, Keikichi Hirose
The INTERSPEECH 2009 emotion challenge
Björn Schuller, Stefan Steidl, Anton Batliner
GTM-URL contribution to the INTERSPEECH 2009 emotion challenge
Santiago Planet, Ignasi Iriondo, Joan Claudi Socoró, Carlos Monzo, Jordi Adell
Emotion recognition using a hierarchical binary decision tree approach
Chi-Chun Lee, Emily Mower, Carlos Busso, Sungbok Lee, Shrikanth S. Narayanan
Improving automatic emotion recognition from speech signals
Elif Bozkurt, Engin Erzin, Çiǧdem Eroǧlu Erdem, A. Tanju Erdem
Exploring the benefits of discretization of acoustic features for speech emotion recognition
Thurid Vogt, Elisabeth André
Combining spectral and prosodic information for emotion recognition in the INTERSPEECH 2009 emotion challenge
Iker Luengo, Eva Navas, Inmaculada Hernáez
Acoustic emotion recognition using dynamic Bayesian networks and multi-space distributions
R. Barra-Chicote, Fernando Fernández, S. Lutfi, Juan Manuel Lucas-Cuesta, J. Macias-Guarasa, J. M. Montero, R. San-Segundo, J. M. Pardo
Emotion classification in children's speech using fusion of acoustic and linguistic features
Tim Polzehl, Shiva Sundaram, Hamed Ketabdar, Michael Wagner, Florian Metze
Cepstral and long-term features for emotion recognition
Pierre Dumouchel, Najim Dehak, Yazid Attabi, Réda Dehak, Narjès Boufaden
Brno University of Technology system for Interspeech 2009 emotion challenge
Marcel Kockmann, Lukáš Burget, Jan Černocký
Back-off language model compression
Boulos Harb, Ciprian Chelba, Jeffrey Dean, Sanjay Ghemawat
Improving broadcast news transcription with a precision grammar and discriminative reranking
Tobias Kaufmann, Thomas Ewender, Beat Pfister
Use of contexts in language model interpolation and adaptation
X. Liu, M. J. F. Gales, P. C. Woodland
Exploiting Chinese character models to improve speech recognition performance
J. L. Hieronymus, X. Liu, M. J. F. Gales, P. C. Woodland
Constraint selection for topic-based MDI adaptation of language models
Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot
Nonstationary latent Dirichlet allocation for speech recognition
Chuang-Hua Chueh, Jen-Tzung Chien
Multiple text segmentation for statistical language modeling
Sopheap Seng, Laurent Besacier, Brigitte Bigi, Eric Castelli
Measuring tagging performance of a joint language model
Denis Filimonov, Mary Harper
Improved language modelling using bag of word pairs
Langzhou Chen, K. K. Chin, Kate Knill
Morphological analysis and decomposition for Arabic speech-to-text systems
F. Diehl, M. J. F. Gales, M. Tomalin, P. C. Woodland
Investigating the use of morphological decomposition and diacritization for improving Arabic LVCSR
Amr El-Desoky, Christian Gollan, David Rybach, Ralf Schlüter, Hermann Ney
Topic dependent language model based on topic voting on noun history
Welly Naptali, Masatoshi Tsuchiya, Seiichi Nakagawa
Investigation of morph-based speech recognition improvements across speech genres
Péter Mihajlik, Balázs Tarján, Zoltán Tüske, Tibor Fegyó
Effective use of pause information in language modelling for speech recognition
Kengo Ohta, Masatoshi Tsuchiya, Seiichi Nakagawa
A parallel training algorithm for hierarchical Pitman-Yor process language models
Songfang Huang, Steve Renals
Probabilistic and possibilistic language models based on the world wide web
Stanislas Oger, Vladimir Popescu, Georges Linarès
Categorical perception of speech without stimulus repetition
Jack C. Rogers, Matthew H. Davis
Non-automaticity of use of orthographic knowledge in phoneme evaluation
Anne Cutler, Chris Davis, Jeesun Kim
Learning and generalization of novel contrastive cues
Meghan Sumner
Vowel category perception affected by microdurational variations
Einar Meister, Stefan Werner
Perceptual grouping of alternating word pairs: effect of pitch difference and presentation rate
Nandini Iyer, Douglas S. Brungart, Brian D. Simpson
Comparing methods to find a best exemplar in a multidimensional space
Titia Benders, Paul Boersma
Autoregressive HMMs for speech synthesis
Matt Shannon, William Byrne
Asynchronous F0 and spectrum modeling for HMM-based speech synthesis
Cheng-Cheng Wang, Zhen-Hua Ling, Li-Rong Dai
A minimum V/U error approach to F0 generation in HMM-based TTS
Yao Qian, Frank K. Soong, Miaomiao Wang, Zhizheng Wu
Voiced/unvoiced decision algorithm for HMM-based speech synthesis
Shiyin Kang, Zhiwei Shuang, Quansheng Duan, Yong Qin, Lianhong Cai
Local minimum generation error criterion for hybrid HMM speech synthesis
Xavi Gonzalvo, Alexander Gutkin, Joan Claudi Socoró, Ignasi Iriondo, Paul Taylor
Thousands of voices for HMM-based speech synthesis
Junichi Yamagishi, Bela Usabaev, Simon King, Oliver Watts, John Dines, Jilei Tian, Rile Hu, Yong Guan, Keiichiro Oura, Keiichi Tokuda, Reima Karhila, Mikko Kurimo
A Bayesian approach to hidden semi-Markov model based speech synthesis
Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
Rich context modeling for high quality HMM-based TTS
Zhi-Jie Yan, Yao Qian, Frank K. Soong
Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems
Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
The HMM synthesis algorithm of an embedded unified speech recognizer and synthesizer
Guntram Strecha, Matthias Wolff, Frank Duckhorn, Sören Wittenberg, Constanze Tschöpe
Syllable HMM based Mandarin TTS and comparison with concatenative TTS
Zhiwei Shuang, Shiyin Kang, Qin Shi, Yong Qin, Lianhong Cai
Pulse density representation of spectrum for statistical speech processing
Yoshinori Shiga
Parameterization of vocal fry in HMM-based speech synthesis
Hanna Silén, Elina Helander, Jani Nurminen, Moncef Gabbouj
A deterministic plus stochastic model of the residual signal for improved parametric speech synthesis
Thomas Drugman, Geoffrey Wilfart, Thierry Dutoit
A decision tree-based clustering approach to state definition in an excitation modeling framework for HMM-based speech synthesis
Ranniery Maia, Tomoki Toda, Keiichi Tokuda, Shinsuke Sakai, Satoshi Nakamura
An improved minimum generation error based model adaptation for HMM-based speech synthesis
Yi-Jian Wu, Long Qin, Keiichi Tokuda
Two-pass decision tree construction for unsupervised adaptation of HMM-based synthesis models
Matthew Gibson
Speaker adaptation using a parallel phone set pronunciation dictionary for Thai-English bilingual TTS
Anocha Rugchatjaroen, Nattanun Thatphithakkul, Ananlada Chotimongkol, Ausdang Thangthai, Chai Wutiwiwatchai
HMM-based automatic eye-blink synthesis from speech
Michal Dziemianko, Gregor Hofer, Hiroshi Shimodaira
Efficient combination of confidence measures for machine translation
Sylvain Raybaud, David Langlois, Kamel Smaïli
Incremental dialog clustering for speech-to-speech translation
David Stallard, Stavros Tsakalidis, Shirin Saleem
Iterative sentence-pair extraction from quasi-parallel corpora for machine translation
R. Sarikaya, Sameer Maskey, R. Zhang, Ea-Ee Jan, D. Wang, Bhuvana Ramabhadran, S. Roukos
RTTS: towards enterprise-level real-time speech transcription and translation services
Juan M. Huerta, Cheng Wu, Andrej Sakrajda, Sasha Caskey, Ea-Ee Jan, Alexander Faisman, Shai Ben-David, Wen Liu, Antonio Lee, Osamuyimen Stewart, Michael Frissora, David Lubensky
Using syntax in large-scale audio document translation
Jing Zheng, Necip Fazil Ayan, Wen Wang, David Burkett
Context-driven automatic bilingual movie subtitle alignment
Andreas Tsiartas, Prasanta Kumar Ghosh, Panayiotis G. Georgiou, Shrikanth S. Narayanan
Probabilistic effects on French [t] duration
Francisco Torreira, Mirjam Ernestus
On the production of sandhi phenomena in French: psycholinguistic and acoustic data
Odile Bagou, Violaine Michel, Marina Laganaro
Extreme reductions: contraction of disyllables into monosyllables in Taiwan Mandarin
Chierh Cheng, Yi Xu
Annotation and features of non-native Mandarin tone quality
Mitchell Peabody, Stephanie Seneff
On-line formant shifting as a function of F0
Kateřina Chládková, Paul Boersma, Václav Jonáš Podlipský
Production boundary between fricative and affricate in Japanese and Korean speakers
Kimiko Yamakawa, Shigeaki Amano, Shuichi Itahashi
Aerodynamics of fricative production in European Portuguese
Cátia M. R. Pinho, Luis M. T. Jesus, Anna Barney
Contextual effects on protrusion and lip opening for /i,y/
Anne Bonneau, Julie Buquet, Brigitte Wrobel-Dautcourt
Speech rate effects on European Portuguese nasal vowels
Catarina Oliveira, Paula Martins, António Teixeira
Relation of formants and subglottal resonances in Hungarian vowels
Tamás Gábor Csapó, Zsuzsanna Bárkányi, Tekla Etelka Gráczi, Tamás Bőhm, Steven M. Lulich
Simple physical models of the vocal tract for education in speech science
Takayuki Arai
Auto-meshing algorithm for acoustic analysis of vocal tract
Kyohei Hayashi, Nobuhiro Miki
Voice production model employing an interactive boundary-layer analysis of glottal flow
Tokihiko Kaburagi, Katsunori Daimo, Shogo Nakamura
Characteristics of two-dimensional finite difference techniques for vocal tract analysis and voice synthesis
Matt Speed, Damian Murphy, David M. Howard
Adaptation of a predictive model of tongue shapes
Chao Qin, Miguel Á. Carreira-Perpiñán
Using sensor orientation information for computational head stabilisation in 3D electromagnetic articulography (EMA)
Christian Kroos
Collision threshold pressure before and after vocal loading
Laura Enflo, Johan Sundberg, Friedemann Pabst
Gender differences in the realization of vowel-initial glottalization
Elke Philburn
Stability and composition of functional synergies for speech movements in children and adults
Hayo Terband, Frits van Brenk, Pascal van Lieshout, Lian Nijland, Ben Maassen
An analysis of speech rate strategies in aging
Frits van Brenk, Hayo Terband, Pascal van Lieshout, Anja Lowit, Ben Maassen
Variability and stability in collaborative dialogues: turn-taking and filled pauses
Štefan Beňuš
Speaking in the presence of a competing talker
Youyi Lu, Martin Cooke
Polyglot speech prosody control
Harald Romsdorfer
Weighted neural network ensemble models for speech prosody control
Harald Romsdorfer
Cross-language F0 modeling for under-resourced tonal languages: a case study on Thai-Mandarin
Vataya Boonpiam, Anocha Rugchatjaroen, Chai Wutiwiwatchai
Prosodic issues in synthesising Thadou, a Tibeto-Burman tone language
Dafydd Gibbon, Pramod Pandey, D. Mary Kim Haokip, Jolanta Bachan
Advanced unsupervised joint prosody labeling and modeling for Mandarin speech and its application to prosody generation for TTS
Chen-Yu Chiang, Sin-Horng Chen, Yih-Ru Wang
Optimization of T-Tilt F0 modeling
Ausdang Thangthai, Anocha Rugchatjaroen, Nattanun Thatphithakkul, Ananlada Chotimongkol, Chai Wutiwiwatchai
A multi-level context-dependent prosodic model applied to durational modeling
Nicolas Obin, Xavier Rodet, Anne Lacheret-Dujour
Sentiment classification in English from sentence-level annotations of emotions regarding models of affect
Alexandre Trilla, Francesc Alías
Identification of contrast and its emphatic realization in HMM based speech synthesis
Leonardo Badino, J. Sebastian Andersson, Junichi Yamagishi, Robert A. J. Clark
How to improve TTS systems for emotional expressivity
Antonio Rui Ferreira Rebordao, Mostafa Al Masum Shaikh, Keikichi Hirose, Nobuaki Minematsu
State mapping based method for cross-lingual speaker adaptation in HMM-based speech synthesis
Yi-Jian Wu, Yoshihiko Nankaku, Keiichi Tokuda
Real voice and TTS accent effects on intelligibility and comprehension for Indian speakers of English as a second language
Frederick Weber, Kalika Bali
Improving consistence of phonetic transcription for text-to-speech
Pablo Daniel Agüero, Antonio Bonafonte, Juan Carlos Tulli
On the development of matched and mismatched Italian children's speech recognition systems
Piero Cosi
Combination of acoustic and lexical speaker adaptation for disordered speech recognition
Oscar Saz, Eduardo Lleida, Antonio Miguel
Bilinear transformation space-based maximum likelihood linear regression frameworks
Hwa Jeon Song, Yongwon Jeong, Hyung Soon Kim
Speaking style adaptation for spontaneous speech recognition using multiple-regression HMM
Yusuke Ijima, Takeshi Matsubara, Takashi Nose, Takao Kobayashi
Acoustic class specific VTLN-warping using regression class trees
S. P. Rath, S. Umesh
Speaker normalization for template based speech recognition
Sébastien Demange, Dirk Van Compernolle
Improving the robustness with multiple sets of HMMs
Hans-Günter Hirsch, Andreas Kitzig
On the use of pitch normalization for improving children's speech recognition
Rohit Sinha, Shweta Ghai
Using VTLN matrices for rapid and computationally-efficient speaker adaptation with robustness to first-pass transcription errors
S. P. Rath, S. Umesh, A. K. Sarkar
Speaker adaptation based on two-step active learning
Koichi Shinoda, Hiroko Murakami, Sadaoki Furui
Tree-based estimation of speaker characteristics for speech recognition
Mats Blomberg, Daniel Elenius
A study on the influence of covariance adaptation on Jacobian compensation in vocal tract length normalization
D. R. Sanand, S. P. Rath, S. Umesh
On the estimation and the use of confusion-matrices for improving ASR accuracy
Omar Caballero Morales, Stephen J. Cox
A study on soft margin estimation of linear regression parameters for speaker adaptation
Shigeki Matsuda, Yu Tsao, Jinyu Li, Satoshi Nakamura, Chin-Hui Lee
Exploring the role of spectral smoothing in context of children's speech recognition
Shweta Ghai, Rohit Sinha
Unsupervised lattice-based acoustic model adaptation for speaker-dependent conversational telephone speech transcription
K. Thambiratnam, F. Seide
Rapid unsupervised adaptation using frame independent output probabilities of gender and context independent phoneme models
Satoshi Kobashikawa, Atsunori Ogawa, Yoshikazu Yamaguchi, Satoshi Takahashi
Bark-shift based nonlinear speaker normalization using the second subglottal resonance
Shizhen Wang, Yi-Hui Lee, Abeer Alwan
Designing spoken tutorial dialogue with children to elicit predictable but educationally valuable responses
Gregory Aist, Jack Mostow
Optimizing non-native speech recognition for CALL applications
Joost van Doremalen, Helmer Strik, Catia Cucchiarini
Evaluation of English intonation based on combination of multiple evaluation scores
Akinori Ito, Tomoaki Konno, Masashi Ito, Shozo Makino
A language-independent feature set for the automatic evaluation of prosody
Andreas Maier, F. Hönig, V. Zeissler, Anton Batliner, E. Körner, N. Yamanaka, P. Ackermann, Elmar Nöth
Adapting the acoustic model of a speech recognizer for varied proficiency non-native spontaneous speech using read speech with language-specific pronunciation difficulty
Klaus Zechner, Derrick Higgins, René Lawless, Yoko Futagi, Sarah Ohls, George Ivanov
Analysis and utilization of MLLR speaker adaptation technique for learners' pronunciation evaluation
Dean Luo, Yu Qiao, Nobuaki Minematsu, Yutaka Yamauchi, Keikichi Hirose
Control of human generating force by use of acoustic information - study on onomatopoeic utterances for controlling small lifting-force
Miki Iimura, Taichi Sato, Kihachiro Tanaka
Mi-DJ: a multi-source intelligent DJ service
Ching-Hsien Lee, Hsu-Chih Wu
Human voice or prompt generation? Can they co-exist in an application?
Géza Németh, Csaba Zainkó, Mátyás Bartalis, Gábor Olaszy, Géza Kiss
Automatic vs. human question answering over multimedia meeting recordings
Quoc Anh Le, Andrei Popescu-Belis
Characterizing silent and pseudo-silent speech using radar-like sensors
John F. Holzrichter
Technologies for processing body-conducted speech detected with non-audible murmur microphone
Tomoki Toda, Keigo Nakamura, Takayuki Nagai, Tomomi Kaino, Yoshitaka Nakajima, Kiyohiro Shikano
Artificial speech synthesizer control by brain-computer interface
Jonathan S. Brumberg, Philip R. Kennedy, Frank H. Guenther
Visuo-phonetic decoding using multi-stream and context-dependent models for an ultrasound-based silent speech interface
Thomas Hueber, Elie-Laurent Benaroya, Gérard Chollet, Bruce Denby, Gérard Dreyfus, Maureen Stone
Disordered speech recognition using acoustic and sEMG signals
Yunbin Deng, Rupal Patel, James T. Heaton, Glen Colby, L. Donald Gilmore, Joao Cabrera, Serge H. Roy, Carlo J. De Luca, Geoffrey S. Meltzner
Impact of different speaking modes on EMG-based speech recognition
Michael Wand, Szu-Chen Stan Jou, Arthur R. Toth, Tanja Schultz
Synthesizing speech from electromyography using voice transformation techniques
Arthur R. Toth, Michael Wand, Tanja Schultz
Multimodal HMM-based NAM-to-speech conversion
Viet-Anh Tran, Gérard Bailly, Hélène Lœvenbruck, Tomoki Toda
On the semi-supervised learning of multi-layered perceptrons
Jonathan Malkin, Amarnag Subramanya, Jeff Bilmes
Generalized discriminative feature transformation for speech recognition
Roger Hsiao, Tanja Schultz
A fast online algorithm for large margin training of continuous density hidden Markov models
Chih-Chieh Cheng, Fei Sha, Lawrence K. Saul
Maximum mutual information estimation via second order cone programming for large vocabulary continuous speech recognition
Dalei Wu, Baojie Li, Hui Jiang
Hidden conditional random field with distribution constraints for phone classification
Dong Yu, Li Deng, Alex Acero
Deterministic annealing based training algorithm for Bayesian speech recognition
Sayaka Shiota, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
Connecting rhythm and prominence in automatic ESL pronunciation scoring
Emily Nava, Joseph Tepperman, Louis Goldstein, Maria Luisa Zubizarreta, Shrikanth S. Narayanan
Evaluating parameters for mapping adult vowels to imitative babbling
Ilana Heintz, Mary Beckman, Eric Fosler-Lussier, Lucie Ménard
Intonation of Japanese sentences spoken by English speakers
Chiharu Tsurutani
KLAIR: a virtual infant for spoken language acquisition research
Mark Huckvale, Ian S. Howard, Sascha Fagel
An articulatory analysis of phonological transfer using real-time MRI
Joseph Tepperman, Erik Bresch, Yoon-Chul Kim, Sungbok Lee, Louis Goldstein, Shrikanth S. Narayanan
Do multiple caregivers speed up language acquisition?
L. ten Bosch, Okko Johannes Räsänen, Joris Driesen, Guillaume Aimetti, Toomas Altosaar, Lou Boves, A. Corns
Grapheme to phoneme conversion using an SMT system
Antoine Laurent, Paul Deléglise, Sylvain Meignier
Lexical and phonetic modeling for Arabic automatic speech recognition
Long Nguyen, Tim Ng, Kham Nguyen, Rabih Zbib, John Makhoul
Assessing context and learning for isiZulu tone recognition
Gina-Anne Levow
A sequential minimization algorithm for finite-state pronunciation lexicon models
Simon Dobrišek, Boštjan Vesnicer, France Mihelič
A general-purpose 32 ms prosodic vector for hidden Markov modeling
Kornel Laskowski, Mattias Heldner, Jens Edlund
Vocabulary expansion through automatic abbreviation generation for Chinese voice search
Dong Yang, Yi-cheng Pan, Sadaoki Furui
Perceptual cost function for cross-fading based concatenation
Qi Miao, Alexander Kain, Jan P. H. van Santen
Exploring automatic similarity measures for unit selection tuning
Daniel Tihelka, Jan Romportl
Towards intonation control in unit selection speech synthesis
Cédric Boidin, Olivier Boeffard, Thierry Moudenc, Géraldine Damnati
A novel approach to cost weighting in unit selection TTS
Jerome R. Bellegarda
Maximum likelihood unit selection for corpus-based speech synthesis
Abubeker Gamboa Rosales, Hamurabi Gamboa Rosales, Ruediger Hoffmann
A close look into the probabilistic concatenation model for corpus-based speech synthesis
Shinsuke Sakai, Ranniery Maia, Hisashi Kawai, Satoshi Nakamura
Wavelet-based speaker change detection in single channel speech data
Michael Wiesenegger, Franz Pernkopf
An adaptive threshold computation for unsupervised speaker segmentation
Laura Docio-Fernandez, Paula Lopez-Otero, Carmen Garcia-Mateo
A data-driven approach for estimating the time-frequency binary mask
Gibak Kim, Philipos C. Loizou
A semi-supervised version of heteroscedastic linear discriminant analysis
Haolang Zhou, Damianos Karakos, Andreas G. Andreou
Self-learning vector quantization for pattern discovery from speech
Okko Johannes Räsänen, Unto Kalervo Laine, Toomas Altosaar
Monaural segregation of voiced speech using discriminative random fields
Rohit Prabhavalkar, Zhaozhang Jin, Eric Fosler-Lussier
Advancements in whisper-island detection within normally phonated audio streams
Chi Zhang, John H. L. Hansen
Joint segmentation and classification of dialog acts using conditional random fields
Matthias Zimmermann
Exploring complex vowels as phrase break correlates in a corpus of English speech with ProPOSEL, a prosody and POS English lexicon
Claire Brierley, Eric Atwell
Automatic topic detection of recorded voice messages
Caroline Clemens, Stefan Feldes, Karlheinz Schuhmacher, Joachim Stegmann
Identification and automatic detection of parasitic speech sounds
Jindřich Matoušek, Radek Skarnitzl, Pavel Machač, Jan Trmal
Phonetic alignment for speech synthesis in under-resourced languages
D. R. van Niekerk, Etienne Barnard
Improving initial boundary estimation for HMM-based automatic phonetic segmentation
Kalu U. Ogbureke, Julie Carson-Berndsen
Importance of nasality measures for speaker recognition data selection and performance prediction
Howard Lei, Eduardo Lopez-Gonzalo
Exploration of vocal excitation modulation features for speaker recognition
Ning Wang, P. C. Ching, Tan Lee
Speaker identification for whispered speech using modified temporal patterns and MFCCs
Xing Fan, John H. L. Hansen
Speaker diarization for meeting room audio
Hanwu Sun, Tin Lay Nwe, Bin Ma, Haizhou Li
Improving speaker segmentation via speaker identification and text segmentation
Runxin Li, Tanja Schultz, Qin Jin
Overall performance metrics for multi-condition speaker recognition evaluations
David A. van Leeuwen
Speaker identification using warped MVDR cepstral features
Matthias Wölfel, Qian Yang, Qin Jin, Tanja Schultz
Entropy based overlapped speech detection as a pre-processing stage for speaker diarization
Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman
Speech style and speaker recognition: a case study
Marco Grimaldi, Fred Cummins
The majority wins: a method for combining speaker diarization systems
Marijn Huijbregts, David A. van Leeuwen, Franciska M. G. de Jong
Two-wire nuisance attribute projection
Yosef A. Solewicz, Hagai Aronowitz
Acoustic and high-speed digital imaging based analysis of pathological voice contributes to better understanding and differential diagnosis of neurological dysphonias and of mimicking phonatory disorders
Krzysztof Izdebski, Yuling Yan, Melda Kunduk
Normalized modulation spectral features for cross-database voice pathology detection
Maria Markaki, Yannis Stylianou
Speech sample salience analysis for speech cycle detection
C. Mertens, Francis Grenez, Jean Schoentgen
The use of telephone speech recordings for assessment and monitoring of cognitive function in elderly people
Viliam Rapcan, Shona D'Arcy, Nils Penard, Ian H. Robertson, Richard B. Reilly
Optimized feature set to assess acoustic perturbations in dysarthric speech
Sunil Nagaraja, Eduardo Castillo-Guerra
A microphone-independent visualization technique for speech disorders
Andreas Maier, Stefan Wenhardt, Tino Haderlein, Maria Schuster, Elmar Nöth
Evaluation of the effect of the GSM full rate codec on the automatic detection of laryngeal pathologies based on cepstral analysis
Rubén Fraile, Carmelo Sánchez, Juan I. Godino-Llorente, Nicolás Sáenz-Lechón, Víctor Osma-Ruiz, Juana M. Gutiérrez
Cepstral analysis of vocal dysperiodicities in disordered connected speech
A. Alpan, Jean Schoentgen, Y. Maryn, Francis Grenez, P. Murphy
Standard information from patients: the usefulness of self-evaluation (measured with the French version of the VHI)
Lise Crevier-Buchman, Stephanie Borel, Stéphane Hans, Madeleine Menard, Jacqueline Vaissiere
Intelligibility assessment in children with cleft lip and palate in Italian and German
Marcello Scipioni, Matteo Gerosa, Diego Giuliani, Elmar Nöth, Andreas Maier
Universidade de Aveiro's voice evaluation protocol
Luis M. T. Jesus, Anna Barney, Ricardo Santos, Janine Caetano, Juliana Jorge, Pedro Sá Couto
Fast speech recognition for voice destination entry in a car navigation system
Hoon Chung, JeonGue Park, HyeonBae Jeon, YunKeun Lee
Improving perceived accuracy for in-car media search
Yun-Cheng Ju, Michael Seltzer, Ivan Tashev
Laying the foundation for in-car alcohol detection by speech
Florian Schiel, Christian Heinrich
A voice search approach to replying to SMS messages in automobiles
Yun-Cheng Ju, Tim Paek
Language modeling for what-with-where on GOOG-411
Charl van Heerden, Johan Schalkwyk, Brian Strope
Very large vocabulary voice dictation for mobile devices
Jan Nouza, Petr Cerva, Jindrich Zdansky
Did you say a BLUE banana? the prosody of contrast and abnormality in Bulgarian and Dutch
Diana V. Dimitrova, Gisela Redeker, John C. J. Hoeks
A quantitative study of F0 peak alignment and sentence modality
Hansjörg Mixdorff, Hartmut R. Pfitzinger
Closely related languages, different ways of realizing focus
Szu-wei Chen, Bei Wang, Yi Xu
Cross-variety rhythm typology in Portuguese
Plínio A. Barbosa, M. Céu Viana, Isabel Trancoso
Pitch adaptation in different age groups: boundary tones versus global pitch
Marie Nilsenová, Marc Swerts, Véronique Houtepen, Heleen Dittrich
Backchannel-inviting cues in task-oriented dialogue
Agustín Gravano, Julia Hirschberg
Perception and production of boundary tones in whispered Dutch
W. Heeren, V. J. Van Heuven
Pitch accents and information status in a German radio news corpus
Katrin Schweitzer, Arndt Riester, Michael Walsh, Grzegorz Dogil
Analysis of voice fundamental frequency contours of continuing and terminating prosodic phrases in four Swiss German dialects
Adrian Leemann, Keikichi Hirose, Hiroya Fujisaki
Intonational features for identifying regional accents of Italian
Michelina Savino
Analysis and recognition of accentual patterns
Agnieszka Wagner
Using responsive prosodic variation to acknowledge the user's current state
Nigel G. Ward, Rafael Escalante-Ruiz
Intonation segments and segmental intonation
Oliver Niebuhr
The phrase-final accent in Kammu: effects of tone, focus and engagement
David House, Anastasia Karlsson, Jan-Olof Svantesson, Damrong Tayanin
Tonal alignment in three varieties of Hiberno-English
Raya Kalaldeh, Amelie Dorn, Ailbhe Ní Chasaide
Determining intonational boundaries from the acoustic signal
Lourdes Aguilar, Antonio Bonafonte, Francisco Campillo, David Escudero
Compression and truncation revisited
Claudia K. Ohl, Hartmut R. Pfitzinger
Comparison of Fujisaki-model extractors and F0 stylizers
Hartmut R. Pfitzinger, Hansjörg Mixdorff, Jan Schwarz
Is tonal alignment interpretation independent of methodology?
Caterina Petrone, Mariapaola D'Imperio
Modeling the intonation of topic structure: two approaches
Margaret Zellers, Brechtje Post, Mariapaola D'Imperio
What's in an ontology for spoken language understanding
Silvia Quarteroni, Giuseppe Riccardi, Marco Dinarelli
A fundamental study of shouted speech for acoustic-based security system
Hiroaki Nanjo, Hiroki Mikami, Hiroshi Kawano, Takanobu Nishiura
Evaluating the potential utility of ASR n-best lists for incremental spoken dialogue systems
Timo Baumann, Okko Buß, Michaela Atterer, David Schlangen
Improving the recognition of names by document-level clustering
Bin Zhang, Wei Wu, Jeremy G. Kahn, Mari Ostendorf
Robust dependency parsing for spoken language understanding of spontaneous speech
Frederic Bechet, Alexis Nasr
Semantic role labeling with discriminative feature selection for spoken language understanding
Chao-Hong Liu, Chung-Hsien Wu
A study of new approaches to speaker diarization
Douglas Reynolds, Patrick Kenny, Fabio Castaldo
Redefining the Bayesian information criterion for speaker diarisation
Themos Stafylakis, Vassilis Katsouros, George Carayannis
Speaker diarization using divide-and-conquer
Shih-Sian Cheng, Chun-Han Tseng, Chia-Ping Chen, Hsin-Min Wang
KL realignment for speaker diarization with multiple feature streams
Deepu Vijayasenan, Fabio Valente, Hervé Bourlard
Speech overlap detection in a two-pass speaker diarization system
Marijn Huijbregts, David A. van Leeuwen, Franciska M. G. de Jong
Improved speaker diarization of meeting speech with recurrent selection of representative speech segments and participant interaction pattern modeling
Kyu J. Han, Shrikanth S. Narayanan
Application of differential microphone array for IS-127 EVRC rate determination algorithm
Henry Widjaja, Suryoadhi Wibowo
Estimating the position and orientation of an acoustic source with a microphone array network
Alberto Yoshihiro Nakano, Seiichi Nakagawa, Kazumasa Yamamoto
Singing voice detection in polyphonic music using predominant pitch
Vishweshwara Rao, S. Ramakrishnan, Preeti Rao
Word stress assessment for computer aided language learning
Juan Pablo Arias, Nestor Becerra Yoma, Hiram Vivanco
A non-intrusive signal-based model for speech quality evaluation using automatic classification of background noises
Adrien Leman, Julien Faure, Etienne Parizet
Acoustic event detection for spotting “hot spots” in podcasts
Kouhei Sumi, Tatsuya Kawahara, Jun Ogata, Masataka Goto
Improving detection of acoustic events using audiovisual data and feature level fusion
T. Butko, C. Canton-Ferrer, C. Segura, X. Giró, C. Nadeu, J. Hernando, J. R. Casas
Detecting audio events for semantic video search
M. Bugalho, J. Portêlo, Isabel Trancoso, T. Pellegrini, Alberto Abad
Factor analysis for audio-based video genre classification
Mickael Rouvier, Driss Matrouf, Georges Linarès
Robust audio-based classification of video genre
Mickael Rouvier, Georges Linarès, Driss Matrouf
Fusing audio and video information for online speaker diarization
Joerg Schmalenstroeer, Martin Kelling, Volker Leutnant, Reinhold Haeb-Umbach
Multimodal speaker verification using ancillary known speaker characteristics such as gender or age
Girija Chetty, Michael Wagner
Discovering keywords from cross-modal input: ecological vs. engineering methods for enhancing acoustic repetitions
Guillaume Aimetti, Roger K. Moore, L. ten Bosch, Okko Johannes Räsänen, Unto Kalervo Laine
Incremental composition of static decoding graphs
Miroslav Novák
Evaluation of phone lattice based speech decoding
Jacques Duchateau, Kris Demuynck, Hugo Van hamme
A fully data parallel WFST-based large vocabulary continuous speech recognition on a graphics processing unit
Jike Chong, Ekaterina Gonina, Youngmin Yi, Kurt Keutzer
Combined low level and high level features for out-of-vocabulary word detection
Benjamin Lecouteux, Georges Linarès, Benoit Favre
Bayes risk approximations using time overlap with an application to system combination
Björn Hoffmeister, Ralf Schlüter, Hermann Ney
Unsupervised estimation of the language model scaling factor
Christopher M. White, Ariya Rastrow, Sanjeev Khudanpur, Frederick Jelinek
Simultaneous estimation of confidence and error cause in speech recognition using discriminative model
Atsunori Ogawa, Atsushi Nakamura
A generalized composition algorithm for weighted finite-state transducers
Cyril Allauzen, Michael Riley, Johan Schalkwyk
Word confidence using duration models
Stefano Scanzio, Pietro Laface, Daniele Colibro, Roberto Gemello
A comparison of audio-free speech recognition error prediction methods
Preethi Jyothi, Eric Fosler-Lussier
Automatic out-of-language detection based on confidence measures derived from LVCSR word and phone lattices
Petr Motlicek
Automatic estimation of decoding parameters using large-margin iterative linear programming
Brian Mak, Tom Ko
Optimization of dereverberation parameters based on likelihood of speech recognizer
Randy Gomez, Tatsuya Kawahara
Application of noise robust MDT speech recognition on the SPEECON and SpeechDat-Car databases
J. F. Gemmeke, Y. Wang, Maarten Van Segbroeck, B. Cranen, Hugo Van hamme
Model based feature enhancement for automatic speech recognition in reverberant environments
Alexander Krueger, Reinhold Haeb-Umbach
A study of mutual front-end processing method based on statistical model for noise robust speech recognition
Masakiyo Fujimoto, Kentaro Ishizuka, Tomohiro Nakatani
Integrating codebook and utterance information in cepstral statistics normalization techniques for robust speech recognition
Guan-min He, Jeih-weih Hung
Reduced complexity equalization of Lombard effect for speech recognition in noisy adverse environments
Hynek Bořil, John H. L. Hansen
Unsupervised training scheme with non-stereo data for empirical feature vector compensation
L. Buera, Antonio Miguel, Alfonso Ortega, Eduardo Lleida, Richard M. Stern
Incremental adaptation with VTS and joint adaptively trained systems
F. Flego, M. J. F. Gales
Target speech GMM-based spectral compensation for noise robust speech recognition
Takahiro Shinozaki, Sadaoki Furui
Noise-robust feature extraction based on forward masking
Sheng-Chiuan Chiou, Chia-Ping Chen
Noisy speech recognition by using output combination of discrete-mixture HMMs and continuous-mixture HMMs
Tetsuo Kosaka, You Saito, Masaharu Kato
Adaptive training with noisy constrained maximum likelihood linear regression for noise robust speech recognition
D. K. Kim, M. J. F. Gales
Performance comparisons of the integrated parallel model combination approaches with front-end noise reduction
Guanghu Shen, Soo-Young Suk, Hyun-Yeol Chung
Tuning support vector machines for robust phoneme classification with acoustic waveforms
Jibran Yousafzai, Zoran Cvetković, Peter Sollich
An analytic derivation of a phase-sensitive observation model for noise robust speech recognition
Volker Leutnant, Reinhold Haeb-Umbach
Variational model composition for robust speech recognition with time-varying background noise
Wooil Kim, John H. L. Hansen
Comparison of estimation techniques in joint uncertainty decoding for noise robust speech recognition
Haitian Xu, K. K. Chin
Replacing uncertainty decoding with subband re-estimation for large vocabulary speech recognition in noise
Jianhua Lu, Ji Ming, Roger Woods
Accounting for the uncertainty of speech estimates in the complex domain for minimum mean square error speech enhancement
Ramón Fernandez Astudillo, Dorothea Kolossa, Reinhold Orglmeister
Signal separation for robust speech recognition based on phase difference information obtained in the frequency domain
Chanwoo Kim, Kshitiz Kumar, Bhiksha Raj, Richard M. Stern
Transforming features to compensate speech recogniser models for noise
R. C. van Dalen, F. Flego, M. J. F. Gales
Subband temporal modulation spectrum normalization for automatic speech recognition in reverberant environments
Xugang Lu, Masashi Unoki, Satoshi Nakamura
Robust in-car spelling recognition - a tandem BLSTM-HMM approach
Martin Wöllmer, Florian Eyben, Björn Schuller, Yang Sun, Tobias Moosmayr, Nhu Nguyen-Thien
Applying non-negative matrix factorization on time-frequency reassignment spectra for missing data mask estimation
Maarten Van Segbroeck, Hugo Van hamme
Investigation into variants of joint factor analysis for speaker recognition
Lukáš Burget, Pavel Matějka, Valiantsina Hubeika, Jan Černocký
Improved GMM-based speaker verification using SVM-driven impostor dataset selection
Mitchell McLaren, Robbie Vogt, Brendan Baker, Sridha Sridharan
Adaptive individual background model for speaker verification
Yossi Bar-Yosef, Yuval Bistritz
Optimization of discriminative kernels in SVM speaker verification
Shi-Xiong Zhang, Man-Wai Mak
UBM-based sequence kernel for speaker recognition
Zhenchun Lei
GMM kernel by Taylor series for speaker verification
Minqiang Xu, Xi Zhou, Beiqian Dai, Thomas S. Huang
Does session variability compensation in speaker recognition model intrinsic variation under mismatched conditions?
Elizabeth Shriberg, Sachin Kajarekar, Nicolas Scheffer
Variability compensated support vector machines applied to speaker verification
Zahi N. Karam, W. M. Campbell
Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification
Najim Dehak, Réda Dehak, Patrick Kenny, Niko Brümmer, Pierre Ouellet, Pierre Dumouchel
Within-session variability modelling for factor analysis speaker verification
Robbie Vogt, Jason Pelecanos, Nicolas Scheffer, Sachin Kajarekar, Sridha Sridharan
Speaker recognition by Gaussian information bottleneck
Ron M. Hecht, Elad Noor, Naftali Tishby
Variational dynamic kernels for speaker verification
C. Longworth, R. C. van Dalen, M. J. F. Gales
Mel, linear, and antimel frequency cepstral coefficients in broad phonetic regions for telephone speaker recognition
Howard Lei, Eduardo Lopez
Fast GMM computation for speaker verification using scalar quantization and discrete densities
Guoli Ye, Brian Mak, Man-Wai Mak
Text-independent speaker identification using vocal tract length normalization for building universal background model
A. K. Sarkar, S. Umesh, S. P. Rath
BUT system for NIST 2008 speaker recognition evaluation
Lukáš Burget, Michal Fapšo, Valiantsina Hubeika, Ondřej Glembek, Martin Karafiát, Marcel Kockmann, Pavel Matějka, Petr Schwarz, Jan Černocký
Selection of the best set of shifted delta cepstral features in speaker verification using mutual information
José R. Calvo, Rafael Fernández, Gabriel Hernández
Forensic speaker recognition using traditional features comparing automatic and human-in-the-loop formant tracking
Alberto de Castro, Daniel Ramos, Joaquin Gonzalez-Rodriguez
Open-set speaker identification under mismatch conditions
S. G. Pillay, A. Ariyaeeinia, P. Sivakumaran, M. Pawlewski
Minivectors: an improved GMM-SVM approach for speaker verification
Xavier Anguera
Robustness of phase based features for speaker recognition
R. Padmanabhan, Sree Hari Krishnan Parthasarathi, Hema A. Murthy
The MIT Lincoln Laboratory 2008 speaker recognition system
D. E. Sturim, W. M. Campbell, Zahi N. Karam, Douglas Reynolds, F. S. Richardson
Speaker recognition on lossy compressed speech using the Speex codec
A. R. Stauffer, A. D. Lawson
Text-independent speaker verification using rank threshold in large number of speaker models
Haruka Okamoto, Satoru Tsuge, Amira Abdelwahab, Masafumi Nishida, Yasuo Horiuchi, Shingo Kuroiwa
The role of age in factor analysis for speaker identification
Yun Lei, John H. L. Hansen
Do humans and speaker verification system use the same information to differentiate voices?
Juliette Kahn, Solange Rossato
Automatic syllabification for Danish text-to-speech systems
Jeppe Beck, Daniela Braga, João Nogueira, Miguel Sales Dias, Luis Coelho
Hybrid approach to grapheme to phoneme conversion for Korean
Jinsik Lee, Byeongchang Kim, Gary Geunbae Lee
Robust LTS rules with the Combilex speech technology lexicon
Korin Richmond, Robert A. J. Clark, Sue Fitt
Letter-to-phoneme conversion by inference of rewriting rules
Vincent Claveau
Online discriminative training for grapheme-to-phoneme conversion
Sittichai Jiampojamarn, Grzegorz Kondrak
Using same-language machine translation to create alternative target sequences for text-to-speech synthesis
Peter Cahill, Jinhua Du, Andy Way, Julie Carson-Berndsen
Watermark recovery from speech using inverse filtering and sign correlation
Robert Morris, Ralph Johnson, Vladimir Goncharoff, Joseph DiVita
Weighted linear prediction for speech analysis in noisy conditions
Jouni Pohjalainen, Heikki Kallasjoki, Kalle J. Palomäki, Mikko Kurimo, Paavo Alku
Log-spectral magnitude MMSE estimators under super-Gaussian densities
Richard C. Hendriks, Richard Heusdens, Jesper Jensen
Speech enhancement in a 2-dimensional area based on power spectrum estimation of multiple areas with investigation of existence of active sources
Yusuke Hioka, Ken'ichi Furuya, Yoichi Haneda, Akitoshi Kataoka
Modulation domain spectral subtraction for speech enhancement
Kuldip Paliwal, Belinda Schwerin, Kamil Wójcicki
Variational loopy belief propagation for multi-talker speech recognition
Steven J. Rennie, John R. Hershey, Peder A. Olsen
Enhancement of binaural speech using codebook constrained iterative binaural Wiener filter
Nadir Cazi, T. V. Sreenivas
A semi-blind source separation method with a less amount of computation suitable for tiny DSP modules
Kazunobu Kondo, Makoto Yamada, Hideki Kenmochi
Model-based speech separation: identifying transcription using orthogonality
S. W. Lee, Frank K. Soong, Tan Lee
Enhanced minimum statistics technique incorporating soft decision for noise suppression
Yun-Sik Park, Ji-Hyun Song, Jae-Hun Choi, Joon-Hyuk Chang
Effect of noise reduction on reaction time to speech in noise
Mark Huckvale, Jayne Leak
Joint noise reduction and dereverberation of speech using hybrid TF-GSC and adaptive MMSE estimator
Behdad Dashtbozorg, Hamid Reza Abutalebi
A study on multiple sound source localization with a distributed microphone system
Kook Cho, Takanobu Nishiura, Yoichi Yamashita
Robust minimal variance distortionless speech power spectra enhancement using order statistic filter for microphone array
Tao Yu, John H. L. Hansen
Speech enhancement minimizing generalized Euclidean distortion using supergaussian priors
Amit Das, John H. L. Hansen
STFT-based speech enhancement by reconstructing the harmonics
Iman Haji Abolhassani, Sid-Ahmed Selouani, Douglas O'Shaughnessy
Joint speech enhancement and speaker identification using Monte Carlo methods
Ciira wa Maina, John MacLaren Walsh
Combined discriminative training for multi-stream HMM-based audio-visual speech recognition
Jing Huang, Karthik Visweswariah
Cued speech recognition for augmentative communication in normal-hearing and hearing-impaired subjects
Panikos Heracleous, Denis Beautemps, Noureddine Abboutabit
On acquiring speech production knowledge from articulatory measurements for phoneme recognition
D. Neiberg, G. Ananthakrishnan, Mats Blomberg
Measuring the gap between HMM-based ASR and TTS
John Dines, Junichi Yamagishi, Simon King
Speech recognition with speech synthesis models by marginalising over decision tree leaves
John Dines, Lakshmi Saheer, Hui Liang
Detailed description of triphone model using SSS-free algorithm
Motoyuki Suzuki, Daisuke Honma, Akinori Ito, Shozo Makino
Decision tree acoustic models for ASR
Jitendra Ajmera, Masami Akamine
Compression techniques applied to multiple speech recognition systems
Catherine Breslin, Matt Stuttle, Kate Knill
Graphical models for discrete hidden Markov models in speech recognition
Antonio Miguel, Alfonso Ortega, L. Buera, Eduardo Lleida
Factor analyzed HMM topology for speech recognition
Chuan-Wei Ting, Jen-Tzung Chien
Tied-state multi-path HMnet model using three-domain successive state splitting
Soo-Young Suk, Hiroaki Kojima
Acoustic modeling using exponential families
Vaibhava Goel, Peder A. Olsen
Personalizing synthetic voices for people with progressive speech disorders: judging voice similarity
S. M. Creer, S. P. Cunningham, P. D. Green, K. Fatema
Electrolaryngeal speech enhancement based on statistical voice conversion
Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano
Age recognition for spoken dialogue systems: do we need it?
Maria Wolters, Ravichander Vipperla, Steve Renals
Speech-based and multimodal media center for different user groups
Markku Turunen, Jaakko Hakulinen, Aleksi Melto, Juho Hella, Juha-Pekka Rajaniemi, Erno Mäkinen, Jussi Rantala, Tomi Heimonen, Tuuli Laivo, Hannu Soronen, Mervi Hansen, Pellervo Valkama, Toni Miettinen, Roope Raisamo
Virtual speech reading support for hard of hearing in a domestic multi-media setting
Samer Al Moubayed, Jonas Beskow, Ann-Marie Öster, Giampiero Salvi, Björn Granström, Nic van Son, Ellen Ormel
Real-time correction of closed-captions
Patrick Cardinal, Gilles Boulianne
Universal access: speech recognition for talkers with spastic dysarthria
Harsh Vardhan Sharma, Mark Hasegawa-Johnson
Exploring speech therapy games with children on the autism spectrum
Mohammed E. Hoque, Joseph K. Lane, Rana el Kaliouby, Matthew Goodwin, Rosalind W. Picard
Analyzing GMMs to characterize resonance anomalies in speakers suffering from apnoea
José Luis Blanco, Rubén Fernández, David Pardo, Álvaro Sigüenza, Luis A. Hernández, José Alcázar
On the mutual information between source and filter contributions for voice pathology detection
Thomas Drugman, Thomas Dubuisson, Thierry Dutoit
A system for detecting miscues in dyslexic read speech
Morten Højfeldt Rasmussen, Zheng-Hua Tan, Børge Lindberg, Søren Holdt Jensen
Techniques for rapid and robust topic identification of conversational telephone speech
Jonathan Wintrode, Scott Kulp
Localization of speech recognition in spoken dialog systems: how machine translation can make our lives easier
David Suendermann, Jackson Liscombe, Krishna Dayanidhi, Roberto Pieraccini
Algorithms for speech indexing in Microsoft Recite
Kunal Mukerjee, Shankar Regunathan, Jeffrey Cole
Parallelized Viterbi processor for 5,000-word large-vocabulary real-time continuous speech recognition FPGA system
Tsuyoshi Fujinaga, Kazuo Miura, Hiroki Noguchi, Hiroshi Kawaguchi, Masahiko Yoshimoto
SplaSH (spoken language search hawk): integrating time-aligned with text-aligned annotations
Sara Romano, Elvio Cecere, Francesco Cutugno
PodCastle: collaborative training of acoustic models on the basis of wisdom of crowds for podcast transcription
Jun Ogata, Masataka Goto
A WFST-based log-linear framework for speaking-style transformation
Graham Neubig, Shinsuke Mori, Tatsuya Kawahara
ClusterRank: a graph based method for meeting summarization
Nikhil Garg, Benoit Favre, Korbinian Riedhammer, Dilek Hakkani-Tür
Leveraging sentence weights in a concept-based optimization framework for extractive meeting summarization
Shasha Xie, Benoit Favre, Dilek Hakkani-Tür, Yang Liu
Hybrids of supervised and unsupervised models for extractive speech summarization
Shih-Hsiang Lin, Yueng-Tien Lo, Yao-Ming Yeh, Berlin Chen
Automatic detection of audio advertisements
I. Dan Melamed, Yeon-Jun Kim
Named entity network based on Wikipedia
Sameer Maskey, Wisam Dakka
The rhythm of text and the rhythm of utterances: from metrics to models
Daniel Hirst
No time to lose? time shrinking effects enhance the impression of rhythmic “isochrony” and fast speech rate
Petra Wagner, Andreas Windmann
Measuring speech rhythm variation in a model-based framework
Plínio A. Barbosa
Rhythm measures with language-independent segmentation
Anastassia Loukina, Greg Kochanski, Chilin Shih, Elinor Keane, Ian Watson
Investigating changes in the rhythm of Maori over time
Margaret Maclagan, Catherine I. Watson, Jeanette King, Ray Harlow, Laura Thompson, Peter Keegan
Effects of mora-timing in English rhythm control by Japanese learners
Shizuka Nakamura, Hiroaki Kato, Yoshinori Sagisaka
The dynamic dimension of the global speech-rhythm attributes
Jan Volín, Petr Pollák
Vowel duration in pre-geminate contexts in Polish
Zofia Malisz
Emotion dimensions and formant position
Martijn Goudbeek, Jean Philippe Goldman, Klaus R. Scherer
Identifying uncertain words within an utterance via prosodic features
Heather Pon-Barry, Stuart Shieber
Evaluating evaluators: a case study in understanding the benefits and pitfalls of multi-evaluator modeling
Emily Mower, Maja J. Matarić, Shrikanth S. Narayanan
Responding to user emotional state by adding emotional coloring to utterances
Jaime C. Acosta, Nigel G. Ward
Analysis of laugh signals for detecting in continuous speech
K. Sudheer Kumar, M. Sri Harish Reddy, K. Sri Rama Murty, B. Yegnanarayana
Data-driven clustering in emotional space for affect recognition using discriminatively trained LSTM networks
Martin Wöllmer, Florian Eyben, Björn Schuller, Ellen Douglas-Cowie, Roddy Cowie
Perceiving surprise on cue words: prosody and semantics interact on right and really
Catherine Lai
Emotion recognition using linear transformations in combination with video
Rok Gajšek, Vitomir Štruc, Simon Dobrišek, France Mihelič
Speaker dependent emotion recognition using prosodic supervectors
Ignacio Lopez-Moreno, Carlos Ortego-Resa, Joaquin Gonzalez-Rodriguez, Daniel Ramos
Physiologically-inspired feature extraction for emotion recognition
Yu Zhou, Yanqing Sun, Junfeng Li, Jianping Zhang, Yonghong Yan
Perceived loudness and voice quality in affect cueing
Irena Yanushevskaya, Christer Gobl, Ailbhe Ní Chasaide
Modeling mutual influence of interlocutor emotion states in dyadic spoken interactions
Chi-Chun Lee, Carlos Busso, Sungbok Lee, Shrikanth S. Narayanan
A detailed study of word-position effects on emotion expression in speech
Jangwon Kim, Sungbok Lee, Shrikanth S. Narayanan
CMAC for speech emotion profiling
Norhaslinda Kamaruddin, Abdul Wahab
On the relevance of high-level features for speaker independent emotion recognition of spontaneous speech
Marko Lugger, Bin Yang
Recognising interest in conversational speech - comparing bag of frames and supra-segmental features
Björn Schuller, Gerhard Rigoll
Many-to-many eigenvoice conversion with reference voice
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano
Alleviating the one-to-many mapping problem in voice conversion with context-dependent modeling
Elizabeth Godoy, Olivier Rosec, Thierry Chonavel
Efficient modeling of temporal structure of speech for applications in voice transformation
Binh Phu Nguyen, Masato Akagi
Cross-language voice conversion based on eigenvoices
Malorie Charlier, Yamato Ohtani, Tomoki Toda, Alexis Moinet, Thierry Dutoit
Voice conversion using k-histograms and frame selection
Alejandro José Uriz, Pablo Daniel Agüero, Antonio Bonafonte, Juan Carlos Tulli
Online model adaptation for voice conversion using model-based speech synthesis techniques
Dalei Wu, Baojie Li, Hui Jiang, Qian-Jie Fu
HMM adaptation and voice conversion for the synthesis of child speech: a comparison
Oliver Watts, Junichi Yamagishi, Simon King, Kay Berkling
HMM-based speaker characteristics emphasis using average voice model
Takashi Nose, Junichi Adada, Takao Kobayashi
An evaluation methodology for prosody transformation systems based on chirp signals
Damien Lolive, Nelly Barbot, Olivier Boeffard
Voice morphing based on interpolation of vocal tract area functions using AR-HMM analysis of speech
Yoshiki Nambu, Masahiko Mikawa, Kazuyo Tanaka
A novel model-based pitch conversion method for Mandarin speech
Hsin-Te Hwang, Chen-Yu Chiang, Po-Yi Sung, Sin-Horng Chen
Observation of empirical cumulative distribution of vowel spectral distances and its application to vowel based voice conversion
Hideki Kawahara, Masanori Morise, Toru Takahashi, Hideki Banno, Ryuichi Nisimura, Toshio Irino
Japanese pitch conversion for voice morphing based on differential modeling
Ryuki Tachibana, Zhiwei Shuang, Masafumi Nishimura
A novel technique for voice conversion based on style and content decomposition with bilinear models
Victor Popa, Jani Nurminen, Moncef Gabbouj
Rule-based voice quality variation with formant synthesis
Felix Burkhardt
Fast transcription of unstructured audio recordings
Brandon C. Roy, Deb Roy
Finding allophones: an evaluation on consonants in the TIMIT corpus
Timothy Kempton, Roger K. Moore
Automatic formant extraction for sociolinguistic analysis of large corpora
Keelan Evanini, Stephen Isard, Mark Liberman
Investigating phonetic information reduction and lexical confusability
William Hartmann, Eric Fosler-Lussier
Improving phone recognition performance via phonetically-motivated units
Hyejin Hong, Minhwa Chung
An evaluation of formant tracking methods on an Arabic database
Imen Jemaa, Oussama Rekhis, Kaïs Ouni, Yves Laprie
Comparison of manual and automated estimates of subglottal resonances
Wolfgang Wokurek, Andreas Madsack
Using durational cues in a computational model of spoken-word recognition
Odette Scharenborg
Second language discrimination vowel contrasts by adults speakers with a five vowel system
Bianca Sisinni, Mirko Grimaldi
Three-way laryngeal categorization of Japanese, French, English and Chinese plosives by Korean speakers
Tomohiko Ooigawa, Shigeko Shinohara
The effect of F0 peak-delay on the L1 / L2 perception of English lexical stress
Shinichi Tokuma, Yi Xu
Lexical tone production by Cantonese speakers with Parkinson's disease
Joan Ka-Yin Ma
Acoustic cues of palatalisation in plosive + lateral onset clusters
Daniela Müller, Sidney Martin Mota
Perception of English compound vs. phrasal stress: natural vs. synthetic speech
Irene Vogel, Arild Hestvik, H. Timothy Bunnell, Laura Spinu
New method for delexicalization and its application to prosodic tagging for text-to-speech synthesis
Martti Vainio, Antti Suni, Tuomo Raitio, Jani Nurminen, Juhani Järvikivi, Paavo Alku
Speech rate and pauses in non-native Finnish
Minnaleena Toivola, Mietta Lennes, Eija Aho
Modelling similarity perception of intonation
Uwe D. Reichel, Felicitas Kleber, Raphael Winkelmann
Studying L2 suprasegmental features in Asian Englishes: a position paper
Helen Meng, Chiu-yu Tseng, Mariko Kondo, Alissa Harrison, Tanya Visceglia
Classification of disfluent phenomena as fluent communicative devices in specific prosodic contexts
Helena Moniz, Isabel Trancoso, Ana Isabel Mata
Cross-cultural perception of discourse phenomena
Rolf Carlson, Julia Hirschberg
Modelling vocabulary growth from birth to young adulthood
Roger K. Moore, L. ten Bosch
Adaptive non-negative matrix factorization in a computational model of language acquisition
Joris Driesen, L. ten Bosch, Hugo Van hamme
Classifying clear and conversational speech based on acoustic features
Akiko Amano-Kusumoto, John-Paul Hosom, Izhak Shafran
The acoustic characteristics of Russian vowels in children of 6 and 7 years of age
Elena E. Lyakso, Olga V. Frolova, Aleks S. Grigoriev
Japanese children's acquisition of prosodic politeness expressions
Takaaki Shochi, Donna Erickson, Kaoru Sekiyama, Albert Rilliard, Véronique Aubergé
Perceptual training of singleton and geminate stops in Japanese language by Korean learners
Mee Sonu, Keiichi Tajima, Hiroaki Kato, Yoshinori Sagisaka
Resources for speech research: present and future infrastructure needs
Lou Boves, Rolf Carlson, Erhard Hinrichs, David House, Steven Krauwer, Lothar Lemnitzer, Martti Vainio, Peter Wittenburg
Speech recordings via the internet: an overview of the VOYS project in Scotland
Catherine Dickie, Felix Schaeffler, Christoph Draxler, Klaus Jänsch
The multi-session audio research project (MARP) corpus: goals, design and initial findings
A. D. Lawson, A. R. Stauffer, E. J. Cupples, S. J. Wenndt, W. P. Bray, J. J. Grieco
Structure and annotation of Polish LVCSR speech database
Katarzyna Klessa, Grażyna Demenko
Balanced corpus of informal spoken Czech: compilation, design and findings
Martina Waclawičová, Michal Křen, Lucie Válková
JTrans: an open-source software for semi-automatic text-to-speech alignment
C. Cerisara, O. Mella, D. Fohr
Predicting the quality of multimodal systems based on judgments of single modalities
Ina Wechsung, Klaus-Peter Engelbrecht, Anja B. Naumann, Stefan Schaffer, Julia Seebode, Florian Metze, Sebastian Möller
Auto-checking speech transcriptions by multiple template constrained posterior
Lijuan Wang, Shenghao Qin, Frank K. Soong
Subjective experiments on influence of response timing in spoken dialogues
Toshihiko Itoh, Norihide Kitaoka, Ryota Nishimura
Usability study of VUI consistent with GUI focusing on age-groups
Jun Okamoto, Tomoyuki Kato, Makoto Shozakai
Annotating communicative function and semantic content in dialogue act for construction of consulting dialogue systems
Teruhisa Misu, Kiyonori Ohtake, Chiori Hori, Hideki Kashioka, Satoshi Nakamura
Improved speech summarization with multiple-hypothesis representations and Kullback-Leibler divergence measures
Shih-Hsiang Lin, Berlin Chen
An improved speech segmentation quality measure: the r-value
Okko Johannes Räsänen, Unto Kalervo Laine, Toomas Altosaar
No sooner said than done? testing incrementality of semantic interpretations of spontaneous speech
Michaela Atterer, Timo Baumann, David Schlangen
Role of natural language understanding in voice local search
Junlan Feng, Srinivas Bangalore, Mazin Gilbert
Recognition and correction of voice web search queries
Keith Vertanen, Per Ola Kristensson
Semantic context effects in the recognition of acoustically unreduced and reduced words
Chao Wang, Johan Schalkwyk, Roberto Sicconi, Geoffrey Zweig, Marco van de Ven, Benjamin V. Tucker, Mirjam Ernestus
Context effects and the processing of ambiguous words: further evidence from semantic incongruence
Michael C. W. Yip
The roles of reconstruction and lexical storage in the comprehension of regular pronunciation variants
Mirjam Ernestus
Lexical embedding in spoken Dutch
Odette Scharenborg, Stefanie Okolowski
Real-time lexical competitions during speech-in-speech comprehension
Véronique Boulenger, Michel Hoen, François Pellegrino, Fanny Meunier
Discovering consistent word confusions in noise
Martin Cooke
A large Greek-English dictionary with incorporated speech and language processing tools
Dimitrios P. Lyras, George Kokkinakis, Alexandros Lazaridis, Kyriakos Sgarbas, Nikos Fakotakis
Predicting children's reading ability using evaluator-informed features
Matthew Black, Joseph Tepperman, Sungbok Lee, Shrikanth S. Narayanan
Automatic intonation classification for speech training systems
György Szaszák, Dávid Sztahó, Klára Vicsi
Automated pronunciation scoring using confidence scoring and landmark-based SVM
Su-Youn Yoon, Mark Hasegawa-Johnson, Richard Sproat
ASR based pronunciation evaluation with automatically generated competing vocabulary
Carlos Molina, Nestor Becerra Yoma, Jorge Wuth, Hiram Vivanco
High performance automatic mispronunciation detection method based on neural network and TRAP features
Hongyan Li, Shijin Wang, Jiaen Liang, Shen Huang, Bo Xu
The semi-supervised switchboard transcription project
Amarnag Subramanya, Jeff Bilmes
Maximum mutual information multi-phone units in direct modeling
Geoffrey Zweig, Patrick Nguyen
Profiling large-vocabulary continuous speech recognition on embedded devices: a hardware resource sensitivity analysis
Kai Yu, Rob A. Rutenbar
Continuous speech recognition using attention shift decoding with soft decision
Ozlem Kalinli, Shrikanth S. Narayanan
Towards using hybrid word and fragment units for vocabulary independent LVCSR systems
Ariya Rastrow, Abhinav Sethy, Bhuvana Ramabhadran, Frederick Jelinek
Unsupervised training of an HMM-based speech recognizer for topic classification
Herbert Gish, Man-hung Siu, Arthur Chan, Bill Belfield
The case for case-based automatic speech recognition
Viktoria Maier, Roger K. Moore
A self-labeling speech corpus: collecting spoken words with an online educational game
Ian McGraw, Alexander Gruenstein, Andrew Sutherland
A noise robust method for pattern discovery in quantized time series: the concept matrix approach
Okko Johannes Räsänen, Unto Kalervo Laine, Toomas Altosaar
Using parallel architectures in speech recognition
Patrick Cardinal, Pierre Dumouchel, Gilles Boulianne
Example-based speech recognition using formulaic phrases
Christopher J. Watkins, Stephen J. Cox
Parallel fast likelihood computation for LVCSR using mixture decomposition
Naveen Parihar, Ralf Schlüter, David Rybach, Eric A. Hansen
An indexing weight for voice-to-text search
Chen Liu
On invariant structural representation for speech recognition: theoretical validation and experimental improvement
Yu Qiao, Nobuaki Minematsu, Keikichi Hirose
Articulatory feature asynchrony analysis and compensation in detection-based ASR
I-Fan Chen, Hsin-Min Wang
CRANDEM: conditional random fields for word recognition
Jeremy Morris, Eric Fosler-Lussier
HEAR: an hybrid episodic-abstract speech recognizer
Sébastien Demange, Dirk Van Compernolle
Constrained probabilistic subspace maps applied to speech enhancement
Kaustubh Kalgaonkar, Mark A. Clements
Reconstructing clean speech from noisy MFCC vectors
Ben Milner, Jonathan Darch, Ibrahim Almajai
An evaluation of objective quality measures for speech intelligibility prediction
Cees H. Taal, Richard C. Hendriks, Richard Heusdens, Jesper Jensen, Ulrik Kjems
Performance comparison of HMM and VQ based single channel speech separation
M. H. Radfar, W. -Y. Chan, R. M. Dansereau, W. Wong
Stereo-input speech recognition using sparseness-based time-frequency masking in a reverberant environment
Yosuke Izumi, Kenta Nishiki, Shinji Watanabe, Takuya Nishimoto, Nobutaka Ono, Shigeki Sagayama
Enhancing audio speech using visual speech features
Ibrahim Almajai, Ben Milner
Classifying turn-level uncertainty using word-level prosody
Diane Litman, Mihai Rotaru, Greg Nicholas
Detecting subjectivity in multiparty speech
Gabriel Murray, Giuseppe Carenini
Pitch contour parameterisation based on linear stylisation for emotion recognition
Vidhyasaharan Sethu, Eliathamby Ambikairajah, Julien Epps
Feature-based and channel-based analyses of intrinsic variability in speaker verification
Martin Graciarena, Tobias Bocklet, Elizabeth Shriberg, Andreas Stolcke, Sachin Kajarekar
Robust angry speech detection employing a TEO-based discriminative classifier combination
Wooil Kim, John H. L. Hansen
Improving emotion recognition using class-level spectral features
Dmitri Bitouk, Ani Nenkova, Ragini Verma
Arousal and valence prediction in spontaneous emotional speech: felt versus perceived emotion
Khiet P. Truong, David A. van Leeuwen, Mark A. Neerincx, Franciska M. G. de Jong
Dimension reduction approaches for SVM based speaker age estimation
Gil Dobry, Ron M. Hecht, Mireille Avigal, Yaniv Zigel
ANN based decision fusion for speech emotion recognition
Lu Xu, Mingxing Xu, Dali Yang
Processing affected speech within human machine interaction
Bogdan Vlasenko, Andreas Wendemuth
Emotion recognition from speech using extended feature selection and a simple classifier
Ali Hassan, Robert I. Damper
Optimal event search using a structural cost function - improvement of structure to speech conversion
Daisuke Saito, Yu Qiao, Nobuaki Minematsu, Keikichi Hirose
Deriving vocal tract shapes from electromagnetic articulograph data via geometric adaptation and matching
Ziad Al Bawab, Lorenzo Turicchia, Richard M. Stern, Bhiksha Raj
Towards unsupervised articulatory resynthesis of German utterances using EMA data
Ingmar Steiner, Korin Richmond
The KlattGrid speech synthesizer
David Weenink
Development of a Kenyan English text to speech system: a method of developing a TTS for a previously undefined English dialect
Mucemi Gakuru
Feedback loop for prosody prediction in concatenative speech synthesis
Javier Latorre, Sergio Gracia, Masami Akamine
Assessing a speaker for fast speech in unit selection speech synthesis
Donata Moers, Petra Wagner
Unit selection based speech synthesis for poor channel condition
Ling Cen, Minghui Dong, Paul Chan, Haizhou Li
Vocalic sandwich, a unit designed for unit selection TTS
Didier Cadic, Cédric Boidin, Christophe d'Alessandro
Speech synthesis based on the plural unit selection and fusion method using FWF model
Ryo Morinaka, Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima
Speech synthesis without a phone inventory
Matthew P. Aylett, Simon King, Junichi Yamagishi
Context-dependent additive log F0 model for HMM-based speech synthesis
Heiga Zen, Norbert Braunschweiler
Real-time live broadcast news subtitling system for Spanish
Alfonso Ortega, Jose Enrique Garcia, Antonio Miguel, Eduardo Lleida
Development of the 2008 SRI Mandarin speech-to-text system for broadcast news and conversation
Xin Lei, Wei Wu, Wen Wang, Arindam Mandal, Andreas Stolcke
Multifactor adaptation for Mandarin broadcast news and conversation speech recognition
Wen Wang, Arindam Mandal, Xin Lei, Andreas Stolcke, Jing Zheng
Development of the GALE 2008 Mandarin LVCSR system
C. Plahl, Björn Hoffmeister, Georg Heigold, Jonas Lööf, Ralf Schlüter, Hermann Ney
The RWTH Aachen University open source speech recognition system
David Rybach, Christian Gollan, Georg Heigold, Björn Hoffmeister, Jonas Lööf, Ralf Schlüter, Hermann Ney
Online detecting end times of spoken utterances for synchronization of live speech and its transcripts
Jie Gao, Qingwei Zhao, Yonghong Yan
Real-time ASR from meetings
Philip N. Garner, John Dines, Thomas Hain, Asmaa El Hannani, Martin Karafiát, Danil Korchagin, Mike Lincoln, Vincent Wan, Le Zhang
Improvements to the LIUM French ASR system based on CMU Sphinx: what helps to significantly reduce the word error rate?
Paul Deléglise, Yannick Estève, Sylvain Meignier, Teva Merlin
Merging search spaces for subword spoken term detection
Timo Mertens, Daniel Schneider, Joachim Köhler
A posterior probability-based system hybridisation and combination for spoken term detection
Javier Tejedor, Dong Wang, Simon King, Joe Frankel, José Colás
Stochastic pronunciation modelling for spoken term detection
Dong Wang, Simon King, Joe Frankel
Term-dependent confidence for out-of-vocabulary term detection
Dong Wang, Simon King, Joe Frankel, Peter Bell
A comparison of query-by-example methods for spoken term detection
Wade Shen, Christopher M. White, Timothy J. Hazen
Fast keyword detection using suffix array
Kouichi Katsurada, Shigeki Teshima, Tsuneo Nitta
Understanding speaker-listener interactions
Dirk Heylen
Detecting changes in speech expressiveness in participants of a radio program
Plínio A. Barbosa
An audio-visual approach to measuring discourse synchrony in multimodal conversation data
Nick Campbell
Towards flexible representations for analysis of accommodation of temporal features in spontaneous dialogue speech
Spyros Kousidis, David Dorran, Ciaran McDonnell, Eugene Coyle
Are we 'in sync': turn-taking in collaborative dialogues
Štefan Beňuš
An audio-visual attention system for online association learning
Martin Heckmann, Holger Brandl, Xavier Domont, Bram Bolder, Frank Joublin, Christian Goerick
A human benchmark for language recognition
Rosemary Orr, David A. van Leeuwen
Large margin estimation of Gaussian mixture model parameters with extended Baum-Welch for spoken language recognition
Donglai Zhu, Bin Ma, Haizhou Li
Linguistically-motivated automatic classification of regional French varieties
Cécile Woehrling, Philippe Boula de Mareüil, Martine Adda-Decker
Discriminative acoustic language recognition via channel-compensated GMM statistics
Niko Brümmer, Albert Strasheim, Valiantsina Hubeika, Pavel Matějka, Lukáš Burget, Ondřej Glembek
Language score calibration using adapted Gaussian back-end
Mohamed Faouzi BenZeghiba, Jean-Luc Gauvain, Lori Lamel
A framework for discriminative SVM/GMM systems for language recognition
W. M. Campbell, Zahi N. Karam
Functional data analysis as a tool for analyzing speech dynamics - a case study on the French word c'était
Michele Gubian, Francisco Torreira, Helmer Strik, Lou Boves
Large-scale analysis of formant frequency estimation variability in conversational telephone speech
Nancy F. Chen, Wade Shen, Joseph Campbell, Reva Schwartz
Developing an automatic functional annotation system for British English intonation
Saandia Ali, Daniel Hirst
Intrinsic vowel duration and the post-vocalic voicing effect: some evidence from dialects of North American English
Joshua Tauberer, Keelan Evanini
Investigating /l/ variation in English through forced alignment
Jiahong Yuan, Mark Liberman
Structural analysis of dialects, sub-dialects and sub-sub-dialects of Chinese
Xuebin Ma, Akira Nemoto, Nobuaki Minematsu, Yu Qiao, Keikichi Hirose
Voice activity detection using singular value decomposition-based filter
Hwa Jeon Song, Sung Min Ban, Hyung Soon Kim
Voice activity detection using partially observable Markov decision process
Chiyoun Park, Namhoon Kim, Jeongmi Cho
High-accuracy, low-complexity voice activity detection based on a posteriori SNR weighted energy
Zheng-Hua Tan, Børge Lindberg
Fusing fast algorithms to achieve efficient speech detection in FM broadcasts
Stéphane Pigeon, Patrick Verlinde
Robust speech recognition using VAD-measure-embedded decoder
Tasuku Oonishi, Paul R. Dixon, Koji Iwano, Sadaoki Furui
Investigating privacy-sensitive features for speech detection in multiparty conversations
Sree Hari Krishnan Parthasarathi, Mathew Magimai-Doss, Hervé Bourlard, Daniel Gatica-Perez
Evaluation of external and internal articulator dynamics for pronunciation learning
Lan Wang, Hui Chen, JianJun Ouyang
Robust audio-visual speech synchrony detection by generalized bimodal linear prediction
Kshitiz Kumar, Jiri Navratil, Etienne Marcheret, Vit Libal, Gerasimos Potamianos
Acoustic-to-articulatory inversion using speech recognition and trajectory formation based on phoneme hidden Markov models
Atef Ben Youssef, Pierre Badin, Gérard Bailly, Panikos Heracleous
Speaker discriminability for visual speech modes
Jeesun Kim, Chris Davis, Christian Kroos, Harold Hill
Audio-visual prosody of social attitudes in Vietnamese: building and evaluating a tones balanced corpus
Dang-Khoa Mac, Véronique Aubergé, Albert Rilliard, Eric Castelli
Direct, modular and hybrid audio to visual speech conversion methods - a comparative study
Gyorgy Takacs
How similar are clusters resulting from schwa deletion in French to identical underlying clusters?
Audrey Bürki, Cécile Fougeron, Christophe Veaux, Ulrich H. Frauenfelder
Word-final [t]-deletion: an analysis on the segmental and sub-segmental level
Barbara Schuppler, Wim van Dommelen, Jacques Koreman, Mirjam Ernestus
Rarefaction gestures and coarticulation in Mangetti Dune !Xung clicks
Amanda Miller, Abigail Scott, Bonny Sands, Sheena Shah
The acoustics of Mangetti Dune !Xung clicks
Amanda Miller, Sheena Shah
Acoustic characteristics of ejectives in Amharic
Hussien Seid, S. Rajendran, B. Yegnanarayana
Sentence-final particles in Hong Kong Cantonese: are they tonal or intonational?
Wing Li Wu
Same tone, different category: linguistic-tonetic variation in the areal tone acoustics of Chuqu Wu
William Steed, Phil Rose
Why would aspiration lower the pitch of the following vowel? observations from Leng-shui-jiang Chinese
Caicai Zhang
Dialectal characteristics of Osaka and Tokyo Japanese: analyses of phonologically identical words
Kanae Amino, Takayuki Arai
Categories and gradience in intonation: evidence from linguistics and neurobiology
Brechtje Post, Francis Nolan, Emmanuel Stamatakis, Toby Hudson
Exploring vocalization of /l/ in English: an EPG and EMA study
Mitsuhiro Nakamura
The monophthongs and diphthongs of north-eastern Welsh: an acoustic study
Robert Mayr, Hannah Davies
Voicing profile of Polish sonorants: [r] in obstruent clusters
J. Sieczkowska, Bernd Möbius, Antje Schweitzer, Michael Walsh, Grzegorz Dogil
A user modeling-based performance analysis of a wizarded uncertainty-adaptive dialogue system corpus
Kate Forbes-Riley, Diane Litman
Using dialogue-based dynamic language models for improving speech recognition
Juan Manuel Lucas-Cuesta, Fernando Fernández, Javier Ferreiros
Reinforcement learning for dialog management using least-squares policy iteration and fast feature selection
Lihong Li, Jason D. Williams, Suhrid Balakrishnan
Hybridisation of expertise and reinforcement learning in dialogue systems
Romain Laroche, Ghislain Putois, Philippe Bretier, Bernadette Bouchon-Meunier
Bayesian learning of confidence measure function for generation of utterances and motions in object manipulation dialogue task
Komei Sugiura, Naoto Iwahashi, Hideki Kashioka, Satoshi Nakamura
Predicting how it sounds: re-ranking dialogue prompts based on TTS quality for adaptive spoken dialogue systems
Cédric Boidin, Verena Rieser, Lonneke van der Plas, Oliver Lemon, Jonathan Chevelu
Experiments on automatic prosodic labeling
Antje Schweitzer, Bernd Möbius
German boundary tones show categorical perception and a perceptual magnet effect when presented in different contexts
Katrin Schneider, Grzegorz Dogil, Bernd Möbius
Eye tracking for the online evaluation of prosody in speech synthesis: not so fast!
Michael White, Rajakrishnan Rajkumar, Kiwako Ito, Shari R. Speer
Prosodic analysis of foreign-accented English
Hansjörg Mixdorff, John Ingram
Perception of the evolution of prosody in the French broadcast news style
Philippe Boula de Mareüil, Albert Rilliard, Alexandre Allauzen
Prosodic effects on vowel production: evidence from formant structure
Yoonsook Mo, Jennifer Cole, Mark Hasegawa-Johnson
An adaptive BIC approach for robust audio stream segmentation
Janez Žibert, Andrej Brodnik, France Mihelič
Improving the robustness of phonetic segmentation to accent and style variation with a two-staged approach
Vaishali Patil, Shrikant Joshi, Preeti Rao
Signature cluster model selection for incremental Gaussian mixture cluster modeling in agglomerative hierarchical speaker clustering
Kyu J. Han, Shrikanth S. Narayanan
Speaker segmentation and clustering for simultaneously presented speech
Lingyun Gu, Richard M. Stern
Trimmed KL divergence between Gaussian mixtures for robust unsupervised acoustic anomaly detection
Nash Borges, Gerard G. L. Meyer
How to loose confidence: probabilistic linear machines for multiclass classification
Hui Lin, Jeff Bilmes, Koby Crammer
Quantifying wideband speech codec degradations via impairment factors: the new ITU-T P.834.1 methodology and its application to the G.711.1 codec
Sebastian Möller, Nicolas Côté, Atsuko Kurashima, Noritsugu Egi, Akira Takahashi
SUXES - user experience evaluation method for spoken and multimodal interaction
Markku Turunen, Jaakko Hakulinen, Aleksi Melto, Tomi Heimonen, Tuuli Laivo, Juho Hella
Results of the N-Best 2008 Dutch speech recognition evaluation
David A. van Leeuwen, Judith Kessens, Eric Sanders, Henk van den Heuvel
SHoUT, the University of Twente submission to the N-Best 2008 speech recognition evaluation for Dutch
Marijn Huijbregts, Roeland Ordelman, Laurens van der Werff, Franciska M. G. de Jong
NIST 2008 speaker recognition evaluation: performance across telephone and room microphone channels
Alvin F. Martin, Craig S. Greenberg
The ESTER 2 evaluation campaign for the rich transcription of French radio broadcasts
Sylvain Galliano, Guillaume Gravier, Laura Chaubard
Differential vector quantization of feature vectors for distributed speech recognition
Jose Enrique Garcia, Alfonso Ortega, Antonio Miguel, Eduardo Lleida
Arithmetic coding of sub-band residuals in FDLP speech/audio codec
Petr Motlicek, Sriram Ganapathy, Hynek Hermansky
Pitch variation estimation
Tom Bäckström, Stefan Bayer, Sascha Disch
Soft decision-based acoustic echo suppression in a frequency domain
Yun-Sik Park, Ji-Hyun Song, Jae-Hun Choi, Joon-Hyuk Chang
Fine-granular scalable MELP coder based on embedded vector quantization
Mouloud Djamah, Douglas O'Shaughnessy
Joint quantization strategies for low bit-rate sinusoidal coding
Emre Unver, Stephane Villette, Ahmet Kondoz
Steganographic band width extension for the AMR codec of low-bit-rate modes
Akira Nishimura
Ultra low bit-rate speech coding based on unit-selection with joint spectral-residual quantization: no transmission of any residual information
V. Ramasubramanian, D. Harish
On the cost of backward compatibility for communication codecs
Konstantin Schmidt, Markus Schnell, Nikolaus Rettelbach, Manfred Lutzky, Jochen Issing
A media-specific FEC based on Huffman coding for distributed speech recognition
Young Han Lee, Hong Kook Kim
Classification-based strategies for combining multiple 5-W question answering systems
Sibel Yaman, Dilek Hakkani-Tür, Gokhan Tur, Ralph Grishman, Mary Harper, Kathleen R. McKeown, Adam Meyers, Kartavya Sharma
Combining semantic and syntactic information sources for 5-W question answering
Sibel Yaman, Dilek Hakkani-Tür, Gokhan Tur
Phrase and word level strategies for detecting appositions in speech
Benoit Favre, Dilek Hakkani-Tür
Error correction of proportions in spoken opinion surveys
Nathalie Camelin, Renato De Mori, Frederic Bechet, Géraldine Damnati
Transformation-based learning for semantic parsing
F. Jurčíček, M. Gašić, S. Keizer, F. Mairesse, B. Thomson, K. Yu, S. Young
Large-scale Polish SLU
Patrick Lehnen, Stefan Hahn, Hermann Ney, Agnieszka Mykowiecka
Optimizing CRFs for SLU tasks in various languages using modified training criteria
Stefan Hahn, Patrick Lehnen, Georg Heigold, Hermann Ney
Learning lexicons from spoken utterances based on statistical model selection
Ryo Taguchi, Naoto Iwahashi, Takashi Nose, Kotaro Funakoshi, Mikio Nakano
Improving speech understanding accuracy with limited training data using multiple language models and multiple understanding models
Masaki Katsumaru, Mikio Nakano, Kazunori Komatani, Kotaro Funakoshi, Tetsuya Ogata, Hiroshi G. Okuno
Low-cost call type classification for contact center calls using partial transcripts
Youngja Park, Wilfried Teiken, Stephen C. Gates
A new quality measure for topic segmentation of text and speech
Mehryar Mohri, Pedro Moreno, Eugene Weinstein
Concept segmentation and labeling for conversational speech
Marco Dinarelli, Alessandro Moschitti, Giuseppe Riccardi
A noise-type and level-dependent MPO-based speech enhancement architecture with variable frame analysis for noise-robust speech recognition
Vikramjit Mitra, Bengt J. Borgstrom, Carol Y. Espy-Wilson, Abeer Alwan
Complementarity of MFCC, PLP and Gabor features in the presence of speech-intrinsic variabilities
Bernd T. Meyer, Birger Kollmeier
Noise robustness of tract variables and their application to speech recognition
Vikramjit Mitra, Hosung Nam, Carol Y. Espy-Wilson, Elliot Saltzman, Louis Goldstein
Articulatory phonological code for word classification
Xiaodan Zhuang, Hosung Nam, Mark Hasegawa-Johnson, Louis Goldstein, Elliot Saltzman
Robust keyword spotting with rapidly adapting point process models
Aren Jansen, Partha Niyogi
Automatically rating pronunciation through articulatory phonology
Joseph Tepperman, Louis Goldstein, Sungbok Lee, Shrikanth S. Narayanan
Learning the structure of human-computer and human-human dialogs
David Griol, Giuseppe Riccardi, Emilio Sanchis
Pause and gap length in face-to-face interaction
Jens Edlund, Mattias Heldner, Julia Hirschberg
Modeling other talkers for improved dialog act recognition in meetings
Kornel Laskowski, Elizabeth Shriberg
A closer look at quality judgments of spoken dialog systems
Klaus-Peter Engelbrecht, Felix Hartard, Florian Gödde, Sebastian Möller
New methods for the analysis of repeated utterances
Geoffrey Zweig
The effects of different voices for speech-based in-vehicle interfaces: impact of young and old voices on driving performance and attitude
Ing-Marie Jonsson, Nils Dahlbäck
In search of non-uniqueness in the acoustic-to-articulatory mapping
G. Ananthakrishnan, D. Neiberg, Olov Engwall
Estimation of articulatory gesture patterns from speech acoustics
Prasanta Kumar Ghosh, Shrikanth S. Narayanan, Pierre Divenyi, Louis Goldstein, Elliot Saltzman
Formant trajectories for acoustic-to-articulatory inversion
I. Yücel Özbek, Mark Hasegawa-Johnson, Mübeccel Demirekler
A robust variational method for the acoustic-to-articulatory problem
Blaise Potard, Yves Laprie
Comparison of vowel structures of Japanese and English in articulatory and auditory spaces
Jianwu Dang, Mark Tiede, Jiahong Yuan
The articulatory and acoustic impact of Scottish English /r/ on the preceding vowel-onset
Janine Lilienthal
Static and dynamic modulation spectrum for speech recognition
Sriram Ganapathy, Samuel Thomas, Hynek Hermansky
2-D processing of speech for multi-pitch analysis
Tianyu T. Wang, Thomas F. Quatieri
A correlation-maximization denoising filter used as an enhancement frontend for noise robust bird call classification
Wei Chu, Abeer Alwan
Preliminary inversion mapping results with a new EMA corpus
Korin Richmond
Time-varying autoregressive tests for multiscale speech analysis
Daniel Rudoy, Thomas F. Quatieri, Patrick J. Wolfe
Audio keyword extraction by unsupervised word discovery
Armando Muscariello, Guillaume Gravier, Frédéric Bimbot
ASR corpus design for resource-scarce languages
Etienne Barnard, Marelie Davel, Charl van Heerden
Pronunciation dictionary development in resource-scarce environments
Marelie Davel, Olga Martirosian
XTrans: a speech annotation and transcription tool
Meghan Lammie Glenn, Stephanie M. Strassel, Haejoong Lee
How to select a good training-data subset for transcription: submodular active selection for sequences
Hui Lin, Jeff Bilmes
Improving acceptability assessment for the labelling of affective speech corpora
Zoraida Callejas, Ramón López-Cózar
The broadcast narrow band speech corpus: a new resource type for large scale language recognition
Christopher Cieri, Linda Brandschain, Abby Neely, David Graff, Kevin Walker, Chris Caruso, Alvin F. Martin, Craig S. Greenberg
A novel codebook search technique for estimating the open quotient
Yen-Liang Shue, Jody Kreiman, Abeer Alwan
Long term examination of intra-session and inter-session speaker variability
A. D. Lawson, A. R. Stauffer, B. Y. Smolenski, B. B. Pokines, M. Leonard, E. J. Cupples
Distorted visual information influences audiovisual perception of voicing
Ragnhild Eg, Dawn Behne
Perceived naturalness of a synthesizer of disordered voices
Samia Fraj, Francis Grenez, Jean Schoentgen
Audio-visual speech asynchrony modeling in a talking head
Alexey Karpov, Liliya Tsirulnik, Zdeněk Krňoul, Andrey Ronzhin, Boris Lobanov, Miloš Železný
The effects of fundamental frequency and formant space on speaker discrimination through bone-conducted ultrasonic hearing
Takayuki Kagomiya, Seiji Nakagawa
Automatic detection and prediction of topic changes through automatic detection of register variations and pause duration
Céline De Looze, Stéphane Rauzy
Analyzing features for automatic age estimation on cross-sectional data
Werner Spiegl, Georg Stemmer, Eva Lasarcyk, Varada Kolhatkar, Andrew Cassidy, Blaise Potard, Stephen Shum, Young Chol Song, Puyang Xu, Peter Beyerlein, James Harnsberger, Elmar Nöth
Intercultural differences in evaluation of pathological voice quality: perceptual and acoustical comparisons between RASATI and GRBASI scales
Emi Juliana Yamauchi, Satoshi Imaizumi, Hagino Maruyama, Tomoyuki Haji
F0 cues for the discourse functions of “hã” in Hindi
Kalika Bali
Audio spatialisation strategies for multitasking during teleconferences
Stuart N. Wrigley, Simon Tucker, Guy J. Brown, Steve Whittaker
Speech rate effects on linguistic change
Alexsandro R. Meireles, Plínio A. Barbosa
Mandarin spontaneous narrative planning - prosodic evidence from National Taiwan University lecture corpus
Chiu-yu Tseng, Zhao-yu Su, Lin-shan Lee
Investigation into bottle-neck features for meeting speech recognition
František Grézl, Martin Karafiát, Lukáš Burget
Multi-stream to many-stream: using spectro-temporal features for ASR
Sherry Y. Zhao, Suman Ravuri, Nelson Morgan
Tandem representations of spectral envelope and modulation frequency features for ASR
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky
Entropy-based feature analysis for speech recognition
Panji Setiawan, Harald Höge, Tim Fingscheidt
Hierarchical processing of the modulation spectrum for GALE Mandarin LVCSR system
Fabio Valente, Mathew Magimai-Doss, C. Plahl, Suman Ravuri
Hill-climbing feature selection for multi-stream ASR
David Gelbart, Nelson Morgan, Alexey Tsymbal
Robust F0 estimation based on log-time scale autocorrelation and its application to Mandarin tone recognition
Yusuke Kida, Masaru Sakai, Takashi Masuko, Akinori Kawamura
Invariant-integration method for robust feature extraction in speaker-independent speech recognition
Florian Müller, Alfred Mertins
Discriminative feature transformation using output coding for speech recognition
Omid Dehzangi, Bin Ma, Eng Siong Chng, Haizhou Li
Discriminant spectrotemporal features for phoneme recognition
Nima Mesgarani, G. S. V. S. Sivaram, Sridhar Krishna Nemala, Mounya Elhilali, Hynek Hermansky
Auditory model based optimization of MFCCs improves automatic speech recognition performance
Saikat Chatterjee, Christos Koniaris, W. Bastiaan Kleijn
Pronunciation-based ASR for names
Henk van den Heuvel, Bert Réveil, Jean-Pierre Martens
How speaker tongue and name source language affect the automatic recognition of spoken names
Bert Réveil, Jean-Pierre Martens, Bart D'hoore
Online generation of acoustic models for multilingual speech recognition
Martin Raab, Guillermo Aradilla, Rainer Gruhn, Elmar Nöth
Basic speech recognition for spoken dialogues
Charl van Heerden, Etienne Barnard, Marelie Davel
Tonal articulatory feature for Mandarin and its application to conversational LVCSR
Qingqing Zhang, Jielin Pan, Yonghong Yan
Effects of language mixing for automatic recognition of Cantonese-English code-mixing utterances
Houwei Cao, P. C. Ching, Tan Lee
A one-step tone recognition approach using MSD-HMM for continuous speech
Changliang Liu, Fengpei Ge, Fuping Pan, Bin Dong, Yonghong Yan
Stream-based context-sensitive phone mapping for cross-lingual speech recognition
Khe Chai Sim, Haizhou Li
Human translations guided language discovery for ASR systems
Sebastian Stüker, Laurent Besacier, Alex Waibel