doi: 10.21437/Interspeech.2010
ISSN: 2958-1796
A procedure for estimating gestural scores from natural speech
Hosung Nam, Vikramjit Mitra, Mark Tiede, Elliot Saltzman, Louis Goldstein, Carol Espy-Wilson, Mark Hasegawa-Johnson
On the interdependencies between voice quality, glottal gaps, and voice-source related acoustic measures
Yen-Liang Shue, Gang Chen, Abeer Alwan
Simplification and extension of non-periodic excitation source representations for high-quality speech manipulation systems
Hideki Kawahara, Masanori Morise, Toru Takahashi, Hideki Banno, Ryuichi Nisimura, Toshio Irino
Phase equalization-based autoregressive model of speech signals
Sadao Hiroya, Takemi Mochida
Articulatory-functional modeling of speech prosody: a review
Yi Xu, Santitham Prom-on
Two new estimation methods for a superpositional intonation model
Humberto M. Torres, Hansjörg Mixdorff, Jorge A. Gurlekian, Hartmut R. Pfitzinger
A discriminative splitting criterion for phonetic decision trees
Simon Wiesler, Georg Heigold, Markus Nußbaum-Thom, Ralf Schlüter, Hermann Ney
Canonical state models for automatic speech recognition
Mark J. F. Gales, Kai Yu
Restructuring exponential family mixture models
Pierre L. Dognin, John R. Hershey, Vaibhava Goel, Peder Olsen
Unsupervised discovery and training of maximally dissimilar cluster models
Françoise Beaufays, Vincent Vanhoucke, Brian Strope
Probabilistic state clustering using conditional random field for context-dependent acoustic modelling
Khe Chai Sim
Integrate template matching and statistical modeling for speech recognition
Xie Sun, Yunxin Zhao
Boosting systems for LVCSR
George Saon, Hagen Soltau
Incorporating sparse representation phone identification features in automatic speech recognition using exponential families
Vaibhava Goel, Tara N. Sainath, Bhuvana Ramabhadran, Peder Olsen, David Nahamoo, Dimitri Kanevsky
Integrating MLP features and discriminative training in data sampling based ensemble acoustic modeling
Xin Chen, Yunxin Zhao
Semi-supervised training of Gaussian mixture models by conditional entropy minimization
Jui-Ting Huang, Mark Hasegawa-Johnson
A study of irrelevant variability normalization based training and unsupervised online adaptation for LVCSR
Guangchuan Shi, Yu Shi, Qiang Huo
Improvements to generalized discriminative feature transformation for speech recognition
Roger Hsiao, Florian Metze, Tanja Schultz
Parallel training of neural networks for speech recognition
Karel Veselý, Lukáš Burget, František Grézl
The use of sense in unsupervised training of acoustic models for ASR systems
Rita Singh, Benjamin Lambert, Bhiksha Raj
Boosted mixture learning of Gaussian mixture HMMs for speech recognition
Jun Du, Yu Hu, Hui Jiang
On the exploitation of hidden Markov models and linear dynamic models in a hybrid decoder architecture for continuous speech recognition
Volker Leutnant, Reinhold Haeb-Umbach
Context dependent modelling approaches for hybrid speech recognizers
Alberto Abad, Thomas Pellegrini, Isabel Trancoso, João Neto
A regularized discriminative training method of acoustic models derived by minimum relative entropy discrimination
Yotaro Kubo, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi
Decision tree state clustering with word and syllable features
Hank Liao, Chris Alberti, Michiel Bacchiani, Olivier Siohan
A duration modeling technique with incremental speech rate normalization
Hiroshi Fujimura, Takashi Masuko, Mitsuyoshi Tachimori
Long short-term memory networks for noise robust speech recognition
Martin Wöllmer, Yang Sun, Florian Eyben, Björn Schuller
One-model speech recognition and synthesis based on articulatory movement HMMs
Tsuneo Nitta, Takayuki Onoda, Masashi Kimura, Yurie Iribe, Kouichi Katsurada
Acoustic modeling with bootstrap and restructuring for low-resourced languages
Xiaodong Cui, Jian Xue, Pierre L. Dognin, Upendra V. Chaudhari, Bowen Zhou
Lecture speech recognition by combining word graphs of various acoustic models
Tetsuo Kosaka, Keisuke Goto, Takashi Ito, Masaharu Kato
Semi-parametric trajectory modelling using temporally varying feature mapping for speech recognition
Khe Chai Sim, Shilin Liu
Deep-structured hidden conditional random fields for phonetic recognition
Dong Yu, L. Deng
Semi-supervised learning for improved expression of uncertainty in discriminative classifiers
Jonathan Malkin, Jeff Bilmes
Modeling posterior probabilities using the linear exponential family
Peder Olsen, Vaibhava Goel, Charles Micchelli, John R. Hershey
Cross-lingual spoken language understanding from unaligned data using discriminative classification models and machine translation
Fabrice Lefèvre, François Mairesse, Steve Young
Techniques for topic detection based processing in spoken dialog systems
Rajesh Balchandran, Leonid Rachevsky, Bhuvana Ramabhadran, Miroslav Novák
Optimizing spoken dialogue management with fitted value iteration
Senthilkumar Chandramohan, Matthieu Geist, Olivier Pietquin
Natural belief-critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems
F. Jurčíček, B. Thomson, S. Keizer, François Mairesse, M. Gašić, Kai Yu, Steve Young
Is it possible to predict task completion in automated troubleshooters?
Alexander Schmitt, Michael Scholz, Wolfgang Minker, Jackson Liscombe, David Suendermann
Minimally invasive surgery for spoken dialog systems
David Suendermann, Jackson Liscombe, Roberto Pieraccini
New technique to enhance the performance of spoken dialogue systems based on dialogue states-dependent language models and grammatical rules
Ramón López-Cózar, David Griol
A stochastic finite-state transducer approach to spoken dialog management
Lluís-F. Hurtado, Joaquin Planells, Encarna Segarra, Emilio Sanchis, David Griol
Enhanced monitoring tools and online dialogue optimisation merged into a new spoken dialogue system design experience
Romain Laroche, Philippe Bretier, Ghislain Putois
Optimising a handcrafted dialogue system design
Romain Laroche, Ghislain Putois, Philippe Bretier
Utterance selection for speech acts in a cognitive tourguide scenario
Felix Putze, Tanja Schultz
Lexical entrainment of real users in the Let's Go spoken dialog system
Gabriel Parent, Maxine Eskenazi
Combining user intention and error modeling for statistical dialog simulators
Silvia Quarteroni, Meritxell González, Giuseppe Riccardi, Sebastian Varges
Parallel processing of interruptions and feedback in Companions affective dialogue system
Jaakko Hakulinen, Markku Turunen, Raúl Santos de la Camara, Nigel Crook
Dynamic language modeling using Bayesian networks for spoken dialog systems
Antoine Raux, Neville Mehta, Deepak Ramachandran, Rakesh Gupta
Automatic detection of task-incompleted dialog for spoken dialog system based on dialog act n-gram
Sunao Hara, Norihide Kitaoka, Kazuya Takeda
Dialogue act detection in error-prone spoken dialogue systems using partial sentence tree and latent dialogue act matrix
Wei-Bin Liang, Chung-Hsien Wu, Yu-Cheng Hsiao
Detection of hot spots in poster conversations based on reactive tokens of audience
Tatsuya Kawahara, Kouhei Sumi, Zhi-Qiang Chang, Katsuya Takanashi
Psychological evaluation of a group communication activation robot in a party game
Yoichi Matsuyama, Shinya Fujie, Hikaru Taniyama, Tetsunori Kobayashi
Analyzing user utterances in barge-in-able spoken dialogue system for improving identification accuracy
Kyoko Matsuyama, Kazunori Komatani, Ryu Takeda, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno
Pitch similarity in the vicinity of backchannels
Mattias Heldner, Jens Edlund, Julia Hirschberg
A rule-based backchannel prediction model using pitch and pause information
Khiet P. Truong, Ronald Poppe, Dirk Heylen
Detecting categorical perception in continuous discrimination data
Paul Boersma, Kateřina Chládková
The interrelation between the stimulus range and the number of response categories in vowel categorization
Titia Benders, Paola Escudero
The relation between pitch perception preference and emotion identification
Marie Nilsenová, Martijn Goudbeek, Luuk Kempen
Competition in the perception of spoken Japanese words
Takashi Otake, James M. McQueen, Anne Cutler
Influence of musical training on perception of L2 speech
Makiko Sadakata, Lotte van der Zanden, Kaoru Sekiyama
Full body aero-tactile integration in speech perception
Donald Derrick, Bryan Gick
Nucleus position within the intonation phrase: a typological study of English, Czech and Hungarian
Tomáš Duběda, Katalin Mády
Focus-sensitive operator or focus inducer: always and only
Yong-cheol Lee, Satoshi Nambu
F0 declination in English and Mandarin broadcast news speech
Jiahong Yuan, Mark Liberman
Frequency of occurrence effects on pitch accent realisation
Katrin Schweitzer, Michael Walsh, Bernd Möbius, Hinrich Schütze
On the automatic ToBI accent type identification from data
César González-Ferreras, Carlos Vivaracho-Pascual, David Escudero-Mancebo, Valentín Cardeñoso-Payo
AutoBI - a tool for automatic ToBI annotation
Andrew Rosenberg
A classifier-based target cost for unit selection speech synthesis trained on perceptual data
Volker Strom, Simon King
Applying scalable phonetic context similarity in unit selection of concatenative text-to-speech
Wei Zhang, Xiaodong Cui
Speech database reduction method for corpus-based TTS system
Mitsuaki Isogai, Hideyuki Mizuno
Automatic error detection for unit selection speech synthesis using log likelihood ratio based SVM classifier
Heng Lu, Zhen-Hua Ling, Si Wei, Lirong Dai, Ren-Hua Wang
Using robust Viterbi algorithm and HMM-modeling in unit selection TTS to replace units of poor quality
Hanna Silén, Elina Helander, Jani Nurminen, Konsta Koppinen, Moncef Gabbouj
Automatic detection of abnormal stress patterns in unit selection synthesis
Yeon-Jun Kim, Mark C. Beutnagel
Enhancements of Viterbi search for fast unit selection synthesis
Daniel Tihelka, Jiří Kala, Jindřich Matoušek
Accurate pitch marking for prosodic modification of speech segments
Thomas Ewender, Beat Pfister
A novel hybrid approach for Mandarin speech synthesis
Shifeng Pan, Meng Zhang, Jianhua Tao
Modeling liaison in French by using decision trees
Josafá de Jesus Aguiar Pontes, Sadaoki Furui
Improvement on plural unit selection and fusion
Jian Luan, Jian Li
Improving speech synthesis of machine translation output
Alok Parlikar, Alan W. Black, Stephan Vogel
Paraphrase generation to improve text-to-speech synthesis
Ghislain Putois, Jonathan Chevelu, Cédric Boidin
Phone mismatch penalty matrices for two-stage keyword spotting via multi-pass phone recognizer
Chang Woo Han, Shin Jae Kang, Chul Min Lee, Nam Soo Kim
English spoken term detection in multilingual recordings
Petr Motlicek, Fabio Valente, Philip N. Garner
A hybrid approach to robust word lattice generation via acoustic-based word detection
Icksang Han, Chiyoun Park, Jeongmi Cho, Jeongsu Kim
Direct observation of pruning errors (DOPE): a search analysis tool
Volker Steinbiss, Martin Sundermeyer, Hermann Ney
Direct construction of compact context-dependency transducers from data
David Rybach, Michael Riley
Incremental composition of static decoding graphs with label pushing
Miroslav Novák
A novel path extension framework using steady segment detection for Mandarin speech recognition
Zhanlei Yang, Wenju Liu
On the relation of Bayes risk, word error, and word posteriors in ASR
Ralf Schlüter, Markus Nußbaum-Thom, Hermann Ney
Time conditioned search in automatic speech recognition reconsidered
D. Nolden, Hermann Ney, Ralf Schlüter
Efficient data selection for speech recognition based on prior confidence estimation using speech and context independent models
Satoshi Kobashikawa, Taichi Asami, Yoshikazu Yamaguchi, Hirokazu Masataki, Satoshi Takahashi
A novel confidence measure based on marginalization of jointly estimated error cause probabilities
Atsunori Ogawa, Atsushi Nakamura
CRF-based combination of contextual features to improve a posteriori word-level confidence measures
Julien Fayolle, Fabienne Moreau, Christian Raymond, Guillaume Gravier, Patrick Gros
Recognition of spontaneous conversational speech using long short-term memory phoneme predictions
Martin Wöllmer, Florian Eyben, Björn Schuller, Gerhard Rigoll
Improving ASR error detection with non-decoder based features
Thomas Pellegrini, Isabel Trancoso
Phoneme classification and lattice rescoring based on a k-NN approach
Ladan Golipour, Douglas O'Shaughnessy
Online adaptive learning for speech recognition decoding
Jeff Bilmes, Hui Lin
Improvements of search error risk minimization in Viterbi beam search for speech recognition
Takaaki Hori, Shinji Watanabe, Atsushi Nakamura
Evaluation of a silent speech interface based on magnetic sensing
Robin Hofe, Stephen R. Ell, Michael J. Fagan, James M. Gilbert, Phil D. Green, Roger K. Moore, Sergey I. Rybchenko
Advanced speech communication system for deaf people
Rubén San-Segundo, Verónica López, Raquel Martín, Syaheerah Lufti, Javier Ferreiros, Ricardo Córdoba, José Manuel Pardo
Unsupervised acoustic model adaptation for multi-origin non native ASR
Sethserey Sam, Eric Castelli, Laurent Besacier
Speech-based automated cognitive status assessment
Dilek Hakkani-Tür, Dimitra Vergyri, Gokhan Tur
Speech recognition with a seamlessly updated language model for real-time closed-captioning
Toru Imai, Shinichi Homma, Akio Kobayashi, Takahiro Oku, Shoei Sato
The comparison between the deletion-based methods and the mixing-based methods for audio CAPTCHA systems
Takuya Nishimoto, Takayuki Watanabe
Comparing mono- & multilingual acoustic seed models for a low e-resourced language: a case-study of Luxembourgish
Martine Adda-Decker, Lori Lamel, Natalie D. Snoeren
Manipulating tracheoesophageal speech
Rob J. J. H. van Son, Irene Jacobi, Frans Hilgers
Towards mixed language speech recognition systems
David Imseng, Hervé Bourlard, Mathew Magimai Doss
Voice search for development
Etienne Barnard, Johan Schalkwyk, Charl van Heerden, Pedro J. Moreno
Cross-cultural investigation of prosody in verbal feedback in interactional rapport
Gina-Anne Levow, Susan Duncan, Edward T. King
Multimodal speaker diarization using oriented optical flow histograms
Mary Tai Knox, Gerald Friedland
Towards an ASR-free objective analysis of pathological speech
Catherine Middag, Yvan Saeys, Jean-Pierre Martens
Session variability contrasts in the MARP corpus
Keith W. Godin, John H. L. Hansen
Estimation of two-to-one forced selection intelligibility scores by speech recognizers using noise-adapted models
Kazuhiro Kondo, Yusuke Takano
Analysis of gender normalization using MLP and VTLN features
Thomas Schaaf, Florian Metze
Discovering an optimal set of minimally contrasting acoustic speech units: a point of focus for whole-word pattern matching
Guillaume Aimetti, Roger K. Moore, Louis ten Bosch
Improvements to the equal-parameter BIC for speaker diarization
Themos Stafylakis, Xavier Anguera
A multistream multiresolution framework for phoneme recognition
Nima Mesgarani, Samuel Thomas, Hynek Hermansky
Cluster analysis of differential spectral envelopes on emotional speech
Giampiero Salvi, Fabio Tesser, Enrico Zovato, Piero Cosi
Modeling pronunciation variation with context-dependent articulatory feature decision trees
Sam Bowman, Karen Livescu
Ungrounded independent non-negative factor analysis
Bhiksha Raj, Kevin W. Wilson, Alexander Krueger, Reinhold Haeb-Umbach
Signal interaction and the devil function
John R. Hershey, Peder Olsen, Steven J. Rennie
Semi-automated update of automatic transcription system for the Japanese national congress
Yuya Akita, Masato Mimura, Graham Neubig, Tatsuya Kawahara
Language model cross adaptation for LVCSR system combination
Xunying Liu, Mark J. F. Gales, Phil C. Woodland
Large vocabulary continuous speech recognition using WFST-based linear classifier for structured data
Shinji Watanabe, Takaaki Hori, Atsushi Nakamura
Accelerating hierarchical acoustic likelihood computation on graphics processors
Pavel Květoň, Miroslav Novák
Search by voice in Mandarin Chinese
Jiulong Shan, Genqing Wu, Zhihong Hu, Xiliu Tang, Martin Jansche, Pedro J. Moreno
The AMIDA 2009 meeting transcription system
Thomas Hain, Lukáš Burget, John Dines, Philip N. Garner, Asmaa El Hannani, Marijn Huijbregts, Martin Karafiát, Mike Lincoln, Vincent Wan
Simple and efficient speaker comparison using approximate KL divergence
William M. Campbell, Zahi N. Karam
The IIR NIST SRE 2008 and 2010 summed channel speaker recognition systems
Hanwu Sun, Bin Ma, Chien-Lin Huang, Trung Hieu Nguyen, Haizhou Li
Speaker characterization using long-term and temporal information
Chien-Lin Huang, Hanwu Sun, Bin Ma, Haizhou Li
Score-level compensation of extreme speech duration variability in speaker verification
Sergio Perez-Gomez, Daniel Ramos, Javier Gonzalez-Dominguez, Joaquin Gonzalez-Rodriguez
Speaker recognition experiments using connectionist transformation network features
Alberto Abad, Isabel Trancoso
Speaker recognition using supervised probabilistic principal component analysis
Yun Lei, John H. L. Hansen
Looking for relevant features for speaker role recognition
Benjamin Bigot, Julien Pinquier, Isabelle Ferrané, Régine André-Obrecht
Prosodic speaker verification using subspace multinomial models with intersession compensation
Marcel Kockmann, Lukáš Burget, Ondřej Glembek, Luciana Ferrer, Jan Černocký
The estimation and kernel metric of spectral correlation for text-independent speaker verification
Eryu Wang, Kong Aik Lee, Bin Ma, Haizhou Li, Wu Guo, Lirong Dai
Improving monaural speaker identification by double-talk detection
Rahim Saeidi, Pejman Mowlaee, Tomi Kinnunen, Zheng-Hua Tan, Mads Græsbøll Christensen, Søren Holdt Jensen, Pasi Fränti
Exploring subsegmental and suprasegmental features for a text-dependent speaker verification in distant speech signals
B. Avinash, S. Guruprasad, Bayya Yegnanarayana
A fast implementation of factor analysis for speaker verification
Qingsong Liu, Wei Huang, Dongxing Xu, Hongbin Cai, Beiqian Dai
An investigation into direct scoring methods without SVM training in speaker verification
Ce Zhang, Rong Zheng, Bo Xu
Large margin Gaussian mixture models for speaker identification
Reda Jourani, Khalid Daoudi, Régine André-Obrecht, Driss Aboutajdine
On the use of Gaussian component information in the generative likelihood ratio estimation for speaker verification
Rong Zheng, Bo Xu
Acoustic vector resampling for GMMSVM-based speaker verification
Man-Wai Mak, Wei Rao
A fast speaker indexing using vector quantization and second order statistics with adaptive threshold computation
Konstantin Biatov
Using phoneme recognition and text-dependent speaker verification to improve speaker segmentation for Chinese speech
Gang Wang, Xiaojun Wu, Thomas Fang Zheng
On enhancing feature sequence filtering with filter-bank energy transformation in speaker verification with telephone speech
Claudio Garretón, Néstor Becerra Yoma
MAP estimation of subspace transform for speaker recognition
Donglai Zhu, Bin Ma, Kong Aik Lee, Cheung-Chi Leung, Haizhou Li
A longest matching segment approach for text-independent speaker recognition
Ayeh Jafari, Ramji Srinivasan, Danny Crookes, Ji Ming
Approaching human listener accuracy with modern speaker verification
Ville Hautamäki, Tomi Kinnunen, Mohaddeseh Nosratighods, Kong Aik Lee, Bin Ma, Haizhou Li
Extended weighted linear prediction (XLP) analysis of speech and its application to speaker verification in adverse conditions
Jouni Pohjalainen, Rahim Saeidi, Tomi Kinnunen, Paavo Alku
The use of subvector quantization and discrete densities for fast GMM computation for speaker verification
Guoli Ye, Brian Mak
Transcript-dependent speaker recognition using Mixer 1 and 2
Fred S. Richardson, Joseph P. Campbell
On the potential of glottal signatures for speaker recognition
Thomas Drugman, Thierry Dutoit
Acoustic feature diversity and speaker verification
R. Padmanabhan, Hema A. Murthy
A discriminative performance metric for GMM-UBM speaker identification
Omid Dehzangi, Bin Ma, Eng Siong Chng, Haizhou Li
A novel speaker binary key derived from anchor models
Xavier Anguera, Jean-François Bonastre
Variant time-frequency cepstral features for speaker recognition
Wei-Qiang Zhang, Yan Deng, Liang He, Jia Liu
Exploitation of phase information for speaker recognition
Ning Wang, P. C. Ching, Tan Lee
Effects of the phonological relevance in speaker verification
Yanhua Long, Lirong Dai, Bin Ma, Wu Guo
Topological representation of speech for speaker recognition
Gabriel H. Sierra, Jean-François Bonastre, Driss Matrouf, Jose R. Calvo
Assessment of single-channel speech enhancement techniques for speaker identification under mismatched conditions
Seyed Omid Sadjadi, John H. L. Hansen
Speaker recognition using the resynthesized speech via spectrum modeling
Xiang Zhang, Chuan Cao, Lin Yang, Hongbin Suo, Jianping Zhang, Yonghong Yan
A factorial sparse coder model for single channel source separation
Robert Peharz, Michael Stark, Franz Pernkopf, Yannis Stylianou
Oriented PCA method for blind speech separation of convolutive mixtures
Yasmina Benabderrahmane, Sid Ahmed Selouani, Douglas O'Shaughnessy
Online Gaussian process for nonstationary speech separation
Hsin-Lung Hsieh, Jen-Tzung Chien
Convexity and fast speech extraction by split Bregman method
Meng Yu, Wenye Ma, Jack Xin, Stanley Osher
Reducing musical noise in blind source separation by time-domain sparse filters and split Bregman method
Wenye Ma, Meng Yu, Jack Xin, Stanley Osher
Combining monaural and binaural evidence for reverberant speech segregation
John Woodruff, Rohit Prabhavalkar, Eric Fosler-Lussier, DeLiang Wang
Speaker and language adaptive training for HMM-based polyglot speech synthesis
Heiga Zen
Context adaptive training with factorized decision trees for HMM-based speech synthesis
Kai Yu, Heiga Zen, François Mairesse, Steve Young
Roles of the average voice in speaker-adaptive HMM-based speech synthesis
Junichi Yamagishi, Oliver Watts, Simon King, Bela Usabaev
An HMM trajectory tiling (HTT) approach to high quality TTS
Yao Qian, Zhi-Jie Yan, Yijian Wu, Frank K. Soong, Xin Zhuang, Shengyi Kong
A perceptual study of acceleration parameters in HMM-based TTS
Yi-Ning Chen, Zhi-Jie Yan, Frank K. Soong
Evaluation of prosodic contextual factors for HMM-based speech synthesis
Shuji Yokomizo, Takashi Nose, Takao Kobayashi
Sinusoidal model parameterization for HMM-based TTS system
Slava Shechtman, Alex Sorin
Improved training of excitation for HMM-based parametric speech synthesis
Yoshinori Shiga, Tomoki Toda, Shinsuke Sakai, Hisashi Kawai
Excitation modeling based on waveform interpolation for HMM-based speech synthesis
June Sig Sung, Doo Hwa Hong, Kyung Hwan Oh, Nam Soo Kim
Formant-based frequency warping for improving speaker adaptation in HMM TTS
Xin Zhuang, Yao Qian, Frank K. Soong, Yijian Wu, Bo Zhang
Improved modelling of speech dynamics using non-linear formant trajectories for HMM-based speech synthesis
Hongwei Hu, Martin J. Russell
Global variance modeling on the log power spectrum of LSPs for HMM-based speech synthesis
Zhen-Hua Ling, Yu Hu, Lirong Dai
Autoregressive clustering for HMM speech synthesis
Matt Shannon, William Byrne
An implementation of decision tree-based context clustering on graphics processing units
Nicholas Pilkington, Heiga Zen
Quantized HMMs for low footprint text-to-speech synthesis
Alexander Gutkin, Xavi Gonzalvo, Stefan Breuer, Paul Taylor
The role of higher-level linguistic features in HMM-based speech synthesis
Oliver Watts, Junichi Yamagishi, Simon King
HMM-based singing voice synthesis system using pitch-shifted pseudo training data
Ayami Mase, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
An unsupervised approach to creating web audio contents-based HMM voices
Jinfu Ni, Hisashi Kawai
Conversational spontaneous speech synthesis using average voice model
Tomoki Koriyama, Takashi Nose, Takao Kobayashi
Learning words and speech units through natural interactions
Jonas Hörnstein, José Santos-Victor
Bimodal coherence based scale ambiguity cancellation for target speech extraction and enhancement
Qingju Liu, Wenwu Wang, Philip Jackson
Speech estimation in non-stationary noise environments using timing structures between mouth movements and sound signals
Hiroaki Kawashima, Yu Horii, Takashi Matsuyama
Synthesizing photo-real talking head via trajectory-guided sample selection
Lijuan Wang, Xiaojun Qian, Wei Han, Frank K. Soong
Silent vs vocalized articulation for a portable ultrasound-based silent speech interface
Victoria M. Florescu, Lise Crevier-Buchman, Bruce Denby, Thomas Hueber, Antonia Colazo-Simon, Claire Pillot-Loiseau, Pierre Roussel, Cédric Gendrot, Sophie Quattrocchi
Comparison of HMM and TMDN methods for lip synchronisation
Gregor Hofer, Korin Richmond
Rhythm and formant features for automatic alcohol detection
Florian Schiel, Christian Heinrich, Veronika Neumeyer
An exploration of voice source correlates of focus
Irena Yanushevskaya, Christer Gobl, John Kane, Ailbhe Ní Chasaide
Modeling perceived vocal age in American English
James D. Harnsberger, Rahul Shrivastav, W. S. Brown Jr.
Multivariate analysis of vocal fatigue in continuous reading
Marie-José Caraty, Claude Montacié
Frequency-domain delexicalization using surrogate vowels
Alexander Kain, Jan P. H. van Santen
Emotion recognition using imperfect speech recognition
Florian Metze, Anton Batliner, Florian Eyben, Tim Polzehl, Björn Schuller, Stefan Steidl
A novel feature extraction strategy for multi-stream robust emotion identification
Gang Liu, Yun Lei, John H. L. Hansen
Setup for acoustic-visual speech synthesis by concatenating bimodal units
Asterios Toutios, Utpala Musti, Slim Ouni, Vincent Colotte, Brigitte Wrobel-Dautcourt, Marie-Odile Berger
Towards affective state modeling in narrative and conversational settings
Bart Jochems, Martha Larson, Roeland Ordelman, Ronald Poppe, Khiet P. Truong
Detection of anger emotion in dialog speech using prosody feature and temporal relation of utterances
Narichika Nomoto, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi
Gesture and speech coordination: the influence of the relationship between manual gesture and speech
Benjamin Roustan, Marion Dohen
Analysis and detection of cognitive load and frustration in drivers' speech
Hynek Bořil, Seyed Omid Sadjadi, Tristan Kleinschmidt, John H. L. Hansen
Acoustic-based recognition of head gestures accompanying speech
Akira Sasou, Yasuharu Hashimoto, Katsuhiko Sakaue
Multimodal dialog in the car: combining speech and turn-and-push dial to control comfort functions
Sandro Castronovo, Angela Mahr, Margarita Pentcheva, Christian Müller
Hands free audio analysis from home entertainment
Danil Korchagin, Philip N. Garner, Petr Motlicek
Affective story teller: a TTS system for emotional expressivity
Mostafa Al Masum Shaikh, Antonio Rui Ferreira Rebordão, Keikichi Hirose
Enhancing children's speech recognition under mismatched condition by explicit acoustic normalization
Shweta Ghai, Rohit Sinha
Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems
Bo Li, Khe Chai Sim
Augmentation of adaptation data
Ravichander Vipperla, Steve Renals, Joe Frankel
Discriminative adaptation based on fast combination of DMAP and dfMLLR
Lukáš Machlica, Zbyněk Zajíc, Luděk Müller
Revisiting VTLN using linear transformation on conventional MFCC
Doddipatla Rama Sanand, Ralf Schlüter, Hermann Ney
Speaker adaptation based on nonlinear spectral transform for speech recognition
Toyohiro Hayashi, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Speaker adaptation based on system combination using speaker-class models
Tetsuo Kosaka, Takashi Ito, Masaharu Kato, Masaki Kohda
Speaker adaptation in transformation space using two-dimensional PCA
Yongwon Jeong, Young Rok Song, Hyung Soon Kim
On speaker adaptive training of artificial neural networks
Jan Trmal, Jan Zelinka, Luděk Müller
Model synthesis for band-limited speech recognition
Yongjun He, Jiqing Han
Performance estimation of reverberant speech recognition based on reverberant criteria RSR-dn with acoustic parameters
Takahiro Fukumori, Masanori Morise, Takanobu Nishiura
A novel approach for matched reverberant training of HMMs using data pairs
Armin Sehr, Christian Hofmann, Roland Maas, Walter Kellermann
An auditory based modulation spectral feature for reverberant speech recognition
Hari Krishna Maganti, Marco Matassoni
On the potential of channel selection for recognition of reverberated speech with multiple microphones
Martin Wolf, Climent Nadeu
An improved wavelet-based dereverberation for robust automatic speech recognition
Randy Gomez, Tatsuya Kawahara
Methods for robust speech recognition in reverberant environments: a comparison
Rico Petrick, Thomas Fehér, Masashi Unoki, Rüdiger Hoffmann
Integration of multilayer regression analysis with structure-based pronunciation assessment
Masayuki Suzuki, Yu Qiao, Nobuaki Minematsu, Keikichi Hirose
Using non-native error patterns to improve pronunciation verification
Joost van Doremalen, Catia Cucchiarini, Helmer Strik
Regularized-MLLR speaker adaptation for computer-assisted language learning system
Dean Luo, Yu Qiao, Nobuaki Minematsu, Yutaka Yamauchi, Keikichi Hirose
Automatic evaluation of English pronunciation by Japanese speakers using various acoustic features and pattern recognition techniques
Kuniaki Hirabayashi, Seiichi Nakagawa
Decision tree based tone modeling with corrective feedbacks for automatic Mandarin tone assessment
Hsien-Cheng Liao, Jiang-Chun Chen, Sen-Chia Chang, Ying-Hua Guan, Chin-Hui Lee
CASTLE: a computer-assisted stress teaching and learning environment for learners of English as a second language
Jingli Lu, Ruili Wang, Liyanage C. De Silva, Yang Gao, Jia Liu
Automatic reference independent evaluation of prosody quality using multiple knowledge fusions
Shen Huang, Hongyan Li, Shijin Wang, Jiaen Liang, Bo Xu
Landmark-based automated pronunciation error detection
Su-Youn Yoon, Mark Hasegawa-Johnson, Richard Sproat
HMM based TTS for mixed language text
Zhiwei Shuang, Shiyin Kang, Yong Qin, Lirong Dai, Lianhong Cai
An analysis of language mismatch in HMM state mapping-based cross-lingual speaker adaptation
Hui Liang, John Dines
Classroom note-taking system for hearing impaired students using automatic speech recognition adapted to lectures
Tatsuya Kawahara, Norihiro Katsumaru, Yuya Akita, Shinsuke Mori
Exploring web-browser based runtimes engines for creating ubiquitous speech interfaces
Paul R. Dixon, Sadaoki Furui
Efficient three-stage pitch estimation for packet loss concealment
Xuejing Sun, Sameer Gadre
On evaluation of the F0 estimation based on time-varying complex speech analysis
Keiichi Funaki
Pitch estimation in noisy speech based on temporal accumulation of spectrum peaks
Feng Huang, Tan Lee
Multi-pitch estimation by a joint 2-D representation of pitch and pitch dynamics
Tianyu T. Wang, Thomas F. Quatieri
On the effect of fundamental frequency on amplitude and frequency modulation patterns in speech resonances
Pirros Tsiakoulis, Alexandros Potamianos
Pitch determination using autocorrelation function in spectral domain
M. Shahidur Rahman, Tetsuya Shimamura
Chirp complex cepstrum-based decomposition for asynchronous glottal analysis
Thomas Drugman, Thierry Dutoit
Exploiting glottal formant parameters for glottal inverse filtering and parameterization
Alan Ó Cinnéide, David Dorran, Mikel Gainza, Eugene Coyle
Glottal parameters estimation on speech using the zeros of the z-transform
Nicolas Sturmel, Christophe d'Alessandro, Boris Doval
Significance of pitch synchronous analysis for speaker recognition using AANN models
Sri Harish Reddy Mallidi, Kishore Prahallad, Suryakanth V. Gangashetty, Bayya Yegnanarayana
On using voice source measures in automatic gender classification of children's speech
Gang Chen, Xue Feng, Yen-Liang Shue, Abeer Alwan
SAFE: a statistical algorithm for F0 estimation for both clean and noisy speech
Wei Chu, Abeer Alwan
Robust and efficient pitch estimation using an iterative ARMA technique
Jung Ook Hong, Patrick J. Wolfe
Statistical modeling of F0 dynamics in singing voices based on Gaussian processes with multiple oscillation bases
Yasunori Ohishi, Hirokazu Kameoka, Daichi Mochihashi, Hidehisa Nagano, Kunio Kashino
Applying geometric source separation for improved pitch extraction in human-robot interaction
Martin Heckmann, Claudius Gläser, Frank Joublin, Kazuhiro Nakadai
A spectral LF model based approach to voice source parameterisation
John Kane, Mark Kane, Christer Gobl
Glottal-based analysis of the Lombard effect
Thomas Drugman, Thierry Dutoit
Constructing Japanese test collections for spoken term detection
Yoshiaki Itoh, Hiromitsu Nishizaki, Xinhui Hu, Hiroaki Nanjo, Tomoyosi Akiba, Tatsuya Kawahara, Seiichi Nakagawa, Tomoko Matsui, Yoichi Yamashita, Kiyoaki Aikawa
Japanese spoken term detection using syllable transition network derived from multiple speech recognizers' outputs
Satoshi Natori, Hiromitsu Nishizaki, Yoshihiro Sekiguchi
Combining Chinese spoken term detection systems via side-information conditioned linear logistic regression
Sha Meng, Wei-Qiang Zhang, Jia Liu
Metric subspace indexing for fast spoken term detection
Taisuke Kaneko, Tomoyosi Akiba
Unsupervised spoken-term detection with spoken queries using segment-based dynamic time warping
Chun-an Chan, Lin-shan Lee
Contextual verification for open vocabulary spoken term detection
Daniel Schneider, Timo Mertens, Martha Larson, Joachim Köhler
Augmented set of features for confidence estimation in spoken term detection
Javier Tejedor, Doroteo T. Toledano, Miguel Bautista, Simon King, Dong Wang, José Colás
Cluster-based language model for spoken document retrieval using NMF-based document clustering
Xinhui Hu, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura
Asymptotically exact noise-corrupted speech likelihoods
Rogier C. van Dalen, Mark J. F. Gales
A MMSE estimator in mel-cepstral domain for robust large vocabulary automatic speech recognition using uncertainty propagation
Ramón Fernandez Astudillo, Reinhold Orglmeister
Non-negative matrix factorization based compensation of music for automatic speech recognition
Bhiksha Raj, Tuomas Virtanen, Sourish Chaudhuri, Rita Singh
Feature versus model based noise robustness
Kris Demuynck, Xueru Zhang, Dirk Van Compernolle, Hugo Van hamme
SNR-based mask compensation for computational auditory scene analysis applied to speech recognition in a car environment
Ji Hun Park, Seon Man Kim, Jae Sam Yoon, Hong Kook Kim, Sung Joo Lee, Yunkeun Lee
Automatic selection of thresholds for signal separation algorithms based on interaural delay
Chanwoo Kim, Richard M. Stern, Kiwan Eom, Jaewon Lee
Channel detectors for system fusion in the context of NIST LRE 2009
Florian Verdet, Driss Matrouf, Jean-François Bonastre, Jean Hennebert
Selecting phonotactic features for language recognition
Rong Tong, Bin Ma, Haizhou Li, Eng Siong Chng
Improved language recognition using mixture components statistics
Abualsoud Hanani, Michael Carey, Martin J. Russell
Using cross-decoder co-occurrences of phone n-grams in SVM-based phonotactic language recognition
Mikel Penagarikano, Amparo Varona, Luis Javier Rodriguez-Fuentes, German Bordel
Exploiting variety-dependent phones in Portuguese variety identification applied to broadcast news transcription
Oscar Koller, Alberto Abad, Isabel Trancoso, Céu Viana
Dialect recognition using a phone-GMM-supervector-based SVM kernel
Fadi Biadsy, Julia Hirschberg, Michael Collins
Discriminative acoustic model for improving mispronunciation detection and diagnosis in computer-aided pronunciation training (CAPT)
Xiaojun Qian, Frank K. Soong, Helen Meng
Automatic pronunciation scoring using learning to rank and DP-based score segmentation
Liang-Yu Chen, Jyh-Shing Roger Jang
Automatic derivation of phonological rules for mispronunciation detection in a computer-assisted pronunciation training system
Wai-Kit Lo, Shuang Zhang, Helen Meng
Adapting a duration synthesis model to rate children's oral reading prosody
Minh Duong, Jack Mostow
Predicting word accuracy for the automatic speech recognition of non-native speech
Su-Youn Yoon, Lei Chen, Klaus Zechner
A new approach for automatic tone error detection in strong accented Mandarin based on dominant set
Taotao Zhu, Dengfeng Ke, Zhenbiao Chen, Bo Xu
Analysis of excitation source information in emotional speech
S. R. M. Prasanna, D. Govind
Acoustic feature analysis in speech emotion primitives estimation
Dongrui Wu, Thomas D. Parsons, Shrikanth S. Narayanan
Spectro-temporal modulations for robust speech emotion recognition
Lan-Ying Yeh, Tai-Shih Chi
Quantification of prosodic entrainment in affective spontaneous spoken interactions of married couples
Chi-Chun Lee, Matthew Black, Athanasios Katsamanis, Adam C. Lammert, Brian R. Baucom, Andrew Christensen, Panayiotis G. Georgiou, Shrikanth S. Narayanan
A cluster-profile representation of emotion using agglomerative hierarchical clustering
Emily Mower, Kyu J. Han, Sungbok Lee, Shrikanth S. Narayanan
Incremental acoustic valence recognition: an inter-corpus perspective on features, matching, and performance in a gating paradigm
Björn Schuller, Laurence Devillers
Mandarin digit recognition assisted by selective tone distinction
Xiao-Dong Wang, Kunihiko Owa, Makoto Shozakai
Brazilian Portuguese acoustic model training based on data borrowing from other language
Kazuhiko Abe, Sakriani Sakti, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura
Rapid bootstrapping of five Eastern European languages using the Rapid Language Adaptation Toolkit
Ngoc Thang Vu, Tim Schlippe, Franziska Kraus, Tanja Schultz
Cross-lingual speaker adaptation via Gaussian component mapping
Houwei Cao, Tan Lee, P. C. Ching
Cross-lingual acoustic modeling for dialectal Arabic speech recognition
Mohamed Elmahdy, Rainer Gruhn, Wolfgang Minker, Slim Abdennadher
Cross-lingual and multi-stream posterior features for low resource LVCSR systems
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky
Latent perceptual mapping: a new acoustic modeling framework for speech recognition
Shiva Sundaram, Jerome R. Bellegarda
Unsupervised model adaptation on targeted speech segments for LVCSR system combination
Richard Dufour, Fethi Bougares, Yannick Estève, Paul Deléglise
Incremental word learning using large-margin discriminative training and variance floor estimation
Irene Ayllón Clemente, Martin Heckmann, Alexander Denecke, Britta Wrede, Christian Goerick
State-based labelling for a sparse representation of speech and its application to robust speech recognition
Tuomas Virtanen, Jort F. Gemmeke, Antti Hurmalainen
Similarity scoring for recognizing repeated out-of-vocabulary words
Mirko Hannemann, Stefan Kombrink, Martin Karafiát, Lukáš Burget
Data pruning for template-based automatic speech recognition
Dino Seppi, Dirk Van Compernolle
Improved topic classification and keyword discovery using an HMM-based speech recognizer trained without supervision
Man-Hung Siu, Herbert Gish, Arthur Chan, William Belfield
An analysis of sparseness and regularization in exemplar-based methods for speech classification
Dimitri Kanevsky, Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo
Investigation of full-sequence training of deep belief networks for speech recognition
Abdel-rahman Mohamed, Dong Yu, L. Deng
Mandarin tone recognition using affine-invariant prosodic features and tone posteriorgram
Yow-Bang Wang, Lin-shan Lee
Continuous speech recognition with a TF-IDF acoustic model
Geoffrey Zweig, Patrick Nguyen, Jasha Droppo, Alex Acero
SCARF: a segmental conditional random field toolkit for speech recognition
Geoffrey Zweig, Patrick Nguyen
Speaking style dependency of formant targets
Akiko Amano-Kusumoto, John-Paul Hosom, Alexander Kain
Similarity of effects of emotions on the speech organ configuration with and without speaking
Tatsuya Kitamura
A study of intra-speaker and inter-speaker affective variability using electroglottograph and inverse filtered glottal waveforms
Daniel Bone, Samuel Kim, Sungbok Lee, Shrikanth S. Narayanan
Modal analysis of vocal fold vibrations using laryngotopography
Ken-Ichi Sakakibara, Hiroshi Imagawa, Miwako Kimura, Hisayuki Yokonishi, Niro Tayama
Laryngeal voice quality in the expression of focus
Martti Vainio, Matti Airas, Juhani Järvikivi, Paavo Alku
Laryngeal characteristics during the production of geminate consonants
Masako Fujimoto, Kikuo Maekawa, Seiya Funatsu
Numerical study of turbulent flow-induced sound production in presence of a tooth-shaped obstacle: towards sibilant [s] physical modeling
Julien Cisonni, Kazunori Nozaki, Annemie Van Hirtum, Shigeo Wada
Morphological and predictability effects on schwa reduction: the case of Dutch word-initial syllables
Iris Hanique, Barbara Schuppler, Mirjam Ernestus
Acoustic-to-articulatory inversion based on local regression
Samer Al Moubayed, G. Ananthakrishnan
Korean lenis, fortis, and aspirated stops: effect of place of articulation on acoustic realization
Mirjam Broersma
Speech synthesis by modeling harmonics structure with multiple function
Toru Nakashika, Ryuki Tachibana, Masafumi Nishimura, Tetsuya Takiguchi, Yasuo Ariki
Physics of body-conducted silent speech - production, propagation and representation of non-audible murmur
Makoto Otani, Tatsuya Hirahara
Multichannel noise reduction using low order RTF estimate
Subhojit Chakladar, Nam Soo Kim, Yu Gwang Jin, Tae Gyoon Kang
Reinforced blocking matrix with cross channel projection for speech enhancement
Inho Lee, Jongsung Yoon, Yoonjae Lee, Hanseok Ko
Masking property based microphone array post-filter design
Ning Cheng, Wenju Liu, Lan Wang
Reduction of broadband noise in speech signals by multilinear subspace analysis
Yusuke Sato, Tetsuya Hoya, Hovagim Bakardjian, Andrzej Cichocki
Novel probabilistic control of noise reduction for improved microphone array beamforming
Jungpyo Hong, Seungho Han, Sangbae Jeong, Minsoo Hahn
Speech enhancement using improved generalized sidelobe canceller in frequency domain with multi-channel postfiltering
Kai Li, Qiang Fu, Yonghong Yan
Close speaker cancellation for suppression of non-stationary background noise for hands-free speech interface
Jani Even, Carlos Ishi, Hiroshi Saruwatari, Norihiro Hagita
Multi-channel iterative dereverberation based on codebook constrained iterative multi-channel Wiener filter
Ajay Srinivasamurthy, Thippur V. Sreenivas
Speaker-dependent mapping of source and system features for enhancement of throat microphone speech
Anand Joseph Xavier Medabalimi, Sri Harish Reddy Mallidi, Bayya Yegnanarayana
An analytic modeling approach to enhancing throat microphone speech commands for keyword spotting
Jun Cai, Stefano Marini, Pierre Malarme, Francis Grenez, Jean Schoentgen
Single-channel speech enhancement using Kalman filtering in the modulation domain
Stephen So, Kamil K. Wójcicki, Kuldip K. Paliwal
Integrated feedback and noise reduction algorithm in digital hearing aids via oscillation detection
Miao Yao, Weiqian Liang
A blind signal-to-noise ratio estimator for high noise speech recordings
Charles Mercier, Roch Lefebvre
Estimation of glottal area function using stereo-endoscopic high-speed digital imaging
Hiroshi Imagawa, Ken-Ichi Sakakibara, Isao T. Tokuda, Mamiko Otsuka, Niro Tayama
Toward aero-acoustical analysis of the sibilant /s/: an oral cavity modeling
Kazunori Nozaki, Youhei Ohnishi, Takeshi Suda, Shigeo Wada, Shinji Shimojo
Effects of wall impedance on transmission and attenuation of higher-order modes in vocal-tract model
Kunitoshi Motoki
Articulatory synthesis and perception of plosive-vowel syllables with virtual consonant targets
Peter Birkholz, Bernd J. Kröger, Christiane Neuschaefer-Rube
Speech robot mimicking human articulatory motion
Kotaro Fukui, Toshihiro Kusano, Yoshikazu Mukaeda, Yuto Suzuki, Atsuo Takanishi, Masaaki Honda
Mechanical vocal-tract models for speech dynamics
Takayuki Arai
Prosodic timing analysis for articulatory re-synthesis using a bank of resonators with an adaptive oscillator
Michael C. Brady
Decoding with shrinkage-based language models
Ahmad Emami, Stanley F. Chen, Abraham Ittycheriah, Hagen Soltau, Bing Zhao
Enhanced word classing for model M
Stanley F. Chen, Stephen M. Chu
Improved neural network based language modelling and adaptation
Junho Park, Xunying Liu, Mark J. F. Gales, Phil C. Woodland
Recurrent neural network based language model
Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černocký, Sanjeev Khudanpur
Discriminative language modeling using simulated ASR errors
Preethi Jyothi, Eric Fosler-Lussier
Learning a language model from continuous speech
Graham Neubig, Masato Mimura, Shinsuke Mori, Tatsuya Kawahara
Fast converging iterative Kalman filtering for speech enhancement using long and overlapped tapered windows with large side lobe attenuation
Stephen So, Kuldip K. Paliwal
Robust noise estimation using minimum correction with harmonicity control
Xuejing Sun, Kuan-Chieh Yen, Rogerio Alves
New insights into subspace noise tracking
Mahdi Triki
Bias considerations for minimum subspace noise tracking
Mahdi Triki, Kees Janse
A corpus-based approach to speech enhancement from nonstationary noise
Ji Ming, Ramji Srinivasan, Danny Crookes
Bandwidth expansion of speech based on wavelet transform modulus maxima vector mapping
Zhe Chen, You-Chi Cheng, Fuliang Yin, Chin-Hui Lee
Hidden Markov models with context-sensitive observations for grapheme-to-phoneme conversion
Kalu U. Ogbureke, Peter Cahill, Julie Carson-Berndsen
Evaluating a dialog language generation system: comparing the MOUNTAIN system to other NLG approaches
Brian Langner, Stephan Vogel, Alan W. Black
Active appearance models for photorealistic visual speech synthesis
Wesley Mattheyses, Lukas Latacz, Werner Verhelst
Latent affective mapping: a novel framework for the data-driven analysis of emotion in text
Jerome R. Bellegarda
Native and non-native speaker judgements on the quality of synthesized speech
Anna C. Janska, Robert A. J. Clark
Machine learning for text selection with expressive unit-selection voices
Dominic Espinosa, Michael White, Eric Fosler-Lussier, Chris Brew
Acoustic correlates of meaning structure in conversational speech
Alexei V. Ivanov, Giuseppe Riccardi, S. Ghosh, S. Tonelli, E. A. Stepanov
HMM-based prosodic structure model using rich linguistic context
Nicolas Obin, Xavier Rodet, Anne Lacheret
Audiovisual congruence and pragmatic focus marking
Charlotte Wollermann, Bernhard Schröder, Ulrich Schade
Redescribing intonational categories with functional data analysis
Margaret Zellers, Michele Gubian, Brechtje Post
Exploring goodness of prosody by diverse matching templates
Shen Huang, Hongyan Li, Shijin Wang, Jiaen Liang, Bo Xu
A language-identification inspired method for spontaneous speech detection
Mickael Rouvier, Richard Dufour, Georges Linarès, Yannick Estève
Speech dominoes and phonetic convergence
Gérard Bailly, Amélie Lelong
A quick sequential forward floating feature selection algorithm for emotion detection from speech
Mátyás Brendel, Riccardo Zaccarelli, Laurence Devillers
Automated vocal emotion recognition using phoneme class specific features
Géza Kiss, Jan P. H. van Santen
Feature selection for pose invariant lip biometrics
Adrian Pass, Jianguo Zhang, Darryl Stewart
Signal-based accent and phrase marking using the Fujisaki model
Hussein Hussein, Rüdiger Hoffmann
A study of interplay between articulatory movement and prosodic characteristics in emotional speech production
Jangwon Kim, Sungbok Lee, Shrikanth S. Narayanan
Improved phoneme recognition by integrating evidence from spectro-temporal and cepstral features
Shang-wen Li, Liang-che Sun, Lin-shan Lee
Using spectro-temporal features to improve AFE feature extraction for ASR
Suman V. Ravuri, Nelson Morgan
Using harmonic phase information to improve ASR rate
Ibon Saratxaga, Inma Hernáez, Igor Odriozola, Eva Navas, Iker Luengo, Daniel Erro
Speech recognition using long-term phase information
Kazumasa Yamamoto, Eiichi Sueyoshi, Seiichi Nakagawa
Low-dimensional space transforms of posteriors in speech recognition
Jan Zelinka, Jan Trmal, Luděk Müller
Hierarchical bottle neck features for LVCSR
Christian Plahl, Ralf Schlüter, Hermann Ney
Hierarchical neural net architectures for feature extraction in ASR
František Grézl, Martin Karafiát
Mutual information analysis for feature and sensor subset selection in surface electromyography based speech recognition
Vivek Kumar Rangarajan Sridhar, Rohit Prasad, Prem Natarajan
Learning from human errors: prediction of phoneme confusions based on modified ASR training
Bernd T. Meyer, Birger Kollmeier
Hidden logistic linear regression for support vector machine based phone verification
Bo Li, Khe Chai Sim
Jointly optimized discriminative features for speech recognition
Tim Ng, Bing Zhang, Long Nguyen
Invariant integration features combined with speaker-adaptation methods
Florian Müller, Alfred Mertins
Multi resolution discriminative models for subvocalic speech recognition
Mark Raugas, Vivek Kumar Rangarajan Sridhar, Rohit Prasad, Prem Natarajan
A comparative large scale study of MLP features for Mandarin ASR
Fabio Valente, Mathew Magimai Doss, Christian Plahl, Suman V. Ravuri, Wen Wang
Recognizing cochlear implant-like spectrally reduced speech with HMM-based ASR: experiments with MFCCs and PLP coefficients
Cong-Thanh Do, Dominique Pastor, Gaël Le Lan, André Goalic
Speech intelligibility of diagonally localized speech with competing noise using bone-conduction headphones
Kazuhiro Kondo, Takayuki Kanda, Yosuke Kobayashi, Hiroyuki Yagyu
Masking of vowel-analog transitions by vowel-analog distracters
Pierre L. Divenyi
2010, a speech oddity: phonetic transcription of reversed speech
François Pellegrino, Emmanuel Ferragne, Fanny Meunier
Perception on pitch reset at discourse boundaries
Hsin-Yi Lin, Janice Fon
Effect of spatial separation on speech-in-noise comprehension in dyslexic adults
Marjorie Dole, Michel Hoen, Fanny Meunier
Speech categorization context effects in seven- to nine-month-old infants
Ellen Marklund, Francisco Lacerda, Anna Ericsson
Changes in temporal processing of speech across the adult lifespan
Diane Kewley-Port, Larry E. Humes, Daniel Fogerty
Fluency and structural complexity as predictors of L2 oral proficiency
Jared Bernstein, Jian Cheng, Masanori Suzuki
Semantic facilitation in bilingual everyday speech comprehension
Marco van de Ven, Benjamin V. Tucker, Mirjam Ernestus
L2 experience and non-native vowel categorization of L1-Mandarin speakers
Bo-ren Hsieh, Ho-hsien Pan
Cross-lingual talker discrimination
Mirjam Wester
Dajare is not the lowest form of wit
Takashi Otake
Comparison of methods for topic classification in a speech-oriented guidance system
Rafael Torres, Shota Takeuchi, Hiromichi Kawanami, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano
Using dependency parsing and machine learning for factoid question answering on spoken documents
Pere R. Comas, Jordi Turmo, Lluís Màrquez
A spoken term detection framework for recovering out-of-vocabulary words using the web
Carolina Parada, Abhinav Sethy, Mark Dredze, Frederick Jelinek
Improved spoken term detection by discriminative training of acoustic models based on user relevance feedback
Hung-yi Lee, Chia-ping Chen, Ching-feng Yeh, Lin-shan Lee
A lightweight keyword and tag-cloud retrieval algorithm for automatic speech recognition transcripts
Sebastian Tschöpel, Daniel Schneider
Lecture subtopic retrieval by retrieval keyword expansion using subordinate concept
Noboru Kanedera, Tetsuo Funada, Seiichi Nakagawa
Spoken document retrieval for oral presentations integrating global document similarities into local document similarities
Hiroaki Nanjo, Yusuke Iyonaga, Takehiko Yoshimi
Combining word-based features, statistical language models, and parsing for named entity recognition
Joseph Polifroni, Stephanie Seneff
Efficient combined approach for named entity recognition in spoken language
Azeddine Zidouni, Sophie Rosset, Hervé Glotin
Prominence based scoring of speech segments for automatic speech-to-speech summarization
Sree Harsha Yella, Vasudeva Varma, Kishore Prahallad
Maximum lexical cohesion for fine-grained news story segmentation
Zihan Liu, Lei Xie, Wei Feng
Phoneme lattice based texttiling towards multilingual story segmentation
Xiaoxuan Wang, Lei Xie, Bin Ma, Eng Siong Chng, Haizhou Li
The characterization of the relative information content by spectral features for the objective intelligibility assessment of nonlinearly processed speech
Anton Schlesinger, Marinus M. Boone
Analytical assessment and distance modeling of speech transmission quality
Marcel Wältermann, Alexander Raake, Sebastian Möller
An intrusive super-wideband speech quality model: DIAL
Nicolas Côté, Vincent Koehl, Valérie Gautier-Turbin, Alexander Raake, Sebastian Möller
It takes two to tango - assessing the impact of delay on conversational interactivity on perceived speech quality
Sebastian Egger, Raimund Schatz, Stefan Scherer
Comparison of approaches for instrumentally predicting the quality of text-to-speech systems
Sebastian Möller, Florian Hinterleitner, Tiago H. Falk, Tim Polzehl
A hybrid architecture for mobile voice user interfaces
Imre Kiss, Joseph Polifroni, Chao Wang, Ghinwa Choueiter, Mike Phillips
Assessment of spoken and multimodal applications: lessons learned from laboratory and field studies
Markku Turunen, Jaakko Hakulinen, Tomi Heimonen
Improving cross database prediction of dialogue quality using mixture of experts
Klaus-Peter Engelbrecht, Hamed Ketabdar, Sebastian Möller
Improving ASR-based topic segmentation of TV programs with confidence measures and semantic relations
Camille Guinaudeau, Guillaume Gravier, Pascale Sébillot
The relevance of timing, pauses and overlaps in dialogues: detecting topic changes in scenario based meetings
Saturnino Luz, Jing Su
Semi-supervised part-of-speech tagging in speech applications
Richard Dufour, Benoit Favre
Memory-based active learning for French broadcast news
Frédéric Tantini, Christophe Cerisara, Claire Gardent
Can conversational word usage be used to predict speaker demographics?
Dan Gillick
Prosodic word-based error correction in speech recognition using prosodic word expansion and contextual information
Chao-Hong Liu, Chung-Hsien Wu
Fully automatic segmentation for prosodic speech corpora
Sarah Hoffmann, Beat Pfister
A novel text-independent phonetic segmentation algorithm based on the microcanonical multiscale formalism
Vahid Khanagha, Khalid Daoudi, Oriol Pont, Hussein Yahia
Phone boundary detection using sample-based acoustic parameters
You-Yu Lin, Yih-Ru Wang, Yuan-Fu Liao
HMM-based automatic visual speech segmentation using facial data
Utpala Musti, Asterios Toutios, Slim Ouni, Vincent Colotte, Brigitte Wrobel-Dautcourt, Marie-Odile Berger
Bayes factor based speaker segmentation for speaker diarization
D. Wang, Robert Vogt, Sridha Sridharan
Using high-level information to detect key audio events in a tennis game
Qiang Huang, Stephen Cox
What do you mean, you're uncertain?: the interpretation of cue words and rising intonation in dialogue
Catherine Lai
Coping imbalanced prosodic unit boundary detection with linguistically-motivated prosodic features
Yi-Fen Liu, Shu-Chuan Tseng, Jyh-Shing Roger Jang, C.-H. Alvin Chen
Improving prosodic phrase prediction by unsupervised adaptation and syntactic features extraction
Zhigang Chen, Guoping Hu, Wei Jiang
Perception-based automatic approximation of F0 contours in Cantonese speech
Yujia Li, Tan Lee
Discriminative training and unsupervised adaptation for labeling prosodic events with limited training data
Raul Fernandez, Bhuvana Ramabhadran
Prosody for the eyes: quantifying visual prosody using guided principal component analysis
Erin Cvejic, Jeesun Kim, Chris Davis, Guillaume Gibert
Parallel lexical-tree based LVCSR on multi-core processors
Naveen Parihar, Ralf Schlüter, David Rybach, Eric A. Hansen
Exploring recognition network representations for efficient speech inference on highly parallel platforms
Jike Chong, Ekaterina Gonina, Kisun You, Kurt Keutzer
WFST compression for automatic speech recognition
Diamantino Caseiro
Speech recognizer optimization under speed constraints
Ivan Bulyko
The 2010 CMU GALE speech-to-text system
Florian Metze, Roger Hsiao, Qin Jin, Udhyakumar Nallasamy, Tanja Schultz
Speaker diarization in meeting audio for single distant microphone
Tin Lay Nwe, Hanwu Sun, Bin Ma, Haizhou Li
Extending the punctuation module for European Portuguese
Fernando Batista, Helena Moniz, Isabel Trancoso, Hugo Meinedo, Ana Isabel Mata, Nuno Mamede
Utilizing a noisy-channel approach for Korean LVCSR
Sakriani Sakti, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura
The RWTH 2009 Quaero ASR evaluation system for English and German
Markus Nußbaum-Thom, Simon Wiesler, Martin Sundermeyer, Christian Plahl, Stefan Hahn, Ralf Schlüter, Hermann Ney
When is indexical information about speech activated? evidence from a cross-modal priming experiment
Benjamin Munson, Renata Solum
The influence of actual and perceived sexual orientation on diadochokinetic rate in women and men
Benjamin Munson
Laryngealization and features for Chinese tonal recognition
Kristine M. Yu
Production and perception of Vietnamese short vowels in V1V2 context
Viet Son Nguyen, Eric Castelli, René Carré
Measuring basic tempo across languages and some implications for speech rhythm
Gertraud Fenk-Oczlon, August Fenk
Durational structure of Japanese single/geminate stops in three- and four-mora words spoken at varied rates
Yukari Hirata, Shigeaki Amano
Distribution and trichotomic realization of voiced velars in Japanese - an experimental study
Shin-ichiro Sano, Tomohiko Ooigawa
Specification in context - devoicing processes in Polish, French, American English and German sonorants
Jagoda Sieczkowska, Bernd Möbius, Grzegorz Dogil
Phonetic imitation of Japanese vowel devoicing
Kuniko Nielsen
Post-aspiration in standard Italian: some first cross-regional acoustic evidence
Mary Stevens, John Hajek
Articulatory grounding of southern Salentino harmony processes
Mirko Grimaldi, Andrea Calabrese, Francesco Sigona, Luigina Garrapa, Bianca Sisinni
Effects of accent typicality and phonotactic frequency on nonword immediate serial recall performance in Japanese
Yuuki Tanida, Taiji Ueno, Satoru Saito, Matthew A. Lambon Ralph
How abstract is phonetics?
Osamu Fujimura
Data-driven analysis of realtime vocal tract MRI using correlated image regions
Adam C. Lammert, Michael I. Proctor, Shrikanth S. Narayanan
Rapid semi-automatic segmentation of real-time magnetic resonance images for parametric vocal tract analysis
Michael I. Proctor, Daniel Bone, Athanasios Katsamanis, Shrikanth S. Narayanan
Improved real-time MRI of oral-velar coordination using a golden-ratio spiral view order
Yoon-Chul Kim, Shrikanth S. Narayanan, Krishna S. Nayak
Statistical multi-stream modeling of real-time MRI articulatory speech data
Erik Bresch, Athanasios Katsamanis, Louis Goldstein, Shrikanth S. Narayanan
Predicting unseen articulations from multi-speaker articulatory models
G. Ananthakrishnan, Pierre Badin, Julián Andrés Valdés Vargas, Olov Engwall
Estimating missing data sequences in x-ray microbeam recordings
Chao Qin, Miguel Á. Carreira-Perpiñán
Adaptation of a tongue shape model by local feature transformations
Chao Qin, Miguel Á. Carreira-Perpiñán, Mohsen Farhadloo
Vocal tract contour analysis of emotional speech by the functional data curve representation
Sungbok Lee, Shrikanth S. Narayanan
Locally-weighted regression for estimating the forward kinematics of a geometric vocal tract model
Adam C. Lammert, Louis Goldstein, Khalil Iskarous
Identifying articulatory goals from kinematic data using principal differential analysis
Michael Reimer, Frank Rudzicz
Estimation of speech lip features from discrete cosinus transform
Zuheng Ming, Denis Beautemps, Gang Feng, Sébastien Schmerber
Autoregressive modelling for linear prediction of ultrasonic speech
Farzaneh Ahmadi, Ian V. McLoughlin, Hamid R. Sharifzadeh
Enhanced speech yielding higher intelligibility for all listeners and environments
Takayuki Arai, Nao Hodoshima
Quality conversion of non-acoustic signals for facilitating human-to-human speech communication under harsh acoustic conditions
Seyed Omid Sadjadi, Sanjay A. Patil, John H. L. Hansen
The use of air-pressure sensor in electrolaryngeal speech enhancement based on statistical voice conversion
Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano
A new binary mask based on noise constraints for improved speech intelligibility
Gibak Kim, Philipos C. Loizou
Energy reallocation strategies for speech enhancement in known noise conditions
Yan Tang, Martin Cooke
Effects of enhancement of spectral changes on speech quality and subjective speech intelligibility
Jing Chen, Thomas Baer, Brian C. J. Moore
Prior information for rapid speaker adaptation
Catherine Breslin, K. K. Chin, Mark J. F. Gales, Kate Knill, Haitian Xu
Discriminative adaptation for log-linear acoustic models
Jonas Lööf, Ralf Schlüter, Hermann Ney
Automatic speech recognition of multiple accented English data
Dimitra Vergyri, Lori Lamel, Jean-Luc Gauvain
Shrinkage model adaptation in automatic speech recognition
Jinyu Li, Yu Tsao, Chin-Hui Lee
Unscented transform with online distortion estimation for HMM adaptation
Jinyu Li, Dong Yu, Yifan Gong, L. Deng
HMM adaptation using linear spline interpolation with integrated spline parameter training for robust speech recognition
Michael L. Seltzer, Alex Acero
CRF-based stochastic pronunciation modeling for out-of-vocabulary spoken term detection
Dong Wang, Simon King, Nicholas Evans, Raphaël Troncy
Improved spoken term detection by feature space pseudo-relevance feedback
Chia-ping Chen, Hung-yi Lee, Ching-feng Yeh, Lin-shan Lee
Towards spoken term discovery at scale with zero resources
Aren Jansen, Kenneth Church, Hynek Hermansky
Vocabulary independent spoken query: a case for subword units
Evandro Gouvêa, Tony Ezzat
Extractive speech summarization - from the view of decision theory
Shih-Hsiang Lin, Yao-Ming Yeh, Berlin Chen
The impact of ASR on abstractive vs. extractive meeting summaries
Gabriel Murray, Giuseppe Carenini, Raymond Ng
Binary coding of speech spectrograms using a deep auto-encoder
L. Deng, Michael L. Seltzer, Dong Yu, Alex Acero, Abdel-rahman Mohamed, G. Hinton
A super-resolution spectrogram using coupled PLCA
Juhan Nam, Gautham J. Mysore, Joachim Ganseman, Kyogu Lee, Jonathan S. Abel
Fast least-squares solution for sinusoidal, harmonic and quasi-harmonic models
Georgios Tzedakis, Yannis Pantazis, Olivier Rosec, Yannis Stylianou
Sparse component analysis for speech recognition in multi-speaker environment
Afsaneh Asaei, Hervé Bourlard, Philip N. Garner
Intra-frame variability as a predictor of frame classifiability
Trond Skogstad, Torbjørn Svendsen
Autocorrelation and double autocorrelation based spectral representations for a noisy word recognition system
Tetsuya Shimamura, Ngoc Dinh Nguyen
Maximum a posteriori voice conversion using sequential Monte Carlo methods
Elina Helander, Hanna Silén, Joaquin Míguez, Moncef Gabbouj
Dynamic model selection for spectral voice conversion
Pierre Lanchantin, Xavier Rodet
Speaker-independent HMM-based voice conversion using quantized fundamental frequency
Takashi Nose, Takao Kobayashi
Probabilistic integration of joint density model and speaker model for voice conversion
Daisuke Saito, Shinji Watanabe, Atsushi Nakamura, Nobuaki Minematsu
Text-independent F0 transformation with non-parallel data for voice conversion
Zhi-Zheng Wu, Tomi Kinnunen, Eng Siong Chng, Haizhou Li
A minimum converted trajectory error (MCTE) approach to high quality speech-to-lips conversion
Xiaodan Zhuang, Lijuan Wang, Frank K. Soong, Mark Hasegawa-Johnson
Influence of lexical tones on intonation in Kammu
Anastasia Karlsson, David House, Jan-Olof Svantesson, Damrong Tayanin
Phonetic realization of second occurrence focus in Japanese
Satoshi Nambu, Yong-cheol Lee
Prosodic grouping and relative clause disambiguation in Mandarin
Jianjing Kuang
Text-based unstressed syllable prediction in Mandarin
Ya Li, Jianhua Tao, Meng Zhang, Shifeng Pan, Xiaoying Xu
Flat pitch accents in Czech
Tomáš Duběda
Positional variability of pitch accents in Czech
Tomáš Duběda
Modeling of sentence-medial pauses in Bangla readout speech: occurrence and duration
Shyamal Das Mandal, Arup Saha, Tulika Basu, Keikichi Hirose, Hiroya Fujisaki
Declarative sentence intonation patterns in 8 Swiss German dialects
Adrian Leemann, Lucy Zuberbühler
Syllable-level prominence detection with acoustic evidence
Je Hun Jeon, Yang Liu
Prosody cues for classification of the discourse particle "hã" in Hindi
Sankalan Prasad, Kalika Bali
Interaction of syntax-marked focus and wh-question induced focus in Standard Chinese
Yuan Jia, Aijun Li
Prominence detection in Swedish using syllable correlates
Samer Al Moubayed, Jonas Beskow
Automatic analysis of the intonation of a tone language. applying the Momel algorithm to spontaneous Standard Chinese (Beijing)
Na Zhi, Daniel Hirst, Pier Marco Bertinetto
Towards long-range prosodic attribute modeling for language recognition
Raymond W. M. Ng, Cheung-Chi Leung, Ville Hautamäki, Tan Lee, Bin Ma, Haizhou Li
A modified parameterization of the Fujisaki model
Robert Schubert, Oliver Jokisch, Diane Hirschfeld
Within and across sentence boundary language model
Saeedeh Momtazi, Friedrich Faubel, Dietrich Klakow
Impact of word classing on shrinkage-based language models
Ruhi Sarikaya, Stanley F. Chen, Abhinav Sethy, Bhuvana Ramabhadran
Combination of probabilistic and possibilistic language models
Stanislas Oger, Vladimir Popescu, Georges Linarès
On-demand language model interpolation for mobile speech input
Brandon Ballinger, Cyril Allauzen, Alexander Gruenstein, Johan Schalkwyk
Text normalization based on statistical machine translation and internet user support
Tim Schlippe, Chenfei Zhu, Jan Gebhardt, Tanja Schultz
Efficient estimation of maximum entropy language models with n-gram features: an SRILM extension
Tanel Alumäe, Mikko Kurimo
Similar n-gram language model
Christian Gillot, Christophe Cerisara, David Langlois, Jean-Paul Haton
Topic and style-adapted language modeling for Thai broadcast news ASR
Markpong Jongtaveesataporn, Sadaoki Furui
Augmented context features for Arabic speech recognition
Ahmad Emami, Hong-Kwang J. Kuo, Imed Zitouni, Lidia Mangu
A statistical segment-based approach for spoken language understanding
Lucía Ortega, Isabel Galiano, Lluís-F. Hurtado, Emilio Sanchis, Encarna Segarra
Improving back-off models with bag of words and hollow-grams
Benjamin Lecouteux, Raphaël Rubino, Georges Linarès
Study on interaction between entropy pruning and Kneser-Ney smoothing
Ciprian Chelba, Thorsten Brants, Will Neveitt, Peng Xu
Dynamic language model adaptation using keyword category classification
Hitoshi Yamamoto, Ken Hanazawa, Kiyokazu Miki, Koichi Shinoda
Integration of cache-based model and topic dependent class model with soft clustering and soft voting
Welly Naptali, Masatoshi Tsuchiya, Seiichi Nakagawa
Conditional models for detecting lambda-functions in a spoken language understanding system
Fréderic Duvert, Renato De Mori
Novel weighting scheme for unsupervised language model adaptation using latent Dirichlet allocation
Md. Akmal Haidar, Douglas O'Shaughnessy
Automatic speech recognition system channel modeling
Qun Feng Tan, Kartik Audhkhasi, Panayiotis G. Georgiou, Emil Ettelaie, Shrikanth S. Narayanan
Round-robin discrimination model for reranking ASR hypotheses
Takanobu Oba, Takaaki Hori, Atsushi Nakamura
On-the-fly lattice rescoring for real-time automatic speech recognition
Haşim Sak, Murat Saraçlar, Tunga Güngör
Cantonese tone word learning by tone and non-tone language speakers
Angela Cooper, Yue Wang
Validation of a training method for L2 continuous-speech segmentation
Anne Cutler, Janise Shanley
Linguistic rhythm in foreign accent
Jiahong Yuan
The effect of a word embedded in a sentence and speaking rate variation on the perceptual training of geminate and singleton consonant distinction
Mee Sonu, Keiichi Tajima, Hiroaki Kato, Yoshinori Sagisaka
Foreign accent matters most when timing is wrong
Chiharu Tsurutani
Effects of Korean learners' consonant cluster reduction strategies on English speech recognition performance
Hyejin Hong, Jina Kim, Minhwa Chung
The effects of EMA-based augmented visual feedback on the English speakers' acquisition of the Japanese flap: a perceptual study
June S. Levitt, William F. Katz
Perception of voiceless fricatives by Japanese listeners of advanced and intermediate level English proficiency
Hinako Masuda, Takayuki Arai
Perception of Estonian vowel categories by native and non-native speakers
Lya Meister, Einar Meister
Spoken English assessment system for non-native speakers using acoustic and prosodic features
Qin Shi, Kun Li, ShiLei Zhang, Stephen M. Chu, Ji Xiao, ZhiJian Ou
Russian infants and children's sounds and speech corpuses for language acquisition studies
Elena E. Lyakso, Olga V. Frolova, Anna V. Kurazhova, Julia S. Gaikova
Language-specific influence on phoneme development: French and Drehu data
Julia Monnin, Hélène Lœvenbruck
Did you say susi or shushi? measuring the emergence of robust fricative contrasts in English- and Japanese-acquiring children
Jeffrey J. Holliday, Mary E. Beckman, Chanelle Mays
An empirical comparison of the T3, Juicer, HDecode and Sphinx3 decoders
Josef R. Novak, Paul R. Dixon, Sadaoki Furui
Tracter: a lightweight dataflow framework
Philip N. Garner, John Dines
Verifying pronunciation dictionaries using conflict analysis
Marelie H. Davel, Febe de Wet
Automatic estimation of transcription accuracy and difficulty
Brandon C. Roy, Soroush Vosoughi, Deb Roy
Creating a linguistic plausibility dataset with non-expert annotators
Benjamin Lambert, Rita Singh, Bhiksha Raj
Construction and evaluations of an annotated Chinese conversational corpus in travel domain for the language model of speech recognition
Xinhui Hu, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura
Building transcribed speech corpora quickly and cheaply for many languages
Thad Hughes, Kaisuke Nakajima, Linne Ha, Atul Vasu, Pedro J. Moreno, Mike LeBeau
The CHiME corpus: a resource and a challenge for computational hearing in multisource environments
Heidi Christensen, Jon Barker, Ning Ma, Phil D. Green
Developing a Chinese L2 speech database of Japanese learners with narrow-phonetic labels for computer assisted pronunciation training
Wen Cao, Dongning Wang, Jinsong Zhang, Ziyu Xiong
How children acquire situation understanding skills?: a developmental analysis utilizing multimodal speech behavior corpus
Shogo Ishikawa, Shinya Kiriyama, Yoichi Takebayashi, Shigeyoshi Kitazawa
The influence of expertise and efficiency on modality selection strategies and perceived mental effort
Ina Wechsung, Stefan Schaffer, Robert Schleicher, Anja Naumann, Sebastian Möller
Parameters describing multimodal interaction - definitions and three usage scenarios
Christine Kühnel, Benjamin Weiss, Sebastian Möller
Repair strategies on trial: which error recovery do users like best?
Alexander Zgorzelski, Alexander Schmitt, Tobias Heinroth, Wolfgang Minker
Say what? why users choose to speak their web queries
Maryam Kamvar, Doug Beeferman
The effect of audience familiarity on the perception of modified accent
Jonathan Teutenberg, Catherine I. Watson
On generating Combilex pronunciations via morphological analysis
Korin Richmond, Robert A. J. Clark, Sue Fitt
Say it as you mean it - analyzing free user comments in the VOICE awards corpus
Florian Gödde, Sebastian Möller
A new multichannel multi modal dyadic interaction database
Viktor Rozgić, Bo Xiao, Athanasios Katsamanis, Brian R. Baucom, Panayiotis G. Georgiou, Shrikanth S. Narayanan
SEAME: a Mandarin-English code-switching speech corpus in South-East Asia
Dau-Cheng Lyu, Tien-Ping Tan, Eng Siong Chng, Haizhou Li
Relying on critical articulators to estimate vocal tract spectra in an articulatory-acoustic database
Daniel Felps, Christian Geng, Michael Berger, Korin Richmond, Ricardo Gutierrez-Osuna
Investigating articulatory setting - pauses, ready position, and rest - using real-time MRI
Vikram Ramanarayanan, Dani Byrd, Louis Goldstein, Shrikanth S. Narayanan
Articulatory inversion of American English /ɹ/ by conditional density modes
Chao Qin, Miguel Á. Carreira-Perpiñán
Can tongue be recovered from face? the answer of data-driven statistical models
Atef Ben Youssef, Pierre Badin, Gérard Bailly
Phrase-medial vowel devoicing in spontaneous French
Francisco Torreira, Mirjam Ernestus
Exploring the mechanism of tonal contraction in Taiwan Mandarin
Chierh Cheng, Yi Xu, Michele Gubian
Voice attributes affecting likability perception
Benjamin Weiss, Felix Burkhardt
Turn-alignment using eye-gaze and speech in conversational interaction
Kristiina Jokinen, Kazuaki Harada, Masafumi Nishida, Seiichi Yamamoto
An investigation of formant frequencies for cognitive load classification
Tet Fei Yap, Julien Epps, Eliathamby Ambikairajah, Eric H. C. Choi
Language specific effects of emotion on phoneme duration
Martijn Goudbeek, Mirjam Broersma
Automatic classification of married couples' behavior using audio features
Matthew Black, Athanasios Katsamanis, Chi-Chun Lee, Adam C. Lammert, Brian R. Baucom, Andrew Christensen, Panayiotis G. Georgiou, Shrikanth S. Narayanan
Influence of gestural salience on the interpretation of spoken requests
Gideon Kowadlo, Patrick Ye, Ingrid Zukerman
Robust word recognition using articulatory trajectories and gestures
Vikramjit Mitra, Hosung Nam, Carol Espy-Wilson, Elliot Saltzman, Louis Goldstein
Performance estimation of noisy speech recognition considering recognition task complexity
Takeshi Yamada, Tomohiro Nakajima, Nobuhiko Kitawaki, Shoji Makino
Estimating noise from noisy speech features with a Monte Carlo variant of the expectation maximization algorithm
Friedrich Faubel, Dietrich Klakow
Template-based spectral estimation using microphone array for speech recognition
Satoshi Tamura, Eriko Hishikawa, Wataru Taguchi, Satoru Hayamizu
A particle filter feature compensation approach to robust speech recognition
Aleem Mushtaq, Yu Tsao, Chin-Hui Lee
Nonlinear enhancement of onset for robust speech recognition
Chanwoo Kim, Richard M. Stern
Mask estimation in non-stationary noise environments for missing feature based robust speech recognition
Shirin Badiezadegan, Richard C. Rose
Robust automatic speech recognition with decoder oriented ideal binary mask estimation
Lae-Hoon Kim, Kyung-Tae Kim, Mark Hasegawa-Johnson
A robust speech recognition system against the ego noise of a robot
Gökhan Ince, Kazuhiro Nakadai, Tobias Rodemann, Hiroshi Tsujino, Jun-ichi Imura
Empirical mode decomposition for noise-robust automatic speech recognition
Kuo-Hao Wu, Chia-Ping Chen
An effective feature compensation scheme tightly matched with speech recognizer employing SVM-based GMM generation
Wooil Kim, Jun-Won Suh, John H. L. Hansen
Artificial and online acquired noise dictionaries for noise robust ASR
Jort F. Gemmeke, Tuomas Virtanen
Voice activity detection based on conditional random fields using multiple features
Akira Saito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
A comparative study of noise estimation algorithms for VTS-based robust speech recognition
Yong Zhao, Biing-Hwang Juang
On using missing-feature theory with cepstral features - approximations to the multivariate integral
Frank Seide, Pei Zhao
Using a DBN to integrate sparse classification and GMM-based ASR
Yang Sun, Jort F. Gemmeke, Bert Cranen, Louis ten Bosch, Lou Boves
Shape-invariant speech transformation with the phase vocoder
Axel Röbel
A phonetic alternative to cross-language voice conversion in a text-dependent context: evaluation of speaker identity
Kayoko Yanagisawa, Mark Huckvale
Evaluation of speaker mimic technology for personalizing SGD voices
Esther Klabbers, Alexander Kain, Jan P. H. van Santen
Adaptive voice-quality control based on one-to-many eigenvoice conversion
Kumi Ohta, Tomoki Toda, Yamato Ohtani, Hiroshi Saruwatari, Kiyohiro Shikano
Applying voice conversion to concatenative singing-voice synthesis
Fernando Villavicencio, Jordi Bonada
Improved generation of fundamental frequency in HMM-based speech synthesis using generation process model
Miaomiao Wang, Miaomiao Wen, Keikichi Hirose, Nobuaki Minematsu
A hierarchical F0 modeling method for HMM-based speech synthesis
Ming Lei, Yijian Wu, Frank K. Soong, Zhen-Hua Ling, Lirong Dai
Training a parametric-based logF0 model with the minimum generation error criterion
Javier Latorre, Mark J. F. Gales, Heiga Zen
Improving Mandarin segmental duration prediction with automatically extracted syntax features
Miaomiao Wen, Miaomiao Wang, Keikichi Hirose, Nobuaki Minematsu
An intonation model for TTS in Sepedi
Daniel R. van Niekerk, Etienne Barnard
Synthesis of fast speech with interpolation of adapted HSMMs and its evaluation by blind and sighted listeners
Michael Pucher, Dietmar Schabus, Junichi Yamagishi
A comparison of pronunciation modeling approaches for HMM-TTS
Gabriel Webster, Sacha Krstulović, Kate Knill
HMM-based text-to-articulatory-movement prediction and analysis of critical articulators
Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi
Audio-based sports highlight detection by Fourier local auto-correlations
Jiaxing Ye, Takumi Kobayashi, Tetsuya Higuchi
Automatic excitement-level detection for sports highlights generation
Hynek Bořil, Abhijeet Sangwan, Taufiq Hasan, John H. L. Hansen
Detecting novel objects in acoustic scenes through classifier incongruence
Jörg-Hendrik Bach, Jörn Anemüller
A multidomain approach for automatic home environmental sound classification
Stavros Ntalampiras, Ilyas Potamitis, Nikos Fakotakis
Content-based advertisement detection
Patrick Cardinal, Vishwa Gupta, Gilles Boulianne
Identification of abnormal audio events based on probabilistic novelty detection
Stavros Ntalampiras, Ilyas Potamitis, Nikos Fakotakis
Lightly supervised recognition for automatic alignment of large coherent speech recordings
Norbert Braunschweiler, Mark J. F. Gales, Sabine Buchholz
Incremental diarization of telephone conversations
Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman
Audio analytics by template modeling and 1-pass DP based decoding
Srikanth Cherla, V. Ramasubramanian
Perceptual wavelet decomposition for speech segmentation
Mariusz Ziółko, Jakub Gałka, Bartosz Ziółko, Tomasz Drwięga
A comparative study of constrained and unconstrained approaches for segmentation of speech signal
Venkatesh Keri, Kishore Prahallad
Automatic discriminative measurement of voice onset time
Morgan Sonderegger, Joseph Keshet
Selective gammatone filterbank feature for robust sound event recognition
Yi Ren Leng, Huy Dat Tran, Norihide Kitaoka, Haizhou Li
Towards a robust face recognition system using compressive sensing
Allen Y. Yang, Zihan Zhou, Yi Ma, S. Shankar Sastry
Sparse representation features for speech recognition
Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky, Abhinav Sethy
Data selection for language modeling using sparse representations
Abhinav Sethy, Tara N. Sainath, Bhuvana Ramabhadran, Dimitri Kanevsky
Observation uncertainty measures for sparse imputation
Jort F. Gemmeke, Ulpu Remes, Kalle J. Palomäki
Sparse representations for text categorization
Tara N. Sainath, Sameer R. Maskey, Dimitri Kanevsky, Bhuvana Ramabhadran, David Nahamoo, Julia Hirschberg
Sparse auto-associative neural networks: theory and application to speech recognition
Garimella S. V. S. Sivaram, Sriram Ganapathy, Hynek Hermansky
FSM-based pronunciation modeling using articulatory phonological code
Chi Hu, Xiaodan Zhuang, Mark Hasegawa-Johnson
Detailed pronunciation variant modeling for speech transcription
Denis Jouvet, Dominique Fohr, Irina Illina
A minimum classification error approach to pronunciation variation modeling of non-native proper names
Line Adde, Bert Réveil, Jean-Pierre Martens, Torbjørn Svendsen
Acoustics-based phonetic transcription method for proper nouns
Antoine Laurent, Sylvain Meignier, Teva Merlin, Paul Deléglise
Wiktionary as a source for automatic pronunciation extraction
Tim Schlippe, Sebastian Ochs, Tanja Schultz
Learning new word pronunciations from spoken examples
Ibrahim Badr, Ian McGraw, James Glass
Phonetic subspace mixture model for speaker diarization
I-Fan Chen, Shih-Sian Cheng, Hsin-Min Wang
Overlap detection for speaker diarization by fusing spectral and spatial features
Martin Zelenák, Carlos Segura, Javier Hernando
Floor holder detection and end of speaker turn prediction in meetings
Alfred Dielmann, Giulia Garau, Hervé Bourlard
Confidence measures for speaker segmentation and their relation to speaker verification
Carlos Vaquero, Alfonso Ortega, Jesús Villalba, Antonio Miguel, Eduardo Lleida
Decoupling session variability modelling and speaker characterisation
Anthony Larcher, Christophe Lévy, Driss Matrouf, Jean-François Bonastre
Incorporating MAP estimation and covariance transform for SVM based speaker recognition
Cheung-Chi Leung, Donglai Zhu, Kong Aik Lee, Bin Ma, Haizhou Li
Single-speaker/multi-speaker co-channel speech classification
Stéphane Rossignol, Olivier Pietquin
Discriminative training for hierarchical clustering in speaker diarization
Oriol Vinyals, Gerald Friedland, Nelson Morgan
GMM-UBM based open-set online speaker diarization
Jürgen Geiger, Frank Wallhoff, Gerhard Rigoll
A segment-based non-parametric approach for monophone recognition
Ladan Golipour, Douglas O'Shaughnessy
A fast one-pass-training feature selection technique for GMM-based acoustic event detection with audio-visual data
Taras Butko, Climent Nadeu
Effects of modelling within- and between-frame temporal variations in power spectra on non-verbal sound recognition
Nobuhide Yamakawa, Tetsuro Kitahara, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
On the importance of glottal flow spectral energy for the recognition of emotions in speech
Ling He, Margaret Lech, Nicholas Allen
Real-life emotion-related states detection in call centers: a cross-corpora study
Laurence Devillers, Christophe Vaudable, Clément Chastagnol
Multi-class and hierarchical SVMs for emotion recognition
Ali Hassan, Robert I. Damper
Determining optimal features for emotion recognition from speech by applying an evolutionary algorithm
David Hübner, Bogdan Vlasenko, Tobias Grosser, Andreas Wendemuth
Context-sensitive multimodal emotion recognition from speech and facial expression using bidirectional LSTM modeling
Martin Wöllmer, Angeliki Metallinou, Florian Eyben, Björn Schuller, Shrikanth S. Narayanan
Data-dependent evaluator modeling and its application to emotional valence classification from speech
Kartik Audhkhasi, Shrikanth S. Narayanan
Modelling speech line spectral frequencies with Dirichlet mixture models
Zhanyu Ma, Arne Leijon
PDF-optimized LSF vector quantization based on beta mixture models
Zhanyu Ma, Arne Leijon
Non-linear predictive vector quantization of feature vectors for distributed speech recognition
Jose Enrique Garcia, Alfonso Ortega, Antonio Miguel, Eduardo Lleida
Superwideband extension of G.718 and G.729.1 speech codecs
Lasse Laaksonen, Mikko Tammi, Vladimir Malenovsky, Tommy Vaillancourt, Mi Suk Lee, Tomofumi Yamanashi, Masahiro Oshikiri, Claude Lamblin, Balazs Kovesi, Lei Miao, Deming Zhang, Jon Gibbs, Holly Francois
A multipulse FEC scheme based on amplitude estimation for CELP codecs over packet networks
José L. Carmona, Angel M. Gómez, Antonio M. Peinado, José L. Pérez-Córdoba, José A. González
Voice quality evaluation of recent open source codecs
Anssi Rämö, Henri Toukomaa
Efficient HMM-based estimation of missing features, with applications to packet loss concealment
Bengt J. Borgström, Per H. Borgström, Abeer Alwan
Speech inventory based discriminative training for joint speech enhancement and low-rate speech coding
Xiaoqiang Xiao, Robert M. Nickel
Quality-based playout buffering with FEC for conversational VoIP
Qipeng Gong, Peter Kabal
Sub-band basis spectrum model for pitch-synchronous log-spectrum and phase based on approximation of sparse coding
Masatsune Tamura, Takehiko Kagoshima, Masami Akamine
A multimodal density function estimation approach to formant tracking
Sundar Harshavardhan, Chandra Sekhar Seelamantula, Thippur V. Sreenivas
Estimation studies of vocal tract shape trajectory using a variable length and lossy Kelly-Lochbaum model
Heikki Rasilo, Unto K. Laine, Okko Johannes Räsänen
A feature extraction method for automatic speech recognition based on the cochlear nucleus
Serajul Haque, Roberto Togneri
A phoneme recognition framework based on auditory spectro-temporal receptive fields
Samuel Thomas, Kailash Patil, Sriram Ganapathy, Nima Mesgarani, Hynek Hermansky
Perceptual compensation for effects of reverberation in speech identification: a computer model based on auditory efferent processing
Amy V. Beeston, Guy J. Brown
Predicting human perception and ASR classification of word-final [t] by its acoustic sub-segmental properties
Barbara Schuppler, Mirjam Ernestus, Wim van Dommelen, Jacques Koreman
A speech-in-noise test based on spoken digits: comparison of normal and impaired listeners using a computer model
Matthew Robertson, Guy J. Brown, Wendy Lecluyse, Manasa Panda, Christine M. Tan
Evaluation of bone-conducted ultrasonic hearing-aid regarding transmission of paralinguistic information: a comparison with cochlear implant simulator
Takayuki Kagomiya, Seiji Nakagawa
Challenging the speech intelligibility index: macroscopic vs. microscopic prediction of sentence recognition in normal and hearing-impaired listeners
Tim Jürgens, Stefan Fredelake, Ralf M. Meyer, Birger Kollmeier, Thomas Brand
Does sentence complexity interfere with intelligibility in noise? evaluation of the Oldenburg linguistically and audiologically controlled sentence test (OLACS)
Verena N. Uslar, Thomas Brand, Mirko Hanke, Rebecca Carroll, Esther Ruigendijk, Cornelia Hamann, Birger Kollmeier
Intelligibility predictions for speech against fluctuating masker
Juan-Pablo Ramirez, Hamed Ketabdar, Alexander Raake
An effect of formant amplitude in vowel perception
Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano
Functional imaging of brain regions sensitive to communication sounds in primates
Christopher I. Petkov, Benjamin Wilson
Strategies for statistical spoken language understanding with small amount of data - an empirical study
Ye-Yi Wang
Investigating multiple approaches for SLU portability to a new language
Bassam Jabaian, Laurent Besacier, Fabrice Lefèvre
Learning naturally spoken commands for a robot
Anja Austermann, Seiji Yamada, Kotaro Funakoshi, Mikio Nakano
A semi-supervised cluster-and-label approach for utterance classification
Amparo Albalate, Aparna Suchindranath, David Suendermann, Wolfgang Minker
Classifying dialog acts in human-human and human-machine spoken conversations
Silvia Quarteroni, Giuseppe Riccardi
Exploring speaker characteristics for meeting summarization
Fei Liu, Yang Liu
Semi-supervised extractive speech summarization via co-training algorithm
Shasha Xie, Hui Lin, Yang Liu
Extractive summarization using a latent variable model
Asli Celikyilmaz, Dilek Hakkani-Tür
Hierarchical classification for speech-to-speech translation
Emil Ettelaie, Panayiotis G. Georgiou, Shrikanth S. Narayanan
Rapid development of speech translation using consecutive interpretation
Matthias Paulik, Alex Waibel
Combining many alignments for speech to speech translation
Sameer R. Maskey, Steven J. Rennie, Bowen Zhou
Online SLU model adaptation with a partial oracle
Pierre Gotab, Geraldine Damnati, Frederic Bechet, Lionel Delphin-Poulat
Role of language models in spoken fluency evaluation
Om D. Deshmukh, Harish Doddala, Ashish Verma, Karthik Visweswariah
Social role discovery from spoken language using dynamic Bayesian networks
Sibel Yaman, Dilek Hakkani-Tür, Gokhan Tur
Domain adaptation and compensation for emotion detection
Michelle Hewlett Sanchez, Gokhan Tur, Luciana Ferrer, Dilek Hakkani-Tür
Phrase alignment confidence for statistical machine translation
Sankaranarayanan Ananthakrishnan, Rohit Prasad, Prem Natarajan
Named-entity projection and data-driven morphological decomposition for field maintainable speech-to-speech translation systems
Ian R. Lane, Alex Waibel
Detecting politeness and efficiency in a cooperative social interaction
Paul M. Brunet, Marcela Charfuelan, Roderick Cowie, Marc Schröder, Hastings Donnan, Ellen Douglas-Cowie
Comparing measures of synchrony and alignment in dialogue speech timing with respect to turn-taking activity
Nick Campbell, Stefan Scherer
Resources for turn competition in overlap in multi-party conversations: speech rate, pausing and duration
Emina Kurtić, Guy J. Brown, Bill Wells
Disambiguating the functions of conversational sounds with prosody: the case of ‘yeah’
Khiet P. Truong, Dirk Heylen
Prosody and voice quality of vocal social signals: the case of dominance in scenario meetings
Marcela Charfuelan, Marc Schröder, Ingmar Steiner
The prosody of Swedish conversational grunts
D. Neiberg, J. Gustafson
Reliable tracking based on speech sample salience of vocal cycle length perturbations
Christophe Mertens, Francis Grenez, Lise Crevier-Buchman, Jean Schoentgen
Longitudinal changes of selected voice source parameters
Hideki Kasuya, Hajime Yoshida, Satoshi Ebihara, Hiroki Mori
Automatic perceptual categorization of disordered connected speech
Ali Alpan, Jean Schoentgen, Youri Maryn, Francis Grenez
Kinematic analysis of tongue movement control in spastic dysarthria
Heejin Kim, Panying Rong, Torrey M. Loucks, Mark Hasegawa-Johnson
Pre- and short-term posttreatment vocal functioning in patients with advanced head and neck cancer treated with concomitant chemoradiotherapy
Irene Jacobi, Lisette van der Molen, Maya van Rossum, Frans Hilgers
Acoustic analysis of intonation in Parkinson's disease
Joan K. Y. Ma, Rüdiger Hoffmann
A hybrid approach to online speaker diarization
Carlos Vaquero, Oriol Vinyals, Gerald Friedland
System output combination for improved speaker diarization
Simon Bozonnet, Nicholas Evans, Xavier Anguera, Oriol Vinyals, Gerald Friedland, Corinne Fredouille
An integrated top-down/bottom-up approach to speaker diarization
Simon Bozonnet, Nicholas Evans, Corinne Fredouille, Dong Wang, Raphaël Troncy
Advances in fast multistream diarization based on the information bottleneck framework
Deepu Vijayasenan, Fabio Valente, Hervé Bourlard
Audio-visual synchronisation for speaker diarisation
Giulia Garau, Alfred Dielmann, Hervé Bourlard
An improved cluster model selection method for agglomerative hierarchical speaker clustering using incremental Gaussian mixture models
Kyu J. Han, Shrikanth S. Narayanan
Dialog prediction for a general model of turn-taking
Nigel G. Ward, Olac Fuentes, Alejandro Vega
Speaker tracking in an unsupervised speech controlled system
Tobias Herbig, Franz Gerl, Wolfgang Minker
MultiBIC: an improved speaker segmentation technique for TV shows
Paula Lopez-Otero, Laura Docio-Fernandez, Carmen Garcia-Mateo
Automatic speech recognition for assistive writing in speech supplemented word prediction
John-Paul Hosom, Tom Jakobs, Allen Baker, Susan Fager
Viseme-dependent weight optimization for CHMM-based audio-visual speech recognition
Alexey Karpov, Andrey Ronzhin, Konstantin Markov, Miloš Železný
Audio-visual anticipatory coarticulation modeling by human and machine
Louis H. Terry, Karen Livescu, Janet B. Pierrehumbert, Aggelos K. Katsaggelos
Impact of lack of acoustic feedback in EMG-based silent speech recognition
Matthias Janke, Michael Wand, Tanja Schultz
Using prosody to improve Mandarin automatic speech recognition
Chong-Jia Ni, Wenju Liu, Bo Xu
A robust audio-visual speech recognition using audio-visual voice activity detection
Satoshi Tamura, Masato Ishikawa, Takashi Hashiba, Shin'ichi Takeuchi, Satoru Hayamizu
Efficient manycore CHMM speech recognition for audiovisual and multistream data
Dorothea Kolossa, Jike Chong, Steffen Zeiler, Kurt Keutzer
Two-layered audio-visual integration in voice activity detection and automatic speech recognition for robots
Takami Yoshida, Kazuhiro Nakadai
Non-audible murmur recognition based on fusion of audio and visual streams
Panikos Heracleous, Norihiro Hagita
Improved n-gram phonotactic models for language recognition
Mohamed Faouzi BenZeghiba, Jean-Luc Gauvain, Lori Lamel
A study of term weighting in phonotactic approach to spoken language recognition
Sirinoot Boonsuk, Donglai Zhu, Bin Ma, Atiwong Suchato, Proadpran Punyabukkana, Nattanun Thatphithakkul, Chai Wutiwiwatchai
Exploiting context-dependency and acoustic resolution of universal speech attribute models in spoken language recognition
Sabato Marco Siniscalchi, Jeremy Reed, Torbjørn Svendsen, Chin-Hui Lee
Hierarchical multilayer perceptron based language identification
David Imseng, Mathew Magimai Doss, Hervé Bourlard
The NIST 2010 speaker recognition evaluation
Alvin F. Martin, Craig S. Greenberg
Bayesian speaker recognition using Gaussian mixture model and Laplace approximation
Shih-Sian Cheng, I-Fan Chen, Hsin-Min Wang
What else is new than the Hamming window? robust MFCCs for speaker recognition via multitapering
Tomi Kinnunen, Rahim Saeidi, Johan Sandberg, Maria Hansson-Sandsten
Fast computation of speaker characterization vector using MLLR and sufficient statistics in anchor model framework
Achintya Kumar Sarkar, S. Umesh
Graph-embedding for speaker recognition
Zahi N. Karam, William M. Campbell
A hybrid modeling strategy for GMM-SVM speaker recognition with adaptive relevance factor
Chang Huai You, Haizhou Li, Kong Aik Lee
Robust mixture modeling using t-distribution: application to speaker ID
Sundar Harshavardhan, Thippur V. Sreenivas
A variable frame length and rate algorithm based on the spectral kurtosis measure for speaker verification
Chi-Sang Jung, Kyu J. Han, Hyunson Seo, Shrikanth S. Narayanan, Hong-Goo Kang
Near field sound source localization based on cross-power spectrum phase analysis with multiple microphones
Kohei Hayashida, Masanori Morise, Takanobu Nishiura
A maximum a posteriori sound source localization in reverberant and noisy conditions
Jinho Choi, Chang D. Yoo
Multichannel source separation based on source location cue with log-spectral shaping by hidden Markov source model
Tomohiro Nakatani, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto
A DOA estimation algorithm based on equalization-cancellation theory
Duc Thanh Chau, Junfeng Li, Masato Akagi
Concurrent speaker localization using multi-band position-pitch (M-PoPi) algorithm with spectro-temporal pre-processing
Tania Habib, Harald Romsdorfer
On using Gaussian mixture model for double-talk detection in acoustic echo suppression
Ji-Hyun Song, Kyu-Ho Lee, Yun-Sik Park, Sang-Ick Kang, Joon-Hyuk Chang
Catalog-based single-channel speech-music separation
Cemil Demir, A. Taylan Cemgil, Murat Saraçlar
Unvoiced speech segregation based on CASA and spectral subtraction
Ke Hu, DeLiang Wang
Unsupervised sequential organization for cochannel speech separation
Ke Hu, DeLiang Wang
The INTERSPEECH 2010 paralinguistic challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Felix Burkhardt, Laurence Devillers, Christian Müller, Shrikanth S. Narayanan
Age and gender classification from speech using decision level fusion and ensemble based techniques
Florian Lingenfelser, Johannes Wagner, Thurid Vogt, Jonghwa Kim, Elisabeth André
Level of interest sensing in spoken dialog using multi-level fusion of acoustic and lexical evidence
Je Hun Jeon, Rui Xia, Yang Liu
Fuzzy support vector machines for age and gender classification
Phuoc Nguyen, Trung Le, Dat Tran, Xu Huang, Dharmendra Sharma
Gender and affect recognition based on GMM and GMM-UBM modeling with relevance MAP estimation
Rok Gajšek, Janez Žibert, Tadej Justin, Vitomir Štruc, Boštjan Vesnicer, France Mihelič
Age recognition based on speech signals using weights supervector
Royi Porat, Dan Lange, Yaniv Zigel
Age and gender classification using fusion of acoustic and prosodic features
Hugo Meinedo, Isabel Trancoso
Brno University of Technology system for INTERSPEECH 2010 paralinguistic challenge
Marcel Kockmann, Lukáš Burget, Jan Černocký
Combining five acoustic level modeling methods for automatic speaker age and gender recognition
Ming Li, Chi-Sang Jung, Kyu J. Han
Age and gender recognition based on multiple systems - early vs. late fusion
Tobias Bocklet, Georg Stemmer, Viktor Zeissler, Elmar Nöth
Automatic speaker age and gender recognition in the car for tailoring dialog and mobile services
Michael Feld, Felix Burkhardt, Christian Müller
Acoustic correlates of voice quality improvement by voice training
Kiyoaki Aikawa, Junko Uenuma, Tomoko Akitake
Phonetic segmentation of singing voice using MIDI and parallel speech
Minghui Dong, Paul Chan, Ling Cen, Haizhou Li, Jason Teo, Ping Jen Kua
A singing style modeling system for singing voice synthesizers
Keijiro Saino, Makoto Tachibana, Hideki Kenmochi
A fast query by humming system based on notes
Jingzhou Yang, Jia Liu, Wei-Qiang Zhang
Melody pitch estimation based on range estimation and candidate extraction using harmonic structure model
Seokhwan Jo, Sihyun Joo, Chang D. Yoo
Modified spatial audio object coding scheme with harmonic extraction and elimination structure for interactive audio service
Jihoon Park, Kwangki Kim, Jeongil Seo, Minsoo Hahn
Modelling the effect of speaker familiarity and noise on infant word recognition
Christina Bergmann, Michele Gubian, Lou Boves
Unsupervised learning of vowels from continuous speech based on self-organized phoneme acquisition model
Kouki Miyazawa, Hideaki Kikuchi, Reiko Mazuka
Learning speaker normalization using semisupervised manifold alignment
Andrew R. Plummer, Mary E. Beckman, Mikhail Belkin, Eric Fosler-Lussier, Benjamin Munson
Fully unsupervised word learning from continuous speech using transitional probabilities of atomic acoustic events
Okko Johannes Räsänen
Language acquisition and cross-modal associations: computational simulation of the result of infant studies
Louis ten Bosch, Lou Boves
Active word learning under uncertain input conditions
Maarten Versteegh, Louis ten Bosch, Lou Boves
Combining text categorization and dialog modeling for speaker role identification on call center conversations
Rémi Lavalley, Chloé Clavel, Patrice Bellot, Marc El-Bèze
Topic-dependent n-gram models based on optimization of context lengths in LDA
Akira Nakamura, Satoru Hayamizu
Expectations for discourse genre identification: a prosodic study
Nicolas Obin, Volker Dellwo, Anne Lacheret, Xavier Rodet
Dialogue act tagging and segmentation with a single perceptron
Ramon Granell, Stephen Pulman, Carlos-D. Martínez-Hinarejos, José Miguel Benedí
Improving the readability of class lecture ASR results using a confusion network
Yasuhisa Fujii, Kazumasa Yamamoto, Seiichi Nakagawa
Toward detecting voice activity employing soft decision in second-order conditional MAP
Sang-Kyun Kim, Jae-Hun Choi, Sang-Ick Kang, Ji-Hyun Song, Joon-Hyuk Chang
Voice activity detection in a regularized reproducing kernel Hilbert space
Xugang Lu, Masashi Unoki, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura
A new VAD framework using statistical model and human knowledge based empirical rule
Ji Wu, Xiao-lei Zhang, Wei Li
Adaptive high accuracy approaches to speech activity detection in noisy and hostile audio environments
Mark Huggins, Brett Smolenski, Aaron Lawson
Robust voice activity detection in stereo recording with crosstalk
Prasanta Kumar Ghosh, Andreas Tsiartas, Panayiotis G. Georgiou, Shrikanth S. Narayanan
Voice activity detection using frame-wise model re-estimation method based on Gaussian pruning with weight normalization
Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani
Spectral entropy-based voice activity detector for videoconferencing systems
Bowon Lee, Debargha Mukherjee
The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms
David Dean, Sridha Sridharan, Robert Vogt, Michael Mason
A Bayesian approach to voice activity detection using multiple statistical models and discriminative training
Tao Yu, John H. L. Hansen
Noise robust voice activity detection using features extracted from the time-domain autocorrelation function
Houman Ghaemmaghami, Brendan Baker, Robert Vogt, Sridha Sridharan
VAD-measure-embedded decoder with online model adaptation
Tasuku Oonishi, Koji Iwano, Sadaoki Furui
Robust statistical voice activity detection using a likelihood ratio sign test
Shiwen Deng, Jiqing Han
Automatic turn segmentation in spoken conversations
Alexei V. Ivanov, Giuseppe Riccardi
Turn taking-based conversation detection by using DOA estimation
Yohei Kawaguchi, Masahito Togami, Yasunari Obuchi
Article |
---|