Interspeech 2011

Florence, Italy
27-31 August 2011

General Chairs: Piero Cosi, Renato De Mori
doi: 10.21437/Interspeech.2011

HMM-Based Speech Synthesis I, II

Decision tree-based clustering with outlier detection for HMM-based speech synthesis
Kyung Hwan Oh, June Sig Sung, Doo Hwa Hong, Nam Soo Kim

Prediction of voice aperiodicity based on spectral representations in HMM speech synthesis
Hanna Silén, Elina Helander, Moncef Gabbouj

A perceptual expressivity modeling technique for speech synthesis based on multiple-regression HSMM
Takashi Nose, Takao Kobayashi

Multi-speaker modeling with shared prior distributions and model structures for Bayesian speech synthesis
Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda

Feature-space transform tying in unified acoustic-articulatory modelling for articulatory control of HMM-based speech synthesis
Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi

The effect of using normalized models in statistical speech synthesis
Matt Shannon, Heiga Zen, William Byrne

Continuous control of the degree of articulation in HMM-based speech synthesis
Benjamin Picart, Thomas Drugman, Thierry Dutoit

Estimation of window coefficients for dynamic feature extraction for HMM-based speech synthesis
Ling-Hui Chen, Yoshihiko Nankaku, Heiga Zen, Keiichi Tokuda, Zhen-Hua Ling, Li-Rong Dai

Inverse filtering based harmonic plus noise excitation model for HMM-based speech synthesis
Zhengqi Wen, Jianhua Tao

Improved HNM-based vocoder for statistical synthesizers
Daniel Erro, Iñaki Sainz, Eva Navas, Inma Hernáez

A statistical phrase/accent model for intonation modeling
Gopala Krishna Anumanchipalli, Luís C. Oliveira, Alan W. Black

Intermediate-state HMMs to capture continuously-changing signal features
Gustav Eje Henter, W. Bastiaan Kleijn

Automatic sentence selection from speech corpora including diverse speech for improved HMM-TTS synthesis quality
Norbert Braunschweiler, Sabine Buchholz

Phonological knowledge guided HMM state mapping for cross-lingual speaker adaptation
Hui Liang, John Dines

Reformulating prosodic break model into segmental HMMs and information fusion
Nicolas Obin, Pierre Lanchantin, Anne Lacheret, Xavier Rodet

Multipulse sequences for residual signal modeling
Ranniery Maia, Heiga Zen, Kate Knill, M. J. F. Gales, Sabine Buchholz

Can objective measures predict the intelligibility of modified HMM-based synthetic speech in noise?
Cassia Valentini-Botinhao, Junichi Yamagishi, Simon King

Speech synthesis based on articulatory-movement HMMs with voice-source codebooks
Tsuneo Nitta, Takayuki Onoda, Masashi Kimura, Yurie Iribe, Kouichi Katsurada

Large-scale subjective evaluations of speech rate control methods for HMM-based speech synthesizers
Tsuneo Kato, Makoto Yamada, Nobuyuki Nishizawa, Keiichiro Oura, Keiichi Tokuda

HMM-based emphatic speech synthesis using unsupervised context labeling
Yu Maeno, Takashi Nose, Takao Kobayashi, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka

Speaker Recognition - Modeling, Automatic Procedures, Analysis I-III

Restoring the residual speaker information in total variability modeling for speaker verification
Ce Zhang, Rong Zheng, Bo Xu

New developments in joint factor analysis for speaker verification
Hagai Aronowitz, Oren Barkan

Speaker recognition using temporal contours in linguistic units: the case of formant and formant-bandwidth trajectories
Joaquin Gonzalez-Rodriguez

Discriminatively trained i-vector extractor for speaker verification
Ondřej Glembek, Lukáš Burget, Niko Brümmer, Oldřich Plchot, Pavel Matějka

Constrained cepstral speaker recognition using matched UBM and JFA training
Michelle Hewlett Sanchez, Luciana Ferrer, Elizabeth Shriberg, Andreas Stolcke

A new perspective on GMM subspace compensation based on PPCA and Wiener filtering
Alan McCree, Douglas Sturim, Douglas Reynolds

Data-driven Gaussian component selection for fast GMM-based speaker verification
Ce Zhang, Rong Zheng, Bo Xu

Analysis of i-vector length normalization in speaker recognition systems
Daniel Garcia-Romero, Carol Y. Espy-Wilson

An analysis framework based on random subspace sampling for speaker verification
Weiwu Jiang, Zhifeng Li, Helen Meng

Factor analysis back ends for MLLR transforms in speaker recognition
Nicolas Scheffer, Yun Lei, Luciana Ferrer

Report on performance results in the NIST 2010 speaker recognition evaluation
Craig S. Greenberg, Alvin F. Martin, Bradford N. Barr, George R. Doddington

iVector fusion of prosodic and cepstral features for speaker verification
Marcel Kockmann, Luciana Ferrer, Lukáš Burget, Jan Černocký

i-vector based speaker recognition on short utterances
Ahilan Kanagasundaram, Robbie Vogt, David Dean, Sridha Sridharan, Michael Mason

Study of overlapped speech detection for NIST SRE summed channel speaker recognition
Hanwu Sun, Bin Ma

Super-Dirichlet mixture models using differential line spectral frequencies for text-independent speaker identification
Zhanyu Ma, Arne Leijon

Comparison of voice activity detectors for interview speech in NIST speaker recognition evaluation
Hon-Bill Yu, Man-Wai Mak

Eigen-voice based anchor modeling system for speaker identification using MLLR super-vector
A. K. Sarkar, S. Umesh

Automatic detection of speaker attributes based on utterance text
Wen Wang, Andreas Kathol, Harry Bratt

Comparison of speaker recognition approaches for real applications
Sandro Cumani, Pier Domenico Batzu, Daniele Colibro, Claudio Vair, Pietro Laface, Vasileios Vasilakakis

Modeling speaker personality using voice
Tim Polzehl, Sebastian Möller, Florian Metze

Structural joint factor analysis for speaker recognition
Marc Ferràs, Koichi Shinoda, Sadaoki Furui

Acoustic forest for SMAP-based speaker verification
Sangeeta Biswas, Marc Ferràs, Koichi Shinoda, Sadaoki Furui

Mixture of auto-associative neural networks for speaker verification
G. S. V. S. Sivaram, Samuel Thomas, Hynek Hermansky

ASR - Feature Extraction I, II

Region dependent transform on MLP features for speech recognition
Tim Ng, Bing Zhang, Spyros Matsoukas, Long Nguyen

Discriminant sub-space projection of spectro-temporal speech features based on maximizing mutual information
Martin Heckmann, Claudius Gläser

Combining feature space discriminative training with long-term spectro-temporal features for noise-robust speech recognition
Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura

Combining frame and segment level processing via temporal pooling for phonetic classification
Sumit Chopra, Patrick Haffner, Dimitrios Dimitriadis

Improved bottleneck features using pretrained deep neural networks
Dong Yu, Michael L. Seltzer

Minimum classification error based spectro-temporal feature extraction for robust audio classification
Yuan-Fu Liao, Chia-Hsing Lin, We-Der Fang

Integrating recent MLP feature extraction techniques into TRAP architecture
František Grézl, Martin Karafiát

Feature frame stacking in RNN-based tandem ASR systems - learned vs. predefined context
Martin Wöllmer, Björn Schuller, Gerhard Rigoll

Improved acoustic feature combination for LVCSR by neural networks
Christian Plahl, Ralf Schlüter, Hermann Ney

Hierarchical tandem features for ASR in Mandarin
Joel Pinto, Mathew Magimai-Doss, Hervé Bourlard

Analysis and comparison of recent MLP features for LVCSR systems
Fabio Valente, Mathew Magimai-Doss, Wen Wang

Deep learning of speech features for improved phonetic recognition
Jaehyung Lee, Soo-Young Lee

Globality-locality consistent discriminant analysis for phone classification
Heyun Huang, Yang Liu, Jort F. Gemmeke, Louis ten Bosch, Bert Cranen, Lou Boves

Front-end compensation methods for LVCSR under Lombard effect
Hynek Bořil, František Grézl, John H. L. Hansen

Classification of fricatives using feature extrapolation of acoustic-phonetic features in telephone speech
Jung-Won Lee, Jeung-Yoon Choi, Hong-Goo Kang

Noise robust feature extraction based on extended weighted linear prediction in LVCSR
Sami Keronen, Jouni Pohjalainen, Paavo Alku, Mikko Kurimo

Comparing different flavors of spectro-temporal features for ASR
Bernd T. Meyer, Suman V. Ravuri, Marc René Schädler, Nelson Morgan

VTLN in the MFCC domain: band-limited versus local interpolation
Ehsan Variani, Thomas Schaaf

Multistream bandpass modulation features for robust speech recognition
Sridhar Krishna Nemala, Kailash Patil, Mounya Elhilali

An analysis of automatic speech recognition with multiple microphones
Davide Marino, Thomas Hain

Speaker Recognition - Analysis and Statistics I-III

Harmonic structure transform for speaker recognition
Kornel Laskowski, Qin Jin

Combining evidence from spectral and source-like features for person recognition from humming
Hemant A. Patil, Maulik C. Madhavi, Keshab K. Parhi

Improvements in speaker characterization using spectral subband energy based on harmonic plus noise model
Yanhua Long, Zhi-Jie Yan, Frank K. Soong, Li-Rong Dai, Wu Guo

Implicit segmentation in two-wire speaker recognition
Yosef A. Solewicz, Hagai Aronowitz

Boosting speaker recognition performance with compact representations
Sibel Yaman, Jason Pelecanos, Mohamed Kamal Omar

Partitioning of two-speaker conversation datasets
Carlos Vaquero, Alfonso Ortega, Eduardo Lleida

Intersession compensation and scoring methods in the i-vectors space for speaker recognition
Pierre-Michel Bousquet, Driss Matrouf, Jean-François Bonastre

Kernel alignment maximization for speaker recognition based on high-level features
Szymon Drgas, Adam Dabrowski

Kernel partial least squares for speaker recognition
Balaji Vasan Srinivasan, Daniel Garcia-Romero, Dmitry N. Zotkin, Ramani Duraiswami

Conversational-side-specific inter-session variability compensation
Mohamed Kamal Omar, Jason Pelecanos

A speaker line-up for the likelihood ratio
David A. van Leeuwen, Niko Brümmer

Towards fully Bayesian speaker recognition: integrating out the between-speaker covariance
Jesús Villalba, Niko Brümmer

Variational Bayesian model selection for GMM-speaker verification using universal background model
Timur Pekhovsky, Alexandra Lokhanova

To weight or not to weight: source-normalised LDA for speaker recognition using i-vectors
Mitchell McLaren, David A. van Leeuwen

Maximum entropy based data selection for speaker recognition
Chien-Lin Huang, Bin Ma

Addressing the data-imbalance problem in kernel-based speaker verification via utterance partitioning and speaker comparison
Wei Rao, Man-Wai Mak

Single-channel head orientation estimation based on discrimination of acoustic transfer function
Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki

Maximum likelihood i-vector space using PCA for speaker verification
Zhenchun Lei, Yingchun Yang

Speaker verification using sparse representations on total variability i-vectors
Ming Li, Xiang Zhang, Yonghong Yan, Shrikanth Narayanan

Robust speaker recognition in non-stationary room environments based on empirical mode decomposition
Taufiq Hasan, John H. L. Hansen

Range based multi microphone array fusion for speaker activity detection in small meetings
Jani Even, Panikos Heracleous, Carlos T. Ishi, Norihiro Hagita

Speaker verification robust to talking style variation using multiple kernel learning based on conditional entropy minimization
Tetsuji Ogawa, Hideitsu Hino, Noboru Murata, Tetsunori Kobayashi

Regularized logistic regression fusion for speaker verification
Ville Hautamäki, Kong Aik Lee, Tomi Kinnunen, Bin Ma, Haizhou Li

A longest matching segment approach with Bayesian adaptation - application to noise-robust speaker recognition
Ayeh Jafari, Ramji Srinivasan, Danny Crookes, Ji Ming

Data selection with kurtosis and nasality features for speaker recognition
Howard Lei, Nikki Mirghafori

Use of the harmonic phase in speaker recognition
Inma Hernáez, Ibon Saratxaga, Jon Sanchez, Eva Navas, Iker Luengo

ASR - Acoustic Models I-III

Conversational speech transcription using context-dependent deep neural networks
Frank Seide, Gang Li, Dong Yu

Sequential classification criteria for NNs in automatic speech recognition
Guangsen Wang, Khe Chai Sim

Grapheme-based automatic speech recognition using KL-HMM
Mathew Magimai-Doss, Ramya Rasipuram, Guillermo Aradilla, Hervé Bourlard

Direct error rate minimization of hidden Markov models
Joseph Keshet, Chih-Chieh Cheng, Mark Stoehr, David McAllester

On the effectiveness of statistical modeling based template matching approach for continuous speech recognition
Xie Sun, Xin Chen, Yunxin Zhao

Comparison of smoothing techniques for robust context dependent acoustic modelling in hybrid NN/HMM systems
Guangsen Wang, Khe Chai Sim

Generalized Baum-Welch algorithm and its implication to a new extended Baum-Welch algorithm
Roger Hsiao, Tanja Schultz

Word boundary modelling and full covariance Gaussians for Arabic speech-to-text systems
F. Diehl, M. J. F. Gales, X. Liu, M. Tomalin, P. C. Woodland

A fully automated derivation of state-based eigentriphones for triphone modeling with no tied states using regularization
Tom Ko, Brian Mak

Reducing computational complexities of exemplar-based sparse representations with applications to large vocabulary speech recognition
Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky

An i-vector based approach to training data clustering for improved speech recognition
Yu Zhang, Jian Xu, Zhi-Jie Yan, Qiang Huo

Rapid training of acoustic models using graphics processing unit
Senaka Buthpitiya, Ian Lane, Jike Chong

Semi-automatic acoustic model generation from large unsynchronized audio and text chunks
Michele Alessandrini, Giorgio Biagetti, Alessandro Curzi, Claudio Turchetti

Unsupervised testing strategies for ASR
Brian Strope, Doug Beeferman, Alexander Gruenstein, Xin Lei

Acoustic model training with detecting transcription errors in the training data
Gakuto Kurata, Nobuyasu Itoh, Masafumi Nishimura

Towards unsupervised training of speaker independent acoustic models
Aren Jansen, Kenneth Church

Acoustic modeling with bootstrap and restructuring based on full covariance
Xiaodong Cui, Xin Chen, Jian Xue, Peder A. Olsen, John R. Hershey, Bowen Zhou

An i-vector based approach to acoustic sniffing for irrelevant variability normalization based acoustic model training and speech recognition
Jian Xu, Yu Zhang, Zhi-Jie Yan, Qiang Huo

Log-linear optimization of second-order polynomial features with subsequent dimension reduction for speech recognition
Muhammad Ali Tahir, Ralf Schlüter, Hermann Ney

Genre categorization and modeling for broadcast speech transcription
Qingqing Zhang, Lori Lamel, Jean-Luc Gauvain

Individual error minimization learning framework and its applications to speech recognition and utterance verification
Sunghwan Shin, Ho-Young Jung, Biing-Hwang Juang

Effective triphone mapping for acoustic modeling in speech recognition
Sakhia Darjaa, Miloš Cerňak, Marián Trnka, Milan Rusko, Róbert Sabo

Analysis of dialectal influence in pan-Arabic ASR
Udhyakumar Nallasamy, Michael Garbus, Florian Metze, Qin Jin, Thomas Schaaf, Tanja Schultz

Connected digit recognition by means of reservoir computing
Azarakhsh Jalalvand, Fabian Triefenbach, David Verstraeten, Jean-Pierre Martens

Large margin - minimum classification error using sum of shifted sigmoids as the loss function
Madhavi V. Ratnagiri, Biing-Hwang Juang, Lawrence Rabiner

Representing phonological features through a two-level finite state model
Javier M. Olaso, M. Inés Torres, Raquel Justo

Optimization of the Gaussian mixture model evaluation on GPU
Jan Vaněk, Jan Trmal, Josef V. Psutka, Josef Psutka

Robust Speech Recognition I-III

Propagation of uncertainty through multilayer perceptrons for robust automatic speech recognition
Ramón Fernandez Astudillo, João Paulo da Silva Neto

Mapping sparse representation to state likelihoods in noise-robust automatic speech recognition
Katariina Mahkonen, Antti Hurmalainen, Tuomas Virtanen, Jort F. Gemmeke

Uncertainty measures for improving exemplar-based source separation
Heikki Kallasjoki, Ulpu Remes, Jort F. Gemmeke, Tuomas Virtanen, Kalle J. Palomäki

Maximum confidence measure based interaural phase difference estimation for noise masking in dual-microphone robust speech recognition
Hsien-Cheng Liao, Yuan-Fu Liao, Chin-Hui Lee

A performance monitoring approach to fusing enhanced spectrogram channels in robust speech recognition
Shirin Badiezadegan, Richard Rose

Generalized variable parameter HMMs for noise robust speech recognition
Ning Cheng, X. Liu, Lan Wang

Sinusoidal approach for the single-channel speech separation and recognition challenge
P. Mowlaee, R. Saeidi, Zheng-Hua Tan, M. G. Christensen, Tomi Kinnunen, P. Fränti, S. H. Jensen

Semi-supervised single-channel speech-music separation for automatic speech recognition
Cemil Demir, A. Taylan Cemgil, Murat Saraçlar

A level-dependent auditory filter-bank for speech recognition in reverberant environments
HariKrishna Maganti, Marco Matassoni

A multichannel feature-based processing for robust speech recognition
Mehrez Souden, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani

Feature normalization using structured full transforms for robust speech recognition
Xiong Xiao, Jinyu Li, Eng Siong Chng, Haizhou Li

A robust estimation method of noise mixture model for noise suppression
Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani

A versatile Gaussian splitting approach to non-linear state estimation and its application to noise-robust ASR
Volker Leutnant, Alexander Krueger, Reinhold Haeb-Umbach

Generalized-log spectral mean normalization for speech recognition
Hilman F. Pardede, Koichi Shinoda

Zero-crossing-based channel attentive weighting of cepstral features for robust speech recognition: the ETRI 2011 CHiME challenge system
Young-Ik Kim, Hoon-Young Cho, Sang-Hun Kim

Feature compensation for speech recognition in severely adverse environments due to background noise and channel distortion
Wooil Kim, John H. L. Hansen

Binaural cues for fragment-based speech recognition in reverberant multisource environments
Ning Ma, Jon Barker, Heidi Christensen, Phil D. Green

Sub-band level histogram equalization for robust speech recognition
Vikas Joshi, Raghavendra Bilgi, S. Umesh, L. Garcia, C. Benitez

GMM-based missing-feature reconstruction on multi-frame windows
Ulpu Remes, Yoshihiko Nankaku, Keiichi Tokuda

Improvements of a dual-input DBN for noise robust ASR
Yang Sun, Jort F. Gemmeke, Bert Cranen, Louis ten Bosch, Lou Boves

Denoising using optimized wavelet filtering for automatic speech recognition
Randy Gomez, Tatsuya Kawahara

Noise robust speaker-independent speech recognition with invariant-integration features using power-bias subtraction
Florian Müller, Alfred Mertins

ASR - Language Models I, II

Empirical evaluation and combination of advanced language modeling techniques
Tomáš Mikolov, Anoop Deoras, Stefan Kombrink, Lukáš Burget, Jan Černocký

Personalizing Model M for voice-search
Geoffrey Zweig, Shuangyu Chang

Sentence selection by direct likelihood maximization for language model adaptation
Takahiro Shinozaki, Yu Kubota, Sadaoki Furui, Eiji Utsunomiya, Yasutaka Shindoh

Feature combination approaches for discriminative language models
Ebru Arısoy, Bhuvana Ramabhadran, Hong-Kwang Jeff Kuo

On-line language model biasing for multi-pass automatic speech recognition
Sankaranarayanan Ananthakrishnan, Stavros Tsakalidis, Rohit Prasad, Premkumar Natarajan

Mandarin word-character hybrid-input neural network language model
Moonyoung Kang, Tim Ng, Long Nguyen

Unary data structures for language models
Jeffrey Sorensen, Cyril Allauzen

Bayesian language model interpolation for mobile speech input
Cyril Allauzen, Michael Riley

On the estimation of discount parameters for language model smoothing
Martin Sundermeyer, Ralf Schlüter, Hermann Ney

N-grams for conditional random fields or a failure-transition(ϕ) posterior for acyclic FSTs
Patrick Lehnen, Stefan Hahn, Hermann Ney

Hybrid language models using mixed types of sub-lexical units for open vocabulary German LVCSR
M. Ali Basha Shaik, Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney

Morpheme based factored language models for German LVCSR
Amr El-Desoky Mousa, M. Ali Basha Shaik, Ralf Schlüter, Hermann Ney

Compound word recombination for German LVCSR
Markus Nußbaum-Thom, Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney

Lattice-based risk minimization training for unsupervised language model adaptation
Akio Kobayashi, Takahiro Oku, Shinichi Homma, Toru Imai, Seiichi Nakagawa

Similarity language model
Christian Gillot, Christophe Cerisara

Data sampling and dimensionality reduction approaches for reranking ASR outputs using discriminative language models
Erinç Dikici, Murat Semerci, Murat Saraçlar, Ethem Alpaydın

Training a language model using webdata for large vocabulary Japanese spontaneous speech recognition
Ryo Masumura, Seongjun Hahm, Akinori Ito

Large vocabulary SOUL neural network language models
Hai-Son Le, Ilya Oparin, Abdel Messaoudi, Alexandre Allauzen, Jean-Luc Gauvain, François Yvon

Improved spoken query transcription using co-occurrence information
Jonathan Mamou, Abhinav Sethy, Bhuvana Ramabhadran, Ron Hoory, Paul Vozila

Unsupervised latent speaker language modeling
Yik-Cheung Tam, Paul Vozila

Spoken Dialogue Systems I, II

User study of spoken decision support system
Teruhisa Misu, Kiyonori Ohtake, Chiori Hori, Hisashi Kawai, Satoshi Nakamura

Efficient probabilistic tracking of user goal and dialog history for spoken dialog systems
Antoine Raux, Yi Ma

Tackling a shilly-shally classifier for predicting task success in spoken dialogue interaction
Alexander Schmitt, Alexander Zgorzelski, Wolfgang Minker

Evaluation of listening-oriented dialogue control rules based on the analysis of HMMs
Toyomi Meguro, Yasuhiro Minami, Ryuichiro Higashinaka, Kohji Dohsaka

Large-scale experiments on data-driven design of commercial spoken dialog systems
D. Suendermann, J. Liscombe, J. Bloom, G. Li, Roberto Pieraccini

Comparing system-driven and free dialogue in in-vehicle interaction
Fredrik Kronlid, Jessica Villing, Alexander Berman, Staffan Larsson

Optimizing situated dialogue management in unknown environments
Heriberto Cuayáhuitl, Nina Dethlefs

Acoustic-similarity based technique to improve concept recognition
Om D. Deshmukh, Shajith Ikbal, Ashish Verma, Etienne Marcheret

Dialog methods for improved alphanumeric string capture
Doug Peters, Peter Stubley

Detecting the status of a predictive incremental speech understanding model for real-time decision-making in a spoken dialogue system
David DeVault, Kenji Sagae, David Traum

User simulation in dialogue systems using inverse reinforcement learning
Senthilkumar Chandramohan, Matthieu Geist, Fabrice Lefèvre, Olivier Pietquin

Lossless value directed compression of complex user goal states for statistical spoken dialogue systems
Paul A. Crook, Oliver Lemon

Spoken Language Resources, Evaluation and Standardization I

Measurement of objective intelligibility of Japanese accented English using ERJ (English read by Japanese) database
Nobuaki Minematsu, Koji Okabe, Keisuke Ogaki, Keikichi Hirose

From single-call to multi-call quality: a study on long-term quality integration in audio-visual speech communication
Sebastian Möller, Chihuy Bang, Teele Tamme, Markus Vaalgamaa, Benjamin Weiss

Optimal selection of limited vocabulary speech corpora
Hui Lin, Jeff Bilmes

Open source multi-language audio database for spoken language processing applications
Stephen A. Zahorian, Jiang Wu, Montri Karnjanadecha, Chandra Sekhar Vootkuri, Brian Wong, Andrew Hwang, Eldar Tokhtamyshev

The USC CARE corpus: child-psychologist interactions of children with autism spectrum disorders
Matthew P. Black, Daniel Bone, Marian E. Williams, Phillip Gorrindo, Pat Levitt, Shrikanth Narayanan

Towards a versatile multi-layered description of speech corpora using algebraic relations
Nelly Barbot, Vincent Barreaud, Olivier Boëffard, Laure Charonnat, Arnaud Delhay, Sébastien Le Maguer, Damien Lolive

Announcing the electromagnetic articulography (day 1) subset of the mngu0 articulatory corpus
Korin Richmond, Phil Hoole, Simon King

A pitch tracking corpus with evaluation on multipitch tracking scenario
Gregor Pirker, Michael Wohlmayr, Stefan Petrik, Franz Pernkopf

On building and evaluating a broadcast-news audio segmentation system
Taras Butko, Climent Nadeu

Time- and acoustic-mediated alignment algorithms for speech recognition evaluation
Simon Dobrišek, France Mihelič

Effects of shortening speech prompts of in-car voice user interfaces on users' mental models
Julia Niemann, Kati Schulz, Ina Wechsung

Speech transcript evaluation for information retrieval
Laurens van der Werff, Wessel Kraaij, Franciska de Jong

The Albayzin 2010 language recognition evaluation
Luis Javier Rodriguez-Fuentes, Mikel Penagarikano, Amparo Varona, Mireia Diez, Germán Bordel

Progress and prospects for speech technology: results from three sexennial surveys
Roger K. Moore

Painless WFST cascade construction for LVCSR - transducersaurus
Josef R. Novak, Nobuaki Minematsu, Keikichi Hirose

Second Language Acquisition, Development and Learning I, II

On mispronunciation lexicon generation using joint-sequence multigrams in computer-aided pronunciation training (CAPT)
Xiaojun Qian, Helen Meng, Frank K. Soong

Validating a second language perception model for classroom context - a longitudinal study within the perceptual assimilation model
Bianca Sisinni, Mirko Grimaldi

The role of variability in non-native perceptual learning of a Japanese geminate-singleton fricative contrast
Makiko Sadakata, James M. McQueen

Fluency changes with general progress in L2 proficiency
Jared Bernstein, Jian Cheng, Masanori Suzuki

Tongue gestures awareness and pronunciation training
Slim Ouni

Impact of speaker variability on speech perception in non-native listeners
Wim A. van Dommelen, Valerie Hazan

Acquisition of timing patterns in second language
Mikhail Ordin, Leona Polyanskaya, Christiane Ulbrich

Context-dependent duration modeling with backoff strategy and look-up tables for pronunciation assessment and mispronunciation detection
Hongyan Li, Shen Huang, Shijin Wang, Bo Xu

Perceptual training of vowel length contrast of Japanese by L2 listeners: effects of an isolated word versus a word embedded in sentences
Mee Sonu, Keiichi Tajima, Hiroaki Kato, Yoshinori Sagisaka

Similar vowels in L1/L2 production: confused or discerned in early L2 English learners with different amount of exposure
E-Chin Wu

Production and perception of Estonian vowels by native and non-native speakers
Lya Meister, Einar Meister

New feature parameters for pronunciation evaluation in English presentations at international conferences
Hiroshi Kibishi, Seiichi Nakagawa

Synchronous reading: learning French orthography by audiovisual training
Gérard Bailly, Will Barbour

Phoneme level non-native pronunciation analysis by an auditory model-based native assessment scheme
Christos Koniaris, Olov Engwall

The open front vowel /æ/ in the production and perception of Czech students of English
Pavel Šturm, Radek Skarnitzl

Error selection for ASR-based English pronunciation training in 'My Pronunciation Coach'
Catia Cucchiarini, Henk van den Heuvel, Eric Sanders, Helmer Strik

An experimental analysis of pitch patterns in Japanese speakers of English with verification by speech re-synthesis
Tomoko Nariai, Kazuyo Tanaka

An analysis of word duration in native speakers and Japanese speakers of English
Tomoko Nariai, Kazuyo Tanaka, Yoshiaki Ito

ASR - Search, Keyword Spotting and Confidence Measures I, II

A template based voice trigger system using Bhattacharyya edit distance
Evelyn Kurniawati, Samsudin Ng, Karthik Muralidhar, Sapna George

Acoustic look-ahead for more efficient decoding in LVCSR
D. Nolden, Ralf Schlüter, Hermann Ney

A new epsilon filter for efficient composition of weighted finite-state transducers
Frank Duckhorn, Matthias Wolff, Rüdiger Hoffmann

A bottom-up stepwise knowledge-integration approach to large vocabulary continuous speech recognition using weighted finite state machines
Sabato Marco Siniscalchi, Torbjørn Svendsen, Chin-Hui Lee

Combining information sources for confidence estimation with CRF models
M. S. Seigel, P. C. Woodland

Evaluation of fast spoken term detection using a suffix array
Kouichi Katsurada, Shinta Sawada, Shigeki Teshima, Yurie Iribe, Tsuneo Nitta

Event selection from phone posteriorgrams using matched filters
Keith Kintzley, Aren Jansen, Hynek Hermansky

A piecewise aggregate approximation lower-bound estimate for posteriorgram-based dynamic time warping
Yaodong Zhang, James Glass

OOV detection and recovery using hybrid models with different fragments
Long Qin, Ming Sun, Alexander Rudnicky

AUC optimization based confidence measure for keyword spotting
Haiyang Li, Jiqing Han, Tieran Zheng

An empirical study of multilingual spoken term detection
Zejun Ma, Xiaorui Wang, Bo Xu

Fusing multiple confidence measures for Chinese spoken term detection
Zejun Ma, Xiaorui Wang, Bo Xu

Response probability based decoding algorithm for large vocabulary continuous speech recognition
Zhanlei Yang, Hao Chao, Wenju Liu

Combining lattice-based language dependent and independent approaches for out-of-language detection in LVCSR
Yuxiang Shan, Yan Deng, Jia Liu

Evaluation of tree-trellis based decoding in over-million LVCSR
Naoaki Ito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

Lattice based discriminative model combination using automatically induced phonetic contexts
Hao Huang, Bing Hu Li

Predicting human perceived accuracy of ASR systems
Taniya Mishra, Andrej Ljolje, Mazin Gilbert

Cross-lingual study of ASR errors: on the role of the context in human perception of near-homophones
I. Vasilescu, D. Yahia, N. Snoeren, Martine Adda-Decker, Lori Lamel

Performance prediction of speech recognition using average-voice-based speech synthesis
Tatsuhiko Saito, Takashi Nose, Takao Kobayashi, Yohei Okato, Akio Horii

Confidence measures for Turkish call center conversations
Ali Haznedaroglu, Levent M. Arslan

Spoken document confidence estimation using contextual coherence
Taichi Asami, Narichika Nomoto, Satoshi Kobashikawa, Yoshikazu Yamaguchi, Hirokazu Masataki, Satoshi Takahashi

Speech Enhancement

Evaluating artificial bandwidth extension by conversational tests in car using mobile devices with integrated hands-free functionality
Laura Laaksonen, Ville Myllylä, Riitta Niemistö

Low-frequency bandwidth extension of telephone speech using sinusoidal synthesis and Gaussian mixture model
Hannu Pulakka, Ulpu Remes, Santeri Yrttiaho, Kalle J. Palomäki, Mikko Kurimo, Paavo Alku

Memory-based approximation of the Gaussian mixture model framework for bandwidth extension of narrowband speech
Amr H. Nour-Eldin, Peter Kabal

Speech enhancement by reconstruction from cleaned acoustic features
Philip Harding, Ben Milner

A soft decision-based speech enhancement using acoustic noise classification
Jae-Hun Choi, Sang-Kyun Kim, Joon-Hyuk Chang

A noise estimation method based on speech presence probability and spectral sparseness
Chao Li, Wenju Liu

Improved a posteriori speech presence probability estimation based on cepstro-temporal smoothing and time-frequency correlation
Chao Li, Wenju Liu

A rapid adaptation algorithm for tracking highly non-stationary noises based on Bayesian inference for on-line spectral change point detection
Md Foezur Rahman Chowdhury, Sid-Ahmed Selouani, Douglas O'Shaughnessy

Single channel speech enhancement using MMSE estimation of short-time modulation magnitude spectrum
Kuldip Paliwal, Belinda Schwerin, Kamil Wójcicki

Speech enhancement using masking properties in adverse environments
Atanu Saha, Tetsuya Shimamura

Phoneme-dependent NMF for speech enhancement in monaural mixtures
Bhiksha Raj, Rita Singh, Tuomas Virtanen

Kernel PCA for speech enhancement
Christina Leitner, Franz Pernkopf, Gernot Kubin

Objective intelligibility prediction of speech by combining correlation and distortion based techniques
Angel M. Gomez, Belinda Schwerin, Kuldip Paliwal

Spoken Dialogue & Spoken Language Understanding Systems

Multi-view approach for speaker turn role labeling in TV broadcast news shows
Géraldine Damnati, Delphine Charlet

Evaluation of an integrated authoring tool for building advanced question-answering characters
Sudeep Gandhe, Michael Rushforth, Priti Aggarwal, David Traum

Towards unsupervised spoken language understanding: exploiting query click logs for slot filling
Gokhan Tur, Dilek Hakkani-Tür, Dustin Hillard, Asli Celikyilmaz

Web-enhanced content retrieval for information access dialogue system
Donghyeon Lee, Cheongjae Lee, Minwoo Jeong, Kyungduk Kim, Seokhwan Kim, Junhwi Choi, Gary Geunbae Lee

Uncertainty management for on-line optimisation of a POMDP-based large-scale spoken dialogue system
Lucie Daubigney, Milica Gašić, Senthilkumar Chandramohan, Matthieu Geist, Olivier Pietquin, Steve Young

Detection of task-incomplete dialogs based on utterance-and-behavior tag n-gram for spoken dialog systems
Sunao Hara, Norihide Kitaoka, Kazuya Takeda

Shrinkage-based features for natural language call routing
Ruhi Sarikaya, Stanley F. Chen, Bhuvana Ramabhadran

Clustering with modified cosine distance learned from constraints
Leonid Rachevsky, Dimitri Kanevsky, Ruhi Sarikaya, Bhuvana Ramabhadran

Using speaker ID to discover repeat callers of a spoken dialog system
Andrew Fandrianto, Brian Langner, Alan W. Black

Semantic graph clustering for POMDP-based spoken dialog systems
Florian Pinault, Fabrice Lefèvre

Learning place-names from spoken utterances and localization results by mobile robot
Ryo Taguchi, Yuji Yamada, Koosuke Hattori, Taizo Umezaki, Masahiro Hoguro, Naoto Iwahashi, Kotaro Funakoshi, Mikio Nakano

Active learning for dialogue act classification
Björn Gambäck, Fredrik Olsson, Oscar Täckström

Speaker role recognition using question detection and characterization
Thierry Bazillon, Benjamin Maza, Michael Rouvier, Frederic Bechet, Alexis Nasr

Learning score structure from spoken language for a tennis game
Qiang Huang, Stephen J. Cox

Semi-automated classifier adaptation for natural language call routing
Silke M. Witt

Interactional style detection for versatile dialogue response using prosodic and semantic features
Wei-Bin Liang, Chung-Hsien Wu, Chih-Hung Wang, Jhing-Fa Wang

Quality aspects of multimodal dialog systems: identity, stimulation and success
Christine Kühnel, Benjamin Weiss, Matthias Schulz, Sebastian Möller

Paralinguistic Information - Classification and Detection

On the use of multimodal cues for the prediction of degrees of involvement in spontaneous conversation
Catharine Oertel, Stefan Scherer, Nick Campbell

Anger recognition in spoken dialog using linguistic and para-linguistic information
Narichika Nomoto, Masafumi Tamoto, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi

Recognition of personality traits from human spoken conversations
A. V. Ivanov, G. Riccardi, A. J. Sporka, J. Franc

Using multiple databases for training in emotion recognition: to unite or to vote?
Björn Schuller, Zixing Zhang, Felix Weninger, Gerhard Rigoll

“Would you buy a car from me?” - on the likability of telephone voices
Felix Burkhardt, Björn Schuller, Benjamin Weiss, Felix Weninger

Automatic identification of salient acoustic instances in couples' behavioral interactions using diverse density support vector machines
James Gibson, Athanasios Katsamanis, Matthew P. Black, Shrikanth Narayanan

Predicting speaker changes and listener responses with and without eye-contact
Daniel Neiberg, Joakim Gustafson

Emotion classification using inter- and intra-subband energy variation
Senaka Amarakeerthi, Tin Lay Nwe, Liyanage C. De Silva, Michael Cohen

Emotion classification of infants' cries using duration ratios of acoustic segments
K. Kitahara, S. Michiwiki, M. Sato, S. Matsunaga, M. Yamashita, K. Shinohara

Vowels formants analysis allows straightforward detection of high arousal acted and spontaneous emotions
Bogdan Vlasenko, Dmytro Prylipko, David Philippou-Hübner, Andreas Wendemuth

Intra-, inter-, and cross-cultural classification of vocal affect
Daniel Neiberg, Petri Laukka, Hillary Anger Elfenbein

Applications for Learning, Education, Aged and Handicapped Persons

Verifying human users in speech-based interactions
Sajad Shirali-Shahreza, Yashar Ganjali, Ravin Balakrishnan

Automatic assessment of prosody in high-stakes English tests
Jian Cheng

Improvement of segmental mispronunciation detection with prior knowledge extracted from large L2 speech corpus
Dean Luo, Xuesong Yang, Lan Wang

Off-topic detection in automated speech assessment applications
Jian Cheng, Jianqiang Shen

Towards context-dependent phonetic spelling error correction in children's freely composed text for diagnostic and pedagogical purposes
Sebastian Stüker, Johanna Fay, Kay Berkling

Factored translation models for improving a speech into sign language translation system
V. López-Ludeña, R. San-Segundo, R. Córdoba, J. Ferreiros, J. M. Montero, J. M. Pardo

Formant maps in Hungarian vowels - online data inventory for research, and education
Kálmán Abari, Zsuzsanna Zsófia Rácz, Gábor Olaszy

Automatic subtitling of the Basque Parliament plenary sessions videos
Germán Bordel, Silvia Nieto, Mikel Penagarikano, Luis Javier Rodriguez-Fuentes, Amparo Varona

Generating animated pronunciation from speech through articulatory feature extraction
Yurie Iribe, Silasak Manosavanh, Kouichi Katsurada, Ryoko Hayashi, Chunyue Zhu, Tsuneo Nitta

A tale of two tasks: detecting children's off-task speech in a reading tutor
Wei Chen, Jack Mostow

Problems encountered by Japanese EL2 with English short vowels as illustrated on a 3D vowel chart
Toshiko Isei-Jaakkola, Takatoshi Naka, Keikichi Hirose

Automatic generation of listening comprehension learning material in European Portuguese
Thomas Pellegrini, Rui Correia, Isabel Trancoso, Jorge Baptista, Nuno Mamede

Candidate generation for ASR output error correction using a context-dependent syllable cluster-based confusion matrix
Chao-Hong Liu, Chung-Hsien Wu, David Sarwono, Jhing-Fa Wang

Semi-supervised tree support vector machine for online cough recognition
Thai Hoa Huynh, Vu An Tran, Huy Dat Tran

Source Separation and Speech Enhancement

Monaural voiced speech segregation based on pitch and comb filter
Xueliang Zhang, Wenju Liu

Fast and simple iterative algorithm of lp-norm minimization for under-determined speech separation
Yasuharu Hirasawa, Naoki Yasuraoka, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

Monaural speech separation based on a 2D processing and harmonic analysis
Azam Rabiee, Saeed Setayeshi, Soo-Young Lee

Underdetermined blind source separation with fuzzy clustering for arbitrarily arranged sensors
Ingrid Jafari, Serajul Haque, Roberto Togneri, Sven Nordholm

On initial seed selection for frequency domain blind speech separation
Dang Hai Tran Vu, Reinhold Haeb-Umbach

Spatial filter calibration based on minimization of modified LSD
Nobuaki Tanaka, Tetsuji Ogawa, Tetsunori Kobayashi

Probabilistic spectrum envelope: categorized audio-features representation for NMF-based sound decomposition
Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki

A high resolution multiple source localization based on generalized cumulant structure (GCS) matrix
Jinho Choi, Chang D. Yoo

Single channel speech music separation using nonnegative matrix factorization with sliding windows and spectral masks
Emad M. Grais, Hakan Erdogan

Perceptually-inspired processing for multichannel Wiener filter
Jorge I. Marin-Hurtado, David V. Anderson

Speech recognition in mixed sound of speech and music based on vector quantization and non-negative matrix factorization
Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa

Reduction of highly nonstationary ambient noise by integrating spectral and locational characteristics of speech and noise for robust ASR
Tomohiro Nakatani, Shoko Araki, Marc Delcroix, Takuya Yoshioka, Masakiyo Fujimoto

Voice processing by dynamic glottal models with applications to speech enhancement
Carlo Drioli, Andrea Calanca

Supervised sparse coding strategy in cochlear implants
Jinqiu Sang, Guoping Li, Hongmei Hu, Mark E. Lutman, Stefan Bleeck

Phonetics and Phonology, Stress, Accent, Rhythm

Chinese and Italian speech rhythm: normalization and the CCI algorithm
Chiara Bertini, Pier Marco Bertinetto, Na Zhi

Rhythm metrics on syllables and feet do not work as expected
Paolo Mairano, Antonio Romano

Applying rhythm features to automatically assess non-native speech
Lei Chen, Klaus Zechner

Prosodic synchrony in co-operative task-based dialogues: a measure of agreement and disagreement
Brian Vaughan

Low and high, short and long by crook or by hook?
Oliver Niebuhr, Astrid Wolf

Estimating speaking rate by means of rhythmicity parameters
Christian Heinrich, Florian Schiel

Comparing word and syllable prominence rated by naïve listeners
Denis Arnold, Bernd Möbius, Petra Wagner

L1/L2 perception of lexical stress with F0 peak-delay: effect of an extra syllable added
Shinichi Tokuma, Yi Xu

Letter-to-phoneme conversion based on two-stage neural network focusing on letter and phoneme contexts
Kheang Seng, Yurie Iribe, Tsuneo Nitta

An international English speech corpus for longitudinal study of accent development
Rosemary Orr, Hugo Quené, Roeland van Beek, Thari Diefenbach, David A. van Leeuwen, Marijn Huijbregts

A corpus-based study of English pronunciation variations
Sunhee Kim, Kyuwhan Lee, Minhwa Chung

Long term average speech spectra in Yolngu Matha and Pitjantjatjara speaking females and males
Hywel Stoakes, Andrew Butcher, Janet Fletcher, Marija Tabain

Context and speaker dependency in the relation of vowel formants and subglottal resonances - evidence from Hungarian
Tekla Etelka Gráczi, Steven M. Lulich, Tamás Gábor Csapó, András Beke

SLP for Speech Translation, Information Extraction and Retrieval

OOV sensitive named-entity recognition in speech
Carolina Parada, Mark Dredze, Frederick Jelinek

Speech translation with grammar driven probabilistic phrasal bilexica extraction
Markus Saers, Dekai Wu, Chi-kiu Lo, Karteek Addanki

An efficient unified extraction algorithm for bilingual data
Christoph Tillmann, Sanjika Hewavitharana

Using features from topic models to alleviate over-generation in hierarchical phrase-based translation
Songfang Huang, Bowen Zhou

An empirical study on improving hierarchical phrase-based translation using alignment features
Songfang Huang, Bowen Zhou

Robust speech translation by domain adaptation
Xiaodong He, Li Deng

Enhancements to the training process of classifier-based speech translator via topic modeling
Emil Ettelaie, Panayiotis G. Georgiou, Shrikanth Narayanan

A scalable approach to building a parallel corpus from the web
Vivek Kumar Rangarajan Sridhar, Luciano Barbosa, Srinivas Bangalore

Spoken term detection results using plural subword models by estimating detection performance for each query
Yoshiaki Itoh, Kohei Iwata, Masaaki Ishigame, Kazuyo Tanaka, Shi-wook Lee

SpeechForms: from web to speech and back
Luciano Barbosa, Diamantino Caseiro, Giuseppe Di Fabbrizio, Amanda Stent

Image processing filters for line detection-based spoken term detection
Kazuyuki Noritake, Hiroaki Nanjo, Takehiko Yoshimi

Using latent topic features for named entity extraction in search queries
Joe Polifroni, François Mairesse

Language model expansion using webdata for spoken document retrieval
Ryo Masumura, Seongjun Hahm, Akinori Ito

Effects of query expansion for spoken document passage retrieval
Tomoyosi Akiba, Koichiro Honda

Unsupervised hidden Markov modeling of spoken queries for spoken term detection without speech recognition
Chun-an Chan, Lin-shan Lee

Topic identification from audio recordings using rich recognition results and neural network based classifiers
Roberto Gemello, Franco Mana, Pier Domenico Batzu

Speech Synthesis - Selected Topics

A grammar based approach to style specific phrase prediction
Alok Parlikar, Alan W. Black

Unsupervised features from text for speech synthesis in a speech-to-speech translation system
Oliver Watts, Bowen Zhou

Unsupervised continuous-valued word features for phrase-break prediction without a part-of-speech tagger
Oliver Watts, Junichi Yamagishi, Simon King

Albayzín 2010: a Spanish text to speech evaluation
Francisco Campillo, Francisco Méndez, Montserrat Arza, Laura Docío, Antonio Bonafonte, Eva Navas, Iñaki Sainz

Combining active and semi-supervised learning for homograph disambiguation in Mandarin text-to-speech synthesis
Binbin Shen, Zhiyong Wu, Yongxin Wang, Lianhong Cai

Automatically creating a diphone set from a speech database
Thomas Ewender, Beat Pfister

Automatic viseme clustering for audiovisual speech synthesis
Wesley Mattheyses, Lukas Latacz, Werner Verhelst

Perceptual quality dimensions of text-to-speech systems
Florian Hinterleitner, Sebastian Möller, Christoph Norrenbrock, Ulrich Heute

A pointwise approach to pronunciation estimation for a TTS front-end
Shinsuke Mori, Graham Neubig

Correlating text with prosody
Mohamed Abou-Zleikha, Julie Carson-Berndsen

“What is… dengue fever?” - modeling and predicting pronunciation errors in a text-to-speech system
Andrew Rosenberg, Raul Fernandez, Bhuvana Ramabhadran

Aperiodicity analysis for quality estimation of text-to-speech signals
Christoph Norrenbrock, Ulrich Heute, Florian Hinterleitner, Sebastian Möller

Human Speech and Sound Perception I, II

Parallels in infants' attention to speech articulation and to physical changes in speech-unrelated objects
Eeva Klintfors, Ellen Marklund, Francisco Lacerda

Speech events are recoverable from unlabeled articulatory data: using an unsupervised clustering approach on data obtained from electromagnetic midsagittal articulography (EMA)
Daniel Duran, Jagoda Bruni, Grzegorz Dogil, Hinrich Schütze

Children's recognition of their own voice: influence of phonological impairment
Sofia Strömbergsson

Evaluation of bone-conducted ultrasonic hearing-aid regarding transmission of speaker discrimination information
Takayuki Kagomiya, Seiji Nakagawa

Impact of different feedback mechanisms in EMG-based speech recognition
Christian Herff, Matthias Janke, Michael Wand, Tanja Schultz

Phonotactic constraints and the segmentation of Cantonese speech
Michael C. W. Yip

Reaction time and decision difficulty in the perception of intonation
Katrin Schneider, Grzegorz Dogil, Bernd Möbius

Processing of stress related acoustic cues as indexed by ERPs
Ferenc Honbolygó, Valéria Csépe

On the relationship between perceived accentedness, acoustic similarity, and processing difficulty in foreign-accented speech
Marijt J. Witteman, Andrea Weber, James M. McQueen

The perception boundary between single and geminate stops in 3- and 4-mora Japanese words
Shigeaki Amano, Yukari Hirata

Correlation analysis of acoustic features with perceptual voice quality similarity for similar speaker selection
Yusuke Ijima, Mitsuaki Isogai, Hideyuki Mizuno

Pointing gestures do not influence the perception of lexical stress
Alexandra Jesse, Holger Mitterer

Relationships between phonetic features and speech perception - a statistical investigation from a large anechoic British English corpus
Ian R. Cushing, Francis F. Li, Ken Worrall, Tim Jackson

The representation of speech in a nonlinear auditory model: time-domain analysis of simulated auditory-nerve firing patterns
Guy J. Brown, Tim Jürgens, Ray Meddis, Matthew Robertson, Nicholas R. Clark

An automatic voice pleasantness classification system based on prosodic and acoustic patterns of voice preference
Luis Coelho, Daniela Braga, Miguel Sales-Dias, Carmen Garcia-Mateo

Contributions of F1 and F2 (F2') to the perception of plosive consonants
René Carré, Pierre Divenyi, Willy Serniclaes, Emmanuel Ferragne, Egidio Marsico, Viet-Son Nguyen

Auditory speech processing is affected by visual speech in the periphery
Jeesun Kim, Chris Davis

Visual speech speeds up auditory identification responses
Tim Paris, Jeesun Kim, Chris Davis

Agglomerative hierarchical clustering of emotions in speech based on subjective relative similarity
Ryoichi Takashima, Tohru Nagano, Ryuki Tachibana, Masafumi Nishimura

Optimal syllabic rates and processing units in perceiving Mandarin spoken sentences
Guangting Mai, Gang Peng

Cross-lingual speaker discrimination using natural and synthetic speech
Mirjam Wester, Hui Liang

ASR - New Paradigms and Other Topics

Accelerated parallelizable neural network learning algorithm for speech recognition
Dong Yu, Li Deng

Deep convex net: a scalable architecture for speech pattern classification
Li Deng, Dong Yu

Modeling broad context for tone recognition with conditional random fields
Siwei Wang, Gina-Anne Levow

Improved tonal language speech recognition by integrating spectro-temporal evidence and pitch information with properly chosen tonal acoustic units
Shang-wen Li, Yow-bang Wang, Liang-che Sun, Lin-shan Lee

Kullback-Leibler divergence-based ASR training data selection
Evandro Gouvêa, Marelie H. Davel

Articulatory feature classification using nearest neighbors
Arild Brandrud Næss, Karen Livescu, Rohit Prabhavalkar

Continuous episodic memory based speech recognition using articulatory dynamics
Sébastien Demange, Slim Ouni

Graphone model interpolation and Arabic pronunciation generation
T. Li, P. C. Woodland, F. Diehl, M. J. F. Gales

Grapheme-to-phoneme conversion using conditional random fields
Irina Illina, Dominique Fohr, Denis Jouvet

Bilingual acoustic model adaptation by unit merging on different levels and cross-level integration
Ching-Feng Yeh, Chao-Yu Huang, Lin-shan Lee

A qualitative evaluation of phoneme-to-phoneme technology
Marijn Schraagen, Gerrit Bloothooft

Cheap bootstrap of multi-lingual hidden Markov models
Daniele Falavigna, Roberto Gretter

Adaptive stream fusion in multistream recognition of speech
Nima Mesgarani, Samuel Thomas, Hynek Hermansky

Unsupervised audio patterns discovery using HMM-based self-organized units
Man-hung Siu, Herbert Gish, Steve Lowe, Arthur Chan

Nearest neighbors with learned distances for phonetic frame classification
John Labiak, Karen Livescu

Speech Audio Analysis and Classification

Stop consonant recognition by temporal fine structure of burst
Seppo Fagerlund, Unto K. Laine

Phonetic classification using controlled random walks
Katrin Kirchhoff, Andrei Alexandrescu

Keyphrase cloud generation of broadcast news
Luís Marujo, Márcio Viveiros, João Paulo da Silva Neto

Optimized feature extraction and HMMs in subword detectors
Alfonso M. Canterla, Magne H. Johnsen

Real-world speech/non-speech audio classification based on sparse representation features and GPCs
Ziqiang Shi, Jiqing Han, Tieran Zheng

Privacy preserving speaker verification using adapted GMMs
Manas A. Pathak, Bhiksha Raj

Clustering expressive speech styles in audiobooks using glottal source parameters
Éva Székely, João P. Cabral, Peter Cahill, Julie Carson-Berndsen

On the use of the rhythmogram for automatic syllabic prominence detection
Bogdan Ludusan, Antonio Origlia, Francesco Cutugno

Speech modulation features for robust nonnative speech accent detection
Sethserey Sam, Xiong Xiao, Laurent Besacier, Eric Castelli, Haizhou Li, Eng Siong Chng

Frame-level vocal effort likelihood space modeling for improved whisper-island detection
Chi Zhang, John H. L. Hansen

Speaker identification for whispered speech using a training feature transformation from neutral to whisper
Xing Fan, John H. L. Hansen

An accurate and robust gender identification algorithm
Andrea DeMarco, Stephen J. Cox

Deep belief networks for automatic music genre classification
Xiaohong Yang, Qingcai Chen, Shusen Zhou, Xiaolong Wang

Image representation of the subband power distribution for robust sound classification
Jonathan Dennis, Huy Dat Tran, Haizhou Li

Acoustic and visual cues of turn-taking dynamics in dyadic interactions
Bo Xiao, Viktor Rozgić, Athanasios Katsamanis, Brian R. Baucom, Panayiotis G. Georgiou, Shrikanth Narayanan

Voice Activity Detection

Voice activity detection in MTF-based power envelope restoration
Masashi Unoki, Xugang Lu, Rico Petrick, Shota Morita, Masato Akagi, Rüdiger Hoffmann

Using spectral fluctuation of speech in multi-feature HMM-based voice activity detection
Miquel Espi, Shigeki Miyabe, Takuya Nishimoto, Nobutaka Ono, Shigeki Sagayama

Linear dynamic models for voice activity detection
Kannu Mehta, Chau Khoa Pham, Eng Siong Chng

Detection of shouted speech in the presence of ambient noise
Jouni Pohjalainen, Tuomo Raitio, Paavo Alku

Breath-detection-based telephony speech phrasing
Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura

Multi-channel voice activity detection based on conic constraints
Gibak Kim

Multi-sensor voice activity detection based on multiple observation hypothesis testing
Theodoros Petsatodis, Fotios Talantzis, Christos Boukis, Zheng-Hua Tan, Ramjee Prasad

Online speech activity detection in broadcast news
Chao Gao, Guruprasad Saikumar, Saurabh Khanwalkar, Avi Herscovici, Anoop Kumar, Amit Srivastava, Premkumar Natarajan

A real-time speech command detector for a smart control room
Daniel Reich, Felix Putze, Dominic Heger, Joris Ijsselmuiden, Rainer Stiefelhagen, Tanja Schultz

Robust voice activity detector for real world applications using harmonicity and modulation frequency
Ekapol Chuangsuwanich, James Glass

On noise robust voice activity detection
Tomas Dekens, Werner Verhelst

Adaptive regularization framework for robust voice activity detection
Xugang Lu, Masashi Unoki, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura

Voice Conversion and Speech Synthesis

Gaussian process experts for voice conversion
Nicholas C. V. Pilkington, Heiga Zen, M. J. F. Gales

Intonation conversion from neutral to expressive speech
Christophe Veaux, Xavier Rodet

Speaker-adaptive speech synthesis based on eigenvoice conversion and language-dependent prosodic conversion in speech-to-speech translation
Nobuhiko Hattori, Tomoki Toda, Hisashi Kawai, Hiroshi Saruwatari, Kiyohiro Shikano

Adding glottal source information to intra-lingual voice conversion
Javier Pérez, Antonio Bonafonte

Formant-controlled HMM-based speech synthesis
Ming Lei, Junichi Yamagishi, Korin Richmond, Zhen-Hua Ling, Simon King, Li-Rong Dai

Analysis of HMM-based lombard speech synthesis
Tuomo Raitio, Antti Suni, Martti Vainio, Paavo Alku

Discrete/continuous modelling of speaking style in HMM-based speech synthesis: design and evaluation
Nicolas Obin, Pierre Lanchantin, Anne Lacheret, Xavier Rodet

Factored MLLR adaptation for singing voice generation
June Sig Sung, Doo Hwa Hong, Shin Jae Kang, Nam Soo Kim

Adaptation of prosody in speech synthesis by changing command values of the generation process model of fundamental frequency
Keikichi Hirose, Keiko Ochi, Ryusuke Mihara, Hiroya Hashimoto, Daisuke Saito, Nobuaki Minematsu

Prosody conversion for emotional Mandarin speech synthesis using the tone nucleus model
Miaomiao Wen, Miaomiao Wang, Keikichi Hirose, Nobuaki Minematsu

Rapid adaptation of foreign-accented HMM-based speech synthesis
Reima Karhila, Mirjam Wester

The effects of phoneme errors in speaker adaptation for HMM speech synthesis
Bálint Tóth, Tibor Fegyó, Géza Németh

Human Speech Production II

Articulatory reduction in Mandarin Chinese words
Jeffrey Berry, Sunjing Ji, Ian Fasel, Diana Archangeli

Morphological variation in the adult vocal tract: a modeling study of its potential acoustic impact
Adam Lammert, Michael Proctor, Athanasios Katsamanis, Shrikanth Narayanan

Analysis and automatic estimation of children's subglottal resonances
Steven M. Lulich, Harish Arsikere, John R. Morton, Gary K. F. Leung, Abeer Alwan, Mitchell S. Sommers

Acceleration sensor based estimates of subglottal resonances: short vs. long vowels
Wolfgang Wokurek, Andreas Madsack

Comparison of nasalance measurements from accelerometers and microphones and preliminary development of novel features
Nicolas Audibert, Angélique Amelot

The effect of seeing the interlocutor on speech production in different noise types
Michael Fitzpatrick, Jeesun Kim, Chris Davis

Conversing in the presence of a competing conversation: effects on speech production
Vincent Aubanel, Martin Cooke, Julián Villegas, Maria Luisa Garcia Lecumberri

Very short utterances and timing in turn-taking
Mattias Heldner, Jens Edlund, Anna Hjalmarsson, Kornel Laskowski

Validating rt-MRI based articulatory representations via articulatory recognition
Athanasios Katsamanis, Erik Bresch, Vikram Ramanarayanan, Shrikanth Narayanan

An electropalatographic and acoustic study on anticipatory coarticulation in V1#C2V2 sequences in standard Chinese
Yinghao Li, Jiangping Kong

Final /t/ reduction in Dutch past-participles: the role of word predictability and morphological decomposability
Iris Hanique, Mirjam Ernestus

Parametrising degree of articulator movement from dynamic MRI data
Zeynab Raeesy, Ladan Baghai-Ravary, John Coleman

Speech and Language Processing-Based Assistive Technologies and Health Applications (Special Session)

Automatic detection of depression in speech using Gaussian mixture modeling with factor analysis
Douglas Sturim, Pedro A. Torres-Carrasquillo, Thomas F. Quatieri, Nicolas Malyska, Alan McCree

Utterance verification for automating the hearing in noise test (HINT)
H. Timothy Bunnell, Jason Lilley, Sigfrid D. Soli, Ivan Pal

Analyzing the nature of ECA interactions in children with autism
Emily Mower, Chi-Chun Lee, James Gibson, Theodora Chaspari, Marian E. Williams, Shrikanth Narayanan

Incorporating speech recognition engine into an intelligent assistive reading system for dyslexic students
Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou, Evmorfia N. Argyriou, Antonis Symvonis

An investigation of depressed speech detection: features and normalization
Nicholas Cummins, Julien Epps, Michael Breakspear, Roland Goecke

Using prosodic and spectral features in detecting depression in elderly males
Michelle Hewlett Sanchez, Dimitra Vergyri, Luciana Ferrer, Colleen Richey, Pablo Garcia, Bruce Knoth, William Jarrold

Combining phonological and acoustic ASR-free features for pathological speech intelligibility assessment
Catherine Middag, Tobias Bocklet, Jean-Pierre Martens, Elmar Nöth

Speech synthesis parameter generation for the assistive silent speech interface MVOCA
Robin Hofe, Stephen R. Ell, Michael J. Fagan, James M. Gilbert, Phil D. Green, Roger K. Moore, Sergey I. Rybchenko

Computer-assisted disfluency counts for stuttered speech
Peter A. Heeman, Andy McMillin, J. Scott Yaruss

Spectral features for automatic blind intelligibility estimation of spastic dysarthric speech
Richard Hummel, Wai-Yip Chan, Tiago H. Falk

Extraction of narrative recall patterns for neuropsychological assessment
Emily T. Prud'hommeaux, Brian Roark

Gesture design of hand-to-speech converter derived from speech-to-hand converter based on probabilistic integration model
Aki Kunikoshi, Yu Qiao, Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose

Powered wheelchair control using acoustic-based recognition of head gesture accompanying speech
Akira Sasou

Analyzing training dependencies and posterior fusion in discriminant classification of apnea patients based on sustained and connected speech
José Luis Blanco, Rubén Fernández, Doroteo Torre, F. Javier Caminero, Eduardo López

Speech and Audio Processing for Human-Robot Interaction (Special Session)

Using prominence detection to generate acoustic feedback in tutoring scenarios
Lars Schillingmann, Petra Wagner, Christian Munier, Britta Wrede, Katharina Rohlfing

Bayesian extension of MUSIC for sound source localization and tracking
Takuma Otsuka, Kazuhiro Nakadai, Tetsuya Ogata, Hiroshi G. Okuno

Speech-based non-prototypical affect recognition for child-robot interaction in reverberated environments
Martin Wöllmer, Felix Weninger, Stefan Steidl, Anton Batliner, Björn Schuller

Blind source separation for robot audition using fixed beamforming with HRTFs
Mounira Maazaoui, Yves Grenier, Karim Abed-Meraim

Real-life emotion detection from speech in human-robot interaction: experiments across diverse corpora with child and adult voices
Marie Tahon, Agnes Delaborde, Laurence Devillers

Weighted ordered classes - nearest neighbors: a new framework for automatic emotion recognition from speech
Yazid Attabi, Pierre Dumouchel

Prosodic analysis of a corpus of tales
David Doukhan, Albert Rilliard, Sophie Rosset, Martine Adda-Decker, Christophe d'Alessandro

Analysis of acoustic-prosodic features related to paralinguistic information carried by interjections in dialogue speech
Carlos T. Ishi, Hiroshi Ishiguro, Norihiro Hagita

Robust intonation pattern classification in human robot interaction
Martin Heckmann, Kazuhiro Nakadai, Hirofumi Nakajima

ASR for human-symbiotic robot “EMIEW2” with mechanical noise and floor-level noise reduction
Takashi Sumiyoshi, Masahito Togami, Yasunari Obuchi

Speech Technology for Under-Resourced Languages (Special Session)

Rapid building of an ASR system for under-resourced languages based on multilingual unsupervised training
Ngoc Thang Vu, Franziska Kraus, Tanja Schultz

Places and manner of articulation of Bangla consonants: an EPG based study
Shyamal Kr. Das Mandal, Somnath Chandra, Swaran Lata, A. K. Datta

Efficient harvesting of internet audio for resource-scarce ASR
Marelie H. Davel, Charl van Heerden, Neil Kleynhans, Etienne Barnard

Automatic prosody generation for Serbo-Croatian speech synthesis based on regression trees
Milan Sečujski, Darko Pekar, Nikša Jakovljević

Very large vocabulary ASR for spoken Russian with syntactic and morphemic analysis
Alexey Karpov, Irina Kipyatkova, Andrey Ronzhin

Cross-language phone recognition when the target language phoneme inventory is not known
Timothy Kempton, Roger K. Moore, Thomas Hain

A paradigm for limited vocabulary speech recognition based on redundant spectro-temporal feature sets
Sourish Chaudhuri, Bhiksha Raj, Tony Ezzat

Gorup: an ontology-driven audio information retrieval system that suits the requirements of under-resourced languages
N. Barroso, K. López de Ipiña, A. Ezeiza, C. Hernández, N. Ezeiza, O. Barroso, U. Susperregi, S. Barroso

Woefzela - an open-source platform for ASR data collection in the developing world
Nic J. de Vries, Jaco Badenhorst, Marelie H. Davel, Etienne Barnard, Alta de Waal

A study on the perception of tone and intonation in Sesotho
Hansjörg Mixdorff, Lehlohonolo Mohasi, 'Malillo Machobane, Thomas Niesler

Developing a broadband automatic speech recognition system for Afrikaans
Febe de Wet, Alta de Waal, Gerhard B. van Huyssteen

Multi-accent speech recognition of Afrikaans, black and white varieties of South African English
Herman Kamper, Thomas Niesler

Perceptual representation of consonant sounds in Thai
C. Tantibundhit, C. Onsuwan, T. Saimai, N. Saimai, S. Thatphithakkul, P. Chootrakool, K. Kosawat, N. Thatphithakkul

A cross-lingual approach to the development of an HMM-based speech synthesis system for Malay
Mumtaz B. Mustafa, Raja N. Ainon, Roziati Zainuddin, Zuraidah M. Don, Gerry Knowles

Speaker State Challenge - Intoxication and Sleepiness I, II (Special Session)

The INTERSPEECH 2011 speaker state challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Florian Schiel, Jarek Krajewski

Combining multiple phoneme-based classifiers with audio feature-based classifier for the detection of alcohol intoxication
Claude Montacié, Marie-José Caraty

Intoxication detection using phonetic, phonotactic and prosodic cues
Fadi Biadsy, William Yang Wang, Andrew Rosenberg, Julia Hirschberg

Drink and speak: on the automatic classification of alcohol intoxication by acoustic, prosodic and text-based features
Tobias Bocklet, Korbinian Riedhammer, Elmar Nöth

Intoxicated speech detection by fusion of speaker normalized hierarchical features and GMM supervectors
Daniel Bone, Matthew P. Black, Ming Li, Angeliki Metallinou, Sungbok Lee, Shrikanth Narayanan

Attention, sobriety checkpoint! Can humans determine by means of voice, if someone is drunk… and can automatic classifiers compete?
Stefan Ultes, Alexander Schmitt, Wolfgang Minker

Does it groove or does it stumble - automatic classification of alcoholic intoxication using prosodic features
Florian Hönig, Anton Batliner, Elmar Nöth

Perception of alcoholic intoxication in speech
Florian Schiel

Detecting sleepiness by fusing classifiers trained with novel acoustic features
Tauhidur Rahman, Soroosh Mariooryad, Shalini Keshavamurthy, Gang Liu, John H. L. Hansen, Carlos Busso

An HMM-based approach to the INTERSPEECH 2011 speaker state challenge
Albino Nogueiras Rodríguez

RANSAC-based training data selection for speaker state recognition
Elif Bozkurt, Engin Erzin, Çiğdem Eroğlu Erdem, A. Tanju Erdem

University of Ljubljana system for INTERSPEECH 2011 speaker state challenge
Rok Gajšek, Simon Dobrišek, France Mihelič

Speaker state classification based on fusion of asymmetric SIMPLS and support vector machines
Dong-Yan Huang, Shuzhi Sam Ge, Zhengchen Zhang

Speech Processing Tools (Special Session)

Speech processing tools - an introduction to interoperability
Christoph Draxler, Toomas Altosaar, Sadaoki Furui, Mark Liberman, Peter Wittenburg

EasyAlign: an automatic phonetic alignment tool under Praat
Jean-Philippe Goldman

Mtrans: a multi-channel, multi-tier speech annotation tool
Julián Villegas, Martin Cooke, Vincent Aubanel, Marco A. Piccolino-Boniforti

The JSafran platform for semi-automatic speech processing
Christophe Cerisara, Claire Gardent

The social signal interpretation framework (SSI) for real time signal processing and recognition
Johannes Wagner, Florian Lingenfelser, Elisabeth André

ELAN - aspects of interoperability and functionality
Han Sloetjes, Peter Wittenburg, Aarthy Somasundaram

Open source voice creation toolkit for the MARY TTS platform
Marc Schröder, Marcela Charfuelan, Sathish Pammi, Ingmar Steiner

Java visual speech components for rapid application development of GUI based speech processing applications
Stefan Steidl, Korbinian Riedhammer, Tobias Bocklet, Florian Hönig, Elmar Nöth

MTalk - a multimodal browser for mobile services
Michael Johnston, Giuseppe Di Fabbrizio, Simon Urbanek

Web-based automatic speech recognition service - webASR
Stuart N. Wrigley, Thomas Hain

A web based speech transcription workplace
Markus Klehr, Andreas Ratzka, Thomas Roß

WinPitch: a multimodal tool for speech analysis of endangered languages
Philippe Martin

Recording caregiver interactions for machine acquisition of spoken language using the KLAIR virtual infant
Mark Huckvale

