ISCA Archive Interspeech 2012 Sessions Search Booklet

Interspeech 2012

Portland, OR, USA
9-13 September 2012

General Chair: Richard Sproat
doi: 10.21437/Interspeech.2012

ASR: Deep Neural Networks I, II

Large vocabulary speech recognition using deep tensor neural networks
Dong Yu, Li Deng, Frank Seide

Scalable minimum Bayes risk training of deep neural network acoustic models using distributed hessian-free optimization
Brian Kingsbury, Tara N. Sainath, Hagen Soltau

Discriminative feature-space transforms using deep neural networks
George Saon, Brian Kingsbury

Context-dependent MLPs for LVCSR: TANDEM, hybrid or both?
Zoltán Tüske, Martin Sundermeyer, Ralf Schlüter, Hermann Ney

Recurrent neural networks for noise reduction in robust ASR
Andrew L. Maas, Quoc V. Le, Tyler M. O'Neil, Oriol Vinyals, Patrick Nguyen, Andrew Y. Ng

Pipelined back-propagation for context-dependent deep neural networks
Xie Chen, Adam Eversole, Gang Li, Dong Yu, Frank Seide

Are sparse representations rich enough for acoustic modeling?
Oriol Vinyals, Li Deng

An initial attempt on task-specific adaptation for deep neural network-based large vocabulary continuous speech recognition
Yeming Xiao, Zhen Zhang, Shang Cai, Jielin Pan, Yonghong Yan

Application of pretrained deep neural networks to large vocabulary speech recognition
Navdeep Jaitly, Patrick Nguyen, Andrew Senior, Vincent Vanhoucke

Cross-lingual and ensemble MLPs strategies for low-resource speech recognition
Yanmin Qian, Jia Liu

Initialization schemes for multilayer perceptron training and their impact on ASR performance using multilingual data
Ngoc Thang Vu, Wojtek Breiter, Florian Metze, Tanja Schultz

Hermitian based hidden activation functions for adaptation of hybrid HMM/ANN models
Sabato Marco Siniscalchi, Jinyu Li, Chin-Hui Lee

Integrating deep neural networks into structural classification approach based on weighted finite-state transducers
Yotaro Kubo, Takaaki Hori, Atsushi Nakamura

Parallel training for deep stacking networks
Li Deng, Brian Hutchinson, Dong Yu

Articulatory feature based multilingual MLPs for low-resource speech recognition
Yanmin Qian, Jia Liu

Uncertainty-driven compensation of multi-stream MLP acoustic models for robust ASR
Ramón Fernandez Astudillo, Alberto Abad, João Paulo da Silva Neto

Phonetics and Phonology I, II

Discrimination of linguistic and non-linguistic vocalizations in spontaneous speech: intra- and inter-corpus perspectives
Felix Weninger, Björn Schuller

Accentual transfer from Swiss-German to French: a study of "français fédéral"
Mathieu Avanzi, Pauline Dubosson, Sandra Schwab, Nicolas Obin

Phonology & the interpretation of fine phonetic detail in Berlin German
Stefanie Jannedy, Melanie Weirich

Evaluation of a formant-based speech-driven lip motion generation
Carlos T. Ishi, Chaoran Liu, Hiroshi Ishiguro, Norihiro Hagita

Using spectral measures to differentiate Mandarin and Korean sibilant fricatives
Jeffrey Kallay, Jeffrey Holliday

EFL conversational triads: foreigner-directed speech and hyperarticulation
Hua-Li Jian, Richard Konopka

Syllable perception depends on tone perception
Iris Chuoying Ouyang, Khalil Iskarous

Assessing agreement level between forced alignment models with data from endangered language documentation corpora
Christian T. DiCanio, Hosung Nam, Douglas H. Whalen, H. Timothy Bunnell, Jonathan D. Amith, Rey Castillo Garcia

How consonants, dialect and speech rate affect vowel devoicing?
Masako Fujimoto, Seiya Funatsu, Ichiro Fujimoto

Effects of stress and speech rate on vowel quality in Catalan and Spanish
Marianna Nadeu

Predictability affects vowel dispersion and dynamics in the Buckeye corpus
Michael McAuliffe, Molly Babel

Dialectal and generational variations in vowels in spontaneous speech
Robert Allen Fox, Ewa Jacewicz

Perceiving listener-directed speech: effects of authenticity and lexical neighborhood density
Rebecca Scarborough, Georgia Zellou

Acoustic cues of vowel quality to coda nasal perception in southern Min
Ying Chen, Vsevolod Kapatsinski, Susan Guion-Anderson

Lenition of /d/ in spontaneous Spanish and Catalan
Miquel Simonet, José I. Hualde, Marianna Nadeu

Language Modeling

Morpheme level feature-based language models for German LVCSR
Amr El-Desoky Mousa, M. Ali Basha Shaik, Ralf Schlüter, Hermann Ney

Tied-state mixture language model for WFST-based speech recognition
Hitoshi Yamamoto, Paul R. Dixon, Shigeki Matsuda, Chiori Hori, Hideki Kashioka

Maximum entropy language model adaptation for mobile speech input
Tanel Alumäe, Kaarel Kaljurand

Supervised and unsupervised web-based language model domain adaptation
Gwénolé Lecorvé, John Dines, Thomas Hain, Petr Motlicek

A hierarchical Bayesian approach for semi-supervised discriminative language modeling
Yik-Cheung Tam, Paul Vozila

Leveraging social annotation for topic language model adaptation
Youzheng Wu, Kazuhiko Abe, Paul R. Dixon, Chiori Hori, Hideki Kashioka

LSTM neural networks for language modeling
Martin Sundermeyer, Ralf Schlüter, Hermann Ney

Phrasal cohort based unsupervised discriminative language modeling
Puyang Xu, Brian Roark, Sanjeev Khudanpur

Deriving conversation-based features from unlabeled speech for discriminative language modeling
Damianos Karakos, Brian Roark, Izhak Shafran, Kenji Sagae, Maider Lehr, Emily Prud'hommeaux, Puyang Xu, Nathan Glenn, Sanjeev Khudanpur, Murat Saraclar, Dan Bikel, Mark Dredze, Chris Callison-Burch, Yuan Cao, Keith Hall, Eva Hasler, Philip Koehn, Adam Lopez, Matt Post, Darcey Riley

Performance comparison of training algorithms for semi-supervised discriminative language modeling
Erinç Dikici, Arda Çelebi, Murat Saraçlar

On-the-fly topic adaptation for YouTube video transcription
Kapil Thadani, Fadi Biadsy, Dan Bikel

Spoken Language Understanding and Dialog I, II

Portability of semantic annotations for fast development of dialogue corpora
Bassam Jabaian, Fabrice Lefèvre, Laurent Besacier

Optimization of dialog strategies using automatic dialog simulation and statistical dialog management techniques
David Griol, Zoraida Callejas, Ramón López-Cózar

Preference-learning based inverse reinforcement learning for dialog control
Hiroaki Sugiyama, Toyomi Meguro, Yasuhiro Minami

A data-driven approach to understanding spoken route directions in human-robot dialogue
Raveesh Meena, Gabriel Skantze, Joakim Gustafson

Detecting system-directed utterances using dialogue-level features
Kazunori Komatani, Akira Hirano, Mikio Nakano

An online generated transducer to increase dialog manager coverage
Joaquin Planells, Lluís-F. Hurtado, Emilio Sanchis, Encarna Segarra

A sequential Bayesian dialog agent for computational ethnography
Abe Kazemzadeh, James Gibson, Juanchen Li, Sungbok Lee, Panayiotis G. Georgiou, Shrikanth Narayanan

Clippyscript: a programming language for multi-domain dialogue systems
Frank Seide, Sean McDirmid

Correlation between model-based approximations of grounding-related cognition and user judgments
Klaus-Peter Engelbrecht, Sebastian Möller

Assessment of user simulators for spoken dialogue systems by means of subspace multidimensional clustering
Zoraida Callejas, David Griol, Klaus-Peter Engelbrecht

“Help me, I need more user tests!” User simulations as supportive tool in the development process of spoken dialogue systems
Florian Kretzschmar, Sebastian Möller

Caller response timing patterns in spoken dialog systems
Silke M. Witt

A discriminative classification-based approach to information state updates for a multi-domain dialog system
Dilek Hakkani-Tür, Gokhan Tur, Larry Heck, Ashley Fidler, Asli Celikyilmaz

Learning when to listen: detecting system-addressed speech in human-human-computer dialog
Elizabeth Shriberg, Andreas Stolcke, Dilek Hakkani-Tür, Larry Heck

Exploiting the semantic web for unsupervised natural language semantic parsing
Gokhan Tur, Minwoo Jeong, Ye-Yi Wang, Dilek Hakkani-Tür, Larry Heck

Prosodic entrainment in an information-driven dialog system
Andrew Fandrianto, Maxine Eskenazi

Speaker Trait Challenge I, II (Special Session)

The INTERSPEECH 2012 speaker trait challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Elmar Nöth, Alessandro Vinciarelli, Felix Burkhardt, Rob van Son, Felix Weninger, Florian Eyben, Tobias Bocklet, Gelareh Mohammadi, Benjamin Weiss

On speaker-independent personality perception and prediction from speech
Tim Polzehl, Katrin Schoenenberg, Sebastian Möller, Florian Metze, Gelareh Mohammadi, Alessandro Vinciarelli

Speaker personality classification using systems based on acoustic-lexical cues and an optimal tree-structured Bayesian network
Kartik Audhkhasi, Angeliki Metallinou, Ming Li, Shrikanth S. Narayanan

Personality traits detection using a parallelized modified SFFS algorithm
Clément Chastagnol, Laurence Devillers

Feature selection for speaker traits
Jouni Pohjalainen, Serdar Kadioglu, Okko Räsänen

A frame pruning approach for paralinguistic recognition tasks
Johannes Wagner, Florian Lingenfelser, Elisabeth André

Modulation spectrum analysis for speaker personality trait recognition
Alexei Ivanov, Xin Chen

A comparison of classification paradigms for speaker likeability determination
Nicholas Cummins, Julien Epps, Jia Min Karen Kua

Predicting likability of speakers with Gaussian processes
Dingchao Lu, Fei Sha

Likability classification - a not so deep neural network approach
Raymond Brueckner, Björn Schuller

Genetic algorithm based feature selection for speaker trait classification
Dongrui Wu

Is 'not bad' good enough? aspects of unknown voices' likability
Benjamin Weiss, Felix Burkhardt

Multi-system fusion of extended context prosodic and cepstral features for paralinguistic speaker trait classification
Michelle Hewlett Sanchez, Aaron Lawson, Dimitra Vergyri, Harry Bratt

The log-Gabor method: speech classification using spectrogram image analysis
Harm Buisman, Eric Postma

Anchor models and WCCN normalization for speaker trait classification
Yazid Attabi, Pierre Dumouchel

Pitch and intonation contribution to speakers' traits classification
Claude Montacié, Marie-José Caraty

Text-dependent pathological voice detection
Gopala Krishna Anumanchipalli, Hugo Meinedo, Miguel Bugalho, Isabel Trancoso, Luís C. Oliveira, Alan W. Black

Intelligibility classification of pathological speech using fusion of multiple high level descriptors
Jangwon Kim, Naveen Kumar, Andreas Tsiartas, Ming Li, Shrikanth Narayanan

Interspeech pathology challenge: investigations into speaker and sentence specific effects
Anthony Stark, Alireza Bayestehtashk, Meysam Asgari, Izhak Shafran

Automatic intelligibility assessment of pathologic speech in head and neck cancer based on auditory-inspired spectro-temporal modulations
Xinhui Zhou, Daniel Garcia-Romero, Nima Mesgarani, Maureen Stone, Carol Espy-Wilson, Shihab Shamma

Detecting intelligibility by linear dimensionality reduction and normalized voice quality hierarchical features
Dong-Yan Huang, Yongwei Zhu, Dajun Wu, Rongshan Yu

Paralinguistics I-III

Novel metrics of speech rhythm for the assessment of emotion
Fabien Ringeval, Mohamed Chetouani, Björn Schuller

Temporal and situational context modeling for improved dominance recognition in meetings
Martin Wöllmer, Florian Eyben, Björn Schuller, Gerhard Rigoll

Audiovisual correlates of basic emotions in blind and sighted people
Marc Swerts, Kitty Leuverink, Madelene Munnik, Vera Nijveld

Combining ranking and classification to improve emotion recognition in spontaneous speech
Houwei Cao, Ragini Verma, Ani Nenkova

Active learning by sparse instance tracking and classifier confidence in acoustic emotion recognition
Zixing Zhang, Björn Schuller

Emotion recognition using acoustic and lexical features
Viktor Rozgić, Sankaranarayanan Ananthakrishnan, Shirin Saleem, Rohit Kumar, Aravind Namandi Vembu, Rohit Prasad

Improving recognition of speaker states and traits by cumulative evidence: intoxication, sleepiness, age and gender
Felix Weninger, Erik Marchi, Björn Schuller

Speaker clustering in emotion recognition
Ni Ding, Julien Epps

Automatic detection of conflict escalation in spoken conversations
Samuel Kim, Sree Harsha Yella, Fabio Valente

The entropy of intoxicated speech: lexical creativity and heavy tongues
Uwe D. Reichel, Thomas Kisler

A robust unsupervised arousal rating framework using prosody with cross-corpora evaluation
Daniel Bone, Chi-Chun Lee, Shrikanth S. Narayanan

Unveiling the acoustic properties that describe the valence dimension
Carlos Busso, Tauhidur Rahman

Annotation and recognition of personality traits in spoken conversations from the AMI meetings corpus
Fabio Valente, Samuel Kim, Petr Motlicek

The effects of lexical tones and nasal coda /-n/ to sadness in Taiwan Hakka
Shao-ren Lyu

Confidence measures in speech emotion recognition based on semi-supervised learning
Jun Deng, Björn Schuller

Using i-vector space model for emotion recognition
Rui Xia, Yang Liu

Cries and whispers: classification of vocal effort in expressive speech
Nicolas Obin

Emotional speech: a spectral analysis
Pouria Fewzee, Fakhri Karray

Classifying skewed data: importance weighting to optimize average recall
Andrew Rosenberg

Gaze patterns in turn-taking
Catharine Oertel, Marcin Włodarczak, Jens Edlund, Petra Wagner, Joakim Gustafson

The "audio-visual face cover corpus": investigations into audio-visual speech and speaker recognition when the speaker's face is occluded by facewear
Natalie Fecher

A case study: detecting counselor reflections in psychotherapy for addictions using linguistic features
Doğan Can, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan

Perceptual Learning and Perceptual Cues to Segments and Tones

Extrinsic normalization for vocal tracts depends on the signal, not on attention
Matthias Sjerps, James M. McQueen, Holger Mitterer

Perceptual learning of /f/-/s/ by older listeners
Odette Scharenborg, Esther Janse, Andrea Weber

Correlation between vocal tract length, body height, formant frequencies, and pitch frequency for the five Japanese vowels uttered by fifteen male speakers
Hiroaki Hatano, Tatsuya Kitamura, Hironori Takemoto, Parham Mokhtari, Kiyoshi Honda, Shinobu Masaki

Detection of transition segments in VCV utterances for estimation of the place of closure of oral stops for speech training
J. Jagbandhu, K. S. Nataraj, Prem C. Pandey

Audiovisual discrimination of CV syllables: a simultaneous fMRI-EEG study
Cyril Dubois, Rudolph Sock

Contribution of spectral shapes to tone perception
Natthawut Kertkeidkachorn, Surapol Vorapatratorn, Sirinart Tangruamsub, Proadpran Punyabukkana, Atiwong Suchato

Methodological issues in assessing perceptual representation of consonant sounds in Thai
Charturong Tantibundhit, Chutamanee Onsuwan, P. Phienphanich, Chai Wutiwiwatchai

Pitch and phonological perception of tone in the Suruí language of Rondônia (Brazil): identification task of LHL and LHH tonal patterns
Julien Meyer

The role of creaky voice in Mandarin tone 2 and tone 3 perception
Rui Cao, Ratree Wayland, Edith Kaan

Can litheners retune native categories acroth a thoneme boundary?
Michael D. Tyler, Mona M. Faris

Speech Synthesis: Prosody

Synthetic F0 can effectively convey speaker ID in delexicalized speech
Eric Morley, Esther Klabbers, Jan P. H. van Santen, Alexander Kain, Seyed Hamidreza Mohammadi

Evaluating prosodic processing for incremental speech synthesis
Timo Baumann, David Schlangen

Expressing speaker's intentions through sentence-final intonations for Japanese conversational speech synthesis
Kazuhiko Iwata, Tetsunori Kobayashi

Modeling pause-duration for style-specific speech synthesis
Alok Parlikar, Alan W. Black

Enumerating differences between various communicative functions for purposes of Czech expressive speech synthesis in limited domain
Martin Gruber

Quality analysis of macroprosodic F0 dynamics in text-to-speech signals
Christoph R. Norrenbrock, Florian Hinterleitner, Ulrich Heute, Sebastian Möller

Improved automatic extraction of generation process model commands and its use for generating fundamental frequency contours for training HMM-based speech synthesis
Hiroya Hashimoto, Keikichi Hirose, Nobuaki Minematsu

Discontinuous observation HMM for prosodic-event-based F0 generation
Tomoki Koriyama, Takashi Nose, Takao Kobayashi

Hierarchical English emphatic speech synthesis based on HMM with limited training data
Fanbo Meng, Zhiyong Wu, Helen Meng, Jia Jia, Lianhong Cai

Employing sentence structure: syntax trees as prosody generators
Sarah Hoffmann, Beat Pfister

A stochastic model of singing voice F0 contours for characterizing expressive dynamic components
Yasunori Ohishi, Hirokazu Kameoka, Daichi Mochihashi, Kunio Kashino

Prosody I, II

Naturalness judgement of prosodic variation of Japanese utterances with prosody modified stimuli
Chiharu Tsurutani, Shunichi Ishihara

Effects of dialectal origin on articulation rate in French
Mathieu Avanzi, Pauline Dubosson, Sandra Schwab

A new approach of speaking rate modeling for Mandarin speech prosody
Chiao-Hua Hsieh, Chen-Yu Chiang, Yih-Ru Wang, Hsiu-Min Yu, Sin-Horng Chen

Modelling pause duration as a function of contextual length
David Doukhan, Albert Rilliard, Sophie Rosset, Christophe D'Alessandro

Production and perception of focus in PFC and non-PFC languages: comparing Beijing Mandarin and Hainan Tsat
Bei Wang, Chenxia Li, Qian Wu, Xiaxia Zhang, Baofeng Wang, Yi Xu

Prosodic realization of focus in statement and question in Tibetan (Lhasa dialect)
Xiaxia Zhang, Bei Wang, Qian Wu, Yi Xu

Effect of noise type and level on focus related fundamental frequency changes
Martti Vainio, Daniel Aalto, Antti Suni, Anja Arnhold, Tuomo Raitio, Henri Seijo, Juhani Järvikivi, Paavo Alku

Role of prosody in automatic modality recognition of Bangla speech
Anal Warsi, Tulika Basu, Debasis Mazumdar

Where to associate stressed additive particles? evidence from speech prosody
Bettina Braun

From PVI to perception: a return to the roots of rhythm in broadcast news
Matthew Benton

A methodology for the study of rhythm in drummed forms of languages: application to Bora Manguare of Amazon
Julien Meyer, Laure Dentel, Frank Seifart

Perception of pitch contours among native tone listeners
Ratree Wayland, Donruethai Laphasradakul, Edith Kaan, Rui Cao

Pitch range control of Japanese boundary pitch movements
Yosuke Igarashi, Hanae Koiso

Perceived prosodic boundaries in Taiwanese and their acoustic correlates
Grace Kuo

Phonetic foreignization of Mandarin for dubbing in imported western movies
Laying Hon, Yuan Jia, Aijun Li

Prosodic context-based analysis of disfluencies
Helena Moniz, Fernando Batista, Isabel Trancoso, Ana Isabel Mata

Describing the development of intonational categories using a target-oriented parametric approach
Britta Lintfert, Bernd Möbius

Computer Assisted Language Learning I, II

Robust tracking for automatic reading tutors
Emre Yilmaz, Dirk van Compernolle, Hugo Van hamme

Maximum F1-score discriminative training for automatic mispronunciation detection in computer-assisted language learning
Hao Huang, Jianming Wang, Halidan Abudureyimu

Error pattern detection integrating generative and discriminative learning for computer-aided pronunciation training
Yow-Bang Wang, Lin-Shan Lee

The automatic assessment of non-native prosody: combining classical prosodic analysis with acoustic modelling
Florian Hönig, Tobias Bocklet, Korbinian Riedhammer, Anton Batliner, Elmar Nöth

Improving L1-specific phonological error diagnosis in computer assisted pronunciation training
Theban Stanley, Kadri Hacioglu

A self-learning assistive vocal interface based on vocabulary learning and grammar induction
Jort F. Gemmeke, Janneke van de Loo, Guy de Pauw, Joris Driesen, Hugo Van hamme, Walter Daelemans

Real-time visualization of English pronunciation on an IPA chart based on articulatory feature extraction
Yurie Iribe, Takurou Mori, Kouichi Katsurada, Goh Kawai, Tsuneo Nitta

Acoustic feature-based non-scorable response detection for an automated speaking proficiency assessment
Je Hun Jeon, Su-Youn Yoon

Pronunciation quality evaluation of sentences by combining word based scores
Jorge Wuth, Néstor Becerra Yoma, Leopoldo Benavides, Hiram Vivanco

Designing a spoken language interface for a tutorial dialogue system
Peter Bell, Myroslava Dzikovska, Amy Isard

Automatic pronunciation error detection based on extended pronunciation space using the unsupervised clustering of pronunciation errors
Long Zhang, Haifeng Li, Lin Ma

Less errors with TTS? a dictation experiment with foreign language learners
Thomas Pellegrini, Ângela Costa, Isabel Trancoso

Improvement in automatic pronunciation scoring using additional basic scores and learning to rank
Liang-Yu Chen, Jyh-Shing Roger Jang

Automatic tone assessment of non-native Mandarin speakers
Jian Cheng

Analysis of Spoken Disorders in Health Applications I, II (Special Session)

Fully automated neuropsychological assessment for detecting mild cognitive impairment
Maider Lehr, Emily Prud'hommeaux, Izhak Shafran, Brian Roark

Spontaneous-speech acoustic-prosodic features of children with autism and the interacting psychologist
Daniel Bone, Matthew P. Black, Chi-Chun Lee, Marian E. Williams, Pat Levitt, Sungbok Lee, Shrikanth Narayanan

Contrastive intonation in autism: the effect of speaker- and listener-perspective
Constantijn Kaland, Emiel Krahmer, Marc Swerts

Characterizing covert articulation in apraxic speech using real-time MRI
Christina Hagedorn, Michael Proctor, Louis Goldstein, Maria Luisa Gorno Tempini, Shrikanth S. Narayanan

Automatic word naming recognition for treatment and assessment of aphasia
Alberto Abad, Anna Pompili, Angela Costa, Isabel Trancoso

Vocal-source biomarkers for depression: a link to psychomotor activity
Thomas F. Quatieri, Nicolas Malyska

Audio and contact microphones for cough detection
Thomas Drugman, Jerome Urbain, Nathalie Bauwens, Ricardo Chessini, Anne-Sophie Aubriot, Patrick Lebecque, Thierry Dutoit

Analyzing and interpreting automatically learned rules across dialects
Nancy F. Chen, Wade Shen, Joseph P. Campbell

The effect of use of drugs on speaker's fundamental frequency and formants
Andrey Raev, Yuri Matveev, Tatiana Goloshchapova

On the assessment of audiovisual cues to speaker confidence by preteens with typical development (TD) and a-typical development (AD)
Marc Swerts, Cees de Bie

Interplay between verbal response latency and physiology of children with autism during ECA interactions
Theodora Chaspari, Chi-Chun Lee, Shrikanth Narayanan

Combination of multiple speech dimensions for automatic assessment of dysarthric speech intelligibility
Myung Jong Kim, Hoirin Kim

Whole-word recognition from articulatory movements for silent speech interfaces
Jun Wang, Ashok Samal, Jordan R. Green, Frank Rudzicz

Verifying session level pronunciation accuracy in a speech therapy application
Shou-Chun Yin, Richard C. Rose, Yun Tang

Duration of ambulatory monitoring needed to accurately estimate voice use
Daryush D. Mehta, Rebecca Woodbury Listfield, Harold A. Cheyne II, James T. Heaton, Shengran W. Feng, Matías Zañartu, Robert E. Hillman

Evaluating NLP features for automatic prediction of language impairment using child speech transcripts
Khairun-nisa Hassanali, Yang Liu, Thamar Solorio

Quantitative analysis of pitch in speech of children with neurodevelopmental disorders
Géza Kiss, Jan P. H. van Santen, Emily Prud'hommeaux, Lois M. Black

Speaker Recognition I-III

Mixture component clustering for efficient speaker verification
Richard D. McClanahan, Phillip L. De Leon

Front-end channel compensation using mixture-dependent feature transformations for i-vector speaker recognition
Taufiq Hasan, John H. L. Hansen

Query-by-example using speaker content graphs
William M. Campbell, Elliot Singer

Unsupervised NAP training data design for speaker recognition
Hanwu Sun, Bin Ma

The role of score calibration in speaker recognition
George Doddington

A Bayesian approach to speaker recognition based on GMMs using multiple model structures
Takafumi Hattori, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda

Residual phase cepstrum coefficients with application to cross-lingual speaker verification
Jianglin Wang, Michael Johnson

Speaker verification using neighborhood preserving embedding
Chunyan Liang, Jinchao Yang, Lin Yang, Yonghong Yan

Discriminative decision function based scoring method in joint factor analysis for speaker verification
Chunyan Liang, Xiang Zhang, Lin Yang, Yonghong Yan

Integrated feature normalization and enhancement for robust speaker recognition using acoustic factor analysis
Taufiq Hasan, John H. L. Hansen

Factor analysis and nuisance attribute projection revisited
Lukáš Machlica, Zbyněk Zajic

Compensation of intrinsic variability with factor analysis modeling for robust speaker verification
Sheng Chen, Mingxing Xu

RSR2015: database for text-dependent speaker verification using multiple pass-phrases
Anthony Larcher, Kong Aik Lee, Bin Ma, Haizhou Li

Speaker idiosyncratic rhythmic features in the speech signal
Volker Dellwo, Adrian Leemann, Marie-José Kolly

Bilinear factor analysis for i-vector based speaker verification
Yun Lei, Lukáš Burget, Nicolas Scheffer

Unsupervised speaker identification using overlaid texts in TV broadcast
Johann Poignant, Hervé Bredin, Viet Bac Le, Laurent Besacier, Claude Barras, Georges Quénot

Mask estimation and refinement for MFT-based robust speaker verification
Yali Zhao, Lie Xie, Zhonghua Fu

Sparse probabilistic linear discriminant analysis for speaker verification
Hai Yang, Chunyan Liang, Yunfei Xu, Lin Yang, Yonghong Yan

Study of the effect of i-vector modeling on short and mismatch utterance duration for speaker verification
Achintya Kumar Sarkar, Driss Matrouf, Pierre Michel Bousquet, Jean-François Bonastre

Ensemble classifiers using unsupervised data selection for speaker recognition
Chien-Lin Huang, Chiori Hori, Hideki Kashioka, Bin Ma

A method of speaker identification based on phoneme mean F-Ratio contribution
Songgun Hyon, Hongcui Wang, Chen Zhao, Jianguo Wei, Jianwu Dang

Mitigating effects of recording condition mismatch in speaker recognition using partial least squares
Jeremiah J. Remus, Jenniffer M. Estrada, Stephanie A. C. Schuckers

HMM Synthesis I, II

Combining multiple high quality corpora for improving HMM-TTS
Vincent Wan, Javier Latorre, K. K. Chin, Langzhou Chen, Mark J. F. Gales, Heiga Zen, Kate Knill, Masami Akamine

An evaluation of parameter generation methods with rich context models in HMM-based speech synthesis
Shinnosuke Takamichi, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, Sakriani Sakti, Satoshi Nakamura

Using Bayesian networks to find relevant context features for HMM-based speech synthesis
Heng Lu, Simon King

Considering global variance of the log power spectrum derived from mel-cepstrum in HMM-based parametric speech synthesis
Xiang Yin, Zhen-Hua Ling, Ming Lei, Lirong Dai

A speech parameter generation algorithm using local variance for HMM-based speech synthesis
Vataya Chunwijitra, Takashi Nose, Takao Kobayashi

Histogram-based spectral equalization for HMM-based speech synthesis using mel-LSP
Yamato Ohtani, Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima, Masami Akamine

Wideband parametric speech synthesis using warped linear prediction
Tuomo Raitio, Antti Suni, Martti Vainio, Paavo Alku

Modeling the creaky excitation for parametric speech synthesis
Thomas Drugman, John Kane, Christer Gobl

Amplitude spectrum based excitation model for HMM-based speech synthesis
Zhengqi Wen, Jianhua Tao

Speech synthesis using a non-maximally decimated filter bank for embedded systems
Nobuyuki Nishizawa, Tsuneo Kato

Ways to implement global variance in statistical speech synthesis
Hanna Silén, Elina Helander, Jani Nurminen, Moncef Gabbouj

HMM-based speech synthesis using sub-band basis spectrum model
Yamato Ohtani, Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima, Masami Akamine

ASR: Robust Features I, II

Amplitude modulation filters as feature sets for robust ASR: constant absolute or relative bandwidth?
Niko Moritz, Jörn Anemüller, Birger Kollmeier

Effect of speech priors in single-channel speech-music separation for ASR
Cemil Demir, A. Taylan Cemgil, Murat Saraçlar

On the role of binary mask pattern in automatic speech recognition
Arun Narayanan, DeLiang Wang

Dereverberation based on wavelet packet filtering for robust automatic speech recognition
Randy Gomez, Tatsuya Kawahara

Spectral intersections for non-stationary signal separation
Trausti Kristjansson, Thad Hughes

Speech recognition by denoising and dereverberation based on spectral subtraction in a real noisy reverberant environment
Kyohei Odani, Longbiao Wang, Atsuhiko Kai

Q-Gaussian based spectral subtraction for robust speech recognition
Hilman F. Pardede, Koichi Shinoda, Koji Iwano

Hooking up spectro-temporal filters with auditory-inspired representations for robust automatic speech recognition
Bernd T. Meyer, Constantin Spille, Birger Kollmeier, Nelson Morgan

Feature extraction based on hearing system signal processing for robust large vocabulary speech recognition
Qi Peter Li, Xie Sun

Automatic estimation of the first two subglottal resonances in children's speech with application to speaker normalization in limited-data conditions
Harish Arsikere, Gary K. F. Leung, Steven M. Lulich, Abeer Alwan

Robust phoneme recognition based on biomimetic speech contours
Michael A. Carlin, Kailash Patil, Sridhar Krishna Nemala, Mounya Elhilali

A feature space transformation method for personalization using generalized i-vector clustering
Kaisheng Yao, Yifan Gong, Chaojun Liu

Longer features: they do a speech detector good
T. J. Tsai, Nelson Morgan

Robust feature extraction for speech recognition by enhancing auditory spectrum
Md Jahangir Alam, Patrick Kenny, Douglas O'Shaughnessy

Enhancing vocal tract length normalization with elastic registration for automatic speech recognition
Florian Müller, Alfred Mertins

Beamforming using uniform circular arrays for distant speech recognition in reverberant environments and double talk scenarios
Hannes Pessentheiner, Stefan Petrik, Harald Romsdorfer

Rich Transcription I, II

Novel approach to live captioning through re-speaking: tailoring speech recognition to re-speaker's needs
Aleš Pražák, Zdeněk Loos, Jan Trmal, Josef V. Psutka, Josef Psutka

Development and evaluation of automatic punctuation for French and English speech-to-text
Jáchym Kolář, Lori Lamel

Spoken document clustering using word confusion networks
Shajith Ikbal, Sachindra Joshi, Ashish Verma, Om D. Deshmukh

Dynamic conditional random fields for joint sentence boundary and punctuation prediction
Xuancong Wang, Hwee Tou Ng, Khe Chai Sim

Analysis of the characteristics of talk-show TV programs
Fabio Brugnara, Daniele Falavigna, Diego Giuliani, Roberto Gretter

Rethinking the corpus: moving towards dynamic linguistic resources
Andrew Rosenberg

Speaker recognition for children's speech
Saeid Safavi, Maryam Najafian, Abualsoud Hanani, Martin Russell, Peter Jančovič, Michael Carey

A simple and efficient method to align very long speech signals to acoustically imperfect transcriptions
Germán Bordel, Mikel Penagarikano, Luis Javier Rodriguez-Fuentes, Amparo Varona

Estimation of talker's head orientation based on discrimination of the shape of cross-power spectrum phase coefficients
Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki

Sentence detection using multiple annotations
Ann Lee, James Glass

A speaker-role based approach for detecting politicians in TV broadcast news
Delphine Charlet, Geraldine Damnati

Relative importance of temporal envelope and fine structure cues in low- and high-order harmonic regions for Mandarin lexical-tone recognition
Guangting Mai

Real-time implementation of multi-band frequency compression for listeners with moderate sensorineural impairment
Nitya Tiwari, Prem C. Pandey, Pandurangarao N. Kulkarni

Word prominence detection using robust yet simple prosodic features
Taniya Mishra, Vivek Rangarajan Sridhar, Alistair Conkie

Online story segmentation of multilingual streaming broadcast news
Amit Srivastava, Saurabh Khanwalkar, Gretchen Markiewicz, Guruprasad Saikumar

Glottal Source Processing: from Analysis to Applications (Special Session)

Resonator-based creaky voice detection
Thomas Drugman, John Kane, Christer Gobl

Effect of tongue tip trilling on the glottal excitation source
V. K. Mittal, N. Dhananjaya, Bayya Yegnanarayana

Estimating the voice source in noise
Gang Chen, Yen-Liang Shue, Jody Kreiman, Abeer Alwan

Voice source analysis using biomechanical modeling and glottal inverse filtering
Alan Pinheiro, Tuomo Raitio, Danyane Gomes, Paavo Alku

Speech modeling and processing by low-dimensional dynamic glottal models
Carlo Drioli, Andrea Calanca

Improved formant frequency estimation from high-pitched vowels by downgrading the contribution of the glottal source with weighted linear prediction
Paavo Alku, Jouni Pohjalainen, Martti Vainio, Anne-Maria Laukkanen, Brad Story

Automatic topology generation of glottal source HMM
Akira Sasou

Towards glottal source controllability in expressive speech synthesis
Jaime Lorenzo-Trueba, Roberto Barra-Chicote, Tuomo Raitio, Nicolas Obin, Paavo Alku, Junichi Yamagishi, Juan M. Montero

Combining temporal and cepstral features for the automatic perceptual categorization of disordered connected speech
Ali Alpan, Jean Schoentgen, Francis Grenez

A preliminary study on cross-databases emotion recognition using the glottal features in speech
Rui Sun, Elliot Moore II

Analysis on the importance of short-term speech parameterizations for emotional statistical parametric speech synthesis
Ranniery Maia, Masami Akamine

Analysis of vocal tremor and jitter by empirical mode decomposition of glottal cycle length time series
Christophe Mertens, Francis Grenez, Jean Schoentgen

Utilizing Markov chain Monte Carlo (MCMC) method for improved glottal inverse filtering
Harri Auvinen, Tuomo Raitio, Samuli Siltanen, Paavo Alku

Glottal source shape parameter estimation using phase minimization variants
Stefan Huber, Axel Roebel, Gilles Degottex

Glottal waveform analysis of physical task stress speech
Keith W. Godin, Taufiq Hasan, John H. L. Hansen

Speaker discrimination ability of glottal waveform features
Juan Félix Torres, Elliot Moore

Robust Speech Recognition I, II

Exploring discriminative speech trajectory structures
Heyun Huang, Louis ten Bosch, Bert Cranen, Lou Boves

Estimating classifier performance in unknown noise
Ehsan Variani, Hynek Hermansky

Continuous digit recognition in noise: reservoirs can do an excellent job!
Azarakhsh Jalalvand, Fabian Triefenbach, Jean-Pierre Martens

Optimization-based control for the extended Baum-Welch algorithm
Janne Pylkkönen, Mikko Kurimo

Normalization of spectro-temporal Gabor filter bank features for improved robust automatic speech recognition systems
Marc René Schädler, Birger Kollmeier

Phone recognition in critical bands using sub-band temporal modulations
Feipeng Li, Sri Harish Mallidi, Hynek Hermansky

Combining acoustic data driven G2P and letter-to-sound rules for under resource lexicon generation
Ramya Rasipuram, Mathew M. Doss

CRF-based diacritisation of colloquial Arabic for automatic speech recognition
Sarah Al-Shareef, Thomas Hain

Analysis of temporal resolution in frequency domain linear prediction
Sriram Ganapathy, Hynek Hermansky

White listing and score normalization for keyword spotting of noisy speech
Bing Zhang, Richard Schwartz, Stavros Tsakalidis, Long Nguyen, Spyros Matsoukas

Complementary phone error training
Frank Diehl, Phillip C. Woodland

Posterior-scaled MPE: novel discriminative training criteria
Markus Nussbaum-Thom, Zoltán Tüske, Georg Heigold, Ralf Schlüter, Hermann Ney

Improve the implementation of pitch features for Mandarin digit string recognition task
Pei Ding, Liqiang He

Exploring joint equalization of spatial-temporal contextual statistics of speech features for robust speech recognition
Hsin-Ju Hsieh, Jeih-weih Hung, Berlin Chen

Speaker-dependent voice activity detection robust to background speech noise
Shigeki Matsuda, Naoya Ito, Kosuke Tsujino, Hideki Kashioka, Shigeki Sagayama

Log-spectral feature reconstruction based on an occlusion model for noise robust speech recognition
Jose A. González, Antonio M. Peinado, Angel M. Gómez, Ning Ma

Decoding of uncertain features using the posterior distribution of the clean data for robust speech recognition
Ahmed Hussen Abdelaziz, Dorothea Kolossa

Coupling identification and reconstruction of missing features for noise-robust automatic speech recognition
Ning Ma, Jon Barker

Integrating stress information in large vocabulary continuous speech recognition
Bogdan Ludusan, Stefan Ziegler, Guillaume Gravier

Group sparse hidden Markov models for speech recognition
Jen-Tzung Chien, Cheng-Chun Chiang

Speech Tools and Systems Demo (Special Session)

The speech recognition virtual kitchen: an initial prototype
Florian Metze, Eric Fosler-Lussier

Perma and Balloon: tools for string alignment and text processing
Uwe D. Reichel

Visartico: a visualization tool for articulatory data
Slim Ouni, Loïc Mangeonjean, Ingmar Steiner

Towards automated annotation of audio and video recordings by application of advanced web-services
Przemyslaw Lenkiewicz, Dieter van Uytvanck, Peter Wittenburg, Sebastian Drude

A rule based pronunciation generator and regional accent databank for Portuguese
Simone Ashby, Sílvia Barbosa, Silvia Brandão, José Pedro Ferreira, Maarten Janssen, Catarina Silva, Mário Eduardo Viaro

Speech enhancement for Android (SEA): a speech processing demonstration tool for Android based smart phones and tablets
Roger Chappel, Kuldip Paliwal

ProTK: an improved prosody toolkit
Jacob Okamoto, Serguei Pakhomov, Elizabeth Shriberg, Andreas Stolcke

Speechmark: landmark detection tool for speech analysis
Suzanne Boyce, Harriet Fell, Joel MacAuslan

A tutorial dialogue system with unrestricted spoken input
Peter Bell, Myroslava Dzikovska, Amy Isard

Integrating adaptive beam-forming and auditory features for robust large vocabulary speech recognition
Xie Sun, Qi Peter Li, Manli Zhu, Qiru Zhou

A natural in-car speech interface to internet services using hybrid ASR
Hansjörg Hofmann, Ute Ehrlich, Klaus Bader, Ilona Nothelfer, André Berton

How Marni helps English language learners acquire oral reading fluency
Ronald A. Cole, Daniel Bolanos, Wayne H. Ward, J. T. Carmer, Eric Borts, Edward Svirsky

Demonstration of advanced multi-modal, network-centric communication management suite
Victor Finomore Jr, John Stewart, Rita Singh, Bhiksha Raj, Ron Dallman

Dutch automatic speech recognition on the web: towards a general purpose system
Joris Pelemans, Kris Demuynck, Patrick Wambacq

An on-line, cloud-based Spanish-Spanish sign language translation system
Javier Tejedor, Fernando López-Colino, Jordi Porta, José Colás

Voice Search and Spoken Document Retrieval I, II

Open-vocabulary retrieval of spoken content with shorter/longer queries considering word/subword-based acoustic feature similarity
Hung-yi Lee, Po-wei Chou, Lin-shan Lee

Consumer-level multimedia event detection through unsupervised audio signal modeling
Byungki Byun, Ilseo Kim, Sabato Marco Siniscalchi, Chin-Hui Lee

Event-based video retrieval using audio
Qin Jin, Peter Schulam, Shourabh Rawat, Susanne Burger, Duo Ding, Florian Metze

Compact audio representation for event detection in consumer media
Xiaodan Zhuang, Stavros Tsakalidis, Shuang Wu, Pradeep Natarajan, Rohit Prasad, Prem Natarajan

N-gram FST indexing for spoken term detection
Chao Liu, Dong Wang, Javier Tejedor

Spoken inquiry discrimination using bag-of-words for speech-oriented guidance system
Haruka Majima, Rafael Torres, Yoko Fujita, Hiromichi Kawanami, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano

Robust event detection from spoken content in consumer domain videos
Stavros Tsakalidis, Xiaodan Zhuang, Roger Hsiao, Shuang Wu, Pradeep Natarajan, Rohit Prasad, Prem Natarajan

Bag-of-audio-words approach for multimedia event classification
Stephanie Pancoast, Murat Akbacak

Improvements in Japanese voice search
Ken-ichi Iso, Edward Whittaker, Tadashi Emori, Junpei Miyake

A conversational movie search system based on conditional random fields
Jingjing Liu, Scott Cyphers, Panupong Pasupat, Ian McGraw, James Glass

Interactive spoken content retrieval with different types of actions optimized by a Markov decision process
Tsung-Hsien Wen, Hung-Yi Lee, Lin-Shan Lee

Voice query refinement
Cyril Allauzen, Edward Benson, Ciprian Chelba, Michael Riley, Johan Schalkwyk

Indexing raw acoustic features for scalable zero resource search
Aren Jansen, Benjamin Van Durme

Lexical-phonetic automata for spoken utterance indexing and retrieval
Julien Fayolle, Murat Saraçlar, Fabienne Moreau, Christian Raymond, Guillaume Gravier

Automating crowd-supervised learning for spoken language systems
Ian McGraw, Scott Cyphers, Panupong Pasupat, Jingjing Liu, James Glass

Spoken Language Understanding

Spelling as a complementary strategy for speech recognition
Keith Vertanen, Per Ola Kristensson

Automatic error recovery for pronunciation dictionaries
Tim Schlippe, Sebastian Ochs, Ngoc Thang Vu, Tanja Schultz

Confidence measure for speech indexing based on latent Dirichlet allocation
Grégory Senay, Georges Linarès

Mixed probabilistic and deterministic dependency parsing
Christophe Cerisara, Alejandra Lorenzo

Automatic vocabulary adaptation based on semantic similarity and speech recognition confidence measure
Shoko Yamahata, Yoshikazu Yamaguchi, Atsunori Ogawa, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi

Towards empirical dialog-state modeling and its use in language modeling
Nigel G. Ward, Alejandro Vega

Evaluation of many-to-many alignment algorithm by automatic pronunciation annotation using web text mining
Keigo Kubo, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano

Applying multiview learning algorithms to human-human conversation classification
Sokol Koço, Cécile Capponi, Frédéric Béchet

Automatic transcription of lecture speech using language model based on speaking-style transformation of proceeding texts
Yuya Akita, Makoto Watanabe, Tatsuya Kawahara

Normalization of text messages using character- and phone-based machine translation approaches
Chen Li, Yang Liu

A weighted combination of speech with text-based models for Arabic diacritization
Aisha S. Azim, Xiaoxuan Wang, Sim Khe Chai

Using sub-word-level information for confidence estimation with conditional random field models
Matthew S. Seigel, Phillip C. Woodland

Prosodic Prominence: Annotation, Prediction, Applications (Special Session)

Visualizing tool for evaluating inter-label similarity in prosodic labeling experiments
David Escudero-Mancebo, Eva Estebas-Vilaplana

Objective, subjective and linguistic roads to perceptual prominence: how are they compared and why?
Petra Wagner, Fabio Tamburini, Andreas Windmann

Audio-visual evaluation and detection of word prominence in a human-machine interaction scenario
Martin Heckmann

Obtaining prominence judgments from naive listeners: influence of rating scales, linguistic levels and normalisation
Denis Arnold, Petra Wagner, Bernd Möbius

Towards hierarchical prosodic prominence generation in TTS synthesis
Leonardo Badino, Robert A. J. Clark, Mirjam Wester

Investigating syllabic prominence with conditional random fields and latent-dynamic conditional random fields
Francesco Cutugno, Enrico Leone, Bogdan Ludusan, Antonio Origlia

Disentangling lexical, morphological, syntactic and semantic influences on German prominence - evidence from a production study
Barbara Samlowski, Petra Wagner, Bernd Möbius

Using prominence and phrasing predictions to improve weighted dictionary pronunciation models
Andrew Rosenberg

A continuous prominence score based on acoustic features
Jean-Philippe Goldman, Mathieu Avanzi, Antoine Auchlin, Anne Catherine Simon

More on the normalization of syllable prominence ratings
Christopher Sappok, Denis Arnold

F0 and the perception of prominence
Tim Mahrt, Jennifer Cole, Margaret Fleck, Mark Hasegawa-Johnson

Language differences in the perceptual weight of prominence-lending properties
Bistra Andreeva, William Barry, Magdalena Wolska

Speech Synthesis: Selected Topics

Improving WFST-based G2P conversion with alignment constraints and RNNLM n-best rescoring
Josef R. Novak, Paul R. Dixon, Nobuaki Minematsu, Keikichi Hirose, Chiori Hori, Hideki Kashioka

Expand CRF to model long distance dependencies in prosodic break prediction
Jian Luan, Bolei He, Hairong Xia, Linfang Wang, Daniela Braga, Sheng Zhao

Perceptual foundations for naturalistic variability in the prosody of synthetic speech
Nanette Veilleux, Jonathan Barnes, Alejna Brugos, Stefanie Shattuck-Hufnagel

Comparison of grapheme-to-phoneme methods on large pronunciation dictionaries and LVCSR tasks
Stefan Hahn, Paul Vozila, Maximilian Bisani

A simple hybrid acoustic/morphologically-constrained technique for the synthesis of stop consonants in various vocalic contexts
Frédéric Berthommier, Laurent Girin, Louis-Jean Boë

The IIIT-H Indic speech databases
Kishore Prahallad, E. Naresh Kumar, Venkatesh Keri, S. Rajendran, Alan W. Black

Detecting acronyms from capital letter sequences in Spanish
Rubén San-Segundo, Juan M. Montero, Verónica López-Ludeña, Simon King

Hidden conditional random fields with M-to-N alignments for grapheme-to-phoneme conversion
Patrick Lehnen, Stefan Hahn, Vlad-Andrei Guta, Hermann Ney

Phrase boundary assignment from text in multiple domains
Andrew Rosenberg, Raul Fernandez, Bhuvana Ramabhadran

Improved prediction of Japanese word accent sandhi using CRF
Nobuaki Minematsu, Shumpei Kobayashi, Shinya Shimizu, Keikichi Hirose

Articulatory VCV synthesis from EMA data
Asterios Toutios, Shinji Maeda
