

Interspeech 2008

Brisbane, Australia
22-26 September 2008

General Chair: Denis Burnham
doi: 10.21437/Interspeech.2008
ISSN: 2958-1796





Acoustic Activity Detection, Pitch Tracking and Analysis


Statistical speech activity detection based on spatial power distribution for analyses of poster presentations
Kentaro Ishizuka, Shoko Araki, Tatsuya Kawahara

A statistical model-based voice activity detection employing minimum classification error technique
Sang-Ick Kang, Ji-Hyun Song, Kye-Hwan Lee, Yun-Sik Park, Joon-Hyuk Chang

Comparative evaluation of different methods for voice activity detection
Hongfei Ding, Koichi Yamamoto, Masami Akamine

Speech/non-speech segments detection based on chaotic and prosodic features
Soheil Shafiee, Farshad Almasganj, Ayyoob Jafari

Acoustic event classification using a distributed microphone network with a GMM/SVM combined algorithm
Christian Zieger, Maurizio Omologo

Intentional voice command detection for completely hands-free speech interface in home environments
Yasunari Obuchi, Masahito Togami, Takashi Sumiyoshi

Fusion of audio and video modalities for detection of acoustic events
Taras Butko, Andrey Temko, Climent Nadeu, Cristian Canton

DySANA: dynamic speech and noise adaptation for voice activity detection
Ron J. Weiss, Trausti Kristjansson

A comprehensive study on the effects of room reverberation on fundamental frequency estimation
Rico Petrick, Masashi Unoki, Anish Mittal, Carlos Segura, Rüdiger Hoffmann

A hybrid speech signal based algorithm for pitch marking using finite state machines
H. Hussein, M. Wolff, Oliver Jokisch, F. Duckhorn, G. Strecha, Rüdiger Hoffmann

Parameter estimation method of F0 control model for singing voices
Yasunori Ohishi, Hirokazu Kameoka, Kunio Kashino, Kazuya Takeda

An algorithm for multi-pitch tracking in co-channel speech
Srikanth Vishnubhotla, Carol Y. Espy-Wilson

Multipitch tracking using a factorial hidden Markov model
Michael Wohlmayr, Franz Pernkopf

Cochannel speech separation using multi-pitch estimation and model based voiced sequential grouping
Ming Li, Chuan Cao, Di Wang, Ping Lu, Qiang Fu, Yonghong Yan

Crosscorrelation of adjacent spectra enhances fundamental frequency tracking
Philippe Martin


Single- and Multichannel Speech Enhancement I, II


Enhancement of noisy speech recordings via blind source separation
Jiri Malek, Zbynek Koldovsky, Jindrich Zdansky, Jan Nouza

Studies on estimation of the number of sources in blind source separation
Takaaki Ishibashi, Hidetoshi Nakashima, Hiromu Gotanda

Speech enhancement based on hypothesized Wiener filtering
V. Ramasubramanian, Deepak Vijaywargi

Psychoacoustically-motivated adaptive β-order generalized spectral subtraction based on data-driven optimization
Junfeng Li, Hui Jiang, Masato Akagi

Two stage iterative Wiener filtering for speech enhancement
Krishna Nand K, T. V. Sreenivas

Assessment of correlation between objective measures and speech recognition performance in the evaluation of speech enhancement
Pei Ding, Jie Hao

Effect of compressing the dynamic range of the power spectrum in modulation filtering based speech enhancement
James G. Lyons, Kuldip K. Paliwal

A long state vector Kalman filter for speech enhancement
Stephen So, Kuldip K. Paliwal

Subspace based speech enhancement using Gaussian mixture model
Achintya Kundu, Saikat Chatterjee, T. V. Sreenivas

Generalized parametric spectral subtraction using weighted Euclidean distortion
Amit Das, John H. L. Hansen

Sudden noise reduction based on GMM with noise power estimation
Nobuyuki Miyake, Tetsuya Takiguchi, Yasuo Ariki

Speech enhancement using a Wiener denoising technique and musical noise reduction
Md. Jahangir Alam, Sid-Ahmed Selouani, Douglas O'Shaughnessy, Sofia Ben Jebara

Regularized non-negative matrix factorization with temporal dependencies for speech denoising
Kevin W. Wilson, Bhiksha Raj, Paris Smaragdis

ICA-based MAP speech enhancement with multiple variable speech distribution models
Xin Zou, Peter Jančovič, Munevver Kokuer, Martin J. Russell

Source separation based on binaural cues and source model constraints
Ron J. Weiss, Michael I. Mandel, Daniel P. W. Ellis

Maximum kurtosis beamforming with the generalized sidelobe canceller
Kenichi Kumatani, John McDonough, Barbara Rauch, Philip N. Garner, Weifeng Li, John Dines

Noise robust speech dereverberation using constrained inverse filter
Ken'ichi Furuya, Akitoshi Kataoka, Yoichi Haneda

A dual microphone coherence based method for speech enhancement in headsets
Mohsen Rahmani, Ahmad Akbari, Beghdad Ayad

Sound capture system and spatial filter for small devices
Ivan Tashev, Slavy Mihov, Tyler Gleghorn, Alex Acero

An effective microphone array post-filter in arbitrary environments
Ning Cheng, Wen-ju Liu, Peng Li, Bo Xu

Localization of multiple sound sources based on inter-channel correlation using a distributed microphone system
Kook Cho, Hajime Okumura, Takanobu Nishiura, Yoichi Yamashita

A frequency domain approach for speech enhancement with directionality using compact microphone array
Heng Zhang, Qiang Fu, Yonghong Yan


Spoken Language Systems I, II


Predicting ASR errors by exploiting barge-in rate of individual users for spoken dialogue systems
Kazunori Komatani, Tatsuya Kawahara, Hiroshi G. Okuno

Expanding vocabulary for recognizing user's abbreviations of proper nouns without increasing ASR error rates in spoken dialogue systems
Masaki Katsumaru, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Exploiting the ASR n-best by tracking multiple dialog state hypotheses
Jason D. Williams

A spoken language interpretation component for a robot dialogue system
Enes Makalic, Ingrid Zukerman, Michael Niemann

MUESLI: multiple utterance error correction for a spoken language interface
Federico Cesari, Horacio Franco, Gregory K. Myers, Harry Bratt

Methods to optimize transcription of on-line media
Sarah Conrod, Sara Basson, Dimitri Kanevsky

Discrimination of task-related words for vocabulary design of spoken dialog systems
Akinori Ito, Toyomi Meguro, Shozo Makino, Motoyuki Suzuki

Dialog management using weighted finite-state transducers
Chiori Hori, Kiyonori Ohtake, Teruhisa Misu, Hideki Kashioka, Satoshi Nakamura

Probabilistic answer selection based on conditional random fields for spoken dialog system
Yoshitaka Yoshimi, Ryota Kakitsuba, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

Let's Go Lab: a platform for evaluation of spoken dialog systems with real world users
Maxine Eskenazi, Alan W. Black, Antoine Raux, Brian Langner

The impact of language dynamics on the capitalization of broadcast news
Fernando Batista, Nuno Mamede, Isabel Trancoso

Lightly supervised acoustic model training on EPPS recordings
Matthias Paulik, Alex Waibel

Fast call-classification system development without in-domain training data
Christophe Servan, Frédéric Bechet

iCNC and iROVER: the limits of improving system combination with classification?
Björn Hoffmeister, Ralf Schlüter, Hermann Ney

System combination for spoken language understanding
Stefan Hahn, Patrick Lehnen, Hermann Ney

Question and answer database optimization using speech recognition results
Shota Takeuchi, Tobias Cincarek, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano

Development and evaluation of hands-free spoken dialogue system for railway station guidance
Hiroshi Saruwatari, Yu Takahashi, Hiroyuki Sakai, Shota Takeuchi, Tobias Cincarek, Hiromichi Kawanami, Kiyohiro Shikano

Statistical shared plan-based dialog management
Amanda J. Stent, Srinivas Bangalore

When calls go wrong: how to detect problematic calls based on log-files and emotions?
Ota Herm, Alexander Schmitt, Jackson Liscombe

Unsupervised learning of edit parameters for matching name variants
Dan Gillick, Dilek Hakkani-Tür, Michael Levit

Detection of repetitions in spontaneous speech in dialogue sessions
Mert Cevik, Fuliang Weng, Chin-Hui Lee

Automatic customer feedback processing: alarm detection in open question spoken messages
Nathalie Camelin, Geraldine Damnati, Frédéric Bechet, Renato De Mori

Minimal training based semantic categorization in a voice activated question answering (VAQA) system
Mithun Balakrishna, Marta Tatu, Dan Moldovan

User study of the Bayesian update of dialogue state approach to dialogue management
B. Thomson, M. Gašić, S. Keizer, F. Mairesse, J. Schatzmann, K. Yu, Steve Young

Extensibility verification of robust domain selection against out-of-grammar utterances in multi-domain spoken dialogue system
Satoshi Ikeda, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Improving large scale alphanumeric string recognition using redundant information
Ea-Ee Jan, Osamuyimen Stewart, Raymond Co, David Lubensky

SPRAAK: an open source "SPeech recognition and automatic annotation kit"
Kris Demuynck, Jan Roelens, Dirk Van Compernolle, Patrick Wambacq

Preliminary evaluation of speech/sound recognition for telemedicine application in a real environment
Michel Vacher, Anthony Fleury, Jean-François Serignat, Norbert Noury, Hubert Glasson

Mobidic - a mobile dictation and notetaking application
Markku Turunen, Aleksi Melto, Anssi Kainulainen, Jaakko Hakulinen

Automatic speech recognition for scientific purposes - webASR
Thomas Hain, Asmaa El Hannani, Stuart N. Wrigley, Vincent Wan

Evaluation of a live broadcast news subtitling system for Portuguese
Hugo Meinedo, Marcio Viveiros, Joao Neto


Emotion and Expression I, II


Multidimensional features of emotional speech
Tomoko Suzuki, Machiko Ikemoto, Tomoko Sano, Toshihiko Kinoshita

Leveraging emotion detection using emotions from yes-no answers
Narjes Boufaden, Pierre Dumouchel

Vowel placement during operatic singing: 'come si parla' or 'aggiustamento'?
Thomas J. Millhouse, Dianna T. Kenny

Study on strained rough voice as a conveyer of rage
Yumiko O. Kato, Yoshifumi Hirose, Takahiro Kamai

Integrating rule and template-based approaches for emotional Malay speech synthesis
Mumtaz Begum, Raja N. Ainon, Roziati Zainuddin, Zuraidah M. Don, Gerry Knowles

The expression and perception of emotions: comparing assessments of self versus others
Carlos Busso, Shrikanth S. Narayanan

On the role of acting skills for the collection of simulated emotional speech
Emiel Krahmer, Marc Swerts

Detection of security related affect and behaviour in passenger transport
Björn Schuller, Matthias Wimmer, Dejan Arsic, Tobias Moosmayr, Gerhard Rigoll

Emotions and articulatory precision
Martijn Goudbeek, Jean Philippe Goldman, Klaus R. Scherer

Assessing agreement of observer- and self-annotations in spontaneous multimodal emotion data
Khiet P. Truong, Mark A. Neerincx, David A. van Leeuwen

Emotion recognition in spontaneous emotional speech for anonymity-protected voice chat systems
Yoshiko Arimoto, Hiromi Kawatsu, Sumio Ohno, Hitoshi Iida

Assigning suitable phrasal tones and pitch accents by sensing affective information from text to synthesize human-like speech
Mostafa Al Masum Shaikh, Md. Khademul Islam Molla, Keikichi Hirose

Cross-language study of vocal correlates of affective states
Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl

Gender-related differences in the production and perception of emotion
Marc Swerts, Emiel Krahmer


Automatic Speech Recognition: Acoustic Models I-III


Soft margin estimation with various separation levels for LVCSR
Jinyu Li, Zhi-Jie Yan, Chin-Hui Lee, Ren-Hua Wang

On the equivalence of Gaussian and log-linear HMMs
Georg Heigold, Patrick Lehnen, Ralf Schlüter, Hermann Ney

Generalization of extended Baum-Welch parameter estimation for discriminative training and decoding
Dimitri Kanevsky, Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo

An ellipsoid constrained quadratic programming perspective to discriminative training of HMMs
Peng Liu, Frank K. Soong

Discriminative training of variable-parameter HMMs for noise robust speech recognition
Dong Yu, Li Deng, Yifan Gong, Alex Acero

Towards a non-parametric acoustic model: an acoustic decision tree for observation probability calculation
Jasha Droppo, Michael L. Seltzer, Alex Acero, Yu-Hsiang Bosco Chiu

A shrinkage estimator for speech recognition with full covariance HMMs
Peter Bell, Simon King

Covariance updates for discriminative training by constrained line search
Peter Bell, Simon King

Min-max discriminative training of decoding parameters using iterative linear programming
Brian Mak, Tom Ko

Discriminative training for complementariness in system combination
Daniel Willett, Chuang He

Penalty function maximization for large margin HMM training
George Saon, Daniel Povey

Implicit state-tying for support vector machines based speech recognition
Daniel Bolaños, Wayne Ward

Using KL-based acoustic models in a large vocabulary recognition task
Guillermo Aradilla, Hervé Bourlard, Mathew Magimai Doss

Acoustic modeling based on model structure annealing for speech recognition
Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

Bayesian context clustering using cross valid prior distribution for HMM-based speech recognition
Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

Speech recognition using soft decision trees
Jitendra Ajmera, Masami Akamine

GPU-accelerated Gaussian clustering for fMPE discriminative training
Yu Shi, Frank Seide, Frank K. Soong

Discriminative training using the trusted expectation maximization
Yasser Hifny, Yuqing Gao

Maximum mutual information estimation with unlabeled data for phonetic classification
Jui-Ting Huang, Mark Hasegawa-Johnson

Maximum accept and reject (MARS) training of HMM-GMM speech recognition systems
Vivek Tyagi

Nonlinear mixture autoregressive hidden Markov models for speech recognition
Sundar Srinivasan, Tao Ma, Daniel May, Georgios Lazarou, Joseph Picone

GPU accelerated acoustic likelihood computations
Patrick Cardinal, Pierre Dumouchel, Gilles Boulianne, Michel Comeau

Nonnative speech recognition based on state-candidate bilingual model modification
Qingqing Zhang, Ta Li, Jielin Pan, Yonghong Yan

Prosodic and spectral features within segment-based acoustic modeling
Björn Schuller, Xiaohua Zhang, Gerhard Rigoll

Unsupervised versus supervised training of acoustic models
Jeff Ma, Richard Schwartz

A comparison of broad phonetic and acoustic units for noise robust segment-based phonetic recognition
Tara N. Sainath, Victor Zue

Aggregated cross-validation and its efficient application to Gaussian mixture optimization
Takahiro Shinozaki, Sadaoki Furui, Tatsuya Kawahara

A minimum classification error based distance measure for template based speech recognition
Mike Matton, Dirk Van Compernolle, Ronald Cools

A penalized logistic regression approach to detection based phone classification
Sabato Marco Siniscalchi, Torbjørn Svendsen, Chin-Hui Lee

Incorporating acoustical modelling of phone transitions in an hybrid ANN/HMM speech recognizer
Alberto Abad, João Neto

Flexible discriminative training based on equal error group scores obtained from an error-indexed forward-backward algorithm
Erik McDermott, Atsushi Nakamura

Pitch adaptive features for LVCSR
Giulia Garau, Steve Renals

Using syllable nuclei locations to improve automatic speech recognition in the presence of burst noise
Chris D. Bartels, Jeff A. Bilmes

Effects of allophones on the performance of Korean speech recognition
Hyejin Hong, Sunhee Kim, Minhwa Chung

Combining evidence from a generative and a discriminative model in phoneme recognition
Joel Pinto, Hynek Hermansky

Fragmented context-dependent syllable acoustic models
K. Thambiratnam, Frank Seide

Speech recognition using non-linear trajectories in a formant-based articulatory layer of a multiple-level segmental HMM
Hongwei Hu, Martin J. Russell

Recent improvements of the RWTH GALE Mandarin LVCSR system
Ch. Plahl, Björn Hoffmeister, M.-Y. Hwang, D. Lu, Georg Heigold, Jonas Loof, Ralf Schlüter, Hermann Ney

Using prosody for the improvement of ASR - sentence modality recognition
Klára Vicsi, György Szaszák





Perception, Production, Discourse and Dialog


Recognizing and modelling regional varieties of Swedish
Jonas Beskow, Gösta Bruce, Laura Enflo, Björn Granström, Susanne Schötz

Vowel duration, compression and lengthening in stressed syllables in central and southern varieties of standard Italian
John Hajek, Mary Stevens

Acoustic cues for the perception of intonation in Cantonese
Joan K.-Y. Ma, Valter Ciocca, Tara L. Whitehill

Perception of dialectal prosody
Adrian Leemann, Beat Siebenhaar

Does the McGurk effect rely on processing time constraints?
Christian Kroos, Ashlie Dreves

Exploring the Uncanny Valley Effect with talking heads
Takaaki Kuratate, Kathryn Ayers, Jeesun Kim, Denis Burnham

How do the elderly talk to a natural language call routing system?
Knut Kvale, Ragnhild Halvorsrud

Analysis of relationship between impression of human-to-human conversations and prosodic change and its modeling
Ryota Nishimura, Norihide Kitaoka, Seiichi Nakagawa

Utterance-level normalization for relative articulation rate analysis
Tuomo Saarni, Jussi Hakokari, Jouni Isoaho, Tapio Salakoski

Syntactic complexity induces explicit grounding in the Maptask corpus
Martin Tietze, Vera Demberg, Johanna D. Moore

Do discourse cues facilitate recall in information presentation messages?
Andi Winterboer, Johanna D. Moore, Fernanda Ferreira

Structured heterogeneity of English stress variants
Noriko Hattori

A method for automatically estimating F0 model parameters and a speech re-synthesis tool using F0 model and STRAIGHT
Shota Sato, Taro Kimura, Yasuo Horiuchi, Masafumi Nishida, Shingo Kuroiwa, Akira Ichikawa



Speech Synthesis Methods I, II


Articulatory control of HMM-based parametric speech synthesis driven by phonetic knowledge
Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi, Ren-Hua Wang

Minimum generation error training with direct log spectral distortion on LSPs for HMM-based speech synthesis
Yi-Jian Wu, Keiichi Tokuda

Robustness of HMM-based speech synthesis
Junichi Yamagishi, Zhen-Hua Ling, Simon King

Improving preselection in unit selection synthesis
Alistair Conkie, Ann Syrdal, Yeon-Jun Kim, Mark Beutnagel

Efficient join cost computation for unit selection based TTS systems
Feng Ding, Jani Nurminen, Jilei Tian

A phonetic assessment of cross-language voice conversion
Kayoko Yanagisawa, Mark Huckvale

Synthesis by generation and concatenation of multiform segments
Vincent Pollet, Andrew Breen

Glottal spectral separation for parametric speech synthesis
João P. Cabral, Steve Renals, Korin Richmond, Junichi Yamagishi

Improving speech systems built from very little data
John Kominek, Sameer Badaskar, Tanja Schultz, Alan W. Black

Structure to speech conversion - speech generation based on infant-like vocal imitation
Daisuke Saito, Satoshi Asakawa, Nobuaki Minematsu, Keikichi Hirose

Statistical text-to-speech synthesis with improved dynamics
Stas Tiomkin, David Malah

An evaluation of non-standard features for grapheme-to-phoneme conversion
Gabriel Webster, Norbert Braunschweiler

Towards flexible speech coding for speech synthesis: an LF + modulated noise vocoder
Yannis Agiomyrgiannakis, Olivier Rosec

Evaluation of Finnish unit selection and HMM-based speech synthesis
Hanna Silen, Elina Helander, Jani Nurminen, Moncef Gabbouj

A probabilistic trajectory synthesis system for synthesising visual speech
Barry-John Theobald, Nicholas Wilkinson

Paralinguistic elements in speech synthesis
Didier Cadic, Lionel Segalen

Building sleek synthesizers for multi-lingual screen reader
E Veera Raghavendra, B. Yegnanarayana, Alan W. Black, Kishore Prahallad

Unsupervised adaptation for HMM-based speech synthesis
Simon King, Keiichi Tokuda, Heiga Zen, Junichi Yamagishi

Investigating Festival's target cost function using perceptual experiments
Volker Strom, Simon King

Modeling Austrian dialect varieties for TTS
Friedrich Neubarth, Michael Pucher, Christian Kranzler

HMM-based Finnish text-to-speech system utilizing glottal inverse filtering
Tuomo Raitio, Antti Suni, Hannu Pulakka, Martti Vainio, Paavo Alku

LTS using decision forest of regression trees and neural networks
Tanuja Sarkar, Sachin Joshi, Sathish Chandra Pammi, Kishore Prahallad

Automatic word stress marking and syllabification for Catalan TTS
Silvia Rustullet, Daniela Braga, João Nogueira, Miguel Sales Dias




Special Session: Auditory-Inspired Spectro-Temporal Features I, II


Modulation spectrogram features for improved speaker diarization
Oriol Vinyals, Gerald Friedland

Spectro-temporal features for robust far-field speaker identification
Tiago H. Falk, Wai-Yip Chan

Long-term spectro-temporal information for improved automatic speech emotion classification
Siqing Wu, Tiago H. Falk, Wai-Yip Chan

A comparative study on AM and FM features
Yotaro Kubo, Shigeki Okawa, Akira Kurematsu, Katsuhiko Shirai

Dimensionality reduction of modulation frequency features for speech discrimination
Maria Markaki, Yannis Stylianou

Spectral envelope recovery beyond the Nyquist limit for high-quality manipulation of speech sounds
Hideki Kawahara, Masanori Morise, Hideki Banno, Toru Takahashi, Ryuichi Nisimura, Toshio Irino

Adaptive-order fractional Fourier transform features for speech recognition
Hui Yin, Xiang Xie, Jingming Kuang

Robust front end processing for speech recognition in reverberant environments: utilization of speech characteristics
Rico Petrick, Xugang Lu, Masashi Unoki, Masato Akagi, Rüdiger Hoffmann

Introducing temporal asymmetries in feature extraction for automatic speech recognition
G. S. V. S. Sivaram, Hynek Hermansky

A closer look on hierarchical spectro-temporal features (HIST)
Martin Heckmann, Xavier Domont, Frank Joublin, Christian Goerick

Multi-stream spectro-temporal features for robust speech recognition
Sherry Y. Zhao, Nelson Morgan

The value of auditory offset adaptation and appropriate acoustic modeling
Huan Wang, David Gelbart, Hans-Günter Hirsch, Werner Hemmert

Optimization and evaluation of Gabor feature sets for ASR
Bernd T. Meyer, Birger Kollmeier


Speech Coding, Quality Measurement and Auditory Modelling


High-quality analysis/synthesis method based on temporal decomposition for speech modification
Binh Phu Nguyen, Takeshi Shibata, Masato Akagi

Improved frame loss recovery using closed-loop estimation of very low bit rate side information
Philippe Gournay

Predictability of STRFs in auditory cortex neurons depends on stimulus class
Max F. K. Happel, Simon Müller, Jörn Anemüller, Frank W. Ohl

Higher layer coding of non-speech like signals using factorial pulse codebook
Udar Mittal, James P. Ashley, Jonathan Gibbs

Spectral noise shaping: improvements in speech/audio codec based on linear prediction in spectral domain
Sriram Ganapathy, Petr Motlicek, Hynek Hermansky, Harinath Garudadri

Introducing the compression wave cochlear amplifier
Matthew R. Flax, W. Harvey Holmes

Goldman-Hodgkin-Katz cochlear hair cell models - a foundation for nonlinear cochlear mechanics
Matthew R. Flax, W. Harvey Holmes

A 8.32 kb/s embedded wideband speech coding candidate for ITU-T EV-VBR standardization
Changchun Bao, Hai-ting Li, Ze-xin Liu, Rui Fan, Heng Zhu, Mao-shen Jia, Rui Li

Decision tree based frame mode selection for AMR-WB+
Jong Kyu Kim, Seung Seop Park, Chang Woo Han, Nam Soo Kim

Assessment of objective quality measures for speech intelligibility
W. M. Liu, K. A. Jellyman, N. W. D. Evans, John S. D. Mason

Assessment of the speech-quality dimension "noisiness" for the instrumental estimation and analysis of telephone-band speech quality
Kirstin Scholz, Christine Kühnel, Marcel Waltermann, Sebastian Möller, Ulrich Heute

Intelligibility evaluation of Ramsey-derived interleavers for internet voice streaming with the iLBC codec
Angel M. Gomez, Jose L. Carmona, Antonio M. Peinado, Victoria Sanchez, Jose A. Gonzalez


Accent and Language Recognition


Language identification on code-switching utterances using multiple cues
Dau-Cheng Lyu, Ren-Yuan Lyu

Target-oriented phone selection from universal phone set for spoken language recognition
Rong Tong, Bin Ma, Haizhou Li, Eng Siong Chng

The MITLL NIST LRE 2007 language recognition system
Pedro A. Torres-Carrasquillo, Elliot Singer, W. M. Campbell, Terry Gleason, Alan McCree, Douglas A. Reynolds, Fred Richardson, Wade Shen, Douglas E. Sturim

Eigen-channel compensation and discriminatively trained Gaussian mixture models for dialect and accent recognition
Pedro A. Torres-Carrasquillo, Douglas E. Sturim, Douglas A. Reynolds, Alan McCree

Anchor-model fusion for language recognition
Ignacio Lopez-Moreno, Daniel Ramos, Joaquin Gonzalez-Rodriguez, Doroteo T. Toledano

Introducing a FM based feature to hierarchical language identification
Bo Yin, Tharmarajah Thiruvaran, Eliathamby Ambikairajah, Fang Chen

Dialect classification via discriminative training
Yun Lei, John H. L. Hansen

BUT language recognition system for NIST 2007 evaluations
Pavel Matějka, Lukáš Burget, Ondřej Glembek, Petr Schwarz, Valiantsina Hubeika, Michal Fapšo, Tomáš Mikolov, Oldřich Plchot, Jan Černocký

Advances in phonotactic language recognition
Ondřej Glembek, Pavel Matějka, Lukáš Burget, Tomáš Mikolov

Dialect separation assessment using log-likelihood score distributions
Mahnoosh Mehrabani, John H. L. Hansen

Study on unique pharyngeal and uvular consonants in foreign accented Arabic
Yousef A. Alotaibi, Khondaker Abdullah-Al-Mamun, Ghulam Muhammad

Automatic accent classification using ensemble methods
Fukun Bi, Jian Yang, Dan Xu

Foreign accent identification based on prosodic parameters
Marina Piat, Dominique Fohr, Irina Illina

Dialect recognition using adapted phonetic models
Wade Shen, Nancy Chen, Douglas A. Reynolds

Beyond frame independence: parametric modelling of time duration in speaker and language recognition
Alan McCree, Fred Richardson, Elliot Singer, Douglas A. Reynolds


Prosody: Prosodic Structure, Paralinguistic, Non-linguistic and Other Cues


Testing a large corpus of natural standard Arabic for rhythm class
Liz Dockendorf, Dalal Almubayei, Matthew Benton

A comparison of two acoustic measurement approaches to the rhythm continuum of natural Chinese and English speech
Matthew Benton, Liz Dockendorf

A study of pitch patterns of Japanese English analyzed via comparative linguistic features of English and Japanese
Tomoko Nariai, Kazuyo Tanaka

A corpus-based prosodic study of Alsatian, Belgian and Swiss French
Cécile Woehrling, Philippe Boula de Mareüil, Martine Adda-Decker, Lori Lamel

Prosodic position effects and function words in English: a pilot study
Mitsuhiro Nakamura

How useful are polynomials for analyzing intonation?
Laura E. de Ruiter

Adaptive filter based prosody modification approach
Qingcai Chen, Shusen Zhou, Dandan Wang, Xiaohong Yang

Speech/laughter classification in meeting audio
Swe Zin Kalayar Khine, Tin Lay Nwe, Haizhou Li

Getting the last laugh: automatic laughter segmentation in meetings
Mary Tai Knox, Nelson Morgan, Nikki Mirghafori

The influence of audio presentation style on multitasking during teleconferences
Stuart N. Wrigley, Simon Tucker, Guy J. Brown, Steve Whittaker

Balancing spoken content adaptation and unit length in the recognition of emotion and interest
Bogdan Vlasenko, Björn Schuller, Kinfe Tadesse Mengistu, Gerhard Rigoll, Andreas Wendemuth

Nonverbal responses to social inclusion and exclusion
Emiel Krahmer, Juliette Schaafsma, Marc Swerts, Ad Vingerhoets

Acoustic analysis of imitated voice produced by a professional impersonator
Tatsuya Kitamura

Detection of speech under physical stress: model development, sensor selection, and feature fusion
Sanjay A. Patil, John H. L. Hansen


Automatic Speech Recognition: Language Models I, II


Improving Japanese language models using POS information
Langzhou Chen, Hisayoshi Nagae, Matt Stuttle

Discriminative n-gram language modeling for Turkish
Ebru Arısoy, Brian Roark, Izhak Shafran, Murat Saraçlar

Rich morphology based n-gram language models for Arabic
Ahmad Emami, Imed Zitouni, Lidia Mangu

Unsupervised language model adaptation based on topic and role information in multiparty meetings
Songfang Huang, Steve Renals

Context dependent language model adaptation
X. Liu, M. J. F. Gales, P. C. Woodland

Iterative language model estimation: efficient data structure & algorithms
Bo-June Hsu, James Glass

Evaluating spoken language model based on filler prediction model in speech recognition
Kengo Ohta, Masatoshi Tsuchiya, Seiichi Nakagawa

Transcription-less call routing using unsupervised language model adaptation
Nicolae Duta

Large margin multinomial mixture model for text categorization
Zhen-Yu Pan, Hui Jiang

Language modeling for speech recognition of spoken Cantonese
Yu Ting Yeung, Houwei Cao, N. H. Zheng, Tan Lee, P. C. Ching

Discriminative rescoring based on minimization of word errors for transcribing broadcast news
Akio Kobayashi, Takahiro Oku, Shinichi Homma, Shoei Sato, Toru Imai, Tohru Takagi

Search and classification based language model adaptation
Qin Shi, Stephen M. Chu, Wen Liu, Hong-Kwang Jeff Kuo, Yi Liu, Yong Qin

Fast n-gram language model look-ahead for decoders with static pronunciation prefix trees
Marijn Huijbregts, Roeland Ordelman, Franciska de Jong

Thai named-entity recognition using class-based language modeling on multiple-sized subword units
Kwanchiva Saykhum, Vataya Boonpiam, Nattanun Thatphithakkul, Chai Wutiwiwatchai, Cholwich Natthee

Combining statistical and syntactical systems for spoken language understanding with graphical models
S. Schwarzler, J. Geiger, J. Schenk, M. Al-Hames, B. Hornler, Günther Ruske, Gerhard Rigoll

Bag-of-word normalized n-gram models
Abhinav Sethy, Bhuvana Ramabhadran

A study of unsupervised clustering techniques for language modeling
Sangyun Hahn, Abhinav Sethy, Hong-Kwang Jeff Kuo, Bhuvana Ramabhadran

Automatic estimation of language model parameters for unseen words using morpho-syntactic contextual information
Ciro Martins, António Teixeira, João Neto

Modeling the effects on time-into-utterance on word probabilities
Nigel G. Ward, Alejandro Vega

Inductive and example-based learning for text classification
Ye-Yi Wang, Xiao Li, Alex Acero

Comparing word, character, and phoneme n-grams for subjective utterance recognition
Theresa Wilson, Stephan Raaijmakers

IRSTLM: an open source toolkit for handling large scale language models
Marcello Federico, Nicola Bertoldi, Mauro Cettolo




Robust Automatic Speech Recognition I-III


CENSREC-4: development of evaluation framework for distant-talking speech recognition under reverberant environments
Masato Nakayama, Takanobu Nishiura, Yuki Denda, Norihide Kitaoka, Kazumasa Yamamoto, Takeshi Yamada, Satoru Tsuge, Chiyomi Miyajima, Masakiyo Fujimoto, Tetsuya Takiguchi, Satoshi Tamura, Tetsuji Ogawa, Shigeki Matsuda, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura

In-car speech recognition using model-based Wiener filter and multi-condition training
Masanori Tsujikawa, Takayuki Arakawa, Ryosuke Isotani

Adaptive beamforming and soft missing data decoding for robust speech recognition in reverberant environments
Marco Kühne, Roberto Togneri, Sven Nordholm

Spectral subtraction in likelihood-maximizing framework for robust speech recognition
Bagher BabaAli, Hossein Sameti, Mehran Safayani

Front-end for far-field speech recognition based on frequency domain linear prediction
Sriram Ganapathy, Samuel Thomas, Hynek Hermansky

Mask estimation incorporating time-frequency trajectories for a CASA-based ASR front-end
Ji Hun Park, Jae Sam Yoon, Hong Kook Kim

Soft missing-feature mask generation for simultaneous speech recognition system in robots
Toru Takahashi, Shun'ichi Yamamoto, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

A posterior approach for microphone array based speech recognition
Dong Wang, Ivan Himawan, Joe Frankel, Simon King

Analysis of physiologically-motivated signal processing for robust speech recognition
Yu-Hsiang Bosco Chiu, Richard M. Stern

Evaluation of modulation spectrum equalization techniques for large vocabulary robust speech recognition
Liang-che Sun, Chang-wen Hsu, Lin-shan Lee

Confusion-based entropy-weighted decoding for robust speech recognition
Yi Chen, Chia-yu Wan, Lin-shan Lee

Cepstral domain voice activity detection for improved noise modeling in MMSE feature enhancement for ASR
Svein Gunnar Pettersen, Magne Hallstein Johnsen

Unsupervised re-scoring of observation probability based on maximum entropy criterion by using confidence measure with telephone speech
Carlos Molina, Nestor Becerra Yoma, Fernando Huenupan, Claudio Garreton

Within-class feature normalization for robust speech recognition
Yuan-Fu Liao, Chi-Hui Hsu, Chi-Min Yang, Jeng-Shien Lin, Sen-Chia Chang

A posteriori SNR weighted energy based variable frame rate analysis for speech recognition
Zheng-Hua Tan, Børge Lindberg

Silence feature normalization for robust speech recognition in additive noise environments
Chieh-cheng Wang, Chi-an Pan, Jeih-weih Hung

Blind dereverberation based on CMN and spectral subtraction by multi-channel LMS algorithm
L. Wang, Seiichi Nakagawa, Norihide Kitaoka

Eigen-MLLR environment/speaker compensation for robust speech recognition
Yuan-Fu Liao, Hung-Hsiang Fang, Chi-Hui Hsu

Parameter clustering and sharing in variable-parameter HMMs for noise robust speech recognition
Dong Yu, Li Deng, Yifan Gong, Alex Acero

A feature compensation approach using high-order vector Taylor series approximation of an explicit distortion model for noisy speech recognition
Jun Du, Qiang Huo

N-best based stochastic mapping on stereo HMM for noise robust speech recognition
Xiaodong Cui, Mohamed Afify, Yuqing Gao

Improving the ensemble speaker and speaking environment modeling approach by enhancing the precision of the online estimation process
Yu Tsao, Chin-Hui Lee

Combining noise compensation and missing-feature decoding for large vocabulary speech recognition in noise
Jianhua Lu, Ji Ming, Roger Woods

Joint Bayesian predictive classification and parallel model combination with prior scaling for robust ASR
Svein Gunnar Pettersen

Environment mismatch compensation using average eigenspace for speech recognition
Abhishek Kumar, John H. L. Hansen

Monte Carlo model-space noise adaptation for speech recognition
Daniel Povey, Brian Kingsbury

A 'speechiness' measure to improve speech decoding in the presence of other sound sources
Ning Ma, Phil Green

Feature vector normalization with combined standard and throat microphones for robust ASR
Luis Buera, Antonio Miguel, Oscar Saz, Alfonso Ortega, Eduardo Lleida

Phone-duration-dependent long-term dynamic features for a stochastic model-based voice activity detection
Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura

An on-line adaptation technique for emotional speech recognition using style estimation with multiple-regression HMM
Yusuke Ijima, Makoto Tachibana, Takashi Nose, Takao Kobayashi

HMM adaptation using statistical linear approximation for robust automatic speech recognition
Michael Berkovitch, Ilan D. Shallom

Beyond linear transforms: efficient non-linear dynamic adaptation for noise robust speech recognition
Steven J. Rennie, Pierre L. Dognin

Rapid unsupervised speaker adaptation robust in reverberant environment conditions
Randy Gomez, Jani Even, Kiyohiro Shikano

On a generalization of margin-based discriminative training to robust speech recognition
Jinyu Li, Chin-Hui Lee

Discriminative classifiers with generative kernels for noise robust ASR
M. J. F. Gales, C. Longworth

Covariance modelling for noise-robust speech recognition
R. C. van Dalen, M. J. F. Gales

Exploiting spatial-temporal feature distribution characteristics for robust speech recognition
Wei-Hau Chen, Shih-Hsiang Lin, Berlin Chen

Study of integration of statistical model-based voice activity detection and noise suppression
Masakiyo Fujimoto, Kentaro Ishizuka, Tomohiro Nakatani

Neural network based regression for robust overlapping speech recognition using microphone arrays
Weifeng Li, John Dines, Mathew Magimai Doss, Hervé Bourlard


Speech Analysis and Processing, Voice Conversion and Modification


Amplitude and amplitude variation of emotional speech
Hartmut R. Pfitzinger, Christian Kaernbach

Babble speech: acoustic and perceptual variability
Nitish Krishnamurthy, Ayako Ikeno, John H. L. Hansen

On the properties of a time-varying quasi-harmonic model of speech
Yannis Pantazis, Olivier Rosec, Yannis Stylianou

Extraction and tracking of formant response jitter in the cochlea for objective prediction of SB/SF DAM attributes
Wenliang Lu, D. Sen

Consonant discrimination of degraded speech using an efferent-inspired closed-loop cochlear model
David P. Messing, Lorraine Delhorne, Ed Bruckert, Louis D. Braida, Oded Ghitza

On the development of variable length Teager energy operator (VTEO)
Vikrant Tomar, Hemant A. Patil

Metric learning for unsupervised phoneme segmentation
Yu Qiao, Nobuaki Minematsu

Combining task-dependent information with auditory attention cues for prominence detection in speech
Ozlem Kalinli, Shrikanth S. Narayanan

Probabilistic feature mapping based on trajectory HMMs
Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda

Simultaneous conversion of duration and spectrum based on statistical models including time-sequence matching
Kaori Yutani, Yosuke Uto, Yoshihiko Nankaku, Tomoki Toda, Keiichi Tokuda

Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory
Takashi Muramatsu, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

An improved one-to-many eigenvoice conversion system
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

Study on manipulation method of voice quality based on the vocal tract area function
Yoshinori Uchimura, Hideki Banno, Fumitada Itakura, Hideki Kawahara

Incorporating durational modification in voice transformation
Arthur Toth, Alan W. Black

Non-segmental duration feature extraction for prosodic classification
Amy Dashiell, Brian Hutchinson, Anna Margolis, Mari Ostendorf


Special Session: Tonality in Production and Perception, Language in Australia and New Zealand


An ERP study on categorical perception of lexical tones and nonspeech pitches
Hong-Ying Zheng, William S.-Y. Wang

The role of Japanese pitch accent in spoken-word recognition: evidence from middle-aged accentless dialect listeners
Takashi Otake, Marii Higuchi

Mandarin Chinese tone nucleus detection with landmarks
Siwei Wang, Gina-Anne Levow

A comparative study on dissyllabic stress patterns of Mandarin and Cantonese
Weixiang Hu, Jin Jian, Aijun Li, Xia Wang

Three-sectional-staff characterization of Cantonese level tones
Rerrario Shui-Ching Ho, Yoshinori Sagisaka

A seven-tone dialect in southern China with falling-rising-falling contour: a linguistic acoustic analysis
Xiaonong Zhu, Caicai Zhang

Pitch target analysis of Thai tones using quantitative target approximation model and unsupervised clustering
Santitham Prom-on

Do English speakers assimilate Mandarin tones to English prosodic categories?
Connie K. So, Catherine T. Best

Evidence of a near-merger in Western Sydney Australian English vowels
Rikke L. Bundgaard-Nielsen, Catherine T. Best, Michael D. Tyler, Christian Kroos

Central vowels in Arrernte: metrical prominence and pitch accent
Marija Tabain, Kristine Rickard, Gavan Breen, Veronica Dobson

Pausing and phrase length in two Australian languages
Bella Ross

Positional effects on the characterization of ejectives in Waima'a
Mary Stevens, John Hajek

A Niuean variant of New Zealand English?
Donna Starks, Laura Thompson, Catherine I. Watson





Special Session: Prosody of Spontaneous Speech I, II


Quantitative prosodic analysis of spontaneous speech
Hansjörg Mixdorff

The effect of cognitive load on disfluencies during in-vehicle spoken dialogue
Anders Lindström, Jessica Villing, Staffan Larsson, Alexander Seward, Nina Åberg, Cecilia Holtelius

Discourse prosody context - global F0 and tempo modulations
Chiu-yu Tseng, Zhao-yu Su

A method for automatic and dynamic estimation of discourse genre typology with prosodic features
Nicolas Obin, Anne Lacheret-Dujour, Christophe Veaux, Xavier Rodet, Anne-Catherine Simon

The meanings carried by interjections in spontaneous speech
Carlos Toshinori Ishi, Hiroshi Ishiguro, Norihiro Hagita

Speech interaction with an emotional robotic dog
Christian M. Jones, Andrew Deeming

Control of prosodic focus in corpus-based generation of fundamental frequency based on the generation process model
Keiko Ochi, Keikichi Hirose, Nobuaki Minematsu

Analysis and perception of speech under physical task stress
Keith W. Godin, John H. L. Hansen

An analysis of multimodal cues of interruption in dyadic spoken interactions
Chi-Chun Lee, Sungbok Lee, Shrikanth S. Narayanan

Paralinguistic effects on turn-taking behavior in expressive conversation
Hiroki Mori, Hideki Kasuya

Study on "ng, a" type of discourse markers in standard Chinese
Zhigang Yin, Aijun Li, Ziyu Xiong

How can you use disfluencies and still sound as a good speaker?
Helena Moniz, Ana Isabel Mata, Isabel Trancoso, M. Ceu Viana

What makes a good speaker? subject ratings, acoustic measurements and perceptual evaluations
Eva Strangert, Joakim Gustafson

Towards measuring continuous acoustic feature convergence in unconstrained spoken dialogues
Spyros Kousidis, David Dorran, Yi Wang, Brian Vaughan, Charlie Cullen, Dermot Campbell, Ciaran McDonnell, Eugene Coyle

Detection of feeling through back-channels in spoken dialogue
Tatsuya Kawahara, Masayoshi Toyokura, Teruhisa Misu, Chiori Hori


Automatic Speech Recognition: Adaptation I, II


Discriminative training of narrow band - wide band adapted systems for meeting recognition
Martin Karafiát, Lukáš Burget, Thomas Hain, Jan Černocký

A fast speaker adaptation method using aspect model
Seongjun Hahm, Akinori Ito, Shozo Makino, Motoyuki Suzuki

Probabilistic latent speaker training for large vocabulary speech recognition
Dan Su, Xihong Wu, Huisheng Chi

Improvement of eigenvoice-based speaker adaptation by parameter space clustering
Shutaro Tanji, Koichi Shinoda, Sadaoki Furui, Antonio Ortega

Study of Jacobian compensation using linear transformation of conventional MFCC for VTLN
D. R. Sanand, S. Umesh

Adaptive HMM topology for speech recognition
Chuan-Wei Ting, Kuo-Yuan Lee, Jen-Tzung Chien

Minimum phone error discriminative training for Mandarin Chinese speaker adaptation
Liang-Yu Chen, Chun-Jen Lee, Jyh-Shing Roger Jang

Fast speaker adaptive training for speech recognition
Daniel Povey, Hong-Kwang Jeff Kuo, Hagen Soltau

Adaptive training using discriminative mapping transforms
C. K. Raut, K. Yu, M. J. F. Gales

Speaker adaptive training using shift-MLLR
Jonas Loof, Christian Gollan, Hermann Ney

XMLLR for improved speaker adaptation in speech recognition
Daniel Povey, Hong-Kwang Jeff Kuo

Effective acoustic adaptation for a distant-talking interactive TV system
Jing Huang, Mark Epstein, Marco Matassoni

A computationally efficient approach to warp factor estimation in VTLN using EM algorithm and sufficient statistics
P. T. Akhil, S. P. Rath, S. Umesh, D. R. Sanand

A reliable technique for detecting the second subglottal resonance and its use in cross-language speaker adaptation
Shizhen Wang, Steven M. Lulich, Abeer Alwan


Features for Speech and Speaker Recognition


Speaker identification for whispered speech based on frequency warping and score competition
Xing Fan, John H. L. Hansen

Experimental evaluation of multi-band position-pitch estimation (m-popi) algorithm for multi-speaker localization
Tania Habib, Lukas Ottowitz, Marián Képesi

Features for automatic detection of voice bars in continuous speech
N. Dhananjaya, S. Rajendran, B. Yegnanarayana

Speaker orientation estimation based on hybridation of GCC-PHAT and HLBR
Carlos Segura, Alberto Abad, Javier Hernando, Climent Nadeu

Parallel and hierarchical speech feature classification using frame and segment-based methods
Jun Hou, Lawrence R. Rabiner, Sorin Dusan

Automatically learning speaker-independent acoustic subword units
Balakrishnan Varadarajan, Sanjeev Khudanpur

Human-like ears versus two-microphone array, which works better for speaker identification?
Waleed H. Abdulla, Yushi Zhang

Is a speech recognizer useful for characteristic analysis of classroom lecture speech?
Kenji Kobayashi, Mitsuhiro Somiya, Hiromitsu Nishizaki, Yoshihiro Sekiguchi

An intuitive class discriminability measure for feature selection in a speech recognition system
Ladan Golipour, Douglas O'Shaughnessy

f-divergence is a generalized invariant measure between distributions
Yu Qiao, Nobuaki Minematsu

Sparse linear predictors for speech processing
Daniele Giacobello, Mads Græsbøll Christensen, Joachim Dahl, Søren Holdt Jensen, Marc Moonen

Frequency-domain parameter estimations for binary masked signals
J. X. Zhang, Mads Græsbøll Christensen, Joachim Dahl, Søren Holdt Jensen, Marc Moonen

Decomposition of rotational distortion caused by VTL difference using eigenvalues of its transformation matrix
Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose

Region-based vocal tract length normalization for ASR
Michail G. Maragakis, Alexandros Potamianos


Speaker Recognition: Kernel-Based and Session Mismatch


Speaker verification with non-audible murmur segments by combining global alignment kernel and penalized logistic regression machine
Hideki Okamoto, Tomoko Matsui, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano

Analysis of subspace within-class covariance normalization for SVM-based speaker verification
Liang Lu, Yuan Dong, Xianyu Zhao, Jian Zhao, Chengyu Dong, Haila Wang

Comparison of input and feature space nonlinear kernel nuisance attribute projections for speaker verification
Xianyu Zhao, Yuan Dong, Jian Zhao, Liang Lu, Jiqing Liu, Haila Wang

A generalised derivative kernel for speaker verification
C. Longworth, M. J. F. Gales

Modeling prior belief for speaker verification SVM systems
Luciana Ferrer

Convergence between SVM-based and distance-based paradigms for speaker recognition
Delphine Charlet, Xianyu Zhao, Yuan Dong

High-level speaker verification via articulatory-feature based sequence kernels and SVM
Shi-Xiong Zhang, Man-Wai Mak

Characterizing speech utterances for speaker verification with sequence kernel SVM
Kong-Aik Lee, Changhuai You, Haizhou Li, Tomi Kinnunen, Donglai Zhu

Development of the primary CRIM system for the NIST 2008 speaker recognition evaluation
Patrick Kenny, Najim Dehak, Pierre Ouellet, Vishwa Gupta, Pierre Dumouchel

Making confident speaker verification decisions with minimal speech
Robbie Vogt, Sridha Sridharan, Michael Mason

Parallelized factor analysis and feature normalization for automatic speaker verification
Jun Luo, Cheung-Chi Leung, Marc Ferràs, Claude Barras

Intersession variability in speaker recognition: a behind the scene analysis
Daniel Garcia-Romero, Carol Y. Espy-Wilson

Speaker recognition based on variational Bayesian method
Tatsuya Ito, Kei Hashimoto, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

Factor analysis multi-session training constraint in session compensation for speaker verification
Driss Matrouf, Jean-François Bonastre, Salah Eddine Mezaache

The role of 'delta' features in speaker verification
Ying Liu, Martin J. Russell, Michael J. Carey






Automatic Speech Recognition: Features I, II


Group delay function for improved gender identification
Kye-Hwan Lee, Sang-Ick Kang, Ji-Hyun Song, Joon-Hyuk Chang

Frame-synchronous and local confidence measures for on-the-fly automatic speech recognition
Joseph Razik, Odile Mella, Dominique Fohr, Jean-Paul Haton

Hilbert envelope based spectro-temporal features for phoneme recognition in telephone speech
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky

Evidence of coarticulation in a phonological feature detection system
Abhijeet Sangwan, Ayako Ikeno, John H. L. Hansen

Phoneme recognition based on hybrid neural networks with inhibition/enhancement of distinctive phonetic feature (DPF) trajectories
Mohammad Nurul Huda, Kouichi Katsurada, Tsuneo Nitta

A neural network based nonlinear feature transformation for speech recognition
Hongbing Hu, Stephen A. Zahorian

Significance of group delay based acoustic features in the linguistic search space for robust speech recognition
R. Ramya, Rajesh M. Hegde, Hema A. Murthy

Genetic programming based optimization of class-dependent PCA for extracting robust MFCC
Houman Abbasian, Babak Nasersharif, Ahmad Akbari

Comparison of AM-FM based features for robust speech recognition
K. V. S. Narayana, T. V. Sreenivas

Growing bottleneck features for tandem ASR
Joe Frankel, Dong Wang, Simon King

Landmark based recognition of stops: acoustic attributes versus smoothed spectra
Veena Karjigi, Preeti Rao

Speech recognition performance of CJLC: corpus of Japanese lecture contents
Satoru Kogure, Hiromitsu Nishizaki, Masatoshi Tsuchiya, Kazumasa Yamamoto, Shingo Togashi, Seiichi Nakagawa

On the combination of auditory and modulation frequency channels for ASR applications
Fabio Valente, Hynek Hermansky

Tandem processing of fepstrum features
Vivek Tyagi

Data-driven clustered hierarchical tandem system for LVCSR
Shuo-Yiin Chang, Lin-shan Lee

Linear discriminant feature extraction using weighted classification confusion information
Hung-Shin Lee, Berlin Chen

Use of spectral centre of gravity for generating speaker invariant features for automatic speech recognition
D. R. Sanand, V. Balaji, Rani R. Sandhya, S. Umesh

Short- and long-term dynamic features for robust speech recognition
Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura


Speech Resources and Technology Evaluation


Multi-modal recording, analysis and indexing of poster sessions
Tatsuya Kawahara, Hisao Setoguchi, Katsuya Takanashi, Kentaro Ishizuka, Shoko Araki

Automatic pitch-synchronous phonetic segmentation
Jindřich Matoušek, Jan Romportl

Two protocols comparing human and machine phonetic recognition performance in conversational speech
Wade Shen, Joseph Olive, Douglas Jones

Analysis of drivers' speech in a car environment
Tomoyuki Kato, Jun Okamoto, Makoto Shozakai

Preparing a corpus of Dutch spontaneous dialogues for automatic phonetic analysis
Barbara Schuppler, Mirjam Ernestus, Odette Scharenborg, Lou Boves

Evaluation of voice activity and voicing detection
Bojan Kotnik, Pierre Sendorek, Sergey Astrov, Turgay Koc, Tolga Ciloglu, Laura Docío Fernández, Eduardo Rodríguez Banga, Harald Höge, Zdravko Kačič

Wikispeech - a content management system for speech databases
Christoph Draxler, Klaus Jänsch

Development and evaluation of Polish speech corpus for unit selection speech synthesis systems
Grazyna Demenko, J. Bachan, Bernd Möbius, K. Klessa, M. Szymański, Stefan Grocholewski

A data format enabling interoperation of speech recognition, translation and information extraction engines: the GALE type system
John F. Pitrelli, Burn L. Lewis, Edward A. Epstein, Jerome L. Quinn, Ganesh Ramaswamy

A rank-predicted pseudo-greedy approach to efficient text selection from large-scale corpus for maximum coverage of target units
Wei Li, Qiang Huo

Memo workbench for semi-automated usability testing
Klaus-Peter Engelbrecht, Michael Kruppa, Sebastian Möller, Michael Quade

MDS-based visualization method for multiple speech corpora
Kimiko Yamakawa, Tomoko Matsui, Shuichi Itahashi

Scripted dialogs versus improvisation: lessons learned about emotional elicitation techniques from the IEMOCAP database
Carlos Busso, Shrikanth S. Narayanan


Applications in Education and Learning I, II


Automatic pronunciation evaluation and classification
Om D. Deshmukh, Sachindra Joshi, Ashish Verma

Pronunciation error detection techniques for children's speech
Daniel Bolanos, Wayne Ward, Barbara Wise, Sarel van Vuuren

Automatic generation and pruning of phonetic mispronunciations to support computer-aided pronunciation training
Lan Wang, Xin Feng, Helen M. Meng

Automatic children's reading tutor on hand-held devices
Xiaolong Li, Li Deng, Yun-Cheng Ju, Alex Acero

A Japanese CALL system based on dynamic question generation and error prediction for ASR
Hongcui Wang, Tatsuya Kawahara

Estimation of children's reading ability by fusion of automatic pronunciation verification and fluency detection
Matthew Black, Joseph Tepperman, Sungbok Lee, Shrikanth S. Narayanan

Pronunciation verification of English letter-sounds in preliterate children
Matthew Black, Joseph Tepperman, Abe Kazemzadeh, Sungbok Lee, Shrikanth S. Narayanan

Improving mispronunciation detection and diagnosis of learners' speech with context-sensitive phonological rules based on language transfer
Alissa M. Harrison, Wing Yiu Lau, Helen M. Meng, Lan Wang

DISCO: development and integration of speech technology into courseware for language learning
Catia Cucchiarini, Joost van Doremalen, Helmer Strik

Discriminative model combination and language model selection in a reading tutor for children
Abdurrahman Samir, Jacques Duchateau, Hugo Van hamme

Usability of ASR-based reading training for dyslexics
Jakob Schou Pedersen, Lars Bo Larsen, Børge Lindberg

A browsing system for classroom lecture speech
Shingo Togashi, Seiichi Nakagawa

Automatic pronunciation evaluation of language learners' utterances generated through shadowing
Dean Luo, Naoya Shimomura, Nobuaki Minematsu, Yutaka Yamauchi, Keikichi Hirose

Application and evaluation of speech technologies in language learning: experiments with the Saybot player
Sylvain Chevalier, Zhenhai Cao

Forward optimal modeling of acoustic confusions in Mandarin CALL system
Fengpei Ge, Fuping Pan, Changliang Liu, Bin Dong, Yonghong Yan

Recognition of English utterances with grammatical and lexical mistakes for dialogue-based CALL system
Akinori Ito, Ryohei Tsutsui, Shozo Makino, Motoyuki Suzuki





Speaker Recognition: Adverse Conditions and Forensics


Robust far-field speaker identification under mismatched conditions
Qin Jin, Tanja Schultz

Robust speaker verification using short-time frequency with long-time window and fusion of multi-resolutions
Chien-Lin Huang, Bin Ma, Chung-Hsien Wu, Brian Mak, Haizhou Li

Performance improvement of text-independent speaker verification systems based on histogram enhancement in noisy environments
C. H. Kwon, J. K. Choi, Eliathamby Ambikairajah

Filling acoustic holes through leveraged uncorrelated GMMs for in-set/out-of-set speaker recognition
Jun-Won Suh, Pongtep Angkititrakul, John H. L. Hansen

Missing-feature method for speaker recognition in band-restricted conditions
Wooil Kim, John H. L. Hansen

Robust speaker identification using cross-correlation GTF-ICA feature
Yushi Zhang, Waleed H. Abdulla

Perceptual speaker identification using monosyllabic stimuli - effects of the nucleus vowels and speaker characteristics contained in nasals
Kanae Amino, Takayuki Arai

Text-dependent speaker recognition by efficient capture of speaker dynamics in compressed time-frequency representations of speech
Amitava Das, Gokul Chittaranjan

Usefulness of text-conditioning and a new database for text-dependent speaker recognition research
Amitava Das, Gokul Chittaranjan, Gopala K. Anumanchipalli

Combination method of bone-conduction speech and air-conduction speech for speaker recognition
Satoru Tsuge, Takashi Osanai, Hisanori Makinae, Toshiaki Kamada, Minoru Fukumi, Shingo Kuroiwa

MAP and sub-word level t-norm for text-dependent speaker recognition
Doroteo T. Toledano, Daniel Hernandez-Lopez, Cristina Esteve-Elizalde, Joaquin Gonzalez-Rodriguez, Ruben Fernandez Pozo, Luis Hernandez Gomez

Forensic speaker recognition in Chinese: a multivariate likelihood ratio discrimination on /i/ and /y/
Cuiling Zhang, Geoffrey Stewart Morrison, Philip Rose

How many do we need? exploration of the population size effect on the performance of forensic speaker classification
Shunichi Ishihara, Yuko Kinoshita

Comparing prosodic models for speaker recognition
Cheung-Chi Leung, Marc Ferras, Claude Barras, Jean-Luc Gauvain

Combination of clean and contaminated GMM/SVM for far-field text-independent speaker verification
Christian Zieger, Maurizio Omologo


Phonetics: Development, Learning, Cross-Language and Language-Specific


English word stress as produced by English and Dutch speakers: the role of segmental and suprasegmental differences
Bettina Braun, Kristin Lemhofer, Anne Cutler

The strength of stress-related lexical competition depends on the presence of first-syllable stress
Eva Reinisch, Alexandra Jesse, James M. McQueen

Word stress placement by native speakers and Japanese learners of English
Keiichi Ishikawa, Jun Nomura

Schwa variants in American English
H. Timothy Bunnell, Jason Lilley

Covariations of English segmental durations across speakers
Jiahong Yuan

The intelligibility of the English vowel /ʌ/ produced by native speakers of Japanese and its relations to the acoustic characteristics
Akiyo Joto

Rate dependent spectral reduction for voiceless fricatives
Benjamin Weiss

Investigating perception of places of articulation in sign and speech
Stina Ojala, Olli Aaltonen, Tapio Salakoski

Six- and twelve-month-olds' discrimination of native versus non-native between- and within-organ fricative place contrasts
Michael D. Tyler, Catherine T. Best, Louis M. Goldstein, Mark Antoniou, Lidija Krebs-Lazendic

Your baby can't hear you: how mothers talk to infants with simulated hearing loss
Christa Lam, Christine Kitamura

Development of communicative skills in 8- to 16-month-old children: a longitudinal study
Eeva Klintfors, Ulla Sundberg, Francisco Lacerda, Ellen Marklund, Lisa Gustavsson, Ulla Bjursäter, Iris-Corinna Schwarz, Göran Söderlund

Vocal imitation in early language acquisition
Lisa Gustavsson, Francisco Lacerda

Computational language acquisition by statistical bottom-up processing
Okko Rasanen, Unto K. Laine, Toomas Altosaar

Lexical analyses of native and non-native English language instructor speech based on a six-month co-taught classroom video corpus
Noriaki Katagiri, Goh Kawai

Perception and production of consonant clusters in Japanese-English bilingual and Japanese monolingual speakers
Hinako Masuda, Takayuki Arai






Speech Synthesis: Prosody and Emotion I, II


Analysis of voice-quality features of speech that expresses 'anger', 'joy', and 'sadness' uttered by radio actors and actresses
Shoichi Takeda, Yuuri Yasuda, Risako Isobe, Shogo Kiryu, Makiko Tsuru

Including pitch accent optionality in unit selection text-to-speech synthesis
Leonardo Badino, Robert A. J. Clark, Volker Strom

Emotion conversion using F0 segment selection
Zeynep Inanoglu, Steve Young

Generating natural F0 trajectory with additive trees
Yao Qian, Hui Liang, Frank K. Soong

Generating intonation from a mixed CART-HMM model for speech synthesis
Cédric Boidin, Olivier Boeffard

Intonation modeling of Mandarin Chinese using a superpositional approach
Pablo Daniel Aguero, Antonio Bonafonte, Lu Yu, Juan Carlos Tulli

Two-stage prosody prediction for emotional text-to-speech synthesis
Hao Tang, Xi Zhou, Matthias Odisio, Mark Hasegawa-Johnson, Thomas S. Huang

Prosody boundary detection through context-dependent position models
Yue-Ning Hu, Min Chu, Chao Huang, Yan-Ning Zhang

Duration refinement by jointly optimizing state and longer unit likelihood
Boyang Gao, Yao Qian, Zhizheng Wu, Frank K. Soong

T-tilt: a modified tilt model for F0 analysis and synthesis in tonal languages
Ausdang Thangthai, Nattanun Thatphithakkul, Chai Wutiwiwatchai, Anocha Rugchatjaroen, Sittipong Saychum

Multilevel parametric-base F0 model for speech synthesis
Javier Latorre, Masami Akamine

On the generation of synthetic disfluent speech: local prosodic modifications caused by the insertion of editing terms
Jordi Adell, Antonio Bonafonte, David Escudero-Mancebo

A comparison of voice conversion methods for transforming voice quality in emotional speech synthesis
Oytun Türk, Marc Schröder

Tree grammars as models of prosodic structure
Joseph Tepperman, Shrikanth S. Narayanan


Language Information Retrieval Systems


Addressing the out-of-vocabulary problem for large-scale Chinese spoken term detection
Sha Meng, Jian Shao, Roger Peng Yu, Jia Liu, Frank Seide

Towards vocabulary-independent speech indexing for large-scale repositories
Jian Shao, Roger Peng Yu, Qingwei Zhao, Yonghong Yan, Frank Seide

Towards the integration of automatic speech recognition and information retrieval for spoken query processing
A. Moreno-Daniel, J. Wilpon, B.-H. Juang, S. Parthasarathy

Reducing the effect of OOV query words by using morph-based spoken document retrieval
Ville T. Turunen

Bayesian latent topic clustering model
Meng-Sung Wu, Jen-Tzung Chien

Spoken document retrieval by translating recognition candidates into correct transcriptions
Tomoyosi Akiba, Yusuke Yokota

Audio indexing for an interactive Italian literature management system
Carlo Drioli, Piero Cosi

Open-vocabulary spoken-document retrieval based on query expansion using related web documents
Makoto Terao, Takafumi Koshinaka, Shinichi Ando, Ryosuke Isotani, Akitoshi Okumura

Discriminative graph training for ultra-fast low-footprint speech indexing
Upendra Chaudhari, Hong-Kwang Jeff Kuo, Brian Kingsbury

A language-modeling approach to inverse text normalization and data cleanup for multimodal voice search applications
Yun-Cheng Ju, Julian Odell

Topic segmentation and indexation in a media watch system
Rui Amaral, Isabel Trancoso

Vocabulary independent discriminative term frequency estimation
J. Scott Olsson

Spoken keyword spotting via multi-lattice alignment
Hui Lin, Alex Stupakov, Jeff A. Bilmes

Robust spoken term detection using combination of phone-based and word-based recognition
Kenji Iwata, Koichi Shinoda, Sadaoki Furui


Applications for the Aged and Handicapped


Language model adaptation for a speech to sign language translation system using web frequencies and a MAP framework
Luis Fernando D'Haro, Ruben San-Segundo, Ricardo de Cordoba, Jan Bungeroth, Daniel Stein, Hermann Ney

Hearing at home - communication support in home environments for hearing impaired persons
Jonas Beskow, Björn Granström, Peter Nordqvist, Samer Al Moubayed, Giampiero Salvi, Tobias Herzke, Arne Schulz

Traveling wave based group delays for cochlear implant speech processing
Daniel A. Taft, David B. Grayden, Anthony N. Burkitt

Multimodal perception of Mandarin tone for cochlear implant users
Damien J. Smith, Denis Burnham

Evaluation of speaking-aid system with voice conversion for laryngectomees toward its use in practical environments
Keigo Nakamura, Tomoki Toda, Yoshitaka Nakajima, Hiroshi Saruwatari, Kiyohiro Shikano

An acoustic typology of apraxic speech - toward reliable diagnosis
Jacqueline McKechnie, Kirrie J. Ballard, Donald A. Robin, Adam Jacks, Sallyanne Palethorpe, Kristin M. Rosen

Dysphonic voices and the 0-3000 Hz frequency band
G. Pouchoulin, C. Fredouille, Jean-François Bonastre, A. Ghio, A. Giovanni

Verifying pronunciation accuracy from speakers with neuromuscular disorders
Shou-Chun Yin, Richard C. Rose, Oscar Saz, Eduardo Lleida

Multi-band and multi-cue analyses of disordered connected speech
A. Alpan, Y. Maryn, F. Grenez, A. Kacha, J. Schoentgen

Combining neural network and rule-based systems for dysarthria diagnosis
James Carmichael, Vincent Wan, Phil Green

Speech as a means of monitoring cognitive function of elderly speakers
Shona D'Arcy, Viliam Rapcan, Nils Penard, Margaret E. Morris, Ian H. Robertson, Richard B. Reilly

Integration of metamodel and acoustic model for speech recognition
Hironori Matsumasa, Tetsuya Takiguchi, Yasuo Ariki, Ichao Li, Toshitaka Nakabayashi

Frequency compression/transposition of fricative consonants for the hearing impaired with high-frequency dead regions
Francisco J. Fraga, Leticia P. Costa S. Prates, Maria Cecilia M. Iorio



Special Session: LIPS 2008 - Visual Speech Synthesis Challenge


LIPS2008: visual speech synthesis challenge
Barry-John Theobald, Sascha Fagel, Gérard Bailly, Frédéric Elisei

Speech-driven lip motion generation with a trajectory HMM
Gregor Hofer, Junichi Yamagishi, Hiroshi Shimodaira

A trainable trajectory formation model TD-HMM parameterized for the LIPS 2008 challenge
Gérard Bailly, Oxana Govokhina, Gaspard Breton, Frédéric Elisei, Christophe Savariaux

Comparing text-driven and speech-driven visual speech synthesisers
Barry-John Theobald, Gavin Cawley, Andrew Bangham, Iain Matthews, Nicholas Wilkinson

Automatic lip synchronization by speech signal analysis
Goranka Zoric, Aleksandra Cerekovic, Igor S. Pandzic

MASSY speaks English: adaptation and evaluation of a talking head
Sascha Fagel

From 3-d speaker cloning to text-to-audiovisual-speech
Sascha Fagel, Frédéric Elisei, Gérard Bailly

A development of Czech talking head
Zdeněk Krňoul, Miloš Železný

Realistic facial animation system for interactive services
Kang Liu, Joern Ostermann

Speech-driven 3d facial animation for mobile entertainment
Juan Yan, Xiang Xie, Hao Hu

A real-time text to audio-visual speech synthesis system
Lijuan Wang, Xiaojun Qian, Lei Ma, Yao Qian, Yining Chen, Frank K. Soong


Spoken Language Translation Systems


ASR word lattice translation with exhaustive reordering is possible
Evgeny Matusov, Björn Hoffmeister, Hermann Ney

Development of SRI's translation systems for broadcast news and broadcast conversations
Jing Zheng, Wen Wang, Necip Fazil Ayan

Machine translation in continuous space
Ruhi Sarikaya, Yonggang Deng, Mohamed Afify, Brian Kingsbury, Yuqing Gao

Discovering phrases in machine translation by simulated annealing
Caroline Lavecchia, David Langlois, Kamel Smaïli

Towards domain independence in machine aided human translation
Aarthi Reddy, Richard C. Rose

Class-based statistical machine translation for field maintainable speech-to-speech translation
Ian R. Lane, Alex Waibel




Speech, Music, Audio Segmentation and Classification


Robust speaker change detection using Kernel-Gaussian model
Jie Gao, Xiang Zhang, Qingwei Zhao, Yonghong Yan

A comparative study in automatic recognition of broadcast audio
Stavros Ntalampiras, Nikos Fakotakis

Joint time-frequency segmentation for transient decomposition
Charturong Tantibundhit, Gernot Kubin

Language and genre detection in audio content analysis
Vikramjit Mitra, Daniel Garcia-Romero, Carol Y. Espy-Wilson

An entropy based feature for whisper-island detection within audio streams
Chi Zhang, John H. L. Hansen

Two step speaker segmentation method using Bayesian information criterion and adapted Gaussian mixtures models
Matej Grašič, Marko Kos, Andrej Žgank, Zdravko Kačič

Domain-specific classification methods for disfluency detection
Sebastian Germesin, Tilman Becker, Peter Poller

Multi-speaker meeting audio segmentation
Tin Lay Nwe, Minghui Dong, Swe Zin Kalayar Khine, Haizhou Li

Rhythm based music segmentation and octave scale cepstral features for sung language recognition
Namunu C. Maddage, Haizhou Li

Robust voiced/unvoiced speech classification using empirical mode decomposition and periodic correlation model
Md. Khademul Islam Molla, Keikichi Hirose, Nobuaki Minematsu

A combination of data mining method with decision trees building for speech/music discrimination
Qiong Wu, Qin Yan, Jun Wang, Jun Hong

Advertisement detection in French broadcast news using acoustic repetition and Gaussian mixture models
Vishwa Gupta, Gilles Boulianne, Patrick Kenny, Pierre Dumouchel

A hybrid SVM/MCE training approach for vector space topic identification of spoken audio recordings
Timothy J. Hazen, Fred Richardson

Training audio events detectors with a sound effects corpus
Isabel Trancoso, Jose Portelo, Miguel Bugalho, João Neto, Antonio Serralheiro







Cross-Lingual and Multilingual Automatic Speech Recognition, Speech Translation


Czech-to-Slovak adapted broadcast news transcription system
Jan Nouza, Jan Silovsky, Jindrich Zdansky, Petr Cerva, Martin Kroul, Josef Chaloupka

Continuous phone recognition without target language training data
Dau-Cheng Lyu, Sabato Marco Siniscalchi, Tae-Yoon Kim, Chin-Hui Lee

An investigation of acoustic models for multilingual code-switching
Christopher M. White, Sanjeev Khudanpur, James K. Baker

Cross-lingual portability of MLP-based tandem features - a case study for English and Hungarian
László Tóth, Joe Frankel, Gábor Gosztolya, Simon King

Seed models combination and state level mappings of cross-lingual transfer for rapid HMM development: from English to Mandarin
Xufang Zhao, Douglas O'Shaughnessy

Multi-accent and accent-independent non-native speech recognition
Ghazi Bouselmi, Dominique Fohr, Irina Illina

Cross-lingual sentence extraction for information distillation
Adish Kumar Singla, Dilek Hakkani-Tür

On the use of a multilingual neural network front-end
Stefano Scanzio, Pietro Laface, Luciano Fissore, Roberto Gemello, Franco Mana

Context-sensitive probabilistic phone mapping model for cross-lingual speech recognition
Khe Chai Sim, Haizhou Li

A non-acoustic approach to crosslingual speech recognition performance prediction
Chen Liu, Lynette Melnar

Factored translation models for enriching spoken language translation with prosody
Vivek Kumar Rangarajan Sridhar, Srinivas Bangalore, Shrikanth S. Narayanan

Data selection and smoothing in an open-source system for the 2008 NIST machine translation evaluation
Holger Schwenk, Yannick Esteve

Strategies for building a Farsi-English SMT system from limited resources
Andreas Kathol, Jing Zheng

Stream decoding for simultaneous spoken language translation
Muntsin Kolss, Stephan Vogel, Alex Waibel

Towards unsupervised training of the classifier-based speech translator
Emil Ettelaie, Panayiotis G. Georgiou, Shrikanth S. Narayanan

Aggregating distributed STT, MT, and information extraction engines: the GALE interoperability-demo system
John F. Pitrelli, Burn L. Lewis, Edward A. Epstein, Martin Franz, Daniel Kiecza, Jerome L. Quinn, Ganesh Ramaswamy, Amit Srivastava, Paola Virga



Human Speech Production and Speech Perception


An analysis of vocal tract shaping in English sibilant fricatives using real-time magnetic resonance imaging
Erik Bresch, Daylen Riggs, Louis M. Goldstein, Dani Byrd, Sungbok Lee, Shrikanth S. Narayanan

Science workshop with sliding vocal-tract model
Takayuki Arai

Segmentation cues in lexical identification and in lexical acquisition: same or different?
Odile Bagou, Ulrich H. Frauenfelder

Phonological representations in poor readers
Cecile Kuijpers, Louis ten Bosch

To what extent does tagged-MRI technique allow to infer tongue muscles' activation pattern? a modelling study
Stephanie Buchaillard, Pascal Perrier, Yohan Payan

Feature adaptation of hearing-impaired lip shapes: the vowel case in the cued speech context
Noureddine Aboutabit, Denis Beautemps, Olivier Mathieu, Laurent Besacier

Automatic detection of the context of acoustic landmark deletion
Nanette Veilleux, Stefanie Shattuck-Hufnagel

Aspects of pharyngealized phonemes in Arabic using articulography
Slim Ouni

The effect of spectral tilt on infants' discrimination of fricatives
Elizabeth Beach, Christine Kitamura, Harvey Dillon, Teresa Ching, Denis Burnham

Look at the shark: evaluation of student produced standardized sentences of infant- and foreigner-directed speech
Monja Knoll, Lisa Scharrer

Vocal tract inversion by cepstral analysis-by-synthesis using chain matrices
Sankaran Panchapagesan, Abeer Alwan

DC-constrained linear prediction for glottal inverse filtering
Paavo Alku, Carlo Magi, Tom Bäckström

Voicing influences the saliency of place of articulation in audio-visual speech perception in babble
Magnus Alm, Dawn Behne

Correspondence of perception and production boundaries between single and geminate stops in Japanese
Shigeaki Amano, Yukari Hirata

Inhibitory processes of Chinese spoken word recognition
Michael C. W. Yip



Keynote Sessions

Segmentation and Classification

Speech Coding

Human Conversation and Communication

OzPhon08 - Phonetics and Phonology of Australian Aboriginal Languages (Special Session)

Acoustic Activity Detection, Pitch Tracking and Analysis

Single- and Multichannel Speech Enhancement I, II

Spoken Language Systems I, II

Emotion and Expression I, II

Automatic Speech Recognition: Acoustic Models I-III

Accent and Language Identification

Special Session: PANZE 2008 - Phonetics and Phonology of Australian and New Zealand English

Speaker Recognition and Diarisation

Perception, Production, Discourse and Dialog

Single-Channel Speech Enhancement

Speech Synthesis Methods I, II

Speaking Style and Emotion Recognition

Special Session: Cross-Linguistic and Developmental Issues in the Perception and Production of Lexical Tone

Special Session: Auditory-Inspired Spectro-Temporal Features I, II

Speech Coding, Quality Measurement and Auditory Modelling

Accent and Language Recognition

Prosody: Prosodic Structure, Paralinguistic, Non-linguistic and Other Cues

Automatic Speech Recognition: Language Models I, II

Speaker Identification and Verification

Prosodic Structure and Processing

Robust Automatic Speech Recognition I-III

Speech Analysis and Processing, Voice Conversion and Modification

Special Session: Tonality in Production and Perception, Language in Australia and New Zealand

Automatic Speech Recognition: Tone Languages

Spoken Dialogue Systems

Cross-Language and Language-Specific Phonetics

Special Session: Prosody of Spontaneous Speech I, II

Automatic Speech Recognition: Adaptation I, II

Features for Speech and Speaker Recognition

Speaker Recognition: Kernel-Based and Session Mismatch

Broadcast Transcription Systems

Voice Conversion and Modification

Phonetics: General

Special Session: Forensic Speaker Recognition - Traditional and Automatic Approaches

Automatic Speech Recognition: Features I, II

Speech Resources and Technology Evaluation

Applications in Education and Learning I, II

Speech Pathologies

Special Session: Consonant Challenge - Human-Machine Comparisons of Consonant Recognition in Noise

Automatic Speech Recognition: Lexical and Prosodic Models

Speaker Recognition: Adverse Conditions and Forensics

Phonetics: Development, Learning, Cross-Language and Language-Specific

Multimodal Signal Processing

Speech Perception

Evaluation and Standardisation of Spoken-Language Technology

Automatic Speech Recognition: Search Methods

Speech Synthesis: Prosody and Emotion I, II

Language Information Retrieval Systems

Applications for the Aged and Handicapped

Human Speech Production

Special Session: LIPS 2008 - Visual Speech Synthesis Challenge

Spoken Language: Parsing and Summarisation

Multimodal Interfaces

Speech, Music, Audio Segmentation and Classification

Automatic Speech Recognition: New Paradigms

Speech and Acoustic Activity Detection

Speech Analysis and Processing

Special Session: Talking Heads and Pronunciation Training

Multimodal Speech Processing

Cross-Lingual and Multilingual Automatic Speech Recognition, Speech Translation

Expression, Emotion and Personality Recognition

Human Speech Production and Speech Perception