doi: 10.21437/Interspeech.2008
ISSN: 2958-1796
In search of models in speech communication research
Hiroya Fujisaki
Dealing with limited and noisy data in ASR: a hybrid knowledge-based and statistical approach
Abeer Alwan
Forensic automatic speaker recognition: fiction or science?
Joaquin Gonzalez-Rodriguez
Modelling rapport in embodied conversational agents
Justine Cassell
Agglomerative hierarchical speaker clustering using incremental Gaussian mixture cluster modeling
Kyu J. Han, Shrikanth S. Narayanan
Weighted segmental k-means initialization for SOM-based speaker clustering
Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman
Learning essential speaker sub-space using hetero-associative neural networks for speaker clustering
Shajith Ikbal, Karthik Visweswariah
Two's a crowd: improving speaker diarization by automatically identifying and excluding overlapped speech
Kofi Boakye, Oriol Vinyals, Gerald Friedland
T-test distance and clustering criterion for speaker diarization
Trung Hieu Nguyen, Eng Siong Chng, Haizhou Li
Integration of TDOA features in information bottleneck framework for fast speaker diarization
Deepu Vijayasenan, Fabio Valente, Hervé Bourlard
Low complexity near-optimal unit-selection algorithm for ultra low bit-rate speech coding based on n-best lattice and Viterbi search
V. Ramasubramanian, D. Harish
A new fast algebraic fixed codebook search algorithm in CELP speech coding
Vaclav Eksler, Redwan Salami, Milan Jelinek
A novel transcoding algorithm between 3GPP AMR-NB (7.95kbit/s) and ITU-T G.729A (8kbit/s)
Hao Xu, Changchun Bao
Mel-frequency cepstral coefficient-based bandwidth extension of narrowband speech
Amr H. Nour-Eldin, Peter Kabal
A PCM coding noise reduction for ITU-T G.711.1
Jean-Luc Garcia, Claude Marro, Balazs Kövesi
An instrumental measure for end-to-end speech transmission quality based on perceptual dimensions: framework and realization
Marcel Wältermann, Kirstin Scholz, Sebastian Möller, Lu Huo, Alexander Raake, Ulrich Heute
Duration and F0 interval of utterance-final intonation contours in the perception of German sentence modality
Benno Peters, Hartmut R. Pfitzinger
Contrastive utterances make alternatives salient - cross-modal priming evidence
Bettina Braun, Lara Tagliapietra, Anne Cutler
Exploring a mechanism of speech synchronization using auditory delayed experiments
Masato Ishizaki, Yasuharu Den, Senshi Fukashiro
Prosodic manifestations of confidence and uncertainty in spoken language
Heather Pon-Barry
Identifying relevant phrases to summarize decisions in spoken meetings
Raquel Fernandez, Matthew Frampton, John Dowding, Anish Adukuzhiyil, Patrick Ehlen, Stanley Peters
Recovering participant identities in meetings from a probabilistic description of vocal interaction
Kornel Laskowski, Tanja Schultz
Coarticulation in nasal and lateral clusters in Warlpiri
Janet Fletcher, Deborah Loakes, Andrew Butcher
Phonetically prestopped laterals in Australian languages: a preliminary investigation of Warlpiri
Deborah Loakes, Andrew Butcher, Janet Fletcher, Hywel Stoakes
Connected speech processes in Warlpiri
John Ingram, Mary Laughren, Jeff Chapman
Consonant enhancement in Lamalama, an initial-dropping language of Cape York Peninsula, North Queensland
Christina Pentland
Text, rhythm and metrical form in an Aboriginal song series
Myfany Turpin
Statistical speech activity detection based on spatial power distribution for analyses of poster presentations
Kentaro Ishizuka, Shoko Araki, Tatsuya Kawahara
A statistical model-based voice activity detection employing minimum classification error technique
Sang-Ick Kang, Ji-Hyun Song, Kye-Hwan Lee, Yun-Sik Park, Joon-Hyuk Chang
Comparative evaluation of different methods for voice activity detection
Hongfei Ding, Koichi Yamamoto, Masami Akamine
Speech/non-speech segments detection based on chaotic and prosodic features
Soheil Shafiee, Farshad Almasganj, Ayyoob Jafari
Acoustic event classification using a distributed microphone network with a GMM/SVM combined algorithm
Christian Zieger, Maurizio Omologo
Intentional voice command detection for completely hands-free speech interface in home environments
Yasunari Obuchi, Masahito Togami, Takashi Sumiyoshi
Fusion of audio and video modalities for detection of acoustic events
Taras Butko, Andrey Temko, Climent Nadeu, Cristian Canton
DySANA: dynamic speech and noise adaptation for voice activity detection
Ron J. Weiss, Trausti Kristjansson
A comprehensive study on the effects of room reverberation on fundamental frequency estimation
Rico Petrick, Masashi Unoki, Anish Mittal, Carlos Segura, Rüdiger Hoffmann
A hybrid speech signal based algorithm for pitch marking using finite state machines
H. Hussein, M. Wolff, Oliver Jokisch, F. Duckhorn, G. Strecha, Rüdiger Hoffmann
Parameter estimation method of F0 control model for singing voices
Yasunori Ohishi, Hirokazu Kameoka, Kunio Kashino, Kazuya Takeda
An algorithm for multi-pitch tracking in co-channel speech
Srikanth Vishnubhotla, Carol Y. Espy-Wilson
Multipitch tracking using a factorial hidden Markov model
Michael Wohlmayr, Franz Pernkopf
Cochannel speech separation using multi-pitch estimation and model based voiced sequential grouping
Ming Li, Chuan Cao, Di Wang, Ping Lu, Qiang Fu, Yonghong Yan
Crosscorrelation of adjacent spectra enhances fundamental frequency tracking
Philippe Martin
Enhancement of noisy speech recordings via blind source separation
Jiri Malek, Zbynek Koldovsky, Jindrich Zdansky, Jan Nouza
Studies on estimation of the number of sources in blind source separation
Takaaki Ishibashi, Hidetoshi Nakashima, Hiromu Gotanda
Speech enhancement based on hypothesized Wiener filtering
V. Ramasubramanian, Deepak Vijaywargi
Psychoacoustically-motivated adaptive β-order generalized spectral subtraction based on data-driven optimization
Junfeng Li, Hui Jiang, Masato Akagi
Two stage iterative Wiener filtering for speech enhancement
Krishna Nand K, T. V. Sreenivas
Assessment of correlation between objective measures and speech recognition performance in the evaluation of speech enhancement
Pei Ding, Jie Hao
Effect of compressing the dynamic range of the power spectrum in modulation filtering based speech enhancement
James G. Lyons, Kuldip K. Paliwal
A long state vector Kalman filter for speech enhancement
Stephen So, Kuldip K. Paliwal
Subspace based speech enhancement using Gaussian mixture model
Achintya Kundu, Saikat Chatterjee, T. V. Sreenivas
Generalized parametric spectral subtraction using weighted Euclidean distortion
Amit Das, John H. L. Hansen
Sudden noise reduction based on GMM with noise power estimation
Nobuyuki Miyake, Tetsuya Takiguchi, Yasuo Ariki
Speech enhancement using a Wiener denoising technique and musical noise reduction
Md. Jahangir Alam, Sid-Ahmed Selouani, Douglas O'Shaughnessy, Sofia Ben Jebara
Regularized non-negative matrix factorization with temporal dependencies for speech denoising
Kevin W. Wilson, Bhiksha Raj, Paris Smaragdis
ICA-based MAP speech enhancement with multiple variable speech distribution models
Xin Zou, Peter Jančovič, Munevver Kokuer, Martin J. Russell
Source separation based on binaural cues and source model constraints
Ron J. Weiss, Michael I. Mandel, Daniel P. W. Ellis
Maximum kurtosis beamforming with the generalized sidelobe canceller
Kenichi Kumatani, John McDonough, Barbara Rauch, Philip N. Garner, Weifeng Li, John Dines
Noise robust speech dereverberation using constrained inverse filter
Ken'ichi Furuya, Akitoshi Kataoka, Yoichi Haneda
A dual microphone coherence based method for speech enhancement in headsets
Mohsen Rahmani, Ahmad Akbari, Beghdad Ayad
Sound capture system and spatial filter for small devices
Ivan Tashev, Slavy Mihov, Tyler Gleghorn, Alex Acero
An effective microphone array post-filter in arbitrary environments
Ning Cheng, Wen-ju Liu, Peng Li, Bo Xu
Localization of multiple sound sources based on inter-channel correlation using a distributed microphone system
Kook Cho, Hajime Okumura, Takanobu Nishiura, Yoichi Yamashita
A frequency domain approach for speech enhancement with directionality using compact microphone array
Heng Zhang, Qiang Fu, Yonghong Yan
Predicting ASR errors by exploiting barge-in rate of individual users for spoken dialogue systems
Kazunori Komatani, Tatsuya Kawahara, Hiroshi G. Okuno
Expanding vocabulary for recognizing user's abbreviations of proper nouns without increasing ASR error rates in spoken dialogue systems
Masaki Katsumaru, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
Exploiting the ASR n-best by tracking multiple dialog state hypotheses
Jason D. Williams
A spoken language interpretation component for a robot dialogue system
Enes Makalic, Ingrid Zukerman, Michael Niemann
MUESLI: multiple utterance error correction for a spoken language interface
Federico Cesari, Horacio Franco, Gregory K. Myers, Harry Bratt
Methods to optimize transcription of on-line media
Sarah Conrod, Sara Basson, Dimitri Kanevsky
Discrimination of task-related words for vocabulary design of spoken dialog systems
Akinori Ito, Toyomi Meguro, Shozo Makino, Motoyuki Suzuki
Dialog management using weighted finite-state transducers
Chiori Hori, Kiyonori Ohtake, Teruhisa Misu, Hideki Kashioka, Satoshi Nakamura
Probabilistic answer selection based on conditional random fields for spoken dialog system
Yoshitaka Yoshimi, Ryota Kakitsuba, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Let's Go Lab: a platform for evaluation of spoken dialog systems with real world users
Maxine Eskenazi, Alan W. Black, Antoine Raux, Brian Langner
The impact of language dynamics on the capitalization of broadcast news
Fernando Batista, Nuno Mamede, Isabel Trancoso
Lightly supervised acoustic model training on EPPS recordings
Matthias Paulik, Alex Waibel
Fast call-classification system development without in-domain training data
Christophe Servan, Frédéric Bechet
iCNC and iROVER: the limits of improving system combination with classification?
Björn Hoffmeister, Ralf Schlüter, Hermann Ney
System combination for spoken language understanding
Stefan Hahn, Patrick Lehnen, Hermann Ney
Question and answer database optimization using speech recognition results
Shota Takeuchi, Tobias Cincarek, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano
Development and evaluation of hands-free spoken dialogue system for railway station guidance
Hiroshi Saruwatari, Yu Takahashi, Hiroyuki Sakai, Shota Takeuchi, Tobias Cincarek, Hiromichi Kawanami, Kiyohiro Shikano
Statistical shared plan-based dialog management
Amanda J. Stent, Srinivas Bangalore
When calls go wrong: how to detect problematic calls based on log-files and emotions?
Ota Herm, Alexander Schmitt, Jackson Liscombe
Unsupervised learning of edit parameters for matching name variants
Dan Gillick, Dilek Hakkani-Tür, Michael Levit
Detection of repetitions in spontaneous speech in dialogue sessions
Mert Cevik, Fuliang Weng, Chin-Hui Lee
Automatic customer feedback processing: alarm detection in open question spoken messages
Nathalie Camelin, Geraldine Damnati, Frédéric Bechet, Renato De Mori
Minimal training based semantic categorization in a voice activated question answering (VAQA) system
Mithun Balakrishna, Marta Tatu, Dan Moldovan
User study of the Bayesian update of dialogue state approach to dialogue management
B. Thomson, M. Gašić, S. Keizer, F. Mairesse, J. Schatzmann, K. Yu, Steve Young
Extensibility verification of robust domain selection against out-of-grammar utterances in multi-domain spoken dialogue system
Satoshi Ikeda, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
Improving large scale alphanumeric string recognition using redundant information
Ea-Ee Jan, Osamuyimen Stewart, Raymond Co, David Lubensky
SPRAAK: an open source "SPeech recognition and automatic annotation kit"
Kris Demuynck, Jan Roelens, Dirk Van Compernolle, Patrick Wambacq
Preliminary evaluation of speech/sound recognition for telemedicine application in a real environment
Michel Vacher, Anthony Fleury, Jean-François Serignat, Norbert Noury, Hubert Glasson
Mobidic - a mobile dictation and notetaking application
Markku Turunen, Aleksi Melto, Anssi Kainulainen, Jaakko Hakulinen
Automatic speech recognition for scientific purposes - webASR
Thomas Hain, Asmaa El Hannani, Stuart N. Wrigley, Vincent Wan
Evaluation of a live broadcast news subtitling system for Portuguese
Hugo Meinedo, Marcio Viveiros, Joao Neto
Multidimensional features of emotional speech
Tomoko Suzuki, Machiko Ikemoto, Tomoko Sano, Toshihiko Kinoshita
Leveraging emotion detection using emotions from yes-no answers
Narjes Boufaden, Pierre Dumouchel
Vowel placement during operatic singing: 'come si parla' or 'aggiustamento'?
Thomas J. Millhouse, Dianna T. Kenny
Study on strained rough voice as a conveyer of rage
Yumiko O. Kato, Yoshifumi Hirose, Takahiro Kamai
Integrating rule and template-based approaches for emotional Malay speech synthesis
Mumtaz Begum, Raja N. Ainon, Roziati Zainuddin, Zuraidah M. Don, Gerry Knowles
The expression and perception of emotions: comparing assessments of self versus others
Carlos Busso, Shrikanth S. Narayanan
On the role of acting skills for the collection of simulated emotional speech
Emiel Krahmer, Marc Swerts
Detection of security related affect and behaviour in passenger transport
Björn Schuller, Matthias Wimmer, Dejan Arsic, Tobias Moosmayr, Gerhard Rigoll
Emotions and articulatory precision
Martijn Goudbeek, Jean Philippe Goldman, Klaus R. Scherer
Assessing agreement of observer- and self-annotations in spontaneous multimodal emotion data
Khiet P. Truong, Mark A. Neerincx, David A. van Leeuwen
Emotion recognition in spontaneous emotional speech for anonymity-protected voice chat systems
Yoshiko Arimoto, Hiromi Kawatsu, Sumio Ohno, Hitoshi Iida
Assigning suitable phrasal tones and pitch accents by sensing affective information from text to synthesize human-like speech
Mostafa Al Masum Shaikh, Md. Khademul Islam Molla, Keikichi Hirose
Cross-language study of vocal correlates of affective states
Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl
Gender-related differences in the production and perception of emotion
Marc Swerts, Emiel Krahmer
Soft margin estimation with various separation levels for LVCSR
Jinyu Li, Zhi-Jie Yan, Chin-Hui Lee, Ren-Hua Wang
On the equivalence of Gaussian and log-linear HMMs
Georg Heigold, Patrick Lehnen, Ralf Schlüter, Hermann Ney
Generalization of extended Baum-Welch parameter estimation for discriminative training and decoding
Dimitri Kanevsky, Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo
An ellipsoid constrained quadratic programming perspective to discriminative training of HMMs
Peng Liu, Frank K. Soong
Discriminative training of variable-parameter HMMs for noise robust speech recognition
Dong Yu, Li Deng, Yifan Gong, Alex Acero
Towards a non-parametric acoustic model: an acoustic decision tree for observation probability calculation
Jasha Droppo, Michael L. Seltzer, Alex Acero, Yu-Hsiang Bosco Chiu
A shrinkage estimator for speech recognition with full covariance HMMs
Peter Bell, Simon King
Covariance updates for discriminative training by constrained line search
Peter Bell, Simon King
Min-max discriminative training of decoding parameters using iterative linear programming
Brian Mak, Tom Ko
Discriminative training for complementariness in system combination
Daniel Willett, Chuang He
Penalty function maximization for large margin HMM training
George Saon, Daniel Povey
Implicit state-tying for support vector machines based speech recognition
Daniel Bolaños, Wayne Ward
Using KL-based acoustic models in a large vocabulary recognition task
Guillermo Aradilla, Hervé Bourlard, Mathew Magimai Doss
Acoustic modeling based on model structure annealing for speech recognition
Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Bayesian context clustering using cross valid prior distribution for HMM-based speech recognition
Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Speech recognition using soft decision trees
Jitendra Ajmera, Masami Akamine
GPU-accelerated Gaussian clustering for fMPE discriminative training
Yu Shi, Frank Seide, Frank K. Soong
Discriminative training using the trusted expectation maximization
Yasser Hifny, Yuqing Gao
Maximum mutual information estimation with unlabeled data for phonetic classification
Jui-Ting Huang, Mark Hasegawa-Johnson
Maximum accept and reject (MARS) training of HMM-GMM speech recognition systems
Vivek Tyagi
Nonlinear mixture autoregressive hidden Markov models for speech recognition
Sundar Srinivasan, Tao Ma, Daniel May, Georgios Lazarou, Joseph Picone
GPU accelerated acoustic likelihood computations
Patrick Cardinal, Pierre Dumouchel, Gilles Boulianne, Michel Comeau
Nonnative speech recognition based on state-candidate bilingual model modification
Qingqing Zhang, Ta Li, Jielin Pan, Yonghong Yan
Prosodic and spectral features within segment-based acoustic modeling
Björn Schuller, Xiaohua Zhang, Gerhard Rigoll
Unsupervised versus supervised training of acoustic models
Jeff Ma, Richard Schwartz
A comparison of broad phonetic and acoustic units for noise robust segment-based phonetic recognition
Tara N. Sainath, Victor Zue
Aggregated cross-validation and its efficient application to Gaussian mixture optimization
Takahiro Shinozaki, Sadaoki Furui, Tatsuya Kawahara
A minimum classification error based distance measure for template based speech recognition
Mike Matton, Dirk Van Compernolle, Ronald Cools
A penalized logistic regression approach to detection based phone classification
Sabato Marco Siniscalchi, Torbjørn Svendsen, Chin-Hui Lee
Incorporating acoustical modelling of phone transitions in an hybrid ANN/HMM speech recognizer
Alberto Abad, João Neto
Flexible discriminative training based on equal error group scores obtained from an error-indexed forward-backward algorithm
Erik McDermott, Atsushi Nakamura
Pitch adaptive features for LVCSR
Giulia Garau, Steve Renals
Using syllable nuclei locations to improve automatic speech recognition in the presence of burst noise
Chris D. Bartels, Jeff A. Bilmes
Effects of allophones on the performance of Korean speech recognition
Hyejin Hong, Sunhee Kim, Minhwa Chung
Combining evidence from a generative and a discriminative model in phoneme recognition
Joel Pinto, Hynek Hermansky
Fragmented context-dependent syllable acoustic models
K. Thambiratnam, Frank Seide
Speech recognition using non-linear trajectories in a formant-based articulatory layer of a multiple-level segmental HMM
Hongwei Hu, Martin J. Russell
Recent improvements of the RWTH GALE Mandarin LVCSR system
Ch. Plahl, Björn Hoffmeister, M.-Y. Hwang, D. Lu, Georg Heigold, Jonas Loof, Ralf Schlüter, Hermann Ney
Using prosody for the improvement of ASR - sentence modality recognition
Klára Vicsi, György Szaszák
Experiments with the ABI (Accents of the British Isles) speech corpus
Shona D'Arcy, Martin J. Russell
Politecnico di Torino system for the 2007 NIST language recognition evaluation
Fabio Castaldo, Emanuele Dalmasso, Pietro Laface, Daniele Colibro, Claudio Vair
Discriminative training and channel compensation for acoustic language recognition
Valiantsina Hubeika, Lukáš Burget, Pavel Matějka, Petr Schwarz
Comparison of variable selection methods and classifiers for native accent identification
Tingyao Wu, Peter Karsmakers, Hugo Van hamme, Dirk Van Compernolle
A comparison of subspace feature-domain methods for language recognition
W. M. Campbell, Douglas E. Sturim, Pedro A. Torres-Carrasquillo, Douglas A. Reynolds
Context-dependent phone models and models adaptation for phonotactic language recognition
Mohamed Faouzi BenZeghiba, Jean-Luc Gauvain, Lori Lamel
The English pronunciation of successive groups of Maori speakers
Catherine I. Watson, Margaret Maclagan, Jeanette King, Ray Harlow
Reversal of short front vowel raising in Australian English
Felicity Cox, Sallyanne Palethorpe
GOOSE on the move: a study of /u/-fronting in Australian news speech
Jennifer Price
The vowels of Australian Aboriginal English
Andrew Butcher, Victoria Anderson
Perception and production of /i:/, /i@/ and /e:/ in Australian English
Robert H. Mannell
An expert system in speaker verification task
Zbyněk Zajíc, Lukáš Machlica, Aleš Padrta, Jan Vaněk, Vlasta Radová
Cascading appearance-based features for visual speaker verification
David Dean, Sridha Sridharan, Patrick Lucey
Improved novelty detection for online GMM based speaker diarization
Konstantin Markov, Satoshi Nakamura
Analysis of impostor tests with high scores in NIST-SRE context
Salah Eddine Mezaache, Jean-François Bonastre, Driss Matrouf
Reinforced temporal structure information for embedded utterance-based speaker recognition
Anthony Larcher, Jean-François Bonastre, John S. D. Mason
Fast search for common segments in speech signals for speaker verification
Michael Gerber, Beat Pfister
Audio-visual multilevel fusion for speech and speaker recognition
Girija Chetty, Michael Wagner
Clustering initialization based on spatial information for speaker diarization of meetings
J. Luque, Carlos Segura, Javier Hernando
Recognizing and modelling regional varieties of Swedish
Jonas Beskow, Gösta Bruce, Laura Enflo, Björn Granström, Susanne Schötz
Vowel duration, compression and lengthening in stressed syllables in central and southern varieties of standard Italian
John Hajek, Mary Stevens
Acoustic cues for the perception of intonation in Cantonese
Joan K.-Y. Ma, Valter Ciocca, Tara L. Whitehill
Perception of dialectal prosody
Adrian Leemann, Beat Siebenhaar
Does the McGurk effect rely on processing time constraints?
Christian Kroos, Ashlie Dreves
Exploring the Uncanny Valley Effect with talking heads
Takaaki Kuratate, Kathryn Ayers, Jeesun Kim, Denis Burnham
How do the elderly talk to a natural language call routing system?
Knut Kvale, Ragnhild Halvorsrud
Analysis of relationship between impression of human-to-human conversations and prosodic change and its modeling
Ryota Nishimura, Norihide Kitaoka, Seiichi Nakagawa
Utterance-level normalization for relative articulation rate analysis
Tuomo Saarni, Jussi Hakokari, Jouni Isoaho, Tapio Salakoski
Syntactic complexity induces explicit grounding in the Maptask corpus
Martin Tietze, Vera Demberg, Johanna D. Moore
Do discourse cues facilitate recall in information presentation messages?
Andi Winterboer, Johanna D. Moore, Fernanda Ferreira
Structured heterogeneity of English stress variants
Noriko Hattori
A method for automatically estimating F0 model parameters and a speech re-synthesis tool using F0 model and STRAIGHT
Shota Sato, Taro Kimura, Yasuo Horiuchi, Masafumi Nishida, Shingo Kuroiwa, Akira Ichikawa
Noise driven short-time phase spectrum compensation procedure for speech enhancement
Anthony P. Stark, Kamil K. Wojcicki, James G. Lyons, Kuldip K. Paliwal
A phase-averaged model for the relationship between noisy speech, clean speech and noise in the log-mel domain
Friedrich Faubel, John McDonough, Dietrich Klakow
Time and frequency dependent amplification for speech intelligibility enhancement in noisy environments
Henk Brouckxon, Werner Verhelst, Bart De Schuymer
A wavelet based speech enhancement method using noise classification and shaping
Mahdi Mohammadi, Behzad Zamani, Babak Nasersharif, Mohsen Rahmani, Ahmad Akbari
Speech enhancement based on novel two-step a priori SNR estimators
Md. Jahangir Alam, Douglas O'Shaughnessy, Sid-Ahmed Selouani
A speech enhancement approach using piecewise linear approximation of an explicit model of environmental distortions
Jun Du, Qiang Huo
Articulatory control of HMM-based parametric speech synthesis driven by phonetic knowledge
Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi, Ren-Hua Wang
Minimum generation error training with direct log spectral distortion on LSPs for HMM-based speech synthesis
Yi-Jian Wu, Keiichi Tokuda
Robustness of HMM-based speech synthesis
Junichi Yamagishi, Zhen-Hua Ling, Simon King
Improving preselection in unit selection synthesis
Alistair Conkie, Ann Syrdal, Yeon-Jun Kim, Mark Beutnagel
Efficient join cost computation for unit selection based TTS systems
Feng Ding, Jani Nurminen, Jilei Tian
A phonetic assessment of cross-language voice conversion
Kayoko Yanagisawa, Mark Huckvale
Synthesis by generation and concatenation of multiform segments
Vincent Pollet, Andrew Breen
Glottal spectral separation for parametric speech synthesis
João P. Cabral, Steve Renals, Korin Richmond, Junichi Yamagishi
Improving speech systems built from very little data
John Kominek, Sameer Badaskar, Tanja Schultz, Alan W. Black
Structure to speech conversion - speech generation based on infant-like vocal imitation
Daisuke Saito, Satoshi Asakawa, Nobuaki Minematsu, Keikichi Hirose
Statistical text-to-speech synthesis with improved dynamics
Stas Tiomkin, David Malah
An evaluation of non-standard features for grapheme-to-phoneme conversion
Gabriel Webster, Norbert Braunschweiler
Towards flexible speech coding for speech synthesis: an LF + modulated noise vocoder
Yannis Agiomyrgiannakis, Olivier Rosec
Evaluation of Finnish unit selection and HMM-based speech synthesis
Hanna Silen, Elina Helander, Jani Nurminen, Moncef Gabbouj
A probabilistic trajectory synthesis system for synthesising visual speech
Barry-John Theobald, Nicholas Wilkinson
Paralinguistic elements in speech synthesis
Didier Cadic, Lionel Segalen
Building sleek synthesizers for multi-lingual screen reader
E Veera Raghavendra, B. Yegnanarayana, Alan W. Black, Kishore Prahallad
Unsupervised adaptation for HMM-based speech synthesis
Simon King, Keiichi Tokuda, Heiga Zen, Junichi Yamagishi
Investigating Festival's target cost function using perceptual experiments
Volker Strom, Simon King
Modeling Austrian dialect varieties for TTS
Friedrich Neubarth, Michael Pucher, Christian Kranzler
HMM-based Finnish text-to-speech system utilizing glottal inverse filtering
Tuomo Raitio, Antti Suni, Hannu Pulakka, Martti Vainio, Paavo Alku
LTS using decision forest of regression trees and neural networks
Tanuja Sarkar, Sachin Joshi, Sathish Chandra Pammi, Kishore Prahallad
Automatic word stress marking and syllabification for Catalan TTS
Silvia Rustullet, Daniela Braga, João Nogueira, Miguel Sales Dias
Abandoning emotion classes - towards continuous emotion recognition with modelling of long-range dependencies
Martin Wöllmer, Florian Eyben, Stephan Reiter, Björn Schuller, Cate Cox, Ellen Douglas-Cowie, Roddy Cowie
Patterns, prototypes, performance: classifying emotional user states
Dino Seppi, Anton Batliner, Björn Schuller, Stefan Steidl, Thurid Vogt, Johannes Wagner, Laurence Devillers, Laurence Vidrascu, Noam Amir, Vered Aharonson
Recognition of stress in speech using wavelet analysis and Teager energy operator
Ling He, Margaret Lech, Sheeraz Memon, Nicholas Allen
Effects of vocal effort and speaking style on text-independent speaker verification
Elizabeth Shriberg, Martin Graciarena, Harry Bratt, Andreas Kathol, Sachin S. Kajarekar, Huda Jameel, Colleen Richey, Fred Goodman
Robustness of prosodic features to voice imitation
Mireia Farrús, Michael Wagner, Jan Anguita, Javier Hernando
Phonetic and speaker variations in automatic emotion classification
Vidhyasaharan Sethu, Eliathamby Ambikairajah, Julien Epps
Infants' native and nonnative tone perception
Karen Mattock
Language experience dependent plasticity for pitch representation in the human brainstem
Ananthanarayan Krishnan, Jackson Gandour, Jayaganesh Swaminathan
Development of tone perception and tone production in Cantonese-learning children aged 2 to 5 years
Valter Ciocca, Vivian W.-K. Ip
Tone hyperarticulation in Cantonese infant-directed speech
Nan Xu, Denis Burnham
Influences on tone in Sepedi, a southern Bantu language
Sabine Zerbian, Etienne Barnard
An acoustic-phonetic comparative analysis of Osaka and Kagoshima Japanese tonal phenomena
Shunichi Ishihara
Modulation spectrogram features for improved speaker diarization
Oriol Vinyals, Gerald Friedland
Spectro-temporal features for robust far-field speaker identification
Tiago H. Falk, Wai-Yip Chan
Long-term spectro-temporal information for improved automatic speech emotion classification
Siqing Wu, Tiago H. Falk, Wai-Yip Chan
A comparative study on AM and FM features
Yotaro Kubo, Shigeki Okawa, Akira Kurematsu, Katsuhiko Shirai
Dimensionality reduction of modulation frequency features for speech discrimination
Maria Markaki, Yannis Stylianou
Spectral envelope recovery beyond the Nyquist limit for high-quality manipulation of speech sounds
Hideki Kawahara, Masanori Morise, Hideki Banno, Toru Takahashi, Ryuichi Nisimura, Toshio Irino
Adaptive-order fractional Fourier transform features for speech recognition
Hui Yin, Xiang Xie, Jingming Kuang
Robust front end processing for speech recognition in reverberant environments: utilization of speech characteristics
Rico Petrick, Xugang Lu, Masashi Unoki, Masato Akagi, Rüdiger Hoffmann
Introducing temporal asymmetries in feature extraction for automatic speech recognition
G. S. V. S. Sivaram, Hynek Hermansky
A closer look on hierarchical spectro-temporal features (HIST)
Martin Heckmann, Xavier Domont, Frank Joublin, Christian Goerick
Multi-stream spectro-temporal features for robust speech recognition
Sherry Y. Zhao, Nelson Morgan
The value of auditory offset adaptation and appropriate acoustic modeling
Huan Wang, David Gelbart, Hans-Günter Hirsch, Werner Hemmert
Optimization and evaluation of Gabor feature sets for ASR
Bernd T. Meyer, Birger Kollmeier
High-quality analysis/synthesis method based on temporal decomposition for speech modification
Binh Phu Nguyen, Takeshi Shibata, Masato Akagi
Improved frame loss recovery using closed-loop estimation of very low bit rate side information
Philippe Gournay
Predictability of STRFs in auditory cortex neurons depends on stimulus class
Max F. K. Happel, Simon Müller, Jörn Anemüller, Frank W. Ohl
Higher layer coding of non-speech like signals using factorial pulse codebook
Udar Mittal, James P. Ashley, Jonathan Gibbs
Spectral noise shaping: improvements in speech/audio codec based on linear prediction in spectral domain
Sriram Ganapathy, Petr Motlicek, Hynek Hermansky, Harinath Garudadri
Introducing the compression wave cochlear amplifier
Matthew R. Flax, W. Harvey Holmes
Goldman-Hodgkin-Katz cochlear hair cell models - a foundation for nonlinear cochlear mechanics
Matthew R. Flax, W. Harvey Holmes
A 8.32 kb/s embedded wideband speech coding candidate for ITU-T EV-VBR standardization
Changchun Bao, Hai-ting Li, Ze-xin Liu, Rui Fan, Heng Zhu, Mao-shen Jia, Rui Li
Decision tree based frame mode selection for AMR-WB+
Jong Kyu Kim, Seung Seop Park, Chang Woo Han, Nam Soo Kim
Assessment of objective quality measures for speech intelligibility
W. M. Liu, K. A. Jellyman, N. W. D. Evans, John S. D. Mason
Assessment of the speech-quality dimension "noisiness" for the instrumental estimation and analysis of telephone-band speech quality
Kirstin Scholz, Christine Kühnel, Marcel Wältermann, Sebastian Möller, Ulrich Heute
Intelligibility evaluation of Ramsey-derived interleavers for internet voice streaming with the iLBC codec
Angel M. Gomez, Jose L. Carmona, Antonio M. Peinado, Victoria Sanchez, Jose A. Gonzalez
Language identification on code-switching utterances using multiple cues
Dau-Cheng Lyu, Ren-Yuan Lyu
Target-oriented phone selection from universal phone set for spoken language recognition
Rong Tong, Bin Ma, Haizhou Li, Eng Siong Chng
The MITLL NIST LRE 2007 language recognition system
Pedro A. Torres-Carrasquillo, Elliot Singer, W. M. Campbell, Terry Gleason, Alan McCree, Douglas A. Reynolds, Fred Richardson, Wade Shen, Douglas E. Sturim
Eigen-channel compensation and discriminatively trained Gaussian mixture models for dialect and accent recognition
Pedro A. Torres-Carrasquillo, Douglas E. Sturim, Douglas A. Reynolds, Alan McCree
Anchor-model fusion for language recognition
Ignacio Lopez-Moreno, Daniel Ramos, Joaquin Gonzalez-Rodriguez, Doroteo T. Toledano
Introducing a FM based feature to hierarchical language identification
Bo Yin, Tharmarajah Thiruvaran, Eliathamby Ambikairajah, Fang Chen
Dialect classification via discriminative training
Yun Lei, John H. L. Hansen
BUT language recognition system for NIST 2007 evaluations
Pavel Matějka, Lukáš Burget, Ondřej Glembek, Petr Schwarz, Valiantsina Hubeika, Michal Fapšo, Tomáš Mikolov, Oldřich Plchot, Jan Černocký
Advances in phonotactic language recognition
Ondřej Glembek, Pavel Matějka, Lukáš Burget, Tomáš Mikolov
Dialect separation assessment using log-likelihood score distributions
Mahnoosh Mehrabani, John H. L. Hansen
Study on unique pharyngeal and uvular consonants in foreign accented Arabic
Yousef A. Alotaibi, Khondaker Abdullah-Al-Mamun, Ghulam Muhammad
Automatic accent classification using ensemble methods
Fukun Bi, Jian Yang, Dan Xu
Foreign accent identification based on prosodic parameters
Marina Piat, Dominique Fohr, Irina Illina
Dialect recognition using adapted phonetic models
Wade Shen, Nancy Chen, Douglas A. Reynolds
Beyond frame independence: parametric modelling of time duration in speaker and language recognition
Alan McCree, Fred Richardson, Elliot Singer, Douglas A. Reynolds
Testing a large corpus of natural standard Arabic for rhythm class
Liz Dockendorf, Dalal Almubayei, Matthew Benton
A comparison of two acoustic measurement approaches to the rhythm continuum of natural Chinese and English speech
Matthew Benton, Liz Dockendorf
A study of pitch patterns of Japanese English analyzed via comparative linguistic features of English and Japanese
Tomoko Nariai, Kazuyo Tanaka
A corpus-based prosodic study of Alsatian, Belgian and Swiss French
Cécile Woehrling, Philippe Boula de Mareüil, Martine Adda-Decker, Lori Lamel
Prosodic position effects and function words in English: a pilot study
Mitsuhiro Nakamura
How useful are polynomials for analyzing intonation?
Laura E. de Ruiter
Adaptive filter based prosody modification approach
Qingcai Chen, Shusen Zhou, Dandan Wang, Xiaohong Yang
Speech/laughter classification in meeting audio
Swe Zin Kalayar Khine, Tin Lay Nwe, Haizhou Li
Getting the last laugh: automatic laughter segmentation in meetings
Mary Tai Knox, Nelson Morgan, Nikki Mirghafori
The influence of audio presentation style on multitasking during teleconferences
Stuart N. Wrigley, Simon Tucker, Guy J. Brown, Steve Whittaker
Balancing spoken content adaptation and unit length in the recognition of emotion and interest
Bogdan Vlasenko, Björn Schuller, Kinfe Tadesse Mengistu, Gerhard Rigoll, Andreas Wendemuth
Nonverbal responses to social inclusion and exclusion
Emiel Krahmer, Juliette Schaafsma, Marc Swerts, Ad Vingerhoets
Acoustic analysis of imitated voice produced by a professional impersonator
Tatsuya Kitamura
Detection of speech under physical stress: model development, sensor selection, and feature fusion
Sanjay A. Patil, John H. L. Hansen
Improving Japanese language models using POS information
Langzhou Chen, Hisayoshi Nagae, Matt Stuttle
Discriminative n-gram language modeling for Turkish
Ebru Arısoy, Brian Roark, Izhak Shafran, Murat Saraçlar
Rich morphology based n-gram language models for Arabic
Ahmad Emami, Imed Zitouni, Lidia Mangu
Unsupervised language model adaptation based on topic and role information in multiparty meetings
Songfang Huang, Steve Renals
Context dependent language model adaptation
X. Liu, M. J. F. Gales, P. C. Woodland
Iterative language model estimation: efficient data structure & algorithms
Bo-June Hsu, James Glass
Evaluating spoken language model based on filler prediction model in speech recognition
Kengo Ohta, Masatoshi Tsuchiya, Seiichi Nakagawa
Transcription-less call routing using unsupervised language model adaptation
Nicolae Duta
Large margin multinomial mixture model for text categorization
Zhen-Yu Pan, Hui Jiang
Language modeling for speech recognition of spoken Cantonese
Yu Ting Yeung, Houwei Cao, N. H. Zheng, Tan Lee, P. C. Ching
Discriminative rescoring based on minimization of word errors for transcribing broadcast news
Akio Kobayashi, Takahiro Oku, Shinichi Homma, Shoei Sato, Toru Imai, Tohru Takagi
Search and classification based language model adaptation
Qin Shi, Stephen M. Chu, Wen Liu, Hong-Kwang Jeff Kuo, Yi Liu, Yong Qin
Fast n-gram language model look-ahead for decoders with static pronunciation prefix trees
Marijn Huijbregts, Roeland Ordelman, Franciska de Jong
Thai named-entity recognition using class-based language modeling on multiple-sized subword units
Kwanchiva Saykhum, Vataya Boonpiam, Nattanun Thatphithakkul, Chai Wutiwiwatchai, Cholwich Natthee
Combining statistical and syntactical systems for spoken language understanding with graphical models
S. Schwarzler, J. Geiger, J. Schenk, M. Al-Hames, B. Hornler, Günther Ruske, Gerhard Rigoll
Bag-of-word normalized n-gram models
Abhinav Sethy, Bhuvana Ramabhadran
A study of unsupervised clustering techniques for language modeling
Sangyun Hahn, Abhinav Sethy, Hong-Kwang Jeff Kuo, Bhuvana Ramabhadran
Automatic estimation of language model parameters for unseen words using morpho-syntactic contextual information
Ciro Martins, António Teixeira, João Neto
Modeling the effects on time-into-utterance on word probabilities
Nigel G. Ward, Alejandro Vega
Inductive and example-based learning for text classification
Ye-Yi Wang, Xiao Li, Alex Acero
Comparing word, character, and phoneme n-grams for subjective utterance recognition
Theresa Wilson, Stephan Raaijmakers
IRSTLM: an open source toolkit for handling large scale language models
Marcello Federico, Nicola Bertoldi, Mauro Cettolo
Phone-based cepstral polynomial SVM system for speaker recognition
Sachin S. Kajarekar
Using MAP estimation of feature transformation for speaker recognition
Donglai Zhu, Bin Ma, Haizhou Li
Factor analysis subspace estimation for speaker verification with short utterances
Robbie Vogt, Brendan Baker, Sridha Sridharan
Combining continuous progressive model adaptation and factor analysis for speaker verification
Mitchell McLaren, Driss Matrouf, Robbie Vogt, Jean-François Bonastre
Adaptive decision tree-based phone cluster models for speaker clustering
Chia-Hsin Hsieh, Chung-Hsien Wu, Han-Ping Shen
Speaker recognition in two-wire test sessions
Hagai Aronowitz, Yosef A. Solewicz
The effect of position on the realization of second occurrence focus
Jason B. Bishop
Effects of intonational phrase boundaries on pitch-accented syllables in American English
Yen-Liang Shue, Stefanie Shattuck-Hufnagel, Markus Iseli, Sun-Ah Jun, Nanette Veilleux, Abeer Alwan
Examining pitch-accent variability from an exemplar-theoretic perspective
Michael Walsh, Katrin Schweitzer, Bernd Möbius, Hinrich Schütze
Correlation of utterance length and segmental duration in Finnish is questionable
Jussi Hakokari, Tuomo Saarni, Jouni Isoaho, Tapio Salakoski
Different roles of pitch and duration in distinguishing word stress in English
Jiahong Yuan, Stephen Isard, Mark Liberman
Cross-dialect Irish prosody: linguistic constraints on Fujisaki modelling
Maria O'Reilly, Ailbhe Ní Chasaide, Christer Gobl
CENSREC-4: development of evaluation framework for distant-talking speech recognition under reverberant environments
Masato Nakayama, Takanobu Nishiura, Yuki Denda, Norihide Kitaoka, Kazumasa Yamamoto, Takeshi Yamada, Satoru Tsuge, Chiyomi Miyajima, Masakiyo Fujimoto, Tetsuya Takiguchi, Satoshi Tamura, Tetsuji Ogawa, Shigeki Matsuda, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura
In-car speech recognition using model-based Wiener filter and multi-condition training
Masanori Tsujikawa, Takayuki Arakawa, Ryosuke Isotani
Adaptive beamforming and soft missing data decoding for robust speech recognition in reverberant environments
Marco Kühne, Roberto Togneri, Sven Nordholm
Spectral subtraction in likelihood-maximizing framework for robust speech recognition
Bagher BabaAli, Hossein Sameti, Mehran Safayani
Front-end for far-field speech recognition based on frequency domain linear prediction
Sriram Ganapathy, Samuel Thomas, Hynek Hermansky
Mask estimation incorporating time-frequency trajectories for a CASA-based ASR front-end
Ji Hun Park, Jae Sam Yoon, Hong Kook Kim
Soft missing-feature mask generation for simultaneous speech recognition system in robots
Toru Takahashi, Shun'ichi Yamamoto, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
A posterior approach for microphone array based speech recognition
Dong Wang, Ivan Himawan, Joe Frankel, Simon King
Analysis of physiologically-motivated signal processing for robust speech recognition
Yu-Hsiang Bosco Chiu, Richard M. Stern
Evaluation of modulation spectrum equalization techniques for large vocabulary robust speech recognition
Liang-che Sun, Chang-wen Hsu, Lin-shan Lee
Confusion-based entropy-weighted decoding for robust speech recognition
Yi Chen, Chia-yu Wan, Lin-shan Lee
Cepstral domain voice activity detection for improved noise modeling in MMSE feature enhancement for ASR
Svein Gunnar Pettersen, Magne Hallstein Johnsen
Unsupervised re-scoring of observation probability based on maximum entropy criterion by using confidence measure with telephone speech
Carlos Molina, Nestor Becerra Yoma, Fernando Huenupan, Claudio Garreton
Within-class feature normalization for robust speech recognition
Yuan-Fu Liao, Chi-Hui Hsu, Chi-Min Yang, Jeng-Shien Lin, Sen-Chia Chang
A posteriori SNR weighted energy based variable frame rate analysis for speech recognition
Zheng-Hua Tan, Børge Lindberg
Silence feature normalization for robust speech recognition in additive noise environments
Chieh-cheng Wang, Chi-an Pan, Jeih-weih Hung
Blind dereverberation based on CMN and spectral subtraction by multi-channel LMS algorithm
L. Wang, Seiichi Nakagawa, Norihide Kitaoka
Eigen-MLLR environment/speaker compensation for robust speech recognition
Yuan-Fu Liao, Hung-Hsiang Fang, Chi-Hui Hsu
Parameter clustering and sharing in variable-parameter HMMs for noise robust speech recognition
Dong Yu, Li Deng, Yifan Gong, Alex Acero
A feature compensation approach using high-order vector Taylor series approximation of an explicit distortion model for noisy speech recognition
Jun Du, Qiang Huo
N-best based stochastic mapping on stereo HMM for noise robust speech recognition
Xiaodong Cui, Mohamed Afify, Yuqing Gao
Improving the ensemble speaker and speaking environment modeling approach by enhancing the precision of the online estimation process
Yu Tsao, Chin-Hui Lee
Combining noise compensation and missing-feature decoding for large vocabulary speech recognition in noise
Jianhua Lu, Ji Ming, Roger Woods
Joint Bayesian predictive classification and parallel model combination with prior scaling for robust ASR
Svein Gunnar Pettersen
Environment mismatch compensation using average eigenspace for speech recognition
Abhishek Kumar, John H. L. Hansen
Monte Carlo model-space noise adaptation for speech recognition
Daniel Povey, Brian Kingsbury
A 'speechiness' measure to improve speech decoding in the presence of other sound sources
Ning Ma, Phil Green
Feature vector normalization with combined standard and throat microphones for robust ASR
Luis Buera, Antonio Miguel, Oscar Saz, Alfonso Ortega, Eduardo Lleida
Phone-duration-dependent long-term dynamic features for a stochastic model-based voice activity detection
Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
An on-line adaptation technique for emotional speech recognition using style estimation with multiple-regression HMM
Yusuke Ijima, Makoto Tachibana, Takashi Nose, Takao Kobayashi
HMM adaptation using statistical linear approximation for robust automatic speech recognition
Michael Berkovitch, Ilan D. Shallom
Beyond linear transforms: efficient non-linear dynamic adaptation for noise robust speech recognition
Steven J. Rennie, Pierre L. Dognin
Rapid unsupervised speaker adaptation robust in reverberant environment conditions
Randy Gomez, Jani Even, Kiyohiro Shikano
On a generalization of margin-based discriminative training to robust speech recognition
Jinyu Li, Chin-Hui Lee
Discriminative classifiers with generative kernels for noise robust ASR
M. J. F. Gales, C. Longworth
Covariance modelling for noise-robust speech recognition
R. C. van Dalen, M. J. F. Gales
Exploiting spatial-temporal feature distribution characteristics for robust speech recognition
Wei-Hau Chen, Shih-Hsiang Lin, Berlin Chen
Study of integration of statistical model-based voice activity detection and noise suppression
Masakiyo Fujimoto, Kentaro Ishizuka, Tomohiro Nakatani
Neural network based regression for robust overlapping speech recognition using microphone arrays
Weifeng Li, John Dines, Mathew Magimai Doss, Hervé Bourlard
Amplitude and amplitude variation of emotional speech
Hartmut R. Pfitzinger, Christian Kaernbach
Babble speech: acoustic and perceptual variability
Nitish Krishnamurthy, Ayako Ikeno, John H. L. Hansen
On the properties of a time-varying quasi-harmonic model of speech
Yannis Pantazis, Olivier Rosec, Yannis Stylianou
Extraction and tracking of formant response jitter in the cochlea for objective prediction of SB/SF DAM attributes
Wenliang Lu, D. Sen
Consonant discrimination of degraded speech using an efferent-inspired closed-loop cochlear model
David P. Messing, Lorraine Delhorne, Ed Bruckert, Louis D. Braida, Oded Ghitza
On the development of variable length Teager energy operator (VTEO)
Vikrant Tomar, Hemant A. Patil
Metric learning for unsupervised phoneme segmentation
Yu Qiao, Nobuaki Minematsu
Combining task-dependent information with auditory attention cues for prominence detection in speech
Ozlem Kalinli, Shrikanth S. Narayanan
Probabilistic feature mapping based on trajectory HMMs
Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda
Simultaneous conversion of duration and spectrum based on statistical models including time-sequence matching
Kaori Yutani, Yosuke Uto, Yoshihiko Nankaku, Tomoki Toda, Keiichi Tokuda
Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory
Takashi Muramatsu, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano
An improved one-to-many eigenvoice conversion system
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano
Study on manipulation method of voice quality based on the vocal tract area function
Yoshinori Uchimura, Hideki Banno, Fumitada Itakura, Hideki Kawahara
Incorporating durational modification in voice transformation
Arthur Toth, Alan W. Black
Non-segmental duration feature extraction for prosodic classification
Amy Dashiell, Brian Hutchinson, Anna Margolis, Mari Ostendorf
An ERP study on categorical perception of lexical tones and nonspeech pitches
Hong-Ying Zheng, William S.-Y. Wang
The role of Japanese pitch accent in spoken-word recognition: evidence from middle-aged accentless dialect listeners
Takashi Otake, Marii Higuchi
Mandarin Chinese tone nucleus detection with landmarks
Siwei Wang, Gina-Anne Levow
A comparative study on dissyllabic stress patterns of Mandarin and Cantonese
Weixiang Hu, Jin Jian, Aijun Li, Xia Wang
Three-sectional-staff characterization of Cantonese level tones
Rerrario Shui-Ching Ho, Yoshinori Sagisaka
A seven-tone dialect in southern China with falling-rising-falling contour: a linguistic acoustic analysis
Xiaonong Zhu, Caicai Zhang
Pitch target analysis of Thai tones using quantitative target approximation model and unsupervised clustering
Santitham Prom-on
Do English speakers assimilate Mandarin tones to English prosodic categories?
Connie K. So, Catherine T. Best
Evidence of a near-merger in Western Sydney Australian English vowels
Rikke L. Bundgaard-Nielsen, Catherine T. Best, Michael D. Tyler, Christian Kroos
Central vowels in Arrernte: metrical prominence and pitch accent
Marija Tabain, Kristine Rickard, Gavan Breen, Veronica Dobson
Pausing and phrase length in two Australian languages
Bella Ross
Positional effects on the characterization of ejectives in Waima'a
Mary Stevens, John Hajek
A Niuean variant of New Zealand English?
Donna Starks, Laura Thompson, Catherine I. Watson
Phonetic confusion analysis and robust phone set generation for Shanghai-accented Mandarin speech recognition
Guo-Hong Ding
Prosody for Mandarin speech recognition: a comparative study of read and spontaneous speech
Yu Ting Yeung, Yao Qian, Tan Lee, Frank K. Soong
Improved large vocabulary Mandarin speech recognition by selectively using tone information with a two-stage prosodic model
Li-Wei Cheng, Lin-shan Lee
Mandarin connected digits recognition for whispered speech
Tingting Ru, Xiang Xie, Hui Yin, Jingming Kuang
Recognizing named entities in spoken Chinese dialogues with a character-level maximum entropy tagger
Changchun Bao, Weiqun Xu, Yonghong Yan
A novel approach in continuous speech recognition for Vietnamese, an isolating tonal language
Hong Quang Nguyen, Pascal Nocera, Eric Castelli, Van Loan Trinh
Evaluating semantic-level confidence scores with multiple hypotheses
B. Thomson, K. Yu, M. Gašić, S. Keizer, F. Mairesse, J. Schatzmann, Steve Young
Structured models for joint decoding of repeated utterances
Geoffrey Zweig, Dan Bohus, Xiao Li, Patrick Nguyen
A Bayesian approach to semantic composition for spoken language interpretation
Marie-Jean Meurs, Fabrice Lefevre, Renato De Mori
Accommodating explicit user expressions of uncertainty in voice search or something like that
Tim Paek, Yun-Cheng Ju
Effects of user modeling on POMDP-based dialogue systems
Dongho Kim, Hyeong Seop Sim, Kee-Eung Kim, Jin Hyung Kim, Hyunjeong Kim, Joo Won Sung
The best of both worlds: unifying conventional dialog systems and POMDPs
Jason D. Williams
The assimilation of L2 Australian English vowels to L1 Japanese vowel categories: vocabulary size matters
Rikke L. Bundgaard-Nielsen, Catherine T. Best, Michael D. Tyler
Vowel epenthesis, acoustics and phonology patterns in Moroccan Arabic
Azra N. Ali, Mohamed Lahrouchi, Michael Ingleby
Estimation of vocal tract area function for Mandarin vowel sequences using MRI
Gaowu Wang, Jianwu Dang, Jiangping Kong
The effect of first language (L1) dialects on the identification of Vietnamese word-final stops
Kimiko Tsukada, Thu T. A. Nguyen
Perceptual evidence of modern Greek voiced stops as phonological categories
Mark Antoniou, Catherine T. Best, Michael D. Tyler
The effect of auditory and visual degradation on audiovisual perception of native and non-native speakers
Valerie Hazan, Enid Li
Quantitative prosodic analysis of spontaneous speech
Hansjörg Mixdorff
The effect of cognitive load on disfluencies during in-vehicle spoken dialogue
Anders Lindström, Jessica Villing, Staffan Larsson, Alexander Seward, Nina Åberg, Cecilia Holtelius
Discourse prosody context - global F0 and tempo modulations
Chiu-yu Tseng, Zhao-yu Su
A method for automatic and dynamic estimation of discourse genre typology with prosodic features
Nicolas Obin, Anne Lacheret-Dujour, Christophe Veaux, Xavier Rodet, Anne-Catherine Simon
The meanings carried by interjections in spontaneous speech
Carlos Toshinori Ishi, Hiroshi Ishiguro, Norihiro Hagita
Speech interaction with an emotional robotic dog
Christian M. Jones, Andrew Deeming
Control of prosodic focus in corpus-based generation of fundamental frequency based on the generation process model
Keiko Ochi, Keikichi Hirose, Nobuaki Minematsu
Analysis and perception of speech under physical task stress
Keith W. Godin, John H. L. Hansen
An analysis of multimodal cues of interruption in dyadic spoken interactions
Chi-Chun Lee, Sungbok Lee, Shrikanth S. Narayanan
Paralinguistic effects on turn-taking behavior in expressive conversation
Hiroki Mori, Hideki Kasuya
Study on "ng, a" type of discourse markers in standard Chinese
Zhigang Yin, Aijun Li, Ziyu Xiong
How can you use disfluencies and still sound as a good speaker?
Helena Moniz, Ana Isabel Mata, Isabel Trancoso, M. Ceu Viana
What makes a good speaker? subject ratings, acoustic measurements and perceptual evaluations
Eva Strangert, Joakim Gustafson
Towards measuring continuous acoustic feature convergence in unconstrained spoken dialogues
Spyros Kousidis, David Dorran, Yi Wang, Brian Vaughan, Charlie Cullen, Dermot Campbell, Ciaran McDonnell, Eugene Coyle
Detection of feeling through back-channels in spoken dialogue
Tatsuya Kawahara, Masayoshi Toyokura, Teruhisa Misu, Chiori Hori
Discriminative training of narrow band - wide band adapted systems for meeting recognition
Martin Karafiát, Lukáš Burget, Thomas Hain, Jan Černocký
A fast speaker adaptation method using aspect model
Seongjun Hahm, Akinori Ito, Shozo Makino, Motoyuki Suzuki
Probabilistic latent speaker training for large vocabulary speech recognition
Dan Su, Xihong Wu, Huisheng Chi
Improvement of eigenvoice-based speaker adaptation by parameter space clustering
Shutaro Tanji, Koichi Shinoda, Sadaoki Furui, Antonio Ortega
Study of Jacobian compensation using linear transformation of conventional MFCC for VTLN
D. R. Sanand, S. Umesh
Adaptive HMM topology for speech recognition
Chuan-Wei Ting, Kuo-Yuan Lee, Jen-Tzung Chien
Minimum phone error discriminative training for Mandarin Chinese speaker adaptation
Liang-Yu Chen, Chun-Jen Lee, Jyh-Shing Roger Jang
Fast speaker adaptive training for speech recognition
Daniel Povey, Hong-Kwang Jeff Kuo, Hagen Soltau
Adaptive training using discriminative mapping transforms
C. K. Raut, K. Yu, M. J. F. Gales
Speaker adaptive training using shift-MLLR
Jonas Loof, Christian Gollan, Hermann Ney
XMLLR for improved speaker adaptation in speech recognition
Daniel Povey, Hong-Kwang Jeff Kuo
Effective acoustic adaptation for a distant-talking interactive TV system
Jing Huang, Mark Epstein, Marco Matassoni
A computationally efficient approach to warp factor estimation in VTLN using EM algorithm and sufficient statistics
P. T. Akhil, S. P. Rath, S. Umesh, D. R. Sanand
A reliable technique for detecting the second subglottal resonance and its use in cross-language speaker adaptation
Shizhen Wang, Steven M. Lulich, Abeer Alwan
Speaker identification for whispered speech based on frequency warping and score competition
Xing Fan, John H. L. Hansen
Experimental evaluation of multi-band position-pitch estimation (M-PoPi) algorithm for multi-speaker localization
Tania Habib, Lukas Ottowitz, Marián Képesi
Features for automatic detection of voice bars in continuous speech
N. Dhananjaya, S. Rajendran, B. Yegnanarayana
Speaker orientation estimation based on hybridation of GCC-PHAT and HLBR
Carlos Segura, Alberto Abad, Javier Hernando, Climent Nadeu
Parallel and hierarchical speech feature classification using frame and segment-based methods
Jun Hou, Lawrence R. Rabiner, Sorin Dusan
Automatically learning speaker-independent acoustic subword units
Balakrishnan Varadarajan, Sanjeev Khudanpur
Human-like ears versus two-microphone array, which works better for speaker identification?
Waleed H. Abdulla, Yushi Zhang
Is a speech recognizer useful for characteristic analysis of classroom lecture speech?
Kenji Kobayashi, Mitsuhiro Somiya, Hiromitsu Nishizaki, Yoshihiro Sekiguchi
An intuitive class discriminability measure for feature selection in a speech recognition system
Ladan Golipour, Douglas O'Shaughnessy
f-divergence is a generalized invariant measure between distributions
Yu Qiao, Nobuaki Minematsu
Sparse linear predictors for speech processing
Daniele Giacobello, Mads Græsbøll Christensen, Joachim Dahl, Søren Holdt Jensen, Marc Moonen
Frequency-domain parameter estimations for binary masked signals
J. X. Zhang, Mads Græsbøll Christensen, Joachim Dahl, Søren Holdt Jensen, Marc Moonen
Decomposition of rotational distortion caused by VTL difference using eigenvalues of its transformation matrix
Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose
Region-based vocal tract length normalization for ASR
Michail G. Maragakis, Alexandros Potamianos
Speaker verification with non-audible murmur segments by combining global alignment kernel and penalized logistic regression machine
Hideki Okamoto, Tomoko Matsui, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano
Analysis of subspace within-class covariance normalization for SVM-based speaker verification
Liang Lu, Yuan Dong, Xianyu Zhao, Jian Zhao, Chengyu Dong, Haila Wang
Comparison of input and feature space nonlinear kernel nuisance attribute projections for speaker verification
Xianyu Zhao, Yuan Dong, Jian Zhao, Liang Lu, Jiqing Liu, Haila Wang
A generalised derivative kernel for speaker verification
C. Longworth, M. J. F. Gales
Modeling prior belief for speaker verification SVM systems
Luciana Ferrer
Convergence between SVM-based and distance-based paradigms for speaker recognition
Delphine Charlet, Xianyu Zhao, Yuan Dong
High-level speaker verification via articulatory-feature based sequence kernels and SVM
Shi-Xiong Zhang, Man-Wai Mak
Characterizing speech utterances for speaker verification with sequence kernel SVM
Kong-Aik Lee, Changhuai You, Haizhou Li, Tomi Kinnunen, Donglai Zhu
Development of the primary CRIM system for the NIST 2008 speaker recognition evaluation
Patrick Kenny, Najim Dehak, Pierre Ouellet, Vishwa Gupta, Pierre Dumouchel
Making confident speaker verification decisions with minimal speech
Robbie Vogt, Sridha Sridharan, Michael Mason
Parallelized factor analysis and feature normalization for automatic speaker verification
Jun Luo, Cheung-Chi Leung, Marc Ferràs, Claude Barras
Intersession variability in speaker recognition: a behind the scene analysis
Daniel Garcia-Romero, Carol Y. Espy-Wilson
Speaker recognition based on variational Bayesian method
Tatsuya Ito, Kei Hashimoto, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Factor analysis multi-session training constraint in session compensation for speaker verification
Driss Matrouf, Jean-François Bonastre, Salah Eddine Mezaache
The role of 'delta' features in speaker verification
Ying Liu, Martin J. Russell, Michael J. Carey
Investigating morphological decomposition for transcription of Arabic broadcast news and broadcast conversation data
Lori Lamel, Abdel. Messaoudi, Jean-Luc Gauvain
Transcribing broadcast data using MLP features
Petr Fousek, Lori Lamel, Jean-Luc Gauvain
Development of the SRI/Nightingale Arabic ASR system
D. Vergyri, A. Mandal, Wen Wang, Andreas Stolcke, Jing Zheng, Martin Graciarena, D. Rybach, Christian Gollan, Ralf Schlüter, Katrin Kirchhoff, A. Faria, Nelson Morgan
Towards automatic learning in LVCSR: rapid development of a Persian broadcast transcription system
Christian Gollan, Hermann Ney
The CMU-interACT 2008 Mandarin transcription system
Roger Hsiao, Mark Fuhs, Yik-Cheung Tam, Qin Jin, Tanja Schultz
Decoding-time prediction of non-verbalized punctuation
Anoop Deoras, Jürgen Fritsch
On the impact of alignment on voice conversion performance
Elina Helander, Jan Schwarz, Jani Nurminen, Hanna Silen, Moncef Gabbouj
The linear transformation of LF glottal waveforms for voice conversion
Arantza del Pozo, Steve Young
Maximum a posteriori adaptation for many-to-one eigenvoice conversion
Daisuke Tani, Tomoki Toda, Yamato Ohtani, Hiroshi Saruwatari, Kiyohiro Shikano
Improvement to a NAM captured whisper-to-speech system
Viet-Anh Tran, Gérard Bailly, Hélène Loevenbruck, Christian Jutten
Speaker identification in noise mismatch conditions based on jump function Kolmogorov analysis in wavelet domain
Huy Dat Tran, Haizhou Li
Modelling fine-phonetic detail in a computational model of word recognition
Odette Scharenborg
Pronunciation reduction: how it relates to speech style, gender, and age
Helmer Strik, Joost van Doremalen, Catia Cucchiarini
Analysis of glottal stops in speech signals
B. Yegnanarayana, S. Rajendran, Hussien Seid Worku, N. Dhananjaya
The acoustic to articulation mapping: non-linear or non-unique?
Daniel Neiberg, G. Ananthakrishnan, Olov Engwall
The entropy of the articulatory phonological code: recognizing gestures from tract variables
Xiaodan Zhuang, Hosung Nam, Mark Hasegawa-Johnson, Louis M. Goldstein, Elliot Saltzman
Addressing database mismatch in forensic speaker recognition with Ahumada III: a public real-casework database in Spanish
Daniel Ramos, Joaquin Gonzalez-Rodriguez, Javier Gonzalez-Dominguez, Jose Juan Lucena-Molina
FM features for automatic forensic speaker recognition
Tharmarajah Thiruvaran, Eliathamby Ambikairajah, Julien Epps
Automatic-type calibration of traditionally derived likelihood ratios: forensic analysis of Australian English /o/ formant trajectories
Geoffrey Stewart Morrison, Yuko Kinoshita
Forensic speaker verification using formant features and Gaussian mixture models
Timo Becker, Michael Jessen, Catalin Grigoras
The case for automatic higher-level features in forensic speaker recognition
Elizabeth Shriberg, Andreas Stolcke
Group delay function for improved gender identification
Kye-Hwan Lee, Sang-Ick Kang, Ji-Hyun Song, Joon-Hyuk Chang
Frame-synchronous and local confidence measures for on-the-fly automatic speech recognition
Joseph Razik, Odile Mella, Dominique Fohr, Jean-Paul Haton
Hilbert envelope based spectro-temporal features for phoneme recognition in telephone speech
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky
Evidence of coarticulation in a phonological feature detection system
Abhijeet Sangwan, Ayako Ikeno, John H. L. Hansen
Phoneme recognition based on hybrid neural networks with inhibition/enhancement of distinctive phonetic feature (DPF) trajectories
Mohammad Nurul Huda, Kouichi Katsurada, Tsuneo Nitta
A neural network based nonlinear feature transformation for speech recognition
Hongbing Hu, Stephen A. Zahorian
Significance of group delay based acoustic features in the linguistic search space for robust speech recognition
R. Ramya, Rajesh M. Hegde, Hema A. Murthy
Genetic programming based optimization of class-dependent PCA for extracting robust MFCC
Houman Abbasian, Babak Nasersharif, Ahmad Akbari
Comparison of AM-FM based features for robust speech recognition
K. V. S. Narayana, T. V. Sreenivas
Growing bottleneck features for tandem ASR
Joe Frankel, Dong Wang, Simon King
Landmark based recognition of stops: acoustic attributes versus smoothed spectra
Veena Karjigi, Preeti Rao
Speech recognition performance of CJLC: corpus of Japanese lecture contents
Satoru Kogure, Hiromitsu Nishizaki, Masatoshi Tsuchiya, Kazumasa Yamamoto, Shingo Togashi, Seiichi Nakagawa
On the combination of auditory and modulation frequency channels for ASR applications
Fabio Valente, Hynek Hermansky
Tandem processing of fepstrum features
Vivek Tyagi
Data-driven clustered hierarchical tandem system for LVCSR
Shuo-Yiin Chang, Lin-shan Lee
Linear discriminant feature extraction using weighted classification confusion information
Hung-Shin Lee, Berlin Chen
Use of spectral centre of gravity for generating speaker invariant features for automatic speech recognition
D. R. Sanand, V. Balaji, Rani R. Sandhya, S. Umesh
Short- and long-term dynamic features for robust speech recognition
Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
Multi-modal recording, analysis and indexing of poster sessions
Tatsuya Kawahara, Hisao Setoguchi, Katsuya Takanashi, Kentaro Ishizuka, Shoko Araki
Automatic pitch-synchronous phonetic segmentation
Jindřich Matoušek, Jan Romportl
Two protocols comparing human and machine phonetic recognition performance in conversational speech
Wade Shen, Joseph Olive, Douglas Jones
Analysis of drivers' speech in a car environment
Tomoyuki Kato, Jun Okamoto, Makoto Shozakai
Preparing a corpus of Dutch spontaneous dialogues for automatic phonetic analysis
Barbara Schuppler, Mirjam Ernestus, Odette Scharenborg, Lou Boves
Evaluation of voice activity and voicing detection
Bojan Kotnik, Pierre Sendorek, Sergey Astrov, Turgay Koc, Tolga Ciloglu, Laura Docío Fernández, Eduardo Rodríguez Banga, Harald Höge, Zdravko Kačič
Wikispeech - a content management system for speech databases
Christoph Draxler, Klaus Jänsch
Development and evaluation of Polish speech corpus for unit selection speech synthesis systems
Grazyna Demenko, J. Bachan, Bernd Möbius, K. Klessa, M. Szymański, Stefan Grocholewski
A data format enabling interoperation of speech recognition, translation and information extraction engines: the GALE type system
John F. Pitrelli, Burn L. Lewis, Edward A. Epstein, Jerome L. Quinn, Ganesh Ramaswamy
A rank-predicted pseudo-greedy approach to efficient text selection from large-scale corpus for maximum coverage of target units
Wei Li, Qiang Huo
Memo workbench for semi-automated usability testing
Klaus-Peter Engelbrecht, Michael Kruppa, Sebastian Möller, Michael Quade
MDS-based visualization method for multiple speech corpora
Kimiko Yamakawa, Tomoko Matsui, Shuichi Itahashi
Scripted dialogs versus improvisation: lessons learned about emotional elicitation techniques from the IEMOCAP database
Carlos Busso, Shrikanth S. Narayanan
Automatic pronunciation evaluation and classification
Om D. Deshmukh, Sachindra Joshi, Ashish Verma
Pronunciation error detection techniques for children's speech
Daniel Bolanos, Wayne Ward, Barbara Wise, Sarel van Vuuren
Automatic generation and pruning of phonetic mispronunciations to support computer-aided pronunciation training
Lan Wang, Xin Feng, Helen M. Meng
Automatic children's reading tutor on hand-held devices
Xiaolong Li, Li Deng, Yun-Cheng Ju, Alex Acero
A Japanese CALL system based on dynamic question generation and error prediction for ASR
Hongcui Wang, Tatsuya Kawahara
Estimation of children's reading ability by fusion of automatic pronunciation verification and fluency detection
Matthew Black, Joseph Tepperman, Sungbok Lee, Shrikanth S. Narayanan
Pronunciation verification of English letter-sounds in preliterate children
Matthew Black, Joseph Tepperman, Abe Kazemzadeh, Sungbok Lee, Shrikanth S. Narayanan
Improving mispronunciation detection and diagnosis of learners' speech with context-sensitive phonological rules based on language transfer
Alissa M. Harrison, Wing Yiu Lau, Helen M. Meng, Lan Wang
DISCO: development and integration of speech technology into courseware for language learning
Catia Cucchiarini, Joost van Doremalen, Helmer Strik
Discriminative model combination and language model selection in a reading tutor for children
Abdurrahman Samir, Jacques Duchateau, Hugo Van hamme
Usability of ASR-based reading training for dyslexics
Jakob Schou Pedersen, Lars Bo Larsen, Børge Lindberg
A browsing system for classroom lecture speech
Shingo Togashi, Seiichi Nakagawa
Automatic pronunciation evaluation of language learners' utterances generated through shadowing
Dean Luo, Naoya Shimomura, Nobuaki Minematsu, Yutaka Yamauchi, Keikichi Hirose
Application and evaluation of speech technologies in language learning: experiments with the Saybot player
Sylvain Chevalier, Zhenhai Cao
Forward optimal modeling of acoustic confusions in Mandarin CALL system
Fengpei Ge, Fuping Pan, Changliang Liu, Bin Dong, Yonghong Yan
Recognition of English utterances with grammatical and lexical mistakes for dialogue-based CALL system
Akinori Ito, Ryohei Tsutsui, Shozo Makino, Motoyuki Suzuki
Dysarthric speech database for universal access research
Heejin Kim, Mark Hasegawa-Johnson, Adrienne Perlman, Jon Gunderson, Thomas S. Huang, Kenneth Watkin, Simone Frame
Objective intelligibility assessment of pathological speakers
Catherine Middag, Gwen Van Nuffelen, Jean-Pierre Martens, Marc De Bodt
Quantitative analysis of intonation patterns produced by Cantonese speakers with Parkinson's disease: a preliminary study
Joan K.-Y. Ma, Tara L. Whitehill
Phonetic-acoustic and feature analyses by a neural network to assess speech quality in patients treated for head and neck cancer
Marieke de Bruijn, Irma Verdonck de Leeuw, Louis ten Bosch, Joop Kuik, Hugo Quene, Lou Boves, Hans Langendijk, Rene Leemans
Automatic evaluation of characteristic speech disorders in children with cleft lip and palate
Andreas Maier, Florian Hönig, Christian Hacker, Maria Schuster, Elmar Nöth
Application of weighted finite-state transducers to improve recognition accuracy for dysarthric speech
Omar Caballero Morales, Stephen Cox
The Interspeech 2008 consonant challenge
Martin Cooke, Odette Scharenborg
HMM-based estimation of unreliable spectral components for noise robust speech recognition
Bengt J. Borgström, Abeer Alwan
Gammatone-domain model combination for consonant recognition in noisy environments
Jae Sam Yoon, Ji Hun Park, Hong Kook Kim
On the mask modeling and feature representation in the missing-feature ASR: evaluation on the Consonant Challenge
Peter Jančovič, Munevver Kokuer
The non-native consonant challenge for European languages
M. Luisa Garcia Lecumberri, Martin Cooke, Francesco Cutugno, Mircea Giurgiu, Bernd T. Meyer, Odette Scharenborg, Wim van Dommelen, Jan Volin
Noise reduction through compressed sensing
J. F. Gemmeke, Bert Cranen
Speech recognition in noisy environments using a switching linear dynamic model for feature enhancement
Björn Schuller, Martin Wöllmer, Tobias Moosmayr, Gerhard Rigoll
Improving consonant identification in noise and reverberation by steady-state suppression as a preprocessing approach
Nao Hodoshima, Wataru Yoshida, Takayuki Arai
Human speech perception and feature extraction
Bryce E. Lobdell, Mark Hasegawa-Johnson, Jont B. Allen
Improving pronunciation modeling for non-native speech recognition
Tien-Ping Tan, Laurent Besacier
Online vocabulary adaptation using contextual information and information retrieval
Hagai Aronowitz
Lexicon expansion using pronunciation variations extracted on the basis of speaker-related deviation in recognition error statistics
Yoshifumi Onishi
Better nonnative intonation scores through prosodic theory
Joseph Tepperman, Shrikanth S. Narayanan
Silence models in weighted finite-state transducers
Philip N. Garner
Extracting word-pronunciation pairs from comparable set of text and speech
Tetsuro Sasada, Shinsuke Mori, Tatsuya Kawahara
Robust far-field speaker identification under mismatched conditions
Qin Jin, Tanja Schultz
Robust speaker verification using short-time frequency with long-time window and fusion of multi-resolutions
Chien-Lin Huang, Bin Ma, Chung-Hsien Wu, Brian Mak, Haizhou Li
Performance improvement of text-independent speaker verification systems based on histogram enhancement in noisy environments
C. H. Kwon, J. K. Choi, Eliathamby Ambikairajah
Filling acoustic holes through leveraged uncorrelated GMMs for in-set/out-of-set speaker recognition
Jun-Won Suh, Pongtep Angkititrakul, John H. L. Hansen
Missing-feature method for speaker recognition in band-restricted conditions
Wooil Kim, John H. L. Hansen
Robust speaker identification using cross-correlation GTF-ICA feature
Yushi Zhang, Waleed H. Abdulla
Perceptual speaker identification using monosyllabic stimuli - effects of the nucleus vowels and speaker characteristics contained in nasals
Kanae Amino, Takayuki Arai
Text-dependent speaker recognition by efficient capture of speaker dynamics in compressed time-frequency representations of speech
Amitava Das, Gokul Chittaranjan
Usefulness of text-conditioning and a new database for text-dependent speaker recognition research
Amitava Das, Gokul Chittaranjan, Gopala K. Anumanchipalli
Combination method of bone-conduction speech and air-conduction speech for speaker recognition
Satoru Tsuge, Takashi Osanai, Hisanori Makinae, Toshiaki Kamada, Minoru Fukumi, Shingo Kuroiwa
MAP and sub-word level T-norm for text-dependent speaker recognition
Doroteo T. Toledano, Daniel Hernandez-Lopez, Cristina Esteve-Elizalde, Joaquin Gonzalez-Rodriguez, Ruben Fernandez Pozo, Luis Hernandez Gomez
Forensic speaker recognition in Chinese: a multivariate likelihood ratio discrimination on /i/ and /y/
Cuiling Zhang, Geoffrey Stewart Morrison, Philip Rose
How many do we need? Exploration of the population size effect on the performance of forensic speaker classification
Shunichi Ishihara, Yuko Kinoshita
Comparing prosodic models for speaker recognition
Cheung-Chi Leung, Marc Ferras, Claude Barras, Jean-Luc Gauvain
Combination of clean and contaminated GMM/SVM for far-field text-independent speaker verification
Christian Zieger, Maurizio Omologo
English word stress as produced by English and Dutch speakers: the role of segmental and suprasegmental differences
Bettina Braun, Kristin Lemhofer, Anne Cutler
The strength of stress-related lexical competition depends on the presence of first-syllable stress
Eva Reinisch, Alexandra Jesse, James M. McQueen
Word stress placement by native speakers and Japanese learners of English
Keiichi Ishikawa, Jun Nomura
Schwa variants in American English
H. Timothy Bunnell, Jason Lilley
Covariations of English segmental durations across speakers
Jiahong Yuan
The intelligibility of the English vowel /ʌ/ produced by native speakers of Japanese and its relations to the acoustic characteristics
Akiyo Joto
Rate dependent spectral reduction for voiceless fricatives
Benjamin Weiss
Investigating perception of places of articulation in sign and speech
Stina Ojala, Olli Aaltonen, Tapio Salakoski
Six- and twelve-month-olds' discrimination of native versus non-native between- and within-organ fricative place contrasts
Michael D. Tyler, Catherine T. Best, Louis M. Goldstein, Mark Antoniou, Lidija Krebs-Lazendic
Your baby can't hear you: how mothers talk to infants with simulated hearing loss
Christa Lam, Christine Kitamura
Development of communicative skills in 8- to 16-month-old children: a longitudinal study
Eeva Klintfors, Ulla Sundberg, Francisco Lacerda, Ellen Marklund, Lisa Gustavsson, Ulla Bjursäter, Iris-Corinna Schwarz, Göran Söderlund
Vocal imitation in early language acquisition
Lisa Gustavsson, Francisco Lacerda
Computational language acquisition by statistical bottom-up processing
Okko Rasanen, Unto K. Laine, Toomas Altosaar
Lexical analyses of native and non-native English language instructor speech based on a six-month co-taught classroom video corpus
Noriaki Katagiri, Goh Kawai
Perception and production of consonant clusters in Japanese-English bilingual and Japanese monolingual speakers
Hinako Masuda, Takayuki Arai
Lip synchronization: from phone lattice to PCA eigen-projections using neural networks
Samer Al Moubayed, Michael De Smet, Hugo Van hamme
Building and combining document and music spaces for music query-by-webpage system
Ryoei Takahashi, Yasunori Ohishi, Norihide Kitaoka, Kazuya Takeda
Improving searching speed and accuracy of query by humming system based on three methods: feature fusion, candidates set reduction and multiple similarity measurement rescoring
Lei Wang, Shen Huang, Sheng Hu, Jiaen Liang, Bo Xu
Towards a segmental vocoder driven by ultrasound and optical images of the tongue and lips
Thomas Hueber, Gérard Chollet, Bruce Denby, Gérard Dreyfus, Maureen Stone
Phone recognition from ultrasound and optical video sequences for a silent speech interface
Thomas Hueber, Gérard Chollet, Bruce Denby, Gérard Dreyfus, Maureen Stone
Feature space transforms for Czech sign-language recognition
Jan Trmal, Marek Hrúz, Jan Zelinka, Pavel Campr, Luděk Müller
Masked speech priming: no priming in dense neighbourhoods
Chris Davis, Jeesun Kim, Angelo Barbaro
Integration of audiovisual speech and priming effects
Azra N. Ali
Similarity between vowels influences response execution in word identification
Jason D. Zevin, Thomas A. Farmer
Phonotactically well-formed onset clusters as processing units in word recognition
Tom Lentz
Prelexically-driven perceptual retuning of phoneme boundaries
Anne Cutler, James M. McQueen, Sally Butterfield, Dennis Norris
Visual speech modifies the phoneme restoration effect
Erin Cvejic, Jeesun Kim, Chris Davis
An objective singing evaluation approach by relating acoustic measurements to perceptual ratings
Chuan Cao, Ming Li, Jian Liu, Yonghong Yan
On the perceived quality of noise reduced signals
Valerie Gautier-Turbin, Laetitia Gros
A methodology and tool suite for evaluation of accuracy of interoperating statistical natural language processing engines
Uma Murthy, John F. Pitrelli, Ganesh Ramaswamy, Martin Franz, Burn L. Lewis
An empirical analysis of word error rate and keyword error rate
Youngja Park, Siddharth Patwardhan, Karthik Visweswariah, Stephen C. Gates
Measuring speech quality impact on tasks performance
Virginie Durin, Laetitia Gros
Voice commands in home environment - a consumer survey
Hannu Soronen, Markku Turunen, Jaakko Hakulinen
Extended partial distance elimination and dynamic Gaussian selection for fast likelihood computation
Ghazi Bouselmi, Jun Cai
Improving the multigram algorithm by using lattices as input
Joris Driesen, Hugo Van hamme
Backward Viterbi beam search for utilizing dynamic task complexity information
Min Tang, Philippe Di Cristo
Fast speech decoding through phone confusion networks
Nicola Bertoldi, Marcello Federico, Daniele Falavigna, Matteo Gerosa
High-performance low-latency speech recognition via multi-layered feature streaming and fast Gaussian computation
Liang Gu, Jian Xue, Xiaodong Cui, Yuqing Gao
A low-power hardware search architecture for speech recognition
Patrick J. Bourke, Rob A. Rutenbar
Phonetic query expansion for spoken document retrieval
Jonathan Mamou, Bhuvana Ramabhadran
Implementation and evaluation of fast on-the-fly WFST composition algorithms
Tasuku Oonishi, Paul R. Dixon, Koji Iwano, Sadaoki Furui
Analysis of voice-quality features of speech that expresses 'anger', 'joy', and 'sadness' uttered by radio actors and actresses
Shoichi Takeda, Yuuri Yasuda, Risako Isobe, Shogo Kiryu, Makiko Tsuru
Including pitch accent optionality in unit selection text-to-speech synthesis
Leonardo Badino, Robert A. J. Clark, Volker Strom
Emotion conversion using F0 segment selection
Zeynep Inanoglu, Steve Young
Generating natural F0 trajectory with additive trees
Yao Qian, Hui Liang, Frank K. Soong
Generating intonation from a mixed CART-HMM model for speech synthesis
Cédric Boidin, Olivier Boeffard
Intonation modeling of Mandarin Chinese using a superpositional approach
Pablo Daniel Aguero, Antonio Bonafonte, Lu Yu, Juan Carlos Tulli
Two-stage prosody prediction for emotional text-to-speech synthesis
Hao Tang, Xi Zhou, Matthias Odisio, Mark Hasegawa-Johnson, Thomas S. Huang
Prosody boundary detection through context-dependent position models
Yue-Ning Hu, Min Chu, Chao Huang, Yan-Ning Zhang
Duration refinement by jointly optimizing state and longer unit likelihood
Boyang Gao, Yao Qian, Zhizheng Wu, Frank K. Soong
T-tilt: a modified tilt model for F0 analysis and synthesis in tonal languages
Ausdang Thangthai, Nattanun Thatphithakkul, Chai Wutiwiwatchai, Anocha Rugchatjaroen, Sittipong Saychum
Multilevel parametric-base F0 model for speech synthesis
Javier Latorre, Masami Akamine
On the generation of synthetic disfluent speech: local prosodic modifications caused by the insertion of editing terms
Jordi Adell, Antonio Bonafonte, David Escudero-Mancebo
A comparison of voice conversion methods for transforming voice quality in emotional speech synthesis
Oytun Türk, Marc Schröder
Tree grammars as models of prosodic structure
Joseph Tepperman, Shrikanth S. Narayanan
Addressing the out-of-vocabulary problem for large-scale Chinese spoken term detection
Sha Meng, Jian Shao, Roger Peng Yu, Jia Liu, Frank Seide
Towards vocabulary-independent speech indexing for large-scale repositories
Jian Shao, Roger Peng Yu, Qingwei Zhao, Yonghong Yan, Frank Seide
Towards the integration of automatic speech recognition and information retrieval for spoken query processing
A. Moreno-Daniel, J. Wilpon, B.-H. Juang, S. Parthasarathy
Reducing the effect of OOV query words by using morph-based spoken document retrieval
Ville T. Turunen
Bayesian latent topic clustering model
Meng-Sung Wu, Jen-Tzung Chien
Spoken document retrieval by translating recognition candidates into correct transcriptions
Tomoyosi Akiba, Yusuke Yokota
Audio indexing for an interactive Italian literature management system
Carlo Drioli, Piero Cosi
Open-vocabulary spoken-document retrieval based on query expansion using related web documents
Makoto Terao, Takafumi Koshinaka, Shinichi Ando, Ryosuke Isotani, Akitoshi Okumura
Discriminative graph training for ultra-fast low-footprint speech indexing
Upendra Chaudhari, Hong-Kwang Jeff Kuo, Brian Kingsbury
A language-modeling approach to inverse text normalization and data cleanup for multimodal voice search applications
Yun-Cheng Ju, Julian Odell
Topic segmentation and indexation in a media watch system
Rui Amaral, Isabel Trancoso
Vocabulary independent discriminative term frequency estimation
J. Scott Olsson
Spoken keyword spotting via multi-lattice alignment
Hui Lin, Alex Stupakov, Jeff A. Bilmes
Robust spoken term detection using combination of phone-based and word-based recognition
Kenji Iwata, Koichi Shinoda, Sadaoki Furui
Language model adaptation for a speech to sign language translation system using web frequencies and a MAP framework
Luis Fernando D'Haro, Ruben San-Segundo, Ricardo de Cordoba, Jan Bungeroth, Daniel Stein, Hermann Ney
Hearing at home - communication support in home environments for hearing impaired persons
Jonas Beskow, Björn Granström, Peter Nordqvist, Samer Al Moubayed, Giampiero Salvi, Tobias Herzke, Arne Schulz
Traveling wave based group delays for cochlear implant speech processing
Daniel A. Taft, David B. Grayden, Anthony N. Burkitt
Multimodal perception of Mandarin tone for cochlear implant users
Damien J. Smith, Denis Burnham
Evaluation of speaking-aid system with voice conversion for laryngectomees toward its use in practical environments
Keigo Nakamura, Tomoki Toda, Yoshitaka Nakajima, Hiroshi Saruwatari, Kiyohiro Shikano
An acoustic typology of apraxic speech - toward reliable diagnosis
Jacqueline McKechnie, Kirrie J. Ballard, Donald A. Robin, Adam Jacks, Sallyanne Palethorpe, Kristin M. Rosen
Dysphonic voices and the 0-3000 Hz frequency band
G. Pouchoulin, C. Fredouille, Jean-François Bonastre, A. Ghio, A. Giovanni
Verifying pronunciation accuracy from speakers with neuromuscular disorders
Shou-Chun Yin, Richard C. Rose, Oscar Saz, Eduardo Lleida
Multi-band and multi-cue analyses of disordered connected speech
A. Alpan, Y. Maryn, F. Grenez, A. Kacha, J. Schoentgen
Combining neural network and rule-based systems for dysarthria diagnosis
James Carmichael, Vincent Wan, Phil Green
Speech as a means of monitoring cognitive function of elderly speakers
Shona D'Arcy, Viliam Rapcan, Nils Penard, Margaret E. Morris, Ian H. Robertson, Richard B. Reilly
Integration of metamodel and acoustic model for speech recognition
Hironori Matsumasa, Tetsuya Takiguchi, Yasuo Ariki, Ichao Li, Toshitaka Nakabayashi
Frequency compression/transposition of fricative consonants for the hearing impaired with high-frequency dead regions
Francisco J. Fraga, Leticia P. Costa S. Prates, Maria Cecilia M. Iorio
Relation between geometry and kinematics of articulatory trajectory associated with emotional speech production
Sungbok Lee, Tsuneo Kato, Shrikanth S. Narayanan
Intrinsic consonantal F0 perturbation in 3-way VOT contrast and its implications for aspiration-conditioned tonal split: evidence from Vietnamese
Michael J. Carne
A model based investigation of activation patterns of the tongue muscles for vowel production
Qiang Fang, Satoru Fujita, Xugang Lu, Jianwu Dang
Interrelationship between vocal effort and vocal tract acoustics: a pilot study
Maeva Garnier, Joe Wolfe, Nathalie Henrich, John Smith
Predicting tongue shapes from a few landmark locations
Chao Qin, Miguel A. Carreira-Perpinan, Korin Richmond, Alan Wrench, Steve Renals
LIPS2008: visual speech synthesis challenge
Barry-John Theobald, Sascha Fagel, Gérard Bailly, Frédéric Elisei
Speech-driven lip motion generation with a trajectory HMM
Gregor Hofer, Junichi Yamagishi, Hiroshi Shimodaira
A trainable trajectory formation model TD-HMM parameterized for the LIPS 2008 challenge
Gérard Bailly, Oxana Govokhina, Gaspard Breton, Frédéric Elisei, Christophe Savariaux
Comparing text-driven and speech-driven visual speech synthesisers
Barry-John Theobald, Gavin Cawley, Andrew Bangham, Iain Matthews, Nicholas Wilkinson
Automatic lip synchronization by speech signal analysis
Goranka Zoric, Aleksandra Cerekovic, Igor S. Pandzic
MASSY speaks English: adaptation and evaluation of a talking head
Sascha Fagel
From 3-D speaker cloning to text-to-audiovisual-speech
Sascha Fagel, Frédéric Elisei, Gérard Bailly
A development of Czech talking head
Zdeněk Krňoul, Miloš Železný
Realistic facial animation system for interactive services
Kang Liu, Joern Ostermann
Speech-driven 3D facial animation for mobile entertainment
Juan Yan, Xiang Xie, Hao Hu
A real-time text to audio-visual speech synthesis system
Lijuan Wang, Xiaojun Qian, Lei Ma, Yao Qian, Yining Chen, Frank K. Soong
ASR word lattice translation with exhaustive reordering is possible
Evgeny Matusov, Björn Hoffmeister, Hermann Ney
Development of SRI's translation systems for broadcast news and broadcast conversations
Jing Zheng, Wen Wang, Necip Fazil Ayan
Machine translation in continuous space
Ruhi Sarikaya, Yonggang Deng, Mohamed Afify, Brian Kingsbury, Yuqing Gao
Discovering phrases in machine translation by simulated annealing
Caroline Lavecchia, David Langlois, Kamel Smaïli
Towards domain independence in machine aided human translation
Aarthi Reddy, Richard C. Rose
Class-based statistical machine translation for field maintainable speech-to-speech translation
Ian R. Lane, Alex Waibel
Intonational phrases for speech summarization
Sameer R. Maskey, Andrew Rosenberg, Julia Hirschberg
Packing the meeting summarization knapsack
Korbinian Riedhammer, Dan Gillick, Benoit Favre, Dilek Hakkani-Tür
Class lecture summarization taking into account consecutiveness of important sentences
Yasuhisa Fujii, Kazumasa Yamamoto, Norihide Kitaoka, Seiichi Nakagawa
Using latent Dirichlet allocation to incorporate domain knowledge for topic transition detection
Xiaodan Zhu, Xuming He, Cosmin Munteanu, Gerald Penn
Weakly supervised training for parsing Mandarin broadcast transcripts
Wen Wang
Parsing with subdomain instance weighting from raw corpora
Barbara Plank, Khalil Sima'an
Dependency parsing of Japanese spoken monologue based on clause-starts detection
Tomohiro Ohno, Shigeki Matsubara, Hideki Kashioka, Yasuyoshi Inagaki
Online unsupervised pattern discovery in speech using parallelization
Mrugesh R. Gajjar, R. Govindarajan, T. V. Sreenivas
A comparison of input entry rates in a multimodal mobile application
Aleksi Melto, Markku Turunen, Jaakko Hakulinen, Anssi Kainulainen, Tomi Heimonen
Physically embodied conversational agents as health and fitness companions
Markku Turunen, Jaakko Hakulinen, Cameron Smith, Daniel Charlton, Li Zhang, Marc Cavazza
User perception of multi-modal interfaces for mobile applications
Florian Metze, Roman Englert, Udo Bub, Ingmar Kliche, Thomas Scheerbarth
Design and formulation for speech interface based on flexible shortcuts
Teppei Nakano, Tomoyuki Kumai, Tetsunori Kobayashi, Yasushi Ishikawa
Exploring classification techniques in speech based cognitive load monitoring
Bo Yin, Natalie Ruiz, Fang Chen, Eliathamby Ambikairajah
Finding two-level interpersonal context: proximity and conversation detection from personal audio feature data
Masayuki Okamoto, Naoki Iketani, Keisuke Nishimura, Masaaki Kikuchi, Kenta Cho, Masanori Hattori, Sougo Tsuboi
From domain specification to virtual humans: an integrated approach to authoring tactical questioning characters
Sudeep Gandhe, David DeVault, Antonio Roque, Bilyana Martinovski, Ron Artstein, Anton Leuski, Jillian Gerten, David Traum
Designing a massively multiplayer online role-playing game around text-to-speech
Mike Rozak
Robust speaker change detection using Kernel-Gaussian model
Jie Gao, Xiang Zhang, Qingwei Zhao, Yonghong Yan
A comparative study in automatic recognition of broadcast audio
Stavros Ntalampiras, Nikos Fakotakis
Joint time-frequency segmentation for transient decomposition
Charturong Tantibundhit, Gernot Kubin
Language and genre detection in audio content analysis
Vikramjit Mitra, Daniel Garcia-Romero, Carol Y. Espy-Wilson
An entropy based feature for whisper-island detection within audio streams
Chi Zhang, John H. L. Hansen
Two step speaker segmentation method using Bayesian information criterion and adapted Gaussian mixtures models
Matej Grašič, Marko Kos, Andrej Žgank, Zdravko Kačič
Domain-specific classification methods for disfluency detection
Sebastian Germesin, Tilman Becker, Peter Poller
Multi-speaker meeting audio segmentation
Tin Lay Nwe, Minghui Dong, Swe Zin Kalayar Khine, Haizhou Li
Rhythm based music segmentation and octave scale cepstral features for sung language recognition
Namunu C. Maddage, Haizhou Li
Robust voiced/unvoiced speech classification using empirical mode decomposition and periodic correlation model
Md. Khademul Islam Molla, Keikichi Hirose, Nobuaki Minematsu
A combination of data mining method with decision trees building for speech/music discrimination
Qiong Wu, Qin Yan, Jun Wang, Jun Hong
Advertisement detection in French broadcast news using acoustic repetition and Gaussian mixture models
Vishwa Gupta, Gilles Boulianne, Patrick Kenny, Pierre Dumouchel
A hybrid SVM/MCE training approach for vector space topic identification of spoken audio recordings
Timothy J. Hazen, Fred Richardson
Training audio events detectors with a sound effects corpus
Isabel Trancoso, Jose Portelo, Miguel Bugalho, João Neto, Antonio Serralheiro
Longitudinal study of ASR performance on ageing voices
Ravichander Vipperla, Steve Renals, Joe Frankel
HAC-models: a novel approach to continuous speech recognition
Hugo Van hamme
Investigations into phonological attribute classifier representations for CRF phone recognition
Prateeti Mohapatra, Eric Fosler-Lussier
Applications of virtual-evidence based speech recognizer training
Amarnag Subramanya, Jeff A. Bilmes
Spoken digit recognition using a hierarchical temporal memory
Joost van Doremalen, Lou Boves
A computational model of language acquisition: focus on word discovery
Louis ten Bosch, Hugo Van hamme, Lou Boves
Voice activity detection using modified Wigner-Ville distribution
Lakshmish Kaushik, Douglas O'Shaughnessy
Energy and entropy based switching algorithm for speech endpoint detection in varying SNR conditions
Krishna Chaitanya, Rohit Sinha
Detection of speech embedded in real acoustic background based on amplitude modulation spectrogram features
Jörn Anemüller, Denny Schmidt, Jörg-Hendrik Bach
Voice activity detection algorithms using subband power distance feature for noisy environments
Tuan Van Pham, Michael Stadtschnitzer, Franz Pernkopf, Gernot Kubin
Speech-overlapped acoustic event detection for automotive applications
Christian Müller, Joan-Isaac Biel, Edward Kim, Daniel Rosario
Detection of acoustic events in interactive seminar data with temporal overlaps
Andrey Temko, Climent Nadeu
Robust signal-to-noise ratio estimation based on waveform amplitude distribution analysis
Chanwoo Kim, Richard M. Stern
Speech analysis using instantaneous frequency deviation
Anthony P. Stark, Kuldip K. Paliwal
Auditory-based formant estimation in noise using a probabilistic framework
Claudius Gläser, Martin Heckmann, Frank Joublin, Christian Goerick
Efficient representation of throat microphone speech
K. Sri Rama Murty, Saurav Khurana, Yogendra Umesh Itankar, M. R. Kesheorey, B. Yegnanarayana
Acoustic-phonetic approach for automatic evaluation of spoken grammar
Om D. Deshmukh, Ashish Verma
On estimation of a speaker's confusion matrix from sparse data
Stephen Cox
Talking heads and pronunciation training: a review
Valerie Hazan
Pronunciation training: the role of eye and ear
Dominic W. Massaro, Stephanie Bigler, Trevor Chen, Marcus Perlman, Slim Ouni
Can visualization of internal articulators support speech perception?
Preben Wik, Olov Engwall
Can audio-visual instructions help learners improve their articulation? - an ultrasound study of short term changes
Olov Engwall
Can you "read tongue movements"?
Pierre Badin, Yuliya Tarabalka, Frédéric Elisei, Gérard Bailly
Two- and three-dimensional visual articulatory models for pronunciation training and for treatment of speech disorders
Bernd J. Kröger, Verena Graf-Borttscheller, Anja Lowit
A 3-D virtual head as a tool for speech therapy for children
Sascha Fagel, Katja Madany
Anton: an animatronic model of a human tongue and vocal tract
Robin Hofe, Roger K. Moore
Physical models of the human vocal tract with gel-type material
Takayuki Arai
Mispronunciation detection for Mandarin Chinese
Chao Huang, Feng Zhang, Frank K. Soong, Min Chu
Efficient handwriting correction of speech recognition errors with template constrained posterior (TCP)
Lijuan Wang, Tao Hu, Peng Liu, Frank K. Soong
Bi-Gaussian score equalization in an audio-visual SVM-based person verification system
Pascual Ejarque, Javier Hernando
Speech recognition for vocalized and subvocal modes of production using surface EMG signals from the neck and face
Geoffrey S. Meltzner, Jason Sroka, James T. Heaton, L. Donald Gilmore, Glen Colby, Serge Roy, Nancy Chen, Carlo J. De Luca
Distinctive feature fusion for recognition of Australian English consonants
Trent W. Lewis, David M. W. Powers
Time-lag adaptation for semi-synchronous speech and pen input
Yasushi Watanabe, Koichi Shinoda, Sadaoki Furui
Continuous pose-invariant lipreading
Patrick Lucey, Sridha Sridharan, David Dean
Czech-to-Slovak adapted broadcast news transcription system
Jan Nouza, Jan Silovsky, Jindrich Zdansky, Petr Cerva, Martin Kroul, Josef Chaloupka
Continuous phone recognition without target language training data
Dau-Cheng Lyu, Sabato Marco Siniscalchi, Tae-Yoon Kim, Chin-Hui Lee
An investigation of acoustic models for multilingual code-switching
Christopher M. White, Sanjeev Khudanpur, James K. Baker
Cross-lingual portability of MLP-based tandem features - a case study for English and Hungarian
László Tóth, Joe Frankel, Gábor Gosztolya, Simon King
Seed models combination and state level mappings of cross-lingual transfer for rapid HMM development: from English to Mandarin
Xufang Zhao, Douglas O'Shaughnessy
Multi-accent and accent-independent non-native speech recognition
Ghazi Bouselmi, Dominique Fohr, Irina Illina
Cross-lingual sentence extraction for information distillation
Adish Kumar Singla, Dilek Hakkani-Tür
On the use of a multilingual neural network front-end
Stefano Scanzio, Pietro Laface, Luciano Fissore, Roberto Gemello, Franco Mana
Context-sensitive probabilistic phone mapping model for cross-lingual speech recognition
Khe Chai Sim, Haizhou Li
A non-acoustic approach to crosslingual speech recognition performance prediction
Chen Liu, Lynette Melnar
Factored translation models for enriching spoken language translation with prosody
Vivek Kumar Rangarajan Sridhar, Srinivas Bangalore, Shrikanth S. Narayanan
Data selection and smoothing in an open-source system for the 2008 NIST machine translation evaluation
Holger Schwenk, Yannick Esteve
Strategies for building a Farsi-English SMT system from limited resources
Andreas Kathol, Jing Zheng
Stream decoding for simultaneous spoken language translation
Muntsin Kolss, Stephan Vogel, Alex Waibel
Towards unsupervised training of the classifier-based speech translator
Emil Ettelaie, Panayiotis G. Georgiou, Shrikanth S. Narayanan
Aggregating distributed STT, MT, and information extraction engines: the GALE interoperability-demo system
John F. Pitrelli, Burn L. Lewis, Edward A. Epstein, Martin Franz, Daniel Kiecza, Jerome L. Quinn, Ganesh Ramaswamy, Amit Srivastava, Paola Virga
An interval type-2 fuzzy logic system to translate between emotion-related vocabularies
Abe Kazemzadeh, Sungbok Lee, Shrikanth S. Narayanan
Applying pitch-dependent difference detection and modification to emotional speaker recognition
Ting Huang, Yingchun Yang
Automatic recognition of anger in spontaneous speech
Daniel Neiberg, Kjell Elenius
An estimation technique of style expressiveness for emotional speech using model adaptation based on multiple-regression HSMM
Takashi Nose, Yoichi Kato, Makoto Tachibana, Takao Kobayashi
A vowel based approach for acted emotion recognition
Fabien Ringeval, Mohamed Chetouani
A composite framework for affective sensing
Gordon McIntyre, Roland Goecke
Towards automatic emotional state categorization from speech signals
Arslan Shaukat, Ke Chen
Speaker-independent emotion recognition based on feature vector classification
Jeong-Sik Park, Ji-Hwan Kim, Sang-Min Yoon, Yung-Hwan Oh
An analysis of vocal tract shaping in English sibilant fricatives using real-time magnetic resonance imaging
Erik Bresch, Daylen Riggs, Louis M. Goldstein, Dani Byrd, Sungbok Lee, Shrikanth S. Narayanan
Science workshop with sliding vocal-tract model
Takayuki Arai
Segmentation cues in lexical identification and in lexical acquisition: same or different?
Odile Bagou, Ulrich H. Frauenfelder
Phonological representations in poor readers
Cecile Kuijpers, Louis ten Bosch
To what extent does tagged-MRI technique allow to infer tongue muscles' activation pattern? A modelling study
Stephanie Buchaillard, Pascal Perrier, Yohan Payan
Feature adaptation of hearing-impaired lip shapes: the vowel case in the cued speech context
Noureddine Aboutabit, Denis Beautemps, Olivier Mathieu, Laurent Besacier
Automatic detection of the context of acoustic landmark deletion
Nanette Veilleux, Stefanie Shattuck-Hufnagel
Aspects of pharyngealized phonemes in Arabic using articulography
Slim Ouni
The effect of spectral tilt on infants' discrimination of fricatives
Elizabeth Beach, Christine Kitamura, Harvey Dillon, Teresa Ching, Denis Burnham
Look at the shark: evaluation of student-produced standardized sentences of infant- and foreigner-directed speech
Monja Knoll, Lisa Scharrer
Vocal tract inversion by cepstral analysis-by-synthesis using chain matrices
Sankaran Panchapagesan, Abeer Alwan
DC-constrained linear prediction for glottal inverse filtering
Paavo Alku, Carlo Magi, Tom Bäckström
Voicing influences the saliency of place of articulation in audio-visual speech perception in babble
Magnus Alm, Dawn Behne
Correspondence of perception and production boundaries between single and geminate stops in Japanese
Shigeaki Amano, Yukari Hirata
Inhibitory processes of Chinese spoken word recognition
Michael C. W. Yip