doi: 10.21437/Interspeech.2011
ISSN: 2958-1796
Skew Gaussian mixture models for speaker recognition
Avi Matza, Yuval Bistritz
Towards goat detection in text-dependent speaker verification
Orith Toledo-Ronen, Hagai Aronowitz, Ron Hoory, Jason Pelecanos, David Nahamoo
Speaker modeling using local binary decisions
Jean-François Bonastre, Xavier Anguera, Gabriel H. Sierra, Pierre-Michel Bousquet
New developments in voice biometrics for user authentication
Hagai Aronowitz, Ron Hoory, Jason Pelecanos, David Nahamoo
Evaluation of i-vector speaker recognition systems for forensic application
Miranti Indar Mandasari, Mitchell McLaren, David A. van Leeuwen
Mixture of PLDA models in i-vector space for gender-independent speaker recognition
Mohammed Senoussaoui, Patrick Kenny, Niko Brümmer, Edward de Villiers, Pierre Dumouchel
Segregation of whispered speech interleaved with noise or speech maskers
Nandini Iyer, Douglas S. Brungart, Brian D. Simpson
Monaural azimuth localization using spectral dynamics of speech
Roi Kliper, Hendrik Kayser, Daphna Weinshall, Israel Nelken, Jörn Anemüller
Prediction of binaural intelligibility level differences in reverberation
Jan Rennies, Thomas Brand, Birger Kollmeier
Let's all speak together! Exploring the impact of various languages on the comprehension of speech in multi-linguistic babble
Aurore Gautreau, Michel Hoen, Fanny Meunier
Cross-rate variation in the intelligibility of dual-rate gated speech in older listeners
Valeriy Shafiro, Stanley Sheft, Robert Risley
An efferent-inspired auditory model front-end for speech recognition
Chia-ying Lee, James Glass, Oded Ghitza
A long-term harmonic plus noise model for speech signals
Faten Ben Ali, Laurent Girin, Sonia Djaziri Larbi
A frequency domain approach to ARX-LF voiced speech parameterization and synthesis
Alan Ó Cinnéide, David Dorran, Mikel Gainza, Eugene Coyle
Automatic data-driven learning of articulatory primitives from real-time MRI data using convolutive NMF with sparseness constraints
Vikram Ramanarayanan, Athanasios Katsamanis, Shrikanth Narayanan
Online pattern learning for non-negative convolutive sparse coding
Dong Wang, Ravichander Vipperla, Nicholas Evans
Sinewave representations of nonmodality
Nicolas Malyska, Thomas F. Quatieri, Robert Dunn
Time-varying signal adaptive transform and IHT recovery of compressive sensed speech
Ch. Srikanth Raj, T. V. Sreenivas
Acoustic-linguistic recognition of interest in speech with bottleneck-BLSTM nets
Martin Wöllmer, Felix Weninger, Florian Eyben, Björn Schuller
Automatic detection of anger in human-human call center dialogs
Mustafa Erden, Levent M. Arslan
Improved classification of speaking styles for mental health monitoring using phoneme dynamics
Keng-hao Chang, Howard Lei, John Canny
“You made me do it”: classification of blame in married couples' interactions by fusing automatically derived speech and language information
Matthew P. Black, Panayiotis G. Georgiou, Athanasios Katsamanis, Brian R. Baucom, Shrikanth Narayanan
Context and priming effects in the recognition of emotion of old and young listeners
Martijn Goudbeek, Marie Nilsenová
Acoustic and prosodic correlates of social behavior
Agustín Gravano, Rivka Levitan, Laura Willson, Štefan Beňuš, Julia Hirschberg, Ani Nenkova
Decision tree-based clustering with outlier detection for HMM-based speech synthesis
Kyung Hwan Oh, June Sig Sung, Doo Hwa Hong, Nam Soo Kim
Prediction of voice aperiodicity based on spectral representations in HMM speech synthesis
Hanna Silén, Elina Helander, Moncef Gabbouj
A perceptual expressivity modeling technique for speech synthesis based on multiple-regression HSMM
Takashi Nose, Takao Kobayashi
Multi-speaker modeling with shared prior distributions and model structures for Bayesian speech synthesis
Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
Feature-space transform tying in unified acoustic-articulatory modelling for articulatory control of HMM-based speech synthesis
Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi
The effect of using normalized models in statistical speech synthesis
Matt Shannon, Heiga Zen, William Byrne
Continuous control of the degree of articulation in HMM-based speech synthesis
Benjamin Picart, Thomas Drugman, Thierry Dutoit
Estimation of window coefficients for dynamic feature extraction for HMM-based speech synthesis
Ling-Hui Chen, Yoshihiko Nankaku, Heiga Zen, Keiichi Tokuda, Zhen-Hua Ling, Li-Rong Dai
Inverse filtering based harmonic plus noise excitation model for HMM-based speech synthesis
Zhengqi Wen, Jianhua Tao
Improved HNM-based vocoder for statistical synthesizers
Daniel Erro, Iñaki Sainz, Eva Navas, Inma Hernáez
A statistical phrase/accent model for intonation modeling
Gopala Krishna Anumanchipalli, Luís C. Oliveira, Alan W. Black
Intermediate-state HMMs to capture continuously-changing signal features
Gustav Eje Henter, W. Bastiaan Kleijn
Automatic sentence selection from speech corpora including diverse speech for improved HMM-TTS synthesis quality
Norbert Braunschweiler, Sabine Buchholz
Phonological knowledge guided HMM state mapping for cross-lingual speaker adaptation
Hui Liang, John Dines
Reformulating prosodic break model into segmental HMMs and information fusion
Nicolas Obin, Pierre Lanchantin, Anne Lacheret, Xavier Rodet
Multipulse sequences for residual signal modeling
Ranniery Maia, Heiga Zen, Kate Knill, M. J. F. Gales, Sabine Buchholz
Can objective measures predict the intelligibility of modified HMM-based synthetic speech in noise?
Cassia Valentini-Botinhao, Junichi Yamagishi, Simon King
Speech synthesis based on articulatory-movement HMMs with voice-source codebooks
Tsuneo Nitta, Takayuki Onoda, Masashi Kimura, Yurie Iribe, Kouichi Katsurada
Large-scale subjective evaluations of speech rate control methods for HMM-based speech synthesizers
Tsuneo Kato, Makoto Yamada, Nobuyuki Nishizawa, Keiichiro Oura, Keiichi Tokuda
HMM-based emphatic speech synthesis using unsupervised context labeling
Yu Maeno, Takashi Nose, Takao Kobayashi, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka
Restoring the residual speaker information in total variability modeling for speaker verification
Ce Zhang, Rong Zheng, Bo Xu
New developments in joint factor analysis for speaker verification
Hagai Aronowitz, Oren Barkan
Speaker recognition using temporal contours in linguistic units: the case of formant and formant-bandwidth trajectories
Joaquin Gonzalez-Rodriguez
Discriminatively trained i-vector extractor for speaker verification
Ondřej Glembek, Lukáš Burget, Niko Brümmer, Oldřich Plchot, Pavel Matějka
Constrained cepstral speaker recognition using matched UBM and JFA training
Michelle Hewlett Sanchez, Luciana Ferrer, Elizabeth Shriberg, Andreas Stolcke
A new perspective on GMM subspace compensation based on PPCA and Wiener filtering
Alan McCree, Douglas Sturim, Douglas Reynolds
Data-driven Gaussian component selection for fast GMM-based speaker verification
Ce Zhang, Rong Zheng, Bo Xu
Analysis of i-vector length normalization in speaker recognition systems
Daniel Garcia-Romero, Carol Y. Espy-Wilson
An analysis framework based on random subspace sampling for speaker verification
Weiwu Jiang, Zhifeng Li, Helen Meng
Factor analysis back ends for MLLR transforms in speaker recognition
Nicolas Scheffer, Yun Lei, Luciana Ferrer
Report on performance results in the NIST 2010 speaker recognition evaluation
Craig S. Greenberg, Alvin F. Martin, Bradford N. Barr, George R. Doddington
ivector fusion of prosodic and cepstral features for speaker verification
Marcel Kockmann, Luciana Ferrer, Lukáš Burget, Jan Černocký
i-vector based speaker recognition on short utterances
Ahilan Kanagasundaram, Robbie Vogt, David Dean, Sridha Sridharan, Michael Mason
Study of overlapped speech detection for NIST SRE summed channel speaker recognition
Hanwu Sun, Bin Ma
Super-Dirichlet mixture models using differential line spectral frequencies for text-independent speaker identification
Zhanyu Ma, Arne Leijon
Comparison of voice activity detectors for interview speech in NIST speaker recognition evaluation
Hon-Bill Yu, Man-Wai Mak
Eigen-voice based anchor modeling system for speaker identification using MLLR super-vector
A. K. Sarkar, S. Umesh
Automatic detection of speaker attributes based on utterance text
Wen Wang, Andreas Kathol, Harry Bratt
Comparison of speaker recognition approaches for real applications
Sandro Cumani, Pier Domenico Batzu, Daniele Colibro, Claudio Vair, Pietro Laface, Vasileios Vasilakakis
Modeling speaker personality using voice
Tim Polzehl, Sebastian Möller, Florian Metze
Structural joint factor analysis for speaker recognition
Marc Ferràs, Koichi Shinoda, Sadaoki Furui
Acoustic forest for SMAP-based speaker verification
Sangeeta Biswas, Marc Ferràs, Koichi Shinoda, Sadaoki Furui
Mixture of auto-associative neural networks for speaker verification
G. S. V. S. Sivaram, Samuel Thomas, Hynek Hermansky
Perceptual learning of liquids
Odette Scharenborg, Holger Mitterer, James M. McQueen
The efficiency of cross-dialectal word recognition
Annelie Tuinman, Holger Mitterer, Anne Cutler
Estimation of perceptual spaces for speaker identities based on the cross-lingual discrimination task
Minoru Tsuzaki, Keiichi Tokuda, Hisashi Kawai, Jinfu Ni
The relation between perception and production in L2 phonological processing
Sharon Peperkamp, Camillia Bouchon
The role of word-initial glottal stops in recognizing English words
Maria Paola Bissiri, Maria Luisa Garcia Lecumberri, Martin Cooke, Jan Volín
Effect of language experience on the categorical perception of Cantonese vowel duration
Caicai Zhang, Gang Peng, William S.-Y. Wang
Adaptive estimation of zeros of time-varying z-transforms
C. F. Pedersen, Ove Andersen, Paul Dalsgaard
Identifying regions of non-modal phonation using features of the wavelet transform
John Kane, Christer Gobl
Acoustic analysis of whispered speech for phoneme and speaker dependency
Xing Fan, Keith W. Godin, John H. L. Hansen
Multi-party speech recovery exploiting structured sparsity models
Afsaneh Asaei, Mohammad J. Taghizadeh, Hervé Bourlard, Volkan Cevher
Modulation spectrum analysis for recognition of reverberant speech
Sri Harish Mallidi, Sriram Ganapathy, Hynek Hermansky
Discrete choice models for non-intrusive quality assessment
Petko N. Petkov, W. Bastiaan Kleijn, Bert de Vries
Single channel dereverberation using example-based speech enhancement with uncertainty decoding technique
Keisuke Kinoshita, Mehrez Souden, Marc Delcroix, Tomohiro Nakatani
A statistical room impulse response model with frequency dependent reverberation time for single-microphone late reverberation suppression
Jan S. Erkelens, Richard Heusdens
An assessment of the improvement potential of time-frequency masking for speech dereverberation
Chenxi Zheng, Tiago H. Falk, Wai-Yip Chan
Perceptual improvement of a two-stage algorithm for speech dereverberation
Thiago de M. Prego, Amaro A. de Lima, Sergio L. Netto
A model-based spectral envelope Wiener filter for perceptually motivated speech enhancement
Najib Hadir, Friedrich Faubel, Dietrich Klakow
Binaural noise-reduction method based on blind source separation and perceptual post processing
Jorge I. Marin-Hurtado, Devangi N. Parikh, David V. Anderson
Region dependent transform on MLP features for speech recognition
Tim Ng, Bing Zhang, Spyros Matsoukas, Long Nguyen
Discriminant sub-space projection of spectro-temporal speech features based on maximizing mutual information
Martin Heckmann, Claudius Gläser
Combining feature space discriminative training with long-term spectro-temporal features for noise-robust speech recognition
Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
Combining frame and segment level processing via temporal pooling for phonetic classification
Sumit Chopra, Patrick Haffner, Dimitrios Dimitriadis
Improved bottleneck features using pretrained deep neural networks
Dong Yu, Michael L. Seltzer
Minimum classification error based spectro-temporal feature extraction for robust audio classification
Yuan-Fu Liao, Chia-Hsing Lin, We-Der Fang
Integrating recent MLP feature extraction techniques into TRAP architecture
František Grézl, Martin Karafiát
Feature frame stacking in RNN-based tandem ASR systems - learned vs. predefined context
Martin Wöllmer, Björn Schuller, Gerhard Rigoll
Improved acoustic feature combination for LVCSR by neural networks
Christian Plahl, Ralf Schlüter, Hermann Ney
Hierarchical tandem features for ASR in Mandarin
Joel Pinto, Mathew Magimai-Doss, Hervé Bourlard
Analysis and comparison of recent MLP features for LVCSR systems
Fabio Valente, Mathew Magimai-Doss, Wen Wang
Deep learning of speech features for improved phonetic recognition
Jaehyung Lee, Soo-Young Lee
Globality-locality consistent discriminant analysis for phone classification
Heyun Huang, Yang Liu, Jort F. Gemmeke, Louis ten Bosch, Bert Cranen, Lou Boves
Front-end compensation methods for LVCSR under Lombard effect
Hynek Bořil, František Grézl, John H. L. Hansen
Classification of fricatives using feature extrapolation of acoustic-phonetic features in telephone speech
Jung-Won Lee, Jeung-Yoon Choi, Hong-Goo Kang
Noise robust feature extraction based on extended weighted linear prediction in LVCSR
Sami Keronen, Jouni Pohjalainen, Paavo Alku, Mikko Kurimo
Comparing different flavors of spectro-temporal features for ASR
Bernd T. Meyer, Suman V. Ravuri, Marc René Schädler, Nelson Morgan
VTLN in the MFCC domain: band-limited versus local interpolation
Ehsan Variani, Thomas Schaaf
Multistream bandpass modulation features for robust speech recognition
Sridhar Krishna Nemala, Kailash Patil, Mounya Elhilali
An analysis of automatic speech recognition with multiple microphones
Davide Marino, Thomas Hain
Visualization of vocal tract shape using interleaved real-time MRI of multiple scan planes
Yoon-Chul Kim, Michael Proctor, Shrikanth Narayanan, Krishna S. Nayak
Biomechanical tongue models: an approach to studying inter-speaker variability
Ralf Winkler, Susanne Fuchs, Pascal Perrier, Mark Tiede
Quantifying articulatory distinctiveness of vowels
Jun Wang, Jordan R. Green, Ashok Samal, David B. Marx
Direct estimation of articulatory kinematics from real-time magnetic resonance image sequences
Michael Proctor, Adam Lammert, Athanasios Katsamanis, Louis Goldstein, Christina Hagedorn, Shrikanth Narayanan
Combined optical distance sensing and electropalatography to measure articulation
Peter Birkholz, Christiane Neuschaefer-Rube
Simulating post-L F0 bouncing by modeling articulatory dynamics
Santitham Prom-on, Yi Xu, Fang Liu
Learning new acoustic events in an HMM-based system using MAP adaptation
Jürgen T. Geiger, Mohamed Anouar Lakhal, Björn Schuller, Gerhard Rigoll
Alternative frequency scale cepstral coefficient for robust sound event recognition
Yi Ren Leng, Huy Dat Tran, Norihide Kitaoka, Haizhou Li
Evaluation of abnormal sound detection using multi-stage GMM in various environments
Akinori Ito, Akihito Aiba, Masashi Ito, Shozo Makino
Unsupervised learning of acoustic events using dynamic time warping and hierarchical k-means++ clustering
Joerg Schmalenstroeer, Markus Bartek, Reinhold Haeb-Umbach
Feature extraction assessment for an acoustic-event classification task using the entropy triangle
David Mejía-Navarrete, Ascensión Gallardo-Antolín, Carmen Peláez-Moreno, Francisco J. Valverde-Albacete
Unsupervised audio analysis for categorizing heterogeneous consumer domain videos
Pradeep Natarajan, Stavros Tsakalidis, Vasant Manohar, Rohit Prasad, Premkumar Natarajan
Enriching text-to-speech synthesis using automatic dialog act tags
Vivek Kumar Rangarajan Sridhar, Ann Syrdal, Alistair D. Conkie, Srinivas Bangalore
Joint target and join cost weight training for unit selection synthesis
Lukas Latacz, Wesley Mattheyses, Werner Verhelst
Prominence-based prosody prediction for unit selection speech synthesis
Andreas Windmann, Igor Jauk, Fabio Tamburini, Petra Wagner
Evaluating the meaning of synthesized listener vocalizations
Sathish Pammi, Marc Schröder
A hybrid TTS approach for prosody and acoustic modules
Iñaki Sainz, Daniel Erro, Eva Navas, Inma Hernáez
Uniform speech parameterization for multi-form segment synthesis
Alexander Sorin, Slava Shechtman, Vincent Pollet
Theoretical analysis of musical noise and speech distortion in structure-generalized parametric blind spatial subtraction array
Ryoichi Miyazaki, Hiroshi Saruwatari, Kiyohiro Shikano
Subjective and objective evaluation of speech intelligibility enhancement under constant energy and duration constraints
Yan Tang, Martin Cooke
A risk-estimation-based comparison of mean square error and Itakura-Saito distortion measures for speech enhancement
Nagarjuna Reddy Muraka, Chandra Sekhar Seelamantula
On noise tracking for noise floor estimation
Mahdi Triki
Maximum a posteriori estimation of noise from non-acoustic reference signals in very low signal-to-noise ratio environments
Ben Milner
Blind speech prior estimation for generalized minimum mean-square error short-time spectral amplitude estimator
Ryo Wakisaka, Hiroshi Saruwatari, Kiyohiro Shikano, Tomoya Takatani
Harmonic structure transform for speaker recognition
Kornel Laskowski, Qin Jin
Combining evidence from spectral and source-like features for person recognition from humming
Hemant A. Patil, Maulik C. Madhavi, Keshab K. Parhi
Improvements in speaker characterization using spectral subband energy based on harmonic plus noise model
Yanhua Long, Zhi-Jie Yan, Frank K. Soong, Li-Rong Dai, Wu Guo
Implicit segmentation in two-wire speaker recognition
Yosef A. Solewicz, Hagai Aronowitz
Boosting speaker recognition performance with compact representations
Sibel Yaman, Jason Pelecanos, Mohamed Kamal Omar
Partitioning of two-speaker conversation datasets
Carlos Vaquero, Alfonso Ortega, Eduardo Lleida
Intersession compensation and scoring methods in the i-vectors space for speaker recognition
Pierre-Michel Bousquet, Driss Matrouf, Jean-François Bonastre
Kernel alignment maximization for speaker recognition based on high-level features
Szymon Drgas, Adam Dabrowski
Kernel partial least squares for speaker recognition
Balaji Vasan Srinivasan, Daniel Garcia-Romero, Dmitry N. Zotkin, Ramani Duraiswami
Conversational-side-specific inter-session variability compensation
Mohamed Kamal Omar, Jason Pelecanos
A speaker line-up for the likelihood ratio
David A. van Leeuwen, Niko Brümmer
Towards fully Bayesian speaker recognition: integrating out the between-speaker covariance
Jesús Villalba, Niko Brümmer
Variational Bayesian model selection for GMM-speaker verification using universal background model
Timur Pekhovsky, Alexandra Lokhanova
To weight or not to weight: source-normalised LDA for speaker recognition using i-vectors
Mitchell McLaren, David A. van Leeuwen
Maximum entropy based data selection for speaker recognition
Chien-Lin Huang, Bin Ma
Addressing the data-imbalance problem in kernel-based speaker verification via utterance partitioning and speaker comparison
Wei Rao, Man-Wai Mak
Single-channel head orientation estimation based on discrimination of acoustic transfer function
Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki
Maximum likelihood i-vector space using PCA for speaker verification
Zhenchun Lei, Yingchun Yang
Speaker verification using sparse representations on total variability i-vectors
Ming Li, Xiang Zhang, Yonghong Yan, Shrikanth Narayanan
Robust speaker recognition in non-stationary room environments based on empirical mode decomposition
Taufiq Hasan, John H. L. Hansen
Range based multi microphone array fusion for speaker activity detection in small meetings
Jani Even, Panikos Heracleous, Carlos T. Ishi, Norihiro Hagita
Speaker verification robust to talking style variation using multiple kernel learning based on conditional entropy minimization
Tetsuji Ogawa, Hideitsu Hino, Noboru Murata, Tetsunori Kobayashi
Regularized logistic regression fusion for speaker verification
Ville Hautamäki, Kong Aik Lee, Tomi Kinnunen, Bin Ma, Haizhou Li
A longest matching segment approach with Bayesian adaptation - application to noise-robust speaker recognition
Ayeh Jafari, Ramji Srinivasan, Danny Crookes, Ji Ming
Data selection with kurtosis and nasality features for speaker recognition
Howard Lei, Nikki Mirghafori
Use of the harmonic phase in speaker recognition
Inma Hernáez, Ibon Saratxaga, Jon Sanchez, Eva Navas, Iker Luengo
Jaw movement in vowels and liquids forming the syllable nucleus
Štefan Beňuš, Marianne Pouplier
Coarticulation across prosodic domains in Italian: an ultrasound investigation
Barbara Gili Fivela, Antonio Stella, Sonia D'Apolito, Francesco Sigona
Investigating the stability of intergestural timing relations
Juraj Šimko, Fred Cummins, Štefan Beňuš
Speech timing organization for the phonological length contrast in Italian consonants
Claudio Zmarich, Barbara Gili Fivela, Pascal Perrier, Christophe Savariaux, Graziano Tisato
Timing in Italian VNC sequences at different speech rates
Chiara Celata, Silvia Calamai
Automatic analysis of singleton and geminate consonant articulation using real-time magnetic resonance imaging
Christina Hagedorn, Michael Proctor, Louis Goldstein
A two-stage sample-based phone boundary detector using segmental similarity features
Yih-Ru Wang
Iterative improvement of speaker segmentation in a noisy environment using high-level knowledge
Qiang Huang, Stephen J. Cox
Hierarchical audio segmentation with HMM and factor analysis in broadcast news domain
Diego Castán, Carlos Vaquero, Alfonso Ortega, David Martínez, Jesús Villalba, Eduardo Lleida
Syllable segmentation of continuous speech using auditory attention cues
Ozlem Kalinli
Exploiting phone-class specific landmarks for refinement of segment boundaries in TTS databases
Vijayaditya Peddinti, Kishore Prahallad
Phoneme-level text to audio synchronization on speech signals with background music
Agnès Pedone, Juan José Burred, Simon Maller, Pierre Leveau
Conversational speech transcription using context-dependent deep neural networks
Frank Seide, Gang Li, Dong Yu
Sequential classification criteria for NNs in automatic speech recognition
Guangsen Wang, Khe Chai Sim
Grapheme-based automatic speech recognition using KL-HMM
Mathew Magimai-Doss, Ramya Rasipuram, Guillermo Aradilla, Hervé Bourlard
Direct error rate minimization of hidden Markov models
Joseph Keshet, Chih-Chieh Cheng, Mark Stoehr, David McAllester
On the effectiveness of statistical modeling based template matching approach for continuous speech recognition
Xie Sun, Xin Chen, Yunxin Zhao
Comparison of smoothing techniques for robust context dependent acoustic modelling in hybrid NN/HMM systems
Guangsen Wang, Khe Chai Sim
Generalized Baum-Welch algorithm and its implication to a new extended Baum-Welch algorithm
Roger Hsiao, Tanja Schultz
Word boundary modelling and full covariance Gaussians for Arabic speech-to-text systems
F. Diehl, M. J. F. Gales, X. Liu, M. Tomalin, P. C. Woodland
A fully automated derivation of state-based eigentriphones for triphone modeling with no tied states using regularization
Tom Ko, Brian Mak
Reducing computational complexities of exemplar-based sparse representations with applications to large vocabulary speech recognition
Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky
An i-vector based approach to training data clustering for improved speech recognition
Yu Zhang, Jian Xu, Zhi-Jie Yan, Qiang Huo
Rapid training of acoustic models using graphics processing unit
Senaka Buthpitiya, Ian Lane, Jike Chong
Semi-automatic acoustic model generation from large unsynchronized audio and text chunks
Michele Alessandrini, Giorgio Biagetti, Alessandro Curzi, Claudio Turchetti
Unsupervised testing strategies for ASR
Brian Strope, Doug Beeferman, Alexander Gruenstein, Xin Lei
Acoustic model training with detecting transcription errors in the training data
Gakuto Kurata, Nobuyasu Itoh, Masafumi Nishimura
Towards unsupervised training of speaker independent acoustic models
Aren Jansen, Kenneth Church
Acoustic modeling with bootstrap and restructuring based on full covariance
Xiaodong Cui, Xin Chen, Jian Xue, Peder A. Olsen, John R. Hershey, Bowen Zhou
An i-vector based approach to acoustic sniffing for irrelevant variability normalization based acoustic model training and speech recognition
Jian Xu, Yu Zhang, Zhi-Jie Yan, Qiang Huo
Log-linear optimization of second-order polynomial features with subsequent dimension reduction for speech recognition
Muhammad Ali Tahir, Ralf Schlüter, Hermann Ney
Genre categorization and modeling for broadcast speech transcription
Qingqing Zhang, Lori Lamel, Jean-Luc Gauvain
Individual error minimization learning framework and its applications to speech recognition and utterance verification
Sunghwan Shin, Ho-Young Jung, Biing-Hwang Juang
Effective triphone mapping for acoustic modeling in speech recognition
Sakhia Darjaa, Miloš Cerňak, Marián Trnka, Milan Rusko, Róbert Sabo
Analysis of dialectal influence in pan-Arabic ASR
Udhyakumar Nallasamy, Michael Garbus, Florian Metze, Qin Jin, Thomas Schaaf, Tanja Schultz
Connected digit recognition by means of reservoir computing
Azarakhsh Jalalvand, Fabian Triefenbach, David Verstraeten, Jean-Pierre Martens
Large margin - minimum classification error using sum of shifted sigmoids as the loss function
Madhavi V. Ratnagiri, Biing-Hwang Juang, Lawrence Rabiner
Representing phonological features through a two-level finite state model
Javier M. Olaso, M. Inés Torres, Raquel Justo
Optimization of the Gaussian mixture model evaluation on GPU
Jan Vaněk, Jan Trmal, Josef V. Psutka, Josef Psutka
Propagation of uncertainty through multilayer perceptrons for robust automatic speech recognition
Ramón Fernandez Astudillo, João Paulo da Silva Neto
Mapping sparse representation to state likelihoods in noise-robust automatic speech recognition
Katariina Mahkonen, Antti Hurmalainen, Tuomas Virtanen, Jort F. Gemmeke
Uncertainty measures for improving exemplar-based source separation
Heikki Kallasjoki, Ulpu Remes, Jort F. Gemmeke, Tuomas Virtanen, Kalle J. Palomäki
Maximum confidence measure based interaural phase difference estimation for noise masking in dual-microphone robust speech recognition
Hsien-Cheng Liao, Yuan-Fu Liao, Chin-Hui Lee
A performance monitoring approach to fusing enhanced spectrogram channels in robust speech recognition
Shirin Badiezadegan, Richard Rose
Generalized variable parameter HMMs for noise robust speech recognition
Ning Cheng, X. Liu, Lan Wang
Sinusoidal approach for the single-channel speech separation and recognition challenge
P. Mowlaee, R. Saeidi, Zheng-Hua Tan, M. G. Christensen, Tomi Kinnunen, P. Fränti, S. H. Jensen
Semi-supervised single-channel speech-music separation for automatic speech recognition
Cemil Demir, A. Taylan Cemgil, Murat Saraçlar
A level-dependent auditory filter-bank for speech recognition in reverberant environments
HariKrishna Maganti, Marco Matassoni
A multichannel feature-based processing for robust speech recognition
Mehrez Souden, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani
Feature normalization using structured full transforms for robust speech recognition
Xiong Xiao, Jinyu Li, Eng Siong Chng, Haizhou Li
A robust estimation method of noise mixture model for noise suppression
Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani
A versatile Gaussian splitting approach to non-linear state estimation and its application to noise-robust ASR
Volker Leutnant, Alexander Krueger, Reinhold Haeb-Umbach
Generalized-log spectral mean normalization for speech recognition
Hilman F. Pardede, Koichi Shinoda
Zero-crossing-based channel attentive weighting of cepstral features for robust speech recognition: the ETRI 2011 CHiME challenge system
Young-Ik Kim, Hoon-Young Cho, Sang-Hun Kim
Feature compensation for speech recognition in severely adverse environments due to background noise and channel distortion
Wooil Kim, John H. L. Hansen
Binaural cues for fragment-based speech recognition in reverberant multisource environments
Ning Ma, Jon Barker, Heidi Christensen, Phil D. Green
Sub-band level histogram equalization for robust speech recognition
Vikas Joshi, Raghavendra Bilgi, S. Umesh, L. Garcia, C. Benitez
GMM-based missing-feature reconstruction on multi-frame windows
Ulpu Remes, Yoshihiko Nankaku, Keiichi Tokuda
Improvements of a dual-input DBN for noise robust ASR
Yang Sun, Jort F. Gemmeke, Bert Cranen, Louis ten Bosch, Lou Boves
Denoising using optimized wavelet filtering for automatic speech recognition
Randy Gomez, Tatsuya Kawahara
Noise robust speaker-independent speech recognition with invariant-integration features using power-bias subtraction
Florian Müller, Alfred Mertins
Novel VTEO based mel cepstral features for classification of normal and pathological voices
Hemant A. Patil, Pallavi N. Baljekar
Temporal performance of dysarthric patients in speech and tapping tasks
Eiji Shimura, Kazuhiko Kakehi
A comparative acoustic study on speech of glossectomy patients and normal subjects
Xinhui Zhou, Maureen Stone, Carol Y. Espy-Wilson
Dysperiodicity analysis of perceptually assessed synthetic speech stimuli
Ali Alpan, Francis Grenez, Jean Schoentgen
Is the perception of voice quality language-dependent? A comparison of French and Italian listeners and dysphonic speakers
Alain Ghio, Frédérique Weisz, Giovanna Baracca, Giovanna Cantarella, Danièle Robert, Virginie Woisard, Franco Fussi, Antoine Giovanni
Automatic selection of acoustic and non-linear dynamic features in voice signals for hypernasality detection
J. R. Orozco-Arroyave, S. Murillo-Rendón, A. M. Álvarez-Meza, J. D. Arias-Londoño, E. Delgado-Trejos, J. F. Vargas-Bonilla, C. G. Castellanos-Domínguez
Learning from mistakes: expanding pronunciation lexicons using word recognition errors
Sravana Reddy, Evandro Gouvêa
Improving non-native ASR through stochastic multilingual phoneme space transformations
David Imseng, Hervé Bourlard, John Dines, Philip N. Garner, Mathew Magimai-Doss
Unsupervised Arabic dialect adaptation with self-training
Scott Novotney, Rich Schwartz, Sanjeev Khudanpur
Template-based automatic speech recognition meets prosody
Dino Seppi, Kris Demuynck, Dirk Van Compernolle
Pronunciation learning from continuous speech
Ibrahim Badr, Ian McGraw, James Glass
State-level data borrowing for low-resource speech recognition based on subspace GMMs
Yanmin Qian, Daniel Povey, Jia Liu
Blind speech separation in multiple environments using a frequency oriented PCA method for convolutive mixtures
Y. Benabderrahmane, Sid-Ahmed Selouani, Douglas O'Shaughnessy
Blind speech separation in time-domain using block-Toeplitz structure of reconstructed signal matrices
Zbyněk Koldovský, Jiří Málek, Petr Tichavský
Generalized method for solving the permutation problem in frequency-domain blind source separation of convolved speech signals
Auxiliadora Sarmiento, Iván Durán, Sergio Cruces, Pablo Aguilera
Adaptation of speaker-specific bases in non-negative matrix factorization for single channel speech-music separation
Emad M. Grais, Hakan Erdogan
An informed source separation system for speech signals
Shuhua Zhang, Laurent Girin
Adaptive blocking beamformer for speech separation
Ngoc Thuy Tran, William Cowley, André Pollok
Asynchronous multimodal text entry using speech and gesture keyboards
Per Ola Kristensson, Keith Vertanen
Robust bimodal person identification using face and speech with limited training data and corruption of both modalities
Niall McLaughlin, Ji Ming, Danny Crookes
Toward a multi-speaker visual articulatory feedback system
Atef Ben Youssef, Thomas Hueber, Pierre Badin, Gérard Bailly
Statistical mapping between articulatory and acoustic data for an ultrasound-based silent speech interface
Thomas Hueber, Elie-Laurent Benaroya, Bruce Denby, Gérard Chollet
Unsupervised geometry calibration of acoustic sensor networks using source correspondences
Joerg Schmalenstroeer, Florian Jacob, Reinhold Haeb-Umbach, Marius H. Hennecke, Gernot A. Fink
Investigations on speaking mode discrepancies in EMG-based speech recognition
Michael Wand, Matthias Janke, Tanja Schultz
Empirical evaluation and combination of advanced language modeling techniques
Tomáš Mikolov, Anoop Deoras, Stefan Kombrink, Lukáš Burget, Jan Černocký
Personalizing Model M for voice-search
Geoffrey Zweig, Shuangyu Chang
Sentence selection by direct likelihood maximization for language model adaptation
Takahiro Shinozaki, Yu Kubota, Sadaoki Furui, Eiji Utsunomiya, Yasutaka Shindoh
Feature combination approaches for discriminative language models
Ebru Arısoy, Bhuvana Ramabhadran, Hong-Kwang Jeff Kuo
On-line language model biasing for multi-pass automatic speech recognition
Sankaranarayanan Ananthakrishnan, Stavros Tsakalidis, Rohit Prasad, Premkumar Natarajan
Mandarin word-character hybrid-input neural network language model
Moonyoung Kang, Tim Ng, Long Nguyen
Unary data structures for language models
Jeffrey Sorensen, Cyril Allauzen
Bayesian language model interpolation for mobile speech input
Cyril Allauzen, Michael Riley
On the estimation of discount parameters for language model smoothing
Martin Sundermeyer, Ralf Schlüter, Hermann Ney
N-grams for conditional random fields or a failure-transition(ϕ) posterior for acyclic FSTs
Patrick Lehnen, Stefan Hahn, Hermann Ney
Hybrid language models using mixed types of sub-lexical units for open vocabulary German LVCSR
M. Ali Basha Shaik, Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney
Morpheme based factored language models for German LVCSR
Amr El-Desoky Mousa, M. Ali Basha Shaik, Ralf Schlüter, Hermann Ney
Compound word recombination for German LVCSR
Markus Nußbaum-Thom, Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney
Lattice-based risk minimization training for unsupervised language model adaptation
Akio Kobayashi, Takahiro Oku, Shinichi Homma, Toru Imai, Seiichi Nakagawa
Similarity language model
Christian Gillot, Christophe Cerisara
Data sampling and dimensionality reduction approaches for reranking ASR outputs using discriminative language models
Erinç Dikici, Murat Semerci, Murat Saraçlar, Ethem Alpaydın
Training a language model using webdata for large vocabulary Japanese spontaneous speech recognition
Ryo Masumura, Seongjun Hahm, Akinori Ito
Large vocabulary SOUL neural network language models
Hai-Son Le, Ilya Oparin, Abdel Messaoudi, Alexandre Allauzen, Jean-Luc Gauvain, François Yvon
Improved spoken query transcription using co-occurrence information
Jonathan Mamou, Abhinav Sethy, Bhuvana Ramabhadran, Ron Hoory, Paul Vozila
Unsupervised latent speaker language modeling
Yik-Cheung Tam, Paul Vozila
Laryngealization and breathiness in Persian
Vahid Sadeghi
Age-dependent differences in the neutralization of the intervocalic voicing contrast: evidence from an apparent-time study on East Franconian
Viola Müller, Jonathan Harrington, Felicitas Kleber, Ulrich Reubold
Comparing syllable frequencies in corpora of written and spoken language
Barbara Samlowski, Bernd Möbius, Petra Wagner
Sylli: automatic phonological syllabification for Italian
Luca Iacoponi, Renata Savy
A preliminary study on the production of signs in Brazilian Sign Language when one of the manual articulators is unavailable
André N. Xavier, Plínio A. Barbosa
Electroglottograph and acoustic cues for phonation contrasts in Taiwan Min falling tones
Ho-hsien Pan, Mao-Hsu Chen, Shao-Ren Lyu
One-to-many voice conversion based on tensor representation of speaker space
Daisuke Saito, Keisuke Yamamoto, Nobuaki Minematsu, Keikichi Hirose
A study on bag of Gaussian model with application to voice conversion
Yu Qiao, Tong Tong, Nobuaki Minematsu
A Bayesian approach to voice conversion based on GMMs using multiple model structures
Lei Li, Yoshihiko Nankaku, Keiichi Tokuda
Quality improvement of voice conversion systems based on trellis structured vector quantization
Mahdi Eslami, Hamid Sheikhzadeh, Abolghasem Sayadiyan
Voice conversion using GMM with enhanced global variance
Hadas Benisty, David Malah
Spectral envelope transformation using DFW and amplitude scaling for voice conversion with parallel or nonparallel corpora
Elizabeth Godoy, Olivier Rosec, Thierry Chonavel
Multi-task learning for spoken language understanding with shared slots
Xiao Li, Ye-Yi Wang, Gokhan Tur
Learning weighted entity lists from web click logs for spoken language understanding
Dustin Hillard, Asli Celikyilmaz, Dilek Hakkani-Tür, Gokhan Tur
Bootstrapping domain detection using query click logs for new domains
Dilek Hakkani-Tür, Gokhan Tur, Larry Heck, Elizabeth Shriberg
Approximate inference for domain detection in spoken language understanding
Asli Celikyilmaz, Dilek Hakkani-Tür, Gokhan Tur
Speech indexing using semantic context inference
Chien-Lin Huang, Bin Ma, Haizhou Li, Chung-Hsien Wu
Automatically optimizing utterance classification performance without human in the loop
Yun-Cheng Ju, Jasha Droppo
In search of cues discriminating West-African accents in French
Philippe Boula de Mareüil, Jean-Luc Rouas, Manuela Yapomo
Computer and human recognition of regional accents of British English
Abualsoud Hanani, Martin Russell, Michael J. Carey
Target-aware lattice rescoring for dialect recognition
Rong Tong, Bin Ma, Haizhou Li, Eng Siong Chng
Effective Arabic dialect classification using diverse phonotactic models
Murat Akbacak, Dimitra Vergyri, Andreas Stolcke, Nicolas Scheffer, Arindam Mandal
Characterizing deletion transformations across dialects using a sophisticated tying mechanism
Nancy F. Chen, Wade Shen, Joseph P. Campbell
Dialect and accent recognition using phonetic-segmentation supervectors
Fadi Biadsy, Julia Hirschberg, Daniel P. W. Ellis
The multi timescale phoneme acquisition model of the self-organizing based on the dynamic features
Kouki Miyazawa, Hideaki Miura, Hideaki Kikuchi, Reiko Mazuka
The time-course of talker-specificity effects for newly-learned pseudowords: evidence for a hybrid model of lexical representation
Helen Brown, M. Gareth Gaskell
A parametric approach to intonation acquisition research: validation on child-directed speech data
Britta Lintfert, Antje Schweitzer, Bernd Möbius
Modelling novelty preference in word learning
Maarten Versteegh, Louis ten Bosch, Lou Boves
Using imitation to learn infant-adult acoustic mappings
G. Ananthakrishnan, Giampiero Salvi
Thresholding word activations for response scoring - modelling psycholinguistic data
Christina Bergmann, Louis ten Bosch, Lou Boves
User study of spoken decision support system
Teruhisa Misu, Kiyonori Ohtake, Chiori Hori, Hisashi Kawai, Satoshi Nakamura
Efficient probabilistic tracking of user goal and dialog history for spoken dialog systems
Antoine Raux, Yi Ma
Tackling a shilly-shally classifier for predicting task success in spoken dialogue interaction
Alexander Schmitt, Alexander Zgorzelski, Wolfgang Minker
Evaluation of listening-oriented dialogue control rules based on the analysis of HMMs
Toyomi Meguro, Yasuhiro Minami, Ryuichiro Higashinaka, Kohji Dohsaka
Large-scale experiments on data-driven design of commercial spoken dialog systems
D. Suendermann, J. Liscombe, J. Bloom, G. Li, Roberto Pieraccini
Comparing system-driven and free dialogue in in-vehicle interaction
Fredrik Kronlid, Jessica Villing, Alexander Berman, Staffan Larsson
Optimizing situated dialogue management in unknown environments
Heriberto Cuayáhuitl, Nina Dethlefs
Acoustic-similarity based technique to improve concept recognition
Om D. Deshmukh, Shajith Ikbal, Ashish Verma, Etienne Marcheret
Dialog methods for improved alphanumeric string capture
Doug Peters, Peter Stubley
Detecting the status of a predictive incremental speech understanding model for real-time decision-making in a spoken dialogue system
David DeVault, Kenji Sagae, David Traum
User simulation in dialogue systems using inverse reinforcement learning
Senthilkumar Chandramohan, Matthieu Geist, Fabrice Lefèvre, Olivier Pietquin
Lossless value directed compression of complex user goal states for statistical spoken dialogue systems
Paul A. Crook, Oliver Lemon
Rapid evaluation of speech representations for spoken term discovery
Michael A. Carlin, Samuel Thomas, Aren Jansen, Hynek Hermansky
Phonemic similarity metrics to compare pronunciation methods
Ben Hixon, Eric Schneider, Susan L. Epstein
Investigating the effect of number of interlocutors on the quality of experience for multi-party audio conferencing
Janto Skowronek, Alexander Raake
On development of consistently punctuated speech corpora
Jáchym Kolář, Lori Lamel
A multimodal real-time MRI articulatory corpus for speech research
Shrikanth Narayanan, Erik Bresch, Prasanta Kumar Ghosh, Louis Goldstein, Athanasios Katsamanis, Yoon Kim, Adam Lammert, Michael Proctor, Vikram Ramanarayanan, Yinghua Zhu
Building an audio-visual corpus of Australian English: large corpus collection with an economical portable and replicable black box
Denis Burnham, Dominique Estival, Steven Fazio, Jette Viethen, Felicity Cox, Robert Dale, Steve Cassidy, Julien Epps, Roberto Togneri, Michael Wagner, Yuko Kinoshita, Roland Göcke, Joanne Arciuli, Marc Onslow, Trent Lewis, Andrew Butcher, John Hajek
Measurement of objective intelligibility of Japanese accented English using ERJ (English read by Japanese) database
Nobuaki Minematsu, Koji Okabe, Keisuke Ogaki, Keikichi Hirose
From single-call to multi-call quality: a study on long-term quality integration in audio-visual speech communication
Sebastian Möller, Chihuy Bang, Teele Tamme, Markus Vaalgamaa, Benjamin Weiss
Optimal selection of limited vocabulary speech corpora
Hui Lin, Jeff Bilmes
Open source multi-language audio database for spoken language processing applications
Stephen A. Zahorian, Jiang Wu, Montri Karnjanadecha, Chandra Sekhar Vootkuri, Brian Wong, Andrew Hwang, Eldar Tokhtamyshev
The USC CARE corpus: child-psychologist interactions of children with autism spectrum disorders
Matthew P. Black, Daniel Bone, Marian E. Williams, Phillip Gorrindo, Pat Levitt, Shrikanth Narayanan
Towards a versatile multi-layered description of speech corpora using algebraic relations
Nelly Barbot, Vincent Barreaud, Olivier Boëffard, Laure Charonnat, Arnaud Delhay, Sébastien Le Maguer, Damien Lolive
Announcing the electromagnetic articulography (day 1) subset of the mngu0 articulatory corpus
Korin Richmond, Phil Hoole, Simon King
A pitch tracking corpus with evaluation on multipitch tracking scenario
Gregor Pirker, Michael Wohlmayr, Stefan Petrik, Franz Pernkopf
On building and evaluating a broadcast-news audio segmentation system
Taras Butko, Climent Nadeu
Time- and acoustic-mediated alignment algorithms for speech recognition evaluation
Simon Dobrišek, France Mihelič
Effects of shortening speech prompts of in-car voice user interfaces on users' mental models
Julia Niemann, Kati Schulz, Ina Wechsung
Speech transcript evaluation for information retrieval
Laurens van der Werff, Wessel Kraaij, Franciska de Jong
The Albayzin 2010 language recognition evaluation
Luis Javier Rodriguez-Fuentes, Mikel Penagarikano, Amparo Varona, Mireia Diez, Germán Bordel
Progress and prospects for speech technology: results from three sexennial surveys
Roger K. Moore
Painless WFST cascade construction for LVCSR - Transducersaurus
Josef R. Novak, Nobuaki Minematsu, Keikichi Hirose
Data-driven UBM generation via tied Gaussians for GMM-supervector based accent identification
Rong Zheng, Ce Zhang, Bo Xu
I3A language recognition system for Albayzin 2010 LRE
David Martínez, Jesús Villalba, Antonio Miguel, Alfonso Ortega, Eduardo Lleida
Dimensionality reduction for using high-order n-grams in SVM-based phonotactic language recognition
Mikel Penagarikano, Amparo Varona, Luis Javier Rodriguez-Fuentes, Germán Bordel
Language recognition via i-vectors and dimensionality reduction
Najim Dehak, Pedro A. Torres-Carrasquillo, Douglas Reynolds, Reda Dehak
Language recognition in ivectors space
David Martínez, Oldřich Plchot, Lukáš Burget, Ondřej Glembek, Pavel Matějka
On mispronunciation lexicon generation using joint-sequence multigrams in computer-aided pronunciation training (CAPT)
Xiaojun Qian, Helen Meng, Frank K. Soong
Validating a second language perception model for classroom context - a longitudinal study within the perceptual assimilation model
Bianca Sisinni, Mirko Grimaldi
The role of variability in non-native perceptual learning of a Japanese geminate-singleton fricative contrast
Makiko Sadakata, James M. McQueen
Fluency changes with general progress in L2 proficiency
Jared Bernstein, Jian Cheng, Masanori Suzuki
Tongue gestures awareness and pronunciation training
Slim Ouni
Impact of speaker variability on speech perception in non-native listeners
Wim A. van Dommelen, Valerie Hazan
Acquisition of timing patterns in second language
Mikhail Ordin, Leona Polyanskaya, Christiane Ulbrich
Context-dependent duration modeling with backoff strategy and look-up tables for pronunciation assessment and mispronunciation detection
Hongyan Li, Shen Huang, Shijin Wang, Bo Xu
Perceptual training of vowel length contrast of Japanese by L2 listeners: effects of an isolated word versus a word embedded in sentences
Mee Sonu, Keiichi Tajima, Hiroaki Kato, Yoshinori Sagisaka
Similar vowels in L1/L2 production: confused or discerned in early L2 English learners with different amount of exposure
E-Chin Wu
Production and perception of Estonian vowels by native and non-native speakers
Lya Meister, Einar Meister
New feature parameters for pronunciation evaluation in English presentations at international conferences
Hiroshi Kibishi, Seiichi Nakagawa
Synchronous reading: learning French orthography by audiovisual training
Gérard Bailly, Will Barbour
Phoneme level non-native pronunciation analysis by an auditory model-based native assessment scheme
Christos Koniaris, Olov Engwall
The open front vowel /æ/ in the production and perception of Czech students of English
Pavel Šturm, Radek Skarnitzl
Error selection for ASR-based English pronunciation training in 'My Pronunciation Coach'
Catia Cucchiarini, Henk van den Heuvel, Eric Sanders, Helmer Strik
An experimental analysis of pitch patterns in Japanese speakers of English with verification by speech re-synthesis
Tomoko Nariai, Kazuyo Tanaka
An analysis of word duration in native speakers and Japanese speakers of English
Tomoko Nariai, Kazuyo Tanaka, Yoshiaki Ito
A template based voice trigger system using Bhattacharyya edit distance
Evelyn Kurniawati, Samsudin Ng, Karthik Muralidhar, Sapna George
Acoustic look-ahead for more efficient decoding in LVCSR
D. Nolden, Ralf Schlüter, Hermann Ney
A new epsilon filter for efficient composition of weighted finite-state transducers
Frank Duckhorn, Matthias Wolff, Rüdiger Hoffmann
A bottom-up stepwise knowledge-integration approach to large vocabulary continuous speech recognition using weighted finite state machines
Sabato Marco Siniscalchi, Torbjørn Svendsen, Chin-Hui Lee
Combining information sources for confidence estimation with CRF models
M. S. Seigel, P. C. Woodland
Evaluation of fast spoken term detection using a suffix array
Kouichi Katsurada, Shinta Sawada, Shigeki Teshima, Yurie Iribe, Tsuneo Nitta
Event selection from phone posteriorgrams using matched filters
Keith Kintzley, Aren Jansen, Hynek Hermansky
A piecewise aggregate approximation lower-bound estimate for posteriorgram-based dynamic time warping
Yaodong Zhang, James Glass
OOV detection and recovery using hybrid models with different fragments
Long Qin, Ming Sun, Alexander Rudnicky
AUC optimization based confidence measure for keyword spotting
Haiyang Li, Jiqing Han, Tieran Zheng
An empirical study of multilingual spoken term detection
Zejun Ma, Xiaorui Wang, Bo Xu
Fusing multiple confidence measures for Chinese spoken term detection
Zejun Ma, Xiaorui Wang, Bo Xu
Response probability based decoding algorithm for large vocabulary continuous speech recognition
Zhanlei Yang, Hao Chao, Wenju Liu
Combining lattice-based language dependent and independent approaches for out-of-language detection in LVCSR
Yuxiang Shan, Yan Deng, Jia Liu
Evaluation of tree-trellis based decoding in over-million LVCSR
Naoaki Ito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Lattice based discriminative model combination using automatically induced phonetic contexts
Hao Huang, Bing Hu Li
Predicting human perceived accuracy of ASR systems
Taniya Mishra, Andrej Ljolje, Mazin Gilbert
Cross-lingual study of ASR errors: on the role of the context in human perception of near-homophones
I. Vasilescu, D. Yahia, N. Snoeren, Martine Adda-Decker, Lori Lamel
Performance prediction of speech recognition using average-voice-based speech synthesis
Tatsuhiko Saito, Takashi Nose, Takao Kobayashi, Yohei Okato, Akio Horii
Confidence measures for Turkish call center conversations
Ali Haznedaroglu, Levent M. Arslan
Spoken document confidence estimation using contextual coherence
Taichi Asami, Narichika Nomoto, Satoshi Kobashikawa, Yoshikazu Yamaguchi, Hirokazu Masataki, Satoshi Takahashi
Latent topic modeling for audio corpus summarization
Timothy J. Hazen
Investigation of spontaneous speech characterization applied to speaker role recognition
Richard Dufour, Yannick Estève, Paul Deléglise
Zero-resource audio-only spoken term detection based on a combination of template matching techniques
Armando Muscariello, Guillaume Gravier, Frédéric Bimbot
Automatic learning in content indexing service using phonetic alignment
Yeon-Jun Kim, David C. Gibbon
Leveraging relevance cues for improved spoken document retrieval
Pei-Ning Chen, Kuan-Yu Chen, Berlin Chen
Spoken lecture summarization by random walk over a graph constructed with automatically extracted key terms
Yun-Nung Chen, Yu Huang, Ching-Feng Yeh, Lin-shan Lee
Topic segmentation of TV-streams by mathematical morphology and vectorization
Vincent Claveau, Sébastien Lefèvre
Probabilistic latent semantic analysis for broadcast news story segmentation
Mimi Lu, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li
Hybrid speech recognition for voice search: a comparative study
Evandro Gouvêa
A new phonetic candidate generator for improving search query efficiency
Bo Peng, Yao Qian, Frank K. Soong, Bo Zhang
Towards voice-input symbolic pattern retrieval using parameter-based search
Yukiko Suzuki, Kiyoaki Aikawa
A language independent approach to audio search
Vikram Gupta, Jitendra Ajmera, Arun Kumar, Ashish Verma
Speaker diarization using a priori acoustic information
Hagai Aronowitz
Improved overlapped speech handling for speaker diarization
Kofi Boakye, Oriol Vinyals, Gerald Friedland
Exploiting intra-conversation variability for speaker diarization
Stephen Shum, Najim Dehak, Ekapol Chuangsuwanich, Douglas Reynolds, James Glass
Speaker clustering based on non-negative matrix factorization
Masafumi Nishida, Seiichi Yamamoto
Information bottleneck features for HMM/GMM speaker diarization of meetings recordings
Sree Harsha Yella, Fabio Valente
Cross likelihood ratio based speaker clustering using eigenvoice models
D. Wang, Robbie Vogt, Sridha Sridharan, David Dean
Prosodic and phonetic features for speaker clustering in speaker diarization systems
Janez Žibert, France Mihelič
Diarization-based speaker retrieval for broadcast television archives
Marijn Huijbregts, David A. van Leeuwen
The detection of overlapping speech with prosodic features for speaker diarization
Martin Zelenák, Javier Hernando
LP residual features for robust, privacy-sensitive speaker diarization
Sree Hari Krishnan Parthasarathi, Hervé Bourlard, Daniel Gatica-Perez
Extending the task of diarization to speaker attribution
Houman Ghaemmaghami, David Dean, Robbie Vogt, Sridha Sridharan
Comparing multi-stage approaches for cross-show speaker diarization
Viet-Anh Tran, Viet Bac Le, Claude Barras, Lori Lamel
A quantitative investigation of the prosody of verum focus in Italian
Giuseppina Turco, Michele Gubian, Jessamyn Schertz
Effects of focus on f_0 and duration in Irish (Gaelic) declaratives
Amelie Dorn, Ailbhe Ní Chasaide
The phonology and phonetics of perceived prosody: what do listeners imitate?
Jennifer Cole, Stefanie Shattuck-Hufnagel
Uncovering the effect of imitation on tonal patterns of French accentual phrases
Amandine Michelas, Noël Nguyen
Crossmodal prosodic and gestural contribution to the perception of contrastive focus
Pilar Prieto, Cecilia Pugliesi, Joan Borràs-Comes, Ernesto Arroyo, Josep Blat
Temporal relationship between auditory and visual prosodic cues
Erin Cvejic, Jeesun Kim, Chris Davis
Analysing the correspondence between automatic prosodic segmentation and syntactic structure
György Szaszák, Katalin Nagy, András Beke
Long-distance rhythmic dependencies and their application to automatic language identification
Joseph Tepperman, Emily Nava
Symbolic and direct sequential modeling of prosody for classification of speaking-style and nativeness
Andrew Rosenberg
Prosodic analysis and perception of Mandarin utterances conveying attitudes
Wentao Gu, Ting Zhang, Hiroya Fujisaki
Predicting Taiwan Mandarin tone shapes from their duration
Chierh Cheng, Michele Gubian
Variation of accent type and of context - influences on pragmatic focus interpretation
Charlotte Wollermann, Ulrich Schade, Bernhard Schröder
New methods for template selection and compression in continuous speech recognition
Xie Sun, Yunxin Zhao
Structured support vector machines for noise robust continuous speech recognition
Shi-Xiong Zhang, M. J. F. Gales
Continuous digits recognition leveraging invariant structure
Masayuki Suzuki, Gakuto Kurata, Masafumi Nishimura, Nobuaki Minematsu
Convergence of line search a-function methods
Dimitri Kanevsky, David Nahamoo, Tara N. Sainath, Bhuvana Ramabhadran
Hidden boosted MMI and hierarchical state posterior feature for automatic speech recognition based on hidden conditional neural fields
Yasuhisa Fujii, Kazumasa Yamamoto, Seiichi Nakagawa
Recognition and real time performances of a lightweight ultrasound based silent speech interface employing a language model
Jun Cai, Bruce Denby, Pierre Roussel, Gérard Dreyfus, Lise Crevier-Buchman
Model adaptation for automatic speech recognition based on multiple time scale evolution
Shinji Watanabe, Atsushi Nakamura, Biing-Hwang Juang
Integrated online speaker clustering and adaptation
C. Breslin, K. K. Chin, M. J. F. Gales, Kate Knill
A study on speaker normalized MLP features in LVCSR
Zoltán Tüske, Christian Plahl, Ralf Schlüter
Matrix-variate distribution of training models for robust speaker adaptation
Yongwon Jeong, Young Kuk Kim
Separating speaker and environmental variability using factored transforms
Michael L. Seltzer, Alex Acero
Your mobile virtual assistant just got smarter!
Mazin Gilbert, Iker Arizmendi, Enrico Bocchieri, Diamantino Caseiro, Vincent Goffin, Andrej Ljolje, Mike Phillips, Chao Wang, Jay Wilpon
Evaluating artificial bandwidth extension by conversational tests in car using mobile devices with integrated hands-free functionality
Laura Laaksonen, Ville Myllylä, Riitta Niemistö
Low-frequency bandwidth extension of telephone speech using sinusoidal synthesis and Gaussian mixture model
Hannu Pulakka, Ulpu Remes, Santeri Yrttiaho, Kalle J. Palomäki, Mikko Kurimo, Paavo Alku
Memory-based approximation of the Gaussian mixture model framework for bandwidth extension of narrowband speech
Amr H. Nour-Eldin, Peter Kabal
Speech enhancement by reconstruction from cleaned acoustic features
Philip Harding, Ben Milner
A soft decision-based speech enhancement using acoustic noise classification
Jae-Hun Choi, Sang-Kyun Kim, Joon-Hyuk Chang
A noise estimation method based on speech presence probability and spectral sparseness
Chao Li, Wenju Liu
Improved a posteriori speech presence probability estimation based on cepstro-temporal smoothing and time-frequency correlation
Chao Li, Wenju Liu
A rapid adaptation algorithm for tracking highly non-stationary noises based on Bayesian inference for on-line spectral change point detection
Md Foezur Rahman Chowdhury, Sid-Ahmed Selouani, Douglas O'Shaughnessy
Single channel speech enhancement using MMSE estimation of short-time modulation magnitude spectrum
Kuldip Paliwal, Belinda Schwerin, Kamil Wójcicki
Speech enhancement using masking properties in adverse environments
Atanu Saha, Tetsuya Shimamura
Phoneme-dependent NMF for speech enhancement in monaural mixtures
Bhiksha Raj, Rita Singh, Tuomas Virtanen
Kernel PCA for speech enhancement
Christina Leitner, Franz Pernkopf, Gernot Kubin
Objective intelligibility prediction of speech by combining correlation and distortion based techniques
Angel M. Gomez, Belinda Schwerin, Kuldip Paliwal
Multi-view approach for speaker turn role labeling in TV broadcast news shows
Géraldine Damnati, Delphine Charlet
Evaluation of an integrated authoring tool for building advanced question-answering characters
Sudeep Gandhe, Michael Rushforth, Priti Aggarwal, David Traum
Towards unsupervised spoken language understanding: exploiting query click logs for slot filling
Gokhan Tur, Dilek Hakkani-Tür, Dustin Hillard, Asli Celikyilmaz
Web-enhanced content retrieval for information access dialogue system
Donghyeon Lee, Cheongjae Lee, Minwoo Jeong, Kyungduk Kim, Seokhwan Kim, Junhwi Choi, Gary Geunbae Lee
Uncertainty management for on-line optimisation of a POMDP-based large-scale spoken dialogue system
Lucie Daubigney, Milica Gašić, Senthilkumar Chandramohan, Matthieu Geist, Olivier Pietquin, Steve Young
Detection of task-incomplete dialogs based on utterance-and-behavior tag n-gram for spoken dialog systems
Sunao Hara, Norihide Kitaoka, Kazuya Takeda
Shrinkage-based features for natural language call routing
Ruhi Sarikaya, Stanley F. Chen, Bhuvana Ramabhadran
Clustering with modified cosine distance learned from constraints
Leonid Rachevsky, Dimitri Kanevsky, Ruhi Sarikaya, Bhuvana Ramabhadran
Using speaker ID to discover repeat callers of a spoken dialog system
Andrew Fandrianto, Brian Langner, Alan W. Black
Semantic graph clustering for POMDP-based spoken dialog systems
Florian Pinault, Fabrice Lefèvre
Learning place-names from spoken utterances and localization results by mobile robot
Ryo Taguchi, Yuji Yamada, Koosuke Hattori, Taizo Umezaki, Masahiro Hoguro, Naoto Iwahashi, Kotaro Funakoshi, Mikio Nakano
Active learning for dialogue act classification
Björn Gambäck, Fredrik Olsson, Oscar Täckström
Speaker role recognition using question detection and characterization
Thierry Bazillon, Benjamin Maza, Michael Rouvier, Frederic Bechet, Alexis Nasr
Learning score structure from spoken language for a tennis game
Qiang Huang, Stephen J. Cox
Semi-automated classifier adaptation for natural language call routing
Silke M. Witt
Interactional style detection for versatile dialogue response using prosodic and semantic features
Wei-Bin Liang, Chung-Hsien Wu, Chih-Hung Wang, Jhing-Fa Wang
Quality aspects of multimodal dialog systems: identity, stimulation and success
Christine Kühnel, Benjamin Weiss, Matthias Schulz, Sebastian Möller
Where should pitch accents and phrase breaks go? a syntax tree transducer solution
Joseph Tepperman, Emily Nava
Phrasal prominences do not need pitch movements: postfocal phrasal heads in Italian
Giuliano Bocci, Cinzia Avesani
Intonation of left dislocated topics in Modern Greek
David Le Gac, Hiyon Yoo
Phrases, pitch and perceived prominence in Māori
Laura Thompson, Catherine I. Watson, Ray Harlow, Jeanette King, Margaret Maclagan, Helen Charters, Peter Keegan
Perceptual sensitivity to prenuclear and nuclear intonational patterns
Tomáš Duběda
Tonal alignment defined: the case of Southern Irish English
Raya Kalaldeh
Using mutual information to identify regions of analysis for prosodic analysis
Andrew Rosenberg
Prosodic highlights in Mandarin continuous speech - cross-genre attributes and implications
Chiu-yu Tseng, Chao-yu Su, Chi-Feng Huang
When two newly-acquired words are one: new words differing in stress alone are not automatically represented differently
Simone Sulpizio, James M. McQueen
Automatic determination of the standard Chinese prosodic phrase boundaries by f_0 generation model
Shehui Bu, Zhenjie Zhuo, Lingling Yang, Shuichi Itahashi
Measuring speakers' similarity in speech by means of prosodic cues: methods and potential
Céline De Looze, Stéphane Rauzy
Tonal variations in Mandarin: new evidence from spontaneous and read speech
Li-chiung Yang
Accounting for prosodic information to improve ASR-based topic tracking for TV broadcast news
Camille Guinaudeau, Julia Hirschberg
Morpheme conversion for connecting speech recognizer and language analyzers in unsegmented languages
Kenji Imamura, Tomoko Izumi, Kugatsu Sadamitsu, Kuniko Saito, Satoshi Kobashikawa, Hirokazu Masataki
Emotion detection based on concept inference and spoken sentence analysis for customer service
Ren-Ying Fang, Bo-Wei Chen, Jhing-Fa Wang, Chung-Hsien Wu
Commas recovery with syntactic features in French and in Czech
Christophe Cerisara, Pavel Král, Claire Gardent
Redundancy reduction in ASR of spontaneous speech through statistical machine translation
Daniele Falavigna
From interview to news text: a study of Taiwan TV political interviews in newspaper reports
Chin-Chih Chiang
On the use of multimodal cues for the prediction of degrees of involvement in spontaneous conversation
Catharine Oertel, Stefan Scherer, Nick Campbell
Anger recognition in spoken dialog using linguistic and para-linguistic information
Narichika Nomoto, Masafumi Tamoto, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi
Recognition of personality traits from human spoken conversations
A. V. Ivanov, G. Riccardi, A. J. Sporka, J. Franc
Using multiple databases for training in emotion recognition: to unite or to vote?
Björn Schuller, Zixing Zhang, Felix Weninger, Gerhard Rigoll
“Would you buy a car from me?” - on the likability of telephone voices
Felix Burkhardt, Björn Schuller, Benjamin Weiss, Felix Weninger
Automatic identification of salient acoustic instances in couples' behavioral interactions using diverse density support vector machines
James Gibson, Athanasios Katsamanis, Matthew P. Black, Shrikanth Narayanan
Predicting speaker changes and listener responses with and without eye-contact
Daniel Neiberg, Joakim Gustafson
Emotion classification using inter- and intra-subband energy variation
Senaka Amarakeerthi, Tin Lay Nwe, Liyanage C. De Silva, Michael Cohen
Emotion classification of infants' cries using duration ratios of acoustic segments
K. Kitahara, S. Michiwiki, M. Sato, S. Matsunaga, M. Yamashita, K. Shinohara
Vowels formants analysis allows straightforward detection of high arousal acted and spontaneous emotions
Bogdan Vlasenko, Dmytro Prylipko, David Philippou-Hübner, Andreas Wendemuth
Intra-, inter-, and cross-cultural classification of vocal affect
Daniel Neiberg, Petri Laukka, Hillary Anger Elfenbein
Verifying human users in speech-based interactions
Sajad Shirali-Shahreza, Yashar Ganjali, Ravin Balakrishnan
Automatic assessment of prosody in high-stakes English tests
Jian Cheng
Improvement of segmental mispronunciation detection with prior knowledge extracted from large L2 speech corpus
Dean Luo, Xuesong Yang, Lan Wang
Off-topic detection in automated speech assessment applications
Jian Cheng, Jianqiang Shen
Towards context-dependent phonetic spelling error correction in children's freely composed text for diagnostic and pedagogical purposes
Sebastian Stüker, Johanna Fay, Kay Berkling
Factored translation models for improving a speech into sign language translation system
V. López-Ludeña, R. San-Segundo, R. Córdoba, J. Ferreiros, J. M. Montero, J. M. Pardo
Formant maps in Hungarian vowels - online data inventory for research, and education
Kálmán Abari, Zsuzsanna Zsófia Rácz, Gábor Olaszy
Automatic subtitling of the Basque Parliament plenary sessions videos
Germán Bordel, Silvia Nieto, Mikel Penagarikano, Luis Javier Rodriguez-Fuentes, Amparo Varona
Generating animated pronunciation from speech through articulatory feature extraction
Yurie Iribe, Silasak Manosavanh, Kouichi Katsurada, Ryoko Hayashi, Chunyue Zhu, Tsuneo Nitta
A tale of two tasks: detecting children's off-task speech in a reading tutor
Wei Chen, Jack Mostow
Problems encountered by Japanese EL2 with English short vowels as illustrated on a 3D vowel chart
Toshiko Isei-Jaakkola, Takatoshi Naka, Keikichi Hirose
Automatic generation of listening comprehension learning material in European Portuguese
Thomas Pellegrini, Rui Correia, Isabel Trancoso, Jorge Baptista, Nuno Mamede
Candidate generation for ASR output error correction using a context-dependent syllable cluster-based confusion matrix
Chao-Hong Liu, Chung-Hsien Wu, David Sarwono, Jhing-Fa Wang
Semi-supervised tree support vector machine for online cough recognition
Thai Hoa Huynh, Vu An Tran, Huy Dat Tran
Monaural voiced speech segregation based on pitch and comb filter
Xueliang Zhang, Wenju Liu
Fast and simple iterative algorithm of lp-norm minimization for under-determined speech separation
Yasuharu Hirasawa, Naoki Yasuraoka, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno
Monaural speech separation based on a 2D processing and harmonic analysis
Azam Rabiee, Saeed Setayeshi, Soo-Young Lee
Underdetermined blind source separation with fuzzy clustering for arbitrarily arranged sensors
Ingrid Jafari, Serajul Haque, Roberto Togneri, Sven Nordholm
On initial seed selection for frequency domain blind speech separation
Dang Hai Tran Vu, Reinhold Haeb-Umbach
Spatial filter calibration based on minimization of modified LSD
Nobuaki Tanaka, Tetsuji Ogawa, Tetsunori Kobayashi
Probabilistic spectrum envelope: categorized audio-features representation for NMF-based sound decomposition
Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki
A high resolution multiple source localization based on generalized cumulant structure (GCS) matrix
Jinho Choi, Chang D. Yoo
Single channel speech music separation using nonnegative matrix factorization with sliding windows and spectral masks
Emad M. Grais, Hakan Erdogan
Perceptually-inspired processing for multichannel Wiener filter
Jorge I. Marin-Hurtado, David V. Anderson
Speech recognition in mixed sound of speech and music based on vector quantization and non-negative matrix factorization
Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa
Reduction of highly nonstationary ambient noise by integrating spectral and locational characteristics of speech and noise for robust ASR
Tomohiro Nakatani, Shoko Araki, Marc Delcroix, Takuya Yoshioka, Masakiyo Fujimoto
Voice processing by dynamic glottal models with applications to speech enhancement
Carlo Drioli, Andrea Calanca
Supervised sparse coding strategy in cochlear implants
Jinqiu Sang, Guoping Li, Hongmei Hu, Mark E. Lutman, Stefan Bleeck
Chinese and Italian speech rhythm: normalization and the CCI algorithm
Chiara Bertini, Pier Marco Bertinetto, Na Zhi
Rhythm metrics on syllables and feet do not work as expected
Paolo Mairano, Antonio Romano
Applying rhythm features to automatically assess non-native speech
Lei Chen, Klaus Zechner
Prosodic synchrony in co-operative task-based dialogues: a measure of agreement and disagreement
Brian Vaughan
Low and high, short and long by crook or by hook?
Oliver Niebuhr, Astrid Wolf
Estimating speaking rate by means of rhythmicity parameters
Christian Heinrich, Florian Schiel
Comparing word and syllable prominence rated by naïve listeners
Denis Arnold, Bernd Möbius, Petra Wagner
L1/L2 perception of lexical stress with F0 peak-delay: effect of an extra syllable added
Shinichi Tokuma, Yi Xu
Letter-to-phoneme conversion based on two-stage neural network focusing on letter and phoneme contexts
Kheang Seng, Yurie Iribe, Tsuneo Nitta
An international English speech corpus for longitudinal study of accent development
Rosemary Orr, Hugo Quené, Roeland van Beek, Thari Diefenbach, David A. van Leeuwen, Marijn Huijbregts
A corpus-based study of English pronunciation variations
Sunhee Kim, Kyuwhan Lee, Minhwa Chung
Long term average speech spectra in Yolngu Matha and Pitjantjatjara speaking females and males
Hywel Stoakes, Andrew Butcher, Janet Fletcher, Marija Tabain
Context and speaker dependency in the relation of vowel formants and subglottal resonances - evidence from Hungarian
Tekla Etelka Gráczi, Steven M. Lulich, Tamás Gábor Csapó, András Beke
Fundamental frequency estimation using modified higher order moments and multiple windows
Alipah Pawi, Saeed Vaseghi, Ben Milner, Seyed Ghorshi
EM-based gain adaptation for probabilistic multipitch tracking
Michael Wohlmayr, Franz Pernkopf
Joint robust voicing detection and pitch estimation based on residual harmonics
Thomas Drugman, Abeer Alwan
Epoch extraction in high pass filtered speech using Hilbert envelope
D. Govind, S. R. M. Prasanna, Debadatta Pati
Robust HNR-based closed-loop pitch and harmonic parameters estimation
Alexander Pavlovets, Alexander Petrovsky
Exploring Bessel features for detection of glottal closure instants
Chetana Prakash, Dhananjaya N., Suryakanth V. Gangashetty
Evaluation of glottal epoch detection algorithms on different voice types
João P. Cabral, John Kane, Christer Gobl, Julie Carson-Berndsen
A divide et impera algorithm for optimal pitch stylization
Antonio Origlia, Giovanni Abete, Francesco Cutugno, Iolanda Alfano, Renata Savy, Bogdan Ludusan
Singing voice analysis using relative harmonic delays
Ricardo Sousa, Aníbal Ferreira
Singing voice synthesis: singer-dependent vibrato modeling and coherent processing of spectral envelope
S. W. Lee, Minghui Dong
Chorus digitalis: experiments in chironomic choir singing
Sylvain Le Beux, Lionel Feugère, Christophe d'Alessandro
Prominence model for prosodic features in automatic lexical stress and pitch accent detection
Kun Li, Shuang Zhang, Mingxing Li, Wai-Kit Lo, Helen Meng
Hierarchical stress modeling in Mandarin text-to-speech
Ya Li, Jianhua Tao, Xiaoying Xu
Automatic prosodic events detection by using syllable-based acoustic, lexical and syntactic features
Chong-Jia Ni, Wenju Liu, Bo Xu
Using dynamic time warping to compute prosodic similarity measures
Albert Rilliard, Alexandre Allauzen, Philippe Boula de Mareüil
Applying the quantitative target approximation model (qTA) to German and Brazilian Portuguese
Plínio A. Barbosa, Hansjörg Mixdorff, Sandra Madureira
Stylization and trajectory modelling of short and long term speech prosody variations
Nicolas Obin, Anne Lacheret, Xavier Rodet
Toward a continuous modeling of French prosodic structure: using acoustic features to predict prominence location and prominence degree
Mathieu Avanzi, Nicolas Obin, Anne Lacheret-Dujour, Bernard Victorri
Optimal models of prosodic prominence using the Bayesian information criterion
Tim Mahrt, Jui-Ting Huang, Yoonsook Mo, Margaret Fleck, Mark Hasegawa-Johnson, Jennifer Cole
Quantitative analysis of tone coarticulation in Mandarin
Hussein Hussein, Hansjörg Mixdorff, Hue San Do, Rüdiger Hoffmann
Tracking pitch contours using minimum jerk trajectories
Daniel Neiberg, G. Ananthakrishnan, Joakim Gustafson
On the use of linguistic features in an automatic system for speech analytics of telephone conversations
Benjamin Maza, Marc El-Beze, Georges Linares, Renato De Mori
Determining what questions to ask, with the help of spectral graph theory
Abe Kazemzadeh, Sungbok Lee, Panayiotis G. Georgiou, Shrikanth Narayanan
“are you sure you're paying attention?” - “uh-huh” communicating understanding as a marker of attentiveness
Hendrik Buschmeier, Zofia Malisz, Marcin Włodarczak, Stefan Kopp, Petra Wagner
Projectability of transition-relevance places using prosodic features in Japanese spontaneous conversation
Yuichi Ishimoto, Mika Enomoto, Hitoshi Iida
Measuring final lengthening for speaker-change prediction
Anna Hjalmarsson, Kornel Laskowski
Incremental learning and forgetting in stochastic turn-taking models
Kornel Laskowski, Jens Edlund, Mattias Heldner
Reinforcement learning of argumentation dialogue policies in negotiation
Kallirroi Georgila, David Traum
Topic switching strategies for spoken dialogue systems
Tobias Heinroth, Savina Koleva, Wolfgang Minker
Unsupervised clustering of utterances using non-parametric Bayesian methods
Ryuichiro Higashinaka, Noriaki Kawamae, Kugatsu Sadamitsu, Yasuhiro Minami, Toyomi Meguro, Kohji Dohsaka, Hirohito Inagaki
OOV sensitive named-entity recognition in speech
Carolina Parada, Mark Dredze, Frederick Jelinek
Speech translation with grammar driven probabilistic phrasal bilexica extraction
Markus Saers, Dekai Wu, Chi-kiu Lo, Karteek Addanki
An efficient unified extraction algorithm for bilingual data
Christoph Tillmann, Sanjika Hewavitharana
Using features from topic models to alleviate over-generation in hierarchical phrase-based translation
Songfang Huang, Bowen Zhou
An empirical study on improving hierarchical phrase-based translation using alignment features
Songfang Huang, Bowen Zhou
Robust speech translation by domain adaptation
Xiaodong He, Li Deng
Enhancements to the training process of classifier-based speech translator via topic modeling
Emil Ettelaie, Panayiotis G. Georgiou, Shrikanth Narayanan
A scalable approach to building a parallel corpus from the web
Vivek Kumar Rangarajan Sridhar, Luciano Barbosa, Srinivas Bangalore
Spoken term detection results using plural subword models by estimating detection performance for each query
Yoshiaki Itoh, Kohei Iwata, Masaaki Ishigame, Kazuyo Tanaka, Shi-wook Lee
Speechforms: from web to speech and back
Luciano Barbosa, Diamantino Caseiro, Giuseppe Di Fabbrizio, Amanda Stent
Image processing filters for line detection-based spoken term detection
Kazuyuki Noritake, Hiroaki Nanjo, Takehiko Yoshimi
Using latent topic features for named entity extraction in search queries
Joe Polifroni, François Mairesse
Language model expansion using webdata for spoken document retrieval
Ryo Masumura, Seongjun Hahm, Akinori Ito
Effects of query expansion for spoken document passage retrieval
Tomoyosi Akiba, Koichiro Honda
Unsupervised hidden Markov modeling of spoken queries for spoken term detection without speech recognition
Chun-an Chan, Lin-shan Lee
Topic identification from audio recordings using rich recognition results and neural network based classifiers
Roberto Gemello, Franco Mana, Pier Domenico Batzu
A grammar based approach to style specific phrase prediction
Alok Parlikar, Alan W. Black
Unsupervised features from text for speech synthesis in a speech-to-speech translation system
Oliver Watts, Bowen Zhou
Unsupervised continuous-valued word features for phrase-break prediction without a part-of-speech tagger
Oliver Watts, Junichi Yamagishi, Simon King
Albayzín 2010: a Spanish text to speech evaluation
Francisco Campillo, Francisco Méndez, Montserrat Arza, Laura Docío, Antonio Bonafonte, Eva Navas, Iñaki Sainz
Combining active and semi-supervised learning for homograph disambiguation in Mandarin text-to-speech synthesis
Binbin Shen, Zhiyong Wu, Yongxin Wang, Lianhong Cai
Automatically creating a diphone set from a speech database
Thomas Ewender, Beat Pfister
Automatic viseme clustering for audiovisual speech synthesis
Wesley Mattheyses, Lukas Latacz, Werner Verhelst
Perceptual quality dimensions of text-to-speech systems
Florian Hinterleitner, Sebastian Möller, Christoph Norrenbrock, Ulrich Heute
A pointwise approach to pronunciation estimation for a TTS front-end
Shinsuke Mori, Graham Neubig
Correlating text with prosody
Mohamed Abou-Zleikha, Julie Carson-Berndsen
“what is… dengue fever?” - modeling and predicting pronunciation errors in a text-to-speech system
Andrew Rosenberg, Raul Fernandez, Bhuvana Ramabhadran
Aperiodicity analysis for quality estimation of text-to-speech signals
Christoph Norrenbrock, Ulrich Heute, Florian Hinterleitner, Sebastian Möller
Parallels in infants' attention to speech articulation and to physical changes in speech-unrelated objects
Eeva Klintfors, Ellen Marklund, Francisco Lacerda
Speech events are recoverable from unlabeled articulatory data: using an unsupervised clustering approach on data obtained from electromagnetic midsaggital articulography (EMA)
Daniel Duran, Jagoda Bruni, Grzegorz Dogil, Hinrich Schütze
Children's recognition of their own voice: influence of phonological impairment
Sofia Strömbergsson
Evaluation of bone-conducted ultrasonic hearing-aid regarding transmission of speaker discrimination information
Takayuki Kagomiya, Seiji Nakagawa
Impact of different feedback mechanisms in EMG-based speech recognition
Christian Herff, Matthias Janke, Michael Wand, Tanja Schultz
Phonotactic constraints and the segmentation of Cantonese speech
Michael C. W. Yip
Reaction time and decision difficulty in the perception of intonation
Katrin Schneider, Grzegorz Dogil, Bernd Möbius
Processing of stress related acoustic cues as indexed by ERPs
Ferenc Honbolygó, Valéria Csépe
On the relationship between perceived accentedness, acoustic similarity, and processing difficulty in foreign-accented speech
Marijt J. Witteman, Andrea Weber, James M. McQueen
The perception boundary between single and geminate stops in 3- and 4-mora Japanese words
Shigeaki Amano, Yukari Hirata
Correlation analysis of acoustic features with perceptual voice quality similarity for similar speaker selection
Yusuke Ijima, Mitsuaki Isogai, Hideyuki Mizuno
Pointing gestures do not influence the perception of lexical stress
Alexandra Jesse, Holger Mitterer
Relationships between phonetic features and speech perception - a statistical investigation from a large anechoic British English corpus
Ian R. Cushing, Francis F. Li, Ken Worrall, Tim Jackson
The representation of speech in a nonlinear auditory model: time-domain analysis of simulated auditory-nerve firing patterns
Guy J. Brown, Tim Jürgens, Ray Meddis, Matthew Robertson, Nicholas R. Clark
An automatic voice pleasantness classification system based on prosodic and acoustic patterns of voice preference
Luis Coelho, Daniela Braga, Miguel Sales-Dias, Carmen Garcia-Mateo
Contributions of F1 and F2 (F2') to the perception of plosive consonants
René Carré, Pierre Divenyi, Willy Serniclaes, Emmanuel Ferragne, Egidio Marsico, Viet-Son Nguyen
Auditory speech processing is affected by visual speech in the periphery
Jeesun Kim, Chris Davis
Visual speech speeds up auditory identification responses
Tim Paris, Jeesun Kim, Chris Davis
Agglomerative hierarchical clustering of emotions in speech based on subjective relative similarity
Ryoichi Takashima, Tohru Nagano, Ryuki Tachibana, Masafumi Nishimura
Optimal syllabic rates and processing units in perceiving Mandarin spoken sentences
Guangting Mai, Gang Peng
Cross-lingual speaker discrimination using natural and synthetic speech
Mirjam Wester, Hui Liang
Can audio-visual speech recognition outperform acoustically enhanced speech recognition in automotive environment?
Rajitha Navarathna, Tristan Kleinschmidt, David Dean, Sridha Sridharan, Patrick Lucey
A multimodal approach to dictation of handwritten historical documents
Vicent Alabau, Verónica Romero, Antonio-L. Lagarda, Carlos-D. Martínez-Hinarejos
Weight optimization for bimodal unit-selection talking head synthesis
Asterios Toutios, Utpala Musti, Slim Ouni, Vincent Colotte
Modality selection and perceived mental effort in a mobile application
Stefan Schaffer, Benjamin Jöckel, Ina Wechsung, Robert Schleicher, Sebastian Möller
A cross-lingual spoken content search system
Jitendra Ajmera, Ashish Verma
Nemo: a platform for multilingual news monitoring
C. Girardi, Roberto Gretter, Daniele Falavigna, Fabio Brugnara, Diego Giuliani, M. Federico
Unsupervised learning of acoustic unit descriptors for audio content representation and classification
Sourish Chaudhuri, Mark Harvilla, Bhiksha Raj
Conditioned hidden Markov model fusion for multimodal classification
Michael Glodek, Stefan Scherer, Friedhelm Schwenker
Distant speech recognition in a smart home: comparison of several multisource ASRs in realistic conditions
Benjamin Lecouteux, Michel Vacher, François Portet
A robust approach to mining repeated sequence in audio stream
Jiansong Chen, Lei Zhu, Bailan Feng, Peng Ding, Bo Xu
Accelerated parallelizable neural network learning algorithm for speech recognition
Dong Yu, Li Deng
Deep convex net: a scalable architecture for speech pattern classification
Li Deng, Dong Yu
Modeling broad context for tone recognition with conditional random fields
Siwei Wang, Gina-Anne Levow
Improved tonal language speech recognition by integrating spectro-temporal evidence and pitch information with properly chosen tonal acoustic units
Shang-wen Li, Yow-bang Wang, Liang-che Sun, Lin-shan Lee
Kullback-Leibler divergence-based ASR training data selection
Evandro Gouvêa, Marelie H. Davel
Articulatory feature classification using nearest neighbors
Arild Brandrud Næss, Karen Livescu, Rohit Prabhavalkar
Continuous episodic memory based speech recognition using articulatory dynamics
Sébastien Demange, Slim Ouni
Graphone model interpolation and Arabic pronunciation generation
T. Li, P. C. Woodland, F. Diehl, M. J. F. Gales
Grapheme-to-phoneme conversion using conditional random fields
Irina Illina, Dominique Fohr, Denis Jouvet
Bilingual acoustic model adaptation by unit merging on different levels and cross-level integration
Ching-Feng Yeh, Chao-Yu Huang, Lin-shan Lee
A qualitative evaluation of phoneme-to-phoneme technology
Marijn Schraagen, Gerrit Bloothooft
Cheap bootstrap of multi-lingual hidden Markov models
Daniele Falavigna, Roberto Gretter
Adaptive stream fusion in multistream recognition of speech
Nima Mesgarani, Samuel Thomas, Hynek Hermansky
Unsupervised audio patterns discovery using HMM-based self-organized units
Man-hung Siu, Herbert Gish, Steve Lowe, Arthur Chan
Nearest neighbors with learned distances for phonetic frame classification
John Labiak, Karen Livescu
Stop consonant recognition by temporal fine structure of burst
Seppo Fagerlund, Unto K. Laine
Phonetic classification using controlled random walks
Katrin Kirchhoff, Andrei Alexandrescu
Keyphrase cloud generation of broadcast news
Luís Marujo, Márcio Viveiros, João Paulo da Silva Neto
Optimized feature extraction and HMMs in subword detectors
Alfonso M. Canterla, Magne H. Johnsen
Real-world speech/non-speech audio classification based on sparse representation features and GPCs
Ziqiang Shi, Jiqing Han, Tieran Zheng
Privacy preserving speaker verification using adapted GMMs
Manas A. Pathak, Bhiksha Raj
Clustering expressive speech styles in audiobooks using glottal source parameters
Éva Székely, João P. Cabral, Peter Cahill, Julie Carson-Berndsen
On the use of the rhythmogram for automatic syllabic prominence detection
Bogdan Ludusan, Antonio Origlia, Francesco Cutugno
Speech modulation features for robust nonnative speech accent detection
Sethserey Sam, Xiong Xiao, Laurent Besacier, Eric Castelli, Haizhou Li, Eng Siong Chng
Frame-level vocal effort likelihood space modeling for improved whisper-island detection
Chi Zhang, John H. L. Hansen
Speaker identification for whispered speech using a training feature transformation from neutral to whisper
Xing Fan, John H. L. Hansen
An accurate and robust gender identification algorithm
Andrea DeMarco, Stephen J. Cox
Deep belief networks for automatic music genre classification
Xiaohong Yang, Qingcai Chen, Shusen Zhou, Xiaolong Wang
Image representation of the subband power distribution for robust sound classification
Jonathan Dennis, Huy Dat Tran, Haizhou Li
Acoustic and visual cues of turn-taking dynamics in dyadic interactions
Bo Xiao, Viktor Rozgić, Athanasios Katsamanis, Brian R. Baucom, Panayiotis G. Georgiou, Shrikanth Narayanan
Robust audio fingerprinting based on local spectral luminance maxima scheme
Yong-zhe Shi, Wei-Qiang Zhang, Jia Liu
Entropy-rate driven inference of stochastic grammars
Unto K. Laine
An efficient pre-processing scheme to improve the sound source localization system in noisy environment
Sheng-Chieh Lee, K. Bharanitharan, Bo-Wei Chen, Jhing-Fa Wang, Chung-Hsien Wu, Min-Jian Liao
A study on auditory feature spaces for speech-driven lip animation
Guylaine Le-Jan, Yannick Benezeth, Guillaume Gravier, Frédéric Bimbot
Phase-only speech reconstruction using very short frames
Erfan Loweimi, Seyed Mohammad Ahadi, Hamid Sheikhzadeh
Frequency-warped and stabilized time-varying cepstral coefficients
Trond Skogstad, Torbjørn Svendsen
Using human perception for automatic accent assessment
Freddy William, Abhijeet Sangwan, John H. L. Hansen
A study of the effectiveness of articulatory strokes for phonemic recognition
Carlos Molina, Sungbok Lee, Shrikanth Narayanan, Néstor Becerra Yoma
Auditory filterbank improves voice morphing
Erika Okamoto, Toshio Irino, Ryuichi Nisimura, Hideki Kawahara
Monaural sound localization
Anna Katharina Fuchs, Christian Feldbauer, Michael Stark
Dual-mode AVQ coding based on spectral masking and sparseness detection for ITU-T G.711.1/G.722 super-wideband extensions
Masahiro Fukui, Shigeaki Sasaki, Yusuke Hiwasaki, Kurihara Sachiko, Yoichi Haneda
Phone impact based speech transmission technique for reliable speech recognition in poor wireless network conditions
Azar Taufique, Kumaran Vijayasankar, Wooil Kim, John H. L. Hansen, Marco Tacca, Andrea Fumagalli
Automatic speech codec identification with applications to tampering detection of speech recordings
Jingting Zhou, Daniel Garcia-Romero, Carol Y. Espy-Wilson
A hybrid quasi-harmonic/CELP wideband speech coding scheme for unit selection TTS synthesis
Chang-Heon Lee, Olivier Rosec, Yannis Stylianou
Voice quality characterization of IETF Opus codec
Anssi Rämö, Henri Toukomaa
Leja ordering LSFs for accurate estimation of predictor coefficients
C. F. Pedersen
Improved quality for conversational VoIP using path diversity
Qipeng Gong, Peter Kabal
Tree encoding for the ITU-T G.711.1 speech coder
Abdul Hannan Khan, Peter Kabal
Parallel and hierarchical decision making for sparse coding in speech recognition
Dong Wang, Ravichander Vipperla, Nicholas Evans
A new model-based Mandarin-speech coding system
Chen-Yu Chiang, Jyh-Her Yang, Ming-Chieh Liu, Yih-Ru Wang, Yuan-Fu Liao, Sin-Horn Chen
Using unsupervised feature-based speaker adaptation for improved transcription of spoken archives
Petr Cerva, Karel Palecek, Jan Silovsky, Jan Nouza
Online speaker adaptation with pre-computed FMLLR transformations
Volker Fischer, Siegfried Kunzmann
Instantaneous speaker adaptation through selection and combination of fMLLR transformation matrices
Diego Giuliani, Fabio Brugnara
Joint bilinear transformation space based maximum a posteriori linear regression adaptation using prior with variance function
Hwa Jeon Song, Yunkeun Lee, Hyung Soon Kim
A study on combining VTLN and SAT to improve the performance of automatic speech recognition
D. R. Sanand, Mikko Kurimo
Incorporating regional information to enhance MAP-based stochastic feature compensation for robust speech recognition
Yu Tsao, Paul R. Dixon, Chiori Hori, Hisashi Kawai
A study on the effect of pitch on LPCC and PLPC features for children's ASR in comparison to MFCC
Shweta Ghai, Rohit Sinha
About handling boundary uncertainty in a speaking rate dependent modeling approach
Denis Jouvet, Dominique Fohr, Irina Illina
An active learning approach to task adaptation
Ji Wu, Zhiyang He, Ping Lv
Efficient speaker and noise normalization for robust speech recognition
Vikas Joshi, Raghavendra Bilgi, S. Umesh, C. Benitez, L. Garcia
How realistic is artificially added noise?
Thomas Winkler
Voice activity detection in MTF-based power envelope restoration
Masashi Unoki, Xugang Lu, Rico Petrick, Shota Morita, Masato Akagi, Rüdiger Hoffmann
Using spectral fluctuation of speech in multi-feature HMM-based voice activity detection
Miquel Espi, Shigeki Miyabe, Takuya Nishimoto, Nobutaka Ono, Shigeki Sagayama
Linear dynamic models for voice activity detection
Kannu Mehta, Chau Khoa Pham, Eng Siong Chng
Detection of shouted speech in the presence of ambient noise
Jouni Pohjalainen, Tuomo Raitio, Paavo Alku
Breath-detection-based telephony speech phrasing
Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
Multi-channel voice activity detection based on conic constraints
Gibak Kim
Multi-sensor voice activity detection based on multiple observation hypothesis testing
Theodoros Petsatodis, Fotios Talantzis, Christos Boukis, Zheng-Hua Tan, Ramjee Prasad
Online speech activity detection in broadcast news
Chao Gao, Guruprasad Saikumar, Saurabh Khanwalkar, Avi Herscovici, Anoop Kumar, Amit Srivastava, Premkumar Natarajan
A real-time speech command detector for a smart control room
Daniel Reich, Felix Putze, Dominic Heger, Joris Ijsselmuiden, Rainer Stiefelhagen, Tanja Schultz
Robust voice activity detector for real world applications using harmonicity and modulation frequency
Ekapol Chuangsuwanich, James Glass
On noise robust voice activity detection
Tomas Dekens, Werner Verhelst
Adaptive regularization framework for robust voice activity detection
Xugang Lu, Masashi Unoki, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura
On the use of extended context for HMM-based spontaneous conversational speech synthesis
Tomoki Koriyama, Takashi Nose, Takao Kobayashi
Predicting tongue positions from acoustics and facial features
Asterios Toutios, Slim Ouni
Assessing acoustic reduction: exploiting local structure in speech
Louis ten Bosch, Annika Hämäläinen, Mirjam Ernestus
The “fortis-lenis” distinction in Bulgarian and German
Bistra Andreeva, Magdalena Wolska
Acoustic correlates of glottal gaps
Gang Chen, Jody Kreiman, Yen-Liang Shue, Abeer Alwan
Using a genetic algorithm to estimate parameters of a coarticulation model
Brian O. Bush, John-Paul Hosom, Alexander Kain, Akiko Amano-Kusumoto
Synthesis of breathy, normal, and pressed phonation using a two-mass model with a triangular glottis
Peter Birkholz, Bernd J. Kröger, Christiane Neuschaefer-Rube
Analysis of inter-articulator correlation in acoustic-to-articulatory inversion using generalized smoothness criterion
Prasanta Kumar Ghosh, Shrikanth Narayanan
Frequency-domain representation of source-filter coupling and its effect in the production of voice
Tokihiko Kaburagi
Method for speech inversion with large scale statistical evaluation
Heikki Rasilo, Unto K. Laine, Okko Räsänen, Toomas Altosaar
Italian in the no-man's land between stress-timing and syllable-timing? speakers are more stress-timed than listeners
Bettina Braun, Sabine Geiselmann
The Lombard effect in spontaneous dialog speech
Laura Folk, Florian Schiel
Gaussian process experts for voice conversion
Nicholas C. V. Pilkington, Heiga Zen, M. J. F. Gales
Intonation conversion from neutral to expressive speech
Christophe Veaux, Xavier Rodet
Speaker-adaptive speech synthesis based on eigenvoice conversion and language-dependent prosodic conversion in speech-to-speech translation
Nobuhiko Hattori, Tomoki Toda, Hisashi Kawai, Hiroshi Saruwatari, Kiyohiro Shikano
Adding glottal source information to intra-lingual voice conversion
Javier Pérez, Antonio Bonafonte
Formant-controlled HMM-based speech synthesis
Ming Lei, Junichi Yamagishi, Korin Richmond, Zhen-Hua Ling, Simon King, Li-Rong Dai
Analysis of HMM-based Lombard speech synthesis
Tuomo Raitio, Antti Suni, Martti Vainio, Paavo Alku
Discrete/continuous modelling of speaking style in HMM-based speech synthesis: design and evaluation
Nicolas Obin, Pierre Lanchantin, Anne Lacheret, Xavier Rodet
Factored MLLR adaptation for singing voice generation
June Sig Sung, Doo Hwa Hong, Shin Jae Kang, Nam Soo Kim
Adaptation of prosody in speech synthesis by changing command values of the generation process model of fundamental frequency
Keikichi Hirose, Keiko Ochi, Ryusuke Mihara, Hiroya Hashimoto, Daisuke Saito, Nobuaki Minematsu
Prosody conversion for emotional Mandarin speech synthesis using the tone nucleus model
Miaomiao Wen, Miaomiao Wang, Keikichi Hirose, Nobuaki Minematsu
Rapid adaptation of foreign-accented HMM-based speech synthesis
Reima Karhila, Mirjam Wester
The effects of phoneme errors in speaker adaptation for HMM speech synthesis
Bálint Tóth, Tibor Fegyó, Géza Németh
Articulatory reduction in Mandarin Chinese words
Jeffrey Berry, Sunjing Ji, Ian Fasel, Diana Archangeli
Morphological variation in the adult vocal tract: a modeling study of its potential acoustic impact
Adam Lammert, Michael Proctor, Athanasios Katsamanis, Shrikanth Narayanan
Analysis and automatic estimation of children's subglottal resonances
Steven M. Lulich, Harish Arsikere, John R. Morton, Gary K. F. Leung, Abeer Alwan, Mitchell S. Sommers
Acceleration sensor based estimates of subglottal resonances: short vs. long vowels
Wolfgang Wokurek, Andreas Madsack
Comparison of nasalance measurements from accelerometers and microphones and preliminary development of novel features
Nicolas Audibert, Angélique Amelot
The effect of seeing the interlocutor on speech production in different noise types
Michael Fitzpatrick, Jeesun Kim, Chris Davis
Conversing in the presence of a competing conversation: effects on speech production
Vincent Aubanel, Martin Cooke, Julián Villegas, Maria Luisa Garcia Lecumberri
Very short utterances and timing in turn-taking
Mattias Heldner, Jens Edlund, Anna Hjalmarsson, Kornel Laskowski
Validating rt-MRI based articulatory representations via articulatory recognition
Athanasios Katsamanis, Erik Bresch, Vikram Ramanarayanan, Shrikanth Narayanan
An electropalatographic and acoustic study on anticipatory coarticulation in V1#C2V2 sequences in standard Chinese
Yinghao Li, Jiangping Kong
Final /t/ reduction in Dutch past-participles: the role of word predictability and morphological decomposability
Iris Hanique, Mirjam Ernestus
Parametrising degree of articulator movement from dynamic MRI data
Zeynab Raeesy, Ladan Baghai-Ravary, John Coleman
Improving LVCSR system combination using neural network language model cross adaptation
X. Liu, M. J. F. Gales, P. C. Woodland
Towards high performance LVCSR in speech-to-speech translation system on smart phones
Jian Xue, Xiaodong Cui, Gregg Daggett, Etienne Marcheret, Bowen Zhou
Deploying Google search by voice in Cantonese
Yun-Hsuan Sung, Martin Jansche, Pedro J. Moreno
An investigation in speech recognition for colloquial Arabic
Sarah Al-Shareef, Thomas Hain
A multithreaded implementation of Viterbi decoding on recursive transition networks
Fabio Brugnara
Recurrent neural network based language modeling in meeting recognition
Stefan Kombrink, Tomáš Mikolov, Martin Karafiát, Lukáš Burget
Ad-hoc meeting transcription on clusters of mobile devices
Michele Cossalter, Priya Sundararajan, Ian Lane
ROVER enhancement with automatic error detection
Kacem Abida, Fakhri Karray
Automatic comma insertion of lecture transcripts based on multiple annotations
Yuya Akita, Tatsuya Kawahara
Study on the relevance factor of maximum a posteriori with GMM for language recognition
Chang Huai You, Haizhou Li, Kong Aik Lee
Improving multiband position-pitch algorithm for localization and tracking of multiple concurrent speakers by using a frequency selective criterion
Tania Habib, Harald Romsdorfer
On the use of lattices of time-synchronous cross-decoder phone co-occurrences in a SVM-phonotactic language recognition system
Amparo Varona, Mikel Penagarikano, Luis Javier Rodriguez-Fuentes, Germán Bordel
Speaker clustering based on utterance-oriented Dirichlet process mixture model
Naohiro Tawara, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi
PLDA-based clustering for speaker diarization of broadcast streams
Jan Silovsky, Jan Prazak, Petr Cerva, Jindrich Zdansky, Jan Nouza
iVector approach to phonotactic language recognition
Mehdi Soufifar, Marcel Kockmann, Lukáš Burget, Oldřich Plchot, Ondřej Glembek, Torbjørn Svendsen
Discriminative features for language identification
Chris Alberti, Michiel Bacchiani
Perceptual sensitivity to dialectal and generational variations in vowels
Robert Allen Fox, Ewa Jacewicz
Investigation of cross-show speaker diarization
Qian Yang, Qin Jin, Tanja Schultz
Language identification for text chats
Vesa Siivola, Bryan Pellom, Meagan Sills
Spoken language recognition in the latent topic simplex
Kong Aik Lee, Chang Huai You, Ville Hautamäki, Anthony Larcher, Haizhou Li
Investigating robustness of spectral moments on normal- and high-effort speech
Frederike Gottsmann, Corinna Harwardt
Comparing the impact of raised vocal effort on various spectral parameters
Corinna Harwardt
Vowel context and speaker interactions influencing glottal open quotient and formant frequency shifts in physical task stress
Keith W. Godin, John H. L. Hansen
Prosodic correlates of individual physiological response to stress
Serguei Pakhomov, Michael Kotlyar
The vocal effort of dominance in scenario meetings
Marcela Charfuelan, Marc Schröder
A preliminary model of emotional prosody using multidimensional scaling
Sona Patel, Rahul Shrivastav
An exploratory study of the relations between perceived emotion strength and articulatory kinematics
Jangwon Kim, Sungbok Lee, Shrikanth Narayanan
Improved acoustic characterization of breathy and whispery voices
Carlos T. Ishi, Hiroshi Ishiguro, Norihiro Hagita
Neutral to target emotion conversion using source and suprasegmental information
D. Govind, S. R. M. Prasanna, B. Yegnanarayana
A multimodal analysis of vocal and visual backchannels in spontaneous dialogs
Khiet P. Truong, Ronald Poppe, Iwan de Kok, Dirk Heylen
Kernel models for affective lexicon creation
Nikos Malandrakis, Alexandros Potamianos, Elias Iosif, Shrikanth Narayanan
Automatic detection of depression in speech using Gaussian mixture modeling with factor analysis
Douglas Sturim, Pedro A. Torres-Carrasquillo, Thomas F. Quatieri, Nicolas Malyska, Alan McCree
Utterance verification for automating the hearing in noise test (HINT)
H. Timothy Bunnell, Jason Lilley, Sigfrid D. Soli, Ivan Pal
Analyzing the nature of ECA interactions in children with autism
Emily Mower, Chi-Chun Lee, James Gibson, Theodora Chaspari, Marian E. Williams, Shrikanth Narayanan
Incorporating speech recognition engine into an intelligent assistive reading system for dyslexic students
Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou, Evmorfia N. Argyriou, Antonis Symvonis
An investigation of depressed speech detection: features and normalization
Nicholas Cummins, Julien Epps, Michael Breakspear, Roland Goecke
Using prosodic and spectral features in detecting depression in elderly males
Michelle Hewlett Sanchez, Dimitra Vergyri, Luciana Ferrer, Colleen Richey, Pablo Garcia, Bruce Knoth, William Jarrold
Combining phonological and acoustic ASR-free features for pathological speech intelligibility assessment
Catherine Middag, Tobias Bocklet, Jean-Pierre Martens, Elmar Nöth
Speech synthesis parameter generation for the assistive silent speech interface MVOCA
Robin Hofe, Stephen R. Ell, Michael J. Fagan, James M. Gilbert, Phil D. Green, Roger K. Moore, Sergey I. Rybchenko
Computer-assisted disfluency counts for stuttered speech
Peter A. Heeman, Andy McMillin, J. Scott Yaruss
Spectral features for automatic blind intelligibility estimation of spastic dysarthric speech
Richard Hummel, Wai-Yip Chan, Tiago H. Falk
Extraction of narrative recall patterns for neuropsychological assessment
Emily T. Prud'hommeaux, Brian Roark
Gesture design of hand-to-speech converter derived from speech-to-hand converter based on probabilistic integration model
Aki Kunikoshi, Yu Qiao, Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose
Powered wheelchair control using acoustic-based recognition of head gesture accompanying speech
Akira Sasou
Analyzing training dependencies and posterior fusion in discriminant classification of apnea patients based on sustained and connected speech
José Luis Blanco, Rubén Fernández, Doroteo Torre, F. Javier Caminero, Eduardo López
Speaking to the crowd: looking at past achievements in using crowdsourcing for speech and predicting future challenges
Gabriel Parent, Maxine Eskenazi
A transcription task for crowdsourcing with automatic quality control
Chia-ying Lee, James Glass
Reliability-weighted acoustic model adaptation using crowd sourced transcriptions
Kartik Audhkhasi, Panayiotis G. Georgiou, Shrikanth Narayanan
Crowdsourcing for word recognition in noise
Martin Cooke, Jon Barker, Maria Luisa Garcia Lecumberri, Krzysztof Wasilewski
Crowdsourcing preference tests, and how to detect cheating
Sabine Buchholz, Javier Latorre
Growing a spoken language interface on Amazon Mechanical Turk
Ian McGraw, James Glass, Stephanie Seneff
Real user evaluation of spoken dialogue systems using Amazon Mechanical Turk
F. Jurčíček, S. Keizer, Milica Gašić, François Mairesse, B. Thomson, K. Yu, Steve Young
Quality assessment of crowdsourcing transcriptions for African languages
Hadrien Gelas, Solomon Teferra Abate, Laurent Besacier, François Pellegrino
Using crowdsourcing to provide prosodic annotations for non-native speech
Keelan Evanini, Klaus Zechner
PodCastle: recent advances of a spoken document retrieval service improved by anonymous user contributions
Masataka Goto, Jun Ogata
Language-independent socio-emotional role recognition in the AMI meetings corpus
Fabio Valente, Alessandro Vinciarelli
Measuring acoustic-prosodic entrainment with respect to multiple levels and dimensions
Rivka Levitan, Julia Hirschberg
Automatic call quality monitoring using cost-sensitive classification
Youngja Park
Learning influences from word use in polylogue
Tomoharu Iwata, Shinji Watanabe
Identifying agreement/disagreement in conversational speech: a cross-lingual study
Wen Wang, Kristin Precoda, Colleen Richey, Geoffrey Raymond
A dual channel coupled decoder for fillers and feedback
Daniel Neiberg, Joakim Gustafson
An analysis of PCA-based vocal entrainment measures in married couples' affective spoken interactions
Chi-Chun Lee, Athanasios Katsamanis, Matthew P. Black, Brian R. Baucom, Panayiotis G. Georgiou, Shrikanth Narayanan
Using prominence detection to generate acoustic feedback in tutoring scenarios
Lars Schillingmann, Petra Wagner, Christian Munier, Britta Wrede, Katharina Rohlfing
Bayesian extension of MUSIC for sound source localization and tracking
Takuma Otsuka, Kazuhiro Nakadai, Tetsuya Ogata, Hiroshi G. Okuno
Speech-based non-prototypical affect recognition for child-robot interaction in reverberated environments
Martin Wöllmer, Felix Weninger, Stefan Steidl, Anton Batliner, Björn Schuller
Blind source separation for robot audition using fixed beamforming with HRTFs
Mounira Maazaoui, Yves Grenier, Karim Abed-Meraim
Real-life emotion detection from speech in human-robot interaction: experiments across diverse corpora with child and adult voices
Marie Tahon, Agnes Delaborde, Laurence Devillers
Weighted ordered classes - nearest neighbors: a new framework for automatic emotion recognition from speech
Yazid Attabi, Pierre Dumouchel
Prosodic analysis of a corpus of tales
David Doukhan, Albert Rilliard, Sophie Rosset, Martine Adda-Decker, Christophe d'Alessandro
Analysis of acoustic-prosodic features related to paralinguistic information carried by interjections in dialogue speech
Carlos T. Ishi, Hiroshi Ishiguro, Norihiro Hagita
Robust intonation pattern classification in human robot interaction
Martin Heckmann, Kazuhiro Nakadai, Hirofumi Nakajima
ASR for human-symbiotic robot “EMIEW2” with mechanical noise and floor-level noise reduction
Takashi Sumiyoshi, Masahito Togami, Yasunari Obuchi
Rapid building of an ASR system for under-resourced languages based on multilingual unsupervised training
Ngoc Thang Vu, Franziska Kraus, Tanja Schultz
Places and manner of articulation of Bangla consonants: an EPG based study
Shyamal Kr. Das Mandal, Somnath Chandra, Swaran Lata, A. K. Datta
Efficient harvesting of internet audio for resource-scarce ASR
Marelie H. Davel, Charl van Heerden, Neil Kleynhans, Etienne Barnard
Automatic prosody generation for Serbo-Croatian speech synthesis based on regression trees
Milan Sečujski, Darko Pekar, Nikša Jakovljević
Very large vocabulary ASR for spoken Russian with syntactic and morphemic analysis
Alexey Karpov, Irina Kipyatkova, Andrey Ronzhin
Cross-language phone recognition when the target language phoneme inventory is not known
Timothy Kempton, Roger K. Moore, Thomas Hain
A paradigm for limited vocabulary speech recognition based on redundant spectro-temporal feature sets
Sourish Chaudhuri, Bhiksha Raj, Tony Ezzat
Gorup: an ontology-driven audio information retrieval system that suits the requirements of under-resourced languages
N. Barroso, K. López de Ipiña, A. Ezeiza, C. Hernández, N. Ezeiza, O. Barroso, U. Susperregi, S. Barroso
Woefzela - an open-source platform for ASR data collection in the developing world
Nic J. de Vries, Jaco Badenhorst, Marelie H. Davel, Etienne Barnard, Alta de Waal
A study on the perception of tone and intonation in Sesotho
Hansjörg Mixdorff, Lehlohonolo Mohasi, 'Malillo Machobane, Thomas Niesler
Developing a broadband automatic speech recognition system for Afrikaans
Febe de Wet, Alta de Waal, Gerhard B. van Huyssteen
Multi-accent speech recognition of Afrikaans, black and white varieties of South African English
Herman Kamper, Thomas Niesler
Perceptual representation of consonant sounds in Thai
C. Tantibundhit, C. Onsuwan, T. Saimai, N. Saimai, S. Thatphithakkul, P. Chootrakool, K. Kosawat, N. Thatphithakkul
A cross-lingual approach to the development of an HMM-based speech synthesis system for Malay
Mumtaz B. Mustafa, Raja N. Ainon, Roziati Zainuddin, Zuraidah M. Don, Gerry Knowles
The INTERSPEECH 2011 speaker state challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Florian Schiel, Jarek Krajewski
Combining multiple phoneme-based classifiers with audio feature-based classifier for the detection of alcohol intoxication
Claude Montacié, Marie-José Caraty
Intoxication detection using phonetic, phonotactic and prosodic cues
Fadi Biadsy, William Yang Wang, Andrew Rosenberg, Julia Hirschberg
Drink and speak: on the automatic classification of alcohol intoxication by acoustic, prosodic and text-based features
Tobias Bocklet, Korbinian Riedhammer, Elmar Nöth
Intoxicated speech detection by fusion of speaker normalized hierarchical features and GMM supervectors
Daniel Bone, Matthew P. Black, Ming Li, Angeliki Metallinou, Sungbok Lee, Shrikanth Narayanan
Attention, sobriety checkpoint! can humans determine by means of voice, if someone is drunk… and can automatic classifiers compete?
Stefan Ultes, Alexander Schmitt, Wolfgang Minker
Does it groove or does it stumble - automatic classification of alcoholic intoxication using prosodic features
Florian Hönig, Anton Batliner, Elmar Nöth
Perception of alcoholic intoxication in speech
Florian Schiel
Detecting sleepiness by fusing classifiers trained with novel acoustic features
Tauhidur Rahman, Soroosh Mariooryad, Shalini Keshavamurthy, Gang Liu, John H. L. Hansen, Carlos Busso
An HMM-based approach to the INTERSPEECH 2011 speaker state challenge
Albino Nogueiras Rodríguez
RANSAC-based training data selection for speaker state recognition
Elif Bozkurt, Engin Erzin, Çiğdem Eroğlu Erdem, A. Tanju Erdem
University of Ljubljana system for INTERSPEECH 2011 speaker state challenge
Rok Gajšek, Simon Dobrišek, France Mihelič
Speaker state classification based on fusion of asymmetric SIMPLS and support vector machines
Dong-Yan Huang, Shuzhi Sam Ge, Zhengchen Zhang
Speech processing tools - an introduction to interoperability
Christoph Draxler, Toomas Altosaar, Sadaoki Furui, Mark Liberman, Peter Wittenburg
EasyAlign: an automatic phonetic alignment tool under Praat
Jean-Philippe Goldman
Mtrans: a multi-channel, multi-tier speech annotation tool
Julián Villegas, Martin Cooke, Vincent Aubanel, Marco A. Piccolino-Boniforti
The JSafran platform for semi-automatic speech processing
Christophe Cerisara, Claire Gardent
The social signal interpretation framework (SSI) for real time signal processing and recognition
Johannes Wagner, Florian Lingenfelser, Elisabeth André
ELAN - aspects of interoperability and functionality
Han Sloetjes, Peter Wittenburg, Aarthy Somasundaram
Open source voice creation toolkit for the MARY TTS platform
Marc Schröder, Marcela Charfuelan, Sathish Pammi, Ingmar Steiner
Java visual speech components for rapid application development of GUI based speech processing applications
Stefan Steidl, Korbinian Riedhammer, Tobias Bocklet, Florian Hönig, Elmar Nöth
mtalk - a multimodal browser for mobile services
Michael Johnston, Giuseppe Di Fabbrizio, Simon Urbanek
Web-based automatic speech recognition service - webASR
Stuart N. Wrigley, Thomas Hain
A web based speech transcription workplace
Markus Klehr, Andreas Ratzka, Thomas Roß
WinPitch: a multimodal tool for speech analysis of endangered languages
Philippe Martin
Recording caregiver interactions for machine acquisition of spoken language using the KLAIR virtual infant
Mark Huckvale
An affective spoken storyteller
Felix Burkhardt
Text driven 3D photo-realistic talking head
Lijuan Wang, Wei Han, Frank K. Soong, Qiang Huo
Physical models producing vowels with pitch variation
Takayuki Arai
An engine-independent text-to-speech workplace
Margot Mieskes
An application to test the emotion conveyed by vocal and musical signals
Simone Carcone, Carlo Giovannella
Automatic speech recognition system dedicated for Polish
Mariusz Ziółko, Jakub Gałka, Bartosz Ziółko, Tomasz Jadczyk, Dawid Skurzok, Mariusz Masior
Joint application of speech and speaker recognition for automation and security in smart home
Kong Aik Lee, Anthony Larcher, Helen Thai, Bin Ma, Haizhou Li
Adding a speech cursor to a multimodal dialogue system
Staffan Larsson, Alexander Berman, Jessica Villing
Prosody toolkit: integrating HTK, Praat and WEKA
S. Thomas Christie, Serguei Pakhomov
Collecting life logs for experience-based corpora
F. Francesconi, A. Ghosh, G. Riccardi, M. Ronchetti, A. Vagin
Making an automatic speech recognition service freely available on the web
Stuart N. Wrigley, Thomas Hain
AT&T VoiceBuilder: a cloud-based text-to-speech voice builder tool
Yeon-Jun Kim, Thomas Okken, Alistair D. Conkie, Giuseppe Di Fabbrizio
Extending audio notetaker to browse webASR transcriptions
Roger Tucker, Dan Fry, Vincent Wan, Stuart N. Wrigley, Thomas Hain
A web-based tool for developing multilingual pronunciation lexicons
Samantha Ainsley, Linne Ha, Martin Jansche, Ara Kim, Masayuki Nanzawa
Speak4it and the multimodal semantic interpretation system
Michael Johnston, Patrick Ehlen
TSAB - web interface for transcribed speech collections
Tanel Alumäe, Ahti Kitsik
Visual voice mail to text on the iPhone/iPad
Andrej Ljolje, Vincent Goffin, Diamantino Caseiro, Taniya Mishra, Mazin Gilbert
Percy - an HTML5 framework for media rich web experiments on mobile devices
Christoph Draxler
The KLAIR toolkit for recording interactive dialogues with a virtual infant
Mark Huckvale
Real-time prototype for integration of blind source extraction and robust automatic speech recognition
Francesco Nesta, Marco Matassoni, HariKrishna Maganti