ISCA Archive ICSLP 1992

An interactive environment for speech recognition research

Mark Fanty, John Pochmara, Ron Cole

A UNIX software environment for speech recognition research is described. This environment is being developed under an NSF Software Capitalization grant and will be made public when complete. The speech tools will allow users to compute and display a variety of signal representations of speech, to label speech at a number of levels, to train and evaluate neural network classifiers, and to display the processing stages of speech algorithms. The tools include file formats and support routines for audio files, two-dimensional data files (e.g., FFT output), neural network classifiers, and time-aligned label files. The LYRE and AUTOLYRE programs display speech data in a variety of ways. NOPT is a neural network training program based on conjugate gradient descent; it is limited in flexibility but fast and easy to use. A distributed version of NOPT for networked workstations exists and will continue to be enhanced. A small but useful set of signal processing routines is provided, including Perceptual Linear Predictive (PLP) analysis. Speech algorithms include a pitch tracker for high-quality speech and dynamic programming optimization code for use with phoneme probability matrices (e.g., as computed by a neural network classifier). Although the software currently reflects the biases of the Center for Spoken Language Understanding towards neural-network-based recognition using DFT- and PLP-based features, we hope that the user community will continue to enhance the tool set. We have tried to design it with flexibility and growth in mind.
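To illustrate the kind of dynamic programming search over a phoneme probability matrix mentioned above, the following is a minimal sketch and not the environment's actual code. It assumes the matrix is given as per-frame log-probabilities, and it introduces a hypothetical `switch_penalty` parameter that discourages changing the phoneme label between adjacent frames; the function name `best_phoneme_path` is likewise an assumption made for illustration.

```python
import math


def best_phoneme_path(log_probs, switch_penalty=1.0):
    """Return the per-frame phoneme indices with the highest total score.

    log_probs: list of frames; each frame is a list of log-probabilities,
        one per phoneme (e.g. taken from a classifier's outputs).
    switch_penalty: hypothetical cost subtracted whenever the label
        changes between consecutive frames.
    """
    n_phones = len(log_probs[0])
    score = list(log_probs[0])  # best score of any path ending in each phoneme
    back = []                   # back[t-1][j]: best predecessor of j at frame t

    for frame in log_probs[1:]:
        new_score, pointers = [], []
        for j in range(n_phones):
            # Score of arriving at phoneme j from each possible predecessor i.
            candidates = [
                score[i] - (0.0 if i == j else switch_penalty)
                for i in range(n_phones)
            ]
            best_i = max(range(n_phones), key=lambda i: candidates[i])
            new_score.append(candidates[best_i] + frame[j])
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace back from the best final phoneme to recover the full path.
    state = max(range(n_phones), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    # Three frames, two phonemes; the middle frame weakly favors phoneme 1.
    frames = [[math.log(p) for p in row]
              for row in [[0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]]
    print(best_phoneme_path(frames, switch_penalty=0.0))
    print(best_phoneme_path(frames, switch_penalty=2.0))
```

With no penalty the search simply follows the frame-by-frame maximum, while a larger penalty smooths over the weakly supported middle frame. A practical recognizer would typically add duration and lexical constraints, which this sketch omits.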