Broca is a parser for spoken language in which natural language processing is tightly integrated with lexical and phonological processing. This contrasts with the N-best approach usually used in speech recognition, in which the natural language component acts as an autonomous post-process to a Viterbi-style search. In our system, integration is achieved by expressing phonological, lexical, and natural language structures all in the form of an augmented context-free grammar. Processing in Broca proceeds through four stages. First, the speech signal is mapped to a perceptually based reduced representation [1]. Second, a neural network classifier produces one set of phoneme estimates per nine-millisecond frame. Third, this phoneme stream is segmented using a hierarchical clustering algorithm [2]. Finally, the integrated grammar is dynamically matched against the segmentation by a probabilistic parsing algorithm.
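To make the final stage concrete, the following is a minimal sketch of how a single grammar can hold syntactic, lexical, and phonological rules and be matched against a segmented phoneme stream. It is not Broca's algorithm: it substitutes a plain Viterbi-scored CYK parser for the augmented grammar and probabilistic parser described above, and the rule table `RULES`, the segment scores `SEGMENTS`, and the function `best_parse_logprob` are all hypothetical names and values invented for illustration.

```python
import math

# Hypothetical toy grammar in Chomsky normal form: parent -> (left, right, prob).
# Syntax, lexicon, and phonology share one rule table. Lowercase symbols are
# phonemes; all rules and probabilities are invented for this sketch.
RULES = {
    "COMMAND": [("MOVE", "IT", 1.0)],   # syntactic rule
    "MOVE": [("m", "UW_V", 1.0)],       # lexical rule: "move" -> m uw v
    "UW_V": [("uw", "v", 1.0)],
    "IT": [("ih", "t", 1.0)],           # lexical rule: "it" -> ih t
}

# Hypothetical output of the segmentation stage: one phoneme distribution
# per segment (stand-ins for pooled classifier estimates).
SEGMENTS = [
    {"m": 0.9, "n": 0.1},
    {"uw": 0.8, "ow": 0.2},
    {"v": 0.7, "f": 0.3},
    {"ih": 0.6, "eh": 0.4},
    {"t": 0.9, "d": 0.1},
]

NEG_INF = float("-inf")


def best_parse_logprob(segments, rules, start):
    """Return the log probability of the best parse rooted at `start`,
    or -inf if the grammar does not cover the segment sequence."""
    n = len(segments)
    chart = {}  # (i, j, symbol) -> best log probability over segments[i:j]

    # Width-1 spans: each segment proposes phonemes with acoustic scores.
    for i, dist in enumerate(segments):
        for phoneme, p in dist.items():
            if p > 0.0:
                chart[(i, i + 1, phoneme)] = math.log(p)

    # Wider spans: combine adjacent sub-spans, keeping the best score.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, expansions in rules.items():
                    for left, right, p in expansions:
                        score = (
                            math.log(p)
                            + chart.get((i, k, left), NEG_INF)
                            + chart.get((k, j, right), NEG_INF)
                        )
                        if score > chart.get((i, j, parent), NEG_INF):
                            chart[(i, j, parent)] = score

    return chart.get((0, n, start), NEG_INF)


if __name__ == "__main__":
    print(math.exp(best_parse_logprob(SEGMENTS, RULES, "COMMAND")))
```

Because the acoustic scores enter the same chart as the grammar rules, a phoneme hypothesis survives only if some word and phrase structure can absorb it, which is the sense in which the constraints are applied jointly rather than as a post-process.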
The advantage of a parsing approach is that it exploits higher-order information present in the speech signal. This potentially includes phonological phenomena such as stress patterns and prosodic grouping, as well as the syntax and semantics of natural language. This information is lost in a Viterbi-style search.
Broca is a speaker-independent, continuous speech system. We have evaluated it on an in-house database of spoken utterances for an X window manager task.