ISCA Archive ECST 1987

A measure of deliberateness as an aid to the construction of grammars

Richard Rohwer

The construction of a grammar from a given set of terminal symbols and a corpus illustrating their use is not a particularly straightforward, or even well-defined, problem. In almost any approach to this problem, it is necessary to know how the terminal symbols should be grouped into phrases. To this end, a measure of "deliberateness" or "non-randomness" of phrases is introduced. This measure can be computed directly from the N-gram statistics of the corpus, and takes into consideration a simple model of the uncertainties in these statistics. It also indicates whether the correlation is positive or negative. A high value for this deliberateness measure appears to be a sufficient, but not necessary, condition for the phrase to have relevance to the grammar. The measure can also be used to judge the non-randomness of a production rule. It is concluded that this measure, while unable to provide all the information needed to construct a general phrase-structure grammar, provides a substantial subset of this information. The measure is also useful for computing probabilities for arbitrary strings of terminal symbols.
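
The abstract does not state the formula for the measure, so the following is only an illustrative stand-in, not the author's definition. It sketches one familiar way to score a two-symbol phrase from N-gram counts: compare the observed bigram count with the count expected if the two symbols occurred independently, and divide by a Poisson-style estimate of the uncertainty. The function name `bigram_score`, the independence baseline, and the square-root uncertainty model are all assumptions made for this example. A positive score suggests positive correlation, a negative score suggests negative correlation.

```python
import math
from collections import Counter


def bigram_score(tokens, a, b):
    """Hypothetical stand-in for a deliberateness-like score of the phrase (a, b).

    Returns (observed - expected) / sqrt(expected), where `expected` is the
    bigram count predicted if a and b occurred independently. This is an
    assumed formula for illustration, not the measure defined in the paper.
    """
    n = len(tokens)
    if n < 2:
        return 0.0
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    positions = n - 1  # number of adjacent pairs in the corpus
    # Expected count under independence of the two symbols.
    expected = positions * (unigrams[a] / n) * (unigrams[b] / n)
    if expected == 0:
        return 0.0  # a or b never occurs: no evidence either way
    observed = bigrams[(a, b)]
    # Assumed uncertainty model: standard deviation ~ sqrt(expected).
    return (observed - expected) / math.sqrt(expected)


corpus = ["a", "b"] * 10
print(bigram_score(corpus, "a", "b"))  # positive: "a b" occurs more often than chance
print(bigram_score(corpus, "a", "a"))  # negative: "a a" never occurs
```

In this toy corpus the pair "a b" scores above zero and the pair "a a" scores below zero, which mirrors the positive and negative correlation distinction mentioned in the abstract.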