A system for phone-level and word-level labelling of speech, given the text of the utterance, is being developed. A rule set designed for text-to-speech applications is used for the initial conversion of the text to a base-form phoneme transcription. Optional pronunciations are then predicted by rules and a lexicon. A speech corpus consisting of 2000 sentences spoken by one male speaker is currently being labelled at the department using this system. Results on a subset show that it is important both to take the phone duration distribution into account and to allow for silent intervals between words. Allowing optional pronunciations within words and at word boundaries did not improve the accuracy, possibly because of the limited phonetic discriminability of the system.
Keywords: labelling, alignment
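The idea of allowing silent intervals between words can be illustrated with a minimal sketch. It is not the system described above: it assumes that per-frame log scores for each phone are already available from some acoustic model, it uses a plain Viterbi search over a left-to-right phone sequence, and it leaves out duration modelling and optional pronunciations. The names `build_states`, `align` and the `sil` symbol are hypothetical.

```python
import math

SIL = "sil"  # hypothetical symbol for an inter-word silence


def build_states(words):
    """Flatten per-word phone lists into (phone, optional) states,
    inserting one optional silence state between consecutive words.
    Assumes every word has at least one phone."""
    states = []
    for i, phones in enumerate(words):
        if i > 0:
            states.append((SIL, True))
        states.extend((p, False) for p in phones)
    return states


def align(frame_scores, states):
    """Viterbi alignment of frames to states.

    frame_scores[t] maps a phone symbol to its log score at frame t
    (assumed to come from an external acoustic model).
    Returns one state index per frame."""
    n_frames, n_states = len(frame_scores), len(states)
    neg_inf = -math.inf

    def predecessors(s):
        # Stay in s, come from s-1, or skip s-1 when it is optional.
        out = [s]
        if s >= 1:
            out.append(s - 1)
            if s >= 2 and states[s - 1][1]:
                out.append(s - 2)
        return out

    best = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[-1] * n_states for _ in range(n_frames)]
    best[0][0] = frame_scores[0][states[0][0]]  # must start in first phone

    for t in range(1, n_frames):
        for s in range(n_states):
            prev = max(predecessors(s), key=lambda p: best[t - 1][p])
            if best[t - 1][prev] == neg_inf:
                continue  # state s is not reachable at frame t
            best[t][s] = best[t - 1][prev] + frame_scores[t][states[s][0]]
            back[t][s] = prev

    if best[-1][-1] == neg_inf:
        raise ValueError("too few frames to reach the final phone")

    # Trace back from the final phone at the last frame.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path
```

Calling `align(frame_scores, build_states([["a"], ["b"]]))` returns a state index per frame; the silence state appears in the path only when the frame scores favour it, so word boundaries with and without a pause are handled by the same search.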