A set of techniques for configuring a speech recognition system to a particular user are described in the context of voice label recognition over the public switched telephone network. User-configurable vocabularies are provided through automatic acoustic baseform determination based on an inventory of speaker independent subword acoustic units. The tendency of input utterances to contain out-of-vocabulary or non-speech information is accounted for using likelihood ratio based utterance verification procedures. Mismatch between a given user's utterances and the HMM model is accounted for using a frequency warping approach to speaker normalization. The performance of these techniques was evaluated on utterances taken from a trial version of a voice label recognition service.