A durational modelling technique is proposed for connected digit recognition based on continuous-density hidden Markov models (CDHMMs). The technique reduces the insertion error rate; insertions are typically the most frequent recognition error when no grammar constraint is applied. Insertion errors can be attributed in part to the acknowledged weakness of the acoustic models in capturing the temporal structure of speech signals. Two forms of durational model are investigated: an expanded-state model and an explicit model. Both forms significantly reduce the number of insertion errors and hence the digit string error rate. A modification to the explicit model that also accounts for speaking rate is described.
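As a rough illustration of the explicit form of durational model, the sketch below adds a duration log-probability to the acoustic log-likelihood of each candidate digit string, so that a hypothesis containing an implausibly short inserted digit is penalised. Everything in it is an assumption made for illustration, not the formulation used in this work: the Gaussian duration density, the `DURATION_STATS` values, the `weight` parameter, and the `relative_duration` factor standing in for speaking-rate compensation are all hypothetical.

```python
import math

# Hypothetical per-digit duration statistics in frames: (mean, standard deviation).
# Illustrative values only; in practice these would be estimated from training data.
DURATION_STATS = {"one": (30.0, 6.0), "two": (28.0, 6.0), "eight": (26.0, 5.0)}


def log_gaussian(x, mean, std):
    """Log of a univariate Gaussian density evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - (x - mean) ** 2 / (2.0 * std * std)


def duration_score(segments, weight=1.0, relative_duration=1.0):
    """Weighted sum of duration log-probabilities for a hypothesis.

    segments: list of (digit, duration_in_frames) pairs.
    relative_duration: assumed ratio of this speaker's typical digit duration
        to the training average (below 1.0 for fast speech). Observed durations
        are divided by it before scoring, as a crude speaking-rate compensation.
    """
    total = 0.0
    for digit, frames in segments:
        mean, std = DURATION_STATS[digit]
        total += log_gaussian(frames / relative_duration, mean, std)
    return weight * total


def rescore(acoustic_loglik, segments, weight=1.0, relative_duration=1.0):
    """Combine the acoustic log-likelihood with the duration score."""
    return acoustic_loglik + duration_score(segments, weight, relative_duration)


# Two competing hypotheses for the same utterance (made-up scores).
correct = (-100.0, [("one", 31), ("two", 27)])
with_insertion = (-98.0, [("one", 31), ("eight", 4), ("two", 23)])

# On acoustic score alone the insertion hypothesis wins; after adding the
# duration term, the 4-frame "eight" is heavily penalised and the ranking flips.
best = max([correct, with_insertion], key=lambda h: rescore(*h))
print([digit for digit, _ in best[1]])
```

The same scoring could be applied inside the decoder at word boundaries or as a post-processing pass over competing hypotheses; the sketch takes the second, simpler route.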