Dynamic conditional random fields (DCRF) have been shown to outperform linear-chain conditional random fields (L-CRF) for punctuation prediction on conversational speech texts. In this paper, we combine lexical, prosodic, and modified n-gram score features within the DCRF framework for a joint sentence boundary and punctuation prediction task on TDT3 English broadcast news. We show that the joint prediction method outperforms the conventional two-stage method using an L-CRF or a maximum entropy model (MaxEnt). We examine the importance of the various features using DCRF, L-CRF, MaxEnt, and a hidden-event n-gram model (HEN), respectively. In addition, we address the practical issue of feature explosion by introducing lexical pruning, which reduces model size and improves the F1-measure. We adopt incremental local training to overcome memory size limitations without incurring a significant performance penalty. Our results show that adding prosodic and n-gram score features yields approximately 20% relative error reduction in all cases. Overall, DCRF gives the best accuracy, followed by L-CRF, MaxEnt, and HEN.
Index Terms: punctuation, dynamic conditional random fields, sentence boundary detection