Segmental models have recently been shown to be effective for speech recognition. However, most segmental models require a first-pass baseline system, such as an HMM, to provide a constrained set of candidate segmentations and label sequences on which to perform inference. This paper explores one-pass segmental models based on a continuous feature space for phone recognition and makes the first direct comparison between a frame-based system and a segmental system using the same base features. We also show that transition features can be very beneficial for segmental models, particularly those around segment boundaries. To incorporate such features efficiently, we propose the Boundary-Factored SCRF, which reduces the time complexity of an SCRF to that of a frame-level CRF.
Index Terms: Segmental Conditional Random Fields, Phone Recognition