Online streaming companies such as Netflix have become dominant in the media distribution sector. However, such media delivery services often support only rudimentary search, especially for natural language queries. To provide a more natural search interface, we have developed a conversational movie search system that parses the recognition hypothesis of a spoken query into semantic classes using conditional random fields (CRFs) and then searches an indexed database using the identified semantic classes. Topic modeling on user-generated content (e.g., movie reviews) is employed for query expansion. Thirteen search schemas are supported (such as genre, plot, character, and soundtrack search). A crowd-sourcing platform was used to automatically collect large-scale annotated data for incremental CRF training.
Index Terms: conditional random fields, spoken dialogue system
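
As a rough illustration of the parse-then-search pipeline summarized above, the following sketch groups per-token semantic labels (of the kind a CRF tagger might emit) into slots and intersects a toy inverted index. The BIO label scheme, the slot names `genre` and `actor`, the sample catalogue, and the helpers `build_index`, `labels_to_slots`, and `search` are all assumptions made for illustration and are not taken from the system itself. The CRF tagging step is stubbed out with hand-written labels.

```python
from collections import defaultdict

# Hypothetical toy catalogue; the fields stand in for search schemas
# such as genre or character search.
MOVIES = {
    1: {"genre": ["comedy"], "actor": ["jim carrey"]},
    2: {"genre": ["drama"], "actor": ["jim carrey"]},
    3: {"genre": ["comedy"], "actor": ["bill murray"]},
}


def build_index(movies):
    """Map each (slot, value) pair to the set of movie ids containing it."""
    index = defaultdict(set)
    for movie_id, fields in movies.items():
        for slot, values in fields.items():
            for value in values:
                index[(slot, value)].add(movie_id)
    return index


def labels_to_slots(tokens, labels):
    """Group BIO-labelled tokens into (slot, phrase) pairs."""
    slots = []
    current_slot, current_words = None, []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A new span begins; flush any span that was still open.
            if current_slot is not None:
                slots.append((current_slot, " ".join(current_words)))
            current_slot, current_words = label[2:], [token]
        elif label.startswith("I-") and current_slot == label[2:]:
            # Continuation of the currently open span.
            current_words.append(token)
        else:
            # "O" or an inconsistent "I-" label closes the open span.
            if current_slot is not None:
                slots.append((current_slot, " ".join(current_words)))
            current_slot, current_words = None, []
    if current_slot is not None:
        slots.append((current_slot, " ".join(current_words)))
    return slots


def search(index, slots):
    """Return sorted ids of movies that match every identified slot."""
    result = None
    for slot in slots:
        matches = index.get(slot, set())
        result = matches if result is None else result & matches
    return sorted(result) if result else []


# Hand-written labels stand in for the CRF output on a recognized query.
tokens = "show me a comedy with jim carrey".split()
labels = ["O", "O", "O", "B-genre", "O", "B-actor", "I-actor"]

index = build_index(MOVIES)
slots = labels_to_slots(tokens, labels)
print(slots)
print(search(index, slots))
```

In this sketch every slot must match (a strict intersection); a real system would more plausibly rank partial matches and expand query terms, for example with the topic-model-based expansion mentioned in the abstract.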