Despite recent advances in Computer Telephony (CT) and IP Telephony (IPT) standards toward defining flexible architectures that support new technologies, the current CT paradigm does not adequately meet the requirements of advanced spoken dialogue systems. This paper describes an application framework, based on CT and IPT standards, that defines new architectural components for information access, alerting, and multi-modal input/output integration. The framework separates application logic from low-level resource management, which facilitates the design and development of advanced, multi-modal, voice-enabled services.