We propose a real-time AI system that conducts sentence-level quality assurance of conversational alignment, based on speaker-diarized dialogues transcribed by automatic speech recognition from a continuous audio stream. The system relies on two new interactive engines: (1) an online, registration-free speaker diarization component that separates the speech utterances of multiple speakers in a conversation and learns from user feedback; and (2) a turn-level scoring mechanism that infers conversation quality by computing a similarity score between the deep embeddings of a user-specified scoring inventory of interest and the sentence that the user is currently speaking. These real-time scores are known to be predictive of successful conversational outcomes (for example, they relate to the therapeutic working alliance, an important indicator of clinical psychotherapy outcome). In addition to evaluating the empirical advantages of the core components on existing datasets, we demonstrate the effectiveness of the system in a web-based application at https://www.baihan.nyc/viz/Voice2Alliance/.
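
The turn-level scoring idea described above can be illustrated with a minimal sketch. It is not the system's implementation: the function names (`toy_embed`, `cosine`, `score_turn`) and the two inventory items are hypothetical, and a sparse bag-of-words vector stands in for the deep sentence embeddings, which are not specified here. Only the overall pattern, embedding the current sentence and each inventory item and then comparing them by cosine similarity, follows the description.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Assumed stand-in for a deep sentence encoder: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty sentence has no meaningful direction
    return dot / (norm_u * norm_v)


def score_turn(sentence, inventory, embed=toy_embed):
    """Return one similarity score per inventory item for the current sentence."""
    sentence_vec = embed(sentence)
    return {name: cosine(sentence_vec, embed(item)) for name, item in inventory.items()}


# Hypothetical inventory items, invented for illustration only.
inventory = {
    "goal": "we agree on the goals of our work together",
    "bond": "i feel that you care about me",
}

scores = score_turn("I think we agree on the goals", inventory)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In a real-time setting, `score_turn` would presumably be called once per diarized, transcribed sentence, with `embed` replaced by whichever sentence encoder is actually used; the inventory embeddings could be computed once and cached rather than recomputed on every turn.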