We present a spoken dialog-based framework for the computer-assisted language learning (CALL) of conversational English. In particular, we leveraged the open-source HALEF dialog framework to develop a job interview conversational application. We then used crowdsourcing to collect multiple interactions with the system from non-native English speakers. We analyzed human-rated scores of the recorded dialog data on three different scoring dimensions critical to the delivery of conversational English — fluency, pronunciation and intonation/stress — and further examined the efficacy of automatically-extracted, hand-curated speech features in predicting each of these sub-scores. Machine learning experiments showed that trained scoring models generally perform at par with the human inter-rater agreement baseline in predicting human-rated scores of conversational proficiency.