We introduce a new text-to-AV (speech and face animation) system, created for our Thinking Head project, that provides a modular research platform to the AV community. The system includes a novel phone-to-face motion module capable of synthesizing face animation from triphone data. By combining phoneme timing information from human speech with information derived from our speech face motion database, which is built from motion capture data, we build correspondences between diphones and triphones on the one hand and face motion on the other. A comparison between face motion synthesized from speech using only our system and face motion generated from motion capture during speech confirms that our system can synthesize AV speech motion of a quality equivalent to that of motion-capture-driven speech face motion.