Cognitive and neurocognitive models of face processing suggest that independent (modular) processing routes are engaged for different types of face processing tasks. This study used fMRI (1.5 T) to examine common and distinctive processing routes for the perception of speech and emotion from posed photographs. For speech categorization ('Is that a vowel or a consonant?'), there was bilateral activation in frontal and inferior temporal (including fusiform) regions. The emotion task ('Is the expression positive or negative?') also showed activation in inferior temporal and frontal regions, but this was right lateralized. An incidental task ('Can you see her teeth?'), using the same pictures, activated the same regions for the emotion faces as the emotion task did; for the speech faces, by contrast, activation was limited to fusiform regions. Emotionally expressive face images, but not speech face images, thus readily activate a right-lateralized temporo-frontal circuit, whatever the task demands. Speech categorization from still images recruits extensive frontal regions bilaterally, confirming previous findings [1]. Unlike in previous reports, superior temporal regions did not show significant activation for speech faces, although the right superior temporal sulcus (STS) was activated by the emotional faces.