This paper compares three methods of lipreading for visual and audio-visual speech recognition. Lip shape information, obtained using an Active Shape Model (ASM) lip tracker, is less effective than modelling the combined shape and enclosed grey-level surface with an Active Appearance Model (AAM). A non-tracked alternative is a nonlinear transform of the image using multiscale spatial analysis (MSA). This performs almost identically to AAMs in both visual and audio-visual recognition tasks on a multi-talker database of isolated letters.
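To make the combined shape and grey-level modelling concrete, the sketch below illustrates the core AAM idea in Python: concatenate each example's landmark coordinates with its enclosed grey-level samples and apply PCA so that one low-dimensional parameter vector describes both. All sizes, variable names, and the random data are hypothetical placeholders; this is a minimal illustration of the technique, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_examples = 200    # training lip images (assumed)
n_landmarks = 44    # lip contour points (assumed)
n_pixels = 1000     # grey-level samples inside the contour (assumed)

shape = rng.normal(size=(n_examples, 2 * n_landmarks))   # (x, y) landmarks
texture = rng.normal(size=(n_examples, n_pixels))        # warped grey-levels

# Concatenate shape and grey-level vectors; a real AAM would weight the
# shape block to balance the differing units, omitted here for brevity.
combined = np.hstack([shape, texture])

# PCA via SVD on the mean-centred data.
mean = combined.mean(axis=0)
centred = combined - mean
_, s, vt = np.linalg.svd(centred, full_matrices=False)

# Keep enough modes to explain, say, 95% of the variance.
var = s**2 / np.sum(s**2)
k = int(np.searchsorted(np.cumsum(var), 0.95)) + 1
modes = vt[:k]   # combined shape + grey-level appearance modes

# Project one example onto the modes to get the kind of compact
# feature vector that would be passed to the speech recogniser.
features = modes @ (combined[0] - mean)
print(features.shape)   # (k,)
```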