With the growing availability of large-scale spoken databases, linguists increasingly rely on automated tools to time-align sound units to the speech signal. A typical automated pipeline may involve grapheme-to-phoneme conversion, forced alignment, and acoustic-phonetic measurement, and each of these stages rests on strong assumptions about the quality of its output. We investigate these assumptions by auditing outliers in vowel formant measurements from two multilingual read speech corpora, CMU Wilderness and Mozilla Common Voice, across three languages: Hausa, Kazakh, and Swedish. From this audit, we develop a novel outlier taxonomy with five broad categories: transcript errors, alignment errors, formant tracking errors, linguistic variations, and fine samples. We show that this outlier analysis is useful both for identifying weaknesses in corpus-specific and corpus-general pipeline assumptions and for discovering characteristics of particular languages.