This paper describes a framework for overcoming some problems in the analysis of speech corpora used in text-to-speech systems. In particular, it examines two kinds of errors that can produce disagreeable effects at the synthesis level. The first is the incorrect transcription of pauses (and, more generally, of low-energy intervals); the second is the mismatch between voiced intervals and the phonetic symbols that should represent them. The first problem is addressed with a statistical approach that compares features of the detected low-energy (LE) intervals with those obtained from training data. The second is addressed by extracting the voiced/unvoiced (VU) intervals and checking their coherence with the phonetic transcription and segmentation.
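As an illustration only, the two checks can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the paper: the fixed energy threshold, the `VOICED_PHONES` inventory, the `min_agreement` parameter, and all function names are hypothetical, and the statistical comparison of LE features against training data is not modelled here.

```python
from typing import List, Tuple

Interval = Tuple[float, float]        # (start_s, end_s)
Segment = Tuple[str, float, float]    # (phone label, start_s, end_s)

# Hypothetical inventory of labels expected to be voiced (assumption).
VOICED_PHONES = {"a", "e", "i", "o", "u", "m", "n", "l", "r", "b", "d", "g", "v", "z"}


def low_energy_intervals(energies: List[float], threshold: float,
                         frame_s: float) -> List[Interval]:
    """Return maximal runs of consecutive frames with energy below threshold."""
    intervals: List[Interval] = []
    start = None
    for i, energy in enumerate(energies):
        if energy < threshold and start is None:
            start = i                                   # a low-energy run begins
        elif energy >= threshold and start is not None:
            intervals.append((start * frame_s, i * frame_s))
            start = None                                # the run ends
    if start is not None:                               # run reaches the last frame
        intervals.append((start * frame_s, len(energies) * frame_s))
    return intervals


def voiced_fraction(seg_start: float, seg_end: float,
                    voiced: List[Interval]) -> float:
    """Fraction of the segment duration covered by voiced intervals."""
    duration = seg_end - seg_start
    if duration <= 0:
        return 0.0
    overlap = sum(max(0.0, min(seg_end, e) - max(seg_start, s)) for s, e in voiced)
    return overlap / duration


def voicing_mismatches(segments: List[Segment], voiced: List[Interval],
                       min_agreement: float = 0.5) -> List[Tuple[str, float, float, float]]:
    """Flag segments whose measured voicing disagrees with their label."""
    flagged = []
    for label, start, end in segments:
        frac = voiced_fraction(start, end, voiced)
        # Agreement is the voiced share for voiced labels, the unvoiced share otherwise.
        agreement = frac if label in VOICED_PHONES else 1.0 - frac
        if agreement < min_agreement:
            flagged.append((label, start, end, frac))
    return flagged
```

In this sketch, `low_energy_intervals` yields candidate LE intervals that could then be compared with the transcribed pauses, while `voicing_mismatches` lists the segments whose label and measured voicing disagree, so that they can be reviewed or re-segmented.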