Since the beginning of the 1970s, several attempts have been made to explain the phonetic structure of vowel systems by introducing extralinguistic principles, either listener-orientated (perceptual contrast and stability) or speaker-orientated (articulatory contrast and economy). The best predictions have been obtained with the perceptual contrast theory (PCT), but two main problems remain: it predicts too many high non-peripheral vowels, and it cannot predict the [i, y, u] series within the set of high vowels. We try to resolve these difficulties while staying within the field of listener-orientated principles. First, we study the PCT in the F1-F2-F3 space, in order to better account for the role of higher formants in the perception of front vowels. In this space, we show that the problem of high non-peripheral vowels can be solved by increasing the weight of F1, but that the case of [y] can only be understood by reinforcing the stability of the [i]-[y] pair. This is done by means of a "focalization" principle, according to which vowels with strong formant convergence ([i], characterized by a strong F3-F4 convergence, and [y], characterized by a strong F2-F3 convergence) would be perceptually preferred.
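The two ingredients named above, a contrast cost with an increased weight on F1 and a preference for formant convergence, can be illustrated with a minimal sketch. It is not the model used in this work. It assumes an inverse-squared-distance contrast cost of the kind commonly used in dispersion models, and the function names, the weights, the form of the convergence score, and all formant values (toy numbers, nominally in Bark) are hypothetical.

```python
import itertools
import math


def weighted_distance(v1, v2, weights):
    """Euclidean distance between two vowels with per-formant weights (assumed form)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(v1, v2, weights)))


def dispersion_energy(vowels, weights=(1.0, 1.0, 1.0)):
    """Sum of inverse squared distances over all vowel pairs.

    Lower energy means better perceptual contrast. Vowels must be distinct
    points, otherwise the distance is zero and the division fails.
    """
    return sum(
        1.0 / weighted_distance(a, b, weights) ** 2
        for a, b in itertools.combinations(vowels, 2)
    )


def focalization_score(formants):
    """Hypothetical convergence score: grows as adjacent formants get closer."""
    return sum(1.0 / (hi - lo) ** 2 for lo, hi in zip(formants, formants[1:]))


# Toy values only, not measurements: two vowels differing in F1 alone.
pair = [(3.0, 13.0, 15.0), (7.0, 13.0, 15.0)]
unweighted = dispersion_energy(pair)
f1_weighted = dispersion_energy(pair, weights=(4.0, 1.0, 1.0))

# Toy four-formant vowels: one with close F3-F4, one with evenly spread formants.
converging = focalization_score((2.0, 14.0, 15.5, 16.0))
spread = focalization_score((2.0, 10.0, 14.0, 17.0))
```

In this sketch, raising the F1 weight makes a pair that differs only in F1 count as more distinct, so its energy drops, while a vowel whose neighbouring formants nearly merge receives a higher convergence score than one whose formants are evenly spread.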