Recent experiments suggest that audio-visual interaction in speech perception could begin at a very early level, at which the visual input could improve the detection of speech sounds embedded in noise [1]. We show here that this "speech detection" benefit may result in a "speech identification" benefit distinct from lipreading per se. The experimental trick consists of using a series of lip gestures compatible with a number of different audio configurations, e.g. [y u ty tu ky ku dy du gy gu] in French. We show that visual identification of this corpus is at chance level but that, when vision is added to the sound embedded in a large amount of cocktail-party noise, it improves the identification of one phonetic feature, namely plosive voicing. We discuss this result in terms of audio-visual scene analysis.