This study examines the confusions made by French L2 learners, compared with native subjects, in the perception of 11 audiovisual Mandarin Chinese attitudes. These were selected from a broader set of 19 attitudes previously evaluated in the audio condition by both native Chinese and naïve French listeners. Two groups of French L2 learners of Mandarin Chinese were selected according to their level as assessed by the Common European Framework of Reference for Languages: 9 beginners (A1) and 10 intermediate learners (A2). Subjects evaluated the 11 attitudes in audio, visual and audiovisual conditions. Comparison of confusions between A1 and A2 learners reveals few significant differences, mostly in the audiovisual condition and without a clear advantage for either group: confusion patterns are closer to the native reference for group A1 in the expression of doubt, and for group A2 in the expression of contempt. Comparing the pooled French L2 learners with the native-speaker reference sheds light on the major confusions to be targeted by specific methods and exercises. In the audio-only condition, neutral surprise and politeness are less well recognized by learners, who confuse contempt with question and question with obviousness. In the visual-only condition, obviousness is more often confused with declaration, contempt with irritation, and disappointment with doubt. In the audiovisual condition, recognition of neutral surprise is lower, while infant-directed speech is better recognized; neutral surprise is more often confused with irritation, and contempt with disappointment. Cross-modality comparisons suggest a limited contribution of the information conveyed by acoustic prosody to the identification of audiovisual social affects by L2 learners.