Passing noise through a binary mask representing speech leads to remarkably intelligible speech. However, if the mask input is a competing speech signal, both the competing speech and the target speech represented by the mask are rendered unintelligible. The current study considers potential explanations for this abrupt breakdown. Competing speech was modified to reduce the influence of properties that may have interacted adversely with those of the target, including speaker, language, F0 and spectral detail. Properties were modified by noise-vocoding, envelope substitution and preservation of temporal modulations. The outcome of a listening experiment indicated that the impact of competing speech is largely due to conflicting formant-scale spectral detail and the absence of sufficient energy in specific temporal epochs, while conflicting F0 plays no role. These findings contribute to a broader understanding of the minimal representational basis that underlies speech perception.
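As an illustration of what passing a signal through a binary mask can mean, the following minimal sketch builds a mask from a toy time-frequency representation of the target and applies it to a noise carrier. The function names `binary_mask` and `apply_mask`, the peak-relative threshold `floor_db`, and the toy values are assumptions made for illustration only; the abstract does not specify how the study computed its masks.

```python
def binary_mask(target_db, floor_db=-30.0):
    """Mark time-frequency cells whose target level lies within
    |floor_db| dB of the peak level (assumed criterion, for illustration)."""
    peak = max(max(row) for row in target_db)
    threshold = peak + floor_db
    return [[1 if cell >= threshold else 0 for cell in row] for row in target_db]


def apply_mask(mask, carrier):
    """Keep carrier energy where the mask is 1 and zero it elsewhere."""
    return [[m * c for m, c in zip(mask_row, carrier_row)]
            for mask_row, carrier_row in zip(mask, carrier)]


# Toy target levels in dB: 3 frequency channels x 4 time frames (hypothetical values).
target_db = [
    [-5.0, -40.0, -10.0, -60.0],
    [-50.0, 0.0, -20.0, -45.0],
    [-35.0, -25.0, -70.0, -8.0],
]

# Flat noise carrier with unit energy in every cell.
noise = [[1.0] * 4 for _ in range(3)]

mask = binary_mask(target_db)
masked_noise = apply_mask(mask, noise)
```

In this sketch the carrier retains energy only in the cells where the target was relatively intense. Replacing `noise` with the time-frequency representation of a competing talker would correspond to the speech-carrier condition described above, in which the carrier's own spectro-temporal structure determines what survives the mask.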