Recent psychophysical studies suggest that human listeners do not segregate concurrent sounds by grouping frequency regions that have a common interaural time difference (ITD). However, such an approach is adopted by most computational auditory scene analysis (CASA) systems that use binaural cues. Here, we propose a CASA system that separates a target speech signal from a noise interferer but does not require the ITD of each source to be consistent across frequency. We compare the CASA system with human performance on the same task, in which the speech reception threshold (SRT) is measured for speech and noise stimuli that have consistent or inconsistent ITDs in different frequency bands. The CASA system is shown to be in qualitative agreement with human performance.
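
To make the notion of an ITD that differs between frequency bands concrete, the following is a minimal sketch of estimating the ITD independently in each band from the peak of the interaural cross-correlation. It is an illustration only and is not the system proposed here: the function names `estimate_itd` and `delayed_tone`, the 16 kHz sample rate, the pure-tone stand-ins for band-limited signals, and the lag limits are all assumptions made for the example.

```python
import math


def estimate_itd(left, right, sample_rate, max_lag):
    """Estimate the ITD (in seconds) as the lag that maximises the
    cross-correlation between the left- and right-ear signals.

    A positive result means the right-ear signal lags the left-ear signal.
    max_lag is in samples and should stay below half the period of the
    band's centre frequency, otherwise the peak is ambiguous.
    """
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


def delayed_tone(freq_hz, delay_samples, sample_rate, n_samples):
    """Hypothetical helper: a pure tone standing in for one frequency band,
    returned as a (left, right) pair with the right ear delayed."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    left = [math.sin(w * i) for i in range(n_samples)]
    right = [math.sin(w * (i - delay_samples)) for i in range(n_samples)]
    return left, right


SAMPLE_RATE = 16000  # assumed for this example

# Two bands carrying inconsistent ITDs: +4 samples in one, -3 in the other.
low_left, low_right = delayed_tone(500.0, 4, SAMPLE_RATE, 1600)
high_left, high_right = delayed_tone(800.0, -3, SAMPLE_RATE, 1600)

itd_low = estimate_itd(low_left, low_right, SAMPLE_RATE, max_lag=10)
itd_high = estimate_itd(high_left, high_right, SAMPLE_RATE, max_lag=8)
```

Because each band is analysed separately, nothing in this sketch forces the two estimates to agree, which is the situation the inconsistent-ITD stimuli are designed to probe.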