In multi-party teleconferencing, transporting separate speech streams to each user and then rendering them spatially enables more efficient communication. A simple means of spatial presentation at the client side is binaural rendering over headphones. For backward compatibility, e.g. when the transport mechanism does not support multiple parallel downlink streams, a system is proposed that combines automatic speaker classification with spatial rendering of the segregated streams. The combined system aims at better separability of the speakers than conventional systems offer. The paper details the two basic components, namely automatic speaker classification and binaural rendering. A first evaluation of the approach provides a proof of concept, and directions for further improvement are discussed.