An interactive audio service provides audio editing functionality to its users. In such a service, users can control the desired audio objects to create their own mix using the spatial audio object coding (SAOC) scheme. SAOC has a problem in Karaoke mode, however, because the vocal object cannot be removed perfectly from the down-mix signal. In this paper, a modified SAOC scheme with harmonic extraction and elimination structures is proposed. The proposed scheme perfectly removes the vocal object using the harmonic information of that object. Subjective and objective evaluation results show that the proposed scheme is superior to the conventional ones.
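To make the idea of harmonic elimination concrete, the following is a minimal sketch and not the SAOC-based structure proposed in this paper. It assumes a stationary signal whose fundamental frequency `f0` is already known and constant, and it simply zeroes the FFT bins around each harmonic of `f0` in a toy down-mix. The function name `remove_harmonics`, its parameters, and the test signal are all hypothetical and chosen only for illustration.

```python
import numpy as np

def remove_harmonics(mix, sample_rate, f0, num_harmonics=10, half_width_hz=15.0):
    """Zero the FFT bins around each harmonic k * f0 (illustrative assumption:
    f0 is known and constant over the whole signal)."""
    spectrum = np.fft.rfft(mix)
    freqs = np.fft.rfftfreq(len(mix), d=1.0 / sample_rate)
    for k in range(1, num_harmonics + 1):
        harmonic = k * f0
        if harmonic > sample_rate / 2:
            break  # stop at the Nyquist frequency
        # Suppress every bin within half_width_hz of this harmonic.
        spectrum[np.abs(freqs - harmonic) <= half_width_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(mix))

# Toy down-mix: a harmonic "vocal" at 220 Hz plus a 330 Hz "backing" tone.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate  # one second of samples
vocal = sum(np.sin(2 * np.pi * 220.0 * k * t) / k for k in (1, 2, 3))
backing = 0.5 * np.sin(2 * np.pi * 330.0 * t)
mix = vocal + backing

# The residual should contain the backing tone with the vocal harmonics removed.
residual = remove_harmonics(mix, sample_rate, f0=220.0)
```

In this toy case the backing tone does not overlap any harmonic of the vocal, so it survives untouched. Real vocals have a time-varying pitch and share spectral regions with the accompaniment, which is why a practical scheme needs per-frame harmonic information rather than a single fixed `f0`.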