The surge in multilingual and code-switched spoken content demands efficient Query-by-Example Spoken Term Detection (STD) systems that can handle diverse languages. Existing STD systems are monolingual: they typically require large labeled datasets for training and rely on costly dynamic time warping (DTW)-based matching during inference, which limits their practicality. This paper proposes a novel speech tokenizer that converts speech into language-agnostic tokens, together with a multi-stage search algorithm that enables fast, efficient retrieval from large datasets. In experimental evaluations, the tokens produced by the proposed tokenizer show strong speaker invariance, consistent performance across languages, and the ability to generalize effectively to unseen languages, significantly outperforming the baselines.