Spoken language identification (LID) systems degrade in performance as the test input duration decreases. Analyzing this effect, we show that 36.94% of the misclassifications on 2-second (2s) LID inputs stem from out-of-scope elements such as non-speech, named entities, filler words, and overlapped speech. To mitigate this, we propose teacher-free knowledge distillation (TF-KD) based on online label smoothing. The method accumulates the prediction logits of correctly classified training segments during each epoch and uses them as soft labels for distillation in the next epoch. We further enhance TF-KD with dynamic weights, conditional label updates, and entropy-based soft-label computation. On 2s inputs, our approach achieves consistent Cavg improvements over existing KD-based solutions in both same-corpora and cross-corpora evaluations, without training a separate teacher network.
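The core accumulation-and-distillation loop can be sketched as follows. This is a minimal NumPy illustration of teacher-free KD via online label smoothing, not the paper's implementation: all names (`OnlineLabelSmoother`, `accumulate`, `end_epoch`) are hypothetical, a fixed smoothing weight `eps` stands in for the paper's dynamic weights, and the `_count` check is a simplified stand-in for the conditional label update.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class OnlineLabelSmoother:
    """Sketch of online label smoothing for teacher-free KD:
    accumulate softmax outputs of correctly classified training segments
    per class during one epoch, then normalize them into the soft labels
    used for the distillation term in the next epoch."""

    def __init__(self, num_classes, eps=0.1):
        self.num_classes = num_classes
        self.eps = eps  # fixed distillation weight (the paper uses dynamic weights)
        # start from uniform label smoothing before any epoch has finished
        self.soft = np.full((num_classes, num_classes), 1.0 / num_classes)
        self._accum = np.zeros((num_classes, num_classes))
        self._count = np.zeros(num_classes)

    def accumulate(self, probs, labels):
        """Record predictions of correctly classified samples only."""
        preds = probs.argmax(axis=1)
        for p, y in zip(probs[preds == labels], labels[preds == labels]):
            self._accum[y] += p
            self._count[y] += 1

    def end_epoch(self):
        """Turn accumulations into next epoch's soft labels."""
        for c in range(self.num_classes):
            if self._count[c] > 0:  # simplified conditional update: keep old row otherwise
                self.soft[c] = self._accum[c] / self._accum[c].sum()
        self._accum[:] = 0.0
        self._count[:] = 0.0

    def loss(self, probs, labels):
        """(1 - eps) * hard-label CE + eps * soft-label CE, batch-averaged."""
        n = len(labels)
        hard = -np.log(probs[np.arange(n), labels] + 1e-12).mean()
        soft_targets = self.soft[labels]
        soft = -(soft_targets * np.log(probs + 1e-12)).sum(axis=1).mean()
        return (1 - self.eps) * hard + self.eps * soft

# toy usage: two correctly classified 2s segments update the soft labels
smoother = OnlineLabelSmoother(num_classes=3)
probs = softmax(np.array([[2.0, 0.1, 0.1], [0.1, 2.0, 0.1]]))
labels = np.array([0, 1])
smoother.accumulate(probs, labels)
smoother.end_epoch()
total = smoother.loss(probs, labels)
```

Because only correctly classified segments contribute, the soft labels concentrate mass on plausible confusions (e.g. acoustically similar languages) rather than on arbitrary errors, which is what a trained teacher would otherwise provide.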