A speech signal played out from a loudspeaker is referred to as loudspeaker-emitted speech, or simply loudspeaker speech. Most automatic speech recognition (ASR) systems are trained on natural speech recorded directly from human speakers, and they yield a higher word error rate (WER) on loudspeaker speech. In this paper, we first analyze the performance of the whisper-medium ASR model on loudspeaker-emitted speech. Five equalizer modes (normal, pop, rock, jazz, and classic) and three distances (0 m, 3 m, and 5 m) are considered in the study. Further, based on the spectral differences between natural and loudspeaker speech, an algorithm is proposed to generate loudspeaker-quality speech from natural speech recordings. This algorithm is used to augment the LibriSpeech data, and the augmented data is used to fine-tune whisper-medium. The ASR model fine-tuned on simulated loudspeaker-quality speech showed a significant improvement over the baseline system.
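
For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the recognized transcript and the reference, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in this work; the function name `word_error_rate` is hypothetical, and the sketch assumes whitespace tokenization with no text normalization (casing and punctuation are compared as-is).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative assumption: tokens are obtained by whitespace splitting,
    with no lowercasing or punctuation stripping.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.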