For most practical purposes, an almost unlimited amount of transcribed training data is available from radio and TV broadcasts, yet there appear to be limits to the capabilities of current weakly constrained HMM recognizers. As a result, there is renewed interest in incorporating more knowledge into ASR systems. This short communication expands on Chin-Hui Lee's conviction that "...robust speech recognition cannot be solved simply by collecting more data" [1]. It argues for more attention to techniques that could extract reliable, reusable, and relevant knowledge from the large amounts of speech data now available.