This paper shows theoretically, and documents in practice, that Neural Networks can produce x-y representations of speech in real time, in particular for vowels and vowel-like sounds. A particular kind of Time-Delay Neural Network is shown to be the most efficient operator for extracting the formant-dynamic information needed for these plots. This opens the possibility of constructing Visual User Interfaces for Language Learning Systems using relatively simple hardware.