This paper describes a novel neuroevolution-based approach to training recurrent neural networks that process and classify audio directly from the raw waveform, without any assumptions about the signal itself, the features to be extracted, or the network topology required to perform the task. The resulting networks have a relatively small memory footprint, and because they operate in a streaming fashion, they are particularly well suited to embedded real-time applications.