In this work, we study the effect of transfer learning on an audio classification task. The recordings to classify are of young children reading isolated words aloud in a classroom setting, and we aim to detect which recordings are noisy and/or contain no speech. We explore both recurrent and fully connected neural network architectures. To train our classifiers, we introduce a transfer-learning-based feature extraction approach that uses Google's pre-trained VGGish model as a feature extractor. Because of pedagogical constraints, the possible misclassifications do not all have the same consequences. We therefore present an alternative to the F1 score that accounts for the pedagogical cost of each type of misclassification. Results show that networks trained on transfer-learning-based features outperform networks trained on Mel-frequency cepstral coefficients (MFCCs), which are typically used in speech recognition tasks, with a relative improvement of 25% in our metric between the best-performing models. The transfer-learning-based networks also reduce training computation time by a factor of three to five.
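As a rough illustration of the feature extraction pipeline described above, the Python sketch below loads the publicly released VGGish model from TensorFlow Hub, mean-pools its 128-dimensional per-frame embeddings into one clip-level vector, and trains a small fully connected classifier on top. The hub URL, pooling strategy, layer sizes, and two-class output are assumptions made for illustration, not the exact setup used in the paper.

```python
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Load Google's pre-trained VGGish as a frozen feature extractor.
# (Assumes the TF Hub release; the original TF-Slim checkpoint
# could be used instead.)
vggish = hub.load('https://tfhub.dev/google/vggish/1')


def extract_embedding(waveform_16khz: np.ndarray) -> np.ndarray:
    """Map a mono 16 kHz float32 waveform in [-1, 1] to a single
    128-D vector by mean-pooling VGGish's per-0.96 s embeddings."""
    frames = vggish(waveform_16khz)            # shape: (num_frames, 128)
    return tf.reduce_mean(frames, axis=0).numpy()


# Hypothetical small fully connected classifier on the frozen
# embeddings; layer sizes are illustrative, not the paper's.
classifier = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax'),  # e.g. usable vs noisy/speechless
])
classifier.compile(optimizer='adam',
                   loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])
```

Because VGGish stays frozen, only the small head is trained, which is consistent with the reported three- to five-fold reduction in training time relative to training on MFCC inputs.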