We present a novel speech dataset, collected with a smartphone, for recognising face mask type and coverage area. The dataset contains 2 h 27 min 55 s of recordings from 30 German speakers (15 female, 15 male). The baseline results are based on the functionals of the eGeMAPS feature set and on Mel-spectrogram and spectrogram representations of the audio samples. To model the one-dimensional features, we investigate Support Vector Classifiers (SVC) and a neural network classifier. We extract salient information from the two-dimensional representations with Convolutional Neural Network (CNN) based encoders, each coupled with a classification block. We use the Unweighted Average Recall (UAR) as the evaluation metric. For the face mask type and the coverage area recognition tasks (3-class problems), the best models on the test partition achieve UARs of 49.3% and 47.8%, respectively. For the combined face mask type and coverage area recognition task (5-class problem), the best model on the test partition obtains a UAR of 35.0%.
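
For readers unfamiliar with the metric, the following is a minimal sketch of how UAR can be computed: it is the mean of the per-class recalls, so every class contributes equally regardless of how many samples it has. The function name, the integer class ids, and the toy labels are illustrative assumptions and are not taken from the dataset or the baseline code. The same quantity is what scikit-learn returns from `recall_score` with `average="macro"`.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of the per-class recalls (UAR).

    Classes are taken from y_true, so each class that occurs in the
    reference labels counts once, independent of its sample count.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # samples per reference class
    correct = defaultdict(int)  # correctly predicted samples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical 3-class toy example with an imbalanced class distribution.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0]  # a classifier that always predicts class 0

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case the constant classifier reaches 50% accuracy but only the chance-level UAR of one third, which is the reason UAR is a suitable reference point for the 3-class results (chance level 33.3%) and the 5-class result (chance level 20.0%).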