In this paper, we present the system submitted by Whispeak to the ASVspoof 5 Speech Deepfake Detection and SASV Challenge. For the speech deepfake detection tracks, we use an ensemble of four systems: LFCC-LCNN, RawGAT-ST, Wav2Vec-RawGAT-ST, and Wav2Vec-Conformer. For the SASV tracks, we use a linear fusion of an ECAPA-TDNN with this ensemble. A dozen data augmentation techniques are applied during training to improve the robustness of the models on the ASVspoof 5 dataset. We also evaluate our models on external datasets and show that they do not yet generalize well to out-of-domain data. The final system achieves an EER of 4.16% and a minDCF of 0.1124 on the Track 1 evaluation set in the open condition.
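To make the score-level combination concrete, the sketch below shows one way the two fusion steps described above could be implemented: a weighted linear combination of the four countermeasure subsystem scores, followed by a linear fusion with ECAPA-TDNN speaker-verification scores for SASV. This is a minimal illustration only; the function names, the per-subsystem weights, and the alpha parameter are hypothetical placeholders, not the tuned values used in the submitted system.

```python
import numpy as np

def fuse_cm_scores(subsystem_scores: dict, weights: dict) -> np.ndarray:
    """Linearly combine countermeasure (CM) scores from the ensemble.

    subsystem_scores: per-subsystem score arrays over the same trial list.
    weights: hypothetical per-subsystem fusion weights (not the paper's values).
    """
    return sum(weights[name] * scores for name, scores in subsystem_scores.items())

def fuse_sasv_scores(cm_scores: np.ndarray, asv_scores: np.ndarray,
                     alpha: float = 0.5) -> np.ndarray:
    """SASV score as a linear fusion of ASV (ECAPA-TDNN) and CM ensemble scores."""
    return alpha * asv_scores + (1.0 - alpha) * cm_scores

# Example with random scores for 3 trials (illustrative only).
rng = np.random.default_rng(0)
cm = fuse_cm_scores(
    {name: rng.normal(size=3) for name in
     ["lfcc_lcnn", "rawgat_st", "w2v_rawgat_st", "w2v_conformer"]},
    {"lfcc_lcnn": 0.2, "rawgat_st": 0.2, "w2v_rawgat_st": 0.3, "w2v_conformer": 0.3},
)
sasv = fuse_sasv_scores(cm, rng.normal(size=3))
```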