Audio deepfake detection is usually formulated as utterance-level binary classification between genuine and fake speech. Environmental cues such as background and device noise can serve as classification features, but they are easily attacked, e.g., by simply adding real noise to the fake speech. In contrast, the spectral discriminability of speech is a more robust feature and has long been exploited by speaker recognition models to authenticate speaker identity. In this study, we propose a speaker recognition-assisted audio deepfake detector. Feature representations extracted by a speaker recognition model are introduced into multiple layers of the deepfake detector to fully exploit the inherent spectral discriminability of speech. The speaker recognition and audio deepfake detection models are jointly optimized with a multi-objective learning method. Systematic experiments on the ASVspoof 2019 logical access corpus demonstrate that the proposed approach outperforms existing single systems and significantly improves robustness to noise.
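One plausible way to realize the multi-objective joint optimization is a weighted sum of the two task losses. The following PyTorch sketch is only an illustration of that idea; the function name, the cross-entropy losses, and the weight `lam` are assumptions, not the paper's exact formulation.

```python
import torch.nn as nn

def joint_loss(detector_logits, spoof_labels,
               spk_logits, spk_labels, lam=0.5):
    """Combine the deepfake-detection and speaker-recognition objectives.

    Sketch of a multi-objective loss (assumed form): a weighted sum of
    the two cross-entropy terms, with `lam` trading off the auxiliary
    speaker-recognition objective against the primary detection one.
    """
    ce = nn.CrossEntropyLoss()
    l_detect = ce(detector_logits, spoof_labels)  # genuine vs. fake
    l_spk = ce(spk_logits, spk_labels)            # speaker identity
    return l_detect + lam * l_spk
```

Under this assumed formulation, both networks receive gradients from the combined loss, so the speaker recognition features injected into the detector are adapted jointly with the detection task rather than frozen.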