In forensic contexts, speakers are often in an emotional state, which is likely to influence their speech. An emotional mismatch between samples is therefore a source of variability that could substantially affect the performance of a forensic automatic speaker recognition system. This paper examines the issue of emotional speech in forensic casework, both in terms of emotional match and mismatch between test samples and in terms of the data used to calibrate the system (i.e. the reference population). Specifically, we tested system performance on neutral speech and on acted angry and fearful speech across 37 test conditions. The best system performance was achieved when the test data and the reference population conditions matched exactly. However, in 16 of the 37 conditions the system produced a Cllr greater than 0.8, and 10 of those also exceeded a Cllr of 1. Caution should therefore be exercised when interpreting the results of automatic and semi-automatic forensic analyses of emotional speech data.
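For readers unfamiliar with the Cllr metric cited above, the following is a minimal sketch of the standard log-likelihood-ratio cost (Brümmer's Cllr), not code from the paper itself; the function name and inputs are illustrative assumptions. It takes likelihood ratios from same-speaker and different-speaker trials, and a perfectly uninformative system (every LR = 1) yields a Cllr of exactly 1, which is why values above 1 indicate a system performing worse than providing no evidence at all.

```python
import math

def cllr(same_speaker_lrs, diff_speaker_lrs):
    """Log-likelihood-ratio cost.

    same_speaker_lrs: likelihood ratios from trials where the two
        samples truly come from the same speaker (ideally LR >> 1).
    diff_speaker_lrs: likelihood ratios from different-speaker trials
        (ideally LR << 1).
    Returns 0 for a perfect system, 1 for an uninformative one.
    """
    # Penalty for same-speaker trials: large when LR is wrongly small.
    ss = sum(math.log2(1 + 1 / lr) for lr in same_speaker_lrs) / len(same_speaker_lrs)
    # Penalty for different-speaker trials: large when LR is wrongly big.
    ds = sum(math.log2(1 + lr) for lr in diff_speaker_lrs) / len(diff_speaker_lrs)
    return 0.5 * (ss + ds)

# An uninformative system (all LRs equal to 1) gives Cllr = 1.
print(cllr([1.0, 1.0], [1.0, 1.0]))
```

A well-calibrated, discriminative system produces same-speaker LRs well above 1 and different-speaker LRs well below 1, driving both penalty terms toward zero.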