This paper describes an experiment that elicits and then evaluates likelihood ratio (LR)-like scores from non-expert human listeners in a speaker recognition task under conditions reflective of forensic casework. In doing so, it provides a framework for comparing and combining listener judgements with the output of automatic speaker recognition (ASR) systems (or other data-driven speaker recognition approaches). Stimuli consisted of 45 same-speaker and 45 different-speaker pairs of voices from young male speakers of Standard Southern British English, using 10-second, channel-mismatched samples. A total of 81 listeners rated the similarity between the voices and their typicality within the wider accent population; these ratings were in turn used to calculate LR-like scores. These scores were converted to log LRs via cross-validated logistic regression calibration. Overall, the human listeners produced an equal error rate (EER) of 26.67% and a log LR cost (Cllr) of 0.773. However, considerable variation was found across individual listeners (13.3-66.7% EER). Fusion of the listener judgements with an x-vector ASR system provided only marginal improvement in performance compared with the ASR system in isolation. Importantly, however, the magnitude of the four errors made by the ASR system was reduced by the listener judgements. The implications of this work for forensics are discussed.
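For readers unfamiliar with the two performance metrics reported above, the following is a minimal sketch of how an equal error rate and a log LR cost (Cllr) are commonly computed from calibrated log LRs. It is illustrative only and is not the evaluation code used in this study: the function names `cllr` and `eer`, the assumption that inputs are natural-log LRs, and the simple threshold sweep used to approximate the EER are all assumptions made for this example.

```python
import math


def cllr(ss_llrs, ds_llrs):
    """Log LR cost from natural-log LRs (assumed input scale).

    ss_llrs: log LRs for same-speaker pairs.
    ds_llrs: log LRs for different-speaker pairs.
    """
    # Penalise same-speaker pairs with low LRs: log2(1 + 1/LR).
    ss_cost = sum(math.log2(1 + math.exp(-x)) for x in ss_llrs) / len(ss_llrs)
    # Penalise different-speaker pairs with high LRs: log2(1 + LR).
    ds_cost = sum(math.log2(1 + math.exp(x)) for x in ds_llrs) / len(ds_llrs)
    return 0.5 * (ss_cost + ds_cost)


def eer(ss_llrs, ds_llrs):
    """Approximate EER via a threshold sweep over the observed scores.

    Returns the mean of the miss and false-alarm rates at the threshold
    where the two rates are closest. This is a simple approximation;
    other tools may interpolate differently.
    """
    thresholds = sorted(set(ss_llrs) | set(ds_llrs)) + [math.inf]
    best_gap, best_rate = None, None
    for t in thresholds:
        miss = sum(x < t for x in ss_llrs) / len(ss_llrs)          # SS rejected
        false_alarm = sum(x >= t for x in ds_llrs) / len(ds_llrs)  # DS accepted
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (miss + false_alarm) / 2
    return best_rate
```

Under this sketch, a system that outputs a log LR of 0 for every pair (i.e. no information) yields a Cllr of 1, so values below 1 indicate that the scores carry some useful information. As a point of arithmetic, with 45 pairs per class an error rate of 26.67% corresponds to 12 of 45 pairs.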