This paper describes our team’s experiences in the VoiceMOS Challenge 2023 - a challenge centered around the evaluation of the quality of synthetic or noisy speech. Inspired by our success with an ensemble approach in the first VoiceMOS Challenge in 2022, we submitted an ensemble of four models this time, based on wav2vec 2.0, QuartzNet, CNN-RNN, and LDNet. This was enough to win one of the two tracks we participated in (Track 1b). However, post-challenge analysis shows that only two of the models offer a meaningful contribution in any of the VoiceMOS 2023 tracks, while the other two only degrade the ensemble’s overall performance. On the other hand, post-challenge results on Track 2 (singing voice conversion data) surpassed all our expectations. In the paper, we explain how we tried to deal with the new zero-shot out-of-domain scenarios, analyze the results, and discuss the lessons learned.