A speaker verification evaluation is presented on the Multi-session Audio Research Project (MARP) corpus, for which speakers were recorded at regular intervals, in consistent conditions, over a period of three years. It is observed that the performance of an i-vector system with probabilistic linear discriminant analysis (PLDA) modelling decreases progressively, in terms of both discrimination and calibration, as the time intervals between train and test sessions increase. For male speakers, the equal error rate (EER) increases from 2.4% to 4.4% when the interval between sessions grows from several months to three years. An extension to conventional linear score calibration is proposed, whereby short-term aging information is incorporated as an additional factor in the score transformation. This new approach improves discrimination and calibration performance in the presence of increasing time intervals between train and test sessions, compared with score-only calibration.
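As a rough illustration of the proposed extension, the sketch below contrasts score-only linear calibration, `w0 + w1 * score`, with a variant that adds the train-test interval as a further input, `w0 + w1 * score + w2 * gap_years`. It is a minimal sketch under stated assumptions, not the system evaluated here: the exact form of the aging term, the logistic-regression training by Newton's method, and all names (`fit_calibration`, `cross_entropy`, `gap_years`) are hypothetical, and the trials are synthetic, not drawn from MARP.

```python
import numpy as np

def _design(features):
    # Prepend a column of ones so the first weight acts as the offset w0.
    return np.hstack([np.ones((features.shape[0], 1)), features])

def fit_calibration(features, labels, n_iter=25):
    """Fit logistic-regression calibration weights with Newton's method.

    features: (n_trials, n_features) array, e.g. [score] or [score, gap_years].
    labels:   1.0 for target trials, 0.0 for non-target trials.
    """
    X = _design(features)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (p - labels)
        # A small ridge term keeps the Hessian well conditioned.
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + 1e-6 * np.eye(X.shape[1])
        w -= np.linalg.solve(hess, grad)
    return w

def cross_entropy(w, features, labels):
    # Mean logistic loss of the calibrated log-odds, in nats.
    z = _design(features) @ w
    return float(np.mean(labels * np.logaddexp(0.0, -z)
                         + (1.0 - labels) * np.logaddexp(0.0, z)))

# Synthetic trials (an assumption for illustration only): target scores
# drift downward as the interval grows, non-target scores do not.
rng = np.random.default_rng(0)
n = 2000
gap_years = rng.uniform(0.0, 3.0, size=2 * n)
labels = np.concatenate([np.ones(n), np.zeros(n)])
scores = np.where(labels == 1, 2.0 - 0.5 * gap_years, -2.0) + rng.normal(size=2 * n)

score_only = scores[:, None]
score_and_gap = np.column_stack([scores, gap_years])

w_score = fit_calibration(score_only, labels)       # [w0, w1]
w_aging = fit_calibration(score_and_gap, labels)    # [w0, w1, w2]

loss_score = cross_entropy(w_score, score_only, labels)
loss_aging = cross_entropy(w_aging, score_and_gap, labels)
```

On data generated this way, the interval weight `w_aging[2]` should come out positive: a given raw score is stronger evidence for a target trial when the sessions are further apart, which is the kind of compensation that score-only calibration cannot express.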