This paper describes a novel approach based on unsupervised training of the MAP adaptation rate using Q-learning. Q-learning is a reinforcement learning technique and is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. The proposed method defines the likelihood of the adapted model as a reward and learns a weight factor that indicates the relative balance between the initial model and adaptation data without the need for supervised data. We conducted recognition experiments on a lecture using a corpus of spontaneous Japanese. We were able to estimate the optimal weight factor using Q-learning in advance. MAP adaptation using the weight factor estimated with the proposed method acquired recognition accuracy that was equivalent to MAP adaptation using a weight factor determined experimentally.