Temporal-Difference Learning in Uncertain Choice: A Reinforcement Learning-Diffusion Decision Model of Two-Stage Decision-Making
Mon—HZ_10—Talks3—3003
Presented by: Nicola Schneider
Behavioral adaptation in probabilistic environments requires learning through trial and error. While reinforcement learning (RL) models describe the temporal development of preferences through error-driven learning, they lack a mechanistic account of single-trial decision-making. Sequential sampling models such as the diffusion decision model (DDM), in turn, map state preferences onto single-trial response times. To bridge these perspectives, we present a Bayesian hierarchical RL-DDM that integrates temporal-difference (TD) learning. Our implementation incorporates several variants of TD learning, including SARSA, Q-learning, and actor-critic models. We tested the model on data from N = 60 participants in a two-stage decision-making task. Participants learned over time, becoming both more accurate and faster in their choices. Their responses also showed a difficulty effect: choices were faster and more accurate when the subjective value difference between the available options was greater. Model comparison using predictive information criteria and posterior predictive checks showed that the RL-DDM fit the data better than standalone RL or DDM models. Notably, the RL-DDM captured both the temporal dynamics of learning and the difficulty effect in decision-making. Our work extends the RL-DDM framework to temporal-difference learning.
Keywords: Diffusion decision model, Reinforcement Learning, Temporal-difference learning, Sequential sampling models, Decision-making, Cognitive modeling, Bayesian statistics
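A minimal sketch of the core mechanism described above (not the authors' implementation): a Q-learning-style TD update supplies option values whose difference scales the drift rate of a simulated DDM, yielding the difficulty effect on choices and response times. All names and parameter values (td_update, ddm_trial, alpha, gamma, v_scale, bound, t0) are illustrative assumptions, and the hierarchical Bayesian estimation is omitted.

import numpy as np

rng = np.random.default_rng(0)

def td_update(Q, s, a_idx, r, s_next, done, alpha=0.1, gamma=0.9):
    # One temporal-difference (Q-learning) update; SARSA would bootstrap on the
    # next chosen action instead of the max.
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a_idx] += alpha * (target - Q[s, a_idx])
    return Q

def ddm_trial(value_diff, v_scale=1.5, bound=1.0, t0=0.3, dt=0.001, sigma=1.0):
    # Simulate one diffusion trial; the learned value difference scales the drift,
    # producing the difficulty effect (larger difference -> faster, more accurate).
    drift = v_scale * value_diff
    x, t = bound / 2.0, t0            # unbiased start between boundaries 0 and `bound`
    while 0.0 < x < bound:
        x += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return int(x >= bound), t         # choice (1 = option A, 0 = option B) and RT

# Toy run: two options with latent reward probabilities; values are learned by TD
# updates while choices and response times are generated by the DDM.
Q = np.zeros((1, 2))                  # one state, two actions
p_reward = [0.8, 0.2]
for trial in range(200):
    choice, rt = ddm_trial(Q[0, 0] - Q[0, 1])
    a_idx = 0 if choice == 1 else 1
    reward = float(rng.random() < p_reward[a_idx])
    Q = td_update(Q, 0, a_idx, reward, 0, done=True)
print("learned Q-values:", Q[0])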