Effects of Mind Wandering Thought Probes on Reinforcement Learning Model Parameters
Wed-Main hall - Z3-Poster 3-9111
Presented by: Nassim Sadedin
Goal-directed and habitual behaviors have been linked to model-based (MB) and model-free (MF) reinforcement learning (RL), respectively (Daw et al., 2005). During mind wandering (MW), attention is directed away from the external environment, which disrupts the processing of external stimuli (Schooler et al., 2011). According to the levels-of-inattention hypothesis, states of "weak" MW impair high-level cognitive processes while sparing lower-level ones (Schad et al., 2012). Recent findings show that spontaneous MW impairs MB learning while sparing MF learning, a pattern consistent with weak MW (Liu et al., 2023).
The existing literature has relied solely on static measures of trait MW and assumes that momentary MW impairs RL. Given the dynamic nature of both MW and RL, it is crucial to test this assumption directly by asking whether momentary MW influences RL model parameters during the learning process.
We modified a two-step decision task, designed to assess MF versus MB learning (Daw et al., 2011), to include thought probes in which subjects (N = 40) rate their ongoing MW on a continuous scale. We also administer the Spontaneous and Deliberate Mind Wandering Scale (Carriere et al., 2013) to compare static versus dynamic relationships of MW with RL.
We adopt a computational dual-control model (Daw et al., 2011), implementing MF control via the SARSA temporal difference algorithm and MB control via Bellman's equation, in which the expected value of each outcome is weighted by its transition probability. Parameters and hyperparameters of the RL models will be estimated using expectation maximization. Autocorrelations and cross-correlations will be computed to inspect dynamics within and between MW and RL parameters.
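To make the dual-control idea concrete, the following is a minimal Python sketch of how MF and MB first-stage values could be computed and mixed in a two-step task. It is an illustration under stated assumptions, not the fitted model: the 0.7/0.3 transition matrix, the function names, and the parameter names (alpha, lam, w, beta) are choices made here for the example and are not specified in the abstract.

```python
import numpy as np

# ASSUMPTION (not stated in the abstract): each first-stage action leads to
# its "common" second-stage state with probability 0.7, otherwise 0.3.
# Rows: first-stage action, columns: second-stage state.
TRANSITIONS = np.array([[0.7, 0.3],
                        [0.3, 0.7]])


def model_based_values(q_stage2, transitions=TRANSITIONS):
    """Bellman backup: best second-stage value, weighted by transition probability."""
    return transitions @ q_stage2.max(axis=1)


def sarsa_update(q_stage1, q_stage2, a1, s2, a2, reward, alpha, lam):
    """One-trial SARSA(lambda) update of the model-free values (returns copies)."""
    q1, q2 = q_stage1.copy(), q_stage2.copy()
    delta1 = q2[s2, a2] - q1[a1]      # first-stage prediction error
    q1[a1] += alpha * delta1
    delta2 = reward - q2[s2, a2]      # second-stage prediction error
    q2[s2, a2] += alpha * delta2
    q1[a1] += alpha * lam * delta2    # eligibility trace passes reward back
    return q1, q2


def hybrid_values(q_mf_stage1, q_stage2, w):
    """Mix MB and MF first-stage values; w = 1 is purely model-based."""
    return w * model_based_values(q_stage2) + (1.0 - w) * q_mf_stage1


def softmax(values, beta):
    """Choice probabilities with inverse temperature beta."""
    z = beta * (values - values.max())  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()


# Hypothetical single trial: action 0, reached state 0, chose action 1, rewarded.
q1, q2 = sarsa_update(np.zeros(2), np.zeros((2, 2)),
                      a1=0, s2=0, a2=1, reward=1.0, alpha=0.5, lam=1.0)
choice_probs = softmax(hybrid_values(q1, q2, w=0.5), beta=3.0)
```

In this sketch the weight w plays the role of the MB/MF balance; a thought-probe analysis would ask whether parameters such as w covary with momentary MW ratings.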
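The planned autocorrelation and cross-correlation analyses can be sketched as lagged Pearson correlations between two equally long series. This is a simplified illustration: the function names are hypothetical, and how trial-wise MW ratings and RL parameter estimates would be aligned into such series is an assumption not described in the abstract.

```python
import numpy as np


def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag]; lag may be negative."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    if lag < 0:
        # corr(x[t], y[t - k]) equals corr(y[s], x[s + k])
        return lagged_correlation(y, x, -lag)
    if lag >= len(x) - 1:
        raise ValueError("lag too large for series length")
    a = x[: len(x) - lag]   # x[t]
    b = y[lag:]             # y[t + lag]
    return float(np.corrcoef(a, b)[0, 1])


def autocorrelation(x, lag):
    """Correlation of a series with a lagged copy of itself."""
    return lagged_correlation(x, x, lag)


# Hypothetical series: y follows x with a delay of one step.
x_demo = np.array([1, 3, 2, 5, 4, 7, 6, 9], dtype=float)
y_demo = np.roll(x_demo, 1)
alternating = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)
```

A positive-lag peak in `lagged_correlation(mw, rl_param, lag)` would suggest that changes in MW precede changes in the RL parameter, while a negative-lag peak would suggest the reverse ordering.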
Keywords: Reinforcement Learning, Model-Based Learning, Model-Free Learning, Mind Wandering