Social Influence on Exploration/Exploitation Strategies in Bandit Tasks
Mon-B22-Talk I-03
Presented by: Ludwig Danwitz
Many real-world scenarios require decision makers to trade off exploration (i.e., information-seeking behavior) against exploitation (i.e., using present knowledge to maximize some kind of reward). Such agents are sometimes informed about, and influenced by, the whereabouts of other agents facing the same task. This type of social learning can enhance the agents’ performance, but the social information can also be ignored or overly relied on. For example, in search and rescue scenarios, multiple agents explore while staying in touch with one another. Restless bandit tasks provide an environment in which participants constantly face the exploration-exploitation trade-off. In these tasks, participants have to maximize their reward by choosing between multiple options (“bandits”) whose outcome quality changes constantly. The current research investigates in two experiments how seeing the choices of fictitious other participants influences the way participants tackle the trade-off. Participants saw either highly explorative or highly exploitative choice sequences. Copying behavior and individual exploration behavior are disentangled using a novel reinforcement learning model that links the likelihood of copying to the participants’ uncertainty: the Kalman copy-when-uncertain model. First results indicate that the copy-when-uncertain model outperforms comparable models. The level of exploration expressed in the social information affects both the copying behavior and the success of the participants. Results on differential effects of different types of social information remain inconclusive so far. Applications of this research include team-wise exploration of spatial environments and the prevention of myopic strategies in teamwork.
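To illustrate the general idea of linking copying to uncertainty, the following is a minimal Python sketch, not the model as specified by the presenter. It assumes a standard Kalman filter learner for a restless bandit, a logistic link from the mean posterior variance to the probability of copying, and a softmax over estimated means for individual choices. All function names, parameter names, and default values (`diffusion_var`, `noise_var`, `slope`, `threshold`, `temperature`) are hypothetical and chosen only for illustration.

```python
import math


def kalman_update(means, variances, choice, reward,
                  diffusion_var=4.0, noise_var=16.0):
    """Kalman filter step for a restless bandit (illustrative parameters).

    All arms drift between trials, so every variance grows by diffusion_var.
    Only the chosen arm is then corrected by the observed reward.
    """
    new_means = list(means)
    new_vars = [v + diffusion_var for v in variances]
    gain = new_vars[choice] / (new_vars[choice] + noise_var)
    new_means[choice] += gain * (reward - means[choice])
    new_vars[choice] *= 1.0 - gain
    return new_means, new_vars


def copy_probability(variances, slope=0.1, threshold=20.0):
    """ASSUMED link: copying becomes more likely as mean variance rises."""
    uncertainty = sum(variances) / len(variances)
    return 1.0 / (1.0 + math.exp(-slope * (uncertainty - threshold)))


def choice_probabilities(means, variances, social_choice, temperature=1.0):
    """Mix copying the observed choice with an individual softmax choice."""
    p_copy = copy_probability(variances)
    top = max(means)  # subtract the maximum for numerical stability
    weights = [math.exp((m - top) / temperature) for m in means]
    total = sum(weights)
    probs = [(1.0 - p_copy) * w / total for w in weights]
    probs[social_choice] += p_copy
    return probs


if __name__ == "__main__":
    means, variances = [0.0, 0.0, 0.0], [30.0, 30.0, 30.0]
    print(choice_probabilities(means, variances, social_choice=2))
    means, variances = kalman_update(means, variances, choice=2, reward=10.0)
    print(means, variances)
```

In this sketch, a learner with high uncertainty about all options mostly follows the observed choice, while a learner with low uncertainty falls back on its own value estimates. Other link functions or uncertainty measures (for example, the variance of the currently best option only) would be equally plausible under the description above.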
Keywords: Social Learning, Exploration / Exploitation, Decision-Making, Computational Modeling, Bandit Task, Information Seeking