Understanding and Mitigating Online Toxicity: A Chatroom-Based Experimental Framework with LLM Agents
P2-S49-1
Presented by: Fabrizio Gilardi, Colin Henry, Karsten Donnay
Toxicity and hate speech online are disproportionately concentrated among a small subset of users, underscoring the importance of targeted research to understand and mitigate these behaviors. However, studying toxic users presents significant challenges. On social media platforms, such users are rare as a proportion of all users, limiting sample sizes in experiments. Moreover, research is restricted to observing behavioral outcomes, leaving underlying motivations unclear. In controlled settings, such as experimental chatrooms, ethical concerns arise from exposing participants to toxicity, and it is uncertain whether toxic behavior can be reliably elicited. This paper introduces a novel methodology that combines a custom chatroom environment with Large Language Model (LLM)-powered agents to address these challenges. This approach enables the recruitment and engagement of toxic users while ensuring that no human participants are exposed to harmful content. Within the chatroom, LLM agents simulate distinct conversational dynamics and norms, and deliver interventions such as counterspeech. This setup allows researchers to evaluate both behavioral changes (e.g., shifts in toxic behavior) and self-reported survey responses collected post-interaction. We evaluate the feasibility of recruiting toxic users, the effectiveness of strategies to induce toxic behavior in a controlled environment, and the impact of interventions designed to reduce toxicity. Our findings demonstrate the potential of this approach to advance the study of toxic online behavior and inform the development of evidence-based strategies for improving online discourse.
Keywords: hate speech, toxic speech, experiments, LLMs
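To make the agent-based setup more concrete, the sketch below illustrates one way an LLM-powered chatroom agent could monitor participant messages and switch to a counterspeech intervention when toxicity is detected. This is a minimal illustration, not the authors' implementation: the prompts, the model name, the toxicity threshold, and the placeholder score_toxicity scorer are all assumptions introduced here for exposition.

```python
# Illustrative sketch of an LLM chatroom agent that delivers counterspeech.
# All names, prompts, and parameters are hypothetical, not the paper's code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

AGENT_PERSONA = (
    "You are a participant in an online chatroom. Reply briefly, in a "
    "conversational register consistent with the room's norms."
)
COUNTERSPEECH_PROMPT = (
    "The previous message was toxic. Respond with calm, empathetic "
    "counterspeech that discourages further toxicity without escalating."
)

def score_toxicity(text: str) -> float:
    """Placeholder toxicity scorer in [0, 1]; a real study would use a
    dedicated classifier or moderation API instead of keyword matching."""
    keywords = {"idiot", "stupid", "hate you"}
    return 1.0 if any(k in text.lower() for k in keywords) else 0.0

def agent_reply(history: list[dict], intervene: bool) -> str:
    """Generate the agent's next message, optionally as counterspeech."""
    system = COUNTERSPEECH_PROMPT if intervene else AGENT_PERSONA
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # model choice is an assumption
        messages=[{"role": "system", "content": system}, *history],
    )
    return response.choices[0].message.content

def on_participant_message(history: list[dict], text: str,
                           threshold: float = 0.7) -> str:
    """Append a participant message, decide whether to intervene, and reply."""
    history.append({"role": "user", "content": text})
    intervene = score_toxicity(text) >= threshold
    reply = agent_reply(history, intervene)
    history.append({"role": "assistant", "content": reply})
    return reply
```

In a deployment of this kind, the conversation history and the intervention flag would also be logged per turn, so that behavioral outcomes (e.g., shifts in toxic language after counterspeech) can later be linked to post-interaction survey responses.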