11:00 - 12:30
Parallel sessions 5
Submission 624
Judgments of Learning in Humans and LLMs: Evidence from a Cross-Agent Model
MixedTopicTalk-05
Presented by: Elanur Ulakci
Elanur Ulakci 1, 2, Markus Huff 1, 2
1 Leibniz-Institut für Wissensmedien Tübingen, Germany
2 University of Tübingen, Germany
Large language models (LLMs) increasingly exhibit human-like performance across a wide range of psychological tasks in which they are tested as experimental subjects. However, their capacity for metacognition—the ability to monitor and evaluate one's own cognitive processes—has not yet been systematically compared to that of humans. In this study, human participants and three LLMs (GPT-3.5-turbo, GPT-4-turbo, GPT-4o) were presented with sentence pairs comprising one garden-path sentence, which is known to be syntactically misleading and to require reanalysis, preceded by either a fitting or an unfitting contextual sentence. By manipulating the contextual fit, we assessed how semantic relatedness influenced judgments of learning (JOLs) in both humans and the models. Results showed that human JOLs were reliably sensitive to contextual cues and accurately predicted subsequent memory performance, whereas none of the LLMs demonstrated comparable sensitivity or predictive accuracy, and they thus failed to anticipate human memory performance. These findings reveal a fundamental gap between object-level processing and meta-level monitoring in artificial systems, suggesting that enhancing the self-monitoring mechanisms of such models could advance their applications in education, personalized learning, and adaptive human-AI collaboration. More broadly, we introduce a cross-agent prediction model—in which humans predict their own recognition performance and LLMs predict human recognition outcomes—as a framework for benchmarking (meta-)cognitive alignment between humans and artificial agents.