Assessing Accuracy and Cue Utilization in AI Judgments: Methods and Applications
Tue—HZ_13—Talks5—4904
Presented by: Aaron Petrasch
Artificial intelligence is developing at a rapid pace, and researchers and societies are challenged to keep up with and respond to advances in this field. Recent work demonstrates the capability of Large Language Models (LLMs) to judge characteristics of persons (e.g., traits or emotions) and situations that cannot be explained by the detection and classification of single objects or cues. This talk outlines a framework for critically evaluating the judgment performance of LLMs. First, I present ways to assess the accuracy of LLMs judging person or situation characteristics, outlining why merely correlating AI judgments with a human criterion may lead to incorrect conclusions. Second, I demonstrate an extended Brunswikian lens model for analyzing the cues that humans and AI use to derive their judgments, which allows researchers to study judgment differences and biases among human and LLM judges. Empirical examples and applications for experimental designs are presented, with both human-written texts and scene images used as stimuli to generate judgments of persons, situations, and cues.
Keywords: Person perception, situation perception, accuracy, LLMs, lens model, AI
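To illustrate the lens-model quantities the abstract refers to, the sketch below computes cue validities (cue–criterion correlations), cue utilizations (cue–judgment correlations) separately for a human and an LLM judge, and achievement (judgment–criterion correlation). All column names and the simulated data are assumptions made for this example only; they do not reproduce the materials or designs presented in the talk.

```python
# Minimal sketch of an extended Brunswikian lens-model analysis.
# Data, column names, and effect sizes are hypothetical illustrations.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_targets = 200

# One row per target (e.g., a text or scene image) with a criterion score,
# coded cues, and judgments produced by a human rater and an LLM.
df = pd.DataFrame({"criterion": rng.normal(size=n_targets)})
df["cue_positive_words"] = 0.6 * df["criterion"] + rng.normal(scale=0.8, size=n_targets)
df["cue_word_count"] = rng.normal(size=n_targets)  # a largely invalid cue
df["judgment_human"] = 0.5 * df["cue_positive_words"] + rng.normal(scale=0.9, size=n_targets)
df["judgment_llm"] = (0.4 * df["cue_positive_words"]
                      + 0.4 * df["cue_word_count"]
                      + rng.normal(scale=0.9, size=n_targets))

cues = ["cue_positive_words", "cue_word_count"]

# Cue validity: how strongly each cue relates to the criterion.
cue_validity = df[cues].corrwith(df["criterion"])

# Cue utilization: how strongly each judge's ratings relate to each cue.
utilization_human = df[cues].corrwith(df["judgment_human"])
utilization_llm = df[cues].corrwith(df["judgment_llm"])

# Achievement (accuracy): correlation between judgment and criterion.
achievement = df[["judgment_human", "judgment_llm"]].corrwith(df["criterion"])

print("Cue validity:\n", cue_validity.round(2))
print("Cue utilization (human):\n", utilization_human.round(2))
print("Cue utilization (LLM):\n", utilization_llm.round(2))
print("Achievement:\n", achievement.round(2))
```

Comparing the two utilization profiles against the validity profile is what makes judgment differences and biases between human and LLM judges visible in this kind of analysis; the talk extends this basic logic to richer stimuli and criteria.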