09:30 - 11:10
P6-S157
Room: 1A.10
Chair/s: Jeremy Siow
Measuring Complexity and Reproducibility: A Comprehensive Benchmark of LLMs for Multilingual Policy Agenda Topic Annotation
P6-S157-5
Presented by: Bastián González-Bustamante
Bastián González-Bustamante
Leiden University
Even though LLMs are taking the methodological landscape of the social sciences by storm, especially in text-as-data and Natural Language Processing (NLP) research, they are not a panacea. Their performance varies with task complexity, prompting strategy, language, and model parameters. Temperature, for example, is one of the best-known parameters that can affect the reproducibility of LLM outputs. There are also concerns about using proprietary models, including limits on reproducibility. Although open-source models offer reproducibility under controlled conditions, they underperform closed models on several tasks. This paper provides a comprehensive benchmark of LLM stability on a complex classification task: annotating policy agenda topics in parliamentary bills and acts. The task is complex because it requires multiclass categorisation into the 21 major topics of the Comparative Agendas Project, and this large number of categories is challenging for most models. We run the classification in several languages, such as Dutch, English, German, Hungarian, Italian, Portuguese and Spanish, and vary prompts and temperature settings, evaluating both proprietary and open-source models against ground truth. The models include OpenAI’s GPT models, Anthropic’s Claude models, xAI’s Grok, recent Meta Llama releases such as Llama 3.3, Alibaba’s Qwen 2.5, and Mistral models, among others. In sum, this paper offers a better understanding of how these models perform on a complex task and insight into how classification routines can affect the annotation process and its reproducibility.
Keywords: LLMs, GPTs, text-as-data, NLP, policy agendas