11:20 - 13:00
P7-S186
Room: 1A.12
Chair/s:
Francisco Tomás-Valiente
Discussant/s:
Daniel Weitzel
Comparing Large Language Models for Text Classification: Model Selection Across Tasks, Texts, and Languages
P7-S186-3
Presented by: Michael Heseltine
Michael Heseltine
University of Amsterdam
Large-scale text analysis has grown rapidly in recent years as an analytic method in the social sciences and beyond. To date, text-as-data methods have relied on large volumes of human-annotated training examples, placing a premium on researcher resources. However, advances in large language models (LLMs) have made automated annotation increasingly viable. This paper tests the performance of 12 different LLMs on text classification across different tasks, text types, and languages. Using data in six languages from eight country contexts, the results show considerable variation in model performance, highlighting that researchers should treat model selection as a deliberate part of their LLM-based classification strategy. In general, GPT-4 exhibits relatively strong performance across all classification tasks, while open-source alternatives such as Llama 3 and Qwen2 show similar or even superior performance on select tasks. However, many open-source models perform relatively poorly on more complex and non-English coding tasks. The tradeoffs inherent in each model are highlighted to allow researchers to make informed model-selection decisions on a task-by-task basis.
Keywords: Large language models, text as data, political communication
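To illustrate the kind of pipeline the paper evaluates, below is a minimal zero-shot classification sketch in Python. It assumes the OpenAI Python client (v1+) and an API key in the environment; the label set, prompt wording, and model name are illustrative assumptions, not the authors' actual protocol, and open-weight models such as Llama 3 or Qwen2 would be called through a different client or a local inference server.

    # Minimal zero-shot text classification sketch using an LLM.
    # Assumptions: OpenAI Python client >= 1.0, OPENAI_API_KEY set in the
    # environment, and a hypothetical three-way sentiment label set.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    LABELS = ["positive", "negative", "neutral"]  # hypothetical label set

    def classify(text: str, model: str = "gpt-4") -> str:
        """Ask the model to assign exactly one label to a text."""
        prompt = (
            "Classify the following text into exactly one of these "
            f"categories: {', '.join(LABELS)}.\n"
            "Answer with the category name only.\n\n"
            f"Text: {text}"
        )
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # deterministic output aids replicability
        )
        return response.choices[0].message.content.strip().lower()

    # Example usage on a non-English (German) item:
    # print(classify("Die Regierung hat heute neue Reformen angekündigt."))

In a comparison of the kind the paper describes, the same classify() call would be run per model over a human-annotated validation set, with agreement metrics (e.g., accuracy or F1 against the human labels) computed per task and language to ground model selection.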