Predicting populism: Methodological and substantive insights from supervised classification of social media data

P7-S186-1

Presented by: Paul C. Bauer

Lukas Schuette ², Paul C. Bauer ¹

¹ LMU Munich, University of Freiburg, MZES Mannheim (external fellow)

² University of Muenster

Populism has become a prominent concept in the social sciences. Studying populism empirically requires identifying it within large, rapidly expanding text corpora that are too vast for manual analysis. A small number of studies have applied supervised methods such as logistic regression, random forest, and BERT to detect populism in social media posts, party manifestos, and speeches, with mixed results. This paper makes three contributions. First, we systematically review previous research that has predicted populism in various contexts, highlighting the difficulty of defining populism clearly and illustrating the limitations of existing definitions for classifying social media data. We then propose a clearer definition. Second, using this refined definition and a large Twitter dataset, we compare a range of machine learning and deep learning models: logistic regression, random forest, XGBoost, LightGBM, and three large language models (LLMs), namely GPT-3.5-turbo and fine-tuned versions of GPT-4o and GPT-4o-mini. Overall accuracy (the share of correctly classified items) varies considerably, from .62 (GPT-3.5-turbo) to .85 (random forest) and up to .96 (fine-tuned GPT). Our comparison yields further methodological insights (e.g., on classification time, cost, and model interpretability) that can be generalized to classifying populism in other contexts and to related concepts. Finally, as a last test, we examine how model accuracy affects substantive conclusions about the prevalence of populism, offering guidance on which modeling choices are best suited to analyzing such data.
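The final point, that classifier accuracy shapes substantive prevalence estimates, can be illustrated with a standard misclassification identity: the share of posts a classifier labels populist equals the true prevalence times sensitivity plus the non-populist share times the false-positive rate. The sketch below is a minimal illustration under that assumption, not the authors' procedure; the function names and the example values (true prevalence 0.10, sensitivity 0.80, specificity 0.90) are hypothetical. The correction step uses the Rogan-Gladen estimator, a swapped-in technique named here for illustration.

```python
def accuracy(y_true, y_pred):
    """Correct classification rate: share of items whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def observed_prevalence(true_prevalence, sensitivity, specificity):
    """Expected share of items a classifier labels 'populist', given its error rates."""
    false_positive_rate = 1 - specificity
    return true_prevalence * sensitivity + (1 - true_prevalence) * false_positive_rate


def corrected_prevalence(observed, sensitivity, specificity):
    """Rogan-Gladen estimator: invert observed_prevalence to recover the true share."""
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("classifier must beat chance (sensitivity + specificity > 1)")
    return (observed - (1 - specificity)) / denom


if __name__ == "__main__":
    # Hypothetical classifier: sensitivity 0.80, specificity 0.90, true prevalence 0.10.
    obs = observed_prevalence(0.10, 0.80, 0.90)
    print(round(obs, 2))  # about 0.17, well above the true 0.10
    print(round(corrected_prevalence(obs, 0.80, 0.90), 2))  # recovers about 0.10
```

In this hypothetical case the classifier's overall accuracy would be 0.10 × 0.80 + 0.90 × 0.90 = 0.89, yet it inflates the estimated prevalence by roughly 70 percent, because false positives on the large non-populist majority outweigh missed populist posts.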

Keywords: populism, machine learning, classification, accuracy, bias, large language models (LLMs)
