15:30 - 17:45
Thursday-Panel
Chair/s:
Akitaka Matsuo
Discussant/s:
Musashi Harukawa
Meeting Room C

Tom Paskhalis
Record Linkage with Text: Merging Data Sets When Information is Limited

Akitaka Matsuo, Kentaro Fukumoto
Legislators’ Sentiment Analysis Supervised by Legislators

Lukas Stoetzer, Heike Klüver
Measuring Parties' Evolving Issue Agendas

Hauke Licht
Cross-lingual supervised classification of political texts

Anna Palau, Andreu Casas, Luz Muñoz
Who is Effective at Amending Legislation? A Text Reuse Analysis of Which Amendments Make it into Law

Cross-lingual supervised classification of political texts
Hauke Licht
University of Zurich, Department of Political Science

Large portions of political text collections are multilingual and, in principle, invite comparative quantitative analysis. However, established methods for cross-lingual text analysis depend on linguistically qualified human coders, human translators, or reliable machine translation, and thus tend to thwart comparative research. In this paper, I propose an alternative method that relies on multilingual text embedding: texts written in different languages are embedded in a joint semantic space using a publicly available multilingual language model. The resulting text embeddings are then used as inputs to train a supervised machine learning classifier. To validate the proposed approach, I conduct a series of text classification experiments on three different political text corpora. These experiments show that classifiers trained on multilingual text embeddings pass three important tests. First, they classify held-out texts as accurately as comparable classifiers trained on monolingual or translated texts. Second, they perform by and large consistently across languages. Third, they classify texts written in languages that were not present in the training data with little to no loss in predictive performance. Viewed together, these results present supervised classification from multilingual text embeddings as a reliable, replicable, and cost-efficient approach to multilingual text classification. This study thus contributes to an emerging methodological literature on multilingual quantitative text analysis in political science.
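
The pipeline described in the abstract (embed texts from several languages in one shared space, then train a classifier on the embeddings) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `CONCEPTS` lookup table is a hypothetical toy stand-in for the multilingual language model, which the abstract does not name; a nearest-centroid rule stands in for whatever supervised classifier the paper uses; and the example sentences and labels are invented. The last step mimics the third test, classifying a language (Spanish) that is absent from the training data.

```python
# Hypothetical toy "joint semantic space": words with the same meaning in
# English, German, and Spanish map to the same axis. A real application
# would replace this lookup with a multilingual language model.
CONCEPTS = {
    "tax": 0, "steuer": 0, "impuesto": 0,
    "budget": 0, "haushalt": 0, "presupuesto": 0,
    "border": 1, "grenze": 1, "frontera": 1,
    "asylum": 1, "asyl": 1, "asilo": 1,
}


def embed(text):
    """Map a text to a unit-length vector in the shared concept space."""
    vec = [0.0, 0.0]
    for token in text.lower().split():
        if token in CONCEPTS:
            vec[CONCEPTS[token]] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec


def fit_centroids(texts, labels):
    """Supervised step: average the embeddings of each labelled class."""
    sums, counts = {}, {}
    for text, label in zip(texts, labels):
        vec = embed(text)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, text):
    """Assign the label whose class centroid is closest to the embedding."""
    vec = embed(text)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


# Invented training data in English and German only.
train_texts = [
    "the tax and the budget",
    "asylum at the border",
    "steuer und haushalt",
    "asyl an der grenze",
]
train_labels = ["economy", "immigration", "economy", "immigration"]
centroids = fit_centroids(train_texts, train_labels)

# Spanish never appears in the training data.
print(predict(centroids, "impuesto y presupuesto"))
print(predict(centroids, "asilo en la frontera"))
```

Because the classifier only ever sees vectors in the shared space, it needs no language-specific component; in this sketch, all of the cross-lingual work is done by the embedding step.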