Mr BERT goes to parliament: a supervised approach to classifying parliamentary speech in Europe
P5-3
Presented by: Zachary Greene
Scholars increasingly use quantitative text analysis to derive issue-level measures of preference. Yet few existing datasets or approaches provide sufficiently granular data or validation at the party or politician level across a range of topics. Furthermore, surveys come with known biases that limit the inferences that can be drawn from such measures of preference. We propose a supervised approach based on state-of-the-art multilingual representation learning that enables the transfer of automated coding of Comparative Agendas Project (CAP) categories to new languages and domains without additional labelled data. This approach draws on the wealth of existing hand-coded legislative data from the CAP to train a classifier that predicts the content of previously unleveraged parliamentary speech, classifying it according to the CAP codebook. We use a massive multilingual transformer to extend content-analysed data to additional languages not previously studied within this framework. The resulting classifier allows us to predict issue content and supports the scaling of parliamentary speech in 7 languages across 8 countries. We compare the results of this classification and scaling to existing methods (Wordfish, Wordscores, Semscale) and evaluate their effectiveness against a new set of hand-coded benchmarks. Both the data and the approach will be useful for predicting issue-level changes in election statements, political speech, and policy.
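To make the pipeline concrete, the sketch below shows one way such a classifier could be set up: fine-tuning a multilingual encoder on hand-coded CAP legislative texts and then applying it, zero-shot, to parliamentary speech in other languages. The abstract does not name the model, library, or hyperparameters, so the use of Hugging Face transformers, xlm-roberta-base, the toy examples, and the label count are all assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: model choice, toy data, and hyperparameters are
# assumptions; the abstract does not specify the authors' actual setup.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

MODEL = "xlm-roberta-base"  # assumed multilingual encoder

# Toy stand-in for hand-coded CAP legislative data (text + CAP major-topic id).
train = Dataset.from_dict({
    "text": ["We must raise teacher salaries next year.",
             "Die Regierung muss die Grenzkontrollen verschaerfen."],
    "label": [6, 9],  # e.g. 6 = Education, 9 = Immigration in the CAP codebook
})

tok = AutoTokenizer.from_pretrained(MODEL)
train = train.map(
    lambda batch: tok(batch["text"], truncation=True,
                      padding="max_length", max_length=256),
    batched=True,
)

# num_labels is an assumed count of CAP major-topic categories.
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=21)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="cap-classifier",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train,
)
trainer.train()

# Cross-lingual transfer: the fine-tuned classifier can now be applied to
# parliamentary speeches in languages absent from the labelled training data.
```

Because the multilingual encoder shares one representation space across languages, a classifier trained on CAP-coded texts in one set of languages can, in principle, label speeches in languages it never saw labelled examples for, which is the transfer step the abstract describes.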