Young Voices in Parliament: Assessing Substantive Youth Representation in Germany and the UK using Natural Language Processing.
P5-S125-5
Presented by: Robin Rentrop
This paper addresses an empirical gap in the study of substantive youth representation by using Natural Language Processing and supervised machine learning to test whether young MPs are more likely to raise youth-specific issues in parliamentary speeches. Prior research has documented the descriptive underrepresentation of young people and assumed that younger MPs are better placed to represent youth interests substantively, yet this claim remains empirically untested. To address this, I developed a methodological framework for analysing a twenty-year span of parliamentary speeches across two countries. First, I hand-annotated 2,000 parliamentary speeches to create a high-quality multilingual training dataset. Using this dataset, I fine-tuned XLM-RoBERTa, a multilingual Transformer-based model, to build a supervised multi-class classifier that automatically identifies youth-specific content in speeches. I then applied this classifier to approximately 1.6 million speeches from the UK and Germany spanning two decades, yielding comprehensive and reliable measures of MPs' substantive youth representation. This approach enables fine-grained, systematic analysis of youth representation across large multilingual corpora and different political systems. Results show that younger MPs are significantly more likely to represent youth interests in their speeches, supporting the hypothesis that descriptive representation influences substantive representation. The findings underscore the broader importance of linking descriptive and substantive representation and demonstrate the value of multilingual text-as-data methods for studying political discourse and group representation.
Keywords: NLP, Text-as-data, Supervised Learning, Parliamentary Speeches, Group Representation