15:30 - 17:45
Thursday-Panel
Chair/s:
Akitaka Matsuo
Discussant/s:
Musashi Harukawa
Meeting Room C

Tom Paskhalis
Record Linkage with Text: Merging Data Sets When Information is Limited

Akitaka Matsuo, Kentaro Fukumoto
Legislators’ Sentiment Analysis Supervised by Legislators

Lukas Stoetzer, Heike Klüver
Measuring Parties' Evolving Issue Agendas

Hauke Licht
Cross-lingual supervised classification of political texts

Anna Palau, Andreu Casas, Luz Muñoz
Who is Effective at Amending Legislation? A Text Reuse Analysis of Which Amendments Make it into Law
Record Linkage with Text: Merging Data Sets When Information is Limited
Tom Paskhalis
New York University

Recent years have seen the emergence of new, more scalable ways to link information about entities across multiple data sources. However, merging data sets remains challenging when the number of variables available for record linkage is restricted. In this paper I consider the case in which the information is limited to a single multi-token text string. This situation often arises when researchers work with organization names, user accounts or other short labels. Using Lobbying Disclosure Act data, I illustrate the substantive implications that the choice of record linkage approach can have in empirical research. I review the existing approaches and consider three types of noise typically encountered in this scenario: character-level, word-level, or a combination of both. Furthermore, I conduct a simulation study showing how sensitive the existing approaches are to errors occurring at these different levels. The results suggest that the optimal choice of record linkage approach depends on contextual knowledge about the most likely type of noise, and they underscore the need to conduct sensitivity analyses using different record linkage approaches.
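The distinction between character-level and word-level noise can be illustrated with a minimal sketch. This is not the paper's method or its simulation design: the two measures (normalized Levenshtein similarity and token-set Jaccard similarity) are common choices assumed here for illustration, and the organization names and function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], case-insensitive."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a.lower(), b.lower()) / max(len(a), len(b))


def token_jaccard(a: str, b: str) -> float:
    """Word-level similarity in [0, 1]: overlap of whitespace-separated tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical organization names, for illustration only.
reference = "Acme Widget Company"
typo = "Acme Widgt Company"         # character-level noise (one dropped letter)
reordered = "Company Acme Widget"   # word-level noise (tokens reordered)

for label, candidate in [("typo", typo), ("reordered", reordered)]:
    print(label,
          round(char_similarity(reference, candidate), 3),
          round(token_jaccard(reference, candidate), 3))
```

In this toy case the character-level measure barely penalizes the typo but penalizes the reordering heavily, while the token-level measure does the reverse. That asymmetry is one reason a comparison across several linkage approaches can be informative when the dominant type of noise is unknown.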