13:10 - 14:50
P13
Room: Club D
Panel Session 13
Adam Ramey - More Than Words: Using Text to Predict Psycho-political Traits
Allison Koh - Tracking Transnational Trolls: Identifying Targeted Harassment Against Exiled Activists in Foreign Influence Operations
Thomas Robinson - SyGNet: Synthetic Data for the Social Sciences using Deep Learning
 
SyGNet: Synthetic Data for the Social Sciences using Deep Learning
P13-3
Presented by: Thomas Robinson
Thomas Robinson
Durham University
At the forefront of social science research, novel techniques are being developed to enable researchers to make robust inferences from complex data. Effective use of these tools and methods rests on demonstrations of their performance, which in turn rely on using the right kind of data to test them. Simulations are hard to conduct well because real social science data is so complex: simplified tests using parametric data may not comport well with actual social science applications. Conversely, benchmarking on well-known studies leaves researchers unable to determine performance, because the true population parameters are unknown. In this paper I introduce a new solution using synthetic data: a strategy in which the underlying relationships between variables in real-world data are learned, and from which an arbitrary number of entirely new but realistic observations can be generated (i.e. “synthesised”). I use generative adversarial networks (GANs), a form of deep learning, to model aggregated social science datasets in order to synthesise realistic-looking but entirely new data. I then show how this data can be used to benchmark statistical designs and methods, and contribute new software for researchers to use.
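The benchmarking logic described in the abstract can be illustrated with a minimal sketch. It is not the SyGNet software and does not use a GAN: as a loudly simplified assumption, a fitted linear-Gaussian model stands in for the trained generator, and every function name here (`fit_generator`, `synthesise`, `benchmark_bias`) is hypothetical. The point it shows is only that, once the generator is fixed, the parameter behind the synthetic data is known, so an estimator's bias can be measured directly.

```python
import random
import statistics


def ols_slope_intercept(xs, ys):
    """Ordinary least squares slope and intercept for one predictor."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_generator(xs, ys):
    """Learn the x -> y relationship from observed data.

    Assumption: a linear-Gaussian model replaces the GAN generator.
    """
    slope, intercept = ols_slope_intercept(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return {
        "mean_x": statistics.fmean(xs),
        "sd_x": statistics.pstdev(xs),
        "intercept": intercept,
        "slope": slope,
        "sd_resid": statistics.pstdev(residuals),
    }


def synthesise(params, n, rng):
    """Draw n entirely new observations from the fitted generator."""
    xs = [rng.gauss(params["mean_x"], params["sd_x"]) for _ in range(n)]
    ys = [
        params["intercept"] + params["slope"] * x + rng.gauss(0.0, params["sd_resid"])
        for x in xs
    ]
    return xs, ys


def benchmark_bias(params, estimator, n_obs, n_reps, rng):
    """Average estimator error against the generator's known slope."""
    estimates = []
    for _ in range(n_reps):
        xs, ys = synthesise(params, n_obs, rng)
        estimates.append(estimator(xs, ys))
    return statistics.fmean(estimates) - params["slope"]


# Illustrative "observed" data; in practice this would be a real dataset.
rng = random.Random(0)
observed_x = [rng.gauss(0.0, 1.0) for _ in range(500)]
observed_y = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in observed_x]

params = fit_generator(observed_x, observed_y)
bias = benchmark_bias(
    params,
    lambda xs, ys: ols_slope_intercept(xs, ys)[0],
    n_obs=200,
    n_reps=200,
    rng=rng,
)
print(f"known slope: {params['slope']:.3f}, estimated bias of OLS: {bias:.4f}")
```

Any other estimator with the same `(xs, ys)` signature could be passed to `benchmark_bias` in place of the OLS slope, which is what makes the synthetic data usable as a test bed.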