GPT-2 Generates Biased Texts
PS6-1
Presented by: Charles 1, William Marx
How should researchers create placebo conditions in survey experiments? A recent paper by Porter and Velez recommends that researchers use automated processes to construct a corpus of related placebo conditions, which are then randomly sampled during survey implementation. While we agree with their general recommendation, we suggest that researchers use caution if they employ the tool Porter and Velez recommend for placebo construction, OpenAI’s self-supervised language model GPT-2. Through a series of simulations, we conduct the largest assessment yet of GPT-2’s biases, measuring the sentiment of 1,083,750 potential placebos generated across 4,335 unique seed phrases. We show that the sentiment polarities of placebos vary tremendously across seed phrases depending on the race/ethnicity, gender, sexual orientation, religion, political affiliation and ideology, or state or territory demonym they include. We also show considerable heterogeneity in the substance of placebos across seed phrases. Comparing our results to a similar analysis of the text corpus used to train GPT-2, we find that the language model does not merely reproduce biases in its source material; it also creates its own. Taken together, our findings suggest that researchers should exercise great caution in using GPT-2 as an agnostic approach to generating placebos, particularly for identity-focused experiments. Finally, we offer a Google Colaboratory notebook that mitigates the effects of these biases, and we provide best-practice recommendations for automatic placebo generation.
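
To make the measurement pipeline concrete, the sketch below shows one way to generate candidate placebos from identity-varying seed phrases with GPT-2 and score their sentiment polarity. It is an illustration, not the authors' released Colaboratory notebook: the Hugging Face `transformers` text-generation pipeline, the VADER sentiment scorer, the specific seed phrases, and the sample counts are all assumptions introduced here for exposition.

```python
# Minimal sketch (assumed workflow, not the paper's code): generate completions
# from GPT-2 for each seed phrase, then score each completion's sentiment.
from transformers import pipeline, set_seed
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import nltk

nltk.download("vader_lexicon", quiet=True)

set_seed(42)  # make this illustrative run reproducible
generator = pipeline("text-generation", model="gpt2")
scorer = SentimentIntensityAnalyzer()

# Hypothetical identity-varying seed phrases of the kind the study compares.
seed_phrases = [
    "The Black woman said that",
    "The white man said that",
]

for seed in seed_phrases:
    # Sample a handful of completions per seed (the paper samples far more).
    outputs = generator(
        seed,
        max_length=50,
        num_return_sequences=5,
        do_sample=True,
        pad_token_id=50256,  # GPT-2's end-of-text token, used here as padding
    )
    # VADER's compound score ranges from -1 (negative) to +1 (positive).
    polarities = [
        scorer.polarity_scores(o["generated_text"])["compound"] for o in outputs
    ]
    mean_polarity = sum(polarities) / len(polarities)
    print(f"{seed!r}: mean compound polarity = {mean_polarity:+.3f}")
```

Comparing the distribution of polarity scores across seed phrases that differ only in the identity term is the kind of contrast the abstract describes; any systematic gap in those distributions is the bias a researcher would want to detect before using the generated texts as placebos.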