Fake-News Spread Index: Leveraging the ‘Wisdom of the Crowds’ and MrP to Make Representative Inference
Presented by: François t'Serstevens
The rise of social media platforms like Twitter has enabled the rapid spread of information, including news related to the COVID-19 pandemic. Journalistic fact-checking has been the dominant method for identifying and limiting the spread of misinformation. However, its high time and monetary costs and its alleged liberal bias (Poynter, 2019) have limited its ability to prevent the spread of fake news. Wisdom-of-the-crowd approaches, recognized as a credible alternative to traditional fact-checking, have gained prominence for their independence from alleged biases, real-time availability, and cost-effectiveness (Allen, Arechar, Pennycook, & Rand, 2021). These qualities make them particularly valuable in addressing fake news in a way that is universally accepted.
The objectives of this paper are twofold. (1) First, we aim to refine the methodological foundations of the wisdom of the crowd by proposing various ways to aggregate the ratings of laypeople. (2) Second, we aim to leverage the aggregated ratings to identify the characteristics of fake news sharers and generate state-wide indices of fake news sharing in the United States.
The study features a survey of 2151 respondents who rated, on a scale of 1 to 4, the accuracy of 5513 tweets on political topics and COVID-19 posted in 2022. We propose a series of aggregation methods, each representative of a given definition of consensus. The aggregation methods include simple and weighted averages of the ratings, an approach that enforces parity between Democrats and Republicans, and a Bayesian multilevel model that accounts for the predisposition of political parties towards a set of predefined topics. We find that politically and socially balanced samples are necessary for the accuracy of wisdom-of-the-crowds-based ratings, but that more sophisticated models can partly alleviate this constraint by incorporating prior information on the raters’ demographics and political perspectives.
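As an illustration of how two of these definitions of consensus can diverge, the following minimal sketch contrasts a simple average with a party-balanced average. It is not the authors' implementation: the tuple layout `(tweet_id, party, score)`, the party labels `"D"` and `"R"`, and the choice to enforce parity by averaging the two party means with equal weight are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def simple_mean(ratings):
    """Average all ratings per tweet, ignoring who the rater is."""
    by_tweet = defaultdict(list)
    for tweet_id, _party, score in ratings:
        by_tweet[tweet_id].append(score)
    return {tweet_id: mean(scores) for tweet_id, scores in by_tweet.items()}


def party_balanced_mean(ratings, parties=("D", "R")):
    """Average the per-party means so each party counts equally.

    Assumption: parity is enforced by equal weighting of party means;
    tweets lacking a rating from either party are skipped.
    """
    by_tweet_party = defaultdict(lambda: defaultdict(list))
    for tweet_id, party, score in ratings:
        if party in parties:
            by_tweet_party[tweet_id][party].append(score)
    balanced = {}
    for tweet_id, groups in by_tweet_party.items():
        if all(len(groups.get(p, [])) > 0 for p in parties):
            balanced[tweet_id] = mean(mean(groups[p]) for p in parties)
    return balanced


# Hypothetical ratings on the 1-to-4 accuracy scale: three Democrats, one Republican.
ratings = [("t1", "D", 4), ("t1", "D", 4), ("t1", "D", 4), ("t1", "R", 1)]
print(simple_mean(ratings))          # dominated by the larger group
print(party_balanced_mean(ratings))  # each party contributes half
```

With an unbalanced pool of raters, the simple average follows the larger group, while the balanced average gives each party the same influence regardless of how many of its members rated the tweet.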
We supplement the Twitter author data with estimates of author political alignment (Mosleh & Rand, 2022), gender, and age (Wang et al., 2019). Leveraging a Bayesian multilevel regression with post-stratification (MrP), the selected aggregated metrics, and the extended author information, we model the effect of socio-demographic characteristics on fake news sharing. We do not find that any of the included demographic characteristics has a consistently significant impact on fake news sharing across all models. On the other hand, we find that Democrats are less likely to spread fake news than Republicans in all models, i.e., even when the same number of ratings from Democrats and Republicans is used.
Lastly, the models are post-stratified to the census population to generate state-wide predictions. Though the estimates vary slightly depending on the aggregation metric used, the ordinal rankings of most states are consistent, e.g., DC and Florida consistently remain the lowest and highest fake news sharers, respectively. The resulting indices, along with the transparent and replicable methodology employed, provide policymakers with a tool to estimate where fake news policies are most needed. The ratings provided are fully consensus-based, i.e., beyond any bias allegations, and can be scaled to include more posts or a finer geographical area.
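The post-stratification step can be sketched as a population-weighted average of model predictions within each state. This is a generic illustration of the technique rather than the authors' code: the cell labels, the prediction values, and the census counts below are hypothetical, and the multilevel model that would produce the cell-level predictions is not shown.

```python
from collections import defaultdict


def poststratify(cell_predictions, cell_counts):
    """Combine cell-level predictions into state-level estimates.

    cell_predictions: {(state, cell): predicted fake-news sharing rate}
    cell_counts:      {(state, cell): census population of that cell}
    Each state's estimate is the population-weighted mean of its cells.
    """
    weighted_sum = defaultdict(float)
    population = defaultdict(float)
    for (state, cell), count in cell_counts.items():
        weighted_sum[state] += count * cell_predictions[(state, cell)]
        population[state] += count
    return {
        state: weighted_sum[state] / population[state]
        for state in population
        if population[state] > 0
    }


# Hypothetical demographic cells and values, for illustration only.
predictions = {("FL", "18-39"): 0.5, ("FL", "40+"): 0.25}
census = {("FL", "18-39"): 100, ("FL", "40+"): 300}
state_index = poststratify(predictions, census)
print(state_index)
```

Because the weights come from the census rather than from the sample, a demographic group that is over-represented among Twitter authors does not dominate the state-level estimate.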