Who is most at-risk during pandemics? Using big data to replicate contemporary findings in historic populations

Wed-03

Presented by: Jana Berkessel

Jana Berkessel*, Tobias Ebert, Jochen Gebauer, Thorsteinn Jonsson, Shigehiro Oishi

Background: The COVID-19 pandemic has spurred an enormous amount of research that has helped us understand, for example, how the pandemic spread and who was most affected by it. However, because almost all of these studies focused exclusively on COVID-19, we do not know whether their findings reflect general pandemic rules (which would also apply to future pandemics) or are specific to COVID-19 (and thus would not). In this project, we demonstrate how big data can be used to replicate findings from the COVID-19 pandemic in historic pandemics, thus allowing for more generalizable insights for future pandemics.

Objectives: To fight pandemics, one must understand what drives their spread. Socioeconomic status (SES) is relevant in this regard: the prevailing view holds that, during pandemics, individuals of lower SES are at greater risk because they have less opportunity to follow spread-prevention norms than people of higher SES. We question the universality of this tenet and argue that during early pandemic phases people of higher (not lower) SES are at the center of the spread.

Hypothesis: In early pandemic phases, spread-prevention norms are not yet in place. People of higher SES possess heterogeneous social networks, which put them at risk of catching and spreading novel viruses. Later, people of lower SES are at elevated risk because spread-prevention norms (e.g., physical distancing) are then in place, and people of higher SES can follow them more thoroughly than people of lower SES (e.g., lower-status jobs often make physical distancing difficult).

Method: To test our hypothesis, we combined COVID-19 data from 3,132 US, 299 UK, and 400 German regions (Study 1) with a big data approach to replicate our finding in a historical setting, namely the Spanish Flu (Study 2). For Study 2, we web-scraped (and extensively validated) a genealogy database of historical persons and approximated the SES of these persons based on their names. The final analysis included 1,159,920 individuals who were alive during the Spanish Flu in 1918/1919, of whom 6,710 died during the pandemic.

Results: In Study 1, using growth curve modelling, we found that the COVID-19 pandemic initially spread in richer regions and later spilled over to poorer regions. In Study 2, using survival-analytic techniques, we found that individuals of higher SES had a higher risk of dying during the onset of the 1918/1919 Spanish Flu, while individuals of lower SES had a higher risk of dying during the later phases. For both studies, we ran various robustness and validity checks that supported our findings.

Conclusion: We uncovered a nuanced and socially unjust dynamic of pandemic spread: People of higher SES are most likely to import novel viruses and drive their initial spread, whereas people of lower SES carry the major burden once the pandemic unfolds. By combining recent with historical data, we likely uncovered a general pandemic rule that can be used to guide containment measures in future pandemics. As such, our project is a testament to how big data approaches can complement conventional methods to arrive at more generalizable insights.