Reappraising Transformations in Modern China Through the Lens of AI

Submission 46

OP2-01

Presented by: Thorben Pelzer

Cécile Armand ¹, Christian Henriot ², Thorben Pelzer ³

¹ French National Centre for Scientific Research

² Aix-Marseille University

³ Hong Kong University of Science and Technology

From Computation to Interpretation: Reappraising Transformations in Modern China through the Lens of AI

While a great deal of energy is currently invested in developing highly complex and sophisticated Artificial Intelligence (AI) models, these models are not always of immediate relevance to humanities scholars, nor do they necessarily produce interpretable results. This panel shifts the focus to a crucial yet often overlooked challenge: how can such methods be applied to concrete historical questions in order to generate substantively meaningful new knowledge? And how can historians integrate these unfamiliar tools into their traditional workflows?

Our testing ground is the study of modern China, specifically the transformation of knowledge elites from the late Qing dynasty to the early People’s Republic (1850s–1950s). This period was marked by profound upheavals: the abolition of the civil service examinations, foreign intrusion and the treaty-port system, the rise of nationalism and communism, and China’s contested integration into the “family of nations.” Against this backdrop, we address two interrelated questions: (1) How can we understand the resilience of Chinese elites in this turbulent era? (2) How can we conceptualize the complex participation of Chinese actors in an emerging global order beyond the dominant “nationalism versus imperialism” or “impact–response” paradigms?

The scattered nature of the documentation and persistent national historiographical divides have long posed challenges for historians of modern China. The dominant China-centred approach, which has often fostered the paradigm of “Chinese exceptionalism,” has obscured a fuller understanding of China’s long-term historical trajectory and its integration into the globalized world. At the same time, the growing but uneven availability of digitized corpora and AI tools, combined with the rise of transnational and global history, offers new opportunities. Yet these developments also expose historians’ methodological vulnerabilities: we often lack the training to harness such technologies and must grapple with issues of scale, complexity, heterogeneous source genres, diverse languages and scripts, as well as AI biases and opacity.

This panel tackles these twin issues by combining epistemological and methodological reflection with concrete case studies. The two papers presented by Thorben Pelzer and Cécile Armand demonstrate how computational methods can illuminate social transformations in China across political regimes and national boundaries, while also overcoming the limitations of earlier approaches—whether group-based analyses or biographical studies centred on famous intellectuals and scientists. Christian Henriot introduces the digital ecosystem developed by the “Elites, Networks, and Power in modern China” (ENP-China) project, designed to bridge the gap between computation and historical hermeneutics, while also mapping the challenges and avenues opened by the incorporation of large language models into historical research.

Thorben Pelzer, “Imperial Governance in the Taishū jinji-roku: Data Mining the National Diet’s Next Digital Library.” An important source for understanding the personal and institutional makeup of the Japanese Empire and its colonial environs from a Japanese viewpoint is the Taishū jinji-roku, the “Records of Public Individuals.” Overall, 17 pre-war volumes of the Records are available, amounting to roughly 400,000 biographical entries. The circa 20,000 individual biographies contained in the overseas volumes of 1940 and 1943 include numerous observations that help us understand the imperial governance of East Asia, including transnational entanglements between Chinese and Japanese individuals active in the Japan-controlled Sinophone regions. Given the sheer quantity of information they contain, the Records can realistically be parsed only via automatised data mining. The paper first discusses ways of segmenting and structuring the optically recognised data output of NDLOCR, an advanced model of the National Diet Library’s “Next Digital Library” service. In the second half, the paper provides a first prosopographic exploration of the mined data using clustering techniques and related digital means of distant reading.

Cécile Armand & Christian Henriot, “The Great Talent Divergence: A Computational Reassessment of China’s Doctoral Elite (1905–1962).” This paper investigates the trajectories of more than 4,600 Chinese PhD holders trained abroad between 1905 and 1962, drawing on dissertation catalogues compiled by bibliographer Yuan Tongli (1895–1965) and enriched with biographical data. Employing a mixed-method approach that integrates AI-assisted data extraction with established tools of statistical analysis, data visualization, and micro-biographical portraits, it explores how these scholars’ educational paths and careers were shaped by shifting national priorities, geopolitical upheavals, and the rise of U.S. academic dominance. A central contribution lies in its empirical analysis of the Communist Revolution’s impact on the diverging fates of these intellectuals. While some secured positions within the People’s Republic of China, the majority experienced repression or remained in exile. By moving beyond nationalist narratives of intellectual return, this study highlights diasporic knowledge production, institutional displacement, and strategies of personal survival. Ultimately, it calls for a re-evaluation of transnational intellectual mobility and its entanglements with authoritarianism, academic freedom, and global hierarchies of knowledge. Methodologically, this paper advances a hybrid approach in which computational and manual methods mutually reinforce one another, with human validation and close reading of historical narratives used to evaluate and enrich AI-generated outputs.

Christian Henriot, “Taming the Digital Dragon: A Digital Ecosystem for the Study of Modern China.” This paper introduces a historian-centred digital infrastructure designed to address the “scale paradox” of modern research: the abundance of digitized sources and the enduring need for contextual interpretation. At its core lies the Modern China TextBase (MCTB), a curated repository of over 40 corpora (newspapers, reports, archival records) enriched with metadata and provenance information. Alongside it, the Modern China Biographical Database (MCBD), the Modern China Geospatial Database (MCGD), and the HistText search interface provide structured environments for linking textual, biographical, and spatial data with computational methods such as natural language processing (NLP), topic modelling, and classification. Anchored in sustainable infrastructures and collaborative practices, this ecosystem ensures transparency, reproducibility, and long-term accessibility. More than a set of tools, it constitutes a methodological intervention, demonstrating how computational approaches can amplify historical interpretation and open new scales of discovery for the study of modern China.