Lianhuanhua in the Age of AI: Multimodality, Distant Perceiving, and the Ethics

Submission 62

SP04-01

Presented by: Aijia Zhang

Aijia Zhang

Heidelberg University

The accelerating AI revolution is opening a new range of possibilities for digital humanities research, which has relied heavily on rule-based and statistical tools. Methods like keyword search, n-gram analysis, metadata visualization, and basic image annotation have facilitated forms of “distant reading” (Moretti) and “distant viewing” (Tilton and Arnold) and have proven valuable in navigating large corpora. However, as the forms of data become more complex, new tools and methods are required to meet the needs of research into multimodal artefacts. This paper takes lianhuanhua, a sequential illustrated picture book form that circulated widely in China during the 20th century, as a case for studying the promises and limits of digital humanities in the age of AI. Produced in palm-sized format with an image-to-text layout, and published in tens of thousands of titles and hundreds of millions of copies, lianhuanhua presents itself as a promising medium for data-driven multimodal computational research. By reviewing the AI-assisted tools I used to approach lianhuanhua, I reflect on the challenges and affordances of transformer-based OCR, unsupervised computer vision models, and text-to-image generative AI. By discussing the reconfigured practices of distant reading, I propose “distant perceiving” as a method for interpreting multimodal data in AI-assisted cultural analytics.

My study addresses these challenges by assembling and processing a large corpus of approximately 3,000 volumes of lianhuanhua collected at Heidelberg University, using text mining, machine learning, and computer vision techniques to analyze their visual and textual dimensions. Through distant perceiving, I aim to uncover the ideological construction underlying the narratives of lianhuanhua stories. Concretely, my project employs three core methods: 1) AI-assisted OCR to extract text from digitized lianhuanhua panels, 2) convolutional neural networks to identify and cluster visual features (such as recurring character types, gendered depictions, and stylistic patterns), and 3) natural language processing (using Chinese word segmentation and topic modeling) to analyze thematic and narrative patterns. These tools enable forms of inquiry that are impossible through human close reading alone, such as tracing major thematic changes in lianhuanhua across decades, identifying the visual construction of female figures in mass-market productions, and extracting the strategies adopted in ideological propaganda.
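As an illustration only, the decade-level tracing of themes mentioned above could be sketched as follows. This is a minimal sketch that substitutes per-decade relative term frequencies for topic modeling. It is not the pipeline used in the project. The input format, the function names `decade_of` and `relative_frequencies`, and the sample tokens are all hypothetical placeholders, and the text is assumed to have been extracted by OCR and segmented beforehand.

```python
from collections import Counter, defaultdict

# Hypothetical input: one (publication_year, tokens) pair per volume.
# Tokens are assumed to be already OCR-extracted and segmented.
# The values below are placeholders for illustration, not corpus data.
volumes = [
    (1952, ["人民", "战争", "人民"]),   # "the people", "the war"
    (1958, ["生产", "人民"]),           # "production", "the people"
    (1965, ["战争", "英雄", "英雄"]),   # "the war", "hero"
]


def decade_of(year):
    """Map a year to the first year of its decade, e.g. 1958 -> 1950."""
    return (year // 10) * 10


def relative_frequencies(volumes):
    """Return {decade: {term: share of all tokens in that decade}}."""
    counts = defaultdict(Counter)
    for year, tokens in volumes:
        counts[decade_of(year)].update(tokens)

    result = {}
    for decade, counter in counts.items():
        total = sum(counter.values())
        result[decade] = {term: n / total for term, n in counter.items()}
    return result


freqs = relative_frequencies(volumes)
for decade in sorted(freqs):
    print(decade, freqs[decade])
```

Normalizing by the number of tokens per decade keeps decades with more surviving volumes from dominating the comparison. Any pattern surfaced this way would still need to be checked against the panels themselves.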

Yet the project also highlights the methodological and ethical tensions that accompany AI-assisted cultural analytics, whether unique to the medium of lianhuanhua or longstanding in the field of digital humanities. First, on the methodological side, when I use clustering algorithms to group character images or extract narrative themes, questions of authorship and agency arise: how can statistical regularities be balanced against cultural nuances in the analysis? Regarding the application of tools, how can clean data be generated when the OCR model I applied struggles with some lianhuanhua volumes in traditional Chinese vertical layout? This also raises questions of authenticity and authority: what does it mean to write cultural history based on machine-driven correlations, especially when the results may flatten ambiguity, and how can well-grounded cultural-historical arguments be made with statistics while defending their legitimacy? Second, on the ethical side, regarding bias and opacity, how can a fair annotation scheme for recognizing female characters in lianhuanhua be designed so as to avoid a priori gender stereotypes, and how should the results generated by the black box of the algorithm be interpreted? These limitations and challenges force me to confront the interpretive blind spots of algorithmic perception with a rigorous research design and a considered application of computational methods.

Rather than viewing these challenges as obstacles, I argue that they can serve as productive entry points for rethinking the epistemology of digital humanities in Asia and beyond, by transforming the limitations of each technique into affordances. In the case of lianhuanhua, machine-assisted methods do not replace traditional modes of reading and contextualization, but instead generate patterns, anomalies, and provocations that necessitate re-examination through historically grounded close analysis. For instance, when computer vision identifies a recurring template for "female soldier" characters, close reading of individual panels reveals the interplay of gender ideology, visual convention, and historical moment. When topic modeling surfaces clusters of terms linked to "the CCP", "the war", and "the people", I can connect these to ideological struggles, national identity building, and propagandistic strategies by placing the terms back into their contexts with the aid of large language models. In this way, the encounter between distant perceiving and close reading becomes dialogical rather than oppositional, attending to generalities while preserving specificities.

The broader implications of this case study of lianhuanhua extend beyond contemporary Chinese history and popular culture. First, it highlights how multimodal cultural corpora that consist of both words and images require scholars to move beyond purely text-based or image-based approaches in digital humanities. Second, it underscores the importance of critical reflexivity by calling attention to the biases, limitations, and epistemic stakes of AI tools when selecting from the toolbox. Third, it contributes to ongoing discussions about the global scope of digital humanities, emphasizing how Asian cultural materials, with their unique histories of production, circulation, and preservation, both challenge and enrich theoretical frameworks developed primarily in Euro-American contexts. It also introduces a high-quality, pre-aligned dataset to the field.

In sum, I argue that lianhuanhua offers a fertile site for thinking through the opportunities and dilemmas of AI-assisted cultural analytics. This study shows how large-scale, multimodal analysis can reveal new dimensions of visual-textual culture, while also raising fundamental questions about bias, authenticity, and the interpretive horizons of machine perception. By placing Chinese lianhuanhua in dialogue with global debates in digital humanities, I reflect on the possibilities and limits of engaging with mass cultural archives in the age of AI.