Submission 58
Literary Canons and Algorithmic Framing: Analysis of LLM-Generated Paratexts
SP01-01
Presented by: Mengyuan Zhou
Department of Translation, The Chinese University of Hong Kong
As large language models (LLMs) increasingly mediate how readers encounter literature through auto-generated summaries, keywords, and promotional descriptions, they begin to shape not only how books are discovered, but also how they are valued. These forms of AI-generated paratext, while often overlooked, now serve as the initial point of contact between global readers and literary works. In this context, the literary canon is no longer constructed solely by critics, institutions, or publishers, but also by algorithms.

This pilot project examines how LLMs participate in the framing of literary prestige. It focuses on a seemingly simple but culturally significant form of output: the book description. The central research question is whether these machine-generated texts reflect existing cultural hierarchies or help reshape them when applied to different literary traditions. The study compares descriptions of two groups of contemporary novels. The first group comprises works by Nobel laureates in Literature since 2000, texts institutionally recognized at the highest level of global literary prestige. The second comprises works from the Global South that have received major national or regional literary prizes.

Although both groups are marked by literary recognition, they circulate through different mechanisms of validation. By comparing how LLMs describe these works, the study explores whether some novels are more likely to be framed in terms of universality, authority, and timelessness, while others are associated with locality, cultural specificity, or regional context. The project also considers how different input formats influence the model’s output, including metadata-only prompts and platform-style prompts that more closely resemble commercial or public-facing summaries.
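The two input formats described above can be sketched as a pair of prompt templates. This is a minimal illustration only: the function names and template wording are hypothetical placeholders, not the prompts actually used in the study.

```python
# Hypothetical sketch of the two prompt conditions: a metadata-only prompt
# and a platform-style prompt resembling commercial, public-facing copy.
# Template wording is illustrative, not drawn from the study itself.

def metadata_only_prompt(title: str, author: str, year: int) -> str:
    """Condition A: the model sees only bibliographic metadata."""
    return (
        f"Write a back-cover description for the novel '{title}' "
        f"by {author} ({year})."
    )

def platform_style_prompt(title: str, author: str, year: int) -> str:
    """Condition B: the same request framed as bookstore product copy."""
    return (
        f"You are writing copy for an online bookstore. Draft an engaging, "
        f"reader-facing blurb for '{title}' by {author} ({year}) that will "
        f"appear on the book's product page."
    )

prompt_a = metadata_only_prompt("Example Novel", "A. Author", 2010)
prompt_b = platform_style_prompt("Example Novel", "A. Author", 2010)
```

Holding the metadata constant across both conditions keeps any difference in the generated descriptions attributable to the framing of the request rather than to the book itself.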

Rather than aiming for comprehensive coverage, this pilot study offers a focused, small-scale investigation designed to establish a reproducible method and to pose larger conceptual questions. Using a controlled set of twenty novels, the project generates multiple descriptions under two prompt conditions and analyzes the results using topic clustering, lexical framing indices, and consistency metrics. The analysis investigates whether thematic patterns and framing tendencies are consistent across generations, and how these patterns relate to the institutional or regional identity of the original works.
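Two of the measures named above, a lexical framing index and a consistency metric, can be sketched as follows. This is a hedged toy illustration: the term lists, the index formula, and the use of vocabulary overlap as a consistency proxy are all assumptions for demonstration, not the study's actual instruments.

```python
import itertools
import re
from collections import Counter

# Illustrative term lists; a real study would derive these from
# corpus analysis rather than hand-picking them.
UNIVERSAL_TERMS = {"universal", "timeless", "masterpiece", "humanity", "classic"}
LOCAL_TERMS = {"local", "regional", "village", "tradition", "homeland"}

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())

def framing_index(text: str) -> float:
    """(universal - local) / (universal + local); 0.0 if neither appears.

    Ranges from -1.0 (purely 'local' framing) to +1.0 (purely 'universal').
    """
    counts = Counter(tokenize(text))
    u = sum(counts[w] for w in UNIVERSAL_TERMS)
    l = sum(counts[w] for w in LOCAL_TERMS)
    return 0.0 if u + l == 0 else (u - l) / (u + l)

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0

def consistency(generations: list[str]) -> float:
    """Mean pairwise Jaccard overlap of vocabularies across generations."""
    vocabs = [set(tokenize(g)) for g in generations]
    pairs = list(itertools.combinations(vocabs, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A per-novel score could then be averaged over repeated generations in each prompt condition, and compared between the Nobel and Global South groups.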

While the study is limited in scope, it establishes a flexible framework for future inquiry. It contributes to the growing body of research that combines cultural analytics with critical AI studies and introduces new ways of understanding how literary value is encoded and circulated through algorithmic systems.

Ultimately, the study argues that cultural sensitivity must become a central concern in the development and deployment of generative AI. As LLMs increasingly produce the paratexts that shape global reading experiences, we need better tools to understand how these systems frame literature, whose narratives are being amplified, and what implicit hierarchies are being reproduced. The literary back cover, once written by editors or marketers, has become a new site of algorithmic interpretation. As scholars, educators, and readers, we must ask what forms of knowledge and authority are embedded in these generated texts, and how they affect the way literature is encountered in a digital age.