A Quantitative Linguistic Study of Translator Identity Using an LLM: Two Chinese Versions of My Brilliant Friend

Submission 39

Poster-03

Presented by: Xu Shiyi

Xu Shiyi

This study adopts quantitative linguistics as its theoretical framework and uses the Gemini 2.5 model to extract and quantitatively analyze multidimensional linguistic features in two Chinese translations of Elena Ferrante's novel My Brilliant Friend, one by Chen Ying, who is from Mainland China, and one by Li Jingyi, who is from Taiwan, China. The goal is to determine translator identity.

After text preprocessing, the study applies hierarchical prompt engineering, designing multi-level, task-oriented instructions to make full use of the large language model's feature extraction capabilities. At the lexical level, the model is guided to extract and count high-frequency words and key lexical chunks in both translations and to analyze lexical diversity, lexical density, and fidelity to the source vocabulary. At the syntactic level, the model quantifies sentence length, syntactic complexity, and the frequency of subordinate clauses. At the semantic and affective level, the model analyzes how the two translators differ in emotional expression and handling of tone, based on sentiment-related vocabulary and mood markers. At the discourse level, the model extracts cohesion devices, discourse markers, and paragraph structures to characterize each translator's strategy for organizing the text.
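Several of the lexical and syntactic indicators above have standard definitions that can also be computed directly, for example to spot-check the counts the model reports. The sketch below is not the study's pipeline. It assumes each text has already been segmented into (word, part-of-speech) pairs by some Chinese word segmenter, and it uses a hypothetical four-tag set for content words.

```python
import re
from collections import Counter

# Hypothetical tag set: tokens are assumed to be pre-segmented (word, POS) pairs,
# where "n", "v", "a", "d" mark nouns, verbs, adjectives, and adverbs.
CONTENT_TAGS = {"n", "v", "a", "d"}

# Chinese sentence-final punctuation used to split sentences.
SENTENCE_END = re.compile(r"[。！？]+")


def type_token_ratio(tokens):
    """Lexical diversity: distinct word types divided by total tokens."""
    words = [w for w, _ in tokens]
    return len(set(words)) / len(words) if words else 0.0


def lexical_density(tokens):
    """Share of tokens whose POS tag marks them as content words."""
    if not tokens:
        return 0.0
    content = sum(1 for _, pos in tokens if pos in CONTENT_TAGS)
    return content / len(tokens)


def sentence_lengths(text):
    """Length of each sentence in characters, ignoring whitespace."""
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    return [len(re.sub(r"\s+", "", s)) for s in sentences]


def top_words(tokens, n=5):
    """Most frequent words: a crude stand-in for high-frequency-word extraction."""
    return Counter(w for w, _ in tokens).most_common(n)
```

For instance, `sentence_lengths("我看见她。她笑了！")` gives `[4, 3]`. The raw type-token ratio is sensitive to text length, so comparisons between the two translations are fairer on segments of equal size.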

To enhance transparency and interpretability, the study incorporates chain-of-thought reasoning: the model decomposes its inference step by step and explains the basis for each feature it identifies, rather than returning opaque “black-box” outputs. This not only makes the results more reliable but also provides a basis for fine-grained analysis of differences in translation style.
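The abstract does not reproduce its prompts, so the following is only a hypothetical sketch of how a level-specific instruction might be combined with a chain-of-thought request, and how the reasoning could be kept separate from the machine-readable result. The task wording, the `<result>` markers, and the function names are all assumptions; no model API is called here.

```python
import json
import re

# Hypothetical instructions for the four analysis levels described in the study;
# the actual prompts used with Gemini 2.5 are not given in the abstract.
LEVEL_TASKS = {
    "lexical": "List high-frequency words and key lexical chunks; report lexical diversity and density.",
    "syntactic": "Report mean sentence length, syntactic complexity, and subordinate-clause frequency.",
    "affective": "Identify sentiment-related vocabulary and mood markers and describe the tone.",
    "discourse": "List cohesion devices and discourse markers and describe paragraph structure.",
}

# Hypothetical chain-of-thought instruction with an assumed output convention.
COT_INSTRUCTION = (
    "Reason step by step, citing the words or sentences that support each judgment. "
    "After your reasoning, output the final results as a single JSON object "
    "between the markers <result> and </result>."
)


def build_prompt(level, passage):
    """Assemble one level-specific prompt with a chain-of-thought instruction."""
    return f"Task ({level} level): {LEVEL_TASKS[level]}\n{COT_INSTRUCTION}\n\nText:\n{passage}"


def split_reply(reply):
    """Separate the model's visible reasoning from its JSON result."""
    match = re.search(r"<result>(.*?)</result>", reply, re.DOTALL)
    if match is None:
        raise ValueError("reply has no <result> block")
    reasoning = reply[: match.start()].strip()
    return reasoning, json.loads(match.group(1))
```

Keeping the reasoning text alongside the parsed numbers lets each extracted feature be audited against the evidence the model cited, which is the transparency benefit the paragraph above describes.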

After feature extraction, the study applies clustering analysis and classification algorithms to model and visualize the linguistic features of the translations statistically. Dimensionality reduction techniques such as principal component analysis (PCA) or t-SNE project the multidimensional feature space into a low-dimensional view, making the differences between the two translators' style distributions directly visible. Quantitative indicators are then used to assess the stylistic differences between the translators. Finally, machine learning algorithms are used to build a translator identity prediction model that identifies the translator of each text.
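The abstract does not name the classifier it uses. As one simple, transparent possibility, the sketch below swaps in a nearest-centroid classifier over z-scored feature vectors (for example, mean sentence length, type-token ratio, and lexical density per text segment), evaluated with leave-one-out cross-validation. All function names are hypothetical, and it uses only the standard library. PCA or t-SNE plots would typically be drawn from the same standardized feature matrix.

```python
import math
from statistics import mean, pstdev


def fit_scaler(rows):
    """Per-feature mean and standard deviation (1.0 for constant columns)."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def scale(row, mus, sds):
    """Z-score one feature vector so no single feature's units dominate distances."""
    return [(x - m) / s for x, m, s in zip(row, mus, sds)]


def centroids(rows, labels):
    """Mean feature vector for each translator label."""
    groups = {}
    for r, y in zip(rows, labels):
        groups.setdefault(y, []).append(r)
    return {y: [mean(c) for c in zip(*g)] for y, g in groups.items()}


def predict(cents, row):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(cents, key=lambda y: math.dist(cents[y], row))


def leave_one_out_accuracy(rows, labels):
    """Hold out each segment in turn, fit scaler and centroids on the rest, and score."""
    hits = 0
    for i in range(len(rows)):
        train = rows[:i] + rows[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        mus, sds = fit_scaler(train)
        cents = centroids([scale(r, mus, sds) for r in train], train_y)
        hits += predict(cents, scale(rows[i], mus, sds)) == labels[i]
    return hits / len(rows)
```

The scaler is refitted inside each fold so that the held-out segment's statistics never influence its own prediction. With only two translators, leave-one-out accuracy should be read against the 50% chance baseline.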

The results indicate that Chen Ying’s translation tends to preserve the syntactic complexity of the source text, whereas Li Jingyi’s version favors concise, fluent Chinese expression. The complex sentence structures, alternation of long and short sentences, and selective translation of certain cultural terms in Chen Ying’s translation contrast sharply with the simplified sentences and straightforward expressions in Li Jingyi’s translation. Significant differences are also observed in lexical choices and emotional tendencies between the two translations.

Large language models provide a novel and efficient quantitative approach to translator identity determination: they enable rapid extraction and quantification of stylistic features from large volumes of text and complement traditional manual analysis. Analyzing the two translations of My Brilliant Friend with a large language model not only reveals the translators’ stylistic and identity characteristics but also provides an empirical case and theoretical support for language research in the era of digital intelligence. This study offers new perspectives and technical references for translator identity research.