The Multidimensional Nature of Semantic Transparency in a Cross-Linguistic Perspective: Evidence from Human Intuitions, Computational Estimates and Processing Data for Chinese Compounds

Submission 488

Posterwall-42

Presented by: Jing Chen

Jing Chen ¹, Emmanuele Chersoni ², Chu-Ren Huang ², Marco Marelli ¹

¹ Department of Psychology, University of Milano-Bicocca, Italy

² Department of Language Science and Technology, The Hong Kong Polytechnic University, China

Semantic transparency is a key construct for understanding how complex words are represented and processed in the human brain, yet it has been conceptualized and operationalized in diverse ways across studies. In this study, we validate the multidimensionality of this construct in Mandarin Chinese. We first constructed a comprehensive database of 2,675 nominal compounds, which includes human ratings at both the constituent and the compound levels as well as computational estimates from traditional distributional semantic models (DSMs) and large language models (LLMs). We then tested how these diverse measures predict lexical decision performance.
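To illustrate what a DSM-based computational estimate can look like, the following is a minimal sketch, not a description of the procedure used in this study. It assumes a common operationalization in which transparency is approximated by the cosine similarity between the vector of the compound and the vector of each constituent, and, at the compound level, between the compound vector and a simple additive composition of the constituent vectors. The function names and the toy vectors are hypothetical and chosen for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def transparency_estimates(compound_vec, first_vec, second_vec):
    """Return illustrative constituent-level and compound-level estimates.

    Assumption: the compound-level estimate compares the observed compound
    vector with the sum of the constituent vectors (additive composition).
    Other composition functions are possible.
    """
    composed = [a + b for a, b in zip(first_vec, second_vec)]
    return {
        "first_constituent": cosine(compound_vec, first_vec),
        "second_constituent": cosine(compound_vec, second_vec),
        "compound_level": cosine(compound_vec, composed),
    }


# Toy 2-dimensional vectors (made up, not taken from any trained model).
estimates = transparency_estimates(
    compound_vec=[1.0, 1.0],
    first_vec=[1.0, 0.0],
    second_vec=[0.0, 1.0],
)
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

In practice the vectors would come from a trained embedding model, and each resulting score could be compared with the corresponding human ratings.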

Our factor analysis reveals that semantic transparency in Chinese is fundamentally multidimensional: measures of the semantic contribution of each constituent and measures of the semantic predictability of the whole compound form distinct factors in the latent structure. Critically, the derived composite factors, particularly the one capturing the semantic contribution of the second constituent, significantly predict semantic transparency effects in lexical decision. Our work extends the cross-linguistic validity of the multidimensionality hypothesis, previously established in English and German, to Mandarin Chinese. Additionally, we found that LLM estimates align more closely with human ratings for compound-level transparency, while DSMs retain an advantage for constituent-level transparency. These results offer methodological guidance for using computational estimates to augment psycholinguistic datasets on semantic transparency.