11:00 - 12:30
Wed—HZ_12—Talks8—81
Room: HZ_12
Chair/s: Yee Lee Shing
Large language models as artificial semantic annotators
Wed—HZ_12—Talks8—8104
Presented by: Emin Çelik
Emin Çelik*, Mariya Toneva
Max Planck Institute for Software Systems, Saarbrücken, Germany
Comprehensively studying how the semantic representation of a word changes with context, whether via human behavior or brain recordings, is difficult due to the sheer number of possible contexts. Here, we considered large language models (LLMs) as model organisms tasked with assessing word meaning. As a first step, we tested whether LLMs can rate a large set of individual words across a number of semantic properties similarly to the way humans do. The words included abstract and concrete nouns, verbs, and adjectives. The semantic properties also covered a wide range, from sensory and motor to social and emotion-related properties. Specifically, we used GPT-4 Turbo and Llama3.1-8B to produce these ratings with prompts that mimicked the original queries presented to human raters. The prompts additionally included a few words and their corresponding ratings to facilitate accurate rating generation. We found a close match between these rating estimates and those produced by humans. Overall, our results suggest that LLMs are useful tools for semantically annotating a large pool of words out of context. In future work, we plan to use our method to annotate the words of a whole book in context and to model fMRI and MEG data recorded while subjects listen to this audiobook.
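The sketch below illustrates the general idea of eliciting few-shot word ratings from an LLM, assuming the OpenAI Python client (openai>=1.0). The property ("concreteness"), rating scale, prompt wording, and example ratings are illustrative placeholders, not the actual queries or norms used in the study.

```python
# Minimal sketch: few-shot prompting an LLM to rate words on a semantic
# property. The prompt text and example ratings are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical few-shot examples: (word, concreteness rating on a 1-7 scale)
FEW_SHOT = [
    ("banana", 6.6),
    ("justice", 1.5),
    ("whisper", 3.9),
]

def rate_word(word: str, prop: str = "concreteness") -> float:
    """Ask the model for a 1-7 rating of `word` on the property `prop`."""
    examples = "\n".join(f"{w}: {r}" for w, r in FEW_SHOT)
    prompt = (
        f"Rate how {prop} each word is on a scale from 1 (not at all) "
        f"to 7 (very). Answer with a single number.\n"
        f"{examples}\n{word}:"
    )
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return float(resp.choices[0].message.content.strip())

print(rate_word("hammer"))
```

In practice such ratings would be collected for every word and property, then compared against the published human norms (e.g., via correlation) to assess agreement.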
Keywords: LLMs, semantics, concrete, abstract, annotation