Submission 191
When Averages Shine: Computing Group- and Individual-Level Concept Representations Using Centroid Analysis
SymposiumTalk-03
Presented by: Aliona Petrenco
While large-scale vector space models can be used to construct general, population-level meaning representations, they are often not suited for measuring concepts in specific individuals or groups, or within particular situations and contexts. To address this gap, the present work introduces centroid analysis, a computational method for quantifying variability in meaning representations. It maps open-ended verbal responses onto a semantic vector space and represents each concept as the geometric centre (centroid) of the responses it elicits.
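As a rough illustration of the centroid idea described above, the following minimal sketch averages the vectors of the responses given to one cue and compares the result with the cue's own vector. The three-dimensional vectors, the example words, and the helper names `centroid` and `cosine` are invented for illustration and are not taken from the presented work; the use of cosine similarity as the closeness measure is likewise an assumption.

```python
from math import sqrt

# Toy 3-dimensional "embeddings", invented for illustration only.
# Real distributional models use vectors with hundreds of dimensions.
TOY_VECTORS = {
    "dog":   [0.9, 0.1, 0.0],
    "bark":  [0.8, 0.2, 0.1],
    "pet":   [0.7, 0.3, 0.0],
    "leash": [0.6, 0.1, 0.2],
}


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors (assumed closeness measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


# Hypothetical responses elicited by the cue "dog".
responses = ["bark", "pet", "leash"]
dog_centroid = centroid([TOY_VECTORS[w] for w in responses])

# How close the response centroid lies to the cue's own vector.
similarity_to_cue = cosine(dog_centroid, TOY_VECTORS["dog"])
```

In this toy setting the centroid is the plain mean of the three response vectors; a group- or individual-level representation would differ only in which responses are passed in.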
We evaluate this method with two distributional semantic models, varying the centroid calculation method, the reference lexicon size, and the response type. The datasets cover tasks ranging from single-word substitution, through single and multiple free association, to multiple feature generation.
At the group level, centroid analysis performs best with multiple free associations (about 70 unique and 245 total responses per cue), when fastText is used to map both responses and cue concepts to vectors and each response enters the centroid calculation as often as it occurred in the data. In this setting, the cue concept is the closest neighbour of its response centroid in 50% of cases and falls within the 20 closest neighbours in 85% of cases.
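The frequency-weighted centroid and the neighbour-rank criterion reported above might be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: the toy lexicon, the function names `weighted_centroid`, `cue_rank`, and `hit_at`, the use of cosine similarity, and the choice to rank the cue against every word in the reference lexicon (including the responses themselves) are all assumptions made for the example.

```python
from collections import Counter
from math import sqrt

# Toy reference lexicon with invented 3-dimensional vectors.
LEXICON = {
    "dog":  [0.9, 0.1, 0.0],
    "cat":  [0.1, 0.9, 0.0],
    "car":  [0.0, 0.1, 0.9],
    "bark": [0.8, 0.0, 0.2],
    "pet":  [0.6, 0.6, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def weighted_centroid(responses, lexicon):
    """Mean response vector, counting each response as often as it occurs.

    Responses missing from the lexicon are skipped (an assumption).
    """
    counts = Counter(w for w in responses if w in lexicon)
    total = sum(counts.values())
    dim = len(next(iter(lexicon.values())))
    return [
        sum(lexicon[w][i] * c for w, c in counts.items()) / total
        for i in range(dim)
    ]


def cue_rank(cue, centroid_vec, lexicon):
    """1-based rank of the cue among all lexicon words, nearest first."""
    ranked = sorted(
        lexicon,
        key=lambda w: cosine(centroid_vec, lexicon[w]),
        reverse=True,
    )
    return ranked.index(cue) + 1


def hit_at(k, rank):
    """True if the cue is among the k closest neighbours of the centroid."""
    return rank <= k


# Hypothetical pooled responses to the cue "dog": "bark" was given three times.
group_responses = ["bark", "bark", "bark", "pet"]
group_centroid = weighted_centroid(group_responses, LEXICON)
rank = cue_rank("dog", group_centroid, LEXICON)
```

Aggregating `hit_at(1, rank)` and `hit_at(20, rank)` over all cues would yield percentages of the kind reported above; in this toy lexicon the frequent response "bark" itself lies slightly closer to the centroid than the cue does.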
At the individual level, the best results are obtained using fastText and including at least eight responses per item per participant in the centroid calculation. In this setting, the cue concept is the closest neighbour of its response centroid in 22% of cases and within the 20 closest neighbours in 60% of cases.