Optimal Transport as a Computational Framework for Low-Level Orthography

Wed—HZ_8—Talks8—7703

Presented by: Jack E. Taylor

Jack E. Taylor ¹,²,*, Rasmus Sinn ¹, Cosimo Iaia ¹, Christian J. Fiebach ¹,³

¹ Department for Psychology, Goethe University Frankfurt, Frankfurt am Main, Germany, ² School of Psychology and Neuroscience, University of Glasgow, Glasgow, United Kingdom, ³ Brain Imaging Center, Goethe University Frankfurt, Frankfurt am Main, Germany

In the earliest stages of visual word recognition, readers have to recognise strokes and combine them into characters, radicals, and letters. Human readers are able to tolerate a great deal of variability in the visual features of orthography, including considerable variability in typography and in geometric features such as size and rotation. Despite this, cognitive models of visual word recognition have traditionally either simplified or overlooked this early stage of orthographic processing. We suggest that optimal transport theory may provide a computationally explicit framework for modeling early orthographic processing. In a representational similarity analysis of electroencephalography (EEG) data from a letter recognition task, we show that Wasserstein distances, which capture the global cost of transforming one letter shape into another, align with neural representational dissimilarities, outperforming existing models based on the degree of overlap between pixels. We compare the model’s performance to that of artificial neural networks trained to recognise letters from images, showing that Wasserstein distance performs similarly to features extracted from ResNet-50 and CORnet-Z, despite being more computationally explicit than a neural network and requiring no training. We additionally show that an optimal transport framework can be extended to capture invariance to geometric transformations such as rescaling, rotation, and translation. Finally, we show how the proposed optimal transport approach can account for the rich typography of real-world letters, and relate it to existing approaches to modeling visual letter and word recognition.

Keywords: orthography, vision, representational similarity analysis, reading
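
The contrast the abstract draws between Wasserstein distances and pixel-overlap measures can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy 5x5 "glyphs", the Euclidean ground cost, the use of a Jaccard-style overlap measure, and the choice of `scipy.optimize.linprog` as the solver are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def to_distribution(glyph):
    """Turn an image into ink-pixel coordinates and normalised mass (hypothetical helper)."""
    img = np.asarray(glyph, dtype=float)
    coords = np.argwhere(img > 0).astype(float)  # (row, col) of each ink pixel
    weights = img[img > 0]                       # same row-major order as coords
    return coords, weights / weights.sum()


def wasserstein_distance_2d(glyph_a, glyph_b):
    """Exact 1-Wasserstein distance between two glyph images, solved as a linear program."""
    xa, wa = to_distribution(glyph_a)
    xb, wb = to_distribution(glyph_b)
    n, m = len(wa), len(wb)
    # Ground cost (assumed): Euclidean distance between every pair of ink pixels.
    cost = np.linalg.norm(xa[:, None, :] - xb[None, :, :], axis=-1)
    # The transport plan is flattened row-major, so entry (i, j) sits at i * m + j.
    a_eq = np.zeros((n + m, n * m))
    for i in range(n):
        a_eq[i, i * m:(i + 1) * m] = 1.0  # mass leaving source pixel i
    for j in range(m):
        a_eq[n + j, j::m] = 1.0           # mass arriving at target pixel j
    b_eq = np.concatenate([wa, wb])
    result = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                     bounds=(0, None), method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return float(result.fun)


def overlap_dissimilarity(glyph_a, glyph_b):
    """One minus the Jaccard overlap of ink pixels (an assumed stand-in for overlap models)."""
    a = np.asarray(glyph_a) > 0
    b = np.asarray(glyph_b) > 0
    return 1.0 - np.logical_and(a, b).sum() / np.logical_or(a, b).sum()


def bar(col, size=5):
    """Toy glyph: a vertical stroke in the given column (hypothetical example data)."""
    img = np.zeros((size, size))
    img[:, col] = 1.0
    return img


if __name__ == "__main__":
    for shift in (2, 4):
        print(shift,
              overlap_dissimilarity(bar(1), bar(shift)),
              wasserstein_distance_2d(bar(1), bar(shift)))
```

In this toy case the overlap measure is saturated for both comparisons, because the strokes share no pixels, whereas the Wasserstein distance grows with the displacement of the stroke (about 1 for a one-column shift and about 3 for a three-column shift). This is one way to see how a transport-based distance can remain graded where overlap-based measures cannot.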