11:00 - 12:30
Wed—HZ_8—Talks8—77
Room: HZ_8
Chairs: Jack E. Taylor, Janos Pauli
Distilling the neural code of invariant visual word recognition
Wed—HZ_8—Talks8—7701
Presented by: Aakash Agrawal
Aakash Agrawal *
NeuroSpin, CEA, Gif-sur-Yvette, France
Learning to read significantly rewires the brain, placing new demands on the visual system. For example, to distinguish anagrams like FORM and FROM, we must encode both individual letters and their exact positions, despite changes in size, font, or location. Earlier models proposed "bigram units" that encode letter pairs (e.g., FO, OR in FORM and FR, RO in FROM) to differentiate such words. However, recent studies found no evidence for these units, instead suggesting a model in which each letter is encoded together with its position within the word. To test this competing hypothesis at the circuit level, we trained convolutional neural networks (CNNs) to recognize words. After training, a small group of units became specialized for word recognition, similar to the Visual Word Form Area (VWFA). These units responded to specific letters at specific positions, counted from either the left or the right edge of the word. The shift from retinotopic to position-based coding was achieved by "space bigram" units, which detect letter positions relative to blank spaces. This coding scheme aligns with wirelessly recorded neural activity from the inferior temporal (IT) cortex of macaque monkeys trained on orthographic tasks over five successive days. While retinotopic coding was dominant in these monkeys, we also found neurons encoding ordinal positions. Thus, our CNN-based simulations offer a possible model for how the brain processes written words.
Keywords: Reading, convolutional neural network, orthography
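For readers unfamiliar with this kind of simulation, the following is a minimal sketch of the general approach the abstract describes: train a CNN to name words rendered at varying horizontal positions, then probe hidden units for joint letter-and-position tuning. It assumes PyTorch and Pillow, a toy four-word vocabulary, a tiny invented architecture (WordCNN), and a default bitmap font; none of these choices come from the actual study.

# Sketch only: toy stimuli, toy network, hypothetical names throughout.
import torch
import torch.nn as nn
from PIL import Image, ImageDraw, ImageFont

WORDS = ["FORM", "FROM", "FORT", "FROG"]   # toy vocabulary (includes an anagram pair)
FONT = ImageFont.load_default()            # fixed bitmap font for simplicity

def render(text, x_offset):
    """Draw `text` on a 64x24 canvas at a given horizontal offset (position jitter)."""
    img = Image.new("L", (64, 24), color=0)
    ImageDraw.Draw(img).text((x_offset, 6), text, fill=255, font=FONT)
    return torch.tensor(list(img.getdata()), dtype=torch.float32).view(1, 24, 64) / 255.0

class WordCNN(nn.Module):
    def __init__(self, n_words):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.hidden = nn.Linear(32 * 6 * 16, 64)   # hidden layer we probe after training
        self.out = nn.Linear(64, n_words)

    def forward(self, x):
        h = torch.relu(self.hidden(self.conv(x).flatten(1)))
        return self.out(h), h

model = WordCNN(len(WORDS))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Train with position jitter, so the network must become position-invariant
# at the word level while still resolving letter order (FORM vs. FROM).
for step in range(500):
    xs, ys = [], []
    for label, w in enumerate(WORDS):
        xs.append(render(w, x_offset=int(torch.randint(0, 30, (1,)))))
        ys.append(label)
    logits, _ = model(torch.stack(xs))
    loss = loss_fn(logits, torch.tensor(ys))
    opt.zero_grad(); loss.backward(); opt.step()

# Probe: present single letters at each of four slots (shifted via leading
# spaces) and record hidden-layer activity. Units whose response depends
# jointly on letter identity and slot would be candidates for the
# letter-by-position code the abstract describes.
with torch.no_grad():
    for slot in range(4):
        for letter in "FORM":
            _, h = model(render(" " * slot + letter, x_offset=4).unsqueeze(0))
            print(f"slot {slot} letter {letter}: top unit {h.argmax().item()}")

A fuller analysis in this spirit would also vary word length and measure tuning relative to the word's left and right edges, which is where the "space bigram" account makes its distinctive prediction.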