Unsupervised Learning Captures Scene Category Specific Information During Early and Late Processing While Failing to Capture High-Level Scene Structure.

Wed-P12-Poster III-105

Presented by: Aylin Kallmayer

Aylin Kallmayer, Melissa Vo

Scene Grammar Lab, Department of Psychology, Goethe University Frankfurt, Germany

Real-world scenes are complex and rich in information, yet we understand scenes quickly and seemingly effortlessly. Key to understanding vision is understanding the computations and the structure of representations that support efficient processing. We hypothesize that we exploit scene structures by learning hierarchical object-to-object and scene-to-object relations captured by a scene grammar. Does unsupervised learning automatically lead to representations that reflect properties of scene grammar? To assess how well scenes generated by generative adversarial networks (GANs) capture real-world scene structure perceived over time, we conducted an EEG experiment. Participants viewed 180 generated scenes across six categories (30 exemplars per category) and performed a surprise categorization task. Generated scenes varied in their “realness” as assessed by three different measures from previous experiments: ratings, false-alarm (FA) rates, and categorization performance for 50 ms and 500 ms presentation times. We were able to decode scene category from generated scenes with peak performance around 140 ms and 640 ms, suggesting that generated scenes contain scene-category-specific information used during early as well as late processing. To test whether activation patterns across time could predict our behavioral measures, we ran ridge-regularized regressions for each timepoint. Models predicting ratings and FA rates in the 50 ms condition achieved the highest performance (peak around 330 ms). Surprisingly, categorization performance could not be predicted from the neural signal, irrespective of presentation time. We conclude that while generated scenes contain scene-category-specific information during early and late processing, they fail to capture the high-level scene structure usually exploited for scene categorization.
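The timepoint-wise ridge analysis described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, and the array shapes, the regularization strength `alpha`, the 5-fold cross-validation, the Pearson-correlation score, and all function names (`ridge_fit_predict`, `timepoint_ridge_scores`, `make_synthetic_data`) are assumptions chosen for illustration. It only shows the general idea of fitting one ridge model per timepoint on channel patterns and scoring its cross-validated predictions of a behavioral measure.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    n_features = Xc.shape[1]
    # Solve (X'X + alpha * I) w = X'y on centered data.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features),
                        Xc.T @ (y_train - y_mean))
    return (X_test - x_mean) @ w + y_mean


def timepoint_ridge_scores(eeg, y, alpha=1.0, n_folds=5, seed=0):
    """Fit one ridge model per timepoint.

    eeg: array of shape (n_scenes, n_channels, n_times) (assumed layout)
    y:   array of shape (n_scenes,), e.g. a per-scene behavioral measure
    Returns the Pearson r between cross-validated predictions and y
    for each timepoint.
    """
    n_scenes, _, n_times = eeg.shape
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_scenes), n_folds)
    scores = np.empty(n_times)
    for t in range(n_times):
        X = eeg[:, :, t]  # channel pattern at this timepoint
        y_pred = np.empty(n_scenes)
        for test_idx in folds:
            train_mask = np.ones(n_scenes, dtype=bool)
            train_mask[test_idx] = False
            y_pred[test_idx] = ridge_fit_predict(
                X[train_mask], y[train_mask], X[test_idx], alpha)
        scores[t] = np.corrcoef(y_pred, y)[0, 1]
    return scores


def make_synthetic_data(n_scenes=180, n_channels=16, n_times=20,
                        signal_time=8, seed=0):
    """Synthetic stand-in for EEG data: noise everywhere, plus a
    behavior-related channel pattern injected at one timepoint."""
    rng = np.random.default_rng(seed)
    eeg = rng.standard_normal((n_scenes, n_channels, n_times))
    y = rng.standard_normal(n_scenes)
    pattern = rng.standard_normal(n_channels)
    eeg[:, :, signal_time] += 2.0 * np.outer(y, pattern)
    return eeg, y


if __name__ == "__main__":
    eeg, y = make_synthetic_data()
    scores = timepoint_ridge_scores(eeg, y)
    print("peak timepoint index:", int(np.argmax(scores)))
```

In this toy setup the score should peak at the timepoint where the synthetic signal was injected. With real recordings, the time course of such scores is what would indicate when the neural signal carries information about a given behavioral measure.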

Keywords: scene perception, scene grammar, unsupervised learning, generative adversarial networks, EEG