Masked image modeling as a framework for self-supervised learning across eye movements

Weiler, Robin; Brucklacher, Matthias; Pennartz, Cyriel; Bohte, Sander

doi:10.1007/978-3-031-72341-4_2

R. Weiler (Robin), M. Brucklacher (Matthias), C. Pennartz (Cyriel) and S.M. Bohte (Sander)

2024-09-17

Presented at the 33rd International Conference on Artificial Neural Networks (September 2024), Lugano, Switzerland

To make sense of their surroundings, intelligent systems must transform complex sensory inputs into structured codes that are reduced to task-relevant information such as object category. Biological agents achieve this in a largely autonomous manner, presumably via self-supervised learning. Whereas previous attempts to model the underlying mechanisms were largely discriminative in nature, there is ample evidence that the brain employs a generative model of the world. Here, we propose that eye movements, in combination with the focused nature of primate vision, constitute a generative, self-supervised task of predicting and revealing visual information. We construct a proof-of-principle model starting from the framework of masked image modeling (MIM), a common approach in deep representation learning. To do so, we analyze how core components of MIM, such as the masking technique and data augmentation, influence the formation of category-specific representations. This allows us not only to better understand the principles behind MIM, but also to reassemble a MIM more in line with the focused nature of biological perception. We find that MIM disentangles neurons in latent space without explicit regularization, a property that has been suggested to structure visual representations in primates. Together with previous findings of invariance learning, this highlights an interesting connection between MIM and latent regularization approaches for self-supervised learning. The source code is available at https://github.com/RobinWeiler/FocusMIM.
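
Masked image modeling trains a model to reconstruct image content that has been hidden from its input. The sketch below contrasts two ways of choosing what to hide: random patch masking, as commonly used in MIM, and a hypothetical fixation mask that reveals only a disc around a fixation point, loosely mimicking focused foveal vision. It is a minimal plain-Python sketch under these assumptions, not the method from the paper: the helper names (`random_patch_mask`, `fixation_mask`, `apply_mask`, `masked_mse`) are illustrative and do not come from the FocusMIM repository, no encoder or decoder is included, and the reconstruction loss is assumed to be computed on hidden pixels only.

```python
import random
from typing import List

Grid = List[List[float]]  # grayscale image, indexed [row][col]
Mask = List[List[bool]]   # True = hidden from the model and must be predicted


def random_patch_mask(height: int, width: int, patch: int, ratio: float, seed: int = 0) -> Mask:
    """Hide a random fraction `ratio` of non-overlapping patch x patch blocks."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    rng = random.Random(seed)
    blocks = [(r, c) for r in range(height // patch) for c in range(width // patch)]
    hidden = set(rng.sample(blocks, round(ratio * len(blocks))))
    return [[(y // patch, x // patch) in hidden for x in range(width)] for y in range(height)]


def fixation_mask(height: int, width: int, cy: int, cx: int, radius: int) -> Mask:
    """Hypothetical focus mask: reveal only a disc of `radius` pixels around (cy, cx)."""
    return [
        [(y - cy) ** 2 + (x - cx) ** 2 > radius ** 2 for x in range(width)]
        for y in range(height)
    ]


def apply_mask(image: Grid, mask: Mask, fill: float = 0.0) -> Grid:
    """Return the model's input: hidden pixels replaced by `fill`, the rest unchanged."""
    return [
        [fill if hidden else value for value, hidden in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


def masked_mse(image: Grid, prediction: Grid, mask: Mask) -> float:
    """Mean squared reconstruction error, computed over hidden pixels only."""
    errors = [
        (pred - target) ** 2
        for img_row, pred_row, mask_row in zip(image, prediction, mask)
        for target, pred, hidden in zip(img_row, pred_row, mask_row)
        if hidden
    ]
    return sum(errors) / len(errors) if errors else 0.0


if __name__ == "__main__":
    image = [[float(y * 8 + x) for x in range(8)] for y in range(8)]
    focus = fixation_mask(8, 8, cy=3, cx=3, radius=2)
    patches = random_patch_mask(8, 8, patch=2, ratio=0.75, seed=1)
    for name, mask in [("fixation", focus), ("random patches", patches)]:
        visible = apply_mask(image, mask)
        # A trivial "model" that returns its masked input predicts 0.0 for every hidden pixel.
        print(name, masked_mse(image, visible, mask))
```

A sequence of eye movements could be approximated by calling `fixation_mask` with a different `(cy, cx)` at each step, so that each fixation reveals a new region while the model is asked to predict the rest.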

Additional Metadata
Keywords	Self-supervised learning, Representation learning, Generative model
Persistent URL	doi.org/10.1007/978-3-031-72341-4_2
Series	Lecture Notes in Computer Science
Project	Human Brain Project - SGA3
Conference	33rd International Conference on Artificial Neural Networks
Grant	This work was funded by the European Commission Horizon 2020 Framework Programme; grant id h2020/945539 - Human Brain Project - SGA3 (HBP-SGA3)
Citation	Weiler, R., Brucklacher, M., Pennartz, C., & Bohte, S. (2024). Masked image modeling as a framework for self-supervised learning across eye movements. In Proceedings of Artificial Neural Networks and Machine Learning, ICANN 2024 (pp. 17–31). doi:10.1007/978-3-031-72341-4_2

See Also
techReport Masked image modeling as a framework for self-supervised learning across eye movements R. Weiler (Robin), M. Brucklacher (Matthias), C. Pennartz (Cyriel) and S.M. Bohte (Sander)
software|data Masked Image Modeling as a Framework for Self-Supervised Learning across Eye Movement R. Weiler (Robin)