Abstract
Animals develop and use cognitive maps, internal models of the external environment, to understand the spatial characteristics of their natural environment. Previous studies have shown that a hierarchical structure of recurrent neural networks contributes to the extraction of high-level concepts from sequential sensorimotor experiences. However, those studies did not address the spatial aspects of such experiences, and their models did not acquire cognitive maps. We modified the previous models and trained the proposed model on the visuomotor experiences of an agent in a simulated two-dimensional environment. The model was trained to predict future visual and motion inputs even when only one modality was provided (crossmodal prediction). The trained model correctly predicted visual images even when the agent traversed unknown paths. Comparisons of crossmodal predictions across models trained under different conditions revealed that crossmodal predictions involving motion led to self-organization of the cognitive map. Further experiments on mental simulation showed that two-way crossmodal prediction (from vision alone and from motion alone) was required for the consistent generation of vision and motion. These results indicated that predictive learning integrating vision and motion is necessary for the self-organization of spatial recognition based on a cognitive map.
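As a rough illustration of the crossmodal predictive-learning setup described above, the following minimal sketch (in PyTorch) trains a recurrent network to predict the next visual and motion inputs while one modality is randomly masked, so the missing modality must be inferred from the other. The class name, layer choices, dimensions, and masking scheme are illustrative assumptions, not the architecture used in the article.

```python
# Minimal sketch of crossmodal predictive learning: a recurrent model receives
# visual and motion inputs, predicts both at the next time step, and one input
# modality is randomly masked so its prediction must rely on the other.
# All names, dimensions, and the masking scheme are illustrative assumptions.
import torch
import torch.nn as nn

class CrossmodalPredictor(nn.Module):
    def __init__(self, vision_dim=64, motion_dim=2, hidden_dim=128):
        super().__init__()
        self.rnn = nn.LSTM(vision_dim + motion_dim, hidden_dim, batch_first=True)
        self.to_vision = nn.Linear(hidden_dim, vision_dim)  # predicts next visual input
        self.to_motion = nn.Linear(hidden_dim, motion_dim)  # predicts next motion input

    def forward(self, vision, motion, mask_vision=False, mask_motion=False):
        # Zero out one modality to force crossmodal prediction from the other.
        if mask_vision:
            vision = torch.zeros_like(vision)
        if mask_motion:
            motion = torch.zeros_like(motion)
        h, _ = self.rnn(torch.cat([vision, motion], dim=-1))
        return self.to_vision(h), self.to_motion(h)

# One training step: predict inputs at t+1 from inputs up to t, with random masking.
model = CrossmodalPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
vision_seq = torch.randn(8, 20, 64)  # (batch, time, visual features)
motion_seq = torch.randn(8, 20, 2)   # (batch, time, motion commands)

pred_v, pred_m = model(vision_seq[:, :-1], motion_seq[:, :-1],
                       mask_vision=bool(torch.rand(1) < 0.5))
loss = (nn.functional.mse_loss(pred_v, vision_seq[:, 1:]) +
        nn.functional.mse_loss(pred_m, motion_seq[:, 1:]))
loss.backward()
optimizer.step()
```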
