Computer Vision: Recognizing objects faster and more accurately with CNNs

Despite constant body, head, and eye movements, our visual perception of the objects around us remains stable, even though the physical information hitting our retinas is constantly changing. Scientists from RIKEN in Japan have studied the unnoticed eye movements we constantly make and shown that they help us recognize objects in a stable way. These results can be applied to computer vision and are particularly relevant to autonomous driving systems. The study, entitled “Motor-related cues support localization invariance for stable visual perception,” was published in the journal PLOS Computational Biology.

RIKEN, the largest comprehensive research institution in Japan, is recognized worldwide for high-quality research across many scientific disciplines. At the RIKEN Center for Brain Science, Andrea Benucci heads the Laboratory of Neural Circuits and Behavior and is the author of the article.

He explains:

“Our lab investigates the neural basis of sensory processing with a particular focus on vision. We are particularly interested in understanding the computational rules used by populations of neurons in the visual cortex to process visual information: how does the coordinated activity of groups of neurons ‘talking’ to each other via action potentials generate a visual perception? What are the relevant spatial and temporal scales used to process visual information? To answer these questions, we use the primary visual cortex of mice trained in behavioral tasks as a model system. The experimental tools we use are based on state-of-the-art methods in optogenetics, optical imaging, and electrode recording.”


Our ability to perceive a stable visual world with continuous body, head, and eye movements has long fascinated researchers in the field of neuroscience. The various studies of perceptual stability have highlighted a variety of computational and physiological phenomena acting on multiple spatio-temporal scales and regions of the brain. Neural copies of movement commands sent through the brain each time we move could allow the brain to report our own movements and keep our cognition stable.

Beyond keeping perception stable, eye movements and their motor copies could also help us recognize objects in the world in a stable way, but how this happens remains a mystery.

The Convolutional Neural Network

Andrea Benucci and his team designed a CNN whose architecture was inspired by the hierarchical signal processing of the mammalian visual system, optimized to classify objects in a visual scene during movement.

The CNN was first trained to classify 60,000 black-and-white images into 10 categories, which it did successfully. However, when tested with shifted images that mimic the naturally changing visual input during eye movements, its performance dropped to chance level. The researchers solved this problem by retraining the network with shifted images, this time also providing the direction and magnitude of the eye movement that caused each shift. Adding eye movements and their motor copies to the network model also allowed the system to better cope with visual noise in the images.
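The augmentation described above can be sketched in a few lines. This is a minimal illustration, not the authors' actual pipeline: it assumes 2D grayscale images and represents the "motor copy" simply as the normalized (dx, dy) shift vector, which a network would receive as an extra input alongside the shifted image. The function names (`shift_image`, `make_training_example`) and the padding choice (zeros) are hypothetical.

```python
import numpy as np

def shift_image(image, dx, dy):
    """Shift a 2D image by (dx, dy) pixels, zero-padding the exposed border."""
    shifted = np.zeros_like(image)
    h, w = image.shape
    # Source/destination slices that remain inside the frame after the shift.
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    return shifted

def make_training_example(image, max_shift=5, rng=None):
    """Simulate one eye movement: return (shifted image, motor copy).

    The motor copy is the (dx, dy) shift itself, normalized to [-1, 1];
    during training it is fed to the network together with the image.
    """
    rng = rng or np.random.default_rng()
    dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
    motor_copy = np.array([dx, dy], dtype=float) / max_shift
    return shift_image(image, int(dx), int(dy)), motor_copy
```

In this framing, the network never has to infer the shift from pixels alone; the motor copy tells it how the input moved, which is what restores classification performance on shifted inputs.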

Andrea Benucci says:

“This breakthrough will help avoid dangerous errors in computer vision. With more efficient and robust computer vision, it becomes harder for pixel manipulations, known as adversarial attacks, to make a self-driving car label a stop sign as a street lamp, or a military drone misclassify a hospital building as an enemy target.”

According to Andrea Benucci, these results could be extrapolated to real-world computer vision. He explains:

“The advantage of mimicking eye movements and their efferent copies is to ‘force’ an image-processing sensor to have controlled motion patterns, while informing the network that processes the associated images of these self-generated movements. This would make machine vision more robust and more similar to human vision.”
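Read in engineering terms, the proposal is that the sensor does not capture one static frame but a short sequence of deliberately shifted frames, each tagged with the commanded shift (the efference copy). The sketch below simulates such a "fixational scan" as a bounded random walk; the function name `fixational_scan` and the clipping bound are assumptions for illustration, not part of the published method.

```python
import numpy as np

def fixational_scan(n_steps=8, step_size=2, bound=6, rng=None):
    """Simulate a bounded random walk of small commanded 'eye movements'.

    Returns:
      executed  -- (n_steps, 2) shifts actually applied per frame
                   (the efference copies fed to the network)
      positions -- (n_steps, 2) cumulative gaze offsets, clipped to +/- bound
    """
    rng = rng or np.random.default_rng()
    commands = rng.integers(-step_size, step_size + 1, size=(n_steps, 2))
    positions = np.clip(np.cumsum(commands, axis=0), -bound, bound)
    # Re-derive the commands actually executed after clipping, so each
    # motor copy exactly matches the shift applied to the sensor image.
    executed = np.diff(np.vstack([[0, 0], positions]), axis=0)
    return executed, positions
```

Pairing each frame with its executed command keeps the motor signal consistent with the pixels, which is the property the quoted proposal relies on.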

This research continues in collaboration with Andrea Benucci's colleagues working on neuromorphic technologies. The idea is to implement real silicon-based circuits based on the principles highlighted in this study and to test whether they improve machine vision capabilities in real applications.

Source:

Benucci A (2022) Motor-related cues support localization invariance for stable visual perception. PLoS Comput Biol. doi:10.1371/journal.pcbi.1009928

