Self-Supervised Learning of Color Constancy

Markus R. Ernst, Francisco M. López, Arthur Aubret, Roland W. Fleming, Jochen Triesch
2024-04-12
Abstract: Color constancy (CC) describes the ability of the visual system to perceive an object as having a relatively constant color despite changes in lighting conditions. While CC and its limitations have been carefully characterized in humans, it is still unclear how the visual system acquires this ability during development. Here, we present a first study showing that CC develops in a neural network trained in a self-supervised manner through an invariance learning objective. During learning, objects are presented under changing illuminations, while the network aims to map subsequent views of the same object onto close-by latent representations. This gives rise to representations that are largely invariant to the illumination conditions, offering a plausible example of how CC could emerge during human cognitive development via a form of self-supervised learning.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper asks how color constancy (CC) could be acquired during human development, and whether a self-supervised learning mechanism can account for it. The researchers hypothesize that CC can be learned by observing the same objects under changing lighting conditions. To test this hypothesis, they construct a new dataset, C3R, which shows differently colored cubes under rapidly changing illumination. They then train a network with a self-supervised method based on a temporal contrastive loss (SimCLR-TT), which pulls the representations of subsequent views of the same object close together and thereby encourages illumination-invariant color representations.

The study reports that this method learns color constancy and outperforms traditional supervised learning methods that rely solely on color augmentation. Representations from higher levels of the network hierarchy are substantially more color-constant than lower-level representations and raw pixel values, which supports the effectiveness of the self-supervised mechanism. The results also point to the ground plane as important contextual information for inferring the true color of an object. The model has limitations, most notably that it greatly simplifies the visual environment an infant actually experiences, so the study is best read as a preliminary proof of concept for how color constancy could develop.
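The time-based contrastive objective described above can be illustrated with a small sketch. This is not the authors' implementation: it is a plain-Python version of the standard SimCLR-style NT-Xent loss, under the assumption that the positive pair for each embedding is the same object seen one time step later (for example, under a new illumination). The function names, the temperature value, and the toy embeddings are all hypothetical and chosen only for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def temporal_nt_xent(z_t, z_next, temperature=0.5):
    """NT-Xent loss with temporally adjacent views as positives.

    z_t[i] and z_next[i] are embeddings of the same object at consecutive
    time steps (assumed setup); every other embedding in the batch acts as
    a negative. The temperature of 0.5 is an illustrative default.
    """
    n = len(z_t)
    z = [normalize(v) for v in list(z_t) + list(z_next)]  # 2n unit vectors
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the temporally adjacent view
        # Similarities to all other embeddings (positive and negatives).
        logits = [dot(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        log_denominator = math.log(sum(math.exp(l) for l in logits))
        total += log_denominator - dot(z[i], z[pos]) / temperature
    return total / (2 * n)


if __name__ == "__main__":
    # Toy embeddings: two objects, each seen at time t and t+1.
    views_t = [[1.0, 0.0], [0.0, 1.0]]
    views_next = [[0.9, 0.1], [0.1, 0.9]]
    print(temporal_nt_xent(views_t, views_next))
```

The loss is lower when consecutive views of the same object land close together in the latent space and views of different objects stay apart, which is the pressure toward illumination-invariant representations that the summary describes.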