Redefining Learning in Visual Comparison with Spatio-relational Context-aware Representations

Kingsley Nketia Acheampong, Wenhong Tian, Addis Abebe Assifa
DOI: https://doi.org/10.1145/3361758.3361783
2019-01-01
Abstract: Visual comparison, the task of differentiating between near-identical patterns, images or scenes and expressing the differences in the form of descriptors, is deemed cognitively challenging for humans and computational models alike. The difficulty increases when the differences between the patterns, images or scenes in question are subtle. Detecting subtle differences in similar patterns, images or scenes ultimately requires precise visual attention capabilities and equally capable natural language interpretation models to describe the differences found. Unlike previous works that use cluster-based detection approaches, we propose a deep convolutional neural network that leverages latent spatial representations to capture and localize differences, while actively aligning the differences to their respective annotations for the task of visual comparison. We introduce an image reconditioning method to enhance images prior to detecting target differences, and we complement our detection methods with a descriptor generation module that describes coarse to fine-grained differences. Experimental results validate the performance of our proposed model on visual comparison tasks, where it outperforms other models by an appreciable margin.