Abstract: Objects in the real world rarely occur in isolation; they exhibit typical arrangements governed by their independent utility and by their expected interactions with humans and other objects in the scene. For example, a chair is expected near a table, and a computer is expected on top of it. Humans use this spatial context and relative placement as an important cue for visual recognition when the evidence is ambiguous. Like humans, deep neural networks (DNNs) exploit contextual information in data to learn representations. Our research focuses on harnessing the contextual aspects of visual data to optimize data annotation and enhance the training of deep networks. Our contributions can be summarized as follows: (1) we introduce the notion of contextual diversity for active learning (CDAL) and show its applicability to three visual tasks: semantic segmentation, object detection, and image classification; (2) we propose a data repair algorithm that curates contextually fair data to reduce model bias, enabling the model to detect objects outside their obvious context; (3) we propose class-based annotation, in which contextually relevant classes that are complementary for model training under domain shift are selected for annotation. Recognizing the importance of well-curated data, we also emphasize the need to involve humans in the loop to achieve accurate annotations, and to develop novel interaction strategies that allow humans to serve as fact-checkers. In line with this, we are developing an image retrieval system for wildlife camera-trap images and a reliable warning system for poor-quality rural roads. For large-scale annotation, we employ a strategic combination of human expertise and zero-shot models, while also integrating human input at various stages for continuous feedback.

Redefining Learning in Visual Comparison with Spatio-relational Context-aware Representations

Learn to Differ: Sim2Real Small Defection Segmentation Network

Spatially-Aware Context Neural Networks

Learning an Adaptation Function to Assess Image Visual Similarities

Incorporating simulated spatial context information improves the effectiveness of contrastive learning models

Revisiting Sparse Convolutional Model for Visual Recognition

On Network Design Spaces for Visual Recognition

Learning-based Relational Object Matching Across Views

Connectivity-Inspired Network for Context-Aware Recognition

With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations

Visual Descriptor Learning from Monocular Video

Visual Reasoning in Object-Centric Deep Neural Networks: A Comparative Cognition Approach

Evaluating the progress of deep learning for visual relational concepts

P4Contrast: Contrastive Learning with Pairs of Point-Pixel Pairs for RGB-D Scene Understanding

Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models

Collaborative Image Relevance Learning for Visual Re-Ranking

Semantic-Aware Fine-Grained Correspondence

Visualizing Deep Similarity Networks

Deep Unsupervised Learning of Visual Similarities

Addressing Discrepancies in Semantic and Visual Alignment in Neural Networks

Spatial Relationship Representation for Visual Object Searching