Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models

Sharat Agarwal
2024-11-20
Abstract: Objects in the real world rarely occur in isolation; they exhibit typical arrangements governed by their independent utility and by their expected interaction with humans and other objects in the context. For example, a chair is expected near a table, and a computer is expected on top of it. Humans use this spatial context and relative placement as an important cue for visual recognition when a scene is ambiguous. Like humans, DNNs exploit contextual information from data to learn representations. Our research focuses on harnessing the contextual aspects of visual data to optimize data annotation and enhance the training of deep networks. Our contributions can be summarized as follows: (1) we introduce the notion of contextual diversity for active learning (CDAL) and show its applicability in three different visual tasks: semantic segmentation, object detection, and image classification; (2) we propose a data repair algorithm that curates contextually fair data to reduce model bias, enabling the model to detect objects outside their obvious context; (3) we propose class-based annotation, in which we select contextually relevant classes that are complementary for model training under domain shift. Recognizing the importance of well-curated data, we also emphasize the necessity of involving humans in the loop to achieve accurate annotations, and of developing novel interaction strategies that allow humans to serve as fact-checkers. In line with this, we are developing an image retrieval system for wildlife camera trap images and a reliable warning system for poor-quality rural roads. For large-scale annotation, we employ a strategic combination of human expertise and zero-shot models, while also integrating human input at various stages for continuous feedback.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper aims to make the training of deep models more efficient by exploiting the contextual uncertainty in visual data. Specifically, it focuses on the following aspects:

1. **Introducing Contextual Diversity**: The paper proposes the concept of "Contextual Diversity (CD)" and demonstrates its application in three different visual tasks: semantic segmentation, object detection, and image classification. The concept is used to select training data with diverse contexts in order to improve the generalization ability of the model.
2. **Data Repair Algorithm**: To reduce model bias, the paper proposes a data repair algorithm that curates contextually fair datasets. This enables the model to better detect objects that appear outside their typical context.
3. **Class-based Annotation Strategy**: The paper proposes a class-based annotation method that selects contextually relevant, complementary classes for labeling in order to address domain shift. This helps the model generalize better in new domains.
4. **Human-Machine Collaborative Labeling**: The paper emphasizes the importance of human feedback in the data-labeling process, especially in large-scale labeling tasks. Combining the knowledge of human experts with zero-shot models allows the labeling task to be completed more efficiently.

Overall, the paper aims to improve the performance and robustness of deep-learning models in visual tasks by optimizing the quality and diversity of the data and by introducing human-machine collaborative methods.
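To make the idea of selecting contextually diverse samples more concrete, the following is a minimal sketch of diversity-driven active-learning selection. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not specify how contextual diversity is computed, so the sketch assumes each image is summarized by a single class-probability vector, measures the distance between two images with a symmetric KL divergence, and picks samples with a greedy farthest-first (k-center style) rule. The names `symmetric_kl`, `select_diverse`, and the toy pool are all hypothetical.

```python
import math


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two discrete distributions.

    Assumption: p and q are equal-length probability vectors. The small
    eps only guards against log(0); it is not part of any published method.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        pi, qi = pi + eps, qi + eps
        total += pi * math.log(pi / qi) + qi * math.log(qi / pi)
    return total


def select_diverse(pool, labeled, budget):
    """Greedy farthest-first selection over an unlabeled pool.

    pool:    dict mapping a sample name to its (hypothetical) class distribution
    labeled: list of distributions for samples that are already annotated
    budget:  number of samples to pick for annotation
    """
    candidates = dict(pool)   # copy so the caller's pool is not mutated
    refs = list(labeled)      # everything selected or labeled so far
    chosen = []

    def distance_to_refs(name):
        # Distance to the closest reference; infinite if nothing is labeled yet.
        if not refs:
            return float("inf")
        return min(symmetric_kl(candidates[name], r) for r in refs)

    for _ in range(min(budget, len(candidates))):
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(candidates), key=distance_to_refs)
        chosen.append(best)
        refs.append(candidates.pop(best))
    return chosen


# Toy pool: "a" and "b" resemble the labeled sample, "c" and "d" do not.
pool = {
    "a": [0.90, 0.05, 0.05],
    "b": [0.88, 0.07, 0.05],
    "c": [0.10, 0.80, 0.10],
    "d": [0.05, 0.05, 0.90],
}
labeled = [[0.90, 0.05, 0.05]]
print(select_diverse(pool, labeled, budget=2))  # prints ['d', 'c']
```

In this toy run the two near-duplicates of the labeled sample are skipped in favor of the two samples whose class distributions differ most from what has already been annotated. This is the intuition behind spending an annotation budget on diverse contexts rather than on redundant ones.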