Abstract: Visual active search (VAS) has been proposed as a modeling framework in which visual cues are used to guide exploration, with the goal of identifying regions of interest in a large geospatial area. Its potential applications include identifying hot spots of rare wildlife poaching activity, search-and-rescue scenarios, identifying illegal trafficking of weapons, drugs, or people, and many others. State-of-the-art approaches to VAS include applications of deep reinforcement learning (DRL), which yield end-to-end search policies, and traditional active search, which combines predictions with custom algorithmic approaches. While the DRL framework has been shown to greatly outperform traditional active search in such domains, its end-to-end nature does not make full use of supervised information obtained either during training or during actual search. This is a significant limitation when search tasks differ substantially from those in the training distribution. We propose an approach that combines the strengths of both DRL and conventional active search by decomposing the search policy into a prediction module, which produces a geospatial distribution of regions of interest based on task embedding and search history, and a search module, which takes the predictions and search history as input and outputs the search distribution. We develop a novel meta-learning approach for jointly learning the resulting combined policy that can make effective use of supervised information obtained both at training and decision time. Our extensive experiments demonstrate that the proposed representation and meta-learning frameworks significantly outperform the state of the art in visual active search on several problem domains.

What problem does this paper attempt to address?

The paper addresses how to make search policies for Visual Active Search (VAS) adapt to new tasks, especially when the task at hand differs significantly from the training distribution. Existing Deep Reinforcement Learning (DRL) methods produce end-to-end policies, so they cannot make full use of the supervised information (labels) obtained during training or during the search itself. As a result, their search performance degrades on tasks that differ from the training data.

To overcome this, the authors propose a Partially Supervised Reinforcement Learning Framework for Visual Active Search (PSVAS), which combines the advantages of traditional active search and deep reinforcement learning. PSVAS decomposes the search policy into two modules:

1. **Task-Specific Prediction Module**: Predicts the locations of target objects from the aerial images of the task and the labels obtained during the search.
2. **Task-Agnostic Search Module**: Generates the search distribution from the predictions of the prediction module and the search history so far.

This decomposition allows the parameters of the prediction module to be updated at decision time using the labels observed during the search, without changing the search module. The authors also propose a meta-learning variant, MPS-VAS, which jointly learns the initialization parameters of the prediction module and a search policy that can adapt to changes in the predictions as the search proceeds.

According to the paper's experiments, PSVAS and MPS-VAS significantly outperform existing baselines across several problem domains, with the largest gains in adaptability and search performance on tasks that differ significantly from the training data.
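The two-module decomposition described above can be illustrated with a toy sketch. This is not the authors' implementation: a logistic-regression predictor stands in for the prediction network, and a fixed masked softmax stands in for the search policy that the paper learns with reinforcement learning. All names here (`predict`, `update_predictor`, `search_distribution`, `run_search`) and all parameter values are hypothetical and chosen only for illustration. The point it shows is the control flow: the predictor is refit on labels gathered during the search, while the search rule itself stays unchanged.

```python
import numpy as np

def predict(w, feats):
    """Prediction module (toy): per-cell probability of containing a target."""
    return 1.0 / (1.0 + np.exp(-feats @ w))

def update_predictor(w, feats, labels, lr=0.5):
    """One gradient step on the logistic loss over the cells queried so far."""
    grad = feats.T @ (predict(w, feats) - labels) / len(labels)
    return w - lr * grad

def search_distribution(probs, queried, temperature=0.1):
    """Search module (toy): masked softmax over cells not yet queried.

    Assumption: the paper learns this module with RL; a fixed softmax
    is used here only to keep the sketch self-contained.
    """
    logits = np.where(queried, -np.inf, probs / temperature)
    logits = logits - logits[~queried].max()  # stabilize the exponentials
    weights = np.exp(logits)                  # queried cells get weight 0
    return weights / weights.sum()

def run_search(feats, truth, w0, budget, rng):
    """Query `budget` distinct cells, updating only the predictor between queries."""
    w = w0.copy()
    queried = np.zeros(len(truth), dtype=bool)
    found = 0
    for _ in range(budget):
        dist = search_distribution(predict(w, feats), queried)
        cell = rng.choice(len(truth), p=dist)
        queried[cell] = True
        found += int(truth[cell])
        # Decision-time supervision: refit the predictor on observed labels.
        w = update_predictor(w, feats[queried], truth[queried].astype(float))
    return found, queried, w

# Usage on synthetic data: 50 grid cells with 4 features each.
rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 4))
truth = feats @ np.array([1.0, -1.0, 0.5, 0.0]) > 0.5
found, queried, w = run_search(feats, truth, np.zeros(4), budget=10, rng=rng)
print(f"targets found: {found} of {int(truth.sum())} using {int(queried.sum())} queries")
```

The sketch assumes the budget does not exceed the number of cells. Because queried cells receive zero probability, each query goes to a new cell, so the number of queried cells always equals the budget.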

A Partially Supervised Reinforcement Learning Framework for Visual Active Search

A Visual Active Search Framework for Geospatial Exploration

MVSSC: Meta-reinforcement Learning Based Visual Indoor Navigation Using Multi-View Semantic Spatial Context

Learning Efficient Multi-Agent Cooperative Visual Exploration

A Hierarchical SLAM Framework Based on Deep Reinforcement Learning for Active Exploration

Space Noncooperative Object Active Tracking with Deep Reinforcement Learning.

Active Visual Localization in Partially Calibrated Environments.

Multiple Self-Supervised Auxiliary Tasks for Target-Driven Visual Navigation Using Deep Reinforcement Learning.

Reinforcement Learning Meets Visual Odometry

SAVE: Spatial-Attention Visual Exploration.

Autonomous Multi-View Navigation Via Deep Reinforcement Learning

Visual Sensor Network Reconfiguration with Deep Reinforcement Learning

Deep Reinforcement Learning for Autonomous Ground Vehicle Exploration Without A-Priori Maps

Unsupervised Active Visual Search with Monte Carlo Planning under Uncertain Detections

Self-Supervised Reinforcement Learning for Active Object Detection

Active Object Perceiver: Recognition-Guided Policy Learning for Object Searching on Mobile Robots

Self-supervised Visual Reinforcement Learning with Object-centric Representations

ViSaRL: Visual Reinforcement Learning Guided by Human Saliency

VizNav: A Modular Off-Policy Deep Reinforcement Learning Framework for Vision-Based Autonomous UAV Navigation in 3D Dynamic Environments

Deep Reinforcement Learning in Computer Vision: A Comprehensive Survey