Abstract: Tracking a target in a complex environment is a challenging problem for robots because of appearance changes of the target and background, large variations in motion, partial and full occlusion, camera motion, and other factors. Humans, however, cope with these difficulties by using their cognitive capabilities, chiefly visual attention and learning mechanisms. This paper therefore presents a single-object tracking method for robots based on the object-based attention mechanism. The method consists of four modules: pre-attentive segmentation, top-down attentional selection, post-attentive processing, and online learning of the target model. The pre-attentive segmentation module first divides the scene into uniform proto-objects. The top-down attention module then selects one proto-object within the predicted region by using a discriminative feature of the target. The post-attentive processing module validates the attended proto-object. If it is confirmed to be the target, it is used to obtain the complete target region; otherwise, a recovery mechanism is automatically triggered to search for the target globally. Given the complete target region, the online learning algorithm autonomously updates the target model, which consists of appearance and saliency components. The saliency component is used to automatically select a discriminative feature for top-down attention, while the appearance component is used for bias estimation in the top-down attention module and for validation in the post-attentive processing module. Experiments show that the proposed method outperforms algorithms that do not use attention when tracking a single target in cluttered and dynamically changing environments.
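The control flow of the four modules can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no formulas, so every name here (`ProtoObject`, `TargetModel`, `track_step`, `radius`, `threshold`, `rate`) is hypothetical, proto-objects are assumed to arrive already segmented as feature vectors, Euclidean distance stands in for the unspecified appearance comparison, and the saliency update is a simple target-versus-background contrast chosen only for illustration. The step that expands the attended proto-object into the complete target region is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class ProtoObject:
    """A pre-attentively segmented region (segmentation itself is assumed done)."""
    centroid: Tuple[float, float]
    features: List[float]


@dataclass
class TargetModel:
    """Hypothetical target model with appearance and saliency components."""
    appearance: List[float]  # expected feature values of the target
    saliency: List[float]    # per-feature discriminability weights


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def select(protos, model, predicted, radius) -> Optional[ProtoObject]:
    """Top-down selection: use the most salient feature inside the predicted region."""
    k = max(range(len(model.saliency)), key=model.saliency.__getitem__)
    nearby = [p for p in protos if _dist(p.centroid, predicted) <= radius]
    if not nearby:
        return None
    return min(nearby, key=lambda p: abs(p.features[k] - model.appearance[k]))


def validate(proto, model, threshold) -> bool:
    """Post-attentive validation against the full appearance component."""
    return _dist(proto.features, model.appearance) <= threshold


def recover(protos, model, threshold) -> Optional[ProtoObject]:
    """Recovery: global search over every proto-object in the scene."""
    if not protos:
        return None
    best = min(protos, key=lambda p: _dist(p.features, model.appearance))
    return best if validate(best, model, threshold) else None


def update(model, target, background, rate=0.1) -> None:
    """Online learning: moving-average appearance, contrast-based saliency (assumed)."""
    model.appearance = [(1 - rate) * a + rate * f
                        for a, f in zip(model.appearance, target.features)]
    if background:
        n = len(background)
        bg_mean = [sum(p.features[i] for p in background) / n
                   for i in range(len(target.features))]
        model.saliency = [abs(a - b) for a, b in zip(model.appearance, bg_mean)]


def track_step(protos, model, predicted, radius=5.0, threshold=0.5):
    """One frame: select, validate, recover if needed, then update the model."""
    candidate = select(protos, model, predicted, radius)
    if candidate is None or not validate(candidate, model, threshold):
        candidate = recover(protos, model, threshold)
    if candidate is not None:
        update(model, candidate, [p for p in protos if p is not candidate])
    return candidate
```

Calling `track_step` once per frame with the current proto-objects and a predicted position returns the validated proto-object, or `None` when both local selection and global recovery fail; in that case the model is left unchanged.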

What Do I See? Modeling Human Visual Perception for Multi-person Tracking

Beyond Traditional Driving Scenes: A Robotic-Centric Paradigm for 2D+3D Human Tracking Using Siamese Transformer Network

Multi-modal 3D Human Tracking for Robots in Complex Environment with Siamese Point-Video Transformer

How Does Human Interest Modeling Help in Computer Vision: Tracking-by-saliency in Unconstrained Social Videos.

Real-time visual tracking based on an appearance model and a motion mode

Target tracking for moving robots using object-based visual attention

Predicting Social Interactions for Visual Tracking.

Modelling Human Visual Motion Processing with Trainable Motion Energy Sensing and a Self-attention Network

Multi-object Model-Free Tracking with Joint Appearance and Motion Inference

Modeling Local Behavior for Predicting Social Interactions Towards Human Tracking

A Survey of Visual Attention Based Methods for Object Tracking

A Multi-Hypothesis Tracker with Enhanced Appearance Model for Generic Crowded Scene.

A Single-Object Tracking Method for Robots Using Object-Based Visual Attention.

Visual Tracking with a Cognitive Observation Model

Multihuman Tracking Based on a Spatial–Temporal Appearance Match

Detection driven adaptive multi-cue integration for multiple human tracking

Multi-person Vision-Based Head Detector for Markerless Human Motion Capture

Multi-invariance appearance model for object tracking

Multi-User and Multi-View Human Eyes' Detection and Tracking

Describe and Attend to Track: Learning Natural Language guided Structural Representation and Visual Attention for Object Tracking

Temporal Dynamic Appearance Modeling for Online Multi-Person Tracking