Abstract: Recently, several trackers utilising the Transformer architecture have shown significant performance improvements. However, the high computational cost of multi-head attention, a core component of the Transformer, limits real-time running speed, which is crucial for tracking tasks. Additionally, the global mechanism of multi-head attention makes it susceptible to distractors that carry semantic information similar to the target. To address these issues, the authors propose a novel adaptive attention that enhances features through a spatially sparse attention mechanism with less than 1/4 of the computational complexity of multi-head attention. The adaptive attention sets a perception range around each element in the feature map, based on the target scale in the previous tracking result, and adaptively searches for the information of interest. This allows the module to focus on the target region rather than on background distractors. Based on adaptive attention, the authors build an efficient transformer tracking framework. It performs deep interaction between search and template features to activate target information, and aggregates multi-level interaction features to enhance representation ability. Evaluation results on seven benchmarks show that the authors' tracker achieves outstanding performance at a speed of 43 fps, with significant advantages in challenging circumstances.
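
To make the idea of a per-element perception range concrete, the following is a minimal NumPy sketch of spatially sparse attention, in which each feature-map element attends only to neighbours inside a square window rather than to every position. It is an illustration under stated assumptions, not the authors' module: the fixed square window, the single head, the absence of learned projections, and the helper `radius_from_target` (including its `stride` and `scale` parameters) are all hypothetical, and the adaptive search for information of interest described in the abstract is not modelled.

```python
import numpy as np


def local_window_attention(feat, radius):
    """Single-head attention restricted to a square window around each element.

    feat   : (H, W, C) feature map, used as queries, keys and values
             (assumption: no learned projections, for brevity).
    radius : window half-size in feature-map cells (the "perception range").
    """
    feat = np.asarray(feat, dtype=float)
    H, W, C = feat.shape
    out = np.zeros_like(feat)
    for i in range(H):
        for j in range(W):
            # Clip the window to the feature-map borders.
            i0, i1 = max(0, i - radius), min(H, i + radius + 1)
            j0, j1 = max(0, j - radius), min(W, j + radius + 1)
            keys = feat[i0:i1, j0:j1].reshape(-1, C)  # (K, C) local keys/values
            scores = keys @ feat[i, j] / np.sqrt(C)   # scaled dot-product scores
            scores = scores - scores.max()            # numerical stability
            weights = np.exp(scores)
            weights /= weights.sum()                  # softmax over the window
            out[i, j] = weights @ keys                # weighted sum of local values
    return out


def radius_from_target(target_w, target_h, stride, scale=0.5):
    """Hypothetical heuristic: derive the window half-size from the target
    size (in pixels) of the previous tracking result and the backbone stride."""
    return max(1, int(round(scale * max(target_w, target_h) / stride)))
```

With a window that covers the whole map, this sketch reduces to ordinary global attention; shrinking the window lowers the number of query-key pairs each element evaluates, which is the general source of the savings in sparse attention. The specific complexity figure quoted in the abstract depends on the authors' design and is not reproduced here.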

Efficient Feature Interactions Learning with Gated Attention Transformer

A Novel Interest Evolution Network Based on Transformer and a Gated Residual for CTR Prediction in Display Advertising

Self-gated FM: Revisiting the Weight of Feature Interactions for CTR Prediction

Neighbour Interaction based Click-Through Rate Prediction via Graph-masked Transformer

Interpretable Click-Through Rate Prediction through Hierarchical Attention

Gated recurrent neural networks discover attention

Highway Transformer: Self-Gating Enhanced Self-Attentive Networks

Towards Deeper, Lighter and Interpretable Cross Network for CTR Prediction

Efficient transformer tracking with adaptive attention

CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction

GRAformer: A Gated Residual Attention Transformer for Multivariate Time Series Forecasting

Predictive Attention Transformer: Improving Transformer with Attention Map Prediction

RAT: Retrieval-Augmented Transformer for Click-Through Rate Prediction

CAT: Cross Attention in Vision Transformer

CGTS: A Transformer framework for time series prediction based on feature extraction

FLatten Transformer: Vision Transformer using Focused Linear Attention

Attention Enhanced Transformer for Multi-agent Trajectory Prediction

CAT-DTI: cross-attention and Transformer network with domain adaptation for drug-target interaction prediction

A Transformer Architecture with Adaptive Attention for Fine-Grained Visual Classification

FAM: Improving columnar vision transformer with feature attention mechanism

Self-Attention Attribution: Interpreting Information Interactions Inside Transformer