Abstract: Deep learning has shown clear advantages in change detection (CD) tasks, notably through the Transformer architecture, whose self-attention mechanism captures long-range dependencies and outperforms traditional models. This capability gives Transformers a significant advantage in capturing global features of complex object changes in high-resolution remote sensing images. Although Transformers are mature in Natural Language Processing (NLP), their application in computer vision, and in CD tasks in particular, is still nascent. Current research on leveraging Transformers for CD reveals limitations, especially under varied lighting and seasonal changes. To address this, we propose VisionTwinNet, a two-stage strategy. First, Gated EnhanceClearNet, a specially designed deep network, reduces image noise and enhances brightness while preserving shadows and correcting color distortions. Its gating mechanism adaptively adjusts the importance of features, giving the network superior performance across various remote sensing image degradation issues. Second, we developed Hybrid Light-Robust CDNet, a lightweight, robust hybrid network custom-designed for CD in remote sensing images. This module integrates the strengths of CNNs and Transformers and introduces an attention mechanism that optimizes the key and value dimensions separately, instead of adopting a single traditional linear transformation, to keep detection efficient. Specifically, the LR-Transformer Block employs a lightweight multi-head self-attention mechanism that improves computational efficiency while providing richer feature representations. Comparative studies against six CD methods on three public datasets validate VisionTwinNet's robustness and efficacy. Our approach notably reduces algorithmic complexity and improves the efficiency of the model.
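The abstract does not give equations for the gating mechanism or for the LR-Transformer Block's attention. What follows is therefore only a minimal NumPy sketch under stated assumptions, not the authors' implementation. A generic sigmoid gate stands in for "adaptively adjusting the importance of features". The attention uses separate widths for queries/keys (`d_k`) and values (`d_v`) instead of one shared projection. A hypothetical average-pooling step (`reduce`) shrinks the key/value token count, which is a common lightweight-attention trick. All function and parameter names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gated_features(x, w_feat, w_gate):
    """Hypothetical generic gating sketch (not the paper's Gated EnhanceClearNet):
    a sigmoid gate in (0, 1) rescales each feature of each token."""
    feat = x @ w_feat                           # candidate features
    gate = 1.0 / (1.0 + np.exp(-(x @ w_gate)))  # adaptive importance weights
    return gate * feat

def lightweight_mhsa(x, wq, wk, wv, wo, num_heads, reduce=2):
    """Assumed lightweight multi-head self-attention with decoupled key/value widths.

    x: (n, c) tokens; wq, wk: (c, d_k); wv: (c, d_v); wo: (d_v, c).
    Keys/values come from tokens average-pooled in groups of `reduce`
    (an assumed token-reduction step; trailing leftover tokens are dropped),
    so each head's score matrix is n x (n // reduce) instead of n x n."""
    n, c = x.shape
    m = n // reduce
    x_kv = x[: m * reduce].reshape(m, reduce, c).mean(axis=1)  # pooled tokens for K/V
    q, k, v = x @ wq, x_kv @ wk, x_kv @ wv   # (n, d_k), (m, d_k), (m, d_v)
    hk, hv = wq.shape[1] // num_heads, wv.shape[1] // num_heads
    # Split into heads: (heads, tokens, head_dim).
    q = q.reshape(n, num_heads, hk).transpose(1, 0, 2)
    k = k.reshape(m, num_heads, hk).transpose(1, 0, 2)
    v = v.reshape(m, num_heads, hv).transpose(1, 0, 2)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(hk))  # (heads, n, m)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, -1)      # concatenate heads -> (n, d_v)
    return out @ wo, attn

# Usage with random weights (illustrative sizes only).
rng = np.random.default_rng(0)
n, c, d_k, d_v, heads = 16, 32, 16, 32, 4
x = rng.standard_normal((n, c))
y = gated_features(x, rng.standard_normal((c, c)), rng.standard_normal((c, c)))
out, attn = lightweight_mhsa(
    y,
    rng.standard_normal((c, d_k)), rng.standard_normal((c, d_k)),
    rng.standard_normal((c, d_v)), rng.standard_normal((d_v, c)),
    heads,
)
print(out.shape, attn.shape)  # (16, 32) (4, 16, 8)
```

Because `d_k` and `d_v` are decoupled, the cost of the query-key scores scales with `d_k` while the output width follows `d_v`. Pooling the keys and values further shrinks each score matrix from n × n to n × (n / reduce).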

Convolution-Enhanced Vision Transformer Network for Smoke Recognition

A Slight Smoke Perceptual Network

Wildfire Smoke Detection with Cross Contrast Patch Embedding

Hybrid CNN-ViT architecture to exploit spatio-temporal feature for fire recognition trained through transfer learning

Visual Smoke Detection Based on Ensemble Deep CNNs

Visual Smoke Recognition Based on an Inverse-Radiating Attention Pyramid Network

A modified vision transformer architecture with scratch learning capabilities for effective fire detection

An Efficient Dehazing Algorithm Based on the Fusion of Transformer and Convolutional Neural Network

An Integrated Smoke Detection Method Based on Convolutional Neural Network and Image Processing

A transformer boosted UNet for smoke segmentation in complex backgrounds in multispectral LandSat imagery

A Smart Visual Sensor for Smoke Detection Based on Deep Neural Networks

Fire smoke detection based on target-awareness and depthwise convolutions

Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition

BoucaNet: A CNN-Transformer for Smoke Recognition on Remote Sensing Satellite Images

An efficient fire and smoke detection algorithm based on an end-to-end structured network

VisionTwinNet: Gated Clarity Enhancement Paired With Light-Robust CD Transformers

Vision Transformer with Convolutions Architecture Search

SDV-Net: A Two-Stage Convolutional Neural Network for Smoky Diesel Vehicle Detection

STCNet: Spatio-Temporal Cross Network for Industrial Smoke Detection

CMT: Convolutional Neural Networks Meet Vision Transformers

Convolutional Embedding Makes Hierarchical Vision Transformer Stronger