Abstract: Onboard land cover classification provides continuously updated land cover information, supporting intelligent satellite applications that require timely, autonomous decision-making based on current land cover data. However, space, weight, and power constraints leave satellites with limited computational resources, so they cannot run conventional land cover classification networks. To address this, we design a lightweight land cover classification network with two efficient transformer attention mechanisms enhanced by multigranularity tokens. Whereas traditional transformer attention captures token-to-token correlations at a single granularity only, our approach splits the tokens into four segments and applies atrous convolutions with different dilation rates to aggregate token segments over diverse receptive fields, forming token segment combinations that carry not only point information but also information from patches of varying sizes. These multigranularity tokens are then processed by windowed squeeze axial transformer attention (WSATA) and multigranularity bilevel routing attention (MGBRA) for feature enhancement. Separately, we observe empirically that prediction errors occur more often on land covers of small extent, yet conventional methods treat all pixels uniformly. This motivates a novel network-agnostic loss, the connected component loss (CCL), which specifically targets small-scale land covers and their boundaries. Quantitative metrics and visual results from comprehensive experiments show that our method attains state-of-the-art accuracy on two land cover classification datasets while running inference significantly faster than other lightweight networks, underscoring its practical potential on embedded systems.
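
The abstract does not state the CCL formula, so the following is only an illustrative sketch of the general idea of emphasizing small land-cover regions. It labels 4-connected regions of equal class in a ground-truth map and gives pixels in smaller regions a larger weight in a cross-entropy term. The function names, the weighting rule `1 + alpha / sqrt(size)`, and the parameter `alpha` are assumptions made for illustration, not the paper's definition, and the boundary emphasis mentioned in the abstract is not modeled here.

```python
import math
from collections import deque


def component_sizes(labels):
    """For each pixel, return the size of its 4-connected same-class region."""
    h, w = len(labels), len(labels[0])
    sizes = [[0] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if seen[r][c]:
                continue
            cls = labels[r][c]
            queue = deque([(r, c)])
            seen[r][c] = True
            members = []
            # Breadth-first flood fill over neighbours of the same class.
            while queue:
                y, x = queue.popleft()
                members.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and labels[ny][nx] == cls):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            for y, x in members:
                sizes[y][x] = len(members)
    return sizes


def small_component_weights(labels, alpha=1.0):
    """Hypothetical rule: weight = 1 + alpha / sqrt(region size)."""
    sizes = component_sizes(labels)
    return [[1.0 + alpha / math.sqrt(s) for s in row] for row in sizes]


def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """Weighted mean of -log p(true class); probs[r][c][k] is a class probability."""
    total, weight_sum = 0.0, 0.0
    for prob_row, label_row, weight_row in zip(probs, labels, weights):
        for p, cls, w in zip(prob_row, label_row, weight_row):
            total += -w * math.log(max(p[cls], eps))
            weight_sum += w
    return total / weight_sum


if __name__ == "__main__":
    # A single pixel of class 1 surrounded by class 0.
    gt = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]
    for row in small_component_weights(gt):
        print([round(v, 3) for v in row])
```

Under this assumed rule, the isolated center pixel receives a larger weight than the pixels of the surrounding region, so errors on it contribute more to the loss. Because the weights depend only on the ground-truth map, such a term can be attached to any segmentation network, which is consistent with the network-agnostic property the abstract claims for CCL.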

A Light-Weight Model with Granularity Feature Representation for Fine-Grained Visual Classification

Hybrid Granularities Transformer for Fine-Grained Image Recognition

Lightweight Vision Transformer with Cross Feature Attention

Fine-grained image classification based on TinyVit object location and graph convolution network

Image recognition based on lightweight convolutional neural network: Recent advances

FET-FGVC: Feature-enhanced transformer for fine-grained visual classification

MGFN: A Multi-Granularity Fusion Convolutional Neural Network for Remote Sensing Scene Classification

A novel dual-granularity lightweight transformer for vision tasks

Hybrid ViT-CNN Network for Fine-Grained Image Classification

AA-Trans: Core Attention Aggregating Transformer with Information Entropy Selector for Fine-grained Visual Classification

A Lightweight Model of VGG-16 for Remote Sensing Image Classification

TransFG: A Transformer Architecture for Fine-Grained Recognition

MFF-Trans: Multi-level Feature Fusion Transformer for Fine-Grained Visual Classification

Part-Guided Relational Transformers for Fine-Grained Visual Recognition

A Lightweight Hybrid Model with Location-Preserving ViT for Efficient Food Recognition

Fine-grained ship image classification and detection based on a vision transformer and multi-grain feature vector FPN model

A Lightweight Transformer With Multigranularity Tokens and Connected Component Loss for Land Cover Classification

Lightweight monocular depth estimation using a fusion-improved transformer

Learning Granularity-Aware Convolutional Neural Network for Fine-Grained Visual Classification

CTA-Net: A CNN-Transformer Aggregation Network for Improving Multi-Scale Feature Extraction