Abstract: Monitoring weed competitiveness is crucial for site-specific field management. Recent research on fusing multimodal data from unmanned aerial vehicles (UAVs) has advanced this monitoring. However, these studies merely stack the extracted features with equal weight and so do not fully exploit the fused information. This study uses hyperspectral and LiDAR data collected by UAVs to propose a multimodal deep fusion model (MulDFNet) based on a Transformer and multi-layer residuals. It uses a comprehensive competitive index (CCI-A), based on multidimensional phenotypes of maize, to assess the competitiveness of weeds in farmland ecosystems. To validate the model, a series of ablation studies was conducted covering different data modalities, the presence or absence of the Transformer Encoder (TE) modules, and different fusion modules (shallow residual fusion module, deep feature fusion module). The model was also compared with early/late stacking fusion models, traditional machine learning models, and deep learning models from related studies. The results indicate that the multimodal deep fusion model using HSI, VI, and CHM data achieved R² = 0.903 (RMSE = 0.078), with the best performance observed at the five-leaf stage. Combining the shallow and deep fusion modules gave better predictive performance than either fusion module alone. The TE module improved model performance: its multi-head attention mechanism helps capture the relationships between feature maps and competition indices, and their relative importance, thereby enhancing the model's predictive capability. In weed competition prediction, the proposed multimodal deep fusion model performed significantly better than early/late stacking fusion models and other machine learning models (RF, SVR, PLS, DNN-F2 and Multi-channel CNN).
Overall, the multimodal deep fusion model developed in this study demonstrates outstanding performance in assessing weed competitiveness and can predict the competitive intensity of weeds in maize across various growth stages on a broad scale.
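
The abstract reports R² and RMSE without defining them. As a minimal sketch, assuming the usual definitions (R² as one minus the ratio of residual to total sum of squares, RMSE as the square root of the mean squared error), the two metrics could be computed as follows. The function names and the sample values are hypothetical and are not taken from the study, which may define R² differently (for example, as a squared correlation).

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical competitive-index values, for illustration only
# (these are not data from the study).
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]

print(f"R^2  = {r_squared(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f}")
```

For these made-up values the sketch gives an R² of about 0.95 and an RMSE of about 0.05. RMSE is in the same units as the predicted index, so it is only comparable across models evaluated on the same target.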

Weed Recognition Method based on Hybrid CNN-Transformer Model

Crop Disease Identification by Fusing Multiscale Convolution and Vision Transformer

An Improved Transformer Network With Multi-Scale Convolution for Weed Identification in Sugarcane Field

Multi-Class Weed Recognition Using Hybrid CNN-SVM Classifier

Vision Transformers For Weeds and Crops Classification Of High Resolution UAV Images

Fine-grained weed recognition using Swin Transformer and two-stage transfer learning

A Hybrid CNN-transformer Network: Accurate and Efficient Semantic Segmentation of Crops and Weeds on Resource-Constrained Embedded Devices

Transformer-Based Weed Segmentation for Grass Management

DenseNet weed recognition model combining local variance preprocessing and attention mechanism

SWFormer: A Scale-Wise Hybrid CNN-Transformer Network for Multi-Classes Weed Segmentation

Weed recognition using deep learning techniques on class-imbalanced imagery

CNN feature based graph convolutional network for weed and crop recognition in smart farming

A hybrid CNN–SVM classifier for weed recognition in winter rape field

Development of Weed Detection Method in Soybean Fields Utilizing Improved DeepLabv3+ Platform

Efficient Crop Segmentation Net and Novel Weed Detection Method

SkipResNet: Crop and Weed Recognition Based on the Improved ResNet

Multimodal deep fusion model based on Transformer and multi-layer residuals for assessing the competitiveness of weeds in farmland ecosystems

Image patch-based deep learning approach for crop and weed recognition

Real-Time Crop Recognition In Transplanted Fields With Prominent Weed Growth: A Visual-Attention-Based Approach

Beet seedling and weed recognition based on convolutional neural network and multi-modality images

An Improved Mask R-CNN Method for Weed Segmentation