A Hybrid CNN-transformer Network: Accurate and Efficient Semantic Segmentation of Crops and Weeds on Resource-Constrained Embedded Devices

Yifan Wei, Yuncong Feng, Dongcheng Zu, Xiaoli Zhang
DOI: https://doi.org/10.1016/j.cropro.2024.107018
IF: 3.036
2025-01-01
Crop Protection
Abstract: Weed control plays a crucial role in agricultural production. Deploying advanced vision algorithms on intelligent weeding robots enables autonomous and efficient weed management. Vision transformers are highly sensitive to plant texture and shape, but their computational cost is too high for embedded deployment. We therefore propose a novel hybrid CNN-transformer network for the semantic segmentation of crops and weeds on resource-constrained embedded devices. Our network follows an encoder–decoder structure. The encoder incorporates the proposed concat extended downsampling block, which increases inference speed by reducing memory access time and improves the accuracy of feature extraction. For global semantic extraction, we introduce the parallel input transformer semantic enhancement module, which employs a shared transformer block to increase the computation rate. Additionally, a global–local semantic fusion block mitigates the semantic gap problem. To fully utilize the transformer’s ability to process plant texture and shape, we employ the fusion enhancement block in the decoder, thus minimizing the loss of feature information. Segmentation results on three public benchmark datasets show that our network outperforms commonly used CNN-based, transformer-based, and hybrid CNN-transformer-based methods in terms of segmentation accuracy. Moreover, our network comprises only 0.1887M parameters and 0.2145G floating-point operations. We also evaluate inference speed on an NVIDIA Jetson Orin NX embedded system, where the network processes a single image in 28.28 ms, corresponding to 35.36 FPS. The experimental results show that our network achieves the best inference speed and the strongest segmentation performance among the compared methods on resource-constrained embedded systems.
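The reported per-image latency and frame rate can be checked against each other. The sketch below assumes images are processed one at a time (batch size 1, no pipelining), so throughput is simply the reciprocal of latency; the function name is illustrative and does not come from the paper.

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds to frames per second.

    Assumes strictly sequential, single-image inference (an assumption,
    not something the abstract states explicitly).
    """
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Per-image latency on the Jetson Orin NX, as reported in the abstract.
reported_latency_ms = 28.28

fps = fps_from_latency_ms(reported_latency_ms)
print(f"{fps:.2f} FPS")  # prints "35.36 FPS"
```

Under that assumption, the 28.28 ms latency and the 35.36 FPS figure in the abstract agree.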