Abstract: Accurate identification of small tea buds is a key technology for tea-harvesting robots and directly affects tea quality and yield. However, the complexity of the tea plantation environment and the diversity of tea buds make accurate identification a major challenge. Existing methods based on traditional image processing and machine learning fail to extract the subtle features and morphology of small tea buds effectively, which results in low accuracy and robustness. To address this, this paper proposes a small object detection algorithm called STF-YOLO (Small Target Detection with Swin Transformer and Focused YOLO), which integrates the Swin Transformer module with the YOLOv8 network to improve the detection of small objects. The Swin Transformer module extracts visual features with a self-attention mechanism that captures both global and local context information of small objects, enhancing feature representation. The YOLOv8 network is an object detector based on deep convolutional neural networks that offers high speed and precision. On top of the YOLOv8 network, modules including Focus and Depthwise Convolution are introduced to reduce computation and parameters, enlarge the receptive field and the number of feature channels, and improve feature fusion and transmission. In addition, the Wise Intersection over Union loss is used to optimize the network. Experiments on a self-built tea bud dataset show that the STF-YOLO model achieves an accuracy of 91.5% and a mean Average Precision of 89.4%. Compared with mainstream algorithms (YOLOv8, YOLOv7, YOLOv5, and YOLOX), the model improves accuracy by 5 to 20.22 percentage points and F1 score by 0.03 to 0.13, demonstrating its effectiveness in enhancing small object detection performance.
This research provides a technical means for the accurate identification of small tea buds in complex environments and offers insights into small object detection. Future research could further optimize the model structure and parameters for more scenarios and tasks, and explore data augmentation and model fusion methods to improve generalization and robustness.
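
As a rough illustration of three quantities the abstract refers to, the following Python sketch computes (a) the weight count of a standard convolution versus a depthwise convolution followed by a 1x1 pointwise convolution, (b) the plain Intersection over Union of two boxes, and (c) the F1 score from precision and recall. This is not the STF-YOLO implementation: the function names, the pairing of the depthwise convolution with a pointwise convolution, the layer sizes, and the box format are assumptions made only for this example, and the Wise IoU loss itself is not reproduced here, only the plain IoU it builds on.

```python
# Illustrative sketch only; all names and sizes are hypothetical,
# not taken from the STF-YOLO paper.

def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv plus a 1x1 pointwise conv (bias ignored).

    Assumption: the depthwise layer is paired with a pointwise layer to
    mix channels, as is common in lightweight networks.
    """
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 conv that mixes channels
    return depthwise + pointwise


def box_iou(a, b):
    """Plain IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2, y1 < y2."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical layer: 64 input channels, 128 output channels, 3x3 kernel.
    print(standard_conv_params(64, 128, 3))        # 73728
    print(depthwise_separable_params(64, 128, 3))  # 8768
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))     # 1/7, about 0.143
    print(f1_score(0.5, 0.5))                      # 0.5
```

For the hypothetical 64-to-128-channel 3x3 layer, the depthwise variant needs roughly one eighth of the weights of the standard convolution, which is the kind of saving the abstract attributes to the added modules.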

Object Detection Based on Swin Deformable Transformer-BiPAFPN-YOLOX

An Object Detection Method Based on Improved YOLOX

A Transformer-Based Object Detector with Coarse-Fine Crossing Representations

Swin-Transformer-Based YOLOv5 for Small-Object Detection in Remote Sensing Images

Remote sensing object detection based on a combination of a CNN and the Swin transformer

Swin-Transformer-Enabled YOLOv5 with Attention Mechanism for Small Object Detection on Satellite Images

SwinSOD: Salient object detection using swin-transformer

Swin-Transformer-YOLOv5 for lightweight hot-rolled steel strips surface defect detection algorithm

SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient object detection

Obstacle detection: improved YOLOX-S based on swin transformer-tiny

Enhancement of Human Face Mask Detection Performance by Using Ensemble Learning Models

CNN-transformer mixed model for object detection

YotoR-You Only Transform One Representation

A Transformer-Based Framework for Tiny Object Detection

SRE-YOLOv8: An Improved UAV Object Detection Model Utilizing Swin Transformer and RE-FPN

Small object detection algorithm incorporating swin transformer for tea buds

An Improved Swin Transformer-Based Model for Remote Sensing Object Detection and Instance Segmentation

YOLOv5s maritime distress target detection method based on swin transformer

DETRs Beat YOLOs on Real-time Object Detection

Swin transformer adaptation into YOLOv7 for road damage detection