Abstract:
Introduction: Yunnan Xiaomila is a pepper variety whose flowers and fruits mature at the same time and several times a year. The fruits contrast weakly with a complex background, and the targets are small and difficult to identify.
Methods: To address Xiaomila detection against complex backgrounds, and to reduce the impact of the small color gradient between Xiaomila and the background and of unclear feature information, this paper proposes an improved PAE-YOLO model that integrates the EMA attention mechanism and DCNv3 deformable convolution into YOLOv8. These changes improve the model's feature extraction capability and inference speed for Xiaomila in complex environments while keeping the model lightweight. First, the EMA attention mechanism is combined with the C2f module of the YOLOv8 network: the C2f module extracts local features from the input image well, while the EMA attention mechanism captures global relationships, so the two complement each other and enhance the model's expressive ability. Second, the DCNv3 convolution module is introduced into the backbone and head networks; it adaptively adjusts the sampling positions according to the input feature map, which strengthens feature capture for targets of different scales and keeps the network lightweight. A depth camera is also used to estimate the pose of Xiaomila, and different occlusion situations are analyzed and optimized. The effectiveness of the proposed method was verified through ablation experiments, model comparison experiments, and pose estimation experiments.
Results: The model achieved a mean average precision (mAP) of 88.8%, which was 1.3% higher than that of the original model. Its F1 score reached 83.2, and its computational cost and model size were 7.6 GFLOPs and 5.7 MB, respectively. The F1 score was the best among the compared networks, and the model weight and giga floating-point operations (GFLOPs) were the smallest, 6.2% and 8.1% lower than those of the original model. The loss value during training was the lowest, and convergence was the fastest. Pose estimation on 102 targets showed that the orientation was correctly estimated in more than 85% of cases, with an average error angle of 15.91°. Under occlusion, 86.3% of the pose estimation error angles were less than 40°, with an average error angle of 23.19°.
Discussion: The results show that the improved detection model accurately identifies Xiaomila fruit targets, achieves higher accuracy with lower computational complexity, and estimates the target pose better.
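
The pose results above are reported as an average error angle and as the share of error angles below 40°. As an illustration only, the following minimal sketch shows one way such statistics could be computed. It assumes the error angle is the angle between an estimated and a reference 3D direction vector, which the abstract does not state; the function names `error_angle_deg` and `summarize` are hypothetical and not taken from the paper.

```python
import math


def error_angle_deg(est, ref):
    """Angle in degrees between two 3D direction vectors.

    Assumption: the pose error is the angle between the estimated and the
    reference fruit direction; the abstract does not define it.
    """
    dot = sum(a * b for a, b in zip(est, ref))
    norm = math.sqrt(sum(a * a for a in est)) * math.sqrt(sum(b * b for b in ref))
    if norm == 0.0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def summarize(errors_deg, threshold_deg=40.0):
    """Return (mean error angle, fraction of errors below the threshold)."""
    if not errors_deg:
        raise ValueError("need at least one error angle")
    n = len(errors_deg)
    mean_error = sum(errors_deg) / n
    below = sum(1 for e in errors_deg if e < threshold_deg) / n
    return mean_error, below


if __name__ == "__main__":
    # Hypothetical direction pairs, for illustration only (not data from the paper).
    pairs = [((1.0, 0.0, 0.0), (1.0, 0.1, 0.0)),
             ((0.0, 0.0, 1.0), (0.0, 1.0, 1.0))]
    errors = [error_angle_deg(est, ref) for est, ref in pairs]
    print(summarize(errors))
```

The 40° default threshold mirrors the figure quoted for the occlusion condition; any other threshold can be passed as `threshold_deg`.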

Pose Detection of the Grain-Leveling Robot Based on the Improved YOLOv8-Pose Algorithm

YOLOv8-PoseBoost: Advancements in Multimodal Robot Pose Keypoint Detection

Research on Human Posture Estimation Algorithm Based on YOLO-Pose

Efficient and Lightweight Grape and Picking Point Synchronous Detection Model Based on Key Point Detection

Improved YOLO-Pose Crowd Pose Estimation

Object Pose Estimation for Robotic Grasping based on Multi-view Keypoint Detection

Efficient Grasp Detection Network with Gaussian-Based Grasp Representation for Robotic Manipulation

An improved YOLO v4 used for grape detection in unstructured environment

KSL-POSE: A Real-Time 2D Human Pose Estimation Method Based on Modified YOLOv8-Pose Framework

Object Detection Method for Grasping Robot Based on Improved YOLOv5

A lightweight Yunnan Xiaomila detection and pose estimation based on improved YOLOv8

RFA-YOLO-POSE: A Fusion Algorithm for Pose Detection and Object Identification Amidst Complex Crowds

Fruit Detection and Pose Estimation for Grape Cluster-Harvesting Robot Using Binocular Imagery Based on Deep Neural Networks

GA-YOLO: A Lightweight YOLO Model for Dense and Occluded Grape Target Detection

Lightweight object detection algorithm for robots with improved YOLOv5

Multicow pose estimation based on keypoint extraction

Fruit fast tracking and recognition of apple picking robot based on improved YOLOv5

Object Detection Algorithm Based on Improved YOLOv5 for Basketball Robot

An object planar grasping pose detection algorithm in low-light scenes

A multilevel object pose estimation algorithm based on point cloud keypoints

The application prospects of robot pose estimation technology: exploring new directions based on YOLOv8-ApexNet