Spatial Transformer Network YOLO Model for Agricultural Object Detection

Yash Zambre, Ekdev Rajkitkul, Akshatha Mohan, Joshua Peeples
2024-09-16
Abstract: Object detection plays a crucial role in the field of computer vision by autonomously locating and identifying objects of interest. The You Only Look Once (YOLO) model is an effective single-shot detector. However, YOLO faces challenges in cluttered or partially occluded scenes and can struggle with small, low-contrast objects. We propose a new method that integrates spatial transformer networks (STNs) into YOLO to improve performance. The proposed STN-YOLO aims to enhance the model's effectiveness by focusing on important areas of the image and improving the spatial invariance of the model before the detection process. Our proposed method improved object detection performance both qualitatively and quantitatively. We explore the impact of different localization networks within the STN module as well as the robustness of the model across different spatial transformations. We apply the STN-YOLO on benchmark datasets for agricultural object detection as well as a new dataset from a state-of-the-art plant phenotyping greenhouse facility. Our code and dataset are publicly available.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses the performance limitations of the YOLO model in agricultural object detection, particularly in complex scenes and for partially occluded or low-contrast objects. It proposes integrating Spatial Transformer Networks (STNs) into YOLO. The resulting STN-YOLO model aims to focus on important regions of the image and to improve spatial invariance before the detection stage, which the authors report improves detection performance both qualitatively and quantitatively. The paper concentrates on agricultural applications such as crop disease detection, pest detection, and crop harvesting, where the spatial transformations that objects undergo limit what YOLO can achieve. Adding an STN is intended to make the model more robust to these transformations and thereby improve the accuracy and efficiency of agricultural object detection. The paper also introduces a new high-quality Plant Growth and Phenotyping (PGP) dataset of multispectral images that vary in height, lighting conditions, plant size, and plant shape, all of which make detection more challenging. Experiments on this dataset are used to verify the effectiveness of STN-YOLO.
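To make the STN idea concrete, the following is a minimal sketch of the resampling step at the core of a spatial transformer: an affine matrix maps each output pixel back to a source location, and the value there is read with bilinear interpolation. It is not the authors' implementation. The function name `affine_sample`, the normalized [-1, 1] coordinate convention, the zero padding, and the hand-picked matrices are all assumptions made for illustration. In an actual STN the matrix `theta` would be predicted by a learned localization network, and the warped feature map would then be passed on to the detector.

```python
import math
from typing import List, Sequence

Image = List[List[float]]


def affine_sample(image: Image, theta: Sequence[Sequence[float]]) -> Image:
    """Warp `image` with a 2x3 affine matrix `theta` (hypothetical helper).

    Coordinates are normalized so that -1 and 1 are the centers of the first
    and last pixel along each axis. Locations outside the image read as 0.
    """
    h, w = len(image), len(image[0])

    def pixel(r: int, c: int) -> float:
        # Zero padding for out-of-bounds reads.
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Normalized coordinates of the output pixel.
            x_t = 2.0 * j / (w - 1) - 1.0 if w > 1 else 0.0
            y_t = 2.0 * i / (h - 1) - 1.0 if h > 1 else 0.0
            # Affine map from output coordinates to source coordinates.
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            # Back to (possibly fractional) pixel indices.
            c = (x_s + 1.0) * (w - 1) / 2.0
            r = (y_s + 1.0) * (h - 1) / 2.0
            c0, r0 = math.floor(c), math.floor(r)
            dc, dr = c - c0, r - r0
            # Bilinear interpolation over the four neighboring pixels.
            out[i][j] = (
                pixel(r0, c0) * (1 - dr) * (1 - dc)
                + pixel(r0, c0 + 1) * (1 - dr) * dc
                + pixel(r0 + 1, c0) * dr * (1 - dc)
                + pixel(r0 + 1, c0 + 1) * dr * dc
            )
    return out


img = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
shift = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]  # sample one pixel to the right

same = affine_sample(img, identity)
shifted = affine_sample(img, shift)
```

With the identity matrix the image comes back unchanged, while the translation matrix shifts the content one pixel to the left and fills the vacated column with zeros. Because the interpolation is differentiable with respect to `theta`, a framework with automatic differentiation can train the localization network end to end with the detector, which is what lets the transformation be learned rather than fixed.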