Plant species recognition with optimized 3D polynomial neural networks and variably overlapping time–coherent sliding window

Habib Ben Abdallah, Christopher J. Henry, Sheela Ramanna
DOI: https://doi.org/10.1007/s11042-024-18480-w
IF: 2.577
2024-03-08
Multimedia Tools and Applications
Abstract: Plant species recognition is a fundamental task that underpins several plant-related computer-vision problems, such as disease detection and growth monitoring. However, the digital agriculture community lacks large datasets that cover specific needs. The EAGL-I system was therefore developed to rapidly create massive labeled plant datasets that farmers and researchers can use to build AI-driven solutions in agriculture. To demonstrate its capabilities, the system was used to create a publicly available plant species recognition dataset of 40,000 images of varying sizes covering 8 plant species. This paper proposes a novel method, called Variably Overlapping Time–Coherent Sliding Window (VOTCSW), that transforms a dataset of variable-size images into a 3D representation with a common fixed size suitable for convolutional neural networks, and demonstrates that this representation is more informative than resizing the images to a given size. We theoretically formalized the use cases of the method and its inherent properties, and we proved that it has an oversampling and a regularization effect on the data. By combining the VOTCSW method with the 3D extension of a recently proposed machine learning model called 1-Dimensional Polynomial Neural Networks, we created a model that achieved a state-of-the-art accuracy of 99.9% on the dataset created by the EAGL-I system, surpassing well-known architectures such as ResNet and Inception. We also demonstrated its use in a plant semantic segmentation application. In addition, we created a heuristic algorithm that reduces the degree of any pre-trained N-Dimensional Polynomial Neural Network, compressing it without altering its performance and thus making the model faster and lighter.
Furthermore, we established that the currently available dataset could not be used for machine learning in its present form because of a substantial class imbalance between the training set and the test set. Hence, we created a dedicated preprocessing and model development framework that enabled us to improve the accuracy from 49.23% to 99.9%.
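The core idea of mapping variable-size images to a fixed-size 3D stack can be illustrated with a minimal sketch. This is not the paper's VOTCSW algorithm, whose exact overlap rule and window ordering are not given in the abstract. The function names `window_starts` and `to_fixed_stack`, the even spacing of windows, and the row-major scan order are all assumptions made purely for illustration.

```python
def window_starts(length, window, count):
    """Return `count` start offsets for windows of size `window` over [0, length).

    Assumption: windows are spread evenly, so the overlap between
    neighbours varies with `length` while `count` stays fixed.
    """
    if length < window:
        raise ValueError("image dimension is smaller than the window")
    if count == 1:
        return [0]
    span = length - window  # largest valid start offset
    # Integer arithmetic: first start is 0, last start is exactly `span`.
    return [(i * span) // (count - 1) for i in range(count)]


def to_fixed_stack(image, window, rows, cols):
    """Cut a 2D image (list of equal-length rows) into rows * cols patches.

    Every patch is window x window, and the patches are returned in a
    fixed row-major scan order, so images of any size (at least
    window x window) yield stacks with the same shape.
    """
    height, width = len(image), len(image[0])
    ys = window_starts(height, window, rows)
    xs = window_starts(width, window, cols)
    stack = []
    for y in ys:
        for x in xs:
            patch = [row[x:x + window] for row in image[y:y + window]]
            stack.append(patch)
    return stack


# Two images of different sizes map to stacks of identical shape.
wide = [[10 * r + c for c in range(7)] for r in range(5)]   # 5 x 7
small = [[10 * r + c for c in range(4)] for r in range(4)]  # 4 x 4
stack_wide = to_fixed_stack(wide, window=3, rows=2, cols=3)
stack_small = to_fixed_stack(small, window=3, rows=2, cols=3)
```

In this sketch the smaller image produces heavily overlapping (even repeated) windows, while the larger one produces windows with less overlap; both give 6 patches of 3 x 3 that could be stacked along a depth axis for a 3D network.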
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
What problem does this paper attempt to address?