Research on Predicting Photosynthetic Pigments in Tomato Seedling Leaves Based on Near-Infrared Hyperspectral Imaging and Machine Learning

Binshan Huang, Songhao Li, Teng Long, Shudai Bai, Jing Zhao, Haitao Xu, Yubin Lan, Houcheng Liu, Yongbing Long
DOI: https://doi.org/10.1016/j.microc.2024.111076
IF: 5.304
2024-01-01
Microchemical Journal
Abstract: Conventional chemical approaches can be limited when pigment concentrations must be monitored across large numbers of plants. To overcome these limitations, researchers often turn to non-invasive, high-throughput, real-time monitoring techniques such as spectroscopy and hyperspectral imaging, which assess pigment concentration without destructive sampling and can monitor large volumes of plants efficiently. This research combined machine learning with hyperspectral imaging to develop models for predicting the concentrations of three pigments, namely chlorophyll-a, chlorophyll-b, and carotenoids, in tomato seedlings. The sample tomato seedlings were sourced from two distinct varieties: the wild type and the Long Hypocotyl-5-deficient (HY5) type. The spectral data were acquired using a near-infrared (NIR) camera with a spectral range spanning approximately 900-1700 nm. Machine learning algorithms such as partial least squares regression (PLSR) and extreme learning machine (ELM) were used to explore the latent relationship between hyperspectral information and chemical measurements. In addition, principal component analysis (PCA), independent component analysis (ICA), and competitive adaptive reweighted sampling (CARS) were used to extract informative wavelengths from the reflectance spectrum. A comprehensive spectroscopic analysis was also conducted to assess the validity and efficiency of the feature-extraction results. The ELM model was the most effective, achieving R² values of 0.86, 0.83, and 0.83 for chlorophyll-a, chlorophyll-b, and carotenoids, respectively, on the test set. By integrating the predictive models with classifiers such as Logistic Regression, Support Vector Classifier (SVC), and K-nearest Neighbors (KNN), tomato seedlings were categorized into wild type and HY5 type.
The findings showed that the proposed approach efficiently predicted pigment concentrations in tomato seedlings, and the study explored the feasibility of using these predictions to identify tomato genotypes.
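To make the regression step concrete, the following is a minimal sketch of an extreme learning machine (a fixed random hidden layer whose output weights are solved by ridge-regularised least squares) scored with R² on a held-out set. It is not the authors' implementation: the spectra are synthetic, and the band count, hidden-layer size, ridge penalty, and the helper names fit_elm, predict_elm, and r2_score are all assumptions chosen for illustration.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_elm(X, y, n_hidden=40, ridge=1e-2, seed=0):
    """Fit an ELM: random hidden weights stay fixed, only output weights are solved."""
    rng = np.random.default_rng(seed)
    x_mean, x_std = X.mean(axis=0), X.std(axis=0) + 1e-12
    y_mean = y.mean()
    Z = (X - x_mean) / x_std  # standardise each band
    # Scale weights so hidden pre-activations stay near unit variance.
    W = rng.normal(scale=1.0 / np.sqrt(X.shape[1]), size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(Z @ W + b)  # hidden-layer outputs
    # Ridge-regularised least squares for the output weights.
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ (y - y_mean))
    return {"x_mean": x_mean, "x_std": x_std, "y_mean": y_mean,
            "W": W, "b": b, "beta": beta}


def predict_elm(model, X):
    Z = (X - model["x_mean"]) / model["x_std"]
    H = np.tanh(Z @ model["W"] + model["b"])
    return model["y_mean"] + H @ model["beta"]


# Synthetic stand-in for NIR reflectance spectra (assumption: 30 bands, 120 leaves).
rng = np.random.default_rng(42)
n_samples, n_bands = 120, 30
wavelengths = np.linspace(900.0, 1700.0, n_bands)
pigment = rng.uniform(0.5, 2.5, size=n_samples)  # hypothetical concentrations
baseline = 0.6 + 0.1 * np.sin(wavelengths / 150.0)
dip = 0.05 * np.exp(-((wavelengths - 1200.0) / 80.0) ** 2)  # made-up absorption feature
X = baseline - pigment[:, None] * dip + rng.normal(scale=0.005, size=(n_samples, n_bands))

# Simple hold-out split: first 90 samples for training, the rest for testing.
X_train, X_test = X[:90], X[90:]
y_train, y_test = pigment[:90], pigment[90:]

model = fit_elm(X_train, y_train)
r2_train = r2_score(y_train, predict_elm(model, X_train))
r2_test = r2_score(y_test, predict_elm(model, X_test))
print(f"train R2 = {r2_train:.3f}, test R2 = {r2_test:.3f}")
```

In a real pipeline, the synthetic matrix X would be replaced by mean leaf reflectance spectra (optionally reduced to wavelengths selected by a method such as CARS), and the target vector by laboratory pigment measurements.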