Quantification of chlorophyll-a in typical lakes across China using Sentinel-2 MSI imagery with machine learning algorithm

Sijia Li, Kaishan Song, Shuai Wang, Ge Liu, Zhidan Wen, Yingxin Shang, Lili Lyu, Fangfang Chen, Shiqi Xu, Hui Tao, Yunxia Du, Chong Fang, Guangyi Mu
DOI: https://doi.org/10.1016/j.scitotenv.2021.146271
2021-07-01
Abstract:<p>Lake eutrophication has attracted the attention of the government and the general public. Chlorophyll-a (Chl-a) is a key indicator of algal biomass and eutrophication, and considerable effort has been devoted to establishing accurate algorithms for estimating Chl-a concentrations. In this study, a total of 273 samples were collected from 45 typical lakes across China during 2017–2019. We propose machine learning algorithms (a linear regression model (LR), a support vector machine model (SVM) and a CatBoost model (CB)) that integrate a broad-scale dataset of lake biogeochemical characteristics with MultiSpectral Instrument (MSI) products to seamlessly retrieve the Chl-a concentration. A K-means clustering approach was used to group the 273 normalized water-leaving reflectance spectra [<em>Rrs</em>(λ)], extracted from MSI imagery with the Case 2 Regional Coast Colour (C2RCC) processor, into three clusters. The pH, electrical conductivity (EC), total suspended matter (TSM) and dissolved organic carbon (DOC) differed significantly among the three clusters (<em>p</em> &lt; 0.05), indicating that water quality parameters have an integrated impact on the <em>Rrs</em>(λ) spectra. Among the machine learning algorithms, SVM gave better agreement between measured and derived Chl-a than LR and CB (calibration: slope = 0.81, R<sup>2</sup> = 0.91; validation: slope = 1.21, R<sup>2</sup> = 0.88). In contrast, nine previously documented Chl-a algorithms gave poor results (slope of the linear fit relative to the 1:1 line &lt; 0.4 and R<sup>2</sup> &lt; 0.70) on the same training and test datasets. This demonstrates that machine learning provides a robust model for quantifying Chl-a concentration. 
Further, when the Chl-a SVM model was evaluated separately on the three <em>Rrs</em>(λ) clusters obtained by k-means, cluster 1 gave the best retrieval performance (slope = 0.71, R<sup>2</sup> = 0.78), followed by cluster 3 (slope = 0.77, R<sup>2</sup> = 0.64) and cluster 2 (slope = 0.67, R<sup>2</sup> = 0.50). These differences are related to the low TSM and high DOC levels associated with the cluster 1 and cluster 3 <em>Rrs</em>(λ) spectra, which reduce the influence of particles on the <em>Rrs</em>(λ) signal in the red bands. Our results highlight that lake Chl-a concentrations can be quantified using MSI imagery and SVM, an approach that enables large-scale monitoring and is more appropriate for medium and low Chl-a levels. Remote estimation of Chl-a based on artificial intelligence can provide an effective and robust way to monitor lake eutrophication on a macro scale, and offers a better approach to elucidating the response of lake ecosystems to global change.</p>
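The two steps the abstract relies on for its per-cluster comparison, grouping reflectance spectra into three clusters with k-means and then scoring each cluster by the slope and R<sup>2</sup> of a measured-versus-derived linear fit, can be illustrated with a minimal standard-library Python sketch. This is not the authors' code. The four-band spectra, the Chl-a values, the farthest-point initialisation and every function name below are hypothetical choices made for reproducibility, and the SVM regressor itself is omitted: the "derived" values are placeholders standing in for model output.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two spectra."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption, not from the paper):
    start at the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns (labels, centroids)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every spectrum.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the band-wise mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(band) / len(members) for band in zip(*members)]
    return labels, centroids


def slope_and_r2(measured, derived):
    """Slope and R^2 of the least-squares line derived = a * measured + b."""
    n = len(measured)
    mx, my = sum(measured) / n, sum(derived) / n
    sxx = sum((x - mx) ** 2 for x in measured)
    syy = sum((y - my) ** 2 for y in derived)
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, derived))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


def per_cluster_fit(labels, measured, derived):
    """Apply slope_and_r2 separately to the samples of each cluster."""
    result = {}
    for c in sorted(set(labels)):
        m = [x for x, lab in zip(measured, labels) if lab == c]
        d = [y for y, lab in zip(derived, labels) if lab == c]
        result[c] = slope_and_r2(m, d)
    return result


# Made-up four-band spectra: three spectral shapes, five variants each.
base = {
    "A": [0.010, 0.020, 0.015, 0.005],
    "B": [0.020, 0.030, 0.040, 0.030],
    "C": [0.005, 0.010, 0.030, 0.050],
}
spectra, truth = [], []
for name, shape in base.items():
    for i in range(5):
        spectra.append([shape[0] + 0.001 * i] + shape[1:])
        truth.append(name)

labels, centroids = kmeans(spectra, k=3)

# Placeholder Chl-a values; "derived" stands in for a regressor's output.
measured = [float(i + 1) for i in range(len(spectra))]
derived = [0.8 * m + 1.0 for m in measured]
fits = per_cluster_fit(labels, measured, derived)
```

Here `fits` maps each cluster label to a `(slope, R^2)` pair, the same two quantities the abstract reports per cluster. With real data, the spectra would come from atmospherically corrected imagery and the derived values from a trained model.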
environmental sciences