Imputing missing data in a SWAT water quality modelling study using statistical methods

Hulya Boyacioglu, Meltem Kaya Uyar, Hayal Boyacioglu
DOI: https://doi.org/10.30638/eemj.2024.044
2024-05-24
Environmental Engineering and Management Journal
Abstract: Large water-quality databases are useful in modeling studies to identify optimal measures for pollution mitigation and water basin management. The objective of the study was to apply statistical methods to impute missing data in a water quality simulation study of the Küçük Menderes River Basin, Türkiye, where missing data caused by a lack of periodic sampling is an important challenge. The Soil and Water Assessment Tool (SWAT) was used to simulate nitrate-nitrogen (NO3-N) concentrations. Water-quality data collected between 2001 and 2012 at the outlet of the basin were subjected to regression-based imputation: simple regression models were developed to estimate the missing water quality data. This produced a continuous data set, which was then used to calibrate and validate the SWAT water quality model. Since the calculated Nash-Sutcliffe model efficiency coefficients were above 0.65, the model simulations were judged "good". Furthermore, the Mann-Whitney U test was applied to assess model performance by comparing the continuous data generated by the SWAT model with the limited observed water quality data. It can be concluded that a simple regression model and the non-parametric Mann-Whitney U test can be used to impute missing data and evaluate model performance in modeling studies of data-scarce basins.
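As a rough illustration of two steps named in the abstract, the sketch below fills gaps in a series with a simple least-squares regression and computes the Nash-Sutcliffe efficiency. It is not the authors' implementation. The abstract does not say which variables the regressions use, so the `predictor` series, all function names, and the sample values are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def fit_simple_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def impute_missing(predictor: Sequence[float],
                   target: Sequence[Optional[float]]) -> List[float]:
    """Replace None entries of `target` with regression estimates from `predictor`.

    The regression is fitted only on the time steps where `target` was observed.
    """
    observed = [(p, t) for p, t in zip(predictor, target) if t is not None]
    a, b = fit_simple_regression([p for p, _ in observed], [t for _, t in observed])
    return [t if t is not None else a + b * p for p, t in zip(predictor, target)]


def nash_sutcliffe(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean_obs)^2)."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


# Hypothetical example values, not data from the study.
predictor = [1.0, 2.0, 3.0, 4.0]      # assumed explanatory variable
no3_n = [2.0, None, 6.0, 8.0]         # NO3-N series with one gap
filled = impute_missing(predictor, no3_n)
```

A Nash-Sutcliffe value of 1 indicates a perfect match between simulated and observed values, and a value of 0 means the simulation is no better than the observed mean. For the Mann-Whitney U comparison mentioned in the abstract, `scipy.stats.mannwhitneyu` is one readily available implementation.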
environmental sciences
What problem does this paper attempt to address?