A multiple linear regression model with multiplicative log-normal error term for atmospheric concentration data

Kezheng Liao,Eun Sug Park,Jie Zhang,Linjun Cheng,Dongsheng Ji,Qi Ying,Jian Zhen Yu
DOI: https://doi.org/10.1016/j.scitotenv.2020.144282
2021-05-01
Abstract:<p>The homoscedasticity assumption (the variance of the error term is the same across all the observations) is a key assumption in the ordinary linear squares (OLS) solution of a linear regression model. The validity of this assumption is examined for a multiple linear regression model used to determine the source contributions to the observed black carbon concentrations at 12 background monitoring sites across China using a hybrid modeling approach. Residual analysis from the traditional OLS method, which assumes that the error term is additive and normally distributed with a mean of zero, shows pronounced heteroscedasticity based on the Breusch–Pagan test for 11 datasets. Noticing that the atmospheric black carbon data are log-normally distributed, we make a new assumption that the error terms are multiplicative and log-normally distributed. When the coefficients of the multilinear regression model are determined using the maximum likelihood estimation (MLE), the distribution of the residuals in 8 out of the 12 datasets is in good accordance with the revised assumption. Furthermore, the MLE computation under this novel assumption could be proved mathematically identical to minimizing a log-scale objective function, which considerably reduces the complexity in the MLE calculation.</p><p>The new method is further demonstrated to have clear advantages in numerical simulation experiments of a 5-variable multiple linear regression model using synthesized data with prescribed coefficients and lognormally distributed multiplicative errors. Under all 9 simulation scenarios, the new method yields the most accurate estimations of the regression coefficients and has significantly higher coverage probability (on average, 95% for all five coefficients) than OLS (79%) and weighted least squares (WLS, 72%) methods.</p>
environmental sciences
What problem does this paper attempt to address?