Accounting for Access Costs in Validation of Soil Maps: A Comparison of Design-Based Sampling Strategies

Lin Yang, Dick J. Brus, A-Xing Zhu, Xinming Li, Jingjing Shi
DOI: https://doi.org/10.1016/j.geoderma.2017.11.028
IF: 6.1
2018-01-01
Geoderma
Abstract: The quality of soil maps can best be estimated by collecting additional data at locations selected by probability sampling. These data can be used in design-based estimation of map quality measures such as the population mean of the squared prediction errors (MSE) for continuous soil maps and overall accuracy for categorical soil maps. In areas with large differences in access costs it can be attractive to account for these differences in selecting validation locations. This paper compares two types of sampling design that take access costs into account: sampling with probabilities proportional to size (pps) and stratified simple random sampling (STSI). In pps the inverse of the square root of the access costs is used as the size variable. Two estimators of MSE are applied, the Hansen-Hurwitz and Hajek estimators. In STSI optimal strata are constructed based on access costs. Simple random sampling (SI) is taken as a reference design. The sampling strategies were compared on the basis of: 1) the variance of the estimated MSE; 2) the variance of the total pointwise access costs; 3) the 95th percentile of the sampling distribution of the total access costs. The comparison was done at equal expected total pointwise access costs. The sampling strategies were compared in a simulation study and a real-world case study in Anhui, China. In the case study car travel and hiking costs were considered in computing access costs per point. The results showed that the variance of the estimated MSE with pps(Hansen-Hurwitz) was larger than with pps(Hajek) and STSI. The variances of the estimated MSE with pps(Hajek) and STSI were about equal and smaller than that with SI. The gain in precision compared to SI depends on the cost distribution: the larger the coefficient of variation of the costs, the larger the gain. The 95th percentile of the sampling distribution of the total pointwise access costs with STSI was smaller than with pps and SI. In the case study, the gain in precision of pps(Hajek) and STSI was about 30% when accounting for hiking costs only, and about 10% when accounting for the sum of car travel and hiking costs. The proposed sampling strategies are of interest for surveying any soil property in areas with marked differences in access costs, not just for validation of soil maps.
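The pps strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes selection with replacement, which the abstract does not specify, and the function names and the small synthetic cost and error values are hypothetical. The sketch draws points with probabilities proportional to the inverse square root of the access costs, computes the Hansen-Hurwitz and Hajek estimates of the MSE, and gives the expected total pointwise access cost for a sample of size n.

```python
import random


def pps_probabilities(costs):
    """Draw-by-draw selection probabilities proportional to 1/sqrt(cost)."""
    sizes = [c ** -0.5 for c in costs]
    total = sum(sizes)
    return [s / total for s in sizes]


def pps_draw(probs, n, rng):
    """Select n unit indices with replacement (assumption: with-replacement pps)."""
    return rng.choices(range(len(probs)), weights=probs, k=n)


def hansen_hurwitz_mean(y, probs, idx):
    """Hansen-Hurwitz estimator of the population mean of y."""
    n, N = len(idx), len(probs)
    return sum(y[i] / probs[i] for i in idx) / (n * N)


def hajek_mean(y, probs, idx):
    """Hajek (ratio) estimator: the estimated total divided by the estimated population size."""
    num = sum(y[i] / probs[i] for i in idx)
    den = sum(1.0 / probs[i] for i in idx)
    return num / den


def expected_total_cost(costs, probs, n):
    """Expected total pointwise access cost of n with-replacement draws."""
    return n * sum(p * c for p, c in zip(probs, costs))


# Synthetic illustration only: costs and squared prediction errors are made up.
rng = random.Random(42)
costs = [rng.uniform(1.0, 50.0) for _ in range(1000)]
sq_errors = [rng.gauss(0.0, 1.0) ** 2 for _ in range(1000)]
probs = pps_probabilities(costs)
sample = pps_draw(probs, 50, rng)
mse_hh = hansen_hurwitz_mean(sq_errors, probs, sample)
mse_hajek = hajek_mean(sq_errors, probs, sample)
budget = expected_total_cost(costs, probs, 50)
```

When the squared errors are only weakly related to the size variable, the Hansen-Hurwitz estimate inherits the variability of the inverse selection probabilities, whereas the Hajek estimate divides by the estimated population size and so cancels part of it. That is consistent with the smaller variance reported for pps(Hajek) in the abstract.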