A Generalization of Ripley's K Function for the Detection of Spatial Clustering in Areal Data
Stella Self, Anna Overby, Anja Zgodic, David White, Alexander McLain, Caitlin Dyckman
DOI: https://doi.org/10.48550/arXiv.2204.10852
2022-04-22
Methodology
Abstract: Spatial clustering detection has applications in diverse fields, including identifying infectious disease outbreaks, assessing land use patterns, pinpointing crime hotspots, and locating clusters of neurons in brain imaging. While spatial clustering analysis is most commonly performed on point process data, applications to areal data are frequently of interest. For example, researchers might wish to know whether census tracts with a case of a rare medical condition or an outbreak of an infectious disease tend to cluster together spatially. Since few spatial clustering methods are designed for areal data, researchers often reduce the areal data to point process data (e.g., using the centroid of each areal unit) and apply methods designed for point process data, such as Ripley's K function or the average nearest neighbor method. However, since these methods were not designed for areal data, a number of issues can arise. For example, we show that they can result in loss of power and/or a significantly inflated type I error rate. To address these issues, we propose a generalization of Ripley's K function designed specifically to detect spatial clustering in areal data. We compare its performance to that of the traditional Ripley's K function, the average nearest neighbor method, and the spatial scan statistic in an extensive simulation study. We then evaluate the real-world performance of the method by using it to detect spatial clustering in land parcels containing conservation easements and in US counties with high pediatric overweight/obesity rates.
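To make the centroid-based workaround described in the abstract concrete, the following is a minimal sketch of the traditional Ripley's K estimate applied to centroids of areal units. It is not the generalization proposed in the paper. The function name `ripley_k`, its parameters, and the toy coordinates are assumptions introduced for illustration, and the estimator shown omits edge correction, which practical analyses would normally include.

```python
import numpy as np

def ripley_k(points, radii, area):
    """Naive Ripley's K estimate (no edge correction; illustrative only).

    K(r) = area / (n * (n - 1)) * #{ordered pairs i != j with d_ij <= r}
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Pairwise Euclidean distances between all points.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep only the ordered pairs with i != j.
    pair_dist = dist[~np.eye(n, dtype=bool)]
    return np.array(
        [area * np.count_nonzero(pair_dist <= r) / (n * (n - 1)) for r in radii]
    )

# Hypothetical centroids of four areal units that contain a case,
# located in a study region of area 1.
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
k_values = ripley_k(centroids, radii=[0.5, 1.0, 1.5], area=1.0)
```

Under complete spatial randomness, K(r) is approximately pi * r^2, so estimates well above that value suggest clustering at scale r. Collapsing each areal unit to a single centroid discards its size and shape, which is the kind of information loss the abstract identifies as a source of reduced power or inflated type I error.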