Enabling Cost-Effective Population Health Monitoring By Exploiting Spatiotemporal Correlation: An Empirical Study

Dawei Chen, Jiangtao Wang, Wenjie Ruan, Qiang Ni, Sumi Helal
DOI: https://doi.org/10.48550/arXiv.2005.01423
2020-04-26
Abstract: Because of its important role in health policy-shaping, population health monitoring (PHM) is considered a fundamental block for public health services. However, traditional public health data collection approaches, such as clinic-visit-based data integration or health surveys, could be very costly and time-consuming. To address this challenge, this paper proposes a cost-effective approach called Compressive Population Health (CPH), where a subset of a given area is selected in terms of regions within the area for data collection in the traditional way, while leveraging inherent spatial correlations of neighboring regions to perform data inference for the rest of the area. By alternating selected regions longitudinally, this approach can validate and correct previously assessed spatial correlations. To verify whether the idea of CPH is feasible, we conduct an in-depth study based on spatiotemporal morbidity rates of chronic diseases in more than 500 regions around London for over ten years. We introduce our CPH approach and present three extensive analytical studies. The first confirms that significant spatiotemporal correlations do exist. In the second study, by deploying multiple state-of-the-art data recovery algorithms, we verify that these spatiotemporal correlations can be leveraged to do data inference accurately using only a small number of samples. Finally, we compare different methods for region selection for traditional data collection and show how such methods can further reduce the overall cost while maintaining high PHM quality.
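The survey-a-subset, infer-the-rest loop described in the abstract can be illustrated with a minimal Python sketch. It assumes each region is reduced to a single point `(x, y)` with one morbidity rate, picks the surveyed regions at random, and swaps in simple inverse-distance weighting (IDW) over the `k` nearest surveyed regions as the inference step. IDW is not one of the paper's recovery algorithms, and `idw_estimate`, `compressive_round`, `budget` and `seed` are hypothetical names chosen for illustration only.

```python
import math
import random


def idw_estimate(target, sampled, k=3, power=2.0):
    """Hypothetical IDW estimate of the rate at `target` (x, y).

    `sampled` is a list of ((x, y), rate) pairs from surveyed regions.
    The k nearest are averaged with weights 1 / distance**power.
    """
    if not sampled:
        raise ValueError("need at least one surveyed region")
    # (distance, rate) pairs for the k nearest surveyed regions.
    nearest = sorted((math.dist(target, xy), rate) for xy, rate in sampled)[:k]
    # A zero distance means this exact location was already observed.
    for d, rate in nearest:
        if d == 0.0:
            return rate
    weights = [1.0 / d ** power for d, _ in nearest]
    return sum(w * r for w, (_, r) in zip(weights, nearest)) / sum(weights)


def compressive_round(regions, budget, seed=0):
    """One hypothetical CPH round (random selection is an assumption).

    `regions` maps region id -> ((x, y), true_rate). Only `budget`
    regions are "surveyed" (their true rate is read); every other
    region gets an estimate inferred from its surveyed neighbours.
    """
    rng = random.Random(seed)
    surveyed = set(rng.sample(sorted(regions), budget))
    sampled = [regions[r] for r in surveyed]
    estimates = {}
    for rid, (xy, rate) in regions.items():
        estimates[rid] = rate if rid in surveyed else idw_estimate(xy, sampled)
    return surveyed, estimates
```

For example, on a toy 5 × 5 grid whose rate rises smoothly with position, `compressive_round(regions, budget=8, seed=1)` surveys 8 of the 25 regions and fills the remaining 17 from their neighbours; calling it again with another `seed` rotates the surveyed set, loosely mirroring the longitudinal alternation of selected regions. Random selection is only a baseline here; the paper's third study is precisely about choosing regions more cleverly.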
Applications
What problem does this paper attempt to address?
The main problem this paper addresses is the high cost and long turnaround of traditional Population Health Monitoring (PHM). Traditional PHM relies mainly on integrating clinic-visit data or on resident health surveys. These methods require substantial resources and time, and they also raise data-privacy concerns and technical-implementation complexity. To meet this challenge, the paper proposes a new method named "Compressive Population Health" (CPH). The core idea of CPH is to select a subset of regions within a given area for traditional data collection, and then use the data from those regions, together with the spatial correlation between neighboring regions, to infer the health status of the regions where no data were collected. In this way, CPH aims to carry out population health monitoring at lower cost and with higher efficiency.

### Main research questions of the paper:

1. **RQ1: Is there spatio-temporal correlation in population health monitoring data?**
   - Research background: Compressive sensing has been applied successfully in environmental monitoring because environmental data exhibit strong spatio-temporal correlation. Likewise, to verify the feasibility of CPH, one must first confirm whether population health data show a similar spatio-temporal correlation.
2. **RQ2: If such correlation exists, is it strong enough to support data inference for uncollected regions?**
   - Research objective: If spatio-temporal correlation does exist, verify whether existing missing-data-recovery algorithms can exploit it to infer the health data of uncollected regions accurately.
3. **RQ3: Does the choice of regions for traditional data collection affect the effectiveness of CPH?**
   - Research purpose: Explore how different region-selection strategies affect CPH performance, for example, whether choosing particular regions for data collection can further reduce cost or improve inference quality.

### Main contributions of the paper:

1. **Descriptive analysis**: By analyzing how the morbidity rates of multiple chronic diseases relate to spatial and temporal factors, the paper shows that differences in morbidity are significantly correlated with the distance between regions and with the time gap between years.
2. **Data inference model**: The paper formalizes data inference in CPH as a missing-data-filling problem and applies a series of state-of-the-art missing-data-recovery algorithms, evaluating their performance under different settings.
3. **Region-selection strategy**: The paper proposes and compares two methods for selecting the most informative regions (TS-A). The results show that an optimized region-selection strategy can significantly improve the effectiveness of CPH.

### Methods and data:

- **Dataset**: Ten years of chronic-disease morbidity data for more than 500 regions in London, together with the geographic information of these regions.
- **Spatial-correlation analysis**: Quantify the spatial correlation by computing the Euclidean (geographic) distance between each pair of regions and relating it to several dissimilarity measures between the two regions' morbidity data (such as arithmetic difference, Euclidean distance, dynamic time warping (DTW) cumulative distance, and Pearson distance).
- **Temporal-correlation analysis**: Quantify the temporal correlation by computing, for each region, the difference in morbidity rate between different years.
- **Data-recovery algorithms**: Apply algorithms such as User-based Collaborative Filtering (UCF), Item-based Collaborative Filtering (ICF), Non-negative Matrix Factorization (NMF), and High-order Tensor Decomposition (HOTD) to evaluate how accurately the missing data can be inferred.

Through these studies, the paper demonstrates the feasibility and effectiveness of exploiting spatio-temporal correlation for population health monitoring and offers a new way to reduce the cost of PHM.
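The dissimilarity measures and the collaborative-filtering idea in the methods list can be sketched in plain Python. The sketch assumes the data are held as a region × year matrix of morbidity rates, with `None` marking entries that were not surveyed. It implements three of the listed measures (Euclidean distance, DTW cumulative distance, Pearson distance) and a textbook mean-centred user-based collaborative filter in which each region plays the role of a "user" and each year an "item". The function names, the neighbourhood size `k`, and the use of Pearson similarity inside UCF are illustrative assumptions, not the paper's exact configuration.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length yearly rate series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Dynamic time warping cumulative distance with |x - y| as local cost."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def pearson_similarity(a, b):
    """Pearson correlation; 0.0 when undefined (n < 2 or zero variance)."""
    n = len(a)
    if n < 2:
        return 0.0
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0.0 or sb == 0.0:
        return 0.0
    return cov / (sa * sb)


def pearson_distance(a, b):
    """1 - Pearson correlation: 0 if perfectly correlated, 2 if anti-correlated."""
    return 1.0 - pearson_similarity(a, b)


def ucf_impute(matrix, k=2):
    """Fill None cells of a region x year matrix with user-based CF.

    Regions act as "users" and years as "items". Assumes every region
    has at least one observed year.
    """
    means = []
    for row in matrix:
        observed = [x for x in row if x is not None]
        means.append(sum(observed) / len(observed))

    filled = [list(row) for row in matrix]
    for u, row in enumerate(matrix):
        for t, value in enumerate(row):
            if value is not None:
                continue
            neighbours = []
            for v, other in enumerate(matrix):
                if v == u or other[t] is None:
                    continue
                # Compare the two regions only on years both have observed.
                common = [(x, y) for x, y in zip(row, other)
                          if x is not None and y is not None]
                sim = pearson_similarity([x for x, _ in common],
                                         [y for _, y in common])
                if sim > 0:
                    neighbours.append((sim, other[t] - means[v]))
            # Keep the k most similar, positively correlated regions.
            top = sorted(neighbours, reverse=True)[:k]
            if top:
                weighted = sum(s * dev for s, dev in top)
                filled[u][t] = means[u] + weighted / sum(s for s, _ in top)
            else:
                filled[u][t] = means[u]  # no usable neighbour: region mean
    return filled
```

For instance, with `[[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, None, 6]]`, both other regions move in lockstep with the third one (Pearson similarity 1), so the gap is filled with the third region's mean (13/3) plus their average deviation (0.5), about 4.83. That is slightly below the value 5 implied by the linear trend, because the region's mean is taken over its observed years only, a known bias of simple mean-centred CF.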