Federated Epidemic Surveillance

Ruiqi Lyu, Roni Rosenfeld, Bryan Wilder
2024-09-14
Abstract: Epidemic surveillance is a challenging task, especially when crucial data is fragmented across institutions and data custodians are unable or unwilling to share it. This study aims to explore the feasibility of a simple federated surveillance approach. The idea is to conduct hypothesis tests for a rise in counts behind each custodian's firewall and then combine p-values from these tests using techniques from meta-analysis. We propose a hypothesis testing framework to identify surges in epidemic-related data streams and conduct experiments on real and semi-synthetic data to assess the power of different p-value combination methods to detect surges without needing to combine the underlying counts. Our findings show that relatively simple combination methods achieve a high degree of fidelity and suggest that infectious disease outbreaks can be detected without needing to share even aggregate data across institutions.
Applications, Artificial Intelligence, Computers and Society, Methodology
What problem does this paper attempt to address?
The paper addresses how to conduct epidemic surveillance effectively when data is dispersed across institutions and difficult to share. Specifically, it explores a Federated Epidemic Surveillance method, which aims to detect epidemic outbreaks by running hypothesis tests at each data holder and aggregating the test results (such as p-values) without sharing raw or aggregated data.

### Background and Challenges

1. **Data Dispersion**: Epidemic surveillance requires real-time data, but in many countries, such as the United States, relevant data is typically held by multiple independent entities (e.g., hospitals, laboratories, insurance companies, and local governments). These entities often cannot or will not routinely share data, even aggregated time series.
2. **Privacy and Competition**: Even where data sharing is permitted from a privacy perspective (e.g., compliant with US HIPAA rules), other barriers such as competitiveness, commercial value, and reputation may make data holders reluctant to share.
3. **Limitations of Existing Methods**: Current epidemic surveillance systems rely on mandatory reporting under specific conditions, a cumbersome and passive process that cannot respond promptly to new public health threats.

### Solution

1. **Federated Epidemic Surveillance**: The core idea is that health information (including aggregated counts) never leaves the systems of the data holders. Each data holder shares only specific statistics of its data, such as p-values from hypothesis tests. These statistics are then aggregated to detect potential new epidemic trends.
2. **Hypothesis Testing Framework**: The paper proposes a hypothesis testing framework to identify surges in epidemic-related data streams. The steps are:
   - Conducting hypothesis tests at different sites to detect "surges."
   - Using meta-analysis methods (such as the Stouffer, Fisher, Pearson, and Tippett methods) to aggregate p-values from different sites into a single hypothesis test.

### Experiments and Results

1. **Theoretical Analysis**: The paper first analyzes an idealized generative model to evaluate the statistical power of different p-value combination methods.
2. **Real Data Experiments**: Experiments on two real datasets (COVID-19 hospitalization data and outpatient insurance claims data) validate the effectiveness of the Federated Epidemic Surveillance method in practical applications.
3. **Performance Comparison**: Results show that, when p-values are combined appropriately, the federated surveillance method closely recovers the information available from centralized data and detects epidemic surges. Specifically, the Stouffer method performs well on facility-level hospitalization data, while the Fisher method performs better on county-level insurance claims data.

### Conclusion

1. **Feasibility**: The Federated Epidemic Surveillance method performs well in a data-dispersed environment, enabling effective epidemic surveillance without sharing raw data.
2. **Optimization Directions**: Performance can be improved further by introducing auxiliary information (such as the relative share of each site) and weighting methods.
3. **Practical Application**: The method offers a new approach for modern epidemic surveillance systems, helping to address current and future public health threats.

Overall, the paper addresses the issues of data dispersion and privacy protection through the Federated Epidemic Surveillance method, providing a new approach to early detection of epidemic outbreaks.
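
The two-step procedure summarized above (a local surge test at each site, then a combination of the resulting p-values) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration rather than the paper's implementation: the summary does not state which local test is used, so the sketch assumes independent Poisson counts and a conditional binomial test; the growth threshold `theta`, the function names, and the example counts are hypothetical. Pearson's method is omitted for brevity, and the optional weights in `stouffer` only illustrate the weighting idea mentioned in the conclusion.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def surge_p_value(prev_count, curr_count, theta=0.3):
    """One-sided p-value for H0: the rate grew by at most a factor (1 + theta).

    Assumption: both counts are independent Poisson draws. Conditional on
    their total n, the current count is Binomial(n, q) with
    q = (1 + theta) / (2 + theta) at the boundary of the null.
    """
    n = prev_count + curr_count
    if n == 0:
        return 1.0  # no data, no evidence of a surge
    q = (1.0 + theta) / (2.0 + theta)
    log_q, log_1mq = math.log(q), math.log(1.0 - q)
    tail = 0.0
    for k in range(curr_count, n + 1):
        # Binomial pmf in log space to avoid overflow for large counts
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1) + k * log_q + (n - k) * log_1mq)
        tail += math.exp(log_pmf)
    return min(1.0, tail)


def _clip(p, eps=1e-15):
    """Keep p strictly inside (0, 1) so logs and quantiles stay finite."""
    return min(max(p, eps), 1.0 - eps)


def stouffer(p_values, weights=None):
    """Stouffer's method: combine z-scores, optionally weighted."""
    ps = [_clip(p) for p in p_values]
    w = weights or [1.0] * len(ps)
    z = sum(wi * _STD_NORMAL.inv_cdf(1.0 - p) for wi, p in zip(w, ps))
    z /= math.sqrt(sum(wi * wi for wi in w))
    return 1.0 - _STD_NORMAL.cdf(z)


def fisher(p_values):
    """Fisher's method: -2 * sum(log p) is chi-square with 2k df under H0."""
    ps = [_clip(p) for p in p_values]
    half = -sum(math.log(p) for p in ps)  # half of the test statistic
    # Closed-form chi-square survival function for even degrees of freedom
    term, total = 1.0, 1.0
    for i in range(1, len(ps)):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)


def tippett(p_values):
    """Tippett's method: based on the smallest p-value."""
    return 1.0 - (1.0 - min(p_values)) ** len(p_values)


if __name__ == "__main__":
    # Hypothetical (previous, current) counts held privately by three sites
    site_counts = [(20, 31), (8, 14), (45, 66)]
    # Each site computes its p-value locally; only these values are shared
    local_p = [surge_p_value(prev, curr) for prev, curr in site_counts]
    print("local p-values:", local_p)
    print("Stouffer:", stouffer(local_p))
    print("Fisher:  ", fisher(local_p))
    print("Tippett: ", tippett(local_p))
```

Only the per-site p-values cross institutional boundaries in this sketch; the counts themselves stay local. In practice a library routine such as `scipy.stats.combine_pvalues` could replace the hand-written combiners.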