Abstract:

Background: Multi-site studies facilitate the study of rare outcomes or exposures by integrating patient information from several distinct care sites. Because of patient privacy concerns, sharing patient-level information among collaborating sites is often prohibited, which creates a need for privacy-preserving data analysis methods. Several such methods exist, but they have been shown to sometimes produce biased estimates or to require extensive communication among sites.

Objective: We present a communication-efficient, privacy-preserving method for performing distributed regression on Electronic Health Records (EHR) data across multiple sites for zero-inflated count outcomes. Our approach is motivated by two real-world data problems: examining risk factors associated with pediatric avoidable hospitalization and modeling the frequency of serious adverse events in colorectal cancer patients.

Methods: We use hurdle regression, a two-part (logistic-Poisson) regression model, to characterize the effects of risk factors on zero-inflated count outcomes. We develop a one-shot algorithm for performing hurdle regression (ODAH) across multiple sites, using individual patient data at one site and aggregated data from all other sites to approximate the complete data log likelihood. We evaluate ODAH through extensive simulations and an application to EHR data from the Children's Hospital of Philadelphia (CHOP) and the OneFlorida Clinical Research Consortium. We compare ODAH estimates to those from meta-analysis and from pooled analysis (all patient data pooled together, the gold standard).

Results: In simulations, ODAH estimates exhibited bias relative to the gold standard of less than 0.1% across several settings. In contrast, meta-analysis estimates exhibited relative bias of up to 12.7%, depending largely on the event rate. When applying ODAH to CHOP data, relative biases for estimates in both components of the hurdle model were less than 5.1%, while meta-analysis estimates exhibited relative bias as high as 63.6%. When analyzing OneFlorida data, ODAH relative biases were less than 10% for eight of the ten estimated coefficients, while meta-analysis estimates again showed substantially greater bias.

Conclusions: Our simulations and real-world applications suggest ODAH is a promising method for performing privacy-preserving distributed learning on EHR data when modeling zero-inflated count outcomes.
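
The two-part hurdle model named in the Methods can be illustrated with a small log-likelihood sketch. This is not the ODAH algorithm and not the authors' code. It only evaluates an ordinary single-site hurdle log-likelihood, and it assumes the common formulation in which a logistic model governs whether the count is positive and a zero-truncated Poisson model governs the size of positive counts. The function name `hurdle_loglik` and the parameter names `beta` and `gamma` are hypothetical.

```python
import math


def hurdle_loglik(beta, gamma, X, y):
    """Log-likelihood of a logistic / zero-truncated Poisson hurdle model.

    Assumed formulation (not taken from the abstract):
      P(Y = 0)     = 1 - p,                       logit(p) = x . beta
      P(Y = k > 0) = p * Pois(k; mu) / (1 - e^-mu), log(mu) = x . gamma
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, x_i))       # logit of P(Y > 0)
        log_mu = sum(g * x for g, x in zip(gamma, x_i))   # log of the Poisson mean
        mu = math.exp(log_mu)
        # Numerically stable log(1 + exp(eta))
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        if y_i == 0:
            total += -log1pexp                            # log(1 - p)
        else:
            log_p = eta - log1pexp                        # log(p)
            log_pois = y_i * log_mu - mu - math.lgamma(y_i + 1)
            log_positive = math.log1p(-math.exp(-mu))     # log P(Poisson > 0)
            total += log_p + log_pois - log_positive
    return total


if __name__ == "__main__":
    # Toy data: an intercept plus one covariate per patient (made-up values).
    X = [[1.0, 0.5], [1.0, -1.2], [1.0, 0.0]]
    y = [0, 3, 1]
    print(hurdle_loglik([0.2, -0.4], [0.3, 0.1], X, y))
```

Because the zero part and the positive-count part share no parameters in this formulation, the log-likelihood separates into a logistic piece and a truncated-Poisson piece, so the two sets of coefficients can be estimated separately.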

An Efficient Fraud Identification Method Combining Manifold Learning and Outliers Detection in Mobile Healthcare Services

MedicareVis: a Joint Visual Analytics Approach for Anti-Fraud in Medical Insurance

FraudAuditor: A Visual Analytics Approach for Collusive Fraud in Health Insurance

Semi-supervised Optimal Transport with Self-paced Ensemble for Cross-hospital Sepsis Early Detection

A Novel Multi-view Bi-clustering method for identifying abnormal Co-occurrence medical visit behaviors

Building prediction models and discovering important factors of health insurance fraud using machine learning methods

Identifying fraud in medical insurance based on blockchain and deep learning

Pre-trained Online Contrastive Learning for Insurance Fraud Detection

Inpatients' FWA Detection - Mismatch Between the Clinical Path and Medical Condition

Design and development of big data-based model for detecting fraud in healthcare insurance industry

A Study of Health Insurance Fraud in China and Recommendations for Fraud Detection and Prevention

Mining Fraudsters and Fraudulent Strategies in Large-Scale Mobile Social Networks

Approaches for identifying U.S. medicare fraud in provider claims data

Multivariate outlier detection in medicare claims payments applying probabilistic programming methods

Accurate Map Matching Method for Mobile Phone Signaling Data under Spatio-Temporal Uncertainty

Distributed Learning from Multi-Site Observational Health Data for Zero-Inflated Count Outcomes

Fraud Detection in Mobile Payment Systems using an XGBoost-based Framework

Health insurance fraud detection based on multi-channel heterogeneous graph structure learning

Health insurance fraud detection by using an attributed heterogeneous information network with a hierarchical attention mechanism

Financial Fraud Detection in Healthcare Using Machine Learning and Deep Learning Techniques

Unsupervised Machine Learning for Explainable Health Care Fraud Detection