Medical Concept Representation Learning from Claims Data and Application to Health Plan Payment Risk Adjustment

Qiu-Yue Zhong,Andrew H. Fairless,Jasmine M. McCammon,Farbod Rahmanian
DOI: https://doi.org/10.48550/arXiv.1907.06600
2019-07-16
Abstract:Risk adjustment has become an increasingly important tool in healthcare. It has been extensively applied to payment adjustment for health plans to reflect the expected cost of providing coverage for members. Risk adjustment models are typically estimated using linear regression, which does not fully exploit the information in claims data. Moreover, the development of such linear regression models requires substantial domain expert knowledge and computational effort for data preprocessing. In this paper, we propose a novel approach for risk adjustment that uses semantic embeddings to represent patient medical histories. Embeddings efficiently represent medical concepts learned from diagnostic, procedure, and prescription codes in patients' medical histories. This approach substantially reduces the need for feature engineering. Our results show that models using embeddings had better performance than a commercial risk adjustment model on the task of prospective risk score prediction.
Machine Learning
What problem does this paper attempt to address?
The problem that this paper attempts to solve is: How to improve the risk - adjustment model in health insurance to more accurately predict patients' future medical expenses. Specifically, the author proposes a new method, using semantic embeddings to extract representations of patients' medical histories from claims data and applying it to risk - adjustment for health plan payments. ### Problem Background 1. **Importance of Risk - Adjustment** - Risk - adjustment plays a stabilizing role in the medical insurance market, aiming to reduce insurers' incentives to avoid high - cost patients. - Without risk - adjustment, insurers may be inclined to enroll healthy patients and avoid those with complex conditions. - Through risk - adjustment, insurers pay higher fees for patients expected to cost more (such as those with multiple chronic diseases) and lower fees for those expected to cost less. 2. **Limitations of Existing Models** - Current risk - adjustment models usually use linear regression for estimation, which fails to fully utilize the information in claims data, especially the interactions and non - linear relationships between variables. - Developing these linear regression models requires a great deal of domain - expert knowledge and data pre - processing work. - The performance of existing risk - adjustment models is limited (the \( R^2 \) ranges from 0.15 to 0.17), leaving much room for improvement. ### Proposed Method The author proposes a new method based on semantic embeddings, which specifically includes the following: 1. **Using Embedding Algorithms** - Utilize established and easy - to - implement embedding algorithms (such as doc2vec) to learn general patient - level representations from claims data without relying on medical - expert knowledge and heavy data pre - processing. 2. **Applying Embedded Representations** - Use the learned embedded representations to predict future risk scores and demonstrate their performance superiority over commercial risk - adjustment models. 3. **Experimental Design** - Use linear and non - linear machine - learning algorithms (such as RIDGE regression and XGBoost) for prediction and show the performance improvement of non - linear algorithms. ### Main Contributions - Propose a fast and easy - to - implement risk - adjustment method, reducing the dependence on domain - expert knowledge and the need for data pre - processing. - Demonstrate that the embedding representation has better prediction performance at the individual and group levels than existing commercial risk - adjustment tools. - The method can be widely applied to various prediction problems and may further improve performance. ### Summary The main objective of this paper is to improve the risk - adjustment model in health insurance by introducing semantic - embedding technology, thereby more accurately predicting patients' future medical expenses, reducing the opportunity for the system to be manipulated (such as the "up - coding" phenomenon), and improving the overall prediction accuracy.