Abstract: Risk adjustment has become an increasingly important tool in healthcare. It has been extensively applied to payment adjustment for health plans to reflect the expected cost of providing coverage for members. Risk adjustment models are typically estimated using linear regression, which does not fully exploit the information in claims data. Moreover, the development of such linear regression models requires substantial domain expert knowledge and computational effort for data preprocessing. In this paper, we propose a novel approach for risk adjustment that uses semantic embeddings to represent patient medical histories. Embeddings efficiently represent medical concepts learned from diagnostic, procedure, and prescription codes in patients' medical histories. This approach substantially reduces the need for feature engineering. Our results show that models using embeddings had better performance than a commercial risk adjustment model on the task of prospective risk score prediction.
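
To make the idea of representing a medical history from codes concrete, here is a minimal sketch of the kind of input an embedding algorithm could consume: one ordered sequence of code tokens per patient. The claim row layout, the `DX`/`PX`/`RX` prefixes, the sample codes, and the `build_patient_documents` helper are all assumptions made for illustration; the abstract does not specify how the claims are tokenized.

```python
from collections import defaultdict
from datetime import date

# Hypothetical claim rows: (patient_id, service_date, code_type, code).
# The layout and the example codes are illustrative, not taken from the paper.
claims = [
    ("p1", date(2019, 3, 2), "DX", "E11.9"),
    ("p1", date(2019, 1, 15), "PX", "99213"),
    ("p1", date(2019, 3, 2), "RX", "metformin"),
    ("p2", date(2019, 2, 10), "DX", "I10"),
]


def build_patient_documents(rows):
    """Group claim codes by patient and order them by service date."""
    by_patient = defaultdict(list)
    for patient_id, service_date, code_type, code in rows:
        # Prefix each code with its type so that diagnosis, procedure,
        # and prescription vocabularies cannot collide.
        by_patient[patient_id].append((service_date, f"{code_type}:{code}"))
    # Sorting (date, token) tuples orders by date, then by token for ties.
    return {
        patient_id: [token for _, token in sorted(events)]
        for patient_id, events in by_patient.items()
    }


documents = build_patient_documents(claims)
```

Each token list plays the role of a "document" for one patient. An embedding library could then learn a fixed-length vector per document, which is the step that replaces hand-built features.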

What problem does this paper attempt to address?

The problem this paper attempts to solve is how to improve risk adjustment models in health insurance so that they predict patients' future medical expenses more accurately. Specifically, the authors propose a new method that uses semantic embeddings to extract representations of patients' medical histories from claims data, and they apply it to risk adjustment for health plan payments.

### Problem Background

1. **Importance of Risk Adjustment**
   - Risk adjustment stabilizes the health insurance market by reducing insurers' incentives to avoid high-cost patients.
   - Without risk adjustment, insurers may be inclined to enroll healthy patients and avoid those with complex conditions.
   - With risk adjustment, insurers receive higher payments for enrollees expected to cost more (such as those with multiple chronic diseases) and lower payments for those expected to cost less.
2. **Limitations of Existing Models**
   - Current risk adjustment models are usually estimated with linear regression, which fails to fully use the information in claims data, especially interactions and non-linear relationships between variables.
   - Developing these linear regression models requires a great deal of domain expert knowledge and data preprocessing work.
   - The performance of existing risk adjustment models is limited (\( R^2 \) ranges from 0.15 to 0.17), leaving much room for improvement.

### Proposed Method

The authors propose a new method based on semantic embeddings, which consists of the following:

1. **Using Embedding Algorithms**
   - Use established, easy-to-implement embedding algorithms (such as doc2vec) to learn general patient-level representations from claims data without relying on medical expert knowledge or heavy data preprocessing.
2. **Applying Embedded Representations**
   - Use the learned representations to predict future risk scores and show that they outperform commercial risk adjustment models.
3. **Experimental Design**
   - Use linear and non-linear machine learning algorithms (such as ridge regression and XGBoost) for prediction and show the performance gain from non-linear algorithms.

### Main Contributions

- Propose a fast, easy-to-implement risk adjustment method that reduces the dependence on domain expert knowledge and the need for data preprocessing.
- Demonstrate that the embedding representation predicts better at both the individual and group levels than existing commercial risk adjustment tools.
- Note that the method can be applied to a wide range of prediction problems, where it may further improve performance.

### Summary

The main objective of this paper is to improve risk adjustment models in health insurance by introducing semantic embeddings, thereby predicting patients' future medical expenses more accurately, reducing the opportunity to manipulate the system (for example through "upcoding"), and improving overall prediction accuracy.
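
The summary above reports performance as \( R^2 \) and refers to accuracy at the individual and group levels. Here is a minimal sketch of how such metrics could be computed. The `r_squared` and `predictive_ratio` helpers and the toy numbers are assumptions for illustration. In particular, the predictive ratio (total predicted cost divided by total actual cost for a group) is a common group-level measure in risk adjustment, but the summary does not state which group-level metric the paper uses.

```python
def r_squared(actual, predicted):
    """Individual-level fit: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def predictive_ratio(actual, predicted):
    """Group-level fit: total predicted cost over total actual cost."""
    return sum(predicted) / sum(actual)


# Toy relative cost values for four members of one group (made up).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

individual_fit = r_squared(actual, predicted)
group_fit = predictive_ratio(actual, predicted)
```

A predictive ratio near 1 means the group is paid about what it costs in aggregate, even when individual predictions are off. This is why the two levels are usually reported separately.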

Medical Concept Representation Learning from Claims Data and Application to Health Plan Payment Risk Adjustment

Medical Concept Representation Learning from Electronic Health Records and its Application on Heart Failure Prediction

Exploiting Convolutional Neural Network for Risk Prediction with Medical Feature Embedding

Distributed representation of patients and its use for medical cost prediction

Evidential Concept Embedding Models: Towards Reliable Concept Explanations for Skin Disease Diagnosis

Language-model-based patient embedding using electronic health records facilitates phenotyping, disease forecasting, and progression analysis

Incorporating Medical Code Descriptions for Diagnosis Prediction in Healthcare

Generic medical concept embedding and time decay for diverse patient outcome prediction tasks

Application of Clinical Concept Embeddings for Heart Failure Prediction in UK EHR data

Clinical Risk Prediction Using Language Models: Benefits And Considerations

Medical Concept Embedding with Time-Aware Attention

Comparing neural language models for medical concept representation and patient trajectory prediction

MD-Manifold: A Medical-Distance-Based Representation Learning Approach for Medical Concept and Patient Representation

Quantifying risk factors in medical reports with a context-aware linear model

Learning Representations of Missing Data for Predicting Patient Outcomes

Integrated Convolutional and Recurrent Neural Networks for Health Risk Prediction using Patient Journey Data with Many Missing Values

Topic medical concept embedding: Multi-sense representation learning for medical concept

Deep Claim: Payer Response Prediction from Claims Data with Deep Learning

Multi-layer Representation Learning for Medical Concepts

Context-aware and Time-aware Attention-based Model for Disease Risk Prediction with Interpretability

Enriching Unsupervised User Embedding via Medical Concepts