Chronic Disease Diagnoses Using Behavioral Data

Di Wang, Yidan Hu, Eng Sing Lee, Hui Hwang Teong, Ray Tian Rui Lai, Wai Han Hoi, Chunyan Miao
2024-10-04
Abstract: Early detection of chronic diseases is beneficial to healthcare by providing a golden opportunity for timely interventions. Although numerous prior studies have successfully used machine learning (ML) models for disease diagnoses, they rely heavily on medical data, which are scarce for most patients in the early stages of chronic diseases. In this paper, we aim to diagnose hyperglycemia (diabetes), hyperlipidemia, and hypertension (collectively known as 3H) using our own collected behavioral data, thus enabling the early detection of 3H without using medical data collected in clinical settings. Specifically, we collected daily behavioral data from 629 participants over a 3-month study period, and trained various ML models after data preprocessing. Experimental results show that using only the participants' uploaded behavioral data, we can achieve accurate 3H diagnoses: 80.2%, 71.3%, and 81.2% for diabetes, hyperlipidemia, and hypertension, respectively. Furthermore, we conduct Shapley analysis on the trained models to identify the most influential features for each type of disease. The identified influential features are consistent with those reported in the literature.
Computers and Society
What problem does this paper attempt to address?
The problem that this paper attempts to solve is: **in the absence of clinical medical data, use daily behavioral data to make an early diagnosis of three chronic diseases (diabetes, hyperlipidemia, and hypertension, collectively referred to as 3H)**. Specifically, the authors collect and analyze users' daily behavioral data (such as sleep time, number of steps, and daily activities) to diagnose these three chronic diseases accurately, thereby providing an opportunity for early intervention.

### Background and the Importance of the Problem

1. **The Importance of Early Detection**:
   - Early detection of chronic diseases is crucial for healthcare because it provides a golden opportunity for timely intervention.
   - Although many studies have successfully used machine learning models for disease diagnosis, these methods depend heavily on medical data, and most patients lack sufficient medical data in the early stages of chronic diseases.
2. **The Advantages of Behavioral Data**:
   - Medical data is expensive and difficult to obtain comprehensively, especially in the early stages of chronic diseases.
   - Many people go to the hospital only when their health has deteriorated to the point where they must seek treatment, so early symptoms are not detected in time.
   - Behavioral data reflects a person's lifestyle, and lifestyle is one of the main causes of many chronic diseases.

### Research Objectives

The authors' objective is to answer the following research question:

- **Whether, and to what extent, can a person's 3H status be accurately assessed using only the collected daily behavioral data?**

### Method Overview

To answer this question, the authors carried out the following work:

1. **Data Collection**:
   - Developed a mobile application to collect daily behavioral data from 629 participants over 3 months.
   - The data includes sleep time, number of steps, daily activities and their durations, etc.
2. **Data Pre-processing**:
   - Handle missing values using Mean Imputation (MI) and k-Nearest Neighbor Imputation (KNNI).
   - Clean abnormal data to ensure data quality.
   - Extract features that represent behavior changes, such as the difference between the first and second halves of the study period.
3. **Model Training and Evaluation**:
   - Train and evaluate multiple machine learning algorithms (such as Random Forest, XGBoost, k-Nearest Neighbor, and Support Vector Machine).
   - The experimental results show that, using only behavioral data, diabetes, hyperlipidemia, and hypertension can be diagnosed with accuracies of 80.2%, 71.3%, and 81.2%, respectively.
4. **Feature Importance Analysis**:
   - Use Shapley value analysis to identify the features that have the greatest impact on each disease; the identified features are consistent with those reported in the literature.

### Conclusion

This study shows that daily behavioral data alone can be used to diagnose 3H diseases effectively at an early stage. According to the study, the diagnosis of hypertension in particular performs better than traditional blood pressure measurement methods. This result provides new ideas and tools for early chronic disease screening in large-scale populations.
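
Two of the pre-processing steps listed in the Method Overview, mean imputation of missing values and a feature capturing the change between the first and second halves of the study period, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sample step counts, and the exact definition of the change feature (mean of the second half minus mean of the first half) are assumptions made for illustration only.

```python
from statistics import mean
from typing import Optional, Sequence


def mean_impute(values: Sequence[Optional[float]]) -> list:
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a series with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def half_period_change(values: Sequence[float]) -> float:
    """Mean of the second half minus mean of the first half.

    Assumed definition; the source only says the differences between
    the two halves of the study period are used as features.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    mid = len(values) // 2
    return mean(values[mid:]) - mean(values[:mid])


# Hypothetical daily step counts for one participant; None marks a missing day.
daily_steps = [4000.0, None, 6000.0, 7000.0, None, 9000.0]

filled = mean_impute(daily_steps)     # missing days become 6500.0
change = half_period_change(filled)   # 7500.0 - 5500.0 = 2000.0
print(filled, change)
```

A positive value of the change feature would indicate that the participant walked more in the second half of the period than in the first. k-Nearest Neighbor imputation, the other method named above, would instead fill each gap from the most similar participants rather than from the participant's own average.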