A Data Mining Approach to Predict Risk of Cardiovascular

Shaopeng Ma, Xiong Chen
DOI: https://doi.org/10.1063/1.5085527
2019-01-01
Abstract: Cardiovascular disease poses a growing threat to human health, and accurate prediction of a patient's condition is important for early prevention. This paper describes our research on predicting patients' risk of cardiovascular disease from their physical examination reports. We quantify this risk using five indicators: systolic pressure, diastolic pressure, triglyceride, high-density lipoprotein cholesterol, and low-density lipoprotein cholesterol. To extract useful information from the medical records, we apply natural language processing (NLP) methods. To convert sentences into numerical data, we use the term frequency-inverse document frequency (TF-IDF) algorithm, which extracts the most informative terms from the medical reports. Principal component analysis (PCA) is then used to reduce the high dimensionality of the text features. Additionally, we extract easily transformed numerical features and categorical features. Combining all of these features, we use the xgboost algorithm to make the final predictions. The results are good: the mean squared error and relative error are kept at an acceptably low level.
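To make the text-to-number step concrete, here is a minimal sketch of TF-IDF weighting in plain Python. It is an illustration only, not the authors' implementation: the abstract does not say which TF-IDF variant or tokenizer was used, so the unsmoothed formula (relative term frequency times log(N / document frequency)), the function name `tfidf`, and the three toy reports are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, matrix) of TF-IDF weights for tokenized documents.

    Hypothetical helper for illustration; uses the plain, unsmoothed
    definition tf * log(N / df), which may differ from the paper's variant.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        row = []
        for term in vocab:
            tf = counts[term] / length          # relative term frequency
            idf = math.log(n_docs / df[term])   # rarer terms weigh more
            row.append(tf * idf)
        matrix.append(row)
    return vocab, matrix


# Toy, invented examination notes (not data from the paper).
reports = [
    "blood pressure elevated".split(),
    "blood lipid normal".split(),
    "blood pressure normal".split(),
]

vocab, weights = tfidf(reports)
for term, w in zip(vocab, weights[0]):
    print(f"{term:10s} {w:.4f}")
```

In this toy corpus, "blood" appears in every report and therefore receives weight zero, while "elevated", which appears in only one report, receives the largest weight in the first row. In the pipeline the abstract describes, rows like these would next be reduced with PCA and combined with the numerical and categorical features before regression; those stages are not shown here.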