Abstract: Heart disease has become one of the most serious diseases affecting human life, and over the last decade it has emerged as one of the leading causes of mortality worldwide. Accurate and timely diagnosis is essential to protect patients from further damage. Non-invasive, artificial intelligence-based techniques have recently come into use in the medical field. Machine learning in particular offers several algorithms that are widely used and can help diagnose heart disease accurately and in less time. However, predicting heart disease is not an easy task. The growing size of medical datasets makes it difficult for practitioners to understand complex feature relations and make disease predictions. Accordingly, the aim of this research is to identify the most important risk factors in a high-dimensional dataset, so that heart disease can be classified accurately and with fewer complications. For a broader analysis, we used two heart disease datasets with various medical features. First, we analyzed the correlation and interdependence of the medical features in the context of heart disease. Second, we applied a filter-based feature selection technique to both datasets to select the most relevant features (an optimal reduced feature subset) for detecting heart disease. Finally, we investigated various machine learning classification models, using both the complete feature set and the reduced feature subset as inputs. The trained classifiers were evaluated on accuracy, the Receiver Operating Characteristic (ROC) curve, and F1-score. The classification results showed that relevant features have a strong impact on classification accuracy. Even with a reduced number of features, the performance of the classification models improved significantly, and training time was shorter, compared with models trained on the full feature set.
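
The filter-based selection step described above can be illustrated with a minimal sketch. The abstract does not say which filter criterion was used, so the absolute Pearson correlation between each feature and the label is an assumption here. The feature names, the toy rows, and the helper names `pearson` and `select_top_k` are likewise hypothetical and are not taken from the paper's datasets.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column carries no information about the label.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_k(rows, labels, feature_names, k):
    """Rank features by |correlation with the label| and keep the top k.

    This is a filter method: features are scored independently of any
    classifier, before training. The scoring criterion is an assumption.
    """
    scores = {}
    for j, name in enumerate(feature_names):
        column = [row[j] for row in rows]
        scores[name] = abs(pearson(column, labels))
    ranked = sorted(feature_names, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: three features, six patients, binary label.
feature_names = ["age", "chol", "noise"]
rows = [
    [63, 233, 3],
    [37, 250, 1],
    [41, 204, 5],
    [56, 236, 1],
    [57, 354, 5],
    [44, 263, 3],
]
labels = [1, 0, 0, 1, 1, 0]

selected, scores = select_top_k(rows, labels, feature_names, k=2)
print(selected)
```

On this toy data the uninformative `noise` column scores zero and is dropped. A classifier would then be trained once on all columns and once on the selected columns, and the two runs compared on accuracy, ROC, and F1-score.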
