A Deep Learning Approach to Diabetes Diagnosis

Zeyu Zhang, Khandaker Asif Ahmed, Md Rakibul Hasan, Tom Gedeon, Md Zakir Hossain
DOI: https://doi.org/10.48550/arXiv.2403.07483
IF: 5.414
2024-03-12
Machine Learning
Abstract: Diabetes, resulting from inadequate insulin production or utilization, causes extensive harm to the body. Existing diagnostic methods are often invasive and come with drawbacks, such as cost constraints. Although there are machine learning models like Classwise k Nearest Neighbor (CkNN) and General Regression Neural Network (GRNN), they struggle with imbalanced data and result in under-performance. Leveraging advancements in sensor technology and machine learning, we propose a non-invasive diabetes diagnosis using a Back Propagation Neural Network (BPNN) with batch normalization, incorporating data re-sampling and normalization for class balancing. Our method addresses existing challenges such as limited performance associated with traditional machine learning. Experimental results on three datasets show significant improvements in overall accuracy, sensitivity, and specificity compared to traditional methods. Notably, we achieve accuracies of 89.81% in Pima diabetes dataset, 75.49% in CDC BRFSS2015 dataset, and 95.28% in Mesra Diabetes dataset. This underscores the potential of deep learning models for robust diabetes diagnosis. See project website https://steve-zeyu-zhang.github.io/DiabetesDiagnosis/
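The re-sampling step for class balancing mentioned in the abstract can be illustrated with a small example. This is a minimal sketch that assumes random undersampling of the larger class, which is only one possible form of re-sampling; the abstract does not give the exact procedure. The function name `random_undersample`, the seed, and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples from larger classes so that every class
    ends up with as many samples as the smallest class.

    Assumption: plain random undersampling; the paper's own procedure
    may differ.
    """
    rng = random.Random(seed)

    # Group samples by their class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Size of the smallest class sets the target size for all classes.
    n_min = min(len(xs) for xs in by_class.values())

    # Draw n_min samples without replacement from each class.
    balanced = []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):
            balanced.append((x, y))

    # Shuffle so the classes are not stored in contiguous blocks.
    rng.shuffle(balanced)
    return [x for x, _ in balanced], [y for _, y in balanced]


# Hypothetical toy data: 8 non-diabetic (0) and 3 diabetic (1) records.
features = [[float(i)] for i in range(11)]
labels = [0] * 8 + [1] * 3

bal_x, bal_y = random_undersample(features, labels)
print(len(bal_x), bal_y.count(0), bal_y.count(1))
```

After balancing, both classes contain the same number of records, so a classifier trained on the result is not rewarded for always predicting the majority class. The cost is that some majority-class records are discarded.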
What problem does this paper attempt to address?
The paper addresses shortcomings of existing diabetes diagnosis, in particular the invasiveness and cost of current diagnostic methods. Its main points are:

1. **Non-invasive Diabetes Diagnosis**: Using sensor technology and machine learning, the authors develop a non-invasive diabetes diagnosis model based on a Back Propagation Neural Network (BPNN). This avoids the inconvenience and pain of the blood sampling that traditional methods require.
2. **Addressing Data Imbalance**: Existing machine learning models such as Classwise k Nearest Neighbor (CkNN) and General Regression Neural Network (GRNN) perform poorly on imbalanced data. The paper balances the dataset through undersampling, which improves the classifier's performance on minority classes.
3. **Model Performance Enhancement**: The proposed BPNN model, combined with Batch Normalization, was experimentally validated on three datasets. The results show that it outperforms traditional methods in overall accuracy, sensitivity, and specificity:
   - Pima Diabetes Dataset: Accuracy 89.81%, Sensitivity 89.29%, Specificity 90.38%.
   - CDC BRFSS2015 Dataset: Accuracy 75.49%, Sensitivity 79.77%, Specificity 71.12%.
   - Mesra Diabetes Dataset: Accuracy 95.28%, Sensitivity 100%, Specificity 92.19%.
4. **Comprehensive Evaluation and Comparison**: The paper also compares the proposed BPNN model with several other commonly used machine learning methods, including CkNN, GA-MLP, and GRNN. The results indicate that the proposed BPNN model achieves better diagnostic performance across multiple datasets.

In summary, the main objective of this paper is to develop a non-invasive diabetes diagnosis method based on BPNN that addresses the drawbacks of traditional diagnostic methods and achieves higher accuracy, sensitivity, and specificity than the compared methods on the three datasets.
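The accuracy, sensitivity, and specificity figures quoted in this summary follow the standard confusion-matrix definitions: accuracy = (TP + TN) / total, sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP). The sketch below shows how such figures are computed. It assumes that the positive class (label `1`) means "diabetic"; the function names and the toy predictions are hypothetical and are not taken from the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1  # diabetic, predicted diabetic
        elif t == positive:
            fn += 1  # diabetic, predicted non-diabetic
        elif p == positive:
            fp += 1  # non-diabetic, predicted diabetic
        else:
            tn += 1  # non-diabetic, predicted non-diabetic
    return tp, fp, tn, fn


def diagnostic_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and specificity as fractions in [0, 1].

    A metric whose denominator is zero is reported as NaN.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Hypothetical toy labels: 4 diabetic and 6 non-diabetic cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

metrics = diagnostic_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Reporting sensitivity and specificity alongside accuracy matters on imbalanced data, because a model that always predicts the majority class can reach high accuracy while missing every positive case.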