Quantum Machine Learning Applied to the Classification of Diabetes

Juan Kenyhy Hancco-Quispe, Jordan Piero Borda-Colque, Fred Torres-Cruz
DOI: https://doi.org/10.48550/arXiv.2301.00109
2022-12-31
Abstract: Quantum Machine Learning (QML) retains certain significant advantages over classical machine learning methods. Hybrid quantum methods have great scope for deployment and optimisation, and hold promise for future industries. As a weakness, quantum computing does not yet have enough qubits to realise its potential. This line of study gives encouraging results for improving quantum coding. Because data preprocessing is an important point in this research, we employ two dimensionality reduction techniques, LDA and PCA, and apply them in a hybrid way with the Quantum Support Vector Classifier (QSVC) and the Variational Quantum Classifier (VQC) for the classification of diabetes.
Machine Learning
What problem does this paper attempt to address?
The problem this paper attempts to solve is: **use quantum machine learning (QML) methods to classify diabetes, and compare them with classical machine learning methods to evaluate the potential and advantages of QML in practical applications**. Specifically, the goals of the paper are:

1. **Explore the advantages of quantum machine learning**: Apply the quantum support vector classifier (QSVC) and the variational quantum classifier (VQC) to the diabetes classification task to show the potential advantages of quantum machine learning over classical machine learning.
2. **Data preprocessing and dimensionality reduction**: Study how two dimensionality reduction techniques, linear discriminant analysis (LDA) and principal component analysis (PCA), perform on the diabetes data set, with the aim of simplifying the data structure and improving classification performance.
3. **Model comparison**: Compare the quantum machine learning models (QSVC and VQC) with classical machine learning models (such as logistic regression, decision tree, K-nearest neighbors, and naive Bayes), and evaluate how each method performs on the diabetes classification task.

### Main content of the paper

- **Background**: The paper introduces the wide use of machine learning for classification problems, especially in the medical field. With the development of quantum computing, quantum machine learning has become a new research focus, although current hardware limitations of quantum computers (such as an insufficient number of qubits) still pose challenges.
- **Data set**: The paper uses the Pima Indians Diabetes Database, which contains eight feature variables (number of pregnancies, glucose concentration, blood pressure, skin thickness, insulin level, BMI, diabetes pedigree function, and age). The goal is to predict whether a patient has diabetes.
- **Methods**:
  - **Dimensionality reduction**: LDA and PCA are used to reduce the dimension of the data. LDA focuses on maximizing the separation between classes, while PCA focuses on maximizing the retained data variance.
  - **Quantum encoding**: Angle encoding and the ZZ feature map are used to convert classical data into quantum representations.
  - **Model selection**: Classical machine learning models and quantum machine learning models (QSVC and VQC) are each applied to the classification task and compared with the same evaluation metrics (precision, recall, F1-score, and balanced accuracy).
- **Results**: The paper reports the performance differences between the models on the diabetes classification task, including the metrics on which the quantum models outperform the classical ones.
- **Discussion and conclusion**: The paper summarizes the potential of quantum machine learning for diabetes classification. Although the hardware limitations of current quantum computers remain, quantum machine learning performs well in some respects and is expected to improve further.

### Formula examples

1. **Probability formula of the logistic regression model**:
   \[ P(y_i = 1)=\frac{1}{1 + e^{-(\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\cdots+\beta_nx_{in})}} \]
   where \(P(y_i = 1)\) is the probability that sample \(i\) belongs to class 1, \(\beta_0\) is the intercept, \(\beta_1,\dots,\beta_n\) are the regression coefficients, and \(x_{i1},\dots,x_{in}\) are the features of sample \(i\).
2. **Main differences between LDA and PCA**:
   - LDA is supervised: it uses the class labels to maximize the separation between classes, and it is suited to data with a small number of classes.
   - PCA is unsupervised: it maximizes the retained data variance without using labels, and it is suited to data with a large number of features.
3. **Evaluation metrics**:
   - **Precision**: \[ \text{Precision}=\frac{TP}{TP + FP} \]
   - **Recall**: \[ \text{Recall}=\frac{TP}{TP + FN} \]
   - **F1-score**: \[ \text{F1-score}=2\cdot\frac{\text{Precision}\cdot\text{Recall}}{\text{Precision}+\text{Recall}} \]
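
The formulas in this section can be sketched in plain Python. This is a minimal illustration and not the authors' code. The function names `logistic_probability` and `classification_metrics` are hypothetical. The F1-score uses the standard harmonic mean of precision and recall, and balanced accuracy uses the usual binary definition (the mean of recall and specificity), which is assumed here to match the paper's usage.

```python
import math


def logistic_probability(features, coefficients, intercept):
    """Return P(y = 1) for one sample under a logistic regression model.

    `intercept` plays the role of beta_0 and `coefficients` the role of
    beta_1..beta_n in the formula above.
    """
    linear_term = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear_term))


def classification_metrics(tp, fp, fn, tn):
    """Compute binary classification metrics from confusion-matrix counts.

    Each ratio falls back to 0.0 when its denominator is zero, which is a
    convention chosen for this sketch.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # F1-score: harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0

    # Balanced accuracy (binary case): mean of recall and specificity.
    balanced_accuracy = (recall + specificity) / 2

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
    }


# Made-up counts for illustration only; these are not results from the paper.
example_metrics = classification_metrics(tp=40, fp=10, fn=20, tn=80)
```

Balanced accuracy is a reasonable companion to the F1-score when the two classes are of unequal size, because it weights the error rate on each class equally.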