Intelligent medical heterogeneous big data set balanced clustering using deep learning

Xiaofeng Li, Hongshuang Jiao, Dong Li
DOI: https://doi.org/10.1016/j.patrec.2020.08.027
IF: 4.757
2020-10-01
Pattern Recognition Letters
Abstract: Traditional methods for clustering intelligent medical data do not preprocess the data sets, which leads to a large amount of computation, low efficiency, and a large cluster center offset distance. To address this problem, we propose a balanced clustering algorithm for intelligent medical heterogeneous big data sets using deep learning. First, a deep neural network model based on incremental updating is constructed. It is adaptively trained and adjusted according to the data scale and performs multi-layer feature learning on intelligent medical heterogeneous big data sets. Second, under-sampling preprocessing is applied to the data set so that the heterogeneous big data set is in a balanced state, and clustering of the heterogeneous big data is then computed on this basis. Third, the cluster centers are set from the kernel density estimation results and updated iteratively until convergence by combining the features obtained from deep learning with Euclidean distance calculations, which completes the balanced clustering of the intelligent medical heterogeneous big data set. The results show that the proposed algorithm achieves a small cluster center offset distance, short clustering time, low energy consumption, and high Macro-F1 and NMI values; its clustering accuracy reaches up to 95%, and its computational cost is low.
© 2020 Elsevier Ltd. All rights reserved.
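The under-sampling, kernel density initialization, and Euclidean center-update steps described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: random under-sampling to the size of the smallest class stands in for the paper's under-sampling scheme (and assumes class labels are available for balancing), initial centers are chosen by an assumed density-peaks-style rule (Gaussian KDE density multiplied by distance to the already chosen centers), the update is a standard k-means-style mean update, and raw feature vectors stand in for the deep features. All function names (`random_undersample`, `kde_initial_centers`, `balanced_cluster`) and parameters such as `bandwidth` and `tol` are hypothetical.

```python
import math
import random
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def random_undersample(points, labels, seed=0):
    """Assumed scheme: randomly keep n_min samples per class (n_min = smallest class size)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for p, y in zip(points, labels):
        by_class[y].append(p)
    n_min = min(len(v) for v in by_class.values())
    out_points, out_labels = [], []
    for y in sorted(by_class):
        for p in rng.sample(by_class[y], n_min):
            out_points.append(p)
            out_labels.append(y)
    return out_points, out_labels


def kde_density(points, bandwidth):
    """Unnormalized Gaussian kernel density estimate evaluated at every point (O(n^2))."""
    return [
        sum(math.exp(-euclidean(p, q) ** 2 / (2 * bandwidth ** 2)) for q in points)
        for p in points
    ]


def kde_initial_centers(points, k, bandwidth):
    """Assumed density-peaks-style rule: start at the densest point, then repeatedly
    add the point maximizing density * distance to its nearest chosen center."""
    if k > len(points):
        raise ValueError("k must not exceed the number of points")
    density = kde_density(points, bandwidth)
    chosen = [max(range(len(points)), key=lambda i: density[i])]
    while len(chosen) < k:
        chosen.append(max(
            range(len(points)),
            key=lambda i: density[i] * min(euclidean(points[i], points[c]) for c in chosen),
        ))
    return [list(points[i]) for i in chosen]


def nearest(p, centers):
    """Index of the center closest to p in Euclidean distance."""
    return min(range(len(centers)), key=lambda j: euclidean(p, centers[j]))


def balanced_cluster(points, k, bandwidth=1.0, max_iter=100, tol=1e-6):
    """k-means-style loop: assign each point to its nearest center, move each center
    to the mean of its members, and stop once no center moves more than tol."""
    centers = kde_initial_centers(points, k, bandwidth)
    for _ in range(max_iter):
        assign = [nearest(p, centers) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            # An empty cluster keeps its previous center.
            new_centers.append(
                [sum(col) / len(members) for col in zip(*members)] if members else centers[j]
            )
        shift = max(euclidean(c, n) for c, n in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, [nearest(p, centers) for p in points]
```

Because the density is computed over all pairs of points, this initialization is quadratic in the number of samples; it suits a toy example, but a big-data setting would need a subsampled or approximate density estimate.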
computer science, artificial intelligence
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses the clustering of intelligent medical data sets. When handling heterogeneous big data sets in intelligent healthcare, traditional methods face the following issues:

1. **Insufficient Preprocessing**: The data sets are not adequately preprocessed, which leads to a heavy computational load, low efficiency, and a large cluster center offset distance.
2. **Cluster Center Shift**: Cluster centers tend to drift, which reduces clustering accuracy.
3. **High Computational Complexity**: Existing clustering algorithms are computationally expensive, which degrades the clustering results.

To solve these problems, the authors propose a deep-learning-based balanced clustering algorithm for heterogeneous big data sets in intelligent healthcare. Its main components are:

1. **Building a Deep Neural Network Model**: A network based on incremental updating performs multi-layer feature learning to extract complex features from the data set, which serve as the basis for clustering.
2. **Data Preprocessing**: Under-sampling balances the data set, and clustering of the heterogeneous big data is then performed on the balanced data.
3. **Iterative Update of Cluster Centers**: Initial cluster centers are set from the kernel density estimation results and then updated iteratively until convergence, using the features learned by the network and Euclidean distance, which completes the balanced clustering of the heterogeneous big data set.
4. **Performance Validation**: The algorithm is evaluated against several criteria; the reported results show that it outperforms traditional methods in clustering time, clustering energy consumption, clustering accuracy, and computational cost.

According to the authors, these improvements allow the algorithm to cluster heterogeneous big data sets in intelligent healthcare effectively, improving both clustering accuracy and efficiency.
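The abstract reports Macro-F1, NMI, and cluster center offset distance as evaluation criteria. The sketch below shows one common way to compute these for a clustering result; it is illustrative only, and the paper's exact definitions may differ. Assumptions: each cluster is mapped to its majority true class before computing Macro-F1, NMI is normalized by the arithmetic mean of the two entropies, and the offset distance is taken as the mean Euclidean distance from each reference center to its nearest found center. The function names (`nmi`, `macro_f1_majority`, `center_offset`) are hypothetical.

```python
import math
from collections import Counter


def nmi(labels_true, labels_pred):
    """Normalized mutual information, normalized by the arithmetic mean of the entropies."""
    n = len(labels_true)
    count_true, count_pred = Counter(labels_true), Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mutual_info = sum(
        (nij / n) * math.log(n * nij / (count_true[y] * count_pred[c]))
        for (y, c), nij in joint.items()
    )

    def entropy(counts):
        return -sum((v / n) * math.log(v / n) for v in counts.values())

    denom = (entropy(count_true) + entropy(count_pred)) / 2
    # Both labelings trivial (one class and one cluster): treat them as identical.
    return 1.0 if denom == 0 else mutual_info / denom


def macro_f1_majority(labels_true, labels_pred):
    """Assumed matching rule: map each cluster to its majority true class,
    then average the per-class F1 scores over all true classes."""
    mapping = {}
    for c in set(labels_pred):
        members = [y for y, p in zip(labels_true, labels_pred) if p == c]
        mapping[c] = Counter(members).most_common(1)[0][0]
    pred = [mapping[p] for p in labels_pred]
    f1_scores = []
    for cls in sorted(set(labels_true)):
        tp = sum(1 for y, p in zip(labels_true, pred) if y == cls and p == cls)
        fp = sum(1 for y, p in zip(labels_true, pred) if y != cls and p == cls)
        fn = sum(1 for y, p in zip(labels_true, pred) if y == cls and p != cls)
        f1_scores.append(0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn))
    return sum(f1_scores) / len(f1_scores)


def center_offset(found, reference):
    """Assumed definition: mean distance from each reference center to its nearest found center."""
    return sum(min(math.dist(r, f) for f in found) for r in reference) / len(reference)
```

Both NMI and the majority-mapped Macro-F1 are invariant to how cluster indices are numbered, which is why a clustering that swaps labels 0 and 1 still scores perfectly.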