A novel Chinese herbal medicine clustering algorithm via artificial bee colony optimization

Nan Han, Shaojie Qiao, Guan Yuan, Ping Huang, Dingxiang Liu, Kun Yue
DOI: https://doi.org/10.1016/j.artmed.2019.101760
IF: 7.011
2019-11-01
Artificial Intelligence in Medicine
Abstract:<p>Traditional Chinese medicine (TCM) has become popular and is viewed as an effective clinical treatment across the world. Accordingly, there is an ever-increasing interest in performing data analysis over TCM data. To cope with the excessive dependence of traditional clustering algorithms on empirical values when selecting cluster centers, an improved artificial bee colony algorithm is proposed that selects cluster centers automatically, and it is applied to cluster Chinese herbal medicines. The proposed method, called IABC-DP, integrates the following new techniques: (1) improving the artificial bee colony algorithm by applying a new searching strategy of neighbour nectar, and (2) employing the improved artificial bee colony algorithm to optimize the parameters of the clustering algorithm by fast search and finding of density peaks (called the DP algorithm), namely the cutoff distance <em>d</em><sub><em>c</em></sub>, the local density <em>ρ</em><sub><em>i</em></sub> and the minimum distance <em>δ</em><sub><em>i</em></sub> between the element <em>i</em> and any other element with higher density, in order to find the optimal cluster centers and cluster herbal medicines accurately while maintaining runtime performance. Extensive experiments were conducted on the UCI benchmark datasets and the TCM datasets, and the results verify the effectiveness of the proposed method by comparing it with classical clustering algorithms, including K-means, K-medoids and DBSCAN, on multiple evaluation metrics: Silhouette Coefficient, Entropy, Purity, Precision, Recall and F1-Measure. The results show that the IABC-DP algorithm outperforms the other approaches in clustering quality and accuracy on both the UCI and the TCM datasets. In addition, the improved artificial bee colony algorithm effectively reduces the number of iterations compared with the traditional bee colony algorithm. In particular, the IABC-DP algorithm is applied to cluster multi-dimensional Chinese herbal medicines, and the result shows that it outperforms other clustering algorithms on this task, which can contribute to a larger effort targeted at advancing the study of discovering composition rules of traditional Chinese prescriptions.</p>
engineering, biomedical; computer science, artificial intelligence; medical informatics
What problem does this paper attempt to address?
The problem this paper addresses is that, when clustering traditional Chinese medicine (TCM) data, traditional clustering algorithms rely too heavily on empirical values to select cluster centers. Specifically, the authors propose a method based on artificial bee colony (ABC) optimization that selects cluster centers automatically, and they apply it to the clustering analysis of Chinese herbal medicines. The method aims to improve the accuracy and efficiency of clustering while reducing the dependence on empirical values, so that the composition rules of traditional Chinese prescriptions can be mined and understood more effectively.

### Core problems of the paper

1. **Dependence on empirical values**: Traditional clustering algorithms usually require parameters to be set manually when selecting cluster centers, which leads to excessive dependence on empirical values.
2. **Parameter optimization**: The key parameters of the density peaks (DP) clustering algorithm, namely the cutoff distance \(d_c\), the local density \(\rho_i\) and the minimum distance \(\delta_i\), need to be optimized automatically in order to find the optimal cluster centers.
3. **Complexity of TCM data**: TCM data is highly complex and diverse, and traditional clustering methods have difficulty processing it effectively.

### Solutions

The authors propose an improved artificial bee colony (IABC) algorithm and address the problems above in three steps:

1. **Pre-process TCM data**: Build a TCM database containing multiple tables, such as drugs, prescriptions, dosages and prescription effects, through ETL (Extract-Transform-Load) operations.
2. **Parameter optimization**: Use the improved ABC algorithm to automatically determine the key parameter combination (\(d_c\), \(\rho_i\) and \(\delta_i\)) in the DP algorithm. The improved ABC algorithm introduces a new neighbor search strategy and uses two update factors, \(\phi_1\) and \(\phi_2\), to steer the search toward the best food source quickly in the early stage.
3. **Clustering analysis**: Based on the optimized parameter combination, use the DP algorithm to cluster Chinese herbal medicines efficiently.

### Experimental verification

The authors conducted extensive experiments on UCI benchmark datasets and TCM datasets, and verified the effectiveness of the proposed method with multiple evaluation metrics (silhouette coefficient, entropy, purity, precision, recall and F1-measure). The experimental results show that the IABC-DP algorithm is superior to classical clustering algorithms (K-means, K-medoids and DBSCAN) in clustering quality and accuracy on both the UCI and the TCM datasets.

### Main contributions

1. **Automatic parameter optimization**: The improved ABC algorithm automatically optimizes the key parameters of the DP algorithm, reducing the dependence on empirical values.
2. **Efficient clustering**: The proposed IABC-DP algorithm improves the accuracy and quality of clustering while maintaining runtime performance.
3. **TCM data mining**: The method can be applied effectively to the clustering of multi-dimensional TCM data, which helps in discovering the composition rules of traditional Chinese prescriptions and supports TCM research.

Through these improvements, the paper provides an effective method for dealing with the complexity and diversity of TCM data, and offers new ideas and technical support for research on the modernization of traditional Chinese medicine.
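
To make the roles of \(d_c\), \(\rho_i\) and \(\delta_i\) concrete, here is a minimal Python sketch of the density peaks (DP) step for a fixed cutoff distance. It is an illustration under stated assumptions, not the authors' implementation: the function name `density_peaks`, the `n_centers` parameter, the tie-breaking rule and the use of the product \(\rho_i \delta_i\) to rank candidate centers (a commonly used heuristic) are all choices made here. The IABC search that the paper uses to tune the parameters is not reproduced, because the summary does not give its update rules; in this sketch `d_c` is simply passed in.

```python
import numpy as np


def density_peaks(points, d_c, n_centers):
    """Density-peaks clustering for a fixed cutoff distance d_c (illustrative sketch)."""
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Pairwise Euclidean distances between all elements.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    # Local density rho_i: number of other elements closer than d_c.
    rho = (dist < d_c).sum(axis=1) - 1

    # Visit elements from densest to sparsest. Assumption: density ties are
    # broken by index, so an earlier element counts as "denser".
    order = np.argsort(-rho, kind="stable")

    delta = np.zeros(n)
    nearest_denser = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # By convention the densest element gets its largest distance.
            delta[i] = dist[i].max()
            continue
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]  # distance to the nearest denser element
        nearest_denser[i] = j

    # Rank candidate centers by rho * delta (heuristic assumed here).
    gamma = rho * delta
    # The densest element is always a center; pin it so ties cannot displace it.
    gamma[order[0]] = np.inf
    centers = np.argsort(-gamma, kind="stable")[:n_centers]

    # Each remaining element inherits the label of its nearest denser element.
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return rho, delta, centers, labels


# Toy data (hypothetical): two 3x3 grids centered at (0, 0) and (10, 10).
offsets = np.array([(x, y) for x in (-0.5, 0.0, 0.5) for y in (-0.5, 0.0, 0.5)])
points = np.vstack([offsets, offsets + 10.0])
rho, delta, centers, labels = density_peaks(points, d_c=0.8, n_centers=2)
print(centers, labels)
```

In this sketch the result depends directly on the choice of `d_c`, which is the sensitivity to empirical values that the paper targets by letting the improved ABC algorithm search for the parameters.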