A Study of Applying Unsupervised Learning Methods for Document Clustering and Automatic Categorization of Software.

Kai-Wen Chen, Chin-Yu Huang
DOI: https://doi.org/10.1109/ieem50564.2021.9672875
2021-01-01
Abstract: Software categorization is the task of grouping software into categories that briefly describe its behavior. With the rapid growth in the amount of software, however, manual categorization has become prohibitively expensive and nearly impossible, so automatic software categorization has become necessary. In this paper, we propose applying two different document clustering methods, nonnegative matrix factorization (NMF) and spectral clustering, each on its own, to automatic software categorization. We not only compare their performance with that of LACT, an existing automatic software categorization method, but also analyze in depth the differences between the two clustering methods. Our methods require at most about one-tenth of LACT's execution time, and the fastest is hundreds of times faster than LACT, while performing up to 26% and 100% better on two criteria, BCubed F1-measure and Adjusted Rand Index.
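The two evaluation criteria named in the abstract can be illustrated with a minimal sketch. It assumes that every software item has exactly one reference category and one predicted cluster, and it uses the common definition of BCubed F1 as the harmonic mean of the averaged per-item precision and recall; the paper's exact variant may differ. The function names `adjusted_rand_index` and `bcubed_f1` and the sample labels are hypothetical and do not come from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index between two flat labelings of the same items."""
    n = len(truth)
    if n < 2:
        return 1.0  # no item pairs to disagree on
    cells = Counter(zip(truth, pred))   # contingency table cells
    rows = Counter(truth)               # reference category sizes
    cols = Counter(pred)                # predicted cluster sizes
    index = sum(comb(c, 2) for c in cells.values())
    row_pairs = sum(comb(c, 2) for c in rows.values())
    col_pairs = sum(comb(c, 2) for c in cols.values())
    expected = row_pairs * col_pairs / comb(n, 2)
    maximum = (row_pairs + col_pairs) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both labelings are trivially identical
    return (index - expected) / (maximum - expected)


def bcubed_f1(truth, pred):
    """BCubed F1 for single-label items (assumes a non-empty input)."""
    n = len(truth)
    cells = Counter(zip(truth, pred))
    rows = Counter(truth)
    cols = Counter(pred)
    # Per-item precision: share of the item's cluster that has its category.
    precision = sum(cells[(t, p)] / cols[p] for t, p in zip(truth, pred)) / n
    # Per-item recall: share of the item's category that is in its cluster.
    recall = sum(cells[(t, p)] / rows[t] for t, p in zip(truth, pred)) / n
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels: six programs, two reference categories,
    # three predicted clusters.
    truth = ["editor", "editor", "editor", "game", "game", "game"]
    pred = [0, 0, 1, 1, 2, 2]
    print("ARI:", adjusted_rand_index(truth, pred))
    print("BCubed F1:", bcubed_f1(truth, pred))
```

Both scores depend only on which items are grouped together, not on the cluster names, so a clustering method can number its clusters arbitrarily and still be compared against named reference categories.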