A Search Log Clustering Algorithm Based on the Idea of Hierarchy

Shu-Sheng Hou, Han Jin, Yang Wei, Xu Wang, Bao-Quan Fan, Jin-Mao Wei
DOI: https://doi.org/10.1109/icmlc.2013.6890446
2013-01-01
Abstract: Analysis of search logs is becoming increasingly important. A search query may contain several keywords, so the same text can belong to more than one category. This paper presents a new algorithm, the Sequential Clustering Algorithm, for clustering search logs. Unlike many other clustering algorithms, it can assign one record to multiple categories while balancing time complexity, clustering reliability, and the parameters involved. It does so through text combination and text backtracking: text combination forms the feature of each category automatically, and text backtracking gives previously processed texts the opportunity to be compared with new categories. In the experiments, the proposed algorithm and a general hierarchical clustering algorithm were both applied to clustering search log texts. The results suggest that the proposed algorithm improves clustering performance.
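The two mechanisms named in the abstract can be illustrated with a short sketch. The abstract does not give the similarity measure, the threshold, or the exact update rules, so everything below is an assumption made for illustration: Jaccard similarity over keyword sets, a single `threshold` parameter, and the hypothetical names `jaccard` and `sequential_cluster`. It should be read as one possible interpretation of overlapping sequential clustering, not as the authors' algorithm.

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword sets (assumed measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def sequential_cluster(queries, threshold=0.3):
    """Illustrative one-pass, overlapping clustering of search queries.

    Returns (features, members): features[i] is the keyword set that
    describes category i, members[i] is the set of query indices in it.
    The threshold value is an assumption, not taken from the paper.
    """
    texts = [set(q.lower().split()) for q in queries]
    features = []  # one keyword set per category
    members = []   # one set of query indices per category

    for idx, words in enumerate(texts):
        # A query may match several categories, so collect all of them.
        matched = [c for c, feat in enumerate(features)
                   if jaccard(words, feat) >= threshold]
        if matched:
            for c in matched:
                # Text combination: the category feature grows by
                # absorbing the keywords of each text assigned to it.
                features[c] |= words
                members[c].add(idx)
        else:
            # No category is similar enough: open a new one.
            features.append(set(words))
            members.append({idx})
            new_c = len(features) - 1
            # Text backtracking: earlier texts get a chance to be
            # compared with the category that did not exist before.
            # Here they only gain membership; the feature is unchanged.
            for prev in range(idx):
                if jaccard(texts[prev], features[new_c]) >= threshold:
                    members[new_c].add(prev)
    return features, members


features, members = sequential_cluster(
    ["python tutorial", "java tutorial", "python snake"], threshold=0.3
)
print(members)  # [{0, 1}, {0, 2}]
```

In this toy run, "python tutorial" ends up in both categories: it seeds the first one, and backtracking later adds it to the category opened by "python snake". That is the multi-category behaviour the abstract describes, reproduced here only under the assumptions listed above.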