HCluWin: An Algorithm for Clustering Heterogeneous Data Streams over Sliding Windows

Jiadong Ren, Changzhen Hu, Ruiqing Ma
2010-01-01
Abstract: Many applications in web usage mining, such as business intelligence and usage characterization, require effective and efficient techniques to discover users with similar usage patterns and web pages with correlated content in the physical world. Clustering click streams can help to achieve this goal. Despite their high processing rate, existing methods for clustering click streams over sliding windows ignore the categorical attributes in click stream data. In this paper, we present HCluWin, an approach for clustering heterogeneous data streams, which contain both continuous and categorical attributes, over sliding windows. A Heterogeneous Temporal Cluster Feature (HTCF) is introduced to monitor the distribution statistics of heterogeneous data points. Based on this structure, the Exponential Histogram of Heterogeneous Cluster Feature (EHHCF) is presented. In addition, a new similarity measure between two heterogeneous objects is proposed. Experimental results show that the clustering quality of HCluWin is higher than that of CluWin and that the stream processing rate of HCluWin is higher than that of HCluStream.
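To make the idea of a cluster summary over mixed attributes concrete, the following is a minimal sketch only. The abstract does not give the definitions of HTCF, EHHCF, or the proposed similarity measure, so everything here is an assumption: the class name `MixedClusterFeature`, its fields, and the distance (Euclidean distance to the numeric centroid plus a frequency-based mismatch for each categorical attribute) are hypothetical stand-ins, not the paper's method.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MixedClusterFeature:
    """Hypothetical summary of a cluster of mixed-type points.

    Not the paper's HTCF definition; an illustrative assumption only.
    """
    num_dims: int                      # number of continuous attributes
    cat_dims: int                      # number of categorical attributes
    n: int = 0                         # points absorbed so far
    linear_sum: list = field(default_factory=list)
    square_sum: list = field(default_factory=list)
    cat_counts: list = field(default_factory=list)
    last_timestamp: float = float("-inf")

    def __post_init__(self):
        self.linear_sum = [0.0] * self.num_dims
        self.square_sum = [0.0] * self.num_dims
        # One value-frequency histogram per categorical attribute.
        self.cat_counts = [Counter() for _ in range(self.cat_dims)]

    def add(self, numeric, categorical, timestamp):
        """Absorb one point by updating the additive statistics."""
        self.n += 1
        for i, x in enumerate(numeric):
            self.linear_sum[i] += x
            self.square_sum[i] += x * x
        for j, value in enumerate(categorical):
            self.cat_counts[j][value] += 1
        self.last_timestamp = max(self.last_timestamp, timestamp)

    def centroid(self):
        """Mean of the continuous attributes."""
        return [s / self.n for s in self.linear_sum]

    def distance(self, numeric, categorical):
        """Assumed mixed distance: numeric part plus categorical mismatch."""
        num_part = math.sqrt(
            sum((x - c) ** 2 for x, c in zip(numeric, self.centroid()))
        )
        # Fraction of cluster members whose value differs, per attribute.
        cat_part = sum(
            1.0 - self.cat_counts[j][value] / self.n
            for j, value in enumerate(categorical)
        )
        return num_part + cat_part


cf = MixedClusterFeature(num_dims=2, cat_dims=1)
cf.add([1.0, 2.0], ["home"], timestamp=1.0)
cf.add([3.0, 4.0], ["cart"], timestamp=2.0)
print(cf.centroid(), cf.distance([2.0, 3.0], ["home"]))
```

Because every statistic in this sketch is additive, two such summaries could be merged by summing their fields, which is the property that histogram-of-summaries schemes over sliding windows typically rely on. The stored timestamp would let a window-based scheme discard summaries that have expired.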