Flexible Clustering by Tendency in High Dimensional Space

Jinze Liu, Wei Wang
2003-01-01
Abstract: Clustering is the process of grouping a set of objects into classes of similar objects. Until recently, the concept of similarity was based on distances, e.g., Euclidean distance and cosine distance. Our previous work on δ-cluster and δ-pCluster designed new similarity models to capture the subspace coherency exhibited in data, focusing on shifting patterns or scaling patterns. Along the same general direction, we propose a more flexible yet powerful clustering model, namely u-Cluster (Up-pattern Cluster). Under this model, two objects are similar in a subset of dimensions if there exists a permutation of these dimensions along which both objects exhibit a consistent 'up' pattern. For instance, in DNA microarray analysis, the expression levels of two genes can rise synchronously in response to a sequence of environmental stimuli. Although the magnitudes of their expression levels might not be close and the amounts by which they rise might not be equal, the 'up' patterns that they exhibit can be consistent. Discovery of such clusters of genes is essential in revealing significant connections in gene regulatory networks. In addition, E-Commerce applications such as collaborative filtering and stock analysis can also benefit from this model for identifying customer groups that have consistent trends in interests or activities (purchasing, browsing, etc.). We also devise an efficient algorithm that takes advantage of fast sequential pattern mining to detect such clusters. Its efficiency and effectiveness have been demonstrated through experiments on several real data sets.
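The similarity test described in the abstract can be illustrated with a small sketch. It is not the paper's mining algorithm, which builds on sequential pattern mining; it only checks whether a given group of objects shares an 'up' pattern on a given subset of dimensions. The function name `shared_up_order`, the use of strict increase with no tolerance for near-equal values, and the sample expression values are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def shared_up_order(
    objects: Sequence[Sequence[float]], dims: Sequence[int]
) -> Optional[List[int]]:
    """Return a permutation of `dims` along which every object strictly
    increases, or None if no such permutation exists.

    Assumption: 'up' means strictly increasing; ties are not tolerated.
    """
    if not objects:
        return None
    # Under strict increase, the only possible order is the one obtained by
    # sorting the dimensions by the first object's values.
    first = objects[0]
    order = sorted(dims, key=lambda d: first[d])
    # Every object (including the first, to reject ties) must rise along it.
    for obj in objects:
        values = [obj[d] for d in order]
        if any(a >= b for a, b in zip(values, values[1:])):
            return None
    return order


# Hypothetical expression levels of three genes under four conditions.
gene_a = [10.0, 50.0, 20.0, 80.0]
gene_b = [1.0, 3.0, 2.0, 9.0]
gene_c = [5.0, 4.0, 6.0, 7.0]

all_dims_ab = shared_up_order([gene_a, gene_b], [0, 1, 2, 3])
all_dims_abc = shared_up_order([gene_a, gene_b, gene_c], [0, 1, 2, 3])
sub_dims_abc = shared_up_order([gene_a, gene_b, gene_c], [0, 2, 3])
```

In this sketch, `gene_a` and `gene_b` differ greatly in magnitude yet rise in the same order over all four conditions, while `gene_c` joins them only when condition 1 is dropped. This mirrors the point that similarity is defined per subset of dimensions and does not depend on the size of the values or of their increases.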