Abstract: The rapid growth of image and video data collections motivates research on index structures for image information retrieval. Constructing an index over an image database is very challenging because the data are high-dimensional and domain knowledge is lacking. ClusterTree is an indexing approach that represents clusters generated by any existing clustering approach and does not need any prior knowledge. It is a hierarchy of clusters and subclusters that incorporates the cluster representation into the index structure to achieve effective and efficient retrieval. However, one disadvantage of ClusterTree is that non-clustering data points are often ignored, even though these points might represent interesting targets in an image database. In this paper, we propose a modified ClusterTree structure, called SS-ClusterTree, which is based on subspace clustering. The SS-ClusterTree includes two kinds of leaf nodes: cluster leaf nodes and noise leaf nodes. When a new data item is added to the SS-ClusterTree, it is inserted into the corresponding cluster leaf node if it belongs to a cluster, and into the noise leaf node otherwise. A noise leaf node is split when its volume exceeds a certain threshold. We present a novel updating technique that optimizes the internal structure of the SS-ClusterTree by applying Newton's law of universal gravitation. When a noise node is split, the attraction forces between every new node and its sibling nodes are calculated. A new node may be merged into a sibling node if the attraction force between them is the most significant. Meanwhile, the intersecting boundaries of the nodes are updated. This approach guarantees that the SS-ClusterTree always represents the current dataset structure, and helps to identify the patterns hidden in the newly added data.
SS-ClusterTree efficiently supports dynamic insertion, manages datasets that contain non-clustering data, and is highly adaptive to any kind of cluster structure. Our experimental results also show that this index structure is effective and efficient.
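
The gravitation-based merge step can be illustrated with a minimal Python sketch. The abstract does not say how a node's mass or the distance between nodes is defined, so the choices below are assumptions made only for illustration: mass is taken as the number of points in a node, distance as the Euclidean distance between node centroids, and the constant `g` is set to 1. The names `Node`, `attraction`, and `strongest_sibling` are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    """A leaf node holding data points (hypothetical, for illustration)."""
    points: List[Point] = field(default_factory=list)

    def mass(self) -> int:
        # Assumption: a node's "mass" is the number of points it holds.
        return len(self.points)

    def centroid(self) -> Point:
        # Mean of the points along each dimension.
        dims = len(self.points[0])
        n = len(self.points)
        return tuple(sum(p[d] for p in self.points) / n for d in range(dims))


def attraction(a: Node, b: Node, g: float = 1.0) -> float:
    """Newton-style force g * m_a * m_b / r^2 between two nodes."""
    # Assumption: r is the Euclidean distance between the two centroids.
    r2 = sum((x - y) ** 2 for x, y in zip(a.centroid(), b.centroid()))
    if r2 == 0.0:
        return math.inf  # coincident centroids attract without bound
    return g * a.mass() * b.mass() / r2


def strongest_sibling(new_node: Node, siblings: List[Node]) -> int:
    """Return the index of the sibling that attracts new_node most strongly."""
    return max(range(len(siblings)),
               key=lambda i: attraction(new_node, siblings[i]))
```

Under these assumptions, a new node may be drawn to a sibling that is farther away but holds more points, because the force grows with the product of the masses and falls with the squared distance. Whether the paper applies an additional significance threshold before merging is not stated in the abstract, so the sketch only ranks the siblings.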

Discovering pattern-based subspace clusters by pattern tree

A Fast Subspace Clustering Algorithm Based on Pattern Similarity

Effective algorithm for maximal pattern-based subspace clustering

Mining Maximal Pattern-Based Subspace Clusters in High Dimensional Space

Frequent Patterns-Based Subspace Clustering

Learning a Subspace for Clustering Via Pattern Shrinking

SS-ClusterTree: a subspace clustering based indexing algorithm over high-dimensional image features

Discovering the Skyline of Subspace Clusters in High-Dimensional Data

Direct mining of discriminative and essential frequent patterns via model-based search tree

Flexible Clustering by Tendency in High Dimensional Space

Spatial Co-Location Pattern Mining Based On Density Peaks Clustering And Fuzzy Theory

Spatial Co-Location Pattern Mining Based on the Improved Density Peak Clustering and the Fuzzy Neighbor Relationship.

Spatial Colocation Pattern Discovery Incorporating Fuzzy Theory.

Qtop-K: A Novel Algorithm For Mining High Quality Pattern-Based Clusters In Gst Microarray Data

Subspace Clustering with Sparsity and Grouping Effect

Efficient Direct Structured Subspace Clustering

Structure-Aware Subspace Clustering

Testing the significance of patterns in data with cluster structure

Nearest neighbor and closed pattern subspace clustering.

A Cluster-Based Approach for Extracting Redundancy-aware Top-k Co-location Patterns

Preserving Local and Global Information: an Effective Metric-based Subspace Clustering