Agglomerative Clustering in Uniform and Proportional Feature Spaces

Alexandre Benatti, Luciano da F. Costa
2024-07-11
Abstract: Pattern comparison represents a fundamental and crucial aspect of scientific modeling, artificial intelligence, and pattern recognition. Four main approaches have typically been applied for pattern comparison: (i) distances; (ii) statistical joint variation; (iii) projections; and (iv) similarity indices, each with its specific characteristics. In addition to arguing for the intrinsically interesting properties of multiset-based similarity approaches, the present work describes a hierarchical agglomerative clustering approach based on the coincidence similarity index, which inherits several of that index's characteristics, including strict comparisons that distinguish between closely similar patterns, inherent normalization, and substantial robustness to noise and outliers in datasets. Two other hierarchical clustering approaches are considered, namely a multiset-based method and the traditional Ward's approach. After characterizing uniform and proportional feature spaces and presenting the main concepts and methods, a comparison of the relative performance of the three hierarchical methods is reported and discussed. In particular, though intrinsically suited to proportional comparisons, the coincidence similarity methodology also works effectively on several types of data in uniform feature spaces.
Physics and Society
What problem does this paper attempt to address?
The paper studies agglomerative hierarchical clustering for pattern recognition and proposes a hierarchical clustering method based on multiset similarity indices, in particular the coincidence similarity index. Specifically:

1. **Research Background**:
   - Pattern comparison is central to scientific modeling, artificial intelligence, and pattern recognition.
   - Humans and other organisms rely on pattern recognition for survival and reproduction.
   - Pattern recognition can be divided into supervised learning and unsupervised learning.
2. **Comparison Methods**:
   - The study examines four main approaches to pattern comparison: distances, statistical joint variation, projections, and similarity indices.
   - Special emphasis is placed on multiset similarity indices (such as the coincidence similarity index), which provide strict comparisons, intrinsic normalization, and robustness to noise and outliers.
3. **Types of Feature Spaces**:
   - The paper characterizes uniform and proportional feature spaces and uses them to compare three agglomerative hierarchical clustering methods: the coincidence similarity-based method, another multiset-based method, and the traditional Ward's method.
4. **Experimental Results**:
   - Experiments were conducted on different types of cluster structures, comparing the performance of the three methods in 2-dimensional space.
   - Although the coincidence similarity method is inherently suited to proportional feature spaces, it also performs well on several types of data in uniform feature spaces.

In summary, this paper aims to improve agglomerative hierarchical clustering by introducing multiset similarity indices and to verify their effectiveness in different types of feature spaces.
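
As a rough illustration of the idea summarized above, the following minimal sketch computes a coincidence similarity between two feature vectors and uses it to drive a simple agglomerative merge loop. Several points are assumptions rather than details taken from the text: the coincidence index is taken to be the product of the multiset Jaccard index and the interiority (overlap) index, all features are assumed non-negative, and the average-similarity linkage rule is a stand-in for whatever linkage the authors actually use. The function names (`jaccard`, `interiority`, `coincidence`, `agglomerate`) are hypothetical.

```python
from itertools import combinations


def jaccard(x, y):
    """Multiset Jaccard index for non-negative vectors (assumed form)."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den if den > 0 else 0.0


def interiority(x, y):
    """Interiority (overlap) index for non-negative vectors (assumed form)."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = min(sum(x), sum(y))
    return num / den if den > 0 else 0.0


def coincidence(x, y):
    """Coincidence similarity, assumed here to be Jaccard times interiority."""
    return jaccard(x, y) * interiority(x, y)


def agglomerate(points, n_clusters):
    """Merge the most similar pair of clusters until n_clusters remain.

    Cluster-to-cluster similarity is the mean pairwise coincidence
    (an illustrative linkage choice, not necessarily the paper's).
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            sims = [coincidence(points[i], points[j])
                    for i in clusters[a] for j in clusters[b]]
            score = sum(sims) / len(sims)
            if best is None or score > best[0]:
                best = (score, a, b)
        _, a, b = best
        # a < b always holds, so deleting index b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    # Two small groups of 2-dimensional, non-negative points.
    pts = [(1.0, 1.0), (1.1, 0.9), (5.0, 0.5), (5.2, 0.6)]
    print(agglomerate(pts, 2))
```

Under these assumptions, identical vectors score 1, while `(1, 1)` and `(2, 2)` score 0.5: the index is bounded in [0, 1] without any separate normalization step, and it penalizes differences in magnitude relative to the size of the values being compared. The brute-force merge loop is only suitable for small inputs.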