Paralinear Distance and Its Algorithm for Hierarchical Clustering of High-dimensional Discrete Variables
Shuai Wang, Lizhu Hao, Xiaofei Wang, Jianhua Guo
DOI: https://doi.org/10.1016/j.ijar.2024.109133
IF: 4.452
2024-01-28
International Journal of Approximate Reasoning
Abstract: Variable clustering is an important tool for mining association rules and explaining the latent mechanisms that generate data. In this work, we study a hierarchical variable clustering algorithm based on the paralinear distance between discrete variables. First, we study the paralinear distance under the multinomial distribution and show that any distance that is additive on a graphical tree model has a unique form in terms of the paralinear distance. We then propose a novel hierarchical clustering algorithm that, at each level, determines the local relationships among observed variables as sibling groups and singletons, with the hierarchical structure indicated by the relations between levels. Furthermore, we establish the probably approximately correct (PAC) property of the algorithm and find that its sample complexity is sensitive to the diameter of the tree. Finally, using GPU computation, we demonstrate our findings and the applications of our learning algorithms through large-scale experiments on both synthetic and real-world data. Extensive empirical results show that the proposed method is efficient at discovering local structures and latent information.
computer science, artificial intelligence
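
Illustrative note (not from the paper): the abstract relies on the paralinear distance being additive along paths of a tree. A minimal sketch of that property follows, assuming the commonly used definition d(X, Y) = -ln( |det J| / sqrt(det D_X det D_Y) ), where J is the joint probability matrix of two variables with the same number of states and D_X, D_Y are diagonal matrices of their marginals. The paper's exact notation and estimator may differ, and the function names `determinant` and `paralinear_distance` are hypothetical.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest absolute entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0  # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def paralinear_distance(joint):
    """Paralinear distance from a square joint table, joint[i][j] = P(X=i, Y=j).

    Assumed definition: -ln(|det J| / sqrt(det D_X * det D_Y)).
    Returns math.inf when the joint matrix is singular (e.g. X and Y independent).
    """
    n = len(joint)
    if any(len(row) != n for row in joint):
        raise ValueError("joint table must be square")
    det_j = abs(determinant(joint))
    if det_j == 0.0:
        return math.inf
    row_marginals = [sum(row) for row in joint]
    col_marginals = [sum(joint[i][j] for i in range(n)) for j in range(n)]
    # ln det of a diagonal matrix is the sum of the logs of its entries.
    log_marginals = sum(math.log(p) for p in row_marginals + col_marginals)
    return -math.log(det_j) + 0.5 * log_marginals


# Hypothetical Markov chain X -> Y -> Z with binary states (made-up numbers).
p_x = [0.5, 0.5]
p_y_given_x = [[0.9, 0.1], [0.2, 0.8]]
p_z_given_y = [[0.7, 0.3], [0.4, 0.6]]

joint_xy = [[p_x[i] * p_y_given_x[i][j] for j in range(2)] for i in range(2)]
p_y = [sum(joint_xy[i][j] for i in range(2)) for j in range(2)]
joint_yz = [[p_y[j] * p_z_given_y[j][k] for k in range(2)] for j in range(2)]
joint_xz = [
    [sum(joint_xy[i][j] * p_z_given_y[j][k] for j in range(2)) for k in range(2)]
    for i in range(2)
]

d_xy = paralinear_distance(joint_xy)
d_yz = paralinear_distance(joint_yz)
d_xz = paralinear_distance(joint_xz)
```

On this chain, `d_xz` should agree with `d_xy + d_yz` up to floating-point error, which is the additivity that tree-based clustering of this kind exploits. In practice the joint tables would be estimated from data rather than specified exactly, so the equality would hold only approximately.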