Efficient Computation for Diagonal of Forest Matrix Via Variance-Reduced Forest Sampling

Haoxin Sun,Zhongzhi Zhang
DOI: https://doi.org/10.1145/3589334.3645578
2024-01-01
Abstract:The forest matrix of a graph, particularly its diagonal elements, has far-reaching implications in network science and machine learning. The state-of-the-art algorithms for the diagonal of forest matrix computation are based on the fast Laplacian solver. However, these algorithms encounter limitations when applied to digraphs due to the incapacity of the Laplacian solver. To overcome the issue, in this paper, we propose three novel sampling-based algorithms:SCF,SCFV,and SCFV+. Our first algorithm SCF leverages a probability interpretation of the diagonal of the forest matrix and utilizes an extension of Wilson's algorithm to sample spanning converging forests. To reduce the variance in the forest sampling, we develop two novel variance-reduced techniques. The first technique, leading to the proposal of the SCFV algorithm, is inspired by opinion dynamics in graphs and applies matrix-vector iteration to the spanning forest sampling. While SCFV achieves reduced variance compared to SCF, the cross-product term in its variance expression can be complex and potentially large in certain graphs. Therefore, we develop another technique, leading to a new iteration equation and the SCFV+ algorithm. SCFV+ achieves further reduced variance without the cross-product term in the variance of SCFV. We prove that SCFV+ can achieve a relative error guarantee with high probability and maintain a linear time complexity relative to the number of nodes in the graph, presenting a superior theoretical result compared to state-of-the-art algorithms. Finally, we conduct extensive experiments on various real-world networks, showing that our algorithms achieve better estimation accuracy and are more time-efficient than the state-of-the-art algorithms. Particularly, our algorithms are scalable to massive graphs with more than twenty million nodes in both undirected and directed graphs.
What problem does this paper attempt to address?