Gotta match 'em all: Solution diversification in graph matching matched filters

Zhirui Li, Ben Johnson, Daniel L. Sussman, Carey E. Priebe, Vince Lyzinski
2024-07-05
Abstract: We present a novel approach for finding multiple noisily embedded template graphs in a very large background graph. Our method builds upon the graph-matching-matched-filter technique proposed in Sussman et al., with the discovery of multiple diverse matchings being achieved by iteratively penalizing a suitable node-pair similarity matrix in the matched filter algorithm. In addition, we propose algorithmic speed-ups that greatly enhance the scalability of our matched-filter approach. We present theoretical justification of our methodology in the setting of correlated Erdős-Rényi graphs, showing its ability to sequentially discover multiple templates under mild model conditions. We additionally demonstrate our method's utility via extensive experiments using both simulated models and real-world datasets, including human brain connectomes and a large transactional knowledge base.
Machine Learning, Combinatorics, Applications, Methodology
What problem does this paper attempt to address?
The paper addresses the problem of finding multiple noisily embedded template graphs in a very large background graph. Its approach builds on the previously proposed Graph-Matching Matched-Filter (GMMF) technique and obtains diverse matchings by iteratively penalizing a suitable node-pair similarity matrix inside the matched-filter algorithm. The paper also proposes algorithmic speed-ups that improve scalability, and it provides theoretical justification in the setting of correlated Erdős-Rényi graphs, showing that the method can sequentially discover multiple templates under mild model conditions. The method is validated through extensive experiments on simulated models and on real-world datasets, including human brain connectomes and a large transactional knowledge base. In summary, the paper is primarily concerned with efficiently identifying multiple, diverse subgraphs of a large background graph that are structurally similar to a given template graph.
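The idea of iteratively penalizing a node-pair similarity matrix can be illustrated with a small sketch. This is not the authors' matched-filter algorithm: it swaps the graph-matching step for a plain linear assignment (SciPy's `linear_sum_assignment`) and uses a constant subtractive penalty, both of which are simplifying assumptions. The function name `diverse_matchings` and the parameters `num_solutions` and `penalty` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def diverse_matchings(similarity, num_solutions, penalty):
    """Return several template-to-background matchings from one similarity matrix.

    similarity[i, j] scores how well template node i matches background node j.
    After each solve, the node pairs just used are penalized so that the next
    solve is pushed toward a different region of the background graph.
    """
    scores = np.array(similarity, dtype=float)  # work on a copy
    solutions = []
    for _ in range(num_solutions):
        # Stand-in for the graph-matching step: maximize total similarity.
        rows, cols = linear_sum_assignment(scores, maximize=True)
        solutions.append(cols.copy())
        # Penalize the node pairs that were just matched.
        scores[rows, cols] -= penalty
    return solutions


# Toy input: a 3-node template with two planted matches in an 8-node background.
toy = np.zeros((3, 8))
toy[np.arange(3), np.arange(3)] = 1.0      # first planted copy at nodes 0..2
toy[np.arange(3), np.arange(3) + 4] = 0.9  # second planted copy at nodes 4..6
found = diverse_matchings(toy, num_solutions=2, penalty=1.0)
```

On this toy input the first solve recovers the stronger planted copy and, once those pairs are penalized, the second solve moves to the other copy. How the penalty is chosen and applied in the paper's method is not specified in this summary, so the constant subtraction here is only a placeholder.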