On Densest $k$-Subgraph Mining and Diagonal Loading

Qiheng Lu, Nicholas D. Sidiropoulos, Aritra Konar
2024-10-10
Abstract: The Densest $k$-Subgraph (D$k$S) problem aims to find a subgraph comprising $k$ vertices with the maximum number of edges between them. A continuous reformulation of the binary quadratic D$k$S problem is considered, which incorporates a diagonal loading term. It is shown that this non-convex, continuous relaxation is tight for a range of diagonal loading parameters, and the impact of the diagonal loading parameter on the optimization landscape is studied. On the algorithmic side, two projection-free algorithms are proposed to tackle the relaxed problem, based on Frank-Wolfe and explicit constraint parametrization, respectively. Experiments suggest that both algorithms have merits relative to the state of the art, while the Frank-Wolfe-based algorithm stands out in terms of subgraph density, computational complexity, and ability to scale up to very large datasets.
Social and Information Networks, Data Structures and Algorithms
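The continuous reformulation mentioned in the abstract can be written out explicitly. The following worked sketch assumes an unweighted, undirected graph on \(n\) vertices with symmetric 0/1 adjacency matrix \(A\) (zero diagonal). The symbols \(C_k\), \(M\), and \(d\) are introduced here for illustration. The exchange argument is a standard one showing tightness for \(\lambda \geq 1\). It is not necessarily the paper's proof, and the paper's characterization of the admissible range of \(\lambda\) may be sharper.

$$\max_{x\in\{0,1\}^n,\ \mathbf{1}^\top x = k} x^\top A x \quad\longrightarrow\quad \max_{x\in C_k} f(x) = x^\top (A+\lambda I)\,x,\qquad C_k=\{x\in[0,1]^n : \mathbf{1}^\top x = k\}.$$

For a binary \(x\) with \(\mathbf{1}^\top x = k\) we have \(x^\top x = k\), so \(x^\top (A+\lambda I)x = x^\top A x + \lambda k\). On the discrete feasible set, the loading therefore only adds a constant, and \(x^\top A x\) counts each induced edge twice.

Now suppose \(x \in C_k\) is fractional. Because \(\mathbf{1}^\top x = k\) is an integer, at least two entries \(x_i, x_j\) lie in \((0,1)\). With \(M = A+\lambda I\) and the direction \(d = e_i - e_j\), which preserves \(\mathbf{1}^\top x = k\), we get:

$$f(x+t\,d) = f(x) + 2t\, d^\top M x + t^2\, d^\top M d,\qquad d^\top M d = M_{ii}+M_{jj}-2M_{ij} = 2\lambda - 2A_{ij}.$$

Since \(A_{ij}\in\{0,1\}\), the quadratic coefficient is nonnegative whenever \(\lambda \geq 1\). In that case \(f\) is convex in \(t\), so its maximum over the feasible interval of \(t\) is attained at an endpoint, where \(x_i\) or \(x_j\) becomes 0 or 1. Each such step makes at least one more entry integral without lowering \(f\). After at most \(n\) steps we reach a binary point, so the relaxed optimum equals the discrete optimum and is attained by a \(k\)-vertex subgraph. For \(\lambda > 1\) the coefficient is strictly positive, so no fractional point of \(C_k\) can even be a local maximizer.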
What problem does this paper attempt to address?
This paper addresses the **Densest \(k\)-Subgraph (DkS) problem**: given a graph, find a subgraph on \(k\) vertices with the largest number of edges among those vertices. Specifically, the paper studies a continuous reformulation of this combinatorial optimization problem that includes a diagonal loading term, and it examines how that term affects the resulting optimization problem.

### Problem Background

DkS is an important problem in graph mining. It has applications in many fields, such as discovering patterns in DNA sequences, identifying hot topics in social media, and detecting fraud in e-commerce and financial transaction networks.

However, DkS is NP-hard, so exact solutions are generally out of reach. Existing approximation algorithms can produce good solutions in some cases. On large-scale datasets, however, they often suffer from high computational complexity and slow convergence. In addition, some continuous relaxations are not guaranteed to be tight: the optimal solution of the relaxed problem may not correspond to an optimal solution of the original discrete problem.

### Main Contributions of the Paper

1. **Diagonal loading term**: The paper considers a continuous relaxation with a diagonal loading term whose weight is set by the parameter \(\lambda\). The authors prove that this relaxation is tight when \(\lambda \geq 1\). In that case, the relaxed continuous problem has the same optimal value as the original discrete problem and admits a binary optimal solution, that is, an actual \(k\)-vertex subgraph. They also study how \(\lambda\) shapes the optimization landscape.
2. **Optimization algorithm design**: To solve the relaxed problem efficiently, the paper proposes two projection-free, gradient-based algorithms:
   - **Frank-Wolfe algorithm**: a first-order, projection-free method for optimizing a continuously differentiable function with a Lipschitz-continuous gradient over a compact convex set. At each iteration, it solves a linear subproblem over the feasible set instead of computing a projection onto it, which keeps each iteration cheap.
   - **Explicit constraint parametrization**: a suitable change of variables turns the constrained problem into an unconstrained one. This removes the projection step entirely and allows standard optimizers such as Adam or AdamW to be used.
3. **Experimental verification**: The paper evaluates both methods on multiple real-world datasets. The results suggest that both algorithms have merits relative to the state of the art. The Frank-Wolfe-based algorithm stands out in terms of subgraph density, computational complexity, and ability to scale up to very large datasets.

### Summary

The paper's main objective is to improve the continuous relaxation of the Densest \(k\)-Subgraph problem by introducing a diagonal loading term, and to develop efficient algorithms for solving the relaxed problem. The authors support the effectiveness of the proposed methods with theoretical analysis and experiments on real-world data.
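The Frank-Wolfe approach summarized under the main contributions can be illustrated with a minimal NumPy sketch. It assumes an unweighted, undirected graph given as a dense symmetric 0/1 adjacency matrix, and it maximizes \(x^\top (A+\lambda I)x\) over \(\{x\in[0,1]^n : \mathbf{1}^\top x = k\}\). The function names `lmo_capped_simplex` and `frank_wolfe_dks` are hypothetical, and so are the uniform starting point, the exact line search, the stopping tolerance, and the final top-\(k\) rounding. These are illustrative choices, not necessarily those of the paper. The point the sketch shows is why the method is projection-free: the linear subproblem over this constraint set is solved by simply selecting the \(k\) largest gradient entries.

```python
import numpy as np


def lmo_capped_simplex(grad, k):
    """Linear maximization oracle over C_k = {x in [0,1]^n : sum(x) = k}.

    <grad, s> is maximized at a vertex of C_k: put 1 on the k largest
    gradient entries and 0 elsewhere (a partial sort, no projection needed).
    """
    s = np.zeros_like(grad)
    s[np.argpartition(-grad, k - 1)[:k]] = 1.0
    return s


def frank_wolfe_dks(A, k, lam=1.0, iters=500, tol=1e-9):
    """Hypothetical sketch: maximize f(x) = x^T (A + lam*I) x over C_k.

    A: symmetric 0/1 adjacency matrix with zero diagonal (dense NumPy array).
    Returns (sorted vertex indices of the k-subgraph, number of induced edges).
    """
    n = A.shape[0]
    x = np.full(n, k / n)                     # feasible starting point (uniform)
    for _ in range(iters):
        grad = 2.0 * (A @ x + lam * x)        # gradient 2(A + lam*I)x, no explicit I
        s = lmo_capped_simplex(grad, k)
        d = s - x
        gap = grad @ d                        # Frank-Wolfe gap: zero at stationary points
        if gap <= tol:
            break
        # f is quadratic, so f(x + g*d) = f(x) + g*gap + g^2*curv exactly.
        curv = d @ (A @ d) + lam * (d @ d)
        if curv >= 0:
            gamma = 1.0                       # increasing on [0, 1]: jump to vertex s
        else:
            gamma = min(1.0, -gap / (2.0 * curv))  # concave: clipped exact maximizer
        x = (1.0 - gamma) * x + gamma * s     # convex combination stays inside C_k
    # Round to a binary k-subset (a no-op when the iterate is already binary).
    support = np.sort(np.argpartition(-x, k - 1)[:k])
    xb = np.zeros(n)
    xb[support] = 1.0
    edges = int(round(xb @ A @ xb / 2))       # x^T A x counts each edge twice
    return support, edges


# Toy graph: a 4-clique {0, 1, 2, 3} plus a tail 3-4-5.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]:
    A[i, j] = A[j, i] = 1.0

S, m = frank_wolfe_dks(A, k=4, lam=1.0)
print(S.tolist(), m)  # expected: [0, 1, 2, 3] 6
```

On this toy graph, the first step moves straight from the uniform point to the clique indicator, and the gap is then zero. Because the objective is quadratic, the step size can be computed in closed form rather than by a generic schedule. The sketch only needs matrix-vector products with `A`, so a sparse adjacency representation is a natural extension for large graphs (not shown here).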