Abstract: Community and core-periphery are two widely studied graph structures, with their coexistence observed in real-world graphs (Rombach, Porter, Fowler & Mucha [SIAM J. App. Math. 2014, SIAM Review 2017]). However, the nature of this coexistence is not well understood and has been pointed out as an open problem (Yanchenko & Sengupta [Statistics Surveys, 2023]). Especially, the impact of inferring the core-periphery structure of a graph on understanding its community structure is not well utilized. In this direction, we introduce a novel quantification for graphs with ground truth communities, where each community has a densely connected part (the core), and the rest is more sparse (the periphery), with inter-community edges more frequent between the peripheries.
Built on this structure, we propose a new algorithmic concept that we call relative centrality to detect the cores. We observe that core-detection algorithms based on popular centrality measures such as PageRank and degree centrality can show some bias in their outcome by selecting very few vertices from some cores. We show that relative centrality solves this bias issue and provide theoretical and simulation support, as well as experiments on real-world graphs.
Core detection is known to have important applications with respect to core-periphery structures. In our model, we show a new application: relative-centrality-based algorithms can select a subset of the vertices such that it contains sufficient vertices from all communities, and points in this subset are better separable into their respective communities. We apply the methods to 11 biological datasets, with our methods resulting in a more balanced selection of vertices from all communities such that clustering algorithms have better performance on this set.
What problem does this paper attempt to address?
The core problem this paper addresses is how to identify and exploit the coexistence of community structure and core-periphery structure in graphs more effectively. Specifically, it focuses on graphs with multiple cores (multi-core) and uses relative centrality to improve core detection, overcoming the imbalance that can arise when traditional centrality methods (such as PageRank and degree centrality) select core nodes. The problems include:
1. **Coexistence of community structure and core-periphery structure**: Although these two structures often appear together in real-world graphs, their interaction is not well understood. In particular, the effect of inferring a graph's core-periphery structure on understanding its community structure has not been fully exploited.
2. **Imbalance in core detection**: Core detection algorithms based on popular centrality measures (such as PageRank and degree centrality) can be biased: they may select very few vertices from some cores, which degrades performance in downstream applications such as clustering.
To address these problems, the paper makes the following contributions:
1. **Formalizing the multi-core-periphery structure**: The paper defines a multi-core-periphery structure (MCPC), in which each community has a densely connected core and a sparser periphery, and inter-community edges appear more frequently between peripheries.
2. **Proposing the concept of relative centrality**: To detect the cores in the MCPC structure, the paper introduces relative centrality and supports its effectiveness with theoretical analysis and simulations. Relative centrality addresses the imbalance produced by traditional centrality methods.
3. **Application to real-world datasets**: The paper applies the proposed algorithms to 11 biological datasets. The results show that the subset of vertices selected using relative centrality is more balanced across communities and is better separated into its respective communities, which improves the performance of clustering algorithms on that subset.
### Specific problems and solutions
#### 1. Coexistence of community structure and core-periphery structure
- **Problem**: Existing research offers limited understanding of how community structure and core-periphery structure coexist, especially in multi-core settings.
- **Solution**: The paper defines the multi-core-periphery structure (MCPC) and shows experimentally that many real-world datasets conform to it.
#### 2. Imbalance problem in core detection
- **Problem**: Traditional centrality methods (such as PageRank and degree centrality) may favor certain cores when selecting core nodes, resulting in an unbalanced selection.
- **Solution**: The paper proposes relative centrality, which addresses this imbalance by considering the importance of each node relative to its neighborhood.
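
The imbalance described above can be illustrated with a small simulation. The sketch below is not the paper's model or experiment: the block sizes, the edge probabilities, and the helper name `toy_two_core_graph` are all assumptions chosen for illustration. It builds two communities, each with a core and a periphery, where one core is much denser than the other, and then counts how many of the top-50 vertices by degree fall in each block.

```python
import numpy as np

def toy_two_core_graph(seed=0):
    """Sample an undirected toy graph with two communities (hypothetical parameters).

    Blocks: 0 = core of A, 1 = periphery of A, 2 = core of B, 3 = periphery of B.
    """
    rng = np.random.default_rng(seed)
    sizes = [50, 100, 50, 100]                 # assumed block sizes
    labels = np.repeat(np.arange(4), sizes)
    # Assumed edge probabilities: dense cores (A denser than B), sparse
    # peripheries, and inter-community edges only between the peripheries.
    p = np.array([
        [0.60, 0.03, 0.00, 0.00],
        [0.03, 0.02, 0.00, 0.01],
        [0.00, 0.00, 0.15, 0.03],
        [0.00, 0.01, 0.03, 0.02],
    ])
    n = labels.size
    probs = p[labels][:, labels]               # probs[i, j] = p[label_i, label_j]
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(int)        # symmetric, no self-loops
    return adj, labels

adj, labels = toy_two_core_graph()
degree = adj.sum(axis=1)                       # degree centrality (unnormalized)
top = np.argsort(-degree)[:50]                 # 50 highest-degree vertices
picked = np.bincount(labels[top], minlength=4) # how many picked from each block
print("picked per block (core A, periph A, core B, periph B):", picked)
```

Under these assumed parameters, the degree-based selection is dominated by the denser core, which is the kind of bias that relative centrality is meant to correct.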
### Specific implementation of relative centrality
#### Definition
- **Core Concentration (CCG)**: Given a directed graph \(G(V, E)\), for any \(V'\subseteq V\) and \(V''\subseteq V\), let \(E(V', V'')\) represent the number of edges from \(V'\) to \(V''\). Then the core concentration of the set \(S\subseteq V\) is defined as:
\[
CCG(S)=\frac{E(\bar{S}, S)-E(S, \bar{S})}{E(S, V)}
\]
where \(\bar{S}\) represents the complement of \(S\).
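
As a minimal sketch, the formula above can be evaluated directly from an adjacency matrix. The function names and the convention that `adj[u, v] = 1` means an edge from `u` to `v` are assumptions made here, not taken from the paper.

```python
import numpy as np

def edge_count(adj, src, dst):
    """E(src, dst): number of directed edges from src to dst.

    Assumed convention: adj[u, v] == 1 means there is an edge u -> v.
    """
    return int(adj[np.ix_(src, dst)].sum())

def core_concentration(adj, s):
    """CCG(S) = (E(S_bar, S) - E(S, S_bar)) / E(S, V), as in the formula above."""
    n = adj.shape[0]
    s = np.asarray(sorted(s))
    mask = np.zeros(n, dtype=bool)
    mask[s] = True
    s_bar = np.flatnonzero(~mask)              # complement of S
    denom = edge_count(adj, s, np.arange(n))   # E(S, V)
    if denom == 0:
        raise ValueError("E(S, V) is zero; CCG(S) is undefined")
    return (edge_count(adj, s_bar, s) - edge_count(adj, s, s_bar)) / denom

# Tiny directed example: edges 0->1, 1->0, 1->2, 2->0, 3->0, 3->1.
adj = np.zeros((4, 4), dtype=int)
for u, v in [(0, 1), (1, 0), (1, 2), (2, 0), (3, 0), (3, 1)]:
    adj[u, v] = 1

ccg = core_concentration(adj, {0, 1})
print(ccg)
```

For \(S=\{0,1\}\) in this example, \(E(\bar{S}, S)=3\), \(E(S, \bar{S})=1\), and \(E(S, V)=3\), so the value is \(2/3\).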
#### Algorithm
- **NeighborRank (N-Rank)**: This algorithm generates relative centrality scores by considering the relative importance of nodes and their neighborhoods.
- **Steps**:
1. Calculate the initial centrality score \(F^{(t)}(v_i)\) as the sum of the \(i\)-th column of matrix \(A^t\).
2. For each node \(v_i\), select the nodes in its neighborhood whose centrality scores are higher than \(v_i\), and calculate the average centrality score of these nodes.
3.
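
Steps 1 and 2 above can be sketched as follows; the remaining step is not given in this text, so the sketch stops there and does not produce a final relative centrality score. Several details are assumptions: \(A\) is taken to be the adjacency matrix, \(A^t\) its \(t\)-th matrix power, the neighborhood of \(v_i\) is taken to be the nodes \(v_i\) has an edge to, and nodes with no higher-scoring neighbor get `NaN`. The function names are hypothetical.

```python
import numpy as np

def initial_scores(adj, t):
    """Step 1: F^(t)(v_i) = sum of the i-th column of A^t (A assumed to be adj)."""
    return np.linalg.matrix_power(adj, t).sum(axis=0)

def higher_neighbor_average(adj, scores):
    """Step 2: for each node, average the scores of neighbors that score higher.

    Assumption: the neighborhood of v_i is {v_j : adj[i, j] != 0}.
    Nodes without a higher-scoring neighbor are marked with NaN.
    """
    n = adj.shape[0]
    avg = np.full(n, np.nan)
    for i in range(n):
        nbrs = np.flatnonzero(adj[i])
        higher = nbrs[scores[nbrs] > scores[i]]
        if higher.size:
            avg[i] = scores[higher].mean()
    return avg

# Undirected path graph 0 - 1 - 2 - 3.
adj = np.zeros((4, 4), dtype=int)
for u, v in [(0, 1), (1, 2), (2, 3)]:
    adj[u, v] = adj[v, u] = 1

scores = initial_scores(adj, t=2)
avg = higher_neighbor_average(adj, scores)
print(scores, avg)
```

On this path graph with \(t=2\), the two middle nodes receive the highest initial scores, so only the end nodes have a higher-scoring neighbor to average over.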