Abstract: Graph clustering aims at discovering a natural grouping of the nodes such that similar nodes are assigned to a common cluster. Many different algorithms have been proposed in the literature: for simple graphs, for graphs with attributes associated to nodes, and for graphs where edges represent different types of relations among nodes. However, complex data in many domains can be represented as both attributed and multi-relational networks.
In this paper, we propose SpectralMix, a joint dimensionality reduction technique for multi-relational graphs with categorical node attributes. SpectralMix integrates all information available from the attributes, the different types of relations, and the graph structure to enable a sound interpretation of the clustering results. Moreover, it generalizes existing techniques: it reduces to spectral embedding and clustering when applied only to a single graph, and to homogeneity analysis when applied to categorical data. Experiments conducted on several real-world datasets enable us to detect dependencies between graph structure and categorical attributes; moreover, they exhibit the superiority of SpectralMix over existing methods.
What problem does this paper attempt to address?
This paper addresses effective clustering of attributed multi-relational graphs. Specifically, the authors propose an algorithm named SpectralMix, a joint dimensionality reduction technique for multi-relational graphs with categorical node attributes. SpectralMix integrates all available information from the attributes, the different types of relations, and the graph structure, which enables a sound interpretation of the clustering results. SpectralMix also generalizes existing techniques: applied to a single graph, it reduces to spectral embedding and clustering; applied to categorical data, it reduces to homogeneity analysis.
### Main Problems and Solutions
1. **Clustering of Attributed Multi-Relational Graphs**:
- **Problem**: Existing clustering methods mainly target simple graphs, graphs with node attributes, or graphs whose edges represent different types of relations. They are not designed for complex graphs that are attributed and multi-relational at the same time.
- **Solution**: SpectralMix uses joint dimensionality reduction to embed all information in the multi-relational attributed graph (attributes, relation types, graph structure) into a single low-dimensional vector space, in which the nodes can then be clustered.
2. **Information Fusion**:
- **Problem**: Node attributes and the different types of edges in a multi-relational attributed graph can provide complementary information, but fusing this information effectively is a challenge.
- **Solution**: SpectralMix defines a mapping \(\Phi: G \rightarrow \mathbb{R}^d\) that places nodes with similar attributes close together in the low-dimensional space. This both reduces noise and emphasizes the main patterns in the data.
3. **Generality and Scalability**:
- **Problem**: Existing methods can usually only handle specific types of data, such as a single graph or multi-relational graphs without attributes.
- **Solution**: SpectralMix can handle multiple types of data, including single graphs, multi-relational graphs, attributed graphs, and non-attributed graphs. When applied to a single graph, it is consistent with Laplacian Eigenmaps; when applied to categorical data, it is consistent with homogeneity analysis.
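The single-graph special case named above, Laplacian Eigenmaps, can be illustrated with a short sketch. This is not the SpectralMix algorithm itself, only the standard spectral embedding it is said to reduce to. The function name `laplacian_eigenmaps` and the toy graph are illustrative assumptions, and the sketch assumes an undirected, connected graph without isolated nodes.

```python
import numpy as np


def laplacian_eigenmaps(adjacency, dim):
    """Embed the nodes of one undirected graph into R^dim.

    Solves the generalized eigenproblem L y = lambda D y, where
    L = D - A, via the symmetrically normalized Laplacian.
    Assumes a connected graph in which every node has degree > 0.
    """
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    # L_sym = I - D^{-1/2} A D^{-1/2} is symmetric, so eigh applies.
    L_sym = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigenvectors = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Drop the trivial eigenvector (eigenvalue 0) and keep the next `dim`.
    # Rescaling by D^{-1/2} maps back to the generalized eigenvectors.
    return d_inv_sqrt[:, None] * eigenvectors[:, 1:dim + 1]


# Toy graph (hypothetical): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

embedding = laplacian_eigenmaps(A, dim=1)
# The sign of the single coordinate separates the two triangles.
clusters = (embedding[:, 0] > 0).astype(int)
```

In practice the embedding would typically be passed to a clustering step such as k-means rather than thresholded by sign; the threshold is used here only because the toy graph has two obvious groups.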
### Experimental Verification
To evaluate SpectralMix, the authors conducted experiments on multiple real-world datasets, including ACM, IMDB, DBLP, Flickr, and brain network datasets. The results show that SpectralMix outperforms the baseline methods in both clustering quality and data visualization. In particular, on graphs with a large number of edges and attributes (such as the ACM dataset), SpectralMix obtains the highest NMI and ARI values.
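For readers unfamiliar with the two scores cited above, the following sketch computes the adjusted Rand index (ARI) and normalized mutual information (NMI) from two label lists. It is not the authors' evaluation code. The function names are illustrative, and normalizing NMI by the arithmetic mean of the two entropies is an assumption, since other normalizations are also in common use.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(truth, pred):
    """ARI between two labelings of the same items (needs at least 2 items)."""
    n = len(truth)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def normalized_mutual_info(truth, pred):
    """NMI normalized by the arithmetic mean of the two entropies (assumption)."""
    n = len(truth)
    count_t, count_p = Counter(truth), Counter(pred)
    joint = Counter(zip(truth, pred))
    mutual_info = sum(
        c / n * log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in joint.items()
    )
    entropy_t = -sum(c / n * log(c / n) for c in count_t.values())
    entropy_p = -sum(c / n * log(c / n) for c in count_p.values())
    if entropy_t == 0 and entropy_p == 0:  # both labelings are one cluster
        return 1.0
    return mutual_info / ((entropy_t + entropy_p) / 2)


truth = [0, 0, 0, 1, 1, 1]
perfect = [1, 1, 1, 0, 0, 0]   # same grouping, cluster names swapped
partial = [0, 0, 1, 1, 2, 2]   # splits the items differently

ari_perfect = adjusted_rand_index(truth, perfect)
nmi_perfect = normalized_mutual_info(truth, perfect)
ari_partial = adjusted_rand_index(truth, partial)
nmi_partial = normalized_mutual_info(truth, partial)
```

Both scores ignore how clusters are named, so the relabeled `perfect` assignment reaches the maximum of 1, while `partial` scores lower on both measures.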
### Summary
SpectralMix addresses clustering in multi-relational attributed graphs through joint dimensionality reduction. It fuses node attributes with the different types of relations and applies to several kinds of data, from single graphs to attributed multi-relational graphs, which gives it high generality.