HyperSMOTE: A Hypergraph-based Oversampling Approach for Imbalanced Node Classifications

Ziming Zhao, Tiehua Zhang, Zijian Yi, Zhishu Shen
DOI: https://doi.org/10.48550/arXiv.2409.05402
2024-09-09
Abstract: Hypergraphs are increasingly utilized in both unimodal and multimodal data scenarios due to their superior ability to model and extract higher-order relationships among nodes, compared to traditional graphs. However, current hypergraph models encounter challenges with imbalanced data, as the imbalance can bias the model towards the more prevalent classes. While existing techniques, such as GraphSMOTE, have improved classification accuracy for minority samples in graph data, they still fall short when addressing the unique structure of hypergraphs. Inspired by the SMOTE concept, we propose HyperSMOTE as a solution to alleviate the class imbalance issue in hypergraph learning. This method involves a two-step process: first synthesizing minority class nodes, then integrating those nodes into the original hypergraph. We synthesize new nodes based on samples from minority classes and their neighbors. To solve the problem of integrating a new node into the hypergraph, we train a decoder based on the original hypergraph incidence matrix to adaptively associate the augmented node with hyperedges. We conduct extensive evaluation on multiple single-modality datasets, such as Cora, Cora-CA and Citeseer, as well as the multimodal conversation dataset MELD, to verify the effectiveness of HyperSMOTE, showing an average performance gain of 3.38% and 2.97% on accuracy, respectively.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The problem this paper attempts to solve is **the class-imbalance problem in hypergraph learning**. As a higher-order graph structure, a hypergraph can model and extract high-order relationships in data better than an ordinary graph. However, current hypergraph models struggle with class-imbalanced data: the imbalance biases the model towards the majority class and degrades classification performance on the minority class.

### Problem Background

1. **Advantages and Challenges of Hypergraphs**:
   - Compared with ordinary graphs, hypergraphs can more accurately represent complex relationships among multiple nodes and are suitable for fields such as recommendation systems, sentiment detection, and multi-source sleep quality assessment.
   - However, hypergraph models have difficulty with class-imbalanced data, because the imbalance causes the model to overfit the majority class and underfit the minority class.
2. **Limitations of Existing Methods**:
   - Existing methods for class imbalance, such as GraphSMOTE, are effective on ordinary graphs but fall short on hypergraphs, because the structure of hypergraphs is more complex and the relationships between nodes are more diverse.

### The Method Proposed in the Paper

To address these problems, the paper proposes **HyperSMOTE**, an oversampling method based on hypergraphs. Its main contributions are:

1. **Generating New Minority-Class Nodes**:
   - New minority-class nodes are synthesized to increase the number of minority-class samples in the training set, thereby alleviating the class imbalance.
   - The features of each new node are generated by combining the features of a target node and its neighbor nodes, so that the new nodes stay consistent with the original data distribution.
2. **Integrating New Nodes into the Hypergraph**:
   - A decoder, trained on the incidence matrix of the original hypergraph, assigns each new node to the most relevant hyperedges, so that adding new nodes does not disrupt the topological structure of the hypergraph.
3. **Extensive Experimental Verification**:
   - Experiments were carried out on multiple unimodal and multimodal datasets to verify the effectiveness of HyperSMOTE. The results show improvements in both accuracy and Macro-F1 score; the abstract reports average accuracy gains of 3.38% on the single-modality datasets and 2.97% on MELD.

### Formula Representation

- **Hypergraph Convolution Formula**:
  \[ E_E = \text{Aggr}(E, \sigma(W_1 X)) \]
  \[ E_v = \text{Aggr}(E, \sigma(W_2 E_E)) \]
  where \(E_E \in \mathbb{R}^{|E| \times D}\) represents the hyperedge embedding, \(E_v \in \mathbb{R}^{|V| \times D}\) represents the updated node embedding, \(W_1\) and \(W_2\) are linear projection matrices, and \(\sigma\) is an activation function.
- **New Node Feature Generation Formula**:
  \[ E_{v_g} = \tau E_{v_t} + (1 - \tau)\,\text{Mean}(\{E_{v_i} \mid v_i \in N(v_t)\}) \]
  where \(N(v_t)\) represents the neighbor nodes of the target node \(v_t\), and \(\tau\) is a hyperparameter that controls the weight between the target node embedding and the mean neighbor embedding.
- **Decoder Formula**:
  \[ \hat{H}_{\epsilon, v_g} = \sigma(E_{v_g} \cdot P \cdot E_\epsilon) \]
  where \(P \in \mathbb{R}^{D \times D}\) is a learnable projection matrix, and \(\sigma\) is the sigmoid activation function, which limits the decoder output to between 0 and 1.

### Summary

HyperSMOTE alleviates the class-imbalance problem in hypergraph learning by generating new minority-class nodes and integrating them into the hypergraph structure through a learned decoder, improving the classification performance of the model on the minority class.
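
### Illustrative Sketch of the Formulas

To make the feature-generation and decoder formulas concrete, the following is a minimal NumPy sketch, not the authors' implementation. Several details are assumptions made for illustration: the function names are hypothetical, neighbors are taken to be all nodes that share at least one hyperedge with the target node, a target with no neighbors is simply copied, and the hyperedge embedding is transposed so that the product in the decoder formula yields one score per hyperedge. In the paper the projection matrix \(P\) is learned; here it is passed in as a plain array.

```python
import numpy as np


def hyperedge_neighbors(H, target):
    """Return indices of nodes sharing at least one hyperedge with `target`.

    H is a |V| x |E| incidence matrix (H[v, e] = 1 if node v is in hyperedge e).
    This neighbor definition is an assumption made for this sketch.
    """
    shared = H @ H[target]      # number of hyperedges each node shares with target
    shared[target] = 0          # exclude the target node itself
    return np.flatnonzero(shared > 0)


def synthesize_node(emb, target, neighbors, tau):
    """E_{v_g} = tau * E_{v_t} + (1 - tau) * Mean({E_{v_i} | v_i in N(v_t)})."""
    if len(neighbors) == 0:
        # Assumption: with no neighbors, fall back to copying the target embedding.
        return emb[target].copy()
    neighbor_mean = emb[neighbors].mean(axis=0)
    return tau * emb[target] + (1.0 - tau) * neighbor_mean


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def decode_incidence(node_emb, P, edge_emb):
    """Score the new node against every hyperedge: sigmoid(E_{v_g} P E_eps^T).

    node_emb: (D,), P: (D, D), edge_emb: (|E|, D). Returns (|E|,) values in (0, 1).
    The transpose on edge_emb is an assumption needed to make the shapes agree.
    """
    return sigmoid(node_emb @ P @ edge_emb.T)


# Toy hypergraph: 4 nodes, 2 hyperedges, 2-dimensional embeddings.
H = np.array([[1, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
node_emb = np.array([[1.0, 0.0], [0.0, 2.0], [4.0, 4.0], [2.0, 0.0]])
edge_emb = np.array([[0.0, 0.0], [1.0, 2.0]])
P = np.eye(2)                   # stand-in for the learned projection matrix

target = 0                      # pretend node 0 belongs to a minority class
nbrs = hyperedge_neighbors(H, target)
new_emb = synthesize_node(node_emb, target, nbrs, tau=0.5)
probs = decode_incidence(new_emb, P, edge_emb)
print(nbrs, new_emb, probs)
```

The decoder output is a soft membership score per hyperedge. How these scores are turned into a row of the incidence matrix (for example, by thresholding) is not specified in the summary above, so the sketch stops at the scores.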