Multi-channels Prototype Contrastive Learning with Condition Adversarial Attacks for Few-shot Event Detection

Fangchen Zhang, Shengwei Tian, Long Yu, Qimeng Yang
DOI: https://doi.org/10.1007/s11063-024-11515-1
IF: 2.565
2024-02-14
Neural Processing Letters
Abstract: Few-shot Event Detection (FSED) is a sub-task of Event Detection that aims to accurately identify event types from limited training instances and to transfer smoothly to newly emerged event types. Recent dominant approaches use the prototypical network for this task and employ contrastive learning to alleviate the issue of semantically close categories. Nevertheless, these methods still suffer from two serious problems: (1) inadequate learning of prototype representations resulting from limited training data; (2) hard-easy sample imbalance and category imbalance caused by the large number of non-trigger words ("O" tags) in the token-level classification task. To address these problems, this paper proposes the Multi-channels Prototype and Contrastive learning method with Conditional Adversarial attack, which introduces improved multi-channel prototype and contrastive networks to alleviate the category and hard-easy sample imbalance. Moreover, we devise a constrained adversarial attack to mitigate the problem of limited training data. Extensive experimental results show that our model performs better than other FSED methods. All the code and data will be available for online public access.
computer science, artificial intelligence
What problem does this paper attempt to address?
This paper addresses two main challenges in the Few-shot Event Detection (FSED) task:

1. **Insufficient prototype representation**: Because training data is limited, the learned prototype representations are inadequate and generalize poorly to trigger words the model has not seen before. Under the N-way-K-shot, sequence-labeling setting, the prototype of each event type is aggregated from the information of only K tokens, so it lacks sufficient event-type information.
2. **Hard-easy sample and class imbalance**: In the word-level classification task, the large number of non-trigger words (tokens labeled "O") causes an imbalance between hard and easy samples and an imbalance between classes. Specifically, a sentence contains far more non-trigger words than trigger words, so non-trigger words account for most of the total loss, which hinders effective training of the model.

To address these problems, the paper proposes a multi-channel prototype contrastive learning method combined with conditional adversarial attacks (Multi-channels Prototype Contrastive Learning with Condition Adversarial Attacks, MPC-CA). The method alleviates class imbalance and hard-easy sample imbalance through an improved multi-channel prototype network and contrastive learning, and it mitigates the problem of limited training data through conditional adversarial learning.

The specific contributions are as follows:

1. **Proposing the MPC-CA network**: The network targets limited training data and class imbalance in FSED and sequence-labeling scenarios. It consists of three parts: a multi-channel prototype network, multi-channel contrastive learning, and conditional adversarial learning.
2. **Introducing Focal Loss**: Focal Loss replaces cross-entropy loss to mitigate class imbalance and hard-easy sample imbalance.
3. **Data cleaning and experimental verification**: The authors analyzed and cleaned the FewEvent dataset, named the cleaned version FewEvent++, and carried out comparative experiments on it.
4. **Experimental results**: The proposed MPC-CA method outperforms the competing baseline models on the FewEvent and FewEvent++ datasets. Further analysis indicates that the model has good adaptability and robustness.
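
To illustrate why replacing cross-entropy with Focal Loss helps when "O" tokens dominate, the following is a minimal sketch of the standard focal loss formulation, FL(p_t) = -(1 - p_t)^gamma * log(p_t), for a single token. It is not the authors' implementation: the value `gamma=2.0`, the three-label layout, and the example logits are assumptions chosen for illustration, and the paper's exact hyperparameters are not given here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy for one token: -log(p_t)."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one token: -(1 - p_t)^gamma * log(p_t).

    gamma=2.0 is an assumed default, not a value taken from the paper.
    With gamma=0 this reduces to plain cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Hypothetical logits over three labels; index 0 stands for the "O" tag.
easy_o_token = ([4.0, 0.0, 0.0], 0)       # confidently correct non-trigger
hard_trigger_token = ([1.0, 0.5, 0.0], 1)  # poorly classified trigger

for name, (logits, target) in [("easy O", easy_o_token),
                               ("hard trigger", hard_trigger_token)]:
    ce = cross_entropy(logits, target)
    fl = focal_loss(logits, target)
    print(f"{name}: cross-entropy={ce:.4f}, focal={fl:.4f}, ratio={fl / ce:.4f}")
```

The modulating factor (1 - p_t)^gamma shrinks the loss of confidently classified tokens far more than that of poorly classified ones, so the many easy "O" tokens no longer dominate the summed loss.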