Unsupervised Aspect Term Extraction by Integrating Sentence-level Curriculum Learning with Token-level Self-paced Learning

Jihong Ouyang, Zhiyao Yang, Chang Xuan, Bing Wang, Yiyuan Wang, Ximing Li
DOI: https://doi.org/10.1145/3583780.3615103
2023-01-01
Abstract: Aspect Term Extraction (ATE), a key sub-task of aspect-based sentiment analysis, aims to extract from review sentences the aspect terms on which users express opinions. Existing studies mainly treat ATE as a sequence labeling problem, in which the aspect terms of the training data are annotated at the token level, for example with "BIO" tagging. However, such fine-grained annotations are often too costly to collect in real applications, which creates an urgent demand for the challenging task of Unsupervised ATE (UATE). This paper proposes a novel UATE method, named UATE-SCTS, that integrates sentence-level curriculum learning with token-level self-paced learning. We design a set of hand-crafted rules to generate pseudo-labels, which are noisy. To combat this noise, our key idea is to train the ATE model from easier samples to harder samples, so that the model is more robust and makes more precise predictions in the early training epochs. This in turn enables better refinement of the noisy pseudo-labels. At the sentence level, we propose a frequency-induced pseudo-label cardinality to measure the learning difficulty of each review sentence and train the model in a curriculum-learning manner. At the token level, we formulate a self-paced learning objective that adaptively selects easier samples for training. We compare UATE-SCTS with baseline methods on benchmark collections of reviews from different domains. Empirical results demonstrate that UATE-SCTS can outperform existing UATE baselines.
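The easy-to-hard idea described in the abstract can be illustrated with a minimal sketch. It is not the UATE-SCTS implementation: the abstract does not give the paper's self-paced objective or the exact form of its frequency-induced pseudo-label cardinality. The sketch assumes the classic hard-threshold self-paced weighting (a token is kept only when its loss is below a threshold `lam`) and uses a stand-in sentence difficulty, the number of tokens whose pseudo-label is not "O". All function names here are hypothetical.

```python
from typing import List, Sequence


def self_paced_weights(losses: Sequence[float], lam: float) -> List[int]:
    """Hard self-paced weights (assumed form): 1 if the token loss is below lam, else 0."""
    return [1 if loss < lam else 0 for loss in losses]


def self_paced_loss(losses: Sequence[float], lam: float) -> float:
    """Mean loss over the tokens selected as easy; 0.0 when no token is selected."""
    weights = self_paced_weights(losses, lam)
    selected = sum(weights)
    if selected == 0:
        return 0.0
    return sum(w * l for w, l in zip(weights, losses)) / selected


def curriculum_order(pseudo_labels: Sequence[Sequence[str]]) -> List[int]:
    """Sentence indices sorted from easiest to hardest.

    Assumption: difficulty is the count of non-"O" pseudo-labels in a sentence.
    This is a stand-in for the paper's frequency-induced measure, which the
    abstract does not define.
    """
    def difficulty(tags: Sequence[str]) -> int:
        return sum(1 for tag in tags if tag != "O")

    # sorted() is stable, so sentences with equal difficulty keep their order.
    return sorted(range(len(pseudo_labels)), key=lambda i: difficulty(pseudo_labels[i]))


if __name__ == "__main__":
    token_losses = [0.1, 0.9, 0.4]
    print(self_paced_weights(token_losses, lam=0.5))
    print(self_paced_loss(token_losses, lam=0.5))
    print(curriculum_order([["B", "I", "O"], ["O", "O"], ["O", "B"]]))
```

In a training loop of this kind, `lam` would typically be increased over epochs so that harder tokens are gradually admitted, while sentences are fed to the model in the order returned by `curriculum_order`.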