Improving aspect term extraction via span-level tag data augmentation

Bin Liu, Tao Lin, Ming Li
DOI: https://doi.org/10.1007/s10489-022-03558-5
IF: 5.3
2022-05-26
Applied Intelligence
Abstract: Aspect term extraction (ATE), a fundamental subtask in aspect-based sentiment analysis, aims to extract explicit aspect terms from reviewers' expressed opinions. However, the distribution of samples containing different numbers of aspect terms is long-tailed. Due to the scarcity of long-tailed samples and the existence of multiple variable-length aspect terms inside each sample, most ATE models converge to an inferior state because they have difficulty capturing features. Popular data augmentation techniques used for addressing this problem, such as synonym replacement and back translation, cannot produce substantial improvements when using pretrained language models. In this paper, we present a novel span-level aspect term extraction (SATE) framework, which includes three main components: a simple and effective tag data augmentation (TaDA) module, an original pretrained language model, and an optimized heuristic decoding algorithm module. TaDA is based on a span-level tagging scheme and generates new pseudo training samples for long-tailed multiaspect samples. The pretrained model, deemed a general feature extractor, yields contextual token representations. Then, the decoding algorithm adopts an adjustment factor to extract the variable-length aspect terms simultaneously. All the techniques are seamlessly integrated into the stacked SATE framework to pinpoint the aspect terms. Experiments on SemEval benchmark datasets from multiple domains achieve F1-scores of 86.92% and 86.28% for laptops and restaurants, respectively, demonstrating the superiority of our model compared with the well-known baseline models.
computer science, artificial intelligence