Abstract: We tackle Aspect Term Extraction (ATE), the task of automatically recognizing aspect terms based on an understanding of word-level semantics. Because it enriches the linguistic phenomena available for learning, data augmentation helps build robust ATE models. In this paper, we propose to leverage back translation to augment the training data for ATE. The approach rests on the observation that back-translated instances generally read as paraphrases, offering diverse pragmatic modes for learning while the semantics remain unchanged. This helps ATE models recognize aspect terms when varied contexts and morphologically different words occur at test time. In our experiments, we apply an off-the-shelf Neural Machine Translation (NMT) model for back translation, using French, Chinese and German as interlanguages, respectively. In addition, word alignment is used to designate aspect terms in the back-translated instances. Experimental results on SemEval benchmarks show that retraining with the augmented data produces substantial improvements of up to 3.46%. The experiments also suggest that 1) interlanguages from the same language family are more beneficial for this data augmentation than those from other families, and 2) selective sampling has positive effects in low-resource settings. Back translation has been explored for data augmentation in other fields, with the aim of enhancing neural language modeling. Nevertheless, it has not yet been systematically studied for the ATE task. Although the method proposed in this paper is compact, we conduct a comprehensive analysis covering interlanguage selection, low-resource application, and compatibility with both conventional and pretrained neural models, in addition to the usual comparison and ablation experiments.
All models and code used in the experiments will be made publicly available to support reproducible research.
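
The label-projection step mentioned in the abstract (using word alignment to designate aspect terms in a back-translated sentence) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `project_aspect_labels`, the B/I/O tagging scheme, the example sentences, and the hand-written alignment pairs are all assumptions made for illustration. In practice the alignment would come from a word aligner and the paraphrase from an NMT model.

```python
from typing import List, Tuple


def project_aspect_labels(
    src_labels: List[str],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Project B/I/O aspect labels from a source sentence onto its back-translation.

    `alignment` holds (source_index, target_index) pairs. This helper is
    hypothetical and only illustrates the idea of alignment-based projection.
    """
    # Mark every target token aligned to a source token inside an aspect term.
    is_aspect = [False] * tgt_len
    for src_idx, tgt_idx in alignment:
        if src_labels[src_idx] != "O":
            is_aspect[tgt_idx] = True

    # Re-derive tags so that each contiguous run of aspect tokens forms one term.
    # Limitation of this sketch: two distinct terms that end up adjacent are merged,
    # and a term whose words are reordered apart is split into several terms.
    tgt_labels = []
    prev = False
    for flag in is_aspect:
        if not flag:
            tgt_labels.append("O")
        elif prev:
            tgt_labels.append("I")
        else:
            tgt_labels.append("B")
        prev = flag
    return tgt_labels


# Assumed example: the original sentence and a paraphrase obtained by back translation.
src_tokens = ["The", "battery", "life", "is", "great"]
src_labels = ["O", "B", "I", "O", "O"]
bt_tokens = ["Its", "battery", "life", "is", "really", "excellent"]
# Hand-written alignment, standing in for the output of a word aligner.
alignment = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 5)]

bt_labels = project_aspect_labels(src_labels, alignment, len(bt_tokens))
for token, label in zip(bt_tokens, bt_labels):
    print(f"{token}\t{label}")
```

The projected sentence and its labels could then be added to the training set alongside the original instance. Unaligned target tokens (here "really") default to "O".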

Multi-task Learning-based Data Augmentation for Minority Languages to Chinese Neural Machine Translation

Data Augmentation under Scarce Condition for Neural Machine Translation

A Scenario-Generic Neural Machine Translation Data Augmentation Method

Random Concatenation: A Simple Data Augmentation Method for Neural Machine Translation

Multi-Task Learning for Multiple Language Translation

Improving Chinese-Vietnamese Neural Machine Translation with Linguistic Differences

Scheduled Multi-task Learning for Neural Chat Translation

Research on Mongolian-Chinese Translation Model Based on Transformer with Soft Context Data Augmentation Technique

Syntax-Aware Data Augmentation for Neural Machine Translation

Multimodal Neural Machine Translation for Mongolian to Chinese

Lexical-Constraint-Aware Neural Machine Translation via Data Augmentation

Data Augmentation for Code-Switch Language Modeling by Fusing Multiple Text Generation Methods

Soft Contextual Data Augmentation for Neural Machine Translation

A Multi-task Multi-stage Transitional Training Framework for Neural Chat Translation

Improving Mongolian-Chinese Neural Machine Translation with Morphological Noise

Data Augmentation via Back-translation for Aspect Term Extraction

AdvAug: Robust Adversarial Augmentation for Neural Machine Translation

End-to-End Tibetan-Chinese Speech Translation Based on Multi-task and Multi-level Pre-training

Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation

Improving Many-to-Many Neural Machine Translation Via Selective and Aligned Online Data Augmentation