Abstract: We tackle Aspect Term Extraction (ATE), a task that automatically recognizes aspect terms based on an understanding of word-level semantics. Because it enriches the linguistic phenomena available for learning, data augmentation contributes to building robust ATE models. In this paper, we propose to leverage back translation to augment the training data for ATE. The approach rests on the observation that back-translated instances generally appear as paraphrases, providing diverse modes of expression for learning while the semantics remain unchanged. This helps ATE models recognize aspect terms when varied contexts and morphologically different words occur at test time. In our experiments, we apply an off-the-shelf Neural Machine Translation (NMT) model for back translation, using French, Chinese, and German as interlanguages, respectively. In addition, word alignment is used to designate aspect terms in the back-translated instances. Experimental results on SemEval benchmarks show that retraining with the augmented data produces substantial improvements, of up to 3.46%. The experiments further suggest that 1) interlanguages from the same language family are more beneficial than those from other families for this data augmentation, and 2) selective sampling produces positive effects in low-resource settings. Back translation has been explored for data augmentation in other fields, with the aim of enhancing neural language modeling. Nevertheless, it has not yet been systematically studied for the ATE task. Although the method presented in this paper is lightweight, we conduct a comprehensive analysis covering interlanguage selection, low-resource application, and compatibility with both conventional and pretrained neural models, in addition to the usual comparison and ablation experiments.
All the models and code used in the experiments will be made publicly available to support reproducible research.
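
The augmentation loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the released implementation: `to_pivot` and `from_pivot` are hypothetical callables wrapping whatever NMT system is used, and `project_aspect_labels` replaces real word alignment with a crude exact-match search for the aspect term in the back-translated sentence.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical type: any function that translates one sentence.
Translator = Callable[[str], str]


def project_aspect_labels(tokens: List[str], aspect: List[str]) -> Optional[List[str]]:
    """Assign BIO labels to `tokens` by exact match of the aspect term.

    Assumption: this stands in for word alignment. It returns None when
    the aspect term does not appear verbatim in the tokens.
    """
    n = len(aspect)
    if n == 0:
        return None
    lowered = [t.lower() for t in tokens]
    target = [a.lower() for a in aspect]
    for i in range(len(tokens) - n + 1):
        if lowered[i:i + n] == target:
            labels = ["O"] * len(tokens)
            labels[i] = "B"
            for j in range(i + 1, i + n):
                labels[j] = "I"
            return labels
    return None


def augment(sentence: str, aspect: str,
            to_pivot: Translator,
            from_pivot: Translator) -> Optional[Tuple[List[str], List[str]]]:
    """Back-translate `sentence` through a pivot language and re-label it.

    Returns (tokens, labels) for the paraphrase, or None when the
    paraphrase is identical to the input or the aspect term is lost.
    """
    back = from_pivot(to_pivot(sentence))
    if back == sentence:
        return None  # no new surface form, so nothing is gained
    tokens = back.split()
    labels = project_aspect_labels(tokens, aspect.split())
    if labels is None:
        return None
    return tokens, labels
```

Discarding paraphrases in which the aspect term cannot be located keeps the augmented set free of unlabeled or mislabeled instances, at the cost of losing cases where the term itself was paraphrased; a proper word aligner would recover those.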

Data Augmentation via Back-translation for Aspect Term Extraction.
