Task-driven Augmented Data Evaluation

Olga Golovneva, Wei Pan, Khadige Abboud, Charith Peris, Lizhen Tan, Hao Yu
DOI: https://doi.org/10.18653/v1/2022.gem-1.2
2022-01-01
Abstract: Data augmentation research has focused mainly on improving generation models, leaving the examination and improvement of synthetic data evaluation methods less explored. In our work, we explore a number of sentence similarity measures in the context of filtering generated data, and evaluate their impact on the performance of the targeted Natural Language Understanding problem, using intent classification and named entity recognition tasks as examples. Our experiments on the ATIS dataset show that the right choice of filtering technique can bring up to a 33% improvement in sentence accuracy for targeted underrepresented intents.
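To make the idea of similarity-based filtering concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method from the paper: the abstract does not name the similarity measures that were studied, so token-level Jaccard similarity is used as a stand-in, and the function names (`jaccard`, `filter_synthetic`), the thresholds `low` and `high`, and the sample utterances are all hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two sentences (assumed measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty sentences are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def filter_synthetic(seeds, candidates, low=0.2, high=0.9):
    """Keep candidates whose best similarity to any seed lies in [low, high].

    The thresholds are hypothetical: scores below `low` suggest the candidate
    drifted away from the target intent, scores above `high` suggest a
    near-duplicate that adds little new training signal.
    """
    kept = []
    for candidate in candidates:
        best = max((jaccard(candidate, seed) for seed in seeds), default=0.0)
        if low <= best <= high:
            kept.append(candidate)
    return kept


# Hypothetical ATIS-style utterances, invented for illustration only.
seeds = ["show me flights from boston to denver"]
candidates = [
    "show me flights from boston to denver",  # exact duplicate of the seed
    "show me flights from dallas to denver",  # close paraphrase
    "what is the weather today",              # unrelated to the seed
]
kept = filter_synthetic(seeds, candidates)
print(kept)
```

With these assumed thresholds, only the close paraphrase survives: the exact duplicate scores 1.0 and the unrelated sentence scores 0.0. The filtered set would then be added to the training data for the downstream intent classification or named entity recognition model, and the filter would be judged by the change in that model's performance.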