ComOM at VLSP 2023: A Dual-Stage Framework with BERTology and Unified Multi-Task Instruction Tuning Model for Vietnamese Comparative Opinion Mining

Dang Van Thin, Duong Ngoc Hao, Ngan Luu-Thuy Nguyen
DOI: https://doi.org/10.48550/arXiv.2312.09000
2023-12-14
Abstract: The ComOM shared task aims to extract comparative opinions from Vietnamese product reviews. It has two sub-tasks: (1) Comparative Sentence Identification (CSI) and (2) Comparative Element Extraction (CEE). The first sub-task is to identify whether the input is a comparative review, and the second is to extract the quintuplets mentioned in the comparative review. To address this task, our team proposes a two-stage system that fine-tunes a BERTology model for the CSI task and uses unified multi-task instruction tuning for the CEE task. We also apply a simple data augmentation technique to increase the size of the dataset used to train our second-stage model. Experimental results show that our approach outperforms the other competitors and achieved the top score on the official private test.
Computation and Language
What problem does this paper attempt to address?
The paper addresses Comparative Opinion Mining (COM) in Vietnamese product reviews. The task has two subtasks:

1. **Comparative Sentence Identification (CSI)**: decide whether the input text is a comparative review.
2. **Comparative Element Extraction (CEE)**: extract quintuplets from comparative reviews, each consisting of a subject, object, aspect, predicate, and comparison label.

For example, given a review stating that the battery life of the iPhone 14 Pro Max is better than that of its competitors, the extracted quintuplet is:

- Subject: ["1&&iPhone", "2&&14", "3&&Pro", "4&&Max"]
- Object: ["12&&its", "13&&competitors"]
- Aspect: ["8&&battery", "9&&life"]
- Predicate: ["7&&better"]
- Comparison label: ["COM+"]

Each element is written as "index&&token", where the number before "&&" appears to be the token's position in the original review sentence.

To tackle this task, the authors propose a two-stage framework. The first stage uses a pre-trained PhoBERT model for comparative sentence identification, and the second stage uses a multi-task instruction tuning model for comparative element extraction. In addition, a simple data augmentation technique is applied to increase the size of the training dataset. Experimental results show that this method achieves the best performance on the official private test set.
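To make the quintuplet format and the two-stage control flow concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names (`parse_element`, `span_text`, `two_stage`) are hypothetical, the reading of the number before "&&" as a token position is an assumption, and the classifier and extractor are placeholder callables standing in for the fine-tuned PhoBERT model and the instruction-tuned model.

```python
from typing import Callable, Dict, List, Tuple

# A quintuplet maps element names to lists of "index&&token" strings
# (the label is a plain string). This layout is an assumption for illustration.
Quintuple = Dict[str, object]


def parse_element(items: List[str]) -> List[Tuple[int, str]]:
    """Split each "index&&token" string into an (index, token) pair."""
    pairs = []
    for item in items:
        index, token = item.split("&&", 1)
        pairs.append((int(index), token))
    return pairs


def span_text(items: List[str]) -> str:
    """Join the tokens of one element in index order."""
    return " ".join(token for _, token in sorted(parse_element(items)))


def two_stage(
    review: str,
    is_comparative: Callable[[str], bool],          # stage 1: CSI (placeholder)
    extract: Callable[[str], List[Quintuple]],      # stage 2: CEE (placeholder)
) -> List[Quintuple]:
    """Run extraction only on reviews the first stage marks as comparative."""
    if not is_comparative(review):
        return []
    return extract(review)


# The quintuplet from the example above.
example: Quintuple = {
    "subject": ["1&&iPhone", "2&&14", "3&&Pro", "4&&Max"],
    "object": ["12&&its", "13&&competitors"],
    "aspect": ["8&&battery", "9&&life"],
    "predicate": ["7&&better"],
    "label": "COM+",
}

subject = span_text(example["subject"])
aspect = span_text(example["aspect"])

# Stub models: always comparative, always returns the example quintuplet.
result = two_stage("some review", lambda r: True, lambda r: [example])
```

Gating the second stage on the first keeps the extractor from producing quintuplets for non-comparative reviews, which is the main reason to split the problem into two stages.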