Comparative Opinion Summarization via Collaborative Decoding

Hayate Iso, Xiaolan Wang, Stefanos Angelidis, Yoshihiko Suhara
DOI: https://doi.org/10.48550/arXiv.2110.07520
2022-04-16
Abstract: Opinion summarization focuses on generating summaries that reflect popular subjective information expressed in multiple online reviews. While generated summaries offer general and concise information about a particular hotel or product, the information may be insufficient to help the user compare multiple different choices. Thus, the user may still struggle with the question "Which one should I pick?" In this paper, we propose the comparative opinion summarization task, which aims at generating two contrastive summaries and one common summary from two different candidate sets of reviews. We develop a comparative summarization framework CoCoSum, which consists of two base summarization models that jointly generate contrastive and common summaries. Experimental results on a newly created benchmark CoCoTrip show that CoCoSum can produce higher-quality contrastive and common summaries than state-of-the-art opinion summarization models. The dataset and code are available at https://github.com/megagonlabs/cocosum
Computation and Language
What problem does this paper attempt to address?
The paper addresses a limitation of existing opinion summarization techniques: they generate a single summary that reflects the general opinions about one entity (such as a hotel or a product), so they cannot help users compare the differences and commonalities between multiple options. When faced with several options, users still find it difficult to decide "Which one should I choose?"

To solve this problem, the authors propose a new task, **comparative opinion summarization**. The goal of this task is to generate two contrastive summaries and one common summary from the review sets of two different entities. Specifically:

- **Contrastive summaries**: Contain subjective information unique to each entity.
- **Common summary**: Describes the subjective information common to both entities.

In this way, users can more easily understand the differences and commonalities between the entities, and thus make more informed choices.

### Main contributions

1. **Proposing a new task**: The paper proposes the task of comparative opinion summarization, which requires generating two contrastive summaries and one common summary from the review sets of two entities.
2. **Developing the CoCoSum framework**: The paper designs the CoCoSum framework, which consists of two base summarization models, and implements a new co-decoding algorithm to generate more distinctive, entity-pair-specific summaries.
3. **Creating a benchmark dataset**: The paper creates and releases the CoCoTrip dataset, which contains manually written reference summaries for 48 pairs of entities and is used to evaluate comparative opinion summarization.

### Technical details

- **Problem definition**:
  - For the target entity A and the contrast entity B, the contrastive opinion refers to the subjective information that appears only in the reviews of A, \( R_A \), and not in the reviews of B, \( R_B \). It is denoted as \( y_{A \backslash B}^{\text{cont}} \).
  - The common opinion refers to the subjective information that appears in both \( R_A \) and \( R_B \), denoted as \( y_{A \cap B}^{\text{comm}} \).
- **Co-decoding**:
  - For contrastive summary generation, unique opinions are highlighted by adjusting the target entity's token probability distribution with the ratio between the target and contrast distributions:
    \[ \hat{p}_{A \backslash B}^{\text{cont}}(y_t) \propto p_A^{\text{cont}}(y_t) \left(\frac{p_A^{\text{cont}}(y_t)}{p_B^{\text{cont}}(y_t)}\right)^{\delta} \]
  - For common summary generation, common opinions are emphasized by adding the token probability distributions of the contrastive summarization models to that of the common summarization model:
    \[ \hat{p}_{A \cap B}^{\text{comm}}(y_t) \propto p_{A \cap B}^{\text{comm}}(y_t) + \gamma \sum_{E \in \{A, B\}} p_E^{\text{cont}}(y_t) \]

With these two adjustments, CoCoSum makes the generated summaries more distinctive while keeping their quality high.
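
The two co-decoding formulas can be illustrated on a single decoding step. The following is a minimal sketch, not the authors' implementation. It assumes each model's next-token distribution is already available as a plain dictionary. The function names, the `eps` guard against division by zero, and the toy probabilities are all hypothetical choices made for illustration.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def contrastive_step(p_a, p_b, delta=1.0, eps=1e-12):
    """Contrastive co-decoding: p_hat(y) ∝ p_A(y) * (p_A(y) / p_B(y)) ** delta.

    eps is an assumption of this sketch; it only guards against p_B(y) == 0.
    """
    return normalize({
        tok: p_a[tok] * (p_a[tok] / max(p_b[tok], eps)) ** delta
        for tok in p_a
    })


def common_step(p_comm, p_a, p_b, gamma=1.0):
    """Common co-decoding: p_hat(y) ∝ p_comm(y) + gamma * (p_A(y) + p_B(y))."""
    return normalize({
        tok: p_comm[tok] + gamma * (p_a[tok] + p_b[tok])
        for tok in p_comm
    })


# Toy next-token distributions over a three-word vocabulary (made-up numbers).
p_a = {"pool": 0.5, "clean": 0.4, "noisy": 0.1}     # contrastive model, entity A
p_b = {"pool": 0.1, "clean": 0.4, "noisy": 0.5}     # contrastive model, entity B
p_comm = {"pool": 0.4, "clean": 0.3, "noisy": 0.3}  # common summarization model

cont_a = contrastive_step(p_a, p_b, delta=1.0)
comm = common_step(p_comm, p_a, p_b, gamma=1.0)

print("contrastive (A vs. B):", cont_a)
print("common:", comm)
```

In this toy setup, "pool" is likely under A but unlikely under B, so the ratio term raises its probability in the contrastive distribution, while "clean", which both entities support equally, loses probability mass. In the common distribution, "clean" becomes the most likely token because both contrastive models assign it high probability. A full decoder would apply the same adjustment at every generation step, for example inside beam search, which this sketch does not show.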