OntoDSumm : Ontology based Tweet Summarization for Disaster Events

Piyush Kumar Garg, Roshni Chakraborty, Sourav Kumar Dandapat
DOI: https://doi.org/10.48550/arXiv.2201.06545
2022-11-19
Abstract: The huge popularity of social media platforms like Twitter attracts a large fraction of users to share real-time information and short situational messages during disasters. Government organizations, agencies, and volunteers require a summary of these tweets for efficient and quick disaster response. However, the huge influx of tweets makes it difficult to manually get a precise overview of ongoing events. To handle this challenge, several tweet summarization approaches have been proposed. Most of the existing literature breaks tweet summarization into a two-step process: the first step categorizes tweets, and the second step chooses representative tweets from each category. Both supervised and unsupervised approaches exist for the first step. Supervised approaches require a huge amount of labelled data, which is costly and time-consuming to obtain. Unsupervised approaches, on the other hand, cannot cluster tweets properly due to overlapping keywords, vocabulary size, lack of understanding of semantic meaning, etc. For the second step, existing approaches apply ranking methods that are very generic and fail to compute the proper importance of a tweet with respect to a disaster. Both problems can be handled far better with proper domain knowledge. In this paper, we exploit existing domain knowledge by means of an ontology in both steps and propose a novel disaster summarization method, OntoDSumm. We evaluate the proposed method against 4 state-of-the-art methods using 10 disaster datasets. Evaluation results reveal that OntoDSumm outperforms existing methods by approximately 2-66% in terms of ROUGE-1 F1 score.
Social and Information Networks
What problem does this paper attempt to address?
The problem this paper attempts to solve is how to generate effective summaries from a large number of tweets during disaster events, so that government organizations, agencies, and volunteers can quickly and accurately understand the disaster situation and respond effectively. Specifically, the paper addresses two problems:

1. **Challenges in tweet classification**: Existing tweet classification methods are either supervised or unsupervised. Supervised learning requires a large amount of labeled data, which is both time-consuming and expensive, while unsupervised learning cannot classify tweets well because of vocabulary overlap and insufficient semantic understanding.
2. **Challenges in representative tweet selection**: When selecting representative tweets, existing methods usually use general-purpose ranking algorithms, which cannot accurately assess the importance of a tweet in a specific disaster. In addition, the importance of each type of information varies across disaster events, and existing methods do not fully account for this.

To solve these problems, the authors propose an ontology-based tweet summarization method, **OntoDSumm**. The method addresses the above problems in three phases:

- **Phase-I**: Utilize existing knowledge in the disaster domain (such as the Empathi ontology) to automatically classify tweets into different categories. This phase is unsupervised, but the classification accuracy is improved by expanding the ontology vocabulary.
- **Phase-II**: Propose a new scoring mechanism to automatically predict the importance of each category in a given disaster event. By calculating the "disaster similarity index", find historical disasters similar to the current disaster, and determine the importance of each category accordingly.
- **Phase-III**: Propose an improved Disaster-specific Maximal Marginal Relevance (DMMR) algorithm to select the most representative tweets from each category, ensuring that the summarized information is comprehensive and diverse.

Through these three phases, OntoDSumm can generate tweet summaries for disaster events more effectively. Compared with existing methods, it improves the ROUGE-1 F1 score by approximately 2-66%.

### Formula presentation

To better understand the working principle of OntoDSumm, the following are some key formulas involved in the paper:

1. **Semantic Similarity Score**:

   \[ \text{SemSIM}(T_j, C_i)=\frac{|Kw(T_j)\cap Kw(C_i)|}{|Kw(T_j)\cup Kw(C_i)|} \]

   where \(Kw(T_j)\) is the set of keywords of tweet \(T_j\), and \(Kw(C_i)\) is the set of keywords of category \(C_i\).

2. **Maximal Semantic Similarity Score (MaxSIM)**:

   \[ \text{MaxSIM}(T_j)=\arg\max_{i\in K}(\text{SemSIM}(T_j, C_i)) \]

3. **Objective function for summary generation**:

   \[ T^*=\arg\max_{T_j\in C_i}(\alpha\cdot ICov(T_j, In(C_i))+\beta\cdot Div(T_j, S)) \]

   where \(ICov(T_j, In(C_i))\) represents the information coverage rate of tweet \(T_j\) for category \(C_i\), \(Div(T_j, S)\) represents the diversity of tweet \(T_j\) after being added to the summary \(S\), and \(\alpha\) and \(\beta\) are adjustable parameters.

Through these formulas, OntoDSumm can improve the diversity and representativeness of the summary while ensuring information coverage.
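
The three formulas can be illustrated with a short Python sketch. This is not the authors' implementation. SemSIM is coded directly from the formula as a Jaccard overlap, and MaxSIM returns the arg max category. The summary does not say how ICov and Div are computed, so `icov` and `div` are assumed stand-ins. The keyword sets, category names, and the default `alpha` and `beta` values are also hypothetical.

```python
from typing import Dict, List, Set


def sem_sim(tweet_kw: Set[str], cat_kw: Set[str]) -> float:
    """SemSIM: |intersection| / |union| of tweet and category keywords."""
    union = tweet_kw | cat_kw
    return len(tweet_kw & cat_kw) / len(union) if union else 0.0


def max_sim(tweet_kw: Set[str], categories: Dict[str, Set[str]]) -> str:
    """MaxSIM: the category (arg max) with the highest SemSIM for this tweet."""
    return max(categories, key=lambda c: sem_sim(tweet_kw, categories[c]))


def icov(tweet_kw: Set[str], cat_kw: Set[str]) -> float:
    # ASSUMPTION: coverage = fraction of the category's keywords in the tweet.
    return len(tweet_kw & cat_kw) / len(cat_kw) if cat_kw else 0.0


def div(tweet_kw: Set[str], summary: List[Set[str]]) -> float:
    # ASSUMPTION: diversity = 1 - highest overlap with any tweet already chosen.
    if not summary:
        return 1.0
    return 1.0 - max(sem_sim(tweet_kw, s) for s in summary)


def select_tweets(tweets: List[Set[str]], cat_kw: Set[str], k: int,
                  alpha: float = 0.5, beta: float = 0.5) -> List[int]:
    """Greedily pick up to k tweet indices maximizing alpha*ICov + beta*Div."""
    chosen: List[int] = []
    remaining = list(range(len(tweets)))
    while remaining and len(chosen) < k:
        summary = [tweets[i] for i in chosen]
        best = max(
            remaining,
            key=lambda i: alpha * icov(tweets[i], cat_kw)
            + beta * div(tweets[i], summary),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical keyword sets, for illustration only.
categories = {
    "damage": {"collapsed", "building", "damage", "destroyed"},
    "casualty": {"dead", "injured", "killed", "missing"},
}
tweets = [
    {"building", "collapsed"},
    {"building", "collapsed"},          # near-duplicate of the first tweet
    {"damage", "destroyed", "bridge"},
]
print(max_sim({"building", "collapsed", "downtown"}, categories))
print(select_tweets(tweets, categories["damage"], k=2))
```

With these toy inputs, the diversity term causes the duplicate tweet to be passed over in favor of the one that adds new keywords, which is the behavior the objective function is meant to encourage.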