Extractive Timeline Summarization based on Unsupervised Techniques

P. Torino, Stefano Munna
Abstract: With the growing importance of the internet, a huge number of news articles on many topics are published across many websites. Users interested in a particular event could benefit from an automatic system that presents the most important aspects of the event and/or its evolution over time. These two problems are addressed by:
• Multi-document Summarization, which aims to condense news from several articles into a complete and non-redundant summary
• Temporal Summarization, which aims to keep the user informed about the development of an event, providing non-redundant and significant updates as soon as new articles arrive (or at fixed time intervals)
• Timeline Summarization, which aims to provide the end user with the history of a concluded event along a timeline that highlights the most significant dates for the event and the most significant happenings for each date.
We analyzed the state of the art in text, temporal and timeline summarization, presenting several algorithms proposed in recent years. The aim of this thesis is to create a framework that performs timeline summarization following a pipeline (date selection, date summarization, timeline visualization): from an input set of documents about a specific topic, it extracts the most important dates and then applies text summarization to select the most important aspects (sentences) for each date. We explore the performance of several text summarization algorithms in the date summarization step (in which a summary is extracted, for each date, from the input sentences associated with that date), comparing the results obtained by testing these algorithms on the CRISIS and T17 datasets and evaluating the output timelines with several ROUGE variants. To ease the exploration of the generated summaries, we developed a Web application tailored to date summary visualization. It provides a visual extract of the summary content together with a representative image crawled from Google. It constitutes the last block of the pipeline, aimed at providing the end user with a friendlier tool to explore the timeline generated by the system.
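The first two pipeline stages named in the abstract (date selection, then per-date extractive summarization) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the toy input, the choice to rank dates by how many sentences are associated with them, and the word-frequency sentence scorer. These are simple unsupervised stand-ins and are not claimed to be the algorithms evaluated in the thesis.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def select_dates(dated_sentences, k):
    """Date selection (assumed heuristic): keep the k dates with the most
    associated sentences, returned in chronological order."""
    counts = Counter(date for date, _ in dated_sentences)
    top = sorted(counts, key=lambda d: (-counts[d], d))[:k]
    return sorted(top)

def summarize_date(sentences, n):
    """Date summarization (assumed scorer): rank sentences by the average
    frequency of their words among that date's sentences, keep the top n."""
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    return sorted(sentences, key=score, reverse=True)[:n]

def build_timeline(dated_sentences, k_dates, n_sentences):
    """Run both stages: map each selected date to its extractive summary."""
    by_date = defaultdict(list)
    for date, sentence in dated_sentences:
        by_date[date].append(sentence)
    return {
        date: summarize_date(by_date[date], n_sentences)
        for date in select_dates(dated_sentences, k_dates)
    }

# Invented toy input: (ISO date, sentence) pairs about a generic event.
corpus = [
    ("2020-01-01", "The storm reached the coast in the morning."),
    ("2020-01-01", "The storm flooded several streets near the coast."),
    ("2020-01-01", "Authorities issued a storm warning."),
    ("2020-01-02", "Rescue teams reached the flooded streets."),
    ("2020-01-02", "Rescue teams evacuated residents."),
    ("2020-01-05", "Schools reopened."),
]

timeline = build_timeline(corpus, k_dates=2, n_sentences=1)
for date, summary in timeline.items():
    print(date, summary)
```

ISO-formatted date strings are used so that plain string sorting yields chronological order. Swapping `summarize_date` for another extractive summarizer is the point at which different text summarization algorithms could be compared within the same pipeline.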
Computer Science