Abstract: In many cases, rather than running a keyword search, people want to find out what is happening in the world through the Internet. This requires integrated, comprehensive information on a news topic, which we call a news issue: its background, history, current progress, and differing opinions and discussions. Traditionally, news issues are compiled manually by website editors. This is time-consuming work, so keeping issues updated in real time is difficult. In this paper, we propose a three-step automatic online algorithm for news issue construction. The first step is topic detection, in which newly arriving stories are clustered into new topic candidates. The second step is topic tracking, in which each candidate is compared with previous topics and either merged into an existing topic or kept as a new one. In the final step, news issues are constructed by combining related topics, and they are updated by inserting new topics. To evaluate the algorithm, we simulate automatic online news issue construction under realistic Web conditions. The best F-measure is above 90% for topic detection and close to 90% for combined topic detection and tracking. Four sets of news issues are generated at different time granularities: one meets needs such as "what's new", while the other three answer questions such as "what's hot" or "what's going on". With the proposed algorithm, news issues can be constructed effectively and automatically with real-time updates, saving considerable human effort on tedious manual work.
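
The three-step pipeline in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the bag-of-words representation, cosine similarity, single-pass clustering, summed-vector centroids, and the thresholds `t_detect`, `t_track`, and `t_issue` are all assumptions, and every function name here is hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def vectorize(text):
    # Assumed representation: lower-cased bag-of-words term frequencies.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Cluster:
    # Summed member vectors; cosine is scale-invariant, so the sum points
    # in the same direction as a mean centroid would.
    vector: Counter = field(default_factory=Counter)
    members: list = field(default_factory=list)

    def absorb(self, vec, items):
        self.vector.update(vec)  # adds counts (copies, no aliasing)
        self.members.extend(items)


def best_match(vec, clusters):
    # Return (cluster, similarity) for the most similar cluster, or (None, 0.0).
    scored = [(c, cosine(vec, c.vector)) for c in clusters]
    return max(scored, key=lambda p: p[1], default=(None, 0.0))


def detect_topics(stories, t_detect):
    # Step 1 (assumed single-pass clustering): group newly arrived stories
    # into topic candidates.
    candidates = []
    for story in stories:
        vec = vectorize(story)
        c, sim = best_match(vec, candidates)
        if c is not None and sim >= t_detect:
            c.absorb(vec, [story])
        else:
            new = Cluster()
            new.absorb(vec, [story])
            candidates.append(new)
    return candidates


def track_topics(candidates, topics, t_track):
    # Step 2: merge each candidate into its closest old topic, or keep it as
    # a new topic. Returns only the newly created topics.
    new_topics = []
    for cand in candidates:
        old, sim = best_match(cand.vector, topics)
        if old is not None and sim >= t_track:
            old.absorb(cand.vector, cand.members)
        else:
            topics.append(cand)
            new_topics.append(cand)
    return new_topics


def update_issues(new_topics, issues, t_issue):
    # Step 3: insert each new topic into the most related issue, or open a
    # new issue when none is similar enough.
    for topic in new_topics:
        issue, sim = best_match(topic.vector, issues)
        if issue is not None and sim >= t_issue:
            issue.absorb(topic.vector, [topic])
        else:
            new = Cluster()
            new.absorb(topic.vector, [topic])
            issues.append(new)
    return issues


if __name__ == "__main__":
    topics, issues = [], []
    batch = ["earthquake hits city", "city earthquake death toll rises",
             "football team wins cup"]
    new = track_topics(detect_topics(batch, 0.3), topics, 0.3)
    update_issues(new, issues, 0.3)
    print(len(topics), len(issues))
```

In this sketch, stories arrive in batches: each batch goes through step 1, the resulting candidates are tracked against the accumulated topics in step 2, and only the topics that step 2 leaves unmerged are inserted into issues in step 3, echoing the abstract's "updated by the insertion of new topics".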

HisTrace: A system for mining on news-related articles instead of web pages

Socialanalysis: A Real-Time Query And Mining System From Social Media Data Streams

NewsMiner: Multifaceted News Analysis for Event Search

Searching for Historical Events on a Large-Scale Web Archive

Practice of Web Mining Based on Nature Language Understanding

A Flexible Topic-driven Framework for News Exploration

Automatic Online News Issue Construction in Web Environment

Dynamic mining for web navigation patterns based on markov model

Learning to Extract Web News Title in Template Independent Way

Automatic Elements Extraction of Chinese Web News Using Prior Information of Content and Structure

Mining Event Temporal Boundaries from News Corpora Through Evolution Phase Discovery

Design and Analysis of a Report Tracing System Based on Webinfomall

Can We Learn a Template-Independent Wrapper for News Article Extraction from a Single Training Site?

An Approach for Discovering Multilingual News Events and Term Association from the Web

Histrace: building a search engine of historical events

An artificial intelligence based news feature mining system based on the Internet of Things and multi-sensor fusion

NLP based intelligent news search engine using information extraction from e-newspapers

Reuters tracer: Toward automated news production using large scale social media data

Mining and Analyzing the Future Works in Scientific Articles

News article extraction with template-independent wrapper

Discovering Authoritative News Sources And Top News Stories