Histrace: building a search engine of historical events.

Lian'en Huang, Jonathan J. H. Zhu, Xiaoming Li
DOI: https://doi.org/10.1145/1367497.1367703
2008-01-01
Abstract: In this paper, we describe an experimental search engine built on our Chinese web archive, which has been collected since 2001. The original data set contains nearly 3 billion Chinese web pages crawled over the past 5 years. From this collection, 430 million "article-like" pages are selected and then partitioned into 68 million sets of similar pages. The title and publication date of each page are determined, and an index is built. At search time, the system returns related pages in chronological order. In this way, a user interested in news reports or commentaries on a past event can conveniently find a rich set of highly related pages.
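The retrieval flow summarized in the abstract (pages grouped into sets of similar pages, an index over them, results returned in chronological order) can be illustrated with a toy sketch. This is not the authors' implementation: the `Page` fields, the title-only inverted index, the choice of the earliest page as each set's representative, and the sample pages are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Page:
    url: str
    title: str
    published: date
    cluster_id: int  # hypothetical id of the set of similar pages


def build_index(pages):
    """Map each lowercase title token to the pages whose title contains it."""
    index = defaultdict(list)
    for page in pages:
        for token in set(page.title.lower().split()):
            index[token].append(page)
    return index


def search(index, query):
    """Return pages matching every query token, one per cluster, oldest first."""
    tokens = query.lower().split()
    if not tokens:
        return []
    # Intersect the posting lists of all query tokens.
    matches = set(index.get(tokens[0], []))
    for token in tokens[1:]:
        matches &= set(index.get(token, []))
    # Keep only the earliest page of each similar-page set (an assumption).
    earliest = {}
    for page in matches:
        best = earliest.get(page.cluster_id)
        if best is None or page.published < best.published:
            earliest[page.cluster_id] = page
    # Chronological order, with the URL as a deterministic tie-breaker.
    return sorted(earliest.values(), key=lambda p: (p.published, p.url))


# Made-up sample pages; a and b stand for near-duplicate copies of one article.
pages = [
    Page("http://example.com/a", "River flood relief begins", date(2005, 7, 2), 1),
    Page("http://example.com/b", "River flood relief begins", date(2005, 7, 3), 1),
    Page("http://example.com/c", "Flood relief funds audited", date(2006, 1, 15), 2),
    Page("http://example.com/d", "Stock market report", date(2004, 5, 30), 3),
]
index = build_index(pages)
results = search(index, "flood relief")
for page in results:
    print(page.published.isoformat(), page.title, page.url)
```

Collapsing each set of similar pages to a single representative keeps reprints from crowding the timeline; a real system at the scale described would need on-disk indexes and a proper similarity measure rather than precomputed cluster ids.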