A New Data Storage and Service Model of China Web InfoMall

Hongfei Yan, Lianen Huang, Chong Chen, Zhengmao Xie
2004-01-01
Abstract: The Web consists of an enormous number of pages, which vanish more easily than traditional media such as newspapers and journals. To preserve these web resources, we began the China Web archiving project, named Web InfoMall, in 2001. This paper describes the data storage and service model of Web InfoMall 2.0, which is designed to meet the goals of collecting content broadly, storing it permanently, and locating requested pages efficiently. Currently, Web InfoMall holds 0.7 billion pages (10.6 terabytes) together with 5 terabytes of digital web resources other than web pages. It can collect more than 1 million pages per day, has a storage capacity of more than 10 billion pages (about 150 terabytes), and includes a scheme for managing large numbers of pages.
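The storage figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is only an illustration, not part of the Web InfoMall system: the function names are hypothetical, and it assumes that 1 terabyte means 10^12 bytes and that the average page size stays constant as the archive grows.

```python
# Figures quoted in the abstract.
PAGES_HELD = 0.7e9        # pages currently archived
TERABYTES_HELD = 10.6     # storage those pages occupy
PAGES_CAPACITY = 10e9     # pages the storage is designed to hold

# Assumption (not stated in the abstract): 1 terabyte = 10**12 bytes.
BYTES_PER_TERABYTE = 10**12


def average_page_bytes(pages: float, terabytes: float) -> float:
    """Average stored size of one page, in bytes."""
    return terabytes * BYTES_PER_TERABYTE / pages


def projected_terabytes(pages: float, avg_bytes: float) -> float:
    """Storage needed for `pages` pages at a constant average size."""
    return pages * avg_bytes / BYTES_PER_TERABYTE


avg_bytes = average_page_bytes(PAGES_HELD, TERABYTES_HELD)
projected = projected_terabytes(PAGES_CAPACITY, avg_bytes)

print(f"average page size: {avg_bytes / 1000:.1f} KB")
print(f"storage for 10 billion pages: {projected:.1f} TB")
```

Under these assumptions the average page occupies roughly 15 KB, and 10 billion pages would need roughly 151 terabytes, which agrees with the "about 150 terabytes" stated in the abstract.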