Engineering of Web InfoMall: The Chinese Web Archive

Le Huang, Hf Yan, Xm Li
2004-01-01
Abstract: We present the design and architecture of Web InfoMall, a Chinese web archiving project. Web InfoMall is one of the largest web archiving deployments in China. It currently holds 0.7 billion pages (10.6 terabytes) and can collect 1 million pages per day, and it has the potential to hold more than 10 billion pages (about 150 terabytes). Web InfoMall includes a strict storage structure that supports perennial (long-term) storage and a scheme for efficiently locating requested pages. We outline the Web InfoMall architecture and describe some of the challenges we face in its deployment.
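The capacity figures in the abstract are mutually consistent. Dividing the current archive size by its page count gives the average stored page size, and multiplying that average by 10 billion pages recovers the projected figure of about 150 terabytes. The short Python sketch below reproduces this arithmetic. It assumes decimal terabytes (1 TB = 10^12 bytes), which the abstract does not specify, and the helper names `avg_page_size_bytes` and `projected_storage_tb` are hypothetical, introduced only for illustration.

```python
# Back-of-the-envelope capacity check using the figures quoted in the abstract.
# Assumption: 1 TB = 10**12 bytes (decimal); the abstract does not state the unit.

TB = 10**12


def avg_page_size_bytes(total_bytes: float, pages: float) -> float:
    """Average stored size per page (hypothetical helper)."""
    return total_bytes / pages


def projected_storage_tb(pages: float, avg_bytes: float) -> float:
    """Storage needed for `pages` pages at a given average size, in TB (hypothetical helper)."""
    return pages * avg_bytes / TB


current_pages = 0.7e9        # 0.7 billion pages (from the abstract)
current_bytes = 10.6 * TB    # 10.6 terabytes (from the abstract)

avg = avg_page_size_bytes(current_bytes, current_pages)
projected = projected_storage_tb(10e9, avg)  # 10 billion pages

print(f"average page size: {avg / 1e3:.1f} KB")  # prints "average page size: 15.1 KB"
print(f"10 billion pages: {projected:.0f} TB")   # prints "10 billion pages: 151 TB"
```

An average of roughly 15 KB per page, scaled to 10 billion pages, gives about 151 TB. This agrees with the abstract's estimate of "about 150 terabytes."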