A Cache-Based Distributed Terabyte Text Retrieval System in CADAL
Jun Cheng, Wen Gao, Bin Liu, Tie-jun Huang, Ling Zhang
DOI: https://doi.org/10.1007/3-540-36227-4_40
2002-01-01
Abstract: The China-America Digital Academic Library (CADAL) project aims to create a searchable collection of one million digital books freely available over the Internet, which requires a terabyte-scale text retrieval system. This paper presents a cache-based, distributed terabyte text retrieval system that combines full-text retrieval, distributed computing, and caching techniques. Because data is distributed by subject across different index servers, each query is searched only on specific index servers. Cache servers reduce response time. To reduce the workload on the network, the system returns only highly relevant search results for each query. The prototype system shows the effectiveness of our design.
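
The three ideas in the abstract (routing a query to a subject-specific index server, caching results, and returning only the most relevant hits) can be illustrated with a minimal Python sketch. The abstract does not describe the actual implementation, so everything here is an assumption made for illustration: the class names `IndexServer` and `CachedRouter`, the LRU eviction policy, the term-count relevance score, and the `top_k` cutoff are all hypothetical.

```python
from collections import OrderedDict


class IndexServer:
    """Hypothetical in-memory index server holding the documents of one subject."""

    def __init__(self, docs):
        self.docs = docs  # maps doc_id -> full text

    def search(self, term):
        # Toy relevance score (an assumption): occurrences of the term in the text.
        term = term.lower()
        hits = []
        for doc_id, text in self.docs.items():
            score = text.lower().split().count(term)
            if score > 0:
                hits.append((doc_id, score))
        return hits


class CachedRouter:
    """Hypothetical front end: routes by subject, caches results, keeps top_k hits."""

    def __init__(self, servers, cache_size=128, top_k=2):
        self.servers = servers      # maps subject -> IndexServer
        self.cache = OrderedDict()  # LRU cache (the eviction policy is an assumption)
        self.cache_size = cache_size
        self.top_k = top_k
        self.cache_hits = 0

    def query(self, subject, term):
        key = (subject, term.lower())
        if key in self.cache:
            # Cache hit: answer without contacting any index server.
            self.cache.move_to_end(key)
            self.cache_hits += 1
            return self.cache[key]
        # Cache miss: only the index server for this subject is searched.
        server = self.servers.get(subject)
        hits = server.search(term) if server else []
        # Return only the highest-scoring results to limit network traffic.
        hits.sort(key=lambda h: (-h[1], h[0]))
        result = hits[: self.top_k]
        self.cache[key] = result
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return result


servers = {
    "history": IndexServer({
        "h1": "silk road trade and silk markets",
        "h2": "road building in the han dynasty",
        "h3": "silk",
    }),
    "physics": IndexServer({"p1": "quantum optics and lasers"}),
}
router = CachedRouter(servers, cache_size=2, top_k=2)
first = router.query("history", "silk")   # searched on the "history" server only
second = router.query("history", "silk")  # served from the cache
```

In this sketch the second identical query never reaches an index server, which is the mechanism by which a cache layer would cut response time. A real deployment would replace the in-memory dictionaries with networked index and cache servers.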