Towards Multi-granularity Multi-facet E-Book Retrieval in China-US Million Book Digital Library

Yonghong Tian, Tiejun Huang, Wen Gao
2007-01-01
Abstract: There are more than one million digitized books (i.e., e-books) so far in the China-US Million Book Digital Library Project (MBP for short). It is thus important to design effective and powerful tools that enable users to easily search for the required information and appropriately access knowledge in the digital library. Towards this end, most digital libraries currently apply traditional metadata-based or full-text-based retrieval technologies to the e-book collection. However, such e-book retrieval systems have at least two limitations. (1) The granularity of retrieval results is either too coarse or too fine, so intermediate granularities such as chapters and paragraphs are ignored. (2) The large volume of retrieval results is usually poorly organized, so users often need to spend more effort to find the required items. Therefore, given the large volume of complex data in MBP, new search models and algorithms need to be developed that can take advantage of the particularities of e-books, access them appropriately, and provide results efficiently. To tackle this challenge, this paper introduces our multi-granularity and multi-facet e-book retrieval approach for MBP. First, a Multi-granularity Multi-facet Knowledge Network (MMKN) model is proposed to represent content at different granularities (e.g., books, chapters, pages, paragraphs and words) and along different facets (e.g., time, space, etc.) to support retrieval of relevant items from an e-book collection. Then we implement a novel e-book retrieval system, called IQuery, which extracts facet-related information from e-books at several granularities and supports multi-granularity e-book retrieval with more retrievable units and multi-facet navigation. Experiments were conducted to validate the efficiency and effectiveness of the proposed MMKN model, as well as the performance of IQuery. The results are encouraging, demonstrating that IQuery can provide powerful capabilities for e-book retrieval in MBP.
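To make the idea of multi-granularity, multi-facet retrieval concrete, the following is a minimal sketch in Python. It is not the MMKN model or the IQuery implementation, whose internals the abstract does not describe. The names Unit, ToyIndex, search and facet_counts, the facet values, and the sample texts are all hypothetical and invented for illustration. The sketch only shows how retrievable units at several granularities could carry facet labels, so that one query can be restricted to a granularity and filtered or grouped by a facet.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

# Granularity levels named in the abstract (words are treated as index terms here).
GRANULARITIES = ("book", "chapter", "page", "paragraph")


@dataclass
class Unit:
    """One retrievable unit at a given granularity (hypothetical structure)."""
    unit_id: str
    granularity: str
    text: str
    facets: dict = field(default_factory=dict)  # e.g. {"time": ..., "space": ...}
    parent: Optional[str] = None                # id of the enclosing unit, if any


class ToyIndex:
    """A toy inverted index over units of mixed granularity (assumption, not IQuery)."""

    def __init__(self):
        self.units = {}
        self.postings = defaultdict(set)  # term -> ids of units containing it

    def add(self, unit):
        if unit.granularity not in GRANULARITIES:
            raise ValueError(f"unknown granularity: {unit.granularity}")
        self.units[unit.unit_id] = unit
        for term in unit.text.lower().split():
            self.postings[term].add(unit.unit_id)

    def search(self, query, granularity=None, **facet_filters):
        """Return units containing every query term, optionally restricted
        to one granularity and to exact facet values."""
        terms = query.lower().split()
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        hits = []
        for uid in sorted(ids):
            unit = self.units[uid]
            if granularity is not None and unit.granularity != granularity:
                continue
            if any(unit.facets.get(k) != v for k, v in facet_filters.items()):
                continue
            hits.append(unit)
        return hits

    def facet_counts(self, hits, facet):
        """Group a result list by one facet, as a basis for facet navigation."""
        counts = defaultdict(int)
        for unit in hits:
            if facet in unit.facets:
                counts[unit.facets[facet]] += 1
        return dict(counts)


# Toy data, invented purely for illustration.
idx = ToyIndex()
idx.add(Unit("b1", "book", "silk road history", {"time": "tang", "space": "xian"}))
idx.add(Unit("b1.c1", "chapter", "silk road trade routes",
             {"time": "tang", "space": "xian"}, parent="b1"))
idx.add(Unit("b1.c2", "chapter", "silk weaving",
             {"time": "song", "space": "suzhou"}, parent="b1"))

chapter_hits = idx.search("silk", granularity="chapter")
song_hits = idx.search("silk", time="song")
time_groups = idx.facet_counts(idx.search("silk"), "time")
```

In this sketch, chapter_hits shows retrieval at an intermediate granularity, song_hits shows filtering by a facet value, and time_groups shows how a flat result list could be organized into facet groups for navigation. A real system would also need ranking and automatic facet extraction, which are outside the scope of this illustration.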