Design and implementation of search engine system for digital library

Qi-dong LIN, Chuan-bo CHEN, Le-dan ZHENG, Yi-man ZHANG
DOI: https://doi.org/10.3969/j.issn.1001-3695.2009.08.044
2009-01-01
Abstract: This paper presents the overall system design of a topic-specific search engine for digital libraries. A preprocessing system selects high-quality seed sites, which yields the Web topic definition data. Under the scheduling of a system controller, the topic crawlers concurrently collect the Web resources recommended by crawlers, then classify the text and identify the topic of the downloaded resources, which are stored in the Web topic resource database according to discipline classification. Users can then search the topic resources through the index of the whole information database. Based on the particular characteristics of digital libraries, the paper proposes a design for a multi-threaded topic-specific crawler and a novel URL pruning algorithm, EPR, and uses them to implement a prototype topic-specific search engine for digital libraries. The prototype is extended into the final system on the open-source Lucene platform. Experimental results show that the approach is effective, and that the EPR algorithm in particular is novel and valuable in real application environments.
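The crawl loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation and it is not the EPR algorithm, whose details the abstract does not give. The sketch assumes a hypothetical in-memory `FAKE_WEB` in place of real downloads, a keyword-ratio `relevance` score in place of the paper's text classifier, and a simple anchor-text threshold as a stand-in pruning rule. All names and the threshold value are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "web": url -> (page text, [(anchor text, link url), ...]).
FAKE_WEB = {
    "seed://lib": ("digital library catalog search", [
        ("library search index", "page://index"),
        ("sports news", "page://sports"),
    ]),
    "page://index": ("library index of digital resources", [
        ("digital archive", "page://archive"),
    ]),
    "page://sports": ("football scores", []),
    "page://archive": ("digital archive of library holdings", []),
}

# Assumed topic vocabulary; a real system would derive this from seed sites.
TOPIC_TERMS = {"digital", "library", "archive", "index", "search"}


def relevance(text):
    """Fraction of words that are topic terms (stand-in for a classifier)."""
    words = text.lower().split()
    return sum(w in TOPIC_TERMS for w in words) / len(words) if words else 0.0


def fetch(url):
    """Stand-in for an HTTP download; returns (url, text, links)."""
    text, links = FAKE_WEB.get(url, ("", []))
    return url, text, links


def crawl(seeds, threshold=0.5, workers=4):
    """Level-by-level crawl; each level is fetched in parallel by worker threads."""
    seen = set(seeds)
    frontier = list(seeds)
    stored = {}  # plays the role of the topic resource database
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            next_frontier = []
            for url, text, links in pool.map(fetch, frontier):
                # Keep only pages whose text looks on-topic.
                if relevance(text) >= threshold:
                    stored[url] = text
                for anchor, link in links:
                    # Generic pruning (not EPR): skip links with off-topic anchors.
                    if link not in seen and relevance(anchor) >= threshold:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return stored
```

Calling `crawl(["seed://lib"])` on this toy data stores the seed, index, and archive pages and never visits the sports page, because its anchor text scores below the threshold. The main loop acts as the controller here: it owns the frontier and the seen set, so the worker threads only download and never share mutable state.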