Study on Architecture and Core Technology of Search Engine Google

WANG De-feng, LI Dong
DOI: https://doi.org/10.3969/j.issn.1672-0946.2006.01.023
2006-01-01
Abstract: Retrieving information on the Internet is difficult, and search engines make it easier. The volume of data on the Internet is so large that retrieval techniques designed for conventional databases cannot meet the requirement. To address this problem, Google applies technologies such as parallel processing, barrel sorting, compression, and PageRank. The result is a complex system with five parts: the crawler, the repository, the index system (including the indexer, barrels, the file index, and so on), the sorter, and the searcher. Google's ranking system considers count-weight, type-weight, prox-weight, and PageRank, which measures the importance of a page. PageRank applies the idea of academic citation analysis to the Web: a page can have a high PageRank if many pages point to it, or if a few pages that themselves have a high PageRank point to it. Applying PageRank effectively improves search quality.
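The PageRank idea summarized in the abstract can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the implementation described in the paper: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages without outgoing links, and the four-page toy graph are all choices made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank for a small link graph.

    links maps each page to the list of pages it points to.
    damping and iterations are illustrative defaults (assumptions).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: A, B, and D all point to C; C points back to A.
toy_links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, C ranks highest because many pages point to it, while A ranks well above B and D although only one page points to it, because that page (C) itself has a high rank. These are the two cases the abstract describes.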