Analysis of the User Log for a Large-scale Chinese Search Engine

王继民,陈翀,彭波
DOI: https://doi.org/10.3321/j.issn:1000-565X.2004.z1.001
2004-01-01
Abstract: This paper investigates the user log of Tianwang, a large-scale distributed Chinese search engine system. The results show that user access to the system is not uniformly distributed over time: there are three peaks each day, in the morning, afternoon and evening. In general, a user issues only 1-2 distinct queries in one day, and more than two-thirds of the users click on URLs in the Web pages. Most queries contain only one string of Chinese characters, and the most frequent number of Chinese words is 2-4. The interval between a user's views of result pages is about 2-3 min, and only a few users visit the historical (cached) Web pages. Moreover, the numbers of distinct queries, users and URLs are shown to follow Heaps' law.
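For readers unfamiliar with it, Heaps' law states that the number of distinct items V(n) seen after n observations grows roughly as V(n) = K * n^beta, with 0 < beta < 1. The Python sketch below shows one way such a claim could be checked on a log: count distinct items as the stream is read, then fit K and beta by least squares in log-log space. The function names (distinct_growth, fit_heaps) are hypothetical and chosen for illustration; this is not the authors' procedure, and no Tianwang data is used.

```python
import math


def distinct_growth(stream):
    """Return [(n, V(n))]: the number of distinct items seen after n items."""
    seen = set()
    points = []
    for n, item in enumerate(stream, start=1):
        seen.add(item)
        points.append((n, len(seen)))
    return points


def fit_heaps(points):
    """Fit V(n) = K * n**beta by ordinary least squares on
    log V = log K + beta * log n. Returns (K, beta).

    Assumes every n and V(n) in points is positive and that at least
    two distinct values of n are present.
    """
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(v) for _, v in points]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    K = math.exp(mean_y - beta * mean_x)
    return K, beta
```

In use, one would pass the sequence of queries (or user identifiers, or clicked URLs) in log order to distinct_growth and hand the resulting points to fit_heaps; a fitted beta well below 1 with a close log-log fit is consistent with Heaps' law.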