Analyzing Scale of Web Logs and Mining Users' Interests
GUO Yan, BAI Shuo, YANG Zhi-Feng, ZHANG Kai
DOI: https://doi.org/10.3321/j.issn:0254-4164.2005.09.009
2005-01-01
Chinese Journal of Computers
Abstract: This paper focuses on Web-log mining. Do Web logs really contain characteristics of user access? If so, can these characteristics be described clearly, and how can they be used? To answer these questions, this paper analyzes real Web logs. First, it analyzes how the number of users, the number of Web documents, and the average number of Web documents accessed per user change as the scale of the Web logs increases. The analysis leads to the conclusion that users' access to the Web is driven more by stable interests than by casual ones, and that these stable interests must be contained in Web logs. Second, to make use of the stable interests contained in Web logs, the paper presents a model and a search engine, SISI (Similar Interests, Similar access on Internet), which mines related pages by exploiting the latent human judgments about page relatedness contained in Web logs. The performance of SISI is consistent with the analysis of the model: retrieval accuracy and time cost depend mainly on the number of users, while the number of result records depends mainly on the number of Web documents.
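The scale analysis described in the abstract can be illustrated with a minimal sketch. It assumes the log has already been parsed into (user, URL) pairs in chronological order; the function name `scale_statistics`, the record layout, and the checkpoint scheme are hypothetical choices for illustration, not the procedure used in the paper.

```python
from collections import defaultdict


def scale_statistics(records, checkpoints):
    """Report log statistics at increasing log sizes.

    records: iterable of (user_id, url) pairs in log order (assumed format).
    checkpoints: record counts at which to take a snapshot.
    Returns a list of (n_records, n_users, n_docs, avg_docs_per_user).
    """
    docs_by_user = defaultdict(set)  # distinct documents seen per user
    all_docs = set()                 # distinct documents seen overall
    pending = sorted(c for c in set(checkpoints) if c > 0)
    results = []
    idx = 0
    for n, (user, url) in enumerate(records, start=1):
        docs_by_user[user].add(url)
        all_docs.add(url)
        if idx < len(pending) and n == pending[idx]:
            # Sum of distinct documents per user, divided by the user count.
            total = sum(len(docs) for docs in docs_by_user.values())
            results.append(
                (n, len(docs_by_user), len(all_docs), total / len(docs_by_user))
            )
            idx += 1
    return results


if __name__ == "__main__":
    # Tiny made-up log, purely for demonstration.
    sample_log = [
        ("u1", "/a"), ("u1", "/b"), ("u2", "/a"),
        ("u1", "/a"), ("u3", "/c"), ("u2", "/b"),
    ]
    for row in scale_statistics(sample_log, [2, 4, 6]):
        print(row)
```

Plotting the three statistics against the number of records would show how each one grows with log scale, which is the kind of trend the abstract refers to.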