An N-Gram Prediction Model Based on Web-Log Mining

Su Zhong, Ma Shaoping, Yang Qiang, Zhang Hongjiang
2002-01-01
Journal of Software
Abstract: As an increasing number of users access information on the Web, there is a great opportunity to learn about the users' probable actions in the future from the server logs. In this paper, an n-gram based model is presented to utilize path profiles of users from very large data sets to predict the users' future requests. Since this is a prediction system, the recall cannot be measured in a traditional sense. Therefore, the notion of applicability is presented to give a measure of the ability to predict the next document. The new model is based on a simple extension of existing point-based models for such predictions, but the results show that by sacrificing the applicability somewhat one can gain a great deal in prediction precision. The result can potentially be applied to a wide range of applications on the Web, including pre-sending, pre-fetching, enhancement of recommendation systems as well as Web caching policies. The tests are based on three realistic Web logs. The new algorithm shows a marked improvement in precision and applicability over previous approaches.
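The trade-off between precision and applicability described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy sessions, and the exact definitions used here (precision as correct predictions over predictions made, applicability as predictions made over all next-request opportunities) are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def train_ngram(sessions, n):
    """Count which document follows each length-n path of requests."""
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - n):
            context = tuple(session[i:i + n])
            counts[context][session[i + n]] += 1
    return counts


def predict(counts, recent, n):
    """Return the most frequent successor of the last n requests, or None."""
    if len(recent) < n:
        return None  # not enough history to form a length-n path
    context = tuple(recent[-n:])
    if context not in counts:
        return None  # path never seen in training: no prediction
    return counts[context].most_common(1)[0][0]


def evaluate(counts, sessions, n):
    """Return (precision, applicability) under the assumed definitions."""
    made = correct = opportunities = 0
    for session in sessions:
        for i in range(1, len(session)):
            opportunities += 1
            guess = predict(counts, session[:i], n)
            if guess is None:
                continue
            made += 1
            correct += guess == session[i]
    precision = correct / made if made else 0.0
    applicability = made / opportunities if opportunities else 0.0
    return precision, applicability


# Hypothetical sessions: each list is one user's ordered page requests.
train_sessions = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "D"], ["X", "B", "E"]]
test_sessions = [["A", "B", "C"], ["X", "B", "E"], ["Q", "B", "D"]]

for n in (1, 2):
    model = train_ngram(train_sessions, n)
    precision, applicability = evaluate(model, test_sessions, n)
    print(f"n={n}: precision={precision:.2f}, applicability={applicability:.2f}")
```

On this toy data, moving from n=1 (a point-based model) to n=2 makes fewer predictions but gets more of them right, which is the kind of exchange the abstract refers to.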