WRT: Constructing Users' Web Request Trees from HTTP Header Logs.

Shengchao Liu, Jilong Wang, Hui Wang, Haibo Wang, Ya Liu
DOI: https://doi.org/10.1109/icc.2019.8761532
2019-01-01
Abstract: Web traffic now dominates the Internet, and massive volumes of web logs are generated continuously. Mining valuable information and knowledge from this log data is essential for operators. However, the stateless nature of HTTP and the increasing dynamics and complexity of web services make mining web logs challenging. To address this problem, we introduce the Web Request Tree (WRT), which reconstructs users' web request behaviors from web logs containing HTTP header information and can be applied in many fields. Compared with previous related work, our method pays special attention to cases caused by modern technologies such as PJAX. We evaluate the feasibility of our method with a measurement study of the referrer policies of Alexa top websites, and the results show that it can achieve high accuracy for most websites. We also conduct experiments on a real-world dataset collected from the official websites of a top university in China. By analyzing WRTs, we find that many abnormal requests exist in the log data and that WRTs are rich in information that operators can use to analyze user behavior, detect anomalies, optimize performance, and so on.
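The abstract does not spell out the construction algorithm, so the following is only a minimal sketch of the general idea, assuming that each request is attached to the most recent earlier request whose URL equals its Referer header. The names `Node` and `build_request_trees` and the `(timestamp, url, referer)` record layout are hypothetical and chosen for illustration; the sketch does not handle PJAX or referrer policies that truncate the Referer value, which the paper treats specially.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One logged HTTP request and the requests it triggered."""
    url: str
    timestamp: float
    children: List["Node"] = field(default_factory=list)


def build_request_trees(records):
    """Group one user's requests into trees by following the Referer header.

    `records` is an iterable of (timestamp, url, referer) tuples, with
    referer set to None when the header is absent (assumed layout).
    """
    roots = []
    latest_by_url = {}  # most recent node seen for each URL
    for timestamp, url, referer in sorted(records, key=lambda r: r[0]):
        node = Node(url, timestamp)
        parent = latest_by_url.get(referer) if referer else None
        if parent is None:
            roots.append(node)  # no known referrer: start a new tree
        else:
            parent.children.append(node)
        latest_by_url[url] = node
    return roots
```

Under these assumptions, a page load followed by its embedded resources and a subsequent click forms a single tree rooted at the first page, while a request with no Referer (or an unseen one) starts a new tree.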