ReSurf: Reconstructing Web-Surfing Activity from Network Traffic.

Guowu Xie, Marios Iliofotou, Thomas Karagiannis, Michalis Faloutsos, Yaohui Jin
2013-01-01
Abstract: As more applications and services move to the web, web traffic has grown to as much as 80% of all network traffic. At the same time, most traffic classification efforts stop once they correctly label a flow as web or HTTP. In this paper, we focus on understanding what happens “under the hood” of HTTP traffic. Our first contribution is ReSurf, a systematic approach that reconstructs web-surfing activity from raw network data with more than 91% recall and 95% precision over four real network traces. Our second contribution is an extensive analysis of web activity across these traces. Using ReSurf, we study web-surfing behavior in terms of user requests and transitions between websites (e.g., the click-through history of following hyperlinks). A surprising result is the prevalence of advertising and tracking services that are accessed during web surfing without the user's explicit consent. In our traces, a user accesses such a service with 90% probability after just three user requests (or “clicks”). We believe that our methodology and findings provide valuable insights into modern traffic that can allow: (a) network administrators to better manage and protect their networks, (b) traffic regulators to protect the rights of online users, and (c) researchers to better understand the evolution of traffic from modern websites.