TS-Finder: Privacy Enhanced Web Crawler Detection Model Using Temporal–spatial Access Behaviors

Jing Zhao, Rui Chen, Pengcheng Fan
DOI: https://doi.org/10.1007/s11227-024-06133-6
2024-01-01
Abstract: Web crawler detection is critical for preventing the unauthorized extraction of valuable information from websites. Current methods rely on heuristics, which makes them time-consuming and unable to detect novel crawlers. They also overlook privacy protection and the communication burden during training, which can lead to privacy leaks. To address these issues, we propose a federated deep learning crawler detection model that analyzes access behaviors while preserving privacy. First, individual clients host website data locally, while a central server aggregates the clients' updates into the detection model parameters, so raw user data are never transmitted to or accessed by the server. We then develop an algorithm that constructs access path trees from user logs to extract temporal and spatial behavior features. Additionally, we propose a novel time series model with fused additive attention that detects web crawlers effectively while preserving privacy and reducing data transmission. Finally, comprehensive evaluations on public datasets demonstrate robust privacy protection and effective detection of emerging crawler types.
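The abstract states that the server aggregates model parameters without ever seeing raw access logs, but it does not give the aggregation rule. The sketch below assumes a FedAvg-style weighted average, a standard federated learning technique that is not confirmed to be the paper's exact method. Each client contributes only its locally trained parameters and its local sample count. The function name `fedavg` and the representation of a model as a dict of parameter name to list of floats are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Assumption: a model is a flat dict of parameter name -> list of floats.
Params = Dict[str, List[float]]


def fedavg(client_updates: List[Tuple[Params, int]]) -> Params:
    """Weighted average of client parameters (FedAvg-style, assumed rule).

    Each entry is (params, n_samples), where n_samples is the number of
    local log records the client trained on. Only parameters reach the
    server; the raw access logs stay on each client.
    """
    if not client_updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    aggregated: Params = {}
    for name, values in client_updates[0][0].items():
        acc = [0.0] * len(values)
        for params, n in client_updates:
            weight = n / total  # clients with more data count more
            for i, v in enumerate(params[name]):
                acc[i] += weight * v
        aggregated[name] = acc
    return aggregated


if __name__ == "__main__":
    client_a = ({"w": [1.0, 2.0], "b": [0.0]}, 100)
    client_b = ({"w": [3.0, 4.0], "b": [1.0]}, 300)
    print(fedavg([client_a, client_b]))  # {'w': [2.5, 3.5], 'b': [0.75]}
```

Weighting by sample count keeps a client with few log records from pulling the global model as hard as one with many. A real deployment would iterate this over many rounds, with clients retraining locally from the aggregated parameters each time.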