SINBAD: Saliency-informed detection of breakage caused by ad blocking

Saiid El Hajj Chehade, Sandra Siby, Carmela Troncoso
2024-05-09
Abstract: Privacy-enhancing blocking tools based on filter-list rules tend to break legitimate functionality. Filter-list maintainers could benefit from automated breakage detection tools that allow them to proactively fix problematic rules before deploying them to millions of users. We introduce SINBAD, an automated breakage detector that improves the accuracy over the state of the art by 20%, and is the first to detect dynamic breakage and breakage caused by style-oriented filter rules. The success of SINBAD is rooted in three innovations: (1) the use of user-reported breakage issues in forums that enable the creation of a high-quality dataset for training in which only breakage that users perceive as an issue is included; (2) the use of 'web saliency' to automatically identify user-relevant regions of a website on which to prioritize automated interactions aimed at triggering breakage; and (3) the analysis of webpages via subtrees which enables fine-grained identification of problematic filter rules.
Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is: **how to automatically detect breakage caused by ad-blocking tools (privacy-enhancing blocking tools based on filter-list rules) so that maintainers can proactively fix problematic rules before deployment, avoiding negative impact on millions of users**.

Specifically, existing ad-blocking tools may accidentally break legitimate webpage functionality when they block web content or hide elements. This degrades the user experience and may also discourage filter-list developers from adopting more aggressive blocking strategies. Today, such problems are usually discovered through user reports, and maintainers must inspect and repair them manually, which is time-consuming and inefficient. The paper therefore proposes a new automated breakage detection system, SINBAD, which aims to improve detection accuracy and to detect both dynamic breakage and breakage caused by style-oriented filter rules.

### Main innovations of SINBAD

1. **Using user-reported breakage issues**: Breakage issues reported by users in forums are collected to build a high-quality training dataset that contains only breakage users actually perceive as a problem.
2. **Using "web saliency" to automatically identify user-relevant regions**: Salient regions of a webpage are identified, and automated interactions are prioritized on those regions to trigger potential breakage.
3. **Analyzing webpages via subtrees**: Fine-grained analysis of subtrees of the page's DOM tree makes it possible to identify the filter rules that cause breakage.

### Main contributions of the paper

- A high-quality breakage detection dataset is constructed. The authors find that breakage reports usually take several days to several weeks to be resolved, which highlights the importance of automated detection tools.
- A method is proposed that automatically identifies important regions of a webpage and prioritizes interactions with them, so that user-relevant breakage is discovered more effectively.
- The SINBAD system is introduced. Its accuracy is 20% higher than that of existing methods, and it correctly classifies dynamic breakage and breakage caused by CSS-based filter rules, which previous methods could not detect.

Through these innovations, SINBAD improves the accuracy of breakage detection and gives maintainers of ad-blocking tools a more efficient, proactive way to fix problems.
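
The saliency-based prioritization and subtree-level analysis described above can be illustrated with a minimal sketch. This is not SINBAD's implementation: the `Node` class, the plain 2D-list saliency map, and all function names are hypothetical, and the purely structural subtree comparison stands in for the trained classifier the paper uses. It only shows the two ideas in miniature: ranking elements by how salient their screen region is, and localizing differences between a page loaded with and without a filter rule to the smallest differing subtree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A simplified DOM node (hypothetical structure, for illustration only)."""
    tag: str
    box: tuple  # (x, y, width, height) in pixels
    children: list = field(default_factory=list)


def mean_saliency(saliency, box):
    """Average saliency over the pixels covered by a bounding box."""
    x, y, w, h = box
    values = [saliency[r][c] for r in range(y, y + h) for c in range(x, x + w)]
    return sum(values) / len(values) if values else 0.0


def rank_by_saliency(nodes, saliency):
    """Order candidate elements so the most salient ones are tried first."""
    return sorted(nodes, key=lambda n: mean_saliency(saliency, n.box), reverse=True)


def signature(node):
    """Structural fingerprint of a subtree: its tag plus its children's fingerprints."""
    return (node.tag, tuple(signature(c) for c in node.children))


def differing_subtrees(before, after, path="root"):
    """Return paths of the smallest subtrees that differ between two page versions."""
    if signature(before) == signature(after):
        return []
    if before.tag != after.tag or len(before.children) != len(after.children):
        return [path]
    diffs = []
    for i, (b, a) in enumerate(zip(before.children, after.children)):
        diffs += differing_subtrees(b, a, f"{path}/{b.tag}[{i}]")
    return diffs


# Toy 4x4 saliency map (assumed values): the top-left region is most salient.
saliency = [
    [0.9, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.1, 0.1],
    [0.0, 0.0, 0.1, 0.1],
    [0.0, 0.0, 0.1, 0.1],
]
video = Node("video", (0, 0, 2, 2))
footer = Node("footer", (0, 2, 2, 2))
aside = Node("aside", (2, 0, 2, 4))
order = [n.tag for n in rank_by_saliency([footer, aside, video], saliency)]
print(order)  # ['video', 'aside', 'footer']

# Same page without and with a filter rule that hides the play button.
b = (0, 0, 1, 1)
without_rule = Node("body", b, [Node("div", b, [Node("video", b), Node("button", b)]),
                                Node("footer", b)])
with_rule = Node("body", b, [Node("div", b, [Node("video", b)]), Node("footer", b)])
changed = differing_subtrees(without_rule, with_rule)
print(changed)  # ['root/div[0]']
```

Working at the subtree level is what makes the result actionable in this sketch: the difference is attributed to one `div` and not to the page as a whole, which narrows down which rule to inspect.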