Unbundle-Rewrite-Rebundle: Runtime Detection and Rewriting of Privacy-Harming Code in JavaScript Bundles

Mir Masood Ali, Peter Snyder, Chris Kanich, Hamed Haddadi
DOI: https://doi.org/10.1145/3658644.3690262
2024-09-04
Abstract: This work presents Unbundle-Rewrite-Rebundle (URR), a system for detecting privacy-harming portions of bundled JavaScript code and rewriting that code at runtime to remove the privacy-harming behavior without breaking the surrounding code or overall application. URR is a novel solution to the problem of JavaScript bundles, where websites pre-compile multiple code units into a single file, making it impossible for content filters and ad-blockers to differentiate between desired and unwanted resources. Where traditional content filtering tools rely on URLs, URR analyzes the code at the AST level, and replaces harmful AST sub-trees with privacy-and-functionality maintaining alternatives. We present an open-sourced implementation of URR as a Firefox extension and evaluate it against JavaScript bundles generated by the most popular bundling system (Webpack) deployed on the Tranco 10k. We evaluate URR by precision (1.00), recall (0.95), and speed (0.43s per script) when detecting and rewriting three representative privacy-harming libraries often included in JavaScript bundles, and find URR to be an effective approach to a large-and-growing blind spot unaddressed by current privacy tools.
Cryptography and Security
What problem does this paper attempt to address?
The paper addresses the challenge that JavaScript bundles pose to privacy-protection tools in modern web development. Current URL-based content filtering and ad-blocking tools cannot distinguish harmful from harmless code once a site ships its JavaScript as a single bundled file, so they become ineffective against bundles. The specific problems are:

1. **Inability to partially block code**: When a website packages multiple JavaScript files into one file (for example, `/script/bundle.js`), URL-based filtering tools cannot block only part of the code (such as the code that would otherwise be served as `/script/tracker.js`). They can only block the entire file or block nothing.
2. **Breaking website functionality**: Blocking the whole bundle breaks the website, because many websites rely on code in the same bundle for their core functions.
3. **Difficulty in identifying and replacing harmful code**: JavaScript bundling tools (such as Webpack) transform the code during the build process (for example, minification and removal of unused code), so regular-expression-based matching cannot reliably identify and replace the privacy-invasive code inside a bundle.

To solve these problems, the paper proposes the Unbundle-Rewrite-Rebundle (URR) system, which detects and rewrites privacy-invasive code in JavaScript bundles at runtime while preserving the behavior of the remaining code. The URR workflow is as follows:

1. **Unbundle**: Generate an Abstract Syntax Tree (AST) of a given script, analyze its structure to determine whether it is a bundle containing one or more modules, and extract the sub-tree of each module.
2. **Process modules**: Convert each module into an implementation-independent representation: remove variable names, function names, and object properties, then compute a bottom-up hash of the AST structure. This representation is what gets compared.
3. **Compare modules**: Compare each processed module with previously generated representations of known privacy-invasive modules. If a match is found, mark the module for replacement.
4. **Rebundle**: Replace each marked privacy-invasive module with a corresponding harmless substitute module, chosen so that the replacement does not break other functions of the website. Then reassemble the bundle and continue execution.

In this way, URR can detect and remove privacy-invasive code in JavaScript bundles without affecting the normal functionality of the website.
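The process, compare, and replace steps can be illustrated with a minimal sketch. It is not the paper's implementation: the toy ESTree-like dictionaries, the helper names (`structural_hash`, `make_call_module`, `rewrite_bundle`), the choice to drop only `name` fields, and the use of exact hash equality for matching are all assumptions made for illustration.

```python
import hashlib

# Fields treated as naming information and ignored when hashing.
# Assumption of this sketch: only identifier names are dropped.
NAME_KEYS = {"name"}


def structural_hash(node):
    """Hash a toy AST bottom-up: children are hashed first, then combined."""
    if isinstance(node, dict):
        parts = [
            f"{key}={structural_hash(node[key])}"
            for key in sorted(node)
            if key not in NAME_KEYS
        ]
    elif isinstance(node, list):
        parts = [structural_hash(child) for child in node]
    else:
        # Scalars such as node types and operators are kept as-is.
        parts = [repr(node)]
    text = type(node).__name__ + "(" + ",".join(parts) + ")"
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def ident(name):
    return {"type": "Identifier", "name": name}


def make_call_module(fn, param, callee):
    """Toy AST for `function fn(param) { return callee(param); }`."""
    call = {
        "type": "CallExpression",
        "callee": ident(callee),
        "arguments": [ident(param)],
    }
    return {
        "type": "FunctionDeclaration",
        "id": ident(fn),
        "params": [ident(param)],
        "body": [{"type": "ReturnStatement", "argument": call}],
    }


def rewrite_bundle(modules, signatures):
    """Swap each module whose hash is a known signature for its substitute."""
    return {
        module_id: signatures.get(structural_hash(tree), tree)
        for module_id, tree in modules.items()
    }


# Hypothetical reference tracker and a renamed (minified) copy of it.
reference = make_call_module("track", "id", "send")
minified = make_call_module("t", "e", "n")

# Unrelated module: `function add(a, b) { return a + b; }`.
benign = {
    "type": "FunctionDeclaration",
    "id": ident("add"),
    "params": [ident("a"), ident("b")],
    "body": [{
        "type": "ReturnStatement",
        "argument": {
            "type": "BinaryExpression",
            "operator": "+",
            "left": ident("a"),
            "right": ident("b"),
        },
    }],
}

# Harmless substitute: a function with the same name slot and an empty body.
SHIM = {"type": "FunctionDeclaration", "id": ident("noop"), "params": [], "body": []}

signatures = {structural_hash(reference): SHIM}
rewritten = rewrite_bundle({0: benign, 1: minified}, signatures)
```

Because identifier names do not contribute to the hash, the renamed copy matches the reference signature and is swapped for the substitute, while the structurally different module is left untouched. A real system would also need the unbundling step (locating module sub-trees inside the bundle's AST) and substitutes that preserve each module's exported interface, neither of which this sketch attempts.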