Fingerprinting Browsers in Encrypted Communications

Sandhya Aneja, Nagender Aneja
2024-10-28
Abstract: Browser fingerprinting is the identification of a browser through the network traffic captured during communication between the browser and server. This can be done using the HTTP protocol, browser extensions, and other methods. This paper discusses browser fingerprinting using the HTTPS over TLS 1.3 protocol. The study observed that different browsers use a different number of messages to communicate with the server, and the length of messages also varies. To conduct the study, a network was set up using a UTM hypervisor with one virtual machine as the server and another as a VM with a different browser. The communication was captured, and it was found that there was a 30%-35% dissimilarity in the behavior of different browsers.
Cryptography and Security, Networking and Internet Architecture
What problem does this paper attempt to address?
The paper addresses the problem of fingerprinting browsers from encrypted (HTTPS) communication. Specifically, the authors study how different browsers behave on the network under the TLS 1.3 protocol and propose a method that distinguishes browsers by message length and cipher suite list.

### Research Background and Problems

1. **Differences between HTTP and HTTPS**:
   - In HTTP communication, data is transmitted in plain text, so the server can easily identify the client browser through the `User-Agent` field.
   - In HTTPS communication, the data is encrypted and traditional fingerprinting methods no longer apply, so new methods are needed to identify browsers.
2. **Importance of Browser Fingerprinting**:
   - Browser fingerprinting helps the server identify the client device and can also help detect malicious users, because their fingerprints usually differ from those of legitimate users.
3. **Limitations of Existing Research**:
   - Most existing research relies on decrypting HTTPS fields or on complex tests over combinations of sequences; these methods are computationally costly and less efficient.

### Main Contributions of the Paper

- **Proposed a New Method**: By analyzing the lengths of TLS handshake and data messages together with the cipher suite list, the authors propose a browser fingerprinting method that does not require decrypting HTTPS fields.
- **Experimental Verification**: The authors set up a virtual network environment, captured the communication between different browsers and the server, and used interpolation and cosine similarity to quantify the similarities and differences between browsers.
- **Result Presentation**: The experimental results show significant differences in how different browsers communicate when loading the same page, so different browsers can be effectively distinguished.
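
The summary notes that browsers exchange different numbers of messages, so their message-length vectors generally differ in size, and that interpolation is used before comparing them. The paper's exact interpolation scheme is not given here, so the following is only a minimal sketch assuming plain linear interpolation onto a common number of points. The function name `resample_lengths` and the sample lengths are hypothetical.

```python
from typing import List, Sequence


def resample_lengths(lengths: Sequence[float], target_len: int) -> List[float]:
    """Linearly interpolate a message-length sequence onto target_len points.

    Assumption: linear interpolation over the message index; the paper may
    use a different interpolation scheme.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if target_len < 1:
        raise ValueError("target_len must be at least 1")

    n = len(lengths)
    if n == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first value.
        return [float(lengths[0])] * target_len

    resampled = []
    for i in range(target_len):
        # Position of the i-th output sample on the original index axis.
        pos = i * (n - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        resampled.append(lengths[lo] * (1 - frac) + lengths[hi] * frac)
    return resampled


# Hypothetical message lengths (bytes) for two browsers loading the same page.
browser_a = [517, 1400, 80, 250]
browser_b = [512, 1380, 1200, 90, 90, 300]

# Bring both vectors to the same size so they can be compared element-wise.
common = max(len(browser_a), len(browser_b))
vec_a = resample_lengths(browser_a, common)
vec_b = resample_lengths(browser_b, common)
```

Resampling to the longer of the two lengths keeps every original endpoint. Whether to stretch the shorter vector or shrink the longer one is a design choice that the summary does not settle.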
### Formula Representation

The paper uses cosine similarity to measure the similarity between browsers:

\[ \text{cosine similarity}(\vec{A}, \vec{B}) = \frac{\vec{A} \cdot \vec{B}}{\|\vec{A}\| \|\vec{B}\|} \]

Cosine dissimilarity is defined as:

\[ \text{cosine dissimilarity}(\vec{A}, \vec{B}) = 1 - \frac{\vec{A} \cdot \vec{B}}{\|\vec{A}\| \|\vec{B}\|} \]

Here, $\vec{A}$ and $\vec{B}$ are the message-length vectors of two browsers communicating with the same webpage.

### Conclusion

The research indicates that different browsers can be effectively fingerprinted by analyzing message lengths and the cipher suite list under the TLS 1.3 protocol. Future work will extend the study to more browsers and consider combining other protocols (such as TCP) to improve identification accuracy.
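
The cosine similarity and dissimilarity formulas given under Formula Representation can be computed directly. The sketch below uses only the Python standard library and assumes the two message-length vectors already have equal length. The function names and sample vectors are illustrative and not taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return (A . B) / (||A|| * ||B||) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_dissimilarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity, as in the second formula."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical, equal-length message-length vectors (bytes) for two browsers.
lengths_a = [517, 1400, 80, 250]
lengths_b = [512, 90, 1380, 300]

dissimilarity = cosine_dissimilarity(lengths_a, lengths_b)
print(f"cosine dissimilarity: {dissimilarity:.3f}")
```

Because message lengths are non-negative, the dissimilarity lies between 0 (same direction) and 1 (orthogonal), so it can be read as a percentage, as in the 30%-35% figure quoted in the abstract.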