Deep Forest with LRRS Feature for Fine-grained Website Fingerprinting with Encrypted SSL/TLS

Ziqing Zhang, Cuicui Kang, Gang Xiong, Zhen Li
DOI: https://doi.org/10.1145/3357384.3357993
2019-11-03
Abstract: With the development of encryption protocols such as Secure Sockets Layer (SSL) and Transport Layer Security (TLS), traditional fingerprinting approaches based on packet content and special fields struggle to fingerprint websites. Recent research has therefore introduced machine learning algorithms to address this problem, extracting various features for them. However, previous approaches to fingerprinting encrypted websites are based on HTTP/1.1 and are not applicable to the widely used HTTP/2. In addition, most work fingerprints only the home page of each website, although users also visit its other pages. To solve the feature compatibility problem, we propose using the local request and response sequence (LRRS) as features. LRRS uses local packet sequences to represent the patterns of encrypted Internet traffic over both HTTP/1.1 and HTTP/2. To fingerprint different web pages within the same website, we adopt Deep Forest to extract fine-grained features. It uses a convolution structure to make full use of the sequential LRRS features and a multi-layer structure to strengthen feature representation. Experimental results show that the proposed algorithm achieves the best overall performance on four datasets. In particular, on the bidirectional encrypted traffic dataset with HTTP/2, the proposed approach achieves an F1 score 55% higher than that of the state-of-the-art method KFP with Random Forest.
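The abstract does not define LRRS or the convolution structure in detail, so the following is only a minimal sketch of the general idea of scanning a packet sequence with a sliding window, as window-based scanning in Deep Forest style models is commonly described. The function name `sliding_windows`, the signed-size encoding of the trace, and the example values are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def sliding_windows(seq: Sequence[int], window: int, stride: int = 1) -> List[List[int]]:
    """Return every contiguous sub-sequence of length `window`, stepping by `stride`."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    # A sequence shorter than the window yields no sub-sequences.
    return [list(seq[i:i + window]) for i in range(0, len(seq) - window + 1, stride)]


# Hypothetical trace (an assumption, not the paper's LRRS definition):
# positive values are client-to-server sizes, negative values are server-to-client sizes.
trace = [517, -1460, -1460, 126, -310, 93]

# Each window is a local slice of the request/response pattern that a
# downstream classifier (for example, a forest) could consume as one instance.
windows = sliding_windows(trace, window=3)
```

Each window captures a short local stretch of the exchange, which is one plausible way sequential features could be exposed to a forest-based model without relying on protocol-specific fields.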