Efficiently and Effectively: A Two-stage Approach to Balance Plaintext and Encrypted Text for Traffic Classification

Wei Peng
2024-08-12
Abstract: Encrypted traffic classification is the task of identifying the application or service associated with encrypted network traffic. One effective approach for this task is to use deep learning methods to encode the raw traffic bytes directly and automatically extract features for classification (byte-based models). However, current byte-based models input raw traffic bytes, whether plaintext or encrypted text, for automated feature extraction, neglecting the distinct impacts of plaintext and encrypted text on downstream tasks. Additionally, these models primarily focus on improving classification accuracy, with little emphasis on model efficiency. In this paper, for the first time, we analyze the impact of plaintext and encrypted text on the model's effectiveness and efficiency. Based on our observations and findings, we propose a two-phase approach to balance the trade-off between plaintext and encrypted text in traffic classification. Specifically, stage one is to Determine whether the Plain text is enough to be accurately Classified (DPC) using the proposed DPC Selector. This stage quickly identifies samples that can be classified using plaintext, leveraging explicit byte features in plaintext to enhance the model's efficiency. Stage two aims to adaptively make a classification based on the result from stage one. This stage incorporates encrypted text information for samples that cannot be classified using plaintext alone, ensuring the model's effectiveness on traffic classification tasks. Experiments on two datasets demonstrate that our proposed model achieves state-of-the-art results in both effectiveness and efficiency.
Cryptography and Security, Computation and Language
What problem does this paper attempt to address?
The problem this paper addresses is how to balance the use of plaintext and encrypted text in encrypted traffic classification so that classification accuracy and efficiency improve together. Specifically:

1. **Deficiencies of existing methods**:
   - Current byte-based deep learning models take the raw traffic bytes (whether plaintext or encrypted text) directly as input for feature extraction, ignoring the different impacts of plaintext and encrypted text on downstream tasks.
   - These models mainly focus on improving classification accuracy and pay little attention to model efficiency.
2. **The core of the problem**:
   - Plaintext information can significantly improve classification in some cases and is faster to process.
   - Encrypted text information increases the complexity and time cost of classification, but it is necessary for accurate classification in some cases.
3. **Research objectives**:
   - Analyze the impact of plaintext and encrypted text on model performance and time cost.
   - Propose a two-stage method (EETP) that first determines whether the plaintext is sufficient for accurate classification (DPC), thereby optimizing both the accuracy and the efficiency of traffic classification.

### Specific steps of the two-stage method

- **First stage (DPC Selector)**:
  - Objective: Determine whether the plaintext is sufficient for accurate classification.
  - Method: Use the DPC Selector to quickly identify samples that can be classified from plaintext, and use the explicit byte features in the plaintext to improve model efficiency.
- **Second stage (adaptive classification)**:
  - Objective: Classify adaptively according to the result of the first stage.
  - Method: If the first stage determines that the plaintext is sufficient, classify using the plaintext only; otherwise, also use the encrypted text information so that classification remains effective.
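The two-stage flow described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the paper's implementation: the `Sample` type, the `two_stage_classify` function, the `threshold` parameter, and the three toy callables are all hypothetical stand-ins, and the summary does not say how the DPC Selector or the classifiers are actually built.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Sample:
    """One traffic sample split into its two parts (hypothetical layout)."""
    plaintext: bytes   # bytes sent in the clear
    encrypted: bytes   # encrypted bytes


def two_stage_classify(
    sample: Sample,
    selector: Callable[[bytes], float],
    plain_clf: Callable[[bytes], str],
    full_clf: Callable[[bytes, bytes], str],
    threshold: float = 0.5,  # assumed decision rule, not from the paper
) -> Tuple[str, bool]:
    """Return (label, used_encrypted) for one sample."""
    # Stage one: the selector scores whether plaintext alone is enough.
    if selector(sample.plaintext) >= threshold:
        # Cheap path: classify from plaintext only.
        return plain_clf(sample.plaintext), False
    # Stage two: fall back to plaintext plus encrypted bytes.
    return full_clf(sample.plaintext, sample.encrypted), True


# Toy stand-ins so the sketch runs end to end (purely illustrative).
def toy_selector(plaintext: bytes) -> float:
    # Pretend a recognizable marker makes the plaintext sufficient.
    return 1.0 if b"example.com" in plaintext else 0.0


def toy_plain_clf(plaintext: bytes) -> str:
    return "web"


def toy_full_clf(plaintext: bytes, encrypted: bytes) -> str:
    return "vpn" if len(encrypted) > 4 else "chat"
```

In this sketch the efficiency gain comes from skipping `full_clf` whenever the selector is confident, so the more expensive model runs only on the samples that need the encrypted bytes.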
### Experimental results

Experiments show that this method achieves state-of-the-art results on both datasets, improving classification accuracy while significantly improving processing efficiency.

### Summary

This paper addresses the unbalanced use of plaintext and encrypted text in existing encrypted traffic classification methods by proposing the EETP framework, which provides a more efficient and effective solution.