An Effective Feature Representation of Web Log Data by Leveraging Byte Pair Encoding and TF-IDF

Junlang Zhan, Xuan Liao, Yukun Bao, Lu Gan, Zhiwen Tan, Mengxue Zhang, Ruan He, Jialiang Lu
DOI: https://doi.org/10.1145/3321408.3321568
2019-01-01
Abstract: Web log data analysis is important in intrusion detection, and various machine learning techniques have been applied to it. However, compared with the abundant research on machine learning methods, feature extraction from log data remains under-explored. In this paper, we present an effective feature extraction approach that leverages Byte Pair Encoding (BPE) and Term Frequency-Inverse Document Frequency (TF-IDF). We applied this approach to various downstream machine learning algorithms and demonstrated its usefulness.
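The abstract does not describe the pipeline in detail, so the following is only a minimal sketch of how BPE tokenization and TF-IDF weighting could be combined on log lines. It is not the authors' implementation. The function names (`learn_bpe`, `apply_merge`, `tokenize`, `tfidf`), the character-level starting vocabulary, the smoothed IDF formula, and the sample request lines are all assumptions made for illustration.

```python
import math
from collections import Counter


def apply_merge(tokens, pair):
    """Replace every adjacent occurrence of `pair` with one merged token."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def learn_bpe(lines, num_merges):
    """Learn up to `num_merges` BPE merges, starting from single characters."""
    corpus = [list(line) for line in lines]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for toks in corpus:
            pairs.update(zip(toks, toks[1:]))
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:  # nothing left that repeats, so stop early
            break
        merges.append(best)
        corpus = [apply_merge(toks, best) for toks in corpus]
    return merges


def tokenize(line, merges):
    """Split a line into characters, then apply the learned merges in order."""
    toks = list(line)
    for pair in merges:
        toks = apply_merge(toks, pair)
    return toks


def tfidf(docs_tokens):
    """Return (vocab, vectors) using a smoothed IDF (an assumed variant)."""
    n = len(docs_tokens)
    df = Counter()
    for toks in docs_tokens:
        df.update(set(toks))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in docs_tokens:
        tf = Counter(toks)
        total = len(toks) or 1
        vectors.append([tf[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical request lines, invented for this sketch.
logs = [
    "GET /index.php?id=1",
    "GET /index.php?id=2",
    "GET /index.php?id=1' OR '1'='1",
]
merges = learn_bpe(logs, num_merges=20)
tokens = [tokenize(line, merges) for line in logs]
vocab, vectors = tfidf(tokens)
```

Each row of `vectors` is a fixed-length feature vector for one log line, which could then be passed to a downstream classifier or anomaly detector. Learning subword units from the data avoids having to hand-write a tokenizer for URLs and query strings, which is presumably part of the appeal of BPE in this setting.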