Using Graph Representation in Host-Based Intrusion Detection

Zhichao Hu,Likun Liu,Haining Yu,Xiangzhan Yu
DOI: https://doi.org/10.1155/2021/6291276
IF: 1.968
2021-01-01
Security and Communication Networks
Abstract:Cybersecurity has become an important part of our daily lives. As an important part, there are many researches on intrusion detection based on host system call in recent years. Compared to sentences, a sequence of system calls has unique characteristics. It contains implicit pattern relationships that are less sensitive to the order of occurrence and that have less impact on the classification results when the frequency of system calls varies slightly. There are also various properties such as resource consumption, execution time, predefined rules, and empirical weights of system calls. Commonly used word embedding methods, such as Bow, TI-IDF, N-Gram, and Word2Vec, do not fully exploit such relationships in sequences as well as conveniently support attribute expansion. To solve these problems, we introduce Graph Representation based Intrusion Detection (GRID), an intrusion detection framework based on graph representation learning. It captures the potential relationships between system calls to learn better features, and it is applicable to a wide range of back-end classifiers. GRID utilizes a new sequence embedding method Graph Random State Embedding (GRSE) that uses graph structures to model a finite number of sequence items and represent the structural association relationships between them. A more efficient representation of sequence embeddings is generated by random walks, word embeddings, and graph pooling. Moreover, it can be easily extended to sequences with attributes. Our experimental results on the AFDA-LD dataset show that GRID has an average improvement of 2% using the GRSE embedding method comparing to others.
What problem does this paper attempt to address?