Email security level classification of imbalanced data using artificial neural network: The real case in a world-leading enterprise

Jen-Wei Huang, Chia-Wen Chiang, Jia-Wei Chang
DOI: https://doi.org/10.1016/j.engappai.2018.07.010
IF: 8
2018-10-01
Engineering Applications of Artificial Intelligence
Abstract: Email is far more convenient than traditional mail for delivering messages. However, in business settings it is susceptible to information leakage. This problem can be alleviated by classifying emails into different security levels using text mining and machine learning. In this research, we developed a scheme in which a neural network extracts information from emails and transforms each email into a multidimensional vector. Email text is processed as bi-grams to train the document vectors, which are then under-sampled to deal with the problem of data imbalance. Finally, the security level of each email is classified using an artificial neural network. The proposed system was evaluated in an actual corporate setting. The results show that the proposed feature extraction approach represents email data more effectively than existing methods, as measured by true positive rate and F1-score.
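The two preprocessing steps named in the abstract, bi-gram extraction and under-sampling, can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not confirm: that the bi-grams are word-level, and that the under-sampling is random down-sampling of every class to the size of the rarest class. The function names and the example labels ("normal", "confidential") are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def word_bigrams(text):
    """Return the word-level bi-grams of a text (assumed tokenization: whitespace)."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def undersample(samples, labels, seed=0):
    """Randomly drop samples so that every class keeps as many as the rarest class.

    This is plain random under-sampling, used here as an assumption; the
    abstract does not say which under-sampling strategy the authors applied.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    # Every class is reduced to the size of the smallest class.
    target = min(len(group) for group in by_label.values())

    balanced = []
    for label, group in by_label.items():
        for sample in rng.sample(group, target):
            balanced.append((sample, label))

    # Shuffle so that classes are not grouped together in the training order.
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 8 "normal" emails and 2 "confidential" ones.
emails = [f"email body number {i}" for i in range(10)]
labels = ["normal"] * 8 + ["confidential"] * 2
balanced = undersample(emails, labels)
```

In a full pipeline the balanced samples would be document vectors rather than raw strings, and they would be passed to the neural network classifier as training data.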
automation & control systems,computer science, artificial intelligence,engineering, electrical & electronic, multidisciplinary