Latent Text Mining for Cybercrime Forensics

Raymond Y. K. Lau, Yunqing Xia
DOI: https://doi.org/10.7763/ijfcc.2013.v2.187
2013-01-01
International Journal of Future Computer and Communication
Abstract: Recent research reveals that the number of cyber-attacks has doubled in the past three years. This devastating growth reveals a serious business problem around the world. Existing intrusion detection systems (IDSs), intrusion prevention systems (IPSs), and anti-malware systems mainly rely on low-level network traffic features or program code signatures to detect cyber-attacks. However, since hackers can constantly change their attack tactics, it is extremely difficult for existing security solutions to detect cyber-attacks. There is increasing evidence that cybercriminals tend to exchange cybercrime knowledge and transact via online social media. Accordingly, this presents unprecedented opportunities for security intelligence experts to tap into online social media to extract vital security intelligence for cyber-attack forensics. The main contributions of this paper are the design, development, and evaluation of a Latent Dirichlet Allocation (LDA)-based latent text mining model for cyber-attack forensics. Our preliminary evaluation of the proposed latent text mining model, based on a real-world data set crawled from Twitter and blog sites, shows that it significantly outperforms the probabilistic latent semantic indexing (pLSI) method in terms of extracting more relevant and richer concepts describing real-world cyber-attack incidents.