Anomaly detection of unstructured big data via semantic analysis and dynamic knowledge graph construction
Qingliang Zhao, Jiaoyue Liu, Nichole Sullivan, Kuochu Chang, John Spina, Erik Blasch, Genshe Chen, Kuochu C. Chang
DOI: https://doi.org/10.1117/12.2589047
2021-04-12
Abstract: Governments and businesses increasingly need to discover latent anomalous activities in unstructured, publicly available data produced by professional agencies and the general public. Over the past two decades, consumers have begun to use smart devices to both consume and generate a large volume of open-source text-based data, providing the opportunity for latent anomaly analysis. However, real-time data acquisition and the processing and interpretation of various types of unstructured data remain a great challenge. Recent efforts have focused on artificial intelligence / machine learning (AI/ML) solutions that accelerate the labor-intensive, linear collection, exploitation, and dissemination analysis cycle and enhance it with a data-driven process for rapidly integrating and correlating open-source data. This paper describes an Activity-Based Intelligence framework for anomaly detection in open-source big data that uses AI/ML to perform semantic analysis. The proposed Anomaly Detection using Semantic Analysis Knowledge (ADUSAK) framework includes four layers: an input layer, a knowledge layer, a reasoning layer, and a graphical user interface (GUI)/output layer. The corresponding main technologies are Information Extraction, Knowledge Graph (KG) construction, Semantic Reasoning, and Pattern Discovery. ADUSAK was verified on three tasks: Emerging Events Detection, Fake News Detection, and Suspicious Network Analysis. The generalized ADUSAK framework can be readily extended to a wide range of applications by adjusting the data collection, model construction, and event alerting components.
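The four-layer structure named in the abstract can be illustrated with a minimal sketch. This is not the ADUSAK implementation: every name here (`extract_triples`, `DynamicKnowledgeGraph`, `detect_anomalies`, `format_alerts`) is hypothetical, the "information extraction" step is a toy rule that only accepts three-word sentences, and the "reasoning" step simply flags relations the graph has not seen before. It is meant only to show how data might flow from an input layer through a dynamically updated knowledge graph to an alerting output.

```python
from collections import Counter


def extract_triples(document):
    # Input layer (toy assumption): treat each three-word sentence
    # "subject relation object" as one triple; ignore everything else.
    triples = []
    for sentence in document.split("."):
        words = sentence.lower().split()
        if len(words) == 3:
            triples.append(tuple(words))
    return triples


class DynamicKnowledgeGraph:
    # Knowledge layer (toy assumption): a multiset of triples that
    # grows as new batches of text arrive.
    def __init__(self):
        self.triple_counts = Counter()

    def update(self, triples):
        # Collect the triples that were unseen before this update,
        # then record every incoming triple in the graph.
        novel = [t for t in triples if t not in self.triple_counts]
        self.triple_counts.update(triples)
        return novel


def detect_anomalies(graph, document):
    # Reasoning layer (toy assumption): a triple is "anomalous" if the
    # graph has never observed it before.
    return graph.update(extract_triples(document))


def format_alerts(anomalies):
    # Output layer (toy assumption): render each flagged triple as text.
    return ["ALERT: " + " ".join(t) for t in anomalies]


# Build a baseline graph, then process a new batch of text.
graph = DynamicKnowledgeGraph()
graph.update(extract_triples("Alice emails Bob. Bob emails Carol."))
alerts = format_alerts(
    detect_anomalies(graph, "Alice emails Bob. Alice wires Mallory.")
)
print(alerts)
```

In this sketch only the previously unseen relation in the second batch is reported, while the repeated one just increments its count in the graph. A realistic system would replace the toy extractor with a proper information extraction model and the novelty check with semantic reasoning and pattern discovery over the graph.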