Extensible Topic Modeling and Analysis Framework for Multisource Data

Shuang TANG, Lingxiao ZHANG, Junfeng ZHAO, Bing XIE, Yanzhen ZOU
DOI: https://doi.org/10.3778/j.issn.1673-9418.1710025
2019-01-01
Abstract: With the continuing development and application of information technology, many information systems have accumulated large amounts of multi-source heterogeneous data. Much of this data is structured, high-dimensional, low-quality, and unlabeled, which makes it difficult to extract features and refine knowledge from it. Topic modeling is an important method in text processing and data mining. It is an unsupervised learning technique originally used to model unstructured natural-language text, and it can extract topic information from text semantics, extract features, and reduce dimensionality. However, topic modeling has not yet been applied well to complex multi-source data, especially structured data. This paper presents a framework, based on extensible topic modeling, for analyzing structured and unstructured multi-source data. The framework analyzes multi-source data in three steps: data import, data analysis, and data visualization. A multi-source data analysis tool is implemented on this basis. Finally, experiments on two data sets demonstrate the effectiveness of the framework.