A Method of Text Categorization Based on Genetic Algorithm and LDA

Lei Chen, Jun Li, Li Zhang
DOI: https://doi.org/10.23919/ChiCC.2017.8029089
2017-01-01
Abstract: Latent Dirichlet Allocation (LDA) does not perform feature selection on its input. LDA assigns a topic to each word in the original feature space, which contains many insignificant words that degrade topic quality. In this paper, we propose a feature selection method based on a Genetic Algorithm (GA), which reduces the dimensionality of the LDA input features and makes the generated topics more meaningful. Experimental results on the Fudan University corpus show that micro-averaged F1 and macro-averaged F1 improve by 0.76% and 0.72%, respectively, compared with the original LDA. The method also reduces model training time because many insignificant words are removed. According to the experimental results, GA feature selection outperforms statistical methods such as document frequency and information gain. In addition, it is adaptive and does not require the proportion of selected features to be specified in advance. Thus, the LDA model based on GA feature selection achieves better categorization performance.
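The idea of GA-based feature selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the chromosome encoding, the genetic operators, or the fitness function, so everything below is an assumption. Each individual is taken to be a binary mask over the vocabulary (1 keeps a word, 0 drops it), selection is a size-2 tournament with elitism, and `toy_fitness` is a hypothetical stand-in for what would presumably be a categorization score obtained from LDA trained on the selected words. The names `ga_select_features`, `toy_fitness`, and `INFORMATIVE` are illustrative only.

```python
import random


def ga_select_features(n_features, fitness, pop_size=20, generations=30,
                       crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Evolve a binary mask over the vocabulary; 1 keeps a word, 0 drops it."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]

        def tournament():
            # Pick two distinct individuals and return the fitter one.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        next_pop = [best[:]]  # elitism: the best mask so far always survives
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to every gene.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)

        population = next_pop
        best = max(population, key=fitness)
    return best


# Hypothetical toy setup: words 0, 3, 4 and 7 are "informative", the rest noise.
INFORMATIVE = {0, 3, 4, 7}


def toy_fitness(mask):
    """Reward kept informative words and penalise kept noise words (assumption)."""
    kept_good = sum(mask[i] for i in INFORMATIVE)
    kept_noise = sum(mask) - kept_good
    return kept_good - 0.5 * kept_noise


mask = ga_select_features(10, toy_fitness)
selected = [i for i, keep in enumerate(mask) if keep]
print("selected word indices:", selected)
```

Because the mask length is fixed but the number of ones is free, the search itself decides how many words to keep, which is one way to read the abstract's remark that the method does not need a preset selection proportion. In a real pipeline the fitness call would be far more expensive than this toy function, since each evaluation would involve training and scoring a model on the reduced vocabulary.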