BioTopic: A Topic-Driven Biological Literature Mining System

Xi Wang, Peiyan Zhu, Tao Liu, Ke Xu
DOI: https://doi.org/10.1504/IJDMB.2016.075822
2016-01-01
International Journal of Data Mining and Bioinformatics
Abstract: Biology and biomedicine are flourishing disciplines, with massive amounts of biological data produced in experiments and a huge number of research papers published in journals. In such a big data context, unsupervised data mining methods such as topic models are used to extract topics from large-scale document collections. In this paper, we present a biological literature mining system based on topic modelling (BioTopic). Experiments show that the perplexity reduction percentage of our pre-processing method is 5% larger than that of a traditional pre-processing method. The precision of our search reaches 86%, which is better than that of a unigram language model. Our method employs linguistic information from shallow parsing to better pre-process biological literature for topic models. BioTopic, with fine-grained pre-processing and topic modelling, works better than traditional literature mining systems.