Semantic Smoothing the Multinomial Naive Bayes for Biomedical Literature Classification.

Jian Wen, Zhoujun Li
DOI: https://doi.org/10.1109/grc.2007.98
2007-01-01
Abstract: The rapidly growing biomedical literature poses new challenges for text classification, and its efficiency and data-sparseness problems have attracted many researchers. The recent success of language modeling in information retrieval has led us to reconsider multinomial Naive Bayes for text classification. In this paper, we propose a semantic smoothing method for the Naive Bayes model. Biomedical documents are indexed by UMLS concepts, and context-sensitive concept pairs are extracted as topic signatures; the translation probabilities from concept pairs to concepts are estimated with the EM algorithm. The classification model is then estimated as a mixture model that incorporates this semantic smoothing. The ontology-based document representation handles synonyms and reduces the size of the concept vector, and the semantic smoothing partly alleviates data sparseness. We evaluate the method on the OHSUMED and Genomics Track collections and obtain satisfactory results. The semantic smoothing method achieves better results than other simple smoothing methods, and it is also notable for its simplicity and comprehensibility.
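The mixture model described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the mixing weight `lam`, the Laplace smoothing of the simple model, the toy vocabulary, and the hand-written `translation` table (which the paper estimates with EM) are all assumptions made for illustration. Each class model mixes a simple concept unigram model with a translation model driven by the topic signatures seen in that class.

```python
import math
from collections import Counter

def class_model(concept_docs, topic_docs, translation, vocab, lam=0.4, alpha=1.0):
    """Estimate p(w|c) for one class as a mixture (illustrative sketch).

    p(w|c) = (1 - lam) * p_simple(w|c) + lam * sum_t p(w|t) * p_ml(t|c)
    `translation[t][w]` plays the role of p(w|t); here it is supplied by hand.
    """
    counts = Counter(w for doc in concept_docs for w in doc)
    total = sum(counts.values())
    # Simple model: Laplace-smoothed concept unigram (an assumed choice).
    simple = {w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}

    # Maximum-likelihood distribution of topic signatures in the class.
    topic_counts = Counter(t for doc in topic_docs for t in doc if t in translation)
    topic_total = sum(topic_counts.values())
    weight = lam if topic_total else 0.0  # fall back to the simple model

    model = {}
    for w in vocab:
        translated = sum(
            translation[t].get(w, 0.0) * n / topic_total
            for t, n in topic_counts.items()
        )
        model[w] = (1.0 - weight) * simple[w] + weight * translated
    return model

def classify(doc, models, priors):
    """Return the class maximizing log p(c) + sum_w log p(w|c)."""
    scores = {}
    for c, model in models.items():
        score = math.log(priors[c])
        for w in doc:
            if w in model:  # concepts outside the vocabulary are skipped
                score += math.log(model[w])
        scores[c] = score
    return max(scores, key=scores.get)

# Toy data: every name and number below is hypothetical.
vocab = ["aspirin", "headache", "analgesic", "gene", "protein", "expression"]
translation = {
    ("aspirin", "headache"): {"aspirin": 0.4, "headache": 0.4, "analgesic": 0.2},
    ("gene", "protein"): {"gene": 0.4, "protein": 0.4, "expression": 0.2},
}
models = {
    "clinical": class_model([["aspirin", "headache"]],
                            [[("aspirin", "headache")]], translation, vocab),
    "genomics": class_model([["gene", "protein"]],
                            [[("gene", "protein")]], translation, vocab),
}
priors = {"clinical": 0.5, "genomics": 0.5}

predicted = classify(["analgesic"], models, priors)
print(predicted)
```

In this toy setup "analgesic" never occurs in the training documents, so the simple model alone cannot separate the two classes; the translation component assigns it extra probability mass under the class whose topic signature maps to it, which is the sparseness problem semantic smoothing is meant to ease.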