Semi-Supervised Learning Based on Information Theory and Functional Dependency Rules of Probability

Li Min Wang, Chun Hong Cao, Xiong Fei Li
DOI: https://doi.org/10.1166/asl.2011.1229
2011-01-01
Advanced Science Letters
Abstract: Unlabeled data and missing values are two active research topics in machine learning and pattern recognition. In this paper, we propose a novel algorithm called FFDC, which uses Naive Bayes as the underlying supervised learner in a semi-supervised learning framework. Based on the conditional independence assumption of Naive Bayes, the information gain of information theory is redefined to quantitatively measure the information contained in unlabeled data. Functional dependency rules of probability are deduced from Armstrong's axioms and used to find and delete redundant attributes, which exponentially reduces the computational complexity of modeling. Empirical studies on a set of natural domains show that FFDC has clear advantages in generalization and probabilistic performance.
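To make the idea of deleting redundant attributes concrete, the following is a minimal Python sketch of a plain functional dependency check. It is an illustration only, not the FFDC procedure, which the abstract does not specify. The function names `holds_fd` and `redundant_attributes` and the toy `rows` data are hypothetical. The sketch treats an attribute as redundant when a single other retained attribute determines its value in every observed row.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if attribute `lhs` functionally determines `rhs` in `rows`,
    i.e. every value of `lhs` is paired with exactly one value of `rhs`."""
    seen = {}
    for row in rows:
        key = row[lhs]
        if key in seen and seen[key] != row[rhs]:
            return False  # same lhs value maps to two different rhs values
        seen.setdefault(key, row[rhs])
    return True


def redundant_attributes(rows, attributes):
    """List attributes that are determined by some other attribute that is
    still retained. Skipping already-redundant determinants ensures that of
    two mutually dependent attributes, one is kept."""
    redundant = []
    for rhs in attributes:
        for lhs in attributes:
            if lhs == rhs or lhs in redundant:
                continue
            if holds_fd(rows, lhs, rhs):
                redundant.append(rhs)
                break
    return redundant


# Hypothetical toy data: `zip` determines `city`, but not the other way round.
rows = [
    {"zip": "10001", "city": "NYC", "age": 30},
    {"zip": "10001", "city": "NYC", "age": 41},
    {"zip": "60601", "city": "Chicago", "age": 30},
    {"zip": "10002", "city": "NYC", "age": 25},
]

print(redundant_attributes(rows, ["zip", "city", "age"]))
```

In this toy data `city` adds nothing once `zip` is known, so a learner such as Naive Bayes could drop it before estimating conditional probabilities. A dependency observed in a small sample may not hold in general, so a check like this is only a heuristic on real data.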