Research On Learning Bayesian Network Structure With Hidden Variables Based On Genetic Algorithms
Fei Wang,Dayou Liu,Wanxin Xue,Yinan Lu
IF: 1.019
2002-01-01
Chinese Journal of Electronics
Abstract:In recent years, there has been a great interest in learning Bayesian networks from data. A difficulty of learning Bayesian networks is how to discover hidden variables. Hidden variables are not observed in data set, yet interact with several of the observed variables. Introducing hidden variables is helpful for learning a concise structure that is crucial both for avoiding overfitting and for efficient inference in the learned model. In this paper, based on Genetic algorithm a learning algorithm that can detect hidden variables is proposed. Fitness function is presented to evaluate network structure that may include hidden variables. To make computation of fitness feasible, it is advisable to convert incomplete data to complete data. In this paper, this conversion is carried out by current best model of evolution procedure. Variables of individuals pending evaluation and variables of current best individual may be different. To make the latter same as the former, expectation, projection, expansion operation of data set are given. Fitness function based in Minimum description length is designed on this transformed data set. Length-variable encoding is given, where a gene of the individual corresponds to parents of a variable. Besides, genetic operators that evolve structure are designed. Selection operator is selected as rank proportional to avoid premature convergence. Crossover operator is metabolic uniformity crossover, which give chance for good local structures of different individuals to recombine. Five mutation operators are introduced, adding a parent for a variable, deleting a parent of a variable, reversing arc, adding new variable according to structure complexity-local structures overlap much tending to suggest the presence of a hidden variable, deleting introduced variable. Experimental results show that this algorithm can effectively detect variables so that the learned structure takes on good performance.