Multiple Motif Discovery in Biological Sequences by Mixture Gibbs Sampling

Li-fang LIU, Hong-wei HUO, Bao-shu WANG
DOI: https://doi.org/10.3321/j.issn:0372-2112.2008.04.025
2008-01-01
Abstract: A mixture Gibbs sampling algorithm is presented for the motif discovery problem in biological sequences. The mixture motif model is learned by likelihood maximization, using a greedy strategy that sequentially adds new motifs to the mixture model. Two sampling methods, site sampling and motif sampling, are designed and applied in alternation. To speed up the search, a hierarchical partitioning scheme based on kd-trees is used to partition the input dataset. Experimental results indicate that the proposed algorithm is advantageous in identifying larger groups of motifs characteristic of biological families. In addition, it offers better diagnostic capabilities by building more powerful statistical motif models with improved classification accuracy.
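
To make the site sampling step concrete, the following is a minimal sketch of a generic single-motif Gibbs site sampler. It is not the mixture algorithm from the paper: it omits the greedy addition of motifs, the motif sampling step, and the kd-tree partitioning. The function names (`build_profile`, `site_sampling`), the uniform background of 0.25 per nucleotide, and the pseudocount of 1.0 are assumptions made for illustration only.

```python
import random

ALPHABET = "ACGT"


def build_profile(seqs, positions, width, skip, pseudo=1.0):
    """Per-column letter frequencies from all current sites except sequence `skip`."""
    # Pseudocounts keep every probability strictly positive (assumed value).
    counts = [{a: pseudo for a in ALPHABET} for _ in range(width)]
    for i, (s, p) in enumerate(zip(seqs, positions)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][s[p + j]] += 1
    profile = []
    for col in counts:
        total = sum(col.values())
        profile.append({a: c / total for a, c in col.items()})
    return profile


def site_sampling(seqs, width, iters=200, seed=0):
    """Resample one motif start per sequence, conditioned on the other sequences."""
    rng = random.Random(seed)
    # Start from random site positions.
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        for i, s in enumerate(seqs):
            profile = build_profile(seqs, positions, width, skip=i)
            weights = []
            for start in range(len(s) - width + 1):
                # Likelihood ratio of the window under the motif profile
                # versus a uniform background (assumed 0.25 per letter).
                w = 1.0
                for j in range(width):
                    w *= profile[j][s[start + j]] / 0.25
                weights.append(w)
            # Draw the new start position in proportion to its weight.
            positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions
```

Calling `site_sampling(seqs, width)` on a list of DNA strings returns one sampled start position per sequence. Because the sampler is stochastic, repeated runs with different seeds may settle on different sites.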