COSCEB: Comprehensive search for column-coherent evolution biclusters and its application to hub gene identification

Ankush Maind, Shital Raut
Abstract: Biclustering is an increasingly used data mining technique for finding groups of genes that are co-expressed across a subset of experimental conditions in gene-expression data. Such a group of co-expressed genes, which can follow any of several patterns, is called a bicluster. Biclusters provide significant insight into gene function and play an important role in clinical applications such as drug discovery, biomarker discovery, gene network analysis, gene identification, disease diagnosis, and pathway analysis. This paper presents a novel unsupervised approach, 'COmprehensive Search for Column-Coherent Evolution Biclusters (COSCEB)', for a comprehensive search of biologically significant column-coherent evolution biclusters. COSCEB identifies significant biclusters by extracting a column subspace from each gene pair and applying the Longest Common Contiguous Subsequence (LCCS). Experiments were performed on both synthetic and real datasets. The performance of COSCEB is evaluated on the following key issues: comprehensive search, Deep OPSM bicluster, bicluster types, bicluster accuracy, bicluster size, noise, overlapping, output nature, computational complexity, and biologically significant biclusters. COSCEB is compared with six well-known biclustering algorithms: SAMBA, OPSM, xMotif, Bimax, Deep OPSM, and UniBic. The results show that the proposed approach performs effectively on most of these issues and extracts all possible biologically significant column-coherent evolution biclusters, far more than the other biclustering algorithms extract. We also present a case study that shows how significant biclusters can be applied to hub gene identification.
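To make the LCCS idea concrete, the following is a minimal Python sketch of a longest common contiguous subsequence computed for one gene pair. It is an illustration only, not the COSCEB implementation: the abstract does not say how the column subspace of a gene pair is built, so the sketch assumes each gene is represented by its columns ordered by ascending expression value. The helper names `column_order` and `longest_common_contiguous_subsequence` and the sample expression values are hypothetical.

```python
def column_order(row):
    """Hypothetical helper: column indices sorted by ascending expression value."""
    return sorted(range(len(row)), key=lambda c: row[c])


def longest_common_contiguous_subsequence(a, b):
    """Return the longest run of items that appears contiguously in both a and b.

    Standard dynamic programming over suffix-match lengths, keeping only
    the previous row of the table.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return list(a[best_end - best_len:best_end])


# Made-up expression values for two genes over five conditions.
gene_a = [0.1, 0.9, 0.4, 0.7, 0.2]
gene_b = [5.0, 1.0, 2.0, 3.0, 1.5]

shared = longest_common_contiguous_subsequence(
    column_order(gene_a), column_order(gene_b)
)
print(shared)  # columns on which both genes rise in the same order
```

Under this assumption, the shared run of columns is a candidate condition subset on which the two genes evolve coherently. A full method would still need to merge such pairwise results across many genes, which this sketch does not attempt.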