A clustering method for next-generation sequences of bacterial genomes through multiomics data mapping

Ho-Sik Seok, Mikang Sim, Daehwan Lee, Jaebum Kim
DOI: https://doi.org/10.1007/s13258-013-0155-8
2013-11-06
Abstract: With various ‘omics’ data becoming available recently, new challenges and opportunities have arisen for research on the assembly of next-generation sequences. To exploit these opportunities, we developed a next-generation sequence clustering method that focuses on the interdependency between genomics and proteomics data. Assuming that next-generation read sequences and proteomics data are available for a target species, we mapped the read sequences against protein sequences and identified physically adjacent reads using a machine learning-based read assignment method. We measured the performance of our method using simulated read sequences and collected protein sequences of Escherichia coli (E. coli). We concentrated on the actual adjacency of the clustered reads in the E. coli genome and found that (i) the proposed method improves the performance of read clustering and (ii) the use of proteomics data has the potential to enhance the performance of genome assemblers. These results demonstrate that the integrative approach is effective for accurately grouping adjacent reads in a genome, which should lead to better genome assembly.
genetics & heredity, biochemistry & molecular biology, biotechnology & applied microbiology
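
The general idea in the abstract, grouping reads by the protein sequence they map to so that reads in the same group are likely to be adjacent in the genome, can be illustrated with a small sketch. This is not the authors' method: the abstract mentions a machine learning-based read assignment, whereas the sketch below swaps in a much simpler heuristic (six-frame translation followed by counting shared amino-acid k-mers). All function names, the k-mer size, and the score threshold are assumptions made for illustration.

```python
# Illustrative sketch only: assign each DNA read to the protein with which
# its six-frame translations share the most amino-acid k-mers.
# NOT the paper's machine learning-based assignment; names are hypothetical.

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, frame=0):
    """Translate seq from the given offset; unknown codons become 'X'."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(read):
    """Translate a read in three forward and three reverse frames."""
    strands = (read, reverse_complement(read))
    return [translate(s, f) for s in strands for f in range(3)]


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_read(read, protein_kmers, k=3, min_shared=1):
    """Return the id of the best-matching protein, or None if no protein
    shares at least min_shared k-mers with any translated frame."""
    best_id, best_score = None, 0
    for peptide in six_frame_translations(read):
        read_kmers = kmers(peptide, k)
        for protein_id, prot_kmers in protein_kmers.items():
            score = len(read_kmers & prot_kmers)
            if score > best_score:
                best_id, best_score = protein_id, score
    return best_id if best_score >= min_shared else None


def cluster_reads(reads, proteins, k=3, min_shared=1):
    """Group read ids by assigned protein id; unassigned reads go under None."""
    protein_kmers = {pid: kmers(seq, k) for pid, seq in proteins.items()}
    clusters = {}
    for read_id, read in reads.items():
        target = assign_read(read, protein_kmers, k, min_shared)
        clusters.setdefault(target, []).append(read_id)
    return clusters
```

Calling cluster_reads with a dictionary of read sequences and a dictionary of protein sequences returns a mapping from protein id to the list of read ids assigned to it. A real pipeline would replace the k-mer heuristic with a proper translated aligner and a learned assignment step, as the abstract describes.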