Towards a Taxonomical Consensus: Diversity and Richness Inference from Large Scale rRNA gene Analysis

Dimitris Papamichail, Celine C. Lesaulnier, Steven Skiena, Sean R. McCorkle, Bernard Ollivier, Daniel van der Lelie
DOI: https://doi.org/10.48550/arXiv.1003.5007
2010-03-25
Populations and Evolution
Abstract: Population analysis is persistently challenging but important, leading to the determination of diversity and function prediction of microbial community members. Here we detail our bioinformatics methods for analyzing population distribution and diversity in large microbial communities. This was achieved via (i) a homology-based method for robust phylotype determination, equaling the classification accuracy of the Ribosomal Database Project (RDP) classifier, but providing improved associations of closely related sequences; (ii) a comparison of different clustering methods for achieving more accurate richness estimations. Our methodology, which we developed using the RDP-vetted 16S rRNA gene sequence set, was validated by testing it on a large 16S rRNA gene dataset of approximately 2300 sequences, which we obtained from a soil microbial community study. We concluded that the best approach to obtain an accurate phylogenetic profile of large microbial communities, based on 16S rRNA gene sequence information, is to apply an optimized BLAST classifier. This approach is complemented by the grouping of closely related sequences, using complete linkage clustering, in order to calculate richness and evenness indices for the communities.
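The clustering step described in the abstract can be illustrated with a minimal sketch. It assumes a precomputed pairwise distance matrix (fraction of differing positions) and groups sequences by complete linkage, so that every pair inside a cluster is within the cutoff. The abstract does not say which cutoff or which indices the authors used: the 0.03 cutoff, the Shannon index, and Pielou's evenness below are illustrative assumptions, and the function names and toy matrix are hypothetical.

```python
import math


def complete_linkage_clusters(dist, threshold):
    """Greedy agglomerative clustering with complete linkage.

    dist is a symmetric matrix of pairwise distances. Two clusters are
    merged only while the LARGEST distance between their members stays
    at or below threshold.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the maximum pairwise distance.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def shannon_index(sizes):
    """Shannon diversity H' = -sum(p_i * ln p_i) over cluster proportions."""
    total = sum(sizes)
    return -sum((n / total) * math.log(n / total) for n in sizes)


def pielou_evenness(sizes):
    """Pielou's evenness J' = H' / ln(S); returned as 0.0 here when S <= 1."""
    s = len(sizes)
    return shannon_index(sizes) / math.log(s) if s > 1 else 0.0


# Toy distance matrix for five hypothetical sequences (not data from the paper).
toy_dist = [
    [0.00, 0.01, 0.02, 0.10, 0.10],
    [0.01, 0.00, 0.02, 0.10, 0.10],
    [0.02, 0.02, 0.00, 0.10, 0.10],
    [0.10, 0.10, 0.10, 0.00, 0.01],
    [0.10, 0.10, 0.10, 0.01, 0.00],
]

# Assumed cutoff of 0.03 (roughly 97% identity); the abstract gives no value.
otus = complete_linkage_clusters(toy_dist, threshold=0.03)
sizes = [len(c) for c in otus]
print("richness (observed clusters):", len(otus))
print("Shannon index:", round(shannon_index(sizes), 4))
print("Pielou evenness:", round(pielou_evenness(sizes), 4))
```

Complete linkage is the strictest of the common linkage rules, since a sequence joins a cluster only if it is close to every existing member. This tends to produce more, tighter groups than single or average linkage, which directly affects the richness estimate. The pairwise scan above is cubic or worse in the number of sequences, so for a dataset on the order of thousands of sequences an optimized library implementation would be the practical choice.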