NCBIminer: Sequences Harvest from Genbank

Xiaoting Xu, Dimitar Dimitrov, Carsten Rahbek, Zhiheng Wang
DOI: https://doi.org/10.1111/ecog.01055
IF: 5.9
2015-01-01
Ecography
Abstract: NCBIminer is freely available, cross-platform and user-friendly software for mining nucleotide sequence data from GenBank. It has several features that enable users to accurately and efficiently download sequences with specific attributes from the GenBank database: 1) it uses a novel search strategy and can download sequences for distantly related taxonomic groups with high accuracy; 2) it handles genes, CDS, rRNA, and other GenBank-defined feature types; 3) it can filter sequences by length and by similarity to a reference sequence using user-defined parameters; 4) it can download information on DNA sample collections, e.g. voucher specimen, country, latitude and longitude, and collector; 5) it uses parallelization for a high-efficiency workflow. We demonstrate the use and performance of NCBIminer by downloading sequences for the plant family Campanulaceae. Compared to other methods, NCBIminer harvests more and longer sequences, and is less sensitive to the choice of query sequences.
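The abstract does not say how the length and similarity filter (feature 3) is computed. The minimal sketch below only illustrates the idea of keeping sequences that fall within a user-defined length range and exceed a user-defined similarity threshold to a reference. As an assumption, it uses Python's standard `difflib.SequenceMatcher.ratio()` as a crude stand-in similarity score; NCBIminer may well use alignment-based identity instead. The function names `similarity` and `filter_sequences`, and their default thresholds, are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(seq: str, reference: str) -> float:
    """Return a 0..1 similarity score between two nucleotide strings.

    Assumption: difflib's ratio (2 * matches / total length) is used as a
    stand-in for alignment-based identity; this is NOT NCBIminer's method.
    """
    return SequenceMatcher(None, seq.upper(), reference.upper(), autojunk=False).ratio()


def filter_sequences(records: dict[str, str], reference: str,
                     min_len: int = 100, max_len: int = 5000,
                     min_similarity: float = 0.7) -> dict[str, str]:
    """Keep records whose length is in [min_len, max_len] and whose
    similarity to the reference is at least min_similarity.

    All parameter names and defaults are hypothetical, user-set thresholds.
    """
    kept = {}
    for accession, seq in records.items():
        # Length filter first: it is cheap and skips the similarity step.
        if not (min_len <= len(seq) <= max_len):
            continue
        if similarity(seq, reference) >= min_similarity:
            kept[accession] = seq
    return kept


if __name__ == "__main__":
    ref = "ACGT" * 5
    toy = {
        "A": ref,                       # identical to the reference
        "B": "ACGT",                    # too short
        "C": "T" * 20,                  # right length, low similarity
        "D": "ACGTACGTACGAACGTACGT",    # one substitution
    }
    print(sorted(filter_sequences(toy, ref, min_len=10, max_len=100)))
```

Applying the cheap length check before the more expensive similarity computation is a simple design choice that keeps the per-record cost low when many candidate sequences are discarded on length alone.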