BDcleaner: A workflow for cleaning taxonomic and geographic errors in occurrence data archived in biodiversity databases

Jin, Jing; Yang, Jun
DOI: https://doi.org/10.1016/j.gecco.2019.e00852
IF: 4
2020-01-01
Global Ecology and Conservation
Abstract: High-quality data are indispensable for research and management in biodiversity conservation. Nevertheless, errors in biodiversity data must be removed before they can be used with confidence. In this study, we have developed a workflow for cleaning occurrence data archived in various biodiversity databases. The workflow allows researchers and practitioners to identify taxonomic and geographic errors in millions of records in an automatic, reproducible, and transparent manner. It also allows users to correct several types of taxonomic and geographic errors. We applied the workflow to clean global tree occurrence records. The results showed that among the 30,242,556 occurrence records of 58,034 species extracted from eight databases, only 8,624,319 (28.5%) records of 22,766 (39.2%) species were classified as high quality after running through the workflow. Inaccurate and non-standard taxon names appeared as a more severe problem than geographical errors that people are most familiar with. The workflow developed in this study can be easily adapted to clean occurrence records of other taxonomic groups, which allows researchers and practitioners to reduce uncertainties in their findings.