Abstract: DNA sequences are increasingly used for large-scale biodiversity inventories. Because these genetic data avoid the time-consuming initial sorting of specimens based on their phenotypic attributes, they have recently been incorporated into taxonomic workflows for overlooked and diverse taxa. Major statistical developments have accompanied this new practice, and several models have been proposed to delimit species with single-locus DNA sequences. However, the approaches proposed to date make different assumptions regarding taxon lineage history, leading to strong discordance whenever methods are compared. Distance-based methods, such as Automatic Barcode Gap Discovery (ABGD) and Assemble Species by Automatic Partitioning (ASAP), rely on the detection of a barcode gap (i.e., the lack of overlap in the distributions of intraspecific and interspecific genetic distances) and the associated threshold in genetic distances. Network-based methods, as exemplified by the REfined Single Linkage (RESL) algorithm for the generation of Barcode Index Numbers (BINs), use connectivity statistics to hierarchically cluster related haplotypes into molecular operational taxonomic units (MOTUs), which serve as species proxies. Tree-based methods, including Poisson Tree Processes (PTP) and the General Mixed Yule Coalescent (GMYC), fit statistical models to phylogenetic trees within maximum likelihood or Bayesian frameworks.
Multiple webservers and stand-alone versions of these methods are now available, complicating the choice of the most appropriate approach for a given taxon of interest. For instance, tree-based methods require an initial phylogenetic reconstruction, for which multiple options, such as RAxML and BEAST, are now available. Across all examined species delimitation methods, judicious parameter setting is paramount, as different model parameterizations can lead to differing conclusions.
The objective of this chapter is to guide users step-by-step through the procedures involved in each of these methods, while aggregating the information required to conduct these analyses. The Materials section details how to prepare and format input files, including options to align sequences and conduct tree reconstruction with Maximum Likelihood and Bayesian inference. The Methods section presents the procedures and options available to conduct species delimitation analyses, including distance-, network-, and tree-based models. Finally, limits and future developments are discussed in the Notes section. Most importantly, the species delimitation methods discussed herein are categorized based on five indicators: reliability, availability, scalability, understandability, and usability, all of which are fundamental properties needed for any approach to gain unanimous adoption within the DNA barcoding community moving forward.
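To make the barcode gap concept concrete, the following minimal Python sketch computes uncorrected pairwise distances (p-distances) for a toy alignment and checks whether the largest intraspecific distance is smaller than the smallest interspecific distance. It is an illustration only, not the ABGD or ASAP algorithm: those methods infer partitions without prior species labels, whereas this sketch assumes labels are already known. The function names `p_distance` and `barcode_gap`, the sequences, and the species labels are all hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(seqs, labels):
    """Return (max intraspecific distance, min interspecific distance, gap present?)."""
    intra, inter = [], []
    for i, j in combinations(range(len(seqs)), 2):
        d = p_distance(seqs[i], seqs[j])
        # Pairs sharing a label are intraspecific; all others are interspecific.
        (intra if labels[i] == labels[j] else inter).append(d)
    max_intra = max(intra, default=0.0)
    min_inter = min(inter, default=float("inf"))
    # A barcode gap exists when the two distance distributions do not overlap.
    return max_intra, min_inter, max_intra < min_inter


# Toy aligned sequences (hypothetical data, 10 sites each).
seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCAAAAA", "CCCCCAAAAT"]
labels = ["sp1", "sp1", "sp2", "sp2"]

max_intra, min_inter, has_gap = barcode_gap(seqs, labels)
print(max_intra, min_inter, has_gap)
```

In practice, distances are usually computed under a substitution model rather than as raw p-distances, and the choice of distance metric is itself one of the parameters that can change the resulting partition.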

Delimiting Species with Single-Locus DNA Sequences
