A universal probe set for targeted sequencing of 353 nuclear genes from any flowering plant designed using k-medoids clustering

Matthew G Johnson, Lisa Pokorny, Steven Dodsworth, Laura R Botigué, Robyn S Cowan, Alison Devault, Wolf L Eiserhardt, Niroshini Epitawalage, Félix Forest, Jan T Kim, James H Leebens-Mack, Ilia J Leitch, Olivier Maurin, Douglas E Soltis, Pamela S Soltis, Gane Ka-shu Wong, William J Baker, Norman J Wickett
2019-07-01
Abstract: Sequencing of target-enriched libraries is an efficient and cost-effective method for obtaining DNA sequence data from hundreds of nuclear loci for phylogeny reconstruction. Much of the cost of developing targeted sequencing approaches is associated with the generation of preliminary data needed for the identification of orthologous loci for probe design. In plants, identifying orthologous loci has proven difficult due to a large number of whole-genome duplication events, especially in the angiosperms (flowering plants). We used multiple sequence alignments from over 600 angiosperms for 353 putatively single-copy protein-coding genes identified by the One Thousand Plant Transcriptomes Initiative to design a set of targeted sequencing probes for phylogenetic studies of any angiosperm group. To maximize the phylogenetic potential of the probes, while minimizing the cost of production, we introduce a k …