Long‐read Sequencing in Ecology and Evolution: Understanding How Complex Genetic and Epigenetic Variants Shape Biodiversity
Dan G. Bock,Jianquan Liu,Polina Novikova,Loren H. Rieseberg
DOI: https://doi.org/10.1111/mec.16884
IF: 6.622
2023-01-01
Molecular Ecology
Abstract:Ten years ago, the journal Molecular Ecology published a “road map” paper that reviewed past achievements in the discipline of molecular ecology, identified research challenges and charted a way forward (Andrew et al., 2013). That paper was motivated by a symposium organized during the First Joint Congress on Evolutionary Biology (Ottawa, July 6–10, 2012). In addition, it occurred on the heels of a major inflection point in molecular ecology and in life sciences more broadly: the development and uptake of “next”- or “second”-generation sequencing technologies, which deliver short DNA reads (typically shorter than 400 bp) at very high throughput (e.g., several billion reads per run; Goodwin et al., 2016). As such, Andrew et al. (2013) emphasized the promise of second-generation sequencing for diverse subdisciplines of molecular ecology such as phylogeography, landscape genomics, molecular adaptation and speciation. Representing more than just a technical advancement, second-generation sequencing was predicted to stimulate rapid conceptual breakthroughs in the field, especially in nonmodel species (Stapley et al., 2010; Tautz et al., 2010). As illustrated by any recent issue in the Molecular Ecology journal, these predictions were accurate. While second-generation sequencing has enabled important discoveries at the forefront of molecular ecology, this technology does not come without limitations, the most prominent of which is short read length. Indeed, without additional validation, standard short reads cannot be used to traverse complex regions of the genome such as repetitive elements, duplications, inversions and other forms of structural change (Goodwin et al., 2016; Huddleston et al., 2014). Consequently, these regions have remained relatively unexplored. 
Ironically, however, they may also be particularly important for understanding ecological and evolutionary processes (Wellenreuther et al., 2019), given their high mutation rates (e.g., Hastings et al., 2009), and the fact that they can be extremely abundant—often surpassing single nucleotide polymorphisms by several fold, in terms of the total length of the genome that is affected (1000 Genomes Project Consortium et al., 2015; Mérot et al., 2023). The technology to obtain reads spanning tens to thousands of kilobases was already available 10 years ago (Hayden, 2009; Munroe & Harris, 2010), and has been developed by several sequencing providers, the best-known of which are Pacific Biosciences (hereafter “PacBio”) and Oxford Nanopore Technologies (hereafter “Nanopore”). Also referred to as third-generation sequencing, these approaches were initially used to complement short-read data (Goodwin et al., 2016; Laszlo et al., 2014; Munroe & Harris, 2010). Despite the potential utility of long reads, the adoption of this technology has been slow, primarily due to the high error rates reported for initial iterations of third-generation sequencing instruments (Glenn, 2011; Ip et al., 2015). Recent years have brought about two important developments. First, the error rates of long reads have dropped considerably, in some cases below 1%, approaching rates characteristic of short reads (Goodwin et al., 2016). This is due to improvements in sequencing chemistry, base-calling algorithms and methods for post-sequencing error correction (Logsdon et al., 2020; Rang et al., 2018). Second, methods for assembling long stretches of DNA from short reads have also become available (e.g., McCoy et al., 2014; Selvaraj et al., 2013; Zheng et al., 2016). Collectively, these achievements are helping biologists tackle highly complex and dynamic regions of the genome, which were largely inaccessible until just a few years ago. 
Perhaps most clearly, the consequential role of long-read sequencing for getting the job done is illustrated by the recent publication of the first complete, telomere-to-telomere human genome (Nurk et al., 2022), more than two decades after the genome of our species was first made available (International Human Genome Sequencing Consortium, 2001). This Molecular Ecology Special Issue highlights ways in which molecular ecologists are utilizing long-read information to explore the ecological and evolutionary roles of repetitive or otherwise complex loci. The 19 articles that comprise this issue, covering a range of plant, animal, bacteria and virus study systems, are grouped into six sections, which we summarize below. In doing so, our goal is to emphasize some of the key findings of each study. We also highlight, where possible, important challenges that will need to be overcome in the coming years, before long-read sequencing realizes its full potential. We conclude by summarizing the underlying thread of this Special Issue: that complex genetic and epigenetic variation, while traditionally more difficult to study, can make a substantial contribution to processes such as adaptation and speciation. We anticipate that, with continued improvement in long-read sequencing, this area of molecular ecology will only continue to grow, shaping our understanding of downstream biodiversity consequences of complex variants. “Epigenetics” refers to heritable changes in the expression of the genome that are achieved by means other than direct modification in DNA sequence (Bossdorf et al., 2008). While a range of epigenetic mechanisms are known, including DNA methylation, histone modifications or small RNAs, research in ecology and evolution has focused largely on DNA methylation, because it is characterized by increased stability over generations (Verhoeven et al., 2016). 
Among the different types of methylated nucleotides, 5-methylcytosine (5mC) has received the most attention, as it is the dominant methylation pattern in eukaryotes (Goll & Bestor, 2005). Recent epigenomic studies have indicated that differential methylation can have wide-ranging ecological and evolutionary relevance. For example, broad methylation repatterning is known to follow hybridization and changes in the genomic background (Rapp & Wendel, 2005). As well, 5mC variants have been found to be associated with diverse environmental variables and with complex phenotypic or metabolic traits in a range of plant and animal species (Bossdorf et al., 2008; Hu & Barrett, 2017; Rapp & Wendel, 2005; Verhoeven et al., 2016). To obtain genome-wide profiles of 5mC in population samples, one may treat DNA with bisulphite prior to sequencing, in a step that converts unmethylated cytosines to uracil, rendering methylated vs. unmethylated cytosines identifiable in downstream sequence data (Verhoeven et al., 2016). In one of the two opinion articles of this Special Issue, Nielsen et al. (2023) explore how long-read sequencing is revolutionizing epigenomic studies, using as an example bacteria and bacteriophages, which have a more diverse methylation repertoire than eukaryotes. The authors discuss several advantages that PacBio and Nanopore data offer for the detection of nucleotide modifications, including the fact that these technologies eliminate the need for bisulphite treatment and enable de novo detection of complex epigenetic base modifications. Aside from illustrating state-of-the-art approaches to data acquisition and analysis, Nielsen et al. (2023) also identify current limitations of epigenetic profiling as enabled by long-read sequencing, including the need to develop dedicated analytical tools that minimize noise from neighbouring nucleotides, and that implement reference libraries with the signature of diverse nucleotide modifications. 
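As a hedged illustration of the bisulphite logic described above, the following minimal Python sketch simulates conversion on a single forward-strand sequence and then recovers methylation status by comparison with the reference. The function names, the toy sequence and the methylated positions are hypothetical, chosen only for illustration. The sketch also assumes that uracil is read as thymine after amplification, and it ignores incomplete conversion, the reverse strand and sequencing error.

```python
def bisulphite_convert(seq, methylated):
    """Return the sequence expected after bisulphite treatment and amplification.

    Unmethylated C is converted to U, which is read as T after amplification
    (an assumption of this sketch); methylated C is protected and stays C.
    `methylated` is a set of 0-based positions (hypothetical input).
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, converted):
    """Map each reference cytosine position to True if it was read as C."""
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference, converted)):
        if ref == "C":
            # A retained C implies methylation; a T implies no methylation.
            calls[i] = obs == "C"
    return calls


# Toy example: cytosines at positions 1, 4 and 5; positions 1 and 5 methylated.
reference = "ACGTCCGA"
methylated_positions = {1, 5}
read = bisulphite_convert(reference, methylated_positions)
calls = call_methylation(reference, read)
print(read)
print(calls)
```

In real data, methylation calls are aggregated over many reads per site rather than taken from a single sequence, and long-read platforms avoid the conversion step altogether, as the text notes.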
As these advances are achieved, we will be much better positioned to understand the ecological and evolutionary relevance of diverse epigenetic modifications, including for the ongoing arms race between bacteria and bacteriophages (Nielsen et al., 2023). Repetitive regions of the genome such as transposable elements can be an important source of genomic novelty. Well-known routes to the reshuffling of chromosomal segments that involve repetitive DNA include ectopic recombination and nonhomologous end-joining, which can lead to a variety of outcomes such as deletions, duplications, inversions or fusions (González & Petrov, 2012; Huang & Rieseberg, 2020). As illustrated by contributions included in this section, long-read sequencing can benefit investigations of how the repeat landscape may lead to changes in the karyotype or in patterns of synteny. As an example of changes in karyotype, Burley et al. (2023) use Nanopore long reads to characterize a large (134-Mbp) neo-sex chromosome in the blue-faced honeyeater. Results demonstrated that this chromosome originated via a fusion between an autosome and the ancestral Z chromosome, with important consequences for the genomic landscape of diversity and differentiation. Remarkably, the same chromosomal regions appear to have fused convergently in other songbird lineages, potentially facilitated by repeats that are shared between the two chromosomes (Burley et al., 2023). As an example of changes in synteny, Ferguson et al. (2023) use Nanopore long reads to sequence, assemble and compare the genomes of three Eucalyptus species. Results demonstrated that transposon-rich regions of the genome can lead to synteny loss via small-scale rearrangements. Their study thus challenges the generally accepted view that Eucalyptus species maintain a largely syntenic genome. 
Moreover, results showed that a sizeable fraction of rearrangements contained genes, and therefore have the potential to drive adaptation in this species-rich and widely distributed genus. Rather than representing an obstacle to be overcome during genome assembly, repetitive DNA may also be the focus of study. Peona et al. (2023), for example, investigated the evolution of satellite repetitive DNA in 24 species of birds. Using linked short reads and PacBio long reads, the authors catalogued repeats with monomer sizes ranging from 20 bp to 4 kb that are highly dynamic. Remarkably, patterns of satellite DNA abundance did not align with predictions of current models for satellite DNA evolution. Specifically, satellite DNA profiles were found to be more similar among deeply diverged species than among recently diverged species. This result therefore highlights a promising area for future study. In addition, Wierzbicki et al. (2023) investigated piRNA (PIWI-interacting RNA) clusters in Drosophila. These genomic clusters are known to be rich in repetitive elements and have a crucial role in the genomic defence against transposable elements. The authors resolved 20 such clusters from four Drosophila species, using contiguous genome assemblies made with PacBio and Nanopore data. Aside from developing a framework for quantitative investigations of the dynamics of piRNA clusters, which includes establishing synteny between these highly dynamic loci, Wierzbicki et al. (2023) show that piRNA clusters evolve rapidly, mainly due to the insertion of recent transposable elements, and the deletion of old ones. Remaining challenges include expanding the taxonomic breadth of studies of piRNA cluster evolution, as well as extending analyses to a larger fraction of the total piRNA complement of each genome. As Wierzbicki et al. (2023) point out, both challenges stand to be overcome with the increased implementation of long-read sequencing in molecular ecology. 
Long-read sequencing is broadening the toolset available for the management of small populations, by enabling the reconstruction of gap-free, highly contiguous genomes at a fast pace (Kardos et al., 2021). This allows us to re-evaluate previous conclusions that were drawn for at-risk populations based on genetic data, such as population origin, levels of inbreeding or genetic structure (Kardos et al., 2021). Moreover, genomic data sets also allow new information to be gained regarding the long-term demographic and evolutionary histories of populations, or the contribution of structural variants to population fitness (Kardos et al., 2021; Wold et al., 2021). This Special Issue includes examples of long-read genome-scale analyses in vulnerable or threatened species. For instance, Li, Yang, et al. (2023) rely on PacBio data and Hi-C technology to assemble the genome of the takin, a large bovid herbivore currently listed as vulnerable by the International Union for Conservation of Nature (IUCN; Li, Yang, et al., 2023). This high-quality chromosome-level assembly was used, along with resequencing data, to demonstrate important declines in effective population size during the past million years, and to uncover evidence of runs of homozygosity caused by recent inbreeding. In another example, Yan et al. (2023) investigate intraspecific divergence in a hot-spring snake that is endemic to the Qinghai–Tibet Plateau and is listed as near threatened by IUCN. The authors use short-read data to infer intraspecific divergence, reconstruct demographic history and find genes under selection during local adaptation. By combining these data with PacBio long reads, the authors are also able to document the abundance of structural variants and assess their contribution to differentiation among major lineages in this system. 
Being able to thoroughly catalogue genetic diversity is critically important for answering some of the most fundamental questions in evolutionary biology, such as how wild populations are likely to respond when confronted with challenging or novel environments (e.g., Yeaman et al., 2016), or whether and why adaptive evolution repeatedly makes use of the same genetic modules (e.g., Jones et al., 2012). As demonstrated by a number of contributions in this Special Issue, long-read sequencing is helping us answer these questions. Xie et al. (2023), for instance, study how mangroves cope with a unique environment: the interface of land and sea. The authors rely on a combination of short reads and PacBio long reads to obtain chromosome-level assemblies for two mangrove species, and for one closely related inland species. In contrast to previous studies in other mangroves, which found that whole genome duplications preceded the colonization of novel habitats, Xie et al. (2023) do not detect evidence of recent polyploidization. Rather, they attribute the large genome sizes of these species to repeat sequence expansion. Additionally, results emphasized lack of parallelism in gene family evolution, consistent with the use of different genetic modules during adaptation to the intertidal environment in these species. Evidence for repeated use of the same functional genes was found, however, in the study of Li, Wang, et al. (2023). The authors relied on a new chromosome-level assembly for a tropical poplar, obtained with Nanopore long reads. Comparisons with five other poplar species provided evidence of convergent evolution during adaptation to tropical environments. Hotaling et al. (2023) provide another exciting example of how long reads are fast-tracking the study of adaptation. The authors relied on PacBio long reads to obtain a genome assembly for the Antarctic eelpout, the first representative of the family Zoarcidae of ray-finned fish to be genome-sequenced. 
This highly contiguous assembly in turn allowed the authors to focus on regions of the genome such as the haemoglobin and antifreeze gene clusters which, while representing strong candidates for cold water adaptation, are arranged in highly duplicated tandem arrays (Hotaling et al., 2023). Results were consistent with convergent as well as species-specific mechanisms of adaptation to the extremely cold waters of the Southern Ocean. A series of other studies in this section illustrate the utility of long reads for understanding the genetic architecture of functionally important traits. Nacif et al. (2023), for example, conduct a comprehensive investigation of the sex-determining region in Midas cichlid fish. Using a combination of forward-genetics, PacBio sequencing and Bionano optical mapping, the authors narrow down sex determination in this system to an ~100-kb region of the Y chromosome that is rich in transposable elements. This region harbours a few partial genes, but also one complete coding gene: a duplicate of the anti-Mullerian receptor 2 gene (amhr2Y). Because amhr2Y has been shown to act as a molecular sex-determining locus in other teleost fishes (Nacif et al., 2023), it represents a strong candidate for future functional validation, and probably an additional example of molecular parallelism. At the other extreme in terms of the scale of duplication, Zhu et al. (2023) focus on a biennial alpine plant that sustained two recent rounds of whole genome duplication. In this system, the authors detail a multi-omics investigation of dimorphic cleistogamy. Known to have evolved repeatedly across plants, dimorphic cleistogamy is manifested by the production of both open (available for cross-pollination) and closed (self-fertilizing) flowers, and as such is thought to be important for reproductive assurance in challenging environments (Zhu et al., 2023). 
An assembly made using short reads and Nanopore data revealed a genome that consists of over 70% repetitive sequences. By integrating additional experiments that probed changes in gene expression and in metabolites, the authors were able to show that a large number of genes and metabolites differentiate the two types of flowers. This is consistent with a complex genetic architecture for this trait, which can at least partially be attributed to past whole genome duplication events (Zhu et al., 2023). Finally, Cohen et al. (2023) investigate the genetics of pesticide resistance in Colorado potato beetle. The authors used PacBio sequencing and a trio-binning approach to obtain three new haploid assemblies with considerably improved contiguity, as compared to the existing reference genome for this pest species. A pangenome obtained using all assemblies as well as population-scale resequencing data are then used to investigate the role of structural variants in rapid adaptation to pesticide exposure. Results revealed that structural variants are abundant, accounting for ~30% of the genome, while also highlighting cases in which structural variants may have been adaptive. Such studies demonstrate the relevance of long-read sequencing for understanding the process of adaptation. At the same time, they underline a need for developing analytical approaches that are designed for structural variants, and that additionally exploit other layers of information made available by third-generation sequencing. For example, in the second opinion article of this Special Issue, Shipilina et al. (2023) consider and illustrate, based on simulated and empirical data, the utility of haplotype information that can be obtained using long-read technology, including for analyses of selective sweeps. Specifically, the authors discuss methods based on ancestral recombination graph reconstruction, which, in addition to mutation, take into account ancestry and recombination. 
This information, while currently computationally challenging to obtain for large numbers of samples, could vastly improve resolution as compared to data sets based only on single nucleotide polymorphisms, including by identifying multiple selective sweeps that occur in the same genomic region (Shipilina et al., 2023). As adaptation proceeds and populations diverge, reproductive isolation may gradually develop. Genomic analyses of species pairs, and in particular of those pairs that have recently diverged, represent a promising approach for dissecting the genetic architecture of speciation and for identifying barrier loci (Ravinet et al., 2017). Several papers in this Special Issue focus on the speciation continuum and investigate the contribution of structural variants and recombination suppression to species differentiation. Mérot et al. (2023), for example, undertake a detailed genomic investigation of recent speciation using a pair of whitefish species that diverged in allopatry starting around 60,000 years ago, and then came back into contact roughly 12,000 years ago (Mérot et al., 2023). The authors combined short-read resequencing with Nanopore long reads to obtain the first genome assemblies for both Dwarf and Normal whitefish species, and to genotype single nucleotide polymorphisms and structural variants. Whitefish genomes were found to be repeat-rich, with over 60% of sequence corresponding to interspersed repeats. Moreover, results indicated that a large proportion of the structural variants that differentiate the two species were enriched for several classes of transposable elements. This is consistent with a role of bursts in repetitive elements in generating early genome-wide differentiation between species, and even reproductive isolation (Mérot et al., 2023). In another investigation of incipient speciation, Wersebe et al. (2023) focus on the freshwater crustacean Daphnia. 
The authors present the first genome-wide scan of differentiation for the pulex–pulicaria pair of species, which separated roughly 150,000 years ago (Wersebe et al., 2023). A reference genome made for D. pulicaria using PacBio data, complemented with short reads for both species and their hybrids, allowed the authors to reconstruct the genomic landscape of differentiation. Contrary to expectations, results indicated that genomic windows of high differentiation between these species are restricted to genic regions of high recombination. Finally, Zhang et al. (2023) present an in-depth investigation of the contribution of structural variation to reproductive isolation, in one of the few studies so far that has implemented population-scale long-read sequencing. The authors focus on a natural hybrid zone established between two species of Lycaeides butterflies that separated over 2.4 million years ago, and that came into secondary contact roughly 14,000 years ago (Zhang et al., 2023). Structural variants were genotyped for parental and hybrid individuals using Nanopore data, and then validated using short reads. Genomic cline analyses revealed over 562 structural variants with a signature of selection in the hybrid zone. Among different structural variants, deletions were found to exhibit the largest departures from neutral expectations, pointing to a large contribution of these variants, along with gene-rich inversions, to hybrid fitness and reproductive isolation (Zhang et al., 2023). In addition to facilitating in-depth analysis of epigenomes, genomes, populations and species, long-read sequencing can also be harnessed to study species interactions. As illustrated by the two papers in this section, this information can be broadly relevant in contexts that range from pathogen control to understanding how biological communities are assembled. In the first paper, van Steenbrugge et al. 
(2023) study the evolution of virulence in potato cyst nematodes, which are among the most destructive pathogens of potato worldwide. The authors rely on Nanopore data to assemble a new and highly contiguous reference genome for potato cyst nematodes as well as for an outgroup. These genomes are in turn used to investigate six families of effectors, which are proteins secreted by the pathogen that can manipulate plant physiology and have a key role in virulence. Aside from illuminating patterns of evolutionary diversification for effector genes, results are also predicted to facilitate the management of potato cyst nematodes. Specifically, the findings of van Steenbrugge et al. (2023) should enable molecular investigations of pathogen populations, and subsequent matching of potato host resistance genes with pathogen virulence genotypes. In the second paper in this section, Handy et al. (2023) rely on PacBio data to investigate the composition of gut bacterial communities for two carpenter bee species that are incipiently social. In this case, the ability to obtain full-length 16S amplicons via long-read sequencing allowed the authors to classify bacterial species with significantly improved resolution, and to reveal in this way species interactions that would have otherwise remained cryptic. Results revealed both shared and distinct elements of the microbiome between the two bee species. Moreover, results indicated that different components of the microbiome might be structured by different processes, including geographical isolation and patterns of microbial transmission, highlighting a promising area for future investigation. As illustrated by the collection of articles in this Special Issue, long-read sequencing is providing molecular ecologists with the information needed to tackle some of the most challenging basic and applied topics in our discipline, with important discoveries being made across groups of organisms and levels of biological organization. 
These studies collectively emphasize the underlying thread of this Special Issue: epigenetic variants, structural variants, repetitive elements and other regions in the genome that may have been hard to assemble and genotype using short reads can now be properly traversed and can play a critical role during adaptation and species diversification. In this context, we emphasize the need for expanding long-read sequencing at scales that exceed single individuals. While obtaining highly contiguous reference genomes is an essential first step, we stand to gain much more from replicate genome assemblies, pangenomes and population-scale long-read sequencing, as illustrated by articles in this Special Issue. At the same time, there is a critical need for analytical improvements. These include developing methods that explicitly consider structural variation, as well as improving the computational efficiency of existing methods, such that long-range information provided by third-generation sequencing can be efficiently harnessed for large numbers of samples. Ten years ago, in their road map paper, Andrew et al. (2013) emphasized how next-generation sequencing is improving our observational abilities, illuminating new areas of study. We are now in a similar position, on the cusp of accelerated progress facilitated by recent advances in long-read sequencing technology. With continued broadening of the molecular and analytical toolkits available to molecular ecologists, we are increasingly able to push the conceptual limits of our discipline, and to answer ever more challenging questions of basic and applied relevance about the biodiversity that sustains us and surrounds us. We would like to acknowledge all the authors who contributed articles to this Special Issue, the reviewers who evaluated the manuscripts, as well as editors at the Molecular Ecology journal including Emily Warschefsky and Ben Sibbett for their help throughout.
-
Sequencing breakthroughs for genomic ecology and evolutionary biology
Matthew E. Hudson
DOI: https://doi.org/10.1111/j.1471-8286.2007.02019.x
IF: 7.7
2008-01-01
Molecular Ecology Resources
Abstract:Techniques involving whole-genome sequencing and whole-population sequencing (metagenomics) are beginning to revolutionize the study of ecology and evolution. This revolution is furthest advanced in the Bacteria and Archaea, and more sequence data are required for genomic ecology to be fully applied to the majority of eukaryotes. Recently developed next-generation sequencing technologies provide practical, massively parallel sequencing at lower cost and without the requirement for large, automated facilities, making genome and transcriptome sequencing and resequencing possible for more projects and more species. These sequencing methods include the 454 implementation of pyrosequencing, Solexa/Illumina reversible terminator technologies, polony sequencing and AB SOLiD. All of these methods use nanotechnology to generate hundreds of thousands of small sequence reads at one time. These technologies have the potential to bring the genomics revolution to whole populations, and to organisms such as endangered species or species of ecological and evolutionary interest. A future is now foreseeable where ecologists may resequence entire genomes from wild populations and perform population genetic studies at a genome, rather than gene, level. The new technologies for high throughput sequencing, their limitations and their applicability to evolutionary and environmental studies, are discussed in this review.
biochemistry & molecular biology,ecology,evolutionary biology
-
Long-read sequencing in the era of epigenomics and epitranscriptomics
Morghan C. Lucas,Eva Maria Novoa
DOI: https://doi.org/10.1038/s41592-022-01724-8
IF: 48
2023-01-01
Nature Methods
Abstract:As long-read sequencing technologies continue to advance, the possibility of obtaining maps of DNA and RNA modifications at single-molecule resolution has become a reality. Here we highlight the opportunities and challenges posed by the use of long-read sequencing technologies to study epigenetic and epitranscriptomic marks and how this will affect the way in which we approach the study of health and disease states.
biochemical research methods
-
Perspectives and benefits of high-throughput long-read sequencing in microbial ecology
Leho Tedersoo,Mads Albertsen,Sten Anslan,Benjamin Callahan
DOI: https://doi.org/10.1128/AEM.00626-21
2021-06-17
Applied and Environmental Microbiology
Abstract:Short-read, high-throughput sequencing (HTS) methods have yielded numerous important insights into microbial ecology and function. Yet, in many instances short-read HTS techniques are suboptimal, for example by providing insufficient phylogenetic resolution or low integrity of assembled genomes. Single-molecule and synthetic long-read (SLR) HTS methods have successfully ameliorated these limitations. In addition, nanopore sequencing has generated a number of unique analysis opportunities such as rapid molecular diagnostics and direct RNA sequencing, and both PacBio and nanopore sequencing support detection of epigenetic modifications. Although initially suffering from relatively low sequence quality, recent advances have greatly improved the accuracy of long read sequencing technologies. In spite of great technological progress in recent years, the long-read HTS methods (PacBio and nanopore sequencing) are still relatively costly, require large amounts of high-quality starting material, and commonly need specific solutions in various analysis steps. Despite these challenges, long-read sequencing technologies offer high-quality, cutting-edge alternatives for testing hypotheses about microbiome structure and functioning as well as assembly of eukaryote genomes from complex environmental DNA samples.
microbiology,biotechnology & applied microbiology
-
Revisiting genomes of non-model species with long reads yields new insights into their biology and evolution
Nadège Guiglielmoni,Laura I. Villegas,Joseph Kirangwa,Philipp H. Schiffer
DOI: https://doi.org/10.3389/fgene.2024.1308527
IF: 3.7
2024-02-07
Frontiers in Genetics
Abstract:High-quality genomes obtained using long-read data allow not only for a better understanding of heterozygosity levels, repeat content, and more accurate gene annotation and prediction when compared to those obtained with short-read technologies, but also allow to understand haplotype divergence. Advances in long-read sequencing technologies in the last years have made it possible to produce such high-quality assemblies for non-model organisms. This allows us to revisit genomes, which have been problematic to scaffold to chromosome-scale with previous generations of data and assembly software. Nematoda, one of the most diverse and speciose animal phyla within metazoans, remains poorly studied, and many previously assembled genomes are fragmented. Using long reads obtained with Nanopore R10.4.1 and PacBio HiFi, we generated highly contiguous assemblies of a diploid nematode of the Mermithidae family, for which no closely related genomes are available to date, as well as a collapsed assembly and a phased assembly for a triploid nematode from the Panagrolaimidae family. Both genomes had been analysed before, but the fragmented assemblies had scaffold sizes comparable to the length of long reads prior to assembly. Our new assemblies illustrate how long-read technologies allow for a much better representation of species genomes. We are now able to conduct more accurate downstream assays based on more complete gene and transposable element predictions.
genetics & heredity
-
DNA sequence analysis: new applications with high throughput sequencing and new methods in studying gene families and human haplogroups
Hong Ma,Stephen Schaeffer,Yazhou Sun
2012-01-01
Abstract:Understanding the sequential information coded in DNA, RNA and proteins is important for both basic and applied research in life sciences. Extensive efforts have been devoted to the research and development of DNA sequence analysis methods. The studies described in this dissertation explored new applications of existing methods in the context of the recent development of ultra-high throughput sequencing technologies. This dissertation also included new methods developed for studying gene families and human haplogroups. The theories, algorithms and tools for analyzing DNA sequence information concerning these studies are reviewed in Chapter 1 of this dissertation. With the recent development in DNA sequencing technologies came many new research opportunities. Great challenges also came along, mainly because of the large data size of the latest high throughput sequencing technologies. The potential of these new technologies was exploited to complete a 100,000-year-old ancient polar bear mitochondrial genome. With this and some additional modern bear data, the matrilineal polar bear's divergence time was estimated to be around 130,000 years ago, which is significantly older than some recent estimates. This estimate indicated that modern polar bear matrilineal ancestors adapted to the niche polar environment within 30,000 years after the speciation event and propagated along the entire Arctic Circle for the next 100,000 years. This recent speciation and rapid expansion process is analogous to the evolution and migration of modern humans. The lineage characteristics of the latter were also briefly studied using the same technologies. (Chapter 2) Because of the increased efficiency from the latest sequencing technologies, more and more complete human mitochondrial genomes have been generated at an increasing speed.
Although mitochondrial haplogroups, and their classification and identification were widely used in human evolution and population studies, the current tools could not fully take advantage of the rapidly growing number of new mitochondrial genomes. An updated mitochondrial haplogroup classification system was thus developed with evolutionary models that incorporate the mitochondrial genomic variations within the human population. These variations have not been considered by previous methods, which could lead to incorrectly classified haplogroups. The variation parameters, including the whole-genome substitution rate (0.013 - 0.1 substitutions per generation), the rate heterogeneity among sites (Gamma distribution shape parameter α = 0.7078) and the percentage of invariant sites (64%), were estimated based on 7985 full-length human mitochondrial genome sequences. Haplogroups were then classified based on the corrected genetic distance estimation and modeled with position specific matrices. A new haplogroup identification system was developed based on the resulting matrices and the maximum-likelihood estimation (MLE) method, permitting fast and accurate haplogroup assignment for both known and new mitochondrial genomes. The entire system is available through the HapSearch web application (http://hapsearch.synblex.com). (Chapter 3) The latest sequencing technologies also allowed a more thorough study of stage-specific transcriptional activities. To elucidate the transcriptomic profiles and new transcriptomic activities in neural development, nine recent RNA-seq datasets corresponding to tissues/organs ranging from stem cell, embryonic brain cortex to adult whole brain were analyzed. The global similarities between the neural and stem cell transcriptomes were found on both genic and chromosomal levels. 
A previously undocumented high level of unannotated expression was found in mouse embryonic brain cortices, the intronic part of which was found to be strongly associated with gene ontology (GO) categories that are important for synaptogenesis and neural circuit formation. This suggested potentially novel genes, gene functions and regulatory mechanisms in early brain development. (Chapter 4) Although the speed of generating genomic sequences was increasing rapidly, the development of genome annotation was lagging behind. This slowed down or prevented a broader utilization of the newly sequenced genomes. To partially mitigate this situation, a new tool, called Phoenix, was developed for retrieving homologues of a given gene or gene family from unannotated genomes. Phoenix exhibited fast and accurate performance in simulation using known gene families' data. Its advantage was further demonstrated by correctly retrieving homologues of a gene family that has a known complex evolutionary history. This tool allows gene family studies in unannotated genomes or even partially assembled genomes. (Chapter 5) Finally, this dissertation concluded with a discussion of the intrinsic limitations and advantages of the DNA sequence analysis, along with its current and future application potentials. (Chapter 6)
-
Rethinking eco‐evo studies of gene expression for non‐model organisms in the genomic era
Adam H. Freedman,Timothy B. Sackton
DOI: https://doi.org/10.1111/mec.17378
IF: 6.622
2024-05-10
Molecular Ecology
Abstract:Recent advances in genomic technology, including the rapid development of long‐read sequencing technology and single‐cell RNA‐sequencing methods, are poised to significantly expand the kinds of studies that are feasible in ecological genomics. In this perspective, we review these new technologies and discuss their potential impact on gene expression studies in non‐model organisms. Although traditional RNA‐sequencing methods have been an extraordinarily powerful tool to apply functional genomics in an ecological context, bulk RNA‐seq approaches often rely on de novo transcriptome assembly, and cannot capture expression changes in rare cell populations or distinguish shifts in cell type abundance. Advancements in genome assembly technology, particularly long‐read sequencing, and improvements in the scalability of single‐cell RNA‐sequencing (scRNA‐seq) offer unprecedented resolution in understanding cellular heterogeneity and gene regulation. We discuss the potential of these technologies to enable disentangling differential gene regulation from cell type composition differences and uncovering subtle expression patterns masked by bulk RNA‐seq. The integration of these approaches provides a more nuanced understanding of the ecological and evolutionary dynamics of gene expression, paving the way for refined models and deeper insights into the generation of biodiversity.
biochemistry & molecular biology,ecology,evolutionary biology
-
Coming of age: ten years of next-generation sequencing technologies
Sara Goodwin,John D. McPherson,W. Richard McCombie
DOI: https://doi.org/10.1038/nrg.2016.49
IF: 59.581
2016-05-17
Nature Reviews Genetics
Abstract:Key Points: There are two major paradigms in next-generation sequencing (NGS) technology: short-read sequencing and long-read sequencing. Short-read sequencing approaches provide lower-cost, higher-accuracy data that are useful for population-level research and clinical variant discovery. By contrast, long-read approaches provide read lengths that are well suited for de novo genome assembly applications and full-length isoform sequencing. NGS technologies have been evolving over the past 10 years, leading to substantial improvements in quality and yield; however, certain approaches have proven to be more effective and adaptable than others. Recent improvements in chemistry, costs, throughput and accessibility are driving the emergence of new, varied technologies to address applications that were not previously possible. These include integrated long-read and short-read sequencing studies, routine clinical DNA sequencing, real-time pathogen DNA monitoring and massive population-level projects. Although massive strides are being made in this technology, several notable limitations remain. The time required to sequence and analyse data limits the use of NGS in clinical applications in which time is an important factor; the costs and error rates of long-read sequencing make it prohibitive for routine use, and ethical considerations can limit the public and private use of genetic data. We can expect increasing democratization and options for NGS in the future. Many new instruments with varied chemistries and applications are being released or being developed.
genetics & heredity
-
Phyloepigenetics in phylogeny analyses
Simeon Santourlidis
DOI: https://doi.org/10.1101/2024.08.14.607911
2024-08-18
Abstract:Long-standing, continuous blurring and controversies in the field of phylogenetic interspecies relations, associated with insufficient explanations for dynamics and variability of speeds of evolution in mammals, hint at a crucial missing link. It has been suggested that transgenerational epigenetic inheritance and the concealed mechanisms behind it play a distinct role in mammalian evolution. Here, a comprehensive sequence alignment approach in hominid species, i.e., Homo sapiens, Homo neanderthalensis, denisovan human, Pan troglodytes, Pan paniscus, Gorilla gorilla and Pongo pygmaeus, comprising conserved CpG islands of housekeeping genes, uncovers evidence for a distinct variability of CpG dinucleotides. Applying solely these evolutionarily consistent and inconsistent CpG sites in a classic phylogenetic analysis, calibrated by the divergence time point of the common chimpanzee (Pan troglodytes) and the bonobo or pygmy chimpanzee (Pan paniscus), a "phylo-epigenetic" tree has been generated which precisely recapitulates branch points and branch lengths, i.e., divergence events and relations, as they have been broadly suggested in the current literature, based on comprehensive molecular phylogenomics and fossil records. I suggest here that CpG dinucleotide changes at CpG islands are of superior importance for evolutionary development and determine the emerging DNA methylation profiles.
Evolutionary Biology
-
An epigenetic toolbox for conservation biologists
Alice Balard,Miguel Baltazar‐Soares,Christophe Eizaguirre,Melanie J. Heckwolf
DOI: https://doi.org/10.1111/eva.13699
2024-06-04
Evolutionary Applications
Abstract:Ongoing climatic shifts and increasing anthropogenic pressures demand an efficient delineation of conservation units and accurate predictions of populations' resilience and adaptive potential. Molecular tools involving DNA sequencing are nowadays routinely used for these purposes. Yet, most of the existing tools focusing on sequence‐level information have shortcomings in detecting signals of short‐term ecological relevance. Epigenetic modifications carry valuable information to better link individuals, populations, and species to their environment. Here, we discuss a series of epigenetic monitoring tools that can be directly applied to various conservation contexts, complementing already existing molecular monitoring frameworks. Focusing on DNA sequence‐based methods (e.g. DNA methylation, for which the applications are readily available), we demonstrate how (a) the identification of epi‐biomarkers associated with age or infection can facilitate the determination of an individual's health status in wild populations; (b) whole epigenome analyses can identify signatures of selection linked to environmental conditions and facilitate estimating the adaptive potential of populations; and (c) epi‐eDNA (epigenetic environmental DNA), an epigenetic‐based conservation tool, presents a non‐invasive sampling method to monitor biological information beyond the mere presence of individuals. Overall, our framework refines conservation strategies, ensuring a comprehensive understanding of species' adaptive potential and persistence on ecologically relevant timescales.
evolutionary biology
-
Phylogenomics of the Epigenetic Toolkit Reveals Punctate Retention of Genes Across Eukaryotes
Agnes K. M. Weiner,Mario A. Ceron-Romero,Ying Yan,Laura A. Katz
DOI: https://doi.org/10.1093/gbe/evaa198
2020-01-01
Genome Biology and Evolution
Abstract:Epigenetic processes in eukaryotes play important roles through regulation of gene expression, chromatin structure, and genome rearrangements. The roles of chromatin modification (e.g., DNA methylation and histone modification) and non-protein-coding RNAs have been well studied in animals and plants. With the exception of a few model organisms (e.g., Saccharomyces and Plasmodium), much less is known about epigenetic toolkits across the remainder of the eukaryotic tree of life. Even with limited data, previous work suggested the existence of an ancient epigenetic toolkit in the last eukaryotic common ancestor. We use PhyloToL, our taxon-rich phylogenomic pipeline, to detect homologs of epigenetic genes and evaluate their macroevolutionary patterns among eukaryotes. In addition to data from GenBank, we increase taxon sampling from understudied clades of SAR (Stramenopila, Alveolata, and Rhizaria) and Amoebozoa by adding new single-cell transcriptomes from ciliates, foraminifera, and testate amoebae. We focus on 118 gene families, 94 involved in chromatin modification and 24 involved in non-protein-coding RNA processes based on the epigenetics literature. Our results indicate 1) the presence of a large number of epigenetic gene families in the last eukaryotic common ancestor; 2) differential conservation among major eukaryotic clades, with a notable paucity of genes within Excavata; and 3) punctate distribution of epigenetic gene families between species consistent with rapid evolution leading to gene loss. Together these data demonstrate the power of taxon-rich phylogenomic studies for illuminating evolutionary patterns at scales of > 1 billion years of evolution and suggest that macroevolutionary phenomena, such as genome conflict, have shaped the evolution of the eukaryotic epigenetic toolkit.
-
Unveiling microbial diversity: harnessing long-read sequencing technology
Daniel P. Agustinho,Yilei Fu,Vipin K. Menon,Ginger A. Metcalf,Todd J. Treangen,Fritz J. Sedlazeck
DOI: https://doi.org/10.1038/s41592-024-02262-1
IF: 48
2024-05-01
Nature Methods
Abstract:Long-read sequencing has recently transformed metagenomics, enhancing strain-level pathogen characterization, enabling accurate and complete metagenome-assembled genomes, and improving microbiome taxonomic classification and profiling. These advancements are not only due to improvements in sequencing accuracy, but also happening across rapidly changing analysis methods. In this Review, we explore long-read sequencing's profound impact on metagenomics, focusing on computational pipelines for genome assembly, taxonomic characterization and variant detection, to summarize recent advancements in the field and provide an overview of available analytical methods to fully leverage long reads. We provide insights into the advantages and disadvantages of long reads over short reads and their evolution from the early days of long-read sequencing to their recent impact on metagenomics and clinical diagnostics. We further point out remaining challenges for the field such as the integration of methylation signals in sub-strain analysis and the lack of benchmarks.
biochemical research methods
-
Improving bacterial metagenomic research through long read sequencing
Noah Greenman,Sayf Al-Deen Hassouneh,Latifa S. Abdelli,Catherine Johnston,Taj Azarian
DOI: https://doi.org/10.1101/2023.10.31.564966
2024-04-04
Abstract:Metagenomic sequencing analysis is central to investigating microbial communities in clinical and environmental studies. Short read sequencing remains the primary data type for metagenomic research; however, long read sequencing promises advantages of improved metagenomic assembly and resolved taxonomic identification. To assess the comparative performance of short and long read sequencing data for metagenomic analysis, we simulated short and long read datasets using increasingly complex metagenomes comprised of 10, 20, and 50 microbial taxa. In addition, an empirical dataset of paired short and long read data from mouse fecal pellets was generated to assess feasibility. We compared metagenomic assembly quality, taxonomic classification capabilities, and metagenome-assembled genome recovery rates for both simulated and real metagenomic sequence data. We show that long read sequencing data significantly improves taxonomic classification capabilities and assembly quality. For simulated long read datasets, metagenomic assemblies were more complete and more contiguous, with higher rates of metagenome-assembled genome recovery. This resulted in more precise taxonomic classifications. Analysis of empirical data demonstrated that sequencing technology directly affects compositional results. Overall, we highlight strengths of long read sequencing for metagenomic studies of microbial communities over traditional short read approaches. Long read sequencing improved the accuracy of classification and abundance estimation. These results will aid researchers when considering which sequencing platforms to use for metagenomic projects.
Bioinformatics
-
Abstract 7026: Multi-omic genomic mapping with long read sequencing
Bryan J. Venters,Paul W. Hook,Vishnu S. Kumary,Alli R. Hickman,James T. Anderson,Anup Vaidya,Ryan J. Ezell,Jonathan M. Burg,Zu-Wen Sun,Martis W. Cowles,Winston Timp,Michael-Christopher Keogh
DOI: https://doi.org/10.1158/1538-7445.am2024-7026
IF: 11.2
2024-03-22
Cancer Research
Abstract:Gene transcription is regulated by the complex interplay between histone post-translational modifications (PTMs), chromatin associated proteins (CAPs), and DNA methylation (DNAme). Mapping their genomic locations and examining the relationships between these chromatin elements is a powerful approach to decipher mechanisms of disease, thereby enabling discovery of novel biomarkers and therapeutics. Leading epigenomic mapping technologies (e.g., ChIP-seq, CUT&RUN) rely upon DNA fragmentation to isolate regions of interest for sequencing on short read platforms (e.g., Illumina). This strategy leads to substantial loss of contextual information regarding the surrounding DNA, precluding the identification of multiple co-occurring epigenomic features on a single DNA molecule. By contrast, long-read sequencing (LRS) platforms are capable of sequencing very long reads from a single molecule (typically >10kb), allowing relationships between features on a single molecule to be used to resolve heterogeneity within mixed populations. Here we report a robust multi-omic method that leverages LRS to simultaneously profile histone PTMs (or CAPs), DNAme, and parental haplotype in a single assay. This nondestructive, epigenomic mapping approach leverages a novel DNA methyltransferase fusion protein (pAG-M.EcoGII) to label DNA underneath antibody-targeted chromatin features, thereby marking sites of interest while preserving DNA molecules intact for LRS. Inspired by our work with state-of-the-art immunotethering-based approaches (CUT&RUN/CUT&Tag), nuclei are bound to magnetic beads to streamline and automate sample processing. Next, adenosines near antibody-targeted chromatin features are methylated with pAG-M.EcoGII, and these marks are then directly read from genomic DNA using Oxford Nanopore Technologies or Pacific Biosciences LRS platforms. To determine the capabilities and limitations of this assay, we tested multiple chromatin targets in various cell lines.
Importantly, this method is highly reproducible across biological replicates, and highly concordant with orthogonal SRS assays (e.g., CUT&RUN). Further, we showed that this method is a true multi-omic approach by simultaneously profiling histone PTMs, native DNAme (5mC), and parental single-nucleotide variants from single DNA molecules within a single reaction. Finally, this workflow preserves chromatin integrity for LRS, revealing heterogeneity (e.g., haplotype or paternal origin) within/between data types and providing access to previously unmappable genomic regions (e.g., centromeres). Citation Format: Bryan J. Venters, Paul W. Hook, Vishnu S. Kumary, Alli R. Hickman, James T. Anderson, Anup Vaidya, Ryan J. Ezell, Jonathan M. Burg, Zu-Wen Sun, Martis W. Cowles, Winston Timp, Michael-Christopher Keogh. Multi-omic genomic mapping with long read sequencing [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 7026.
oncology
-
Sequence Depth, Not PCR Replication, Improves Ecological Inference from Next Generation DNA Sequencing
Dylan P. Smith,Kabir G. Peay
DOI: https://doi.org/10.1371/journal.pone.0090234
IF: 3.7
2014-02-28
PLoS ONE
Abstract:Recent advances in molecular approaches and DNA sequencing have greatly progressed the field of ecology and allowed for the study of complex communities in unprecedented detail. Next generation sequencing (NGS) can reveal powerful insights into the diversity, composition, and dynamics of cryptic organisms, but results may be sensitive to a number of technical factors, including molecular practices used to generate amplicons, sequencing technology, and data processing. Despite the popularity of some techniques over others, explicit tests of the relative benefits they convey in molecular ecology studies remain scarce. Here we tested the effects of PCR replication, sequencing depth, and sequencing platform on ecological inference drawn from environmental samples of soil fungi. We sequenced replicates of three soil samples taken from pine biomes in North America represented by pools of either one, two, four, eight, or sixteen PCR replicates with both 454 pyrosequencing and Illumina MiSeq. Increasing the number of pooled PCR replicates had no detectable effect on measures of α- and β-diversity. Pseudo-β-diversity - which we define as dissimilarity between re-sequenced replicates of the same sample - decreased markedly with increasing sampling depth. The total richness recovered with Illumina was significantly higher than with 454, but measures of α- and β-diversity between a larger set of fungal samples sequenced on both platforms were highly correlated. Our results suggest that molecular ecology studies will benefit more from investing in robust sequencing technologies than from replicating PCRs. This study also demonstrates the potential for continuous integration of older datasets with newer technology.
multidisciplinary sciences
-
Bioinformatics Methods and Biological Interpretation for Next-Generation Sequencing Data
Guohua Wang,Yunlong Liu,Dongxiao Zhu,Gunnar W. Klau,Weixing Feng
DOI: https://doi.org/10.1155/2015/690873
2015-01-01
Abstract:Next-generation sequencing (NGS) technologies have revolutionarily reshaped the landscape of “-omics” research areas and their effects are becoming increasingly widespread. With its significantly lower costs and higher throughput, NGS has been applied to genome, transcriptome, and epigenome research. The plethora of information that emerges from large-scale next-generation sequencing experiments has triggered the development of bioinformatics tools and methods for efficient analysis, interpretation, and visualization of NGS data. Such methods and tools will substantially help the life-science community to better and more efficiently understand the underlying biological principles and mechanisms. This special issue mainly focuses on the original research articles as well as review articles that develop new bioinformatics approaches, present novel platforms and systems, and describe concise models well explaining the biological context and application in relation to genetics, metagenomics, and clinical study from NGS data. This special issue contains nine papers. Two papers discuss the application of NGS data analysis in metagenomics and one paper presents an R package for metagenomic systems biology analysis. One review paper discusses the software to detect alternative splicing isoforms from deep sequencing data. The other five papers are related to application of NGS data integration in genomics, genetics, and epigenetics. In “mmnet: An R Package for Metagenomics Systems Biology Analysis,” the authors developed an R package, mmnet, to implement community-level metabolic network reconstruction and also implement a set of functions for automatic analysis pipeline construction. The package has substantial potential in metagenomic studies that focus on identifying system-level variations of human microbiome associated with disease.
The paper “Constructing a Genome-Wide LD Map of Wild A. gambiae Using Next-Generation Sequencing” sequenced the genomes of nine individual wild A. gambiae mosquitoes using next-generation sequencing technologies. And 2,219,815 common single nucleotide polymorphisms (SNPs) were detected. Nearly one million SNPs that were genotyped with 99.6% confidence were extracted from these high-throughput sequencing data. Based on these SNP genotypes, the authors constructed a genome-wide linkage disequilibrium (LD) map for wild A. gambiae mosquitoes from malaria-endemic areas in Kenya and made it available through a public website. The paper entitled “How to Isolate a Plant's Hypomethylome in One Shot” provided an easy, fast, and cost-effective tool to obtain a plant's hypomethylome (the nonmethylated part of the genome) by an optimized methyl filtration protocol with subsequent next-generation sequencing, in essence a variant of MRE-seq. The hypomethylomes which were identified in three plant species, Oryza sativa, Picea abies, and Crocus sativus, showed clear enrichment in genes and their flanking regions. This method is extremely conducive to studying and understanding the genomes of nonmodel organisms. In “Genetic Interactions Explain Variance in Cingulate Amyloid Burden: An AV-45 PET Genome-Wide Association and Interaction Study in the ADNI Cohort,” the authors performed a genome-wide association study (GWAS) and a genome-wide interaction study (GWIS) of an amyloid imaging phenotype, using the data from Alzheimer's Disease (AD) Neuroimaging Initiative. The GWAS analysis revealed significant hits within or proximal to APOE, APOC1, and TOMM40 genes. The GWIS analysis yielded 8 novel SNP-SNP interaction findings that warrant replication and further investigation.
The paper “Understanding Transcription Factor Regulation by Integrating Gene Expression and DNase I Hypersensitive Sites” identified the functional transcription factor binding sites in gene regulatory region by integrating the DNase I hypersensitive sites with known position weight matrices. The authors present a model-based computational approach to predict a set of transcription factors that potentially cause such differential gene expression in cervical cancer HeLa S3 cell and HelaS3-ifna4h cell. This model demonstrated the potential to computationally identify the functional transcription factors in gene regulation. The paper “Survey of Programs Used to Detect Alternative Splicing Isoforms from Deep Sequencing Data In Silico” is a review paper. Alternative splicing (AS) is very important for gene expression and protein diversity. First the authors summarized the alternative splicing forms and the means of selective splicing. Then the authors described the numerous methods for the read mapping of RNA-seq data and alternative types of splicing prediction software. At last, HMMSplicer, SOAPsplice, TopHat, and STAR were used to evaluate the performance of alternative splicing isoforms detection. The article “MicroRNA Promoter Identification in Arabidopsis Using Multiple Histone Marker” was devoted to a computational strategy, which identified the promoter regions of most microRNA genes in Arabidopsis, using the genome wide profiles of nine histone markers. Based upon the assumption that the distributions of histone markers around the transcription start sites (TSSs) of microRNA genes are similar with the TSSs of protein coding gene, the Support Vector Machine (SVM) was used to identify 42 independent miRNA TSSs and 132 miRNA TSSs which are located in the promoters of upstream genes. The annotation of microRNA TSSs will provide the measurements regarding the initiation of transcription and better understanding of microRNA regulation. 
The paper “454-Pyrosequencing Analysis of Bacterial Communities from Autotrophic Nitrogen Removal Bioreactors Utilizing Universal Primers: Effect of Annealing Temperature” carried out a metagenomic analysis (pyrosequencing) of total bacterial diversity including Anammox population in five autotrophic nitrogen removal technologies, two bench-scale (MBR and low temperature CANON) and three full-scale (Anammox, CANON, and DEMON), by optimization of primer selection and PCR conditions. The pyrosequencing data showed that annealing temperature of 45°C yielded the best results in terms of species richness and diversity for all bioreactors analyzed. The paper entitled “Active Microbial Communities Inhabit Sulphate-Methane Interphase in Deep Bedrock Fracture Fluids in Olkiluoto, Finland” investigated active microbial communities of deep crystalline bedrock fracture water from seven different boreholes in Olkiluoto (Western Finland), using bacterial and archaeal 16S rRNA, dsrB, and mcrA gene transcript targeted 454 pyrosequencing. The results demonstrated that active and highly diverse but sparse and stratified microbial communities inhabited the Fennoscandian deep bedrock ecosystems.
-
Environmental DNA: The next chapter
Rosetta Blackman,Marjorie Couton,François Keck,Dominik Kirschner,Luca Carraro,Eva Cereghetti,Kilian Perrelet,Raphael Bossart,Jeanine Brantschen,Yan Zhang,Florian Altermatt
DOI: https://doi.org/10.1111/mec.17355
IF: 6.622
2024-04-18
Molecular Ecology
Abstract:Molecular tools are an indispensable part of ecology and biodiversity sciences and implemented across all biomes. About a decade ago, the use and implementation of environmental DNA (eDNA) to detect biodiversity signals extracted from environmental samples opened new avenues of research. Initial eDNA research focused on understanding population dynamics of target species. Its scope thereafter broadened, uncovering previously unrecorded biodiversity via metabarcoding in both well‐studied and understudied ecosystems across all taxonomic groups. The application of eDNA rapidly became an established part of biodiversity research, and a research field by its own. Here, we revisit key expectations made in a land‐mark special issue on eDNA in Molecular Ecology in 2012 to frame the development in six key areas: (1) sample collection, (2) primer development, (3) biomonitoring, (4) quantification, (5) behaviour of DNA in the environment and (6) reference database development. We pinpoint the success of eDNA, yet also discuss shortfalls and expectations not met, highlighting areas of research priority and identify the unexpected developments. In parallel, our retrospective couples a screening of the peer‐reviewed literature with a survey of eDNA users including academics, end‐users and commercial providers, in which we address the priority areas to focus research efforts to advance the field of eDNA. With the rapid and ever‐increasing pace of new technical advances, the future of eDNA looks bright, yet successful applications and best practices must become more interdisciplinary to reach its full potential. Our retrospect gives the tools and expectations towards concretely moving the field forward.
biochemistry & molecular biology,ecology,evolutionary biology
-
Long-Read-Resolved, Ecosystem-Wide Exploration of Nucleotide and Structural Microdiversity of Lake Bacterioplankton Genomes
Yusuke Okazaki,Shin-Ichi Nakano,Atsushi Toyoda,Hideyuki Tamaki
DOI: https://doi.org/10.1128/msystems.00433-22
2022-08-30
mSystems
Abstract:Reconstruction of metagenome-assembled genomes (MAGs) has become a fundamental approach in microbial ecology. However, a MAG is rarely complete and overlooks genomic microdiversity because metagenomic assembly fails to resolve microvariants among closely related genotypes. Aiming at understanding the universal factors that drive or constrain prokaryotic genome diversification, we performed an ecosystem-wide high-resolution metagenomic exploration of microdiversity by combining spatiotemporal (2 depths × 12 months) sampling from a pelagic freshwater system, high-quality MAG reconstruction using long- and short-read metagenomic sequences, and profiling of single nucleotide variants (SNVs) and structural variants (SVs) through mapping of short and long reads to the MAGs, respectively. We reconstructed 575 MAGs, including 29 circular assemblies, providing high-quality reference genomes of freshwater bacterioplankton. Read mapping against these MAGs identified 100 to 101,781 SNVs/Mb and 0 to 305 insertions, 0 to 467 deletions, 0 to 41 duplications, and 0 to 6 inversions for each MAG. Nonsynonymous SNVs were accumulated in genes potentially involved in cell surface structural modification to evade phage recognition. Most (80.2%) deletions overlapped with a gene coding region, and genes of prokaryotic defense systems were most frequently (>8% of the genes) overlapped with a deletion. Some such deletions exhibited a monthly shift in their allele frequency, suggesting a rapid turnover of genotypes in response to phage predation. MAGs with extremely low microdiversity were either rare or opportunistic bloomers, suggesting that population persistency is key to their genomic diversification. We conclude that prokaryotic genomic diversification is driven primarily by viral load and constrained by a population bottleneck. IMPORTANCE Identifying intraspecies genomic diversity (microdiversity) is crucial to understanding microbial ecology and evolution. However, microdiversity among environmental assemblages is not well investigated, because most microbes are difficult to culture. In this study, we performed cultivation-independent exploration of bacterial genomic microdiversity in a lake ecosystem using a combination of short- and long-read metagenomic analyses. The results revealed the broad spectrum of genomic microdiversity among the diverse bacterial species in the ecosystem, which has been overlooked by conventional approaches. Our ecosystem-wide exploration further allowed comparative analysis among the genomes and genes and revealed factors behind microbial genomic diversification, namely, that diversification is driven primarily by resistance against viral infection and constrained by the population size.
-
Third-Generation Sequencing: The Spearhead towards the Radical Transformation of Modern Genomics
Konstantina Athanasopoulou,Michaela A Boti,Panagiotis G Adamopoulos,Paraskevi C Skourou,Andreas Scorilas
DOI: https://doi.org/10.3390/life12010030
2021-12-26
Abstract:Although next-generation sequencing (NGS) technology revolutionized sequencing, offering a tremendous sequencing capacity with groundbreaking depth and accuracy, it continues to demonstrate serious limitations. In the early 2010s, the introduction of a novel set of sequencing methodologies, presented by two platforms, Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), gave birth to third-generation sequencing (TGS). The innovative long-read technologies make genome sequencing easier to handle by greatly reducing the average time of library construction workflows and simplifying de novo genome assembly through the generation of long reads. Long sequencing reads produced by both TGS methodologies have already facilitated transcriptional profiling, since they enable the identification of full-length transcripts without the need for assembly or the use of sophisticated bioinformatics tools. Long-read technologies have also provided new insights into the field of epitranscriptomics, by allowing the direct detection of RNA modifications on native RNA molecules. This review highlights the advantageous features of the newly introduced TGS technologies, discusses their limitations and provides an in-depth comparison regarding their scientific background and available protocols as well as their potential utility in research and clinical applications.
-
The ecology of environmental DNA and implications for conservation genetics
Matthew A. Barnes,Cameron R. Turner
DOI: https://doi.org/10.1007/s10592-015-0775-4
2015-09-08
Conservation Genetics
Abstract:Environmental DNA (eDNA) refers to the genetic material that can be extracted from bulk environmental samples such as soil, water, and even air. The rapidly expanding study of eDNA has generated unprecedented ability to detect species and conduct genetic analyses for conservation, management, and research, particularly in scenarios where collection of whole organisms is impractical or impossible. While the number of studies demonstrating successful eDNA detection has increased rapidly in recent years, less research has explored the “ecology” of eDNA—myriad interactions between extraorganismal genetic material and its environment—and its influence on eDNA detection, quantification, analysis, and application to conservation and research. Here, we outline a framework for understanding the ecology of eDNA, including the origin, state, transport, and fate of extraorganismal genetic material. Using this framework, we review and synthesize the findings of eDNA studies from diverse environments, taxa, and fields of study to highlight important concepts and knowledge gaps in eDNA study and application. Additionally, we identify frontiers of conservation-focused eDNA application where we see the most potential for growth, including the use of eDNA for estimating population size, population genetic and genomic analyses via eDNA, inclusion of other indicator biomolecules such as environmental RNA or proteins, automated sample collection and analysis, and consideration of an expanded array of creative environmental samples. We discuss how a more complete understanding of the ecology of eDNA is integral to advancing these frontiers and maximizing the potential of future eDNA applications in conservation and research.
biodiversity conservation,genetics & heredity
-
A phylogenetic method linking nucleotide substitution rates to rates of continuous trait evolution
Patrick Gemmell,Timothy B. Sackton,Scott V. Edwards,Jun S. Liu
DOI: https://doi.org/10.1371/journal.pcbi.1011995
2024-04-26
PLoS Computational Biology
Abstract:Genomes contain conserved non-coding sequences that perform important biological functions, such as gene regulation. We present a phylogenetic method, PhyloAcc-C, that associates nucleotide substitution rates with changes in a continuous trait of interest. The method takes as input a multiple sequence alignment of conserved elements, continuous trait data observed in extant species, and a background phylogeny and substitution process. Gibbs sampling is used to assign rate categories (background, conserved, accelerated) to lineages and explore whether the assigned rate categories are associated with increases or decreases in the rate of trait evolution. We test our method using simulations and then illustrate its application using mammalian body size and lifespan data previously analyzed with respect to protein coding genes. Like other studies, we find processes such as tumor suppression, telomere maintenance, and p53 regulation to be related to changes in longevity and body size. In addition, we find that skeletal genes and developmental processes, such as sprouting angiogenesis, are relevant. Biologists hope to use data from diverse species to identify the genetic basis of continuous traits such as lifespan or beak shape. To do so, they need methodologies that relate genotypic and phenotypic evolution, while taking account of the relationship between species. The practice of integrating data from many species in this systematic way is relatively new, and existing approaches to the problem are often ad hoc, focus on protein coding genes, or involve discretizing continuous measurements. We avoid these limitations and develop a statistical model and software package that can be used to rapidly analyze alignments with respect to a continuous trait. Our method is illustrated by describing 136,859 conserved non-coding elements from 61 mammalian species with respect to the trait 'long-lived and large-bodied'. We report on the loci highlighted by our model and describe how our results compare to recent studies taking other methodological approaches. We suggest approaches like ours are an important step towards realizing the potential of data collected from across the animal kingdom, whether the aim is to increase our understanding of natural history or to better understand human biology.
biochemical research methods,mathematical & computational biology