Acute amitriptyline in a rat model of neuropathic pain: differential symptom and route effects

M. Esser,J. Sawynok

DOI: https://doi.org/10.1016/S0304-3959(98)00261-9

IF: 7.926

1999-04-01

Pain

SeqMate: A Novel Large Language Model Pipeline for Automating RNA Sequencing

Devam Mondal,Atharva Inamdar

2024-07-03

Abstract:RNA sequencing techniques, like bulk RNA-seq and Single Cell (sc) RNA-seq, are critical tools for the biologist looking to analyze the genetic activity/transcriptome of a tissue or cell during an experimental procedure. Platforms like Illumina's next-generation sequencing (NGS) are used to produce the raw data for this experimental procedure. This raw FASTQ data must then be prepared via a complex series of data manipulations by bioinformaticians. This process currently takes place on an unwieldy textual user interface like a terminal/command line that requires the user to install and import multiple program packages, preventing the untrained biologist from initiating data analysis. Open-source platforms like Galaxy have produced a more user-friendly pipeline, yet the visual interface remains cluttered and highly technical, remaining uninviting for the natural scientist. To address this, SeqMate is a user-friendly tool that allows for one-click analytics by utilizing the power of a large language model (LLM) to automate both data preparation and analysis (differential expression, trajectory analysis, etc). Furthermore, by utilizing the power of generative AI, SeqMate is also capable of analyzing such findings and producing written reports of upregulated/downregulated/user-prompted genes with sources cited from known repositories like PubMed, PDB, and Uniprot.

Genomics,Artificial Intelligence,Machine Learning
ExpressAnalyst: A unified platform for RNA-sequencing analysis in non-model species

Peng Liu,Jessica Ewald,Zhiqiang Pang,Elena Legrand,Yeon Seon Jeon,Jonathan Sangiovanni,Orcun Hacariz,Guangyan Zhou,Jessica A. Head,Niladri Basu,Jianguo Xia

DOI: https://doi.org/10.1038/s41467-023-38785-y

IF: 16.6

2023-05-24

Nature Communications

Abstract:The increasing application of RNA sequencing to study non-model species demands easy-to-use and efficient bioinformatics tools to help researchers quickly uncover biological and functional insights. We developed ExpressAnalyst (www.expressanalyst.ca), a web-based platform for processing, analyzing, and interpreting RNA-sequencing data from any eukaryotic species. ExpressAnalyst contains a series of modules that cover from processing and annotation of FASTQ files to statistical and functional analysis of count tables or gene lists. All modules are integrated with EcoOmicsDB, an ortholog database that enables comprehensive analysis for species without a reference transcriptome. By coupling ultra-fast read mapping algorithms with high-resolution ortholog databases through a user-friendly web interface, ExpressAnalyst allows researchers to obtain global expression profiles and gene-level insights from raw RNA-sequencing reads within 24 h. Here, we present ExpressAnalyst and demonstrate its utility with a case study of RNA-sequencing data from multiple non-model salamander species, including two that do not have a reference transcriptome.

multidisciplinary sciences
A snakemake toolkit for the batch assembly, annotation, and phylogenetic analysis of mitochondrial genomes and ribosomal genes from genome skims of museum collections

Oliver White,Andie Hall,Ben W. Price,Suzanne T. Williams,Matt Clark

DOI: https://doi.org/10.1101/2023.08.11.552985

2024-04-03

Abstract:Low coverage “genome-skims” are often used to assemble organelle genomes and ribosomal gene sequences for cost effective phylogenetic and barcoding studies. Natural history collections hold invaluable biological information, yet poor preservation resulting in degraded DNA often hinders PCR based analyses. However, with improvements to molecular techniques and sequencing technology, it is possible to use methods developed for working with ancient DNA to generate libraries and sequence the short fragments typical of degraded DNA to generate genome skims from museum collections. Here we introduce a snakemake toolkit comprised of three pipelines , and , designed to unlock the genomic potential of historical museum specimens using genome skimming. Specifically, and perform the batch assembly, annotation and phylogenetic analysis of mitochondrial genomes and nuclear ribosomal genes, respectively, from low-coverage genome skims. The third pipeline takes a set of gene alignments and performs phylogenetic analysis of individual genes, partitioned analysis of concatenated alignments and a phylogenetic analysis based on gene trees. We benchmark our pipelines with simulated data, followed by testing with a novel genome skimming dataset from both recent and historical solariellid gastropod samples. We show that the toolkit can recover mitochondrial and ribosomal genes from poorly preserved museum specimens of the gastropod family Solariellidae. In addition, the phylogenetic analysis of multiple gene sequences is consistent with our current understanding of taxonomic relationships. The generation of bioinformatic pipelines that facilitate processing large quantities of sequence data from the vast repository of specimens held in natural history museum collections will greatly aid species discovery and exploration of biodiversity over time, ultimately aiding conservation efforts in the face of a changing planet.

Bioinformatics
An educational guide for nanopore sequencing in the classroom

Alex N. Salazar,Franklin L. Nobrega,Christine Anyansi,Cristian Aparicio-Maldonado,Ana Rita Costa,Anna C. Haagsma,Anwar Hiralal,Ahmed Mahfouz,Rebecca E. McKenzie,Teunke van Rossum,Stan J. J. Brouns,Thomas Abeel

DOI: https://doi.org/10.1371/journal.pcbi.1007314

2020-01-23

PLoS Computational Biology

Abstract:The last decade has witnessed a remarkable increase in our ability to measure genetic information. Advancements of sequencing technologies are challenging the existing methods of data storage and analysis. While methods to cope with the data deluge are progressing, many biologists have lagged behind due to the fast pace of computational advancements and tools available to address their scientific questions. Future generations of biologists must be more computationally aware and capable. This means they should be trained to give them the computational skills to keep pace with technological developments. Here, we propose a model that bridges experimental and bioinformatics concepts using the Oxford Nanopore Technologies (ONT) sequencing platform. We provide both a guide to begin to empower the new generation of educators, scientists, and students in performing long-read assembly of bacterial and bacteriophage genomes and a standalone virtual machine containing all the required software and learning materials for the course.
Genomes contain all the information required for an organism to function. Understanding the genome sequence is often the key to answer important biological questions. For example, the sequences of human genomes are used for diagnosis of genetic disorders or for the development of personalized treatments, while the sequences of microbes may inform about their mechanisms of infection and guide the development of novel drugs. Today, our capacity to generate genome sequencing data is tremendous. However, our capacity to process this information is insufficient. This is partially due to limitations of current methods for data analysis but is mostly caused by lack of training for most biologists to leverage high-throughput sequencing data and use their full potential. It is urgent that we train the new generations of biologists to become computationally aware and able to keep pace with technological developments in the field.
In this manuscript, we illustrate our efforts in adopting an integrated teaching model that bridges experimental and bioinformatics works. Our course integrates data generation in the lab with bioinformatics work to illustrate the interlinking of lab practices and downstream effects. In our demonstration course, we used nanopore sequencing to train nanobiology students, but the model is easily customizable to suit students of different educational backgrounds or alternative technologies. The tools we provide help not only science educators but also biologists to address many relevant questions in biology.

biochemical research methods,mathematical & computational biology
Disentangling Cobionts and Contamination in Long-Read Genomic Data using Sequence Composition

Claudia C Weber

DOI: https://doi.org/10.1101/2024.05.30.596622

2024-06-03

Abstract:The recent acceleration in genome sequencing targeting previously unexplored parts of the tree of life presents computational challenges. Samples collected from the wild often contain sequences from several organisms, including the target, its cobionts, and contaminants. Effective methods are therefore needed to separate sequences. Though advances in sequencing technology make this task easier, it remains difficult to taxonomically assign sequences from eukaryotic taxa that are not well-represented in databases. Therefore, reference-based methods alone are insufficient. Here, I examine how we can take advantage of differences in sequence composition between organisms to identify symbionts, parasites and contaminants in samples, with minimal reliance on reference data. To this end, I explore data from the Darwin Tree of Life project, including hundreds of high-quality HiFi read sets from insects. Visualising two-dimensional representations of read tetranucleotide composition learned by a Variational Autoencoder can reveal distinct components of a sample. Annotating the embeddings with additional information, such as coding density, estimated coverage, or taxonomic labels allows rapid assessment of the contents of a dataset. The approach scales to millions of sequences, making it possible to explore unassembled read sets, even for large genomes. Combined with interactive visualisation tools, it allows a large fraction of cobionts reported by reference-based screening to be identified. Crucially, it also facilitates retrieving genomes for which suitable reference data are absent.
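
The composition signal described in this abstract starts from per-read tetranucleotide frequencies. The following is a minimal sketch of that first step only, not the author's implementation: the function name is hypothetical, reverse-complement k-mers are not merged, and the Variational Autoencoder that produces the two-dimensional embedding is not shown.

```python
from collections import Counter
from itertools import product

# All 256 possible 4-mers over A, C, G, T, in a fixed lexicographic order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_profile(read: str) -> list[float]:
    """Return the relative frequency of each 4-mer in a read.

    Hypothetical helper for illustration. Windows containing characters
    other than A, C, G, T (for example N) are ignored.
    """
    seq = read.upper()
    # Count every overlapping window of length 4.
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in TETRAMERS)
    if total == 0:
        # Read too short or entirely ambiguous: return an all-zero profile.
        return [0.0] * len(TETRAMERS)
    return [counts[k] / total for k in TETRAMERS]


profile = tetranucleotide_profile("ACGTACGT")
```

Each read becomes a 256-dimensional vector that sums to 1, which is the kind of input a dimensionality-reduction model could then embed in two dimensions.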

Bioinformatics
Comprehensive RNA-Seq Analysis Pipeline for Non-Model Organisms and Its Application in Schmidtea mediterranea.

Yanzhi Wang,Sijun Li,Baoting Nong,Weiping Zhou,Shuhua Xu,Zhou Songyang,Yuanyan Xiong

DOI: https://doi.org/10.3390/genes14050989

IF: 4.141

2023-01-01

Genes

Abstract:RNA sequencing (RNA-seq) is a high-throughput technology that provides in-depth information on transcriptome. The advancement and dropping costs of RNA sequencing, accompanied by more available reference genomes for different species, make transcriptome analysis in non-model organisms possible. Current obstacles in analyzing RNA-seq data include a lack of functional annotation, which may complicate the process of linking genes to corresponding functions. Here, we provide a one-stop RNA-seq analysis pipeline, PipeOne-NM, for transcriptome functional annotation, non-coding RNA identification, and transcripts alternative splicing analysis of non-model organisms, intended for use with Illumina platform-based RNA-seq data. We performed PipeOne-NM on 237 RNA-seq runs and assembled a transcriptome with 84,827 sequences from 49,320 genes, identifying 64,582 mRNA from 35,485 genes, 20,217 lncRNA from 17,084 genes, and 3481 circRNAs from 1103 genes. In addition, we performed a co-expression analysis of lncRNA and mRNA and identified that 1319 lncRNA co-express with at least one mRNA. Further analysis of samples from sexual and asexual strains revealed the role of sexual reproduction in gene expression profiles. Samples from different parts of asexual revealed that differential expression profiles of different body parts correlated with the function of conduction of nerve impulses. In conclusion, PipeOne-NM has the potential to provide comprehensive transcriptome information for non-model organisms on a single platform.
Transforming Genomes Using MOD Files with Applications.

Shunping Huang,Chia-Yu Kao,Leonard McMillan,Wei Wang

DOI: https://doi.org/10.1145/2506583.2506643

2013-01-01

Abstract:Next generation sequencing techniques have enabled new methods of DNA and RNA quantification. Many of these methods require a step of aligning short reads to some reference genome. If the target organism differs significantly from this reference, alignment errors can lead to significant errors in downstream analysis. Various attempts have been made to integrate known genetic variants into the reference genome so as to construct sample-specific genomes to improve read alignments. However, many hurdles in generating and annotating such genomes remain unsolved. In this paper, we propose a general framework for mapping back and forth between genomes. It employs a new format, MOD, to represent known variants between genomes, and a set of tools that facilitate genome manipulation and mapping. We demonstrate the utility of this framework using three inbred mouse strains. We built pseudogenomes from the mm9 mouse reference genome for three highly divergent mouse strains based on MOD files and used them to map the gene annotations to these new genomes. We observe that a large fraction of genes have their positions or ranges altered. Finally, using RNA-seq and DNA-seq short reads from these strains, we demonstrate that mapping to the new genomes yields a better alignment result than mapping to the standard reference. The MOD files for the 17 mouse strains sequenced in the Wellcome Trust Sanger Institute's Mouse Genomes Project can be found at http://www.csbio.unc.edu/CCstatus/index.py?run=Pseudo. The auxiliary tools (i.e. MODtools and Lapels), written in Python, are available at http://code.google.com/p/modtools/ and http://code.google.com/p/lapels/.
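
The idea of mapping coordinates from a reference to a variant-bearing pseudogenome can be illustrated with a small sketch. It does not reproduce the MOD format or the MODtools and Lapels interfaces; the variant tuples `(ref_pos, ref_len, alt_len)` and both function names are assumptions made for illustration, and positions that fall inside a variant are not handled specially.

```python
from bisect import bisect_right


def build_offsets(variants):
    """Precompute cumulative coordinate shifts.

    variants: iterable of (ref_pos, ref_len, alt_len) tuples, 0-based and
    non-overlapping (a hypothetical representation, not the MOD format).
    Returns two parallel lists: the first reference coordinate after each
    variant, and the cumulative shift that applies from there on.
    """
    starts, shifts = [], []
    shift = 0
    for ref_pos, ref_len, alt_len in sorted(variants):
        shift += alt_len - ref_len          # insertions add, deletions subtract
        starts.append(ref_pos + ref_len)    # first base past the variant
        shifts.append(shift)
    return starts, shifts


def ref_to_new(pos, starts, shifts):
    """Map a reference coordinate lying outside any variant to the new genome."""
    i = bisect_right(starts, pos)           # number of variants ending at or before pos
    return pos + (shifts[i - 1] if i else 0)


# One base at position 10 replaced by three bases, and five bases deleted at 20-24.
starts, shifts = build_offsets([(10, 1, 3), (20, 5, 0)])
gene_on_reference = (11, 25)
gene_on_new = tuple(ref_to_new(p, starts, shifts) for p in gene_on_reference)
```

Here the annotation interval (11, 25) on the reference becomes (13, 22) on the new genome, which shows how both the position and the length of a gene can change.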
A Snakemake Toolkit for the Batch Assembly, Annotation and Phylogenetic Analysis of Mitochondrial Genomes and Ribosomal Genes From Genome Skims of Museum Collections

Oliver W. White,Andie Hall,Ben W. Price,Suzanne T. Williams,Matthew D. Clark

DOI: https://doi.org/10.1111/1755-0998.14036

IF: 7.7

2024-10-30

Molecular Ecology Resources

Abstract:Low coverage 'genome‐skims' are often used to assemble organelle genomes and ribosomal gene sequences for cost‐effective phylogenetic and barcoding studies. Natural history collections hold invaluable biological information, yet poor preservation resulting in degraded DNA often hinders polymerase chain reaction‐based analyses. However, it is possible to generate libraries and sequence the short fragments typical of degraded DNA to generate genome‐skims from museum collections. Here we introduce a snakemake toolkit comprised of three pipelines skim2mito, skim2rrna and gene2phylo, designed to unlock the genomic potential of historical museum specimens using genome skimming. Specifically, skim2mito and skim2rrna perform the batch assembly, annotation and phylogenetic analysis of mitochondrial genomes and nuclear ribosomal genes, respectively, from low‐coverage genome skims. The third pipeline gene2phylo takes a set of gene alignments and performs phylogenetic analysis of individual genes, partitioned analysis of concatenated alignments and a phylogenetic analysis based on gene trees. We benchmark our pipelines with simulated data, followed by testing with a novel genome skimming dataset from both recent and historical solariellid gastropod samples. We show that the toolkit can recover mitochondrial and ribosomal genes from poorly preserved museum specimens of the gastropod family Solariellidae, and the phylogenetic analysis is consistent with our current understanding of taxonomic relationships. The generation of bioinformatic pipelines that facilitate processing large quantities of sequence data from the vast repository of specimens held in natural history museum collections will greatly aid species discovery and exploration of biodiversity over time, ultimately aiding conservation efforts in the face of a changing planet.

biochemistry & molecular biology,ecology,evolutionary biology
Reusable tutorials for using cloud-based computing environments for the analysis of bacterial gene expression data from bulk RNA sequencing

Steven Allers,Kyle A O'Connell,Thad Carlson,David Belardo,Benjamin L King

DOI: https://doi.org/10.1093/bib/bbae301

IF: 9.5

2024-07-13

Briefings in Bioinformatics

Abstract:This manuscript describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" https://github.com/NIGMS/NIGMS-Sandbox. The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox at the beginning of this Supplement. This module delivers learning materials on RNA sequencing (RNAseq) data analysis in an interactive format that uses appropriate cloud resources for data access and analyses. Biomedical research is increasingly data-driven, and dependent upon data management and analysis methods that facilitate rigorous, robust, and reproducible research. Cloud-based computing resources provide opportunities to broaden the application of bioinformatics and data science in research. Two obstacles for researchers, particularly those at small institutions, are: (i) access to bioinformatics analysis environments tailored to their research; and (ii) training in how to use Cloud-based computing resources. We developed five reusable tutorials for bulk RNAseq data analysis to address these obstacles. Using Jupyter notebooks run on the Google Cloud Platform, the tutorials guide the user through a workflow featuring an RNAseq dataset from a study of prophage altered drug resistance in Mycobacterium chelonae. The first tutorial uses a subset of the data so users can learn analysis steps rapidly, and the second uses the entire dataset. Next, a tutorial demonstrates how to analyze the read count data to generate lists of differentially expressed genes using R/DESeq2. Additional tutorials generate read counts using the Snakemake workflow manager and Nextflow with Google Batch. All tutorials are open-source and can be used as templates for other analysis.

biochemical research methods,mathematical & computational biology
Single-cell transcriptomics for the 99.9% of species without reference genomes

Olga Borisovna Botvinnik,Venkata Naga Pranathi Vemuri,N. Tessa Pierce,Phoenix Aja Logan,Saba Nafees,Lekha Karanam,Kyle Joseph Travaglini,Camille Sophie Ezran,Lili Ren,Yanyi Juang,Jianwei Wang,Jianbin Wang,C. Titus Brown

DOI: https://doi.org/10.1101/2021.07.09.450799

2021-01-01

Abstract:Single-cell RNA-seq (scRNA-seq) is a powerful tool for cell type identification but is not readily applicable to organisms without well-annotated reference genomes. Of the approximately 10 million animal species predicted to exist on Earth, >99.9% do not have any submitted genome assembly. To enable scRNA-seq for the vast majority of animals on the planet, here we introduce the concept of “k-mer homology,” combining biochemical synonyms in degenerate protein alphabets with uniform data subsampling via MinHash into a pipeline called Kmermaid. Implementing this pipeline enables direct detection of similar cell types across species from transcriptomic data without the need for a reference genome. Underpinning Kmermaid is the tool Orpheum, a memory-efficient method for extracting high-confidence protein-coding sequences from RNA-seq data. After validating Kmermaid using datasets from human and mouse lung, we applied Kmermaid to the Chinese horseshoe bat (Rhinolophus sinicus), where we propagated cellular compartment labels at high fidelity. Our pipeline provides a high-throughput tool that enables analyses of transcriptomic data across divergent species’ transcriptomes in a genome- and gene annotation-agnostic manner. Thus, the combination of Kmermaid and Orpheum identifies cell type-specific sequences that may be missing from genome annotations and empowers molecular cellular phenotyping for novel model organisms and species.
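
The combination of a degenerate protein alphabet with MinHash subsampling can be sketched roughly as follows. This is an illustration under stated assumptions, not the Kmermaid or Orpheum code: the six-group alphabet, the SHA-1 based hash, the bottom-k sketch variant, and every function name are hypothetical choices.

```python
import hashlib

# Hypothetical reduced alphabet: amino acids grouped by broad biochemical property.
GROUPS = {
    "AVLIMC": "h",  # hydrophobic
    "FWY": "a",     # aromatic
    "STNQ": "p",    # polar
    "KRH": "b",     # basic
    "DE": "c",      # acidic
    "GP": "s",      # small or conformationally special
}
REDUCE = {aa: code for group, code in GROUPS.items() for aa in group}


def reduce_protein(seq: str) -> str:
    """Re-encode a protein so biochemically similar residues share a letter."""
    return "".join(REDUCE.get(aa, "x") for aa in seq.upper())


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_sketch(kmer_set: set[str], size: int) -> set[int]:
    """Bottom-k sketch: keep the `size` smallest 64-bit hash values."""
    hashes = sorted(
        int.from_bytes(hashlib.sha1(k.encode()).digest()[:8], "big")
        for k in kmer_set
    )
    return set(hashes[:size])


def jaccard_estimate(a: set[int], b: set[int], size: int) -> float:
    """Estimate Jaccard similarity from two bottom-k sketches."""
    union_bottom = set(sorted(a | b)[:size])
    if not union_bottom:
        return 0.0
    return len(union_bottom & a & b) / len(union_bottom)


def kmer_homology_similarity(p1: str, p2: str, k: int = 3, size: int = 16) -> float:
    """Compare two proteins in the reduced alphabet via MinHash sketches."""
    s1 = minhash_sketch(kmers(reduce_protein(p1), k), size)
    s2 = minhash_sketch(kmers(reduce_protein(p2), k), size)
    return jaccard_estimate(s1, s2, size)
```

Two peptides that differ only by substitutions within a group, such as `KVLDE` and `RILED`, reduce to the same string and therefore score as identical, which is the intuition behind comparing sequences across divergent species without a reference genome.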
A novel assembly pipeline and functional annotations for targeted sequencing: A case study on the globally threatened Margaritiferidae (Bivalvia: Unionida)

André Gomes-Dos-Santos,Elsa Froufe,John M Pfeiffer,Nathan A Johnson,Chase H Smith,André M Machado,L Filipe C Castro,Van Tu Do,Akimasa Hattori,Nicole Garrison,Nathan V Whelan,Ivan N Bolotov,Ilya V Vikhrev,Alexander V Kondakov,Mohamed Ghamizi,Vincent Prié,Arthur E Bogan,Manuel Lopes Lima

DOI: https://doi.org/10.1111/1755-0998.13802

Abstract:The proliferation of genomic sequencing approaches has significantly impacted the field of phylogenetics. Target capture approaches provide a cost-effective, fast and easily applied strategy for phylogenetic inference of non-model organisms. However, several existing target capture processing pipelines are incapable of incorporating whole genome sequencing (WGS). Here, we develop a new pipeline for capture and de novo assembly of the targeted regions using whole genome re-sequencing reads. This new pipeline captured targeted loci accurately, and given its unbiased nature, can be used with any target capture probe set. Moreover, due to its low computational demand, this new pipeline may be ideal for users with limited resources and when high-coverage sequencing outputs are required. We demonstrate the utility of our approach by incorporating WGS data into the first comprehensive phylogenomic reconstruction of the freshwater mussel family Margaritiferidae. We also provide a catalogue of well-curated functional annotations of these previously uncharacterized freshwater mussel-specific target regions, representing a complementary tool for scrutinizing phylogenetic inferences while expanding future applications of the probe set.
Reconstructing phylogenetic trees from genome-wide somatic mutations in clonal samples

Tim H. H. Coorens,Michael Spencer Chapman,Nicholas Williams,Inigo Martincorena,Michael R. Stratton,Jyoti Nangalia,Peter J. Campbell

DOI: https://doi.org/10.1038/s41596-024-00962-8

IF: 14.8

2024-02-25

Nature Protocols

Abstract:Phylogenetic trees are a powerful means to display the evolutionary history of species, pathogens and, more recently, individual cells of the human body. Whole-genome sequencing of laser capture microdissections or expanded stem cells has allowed the discovery of somatic mutations in clones, which can be used as natural barcodes to reconstruct the developmental history of individual cells. Here we describe Sequoia, our pipeline to reconstruct lineage trees from clones of normal cells. Candidate somatic mutations are called against the human reference genome and filtered to exclude germline mutations and artifactual variants. These filtered somatic mutations form the basis for phylogeny reconstruction using a maximum parsimony framework. Lastly, we use a maximum likelihood framework to explicitly map mutations to branches in the phylogenetic tree. The resulting phylogenies can then serve as a basis for many subsequent analyses, including investigating embryonic development, tissue dynamics in health and disease, and mutational signatures. Sequoia can be readily applied to any clonal somatic mutation dataset, including single-cell DNA sequencing datasets, using the commands and scripts provided. Moreover, Sequoia is highly flexible and can be easily customized. Typically, the runtime of the core script ranges from minutes to an hour for datasets with a moderate number (50,000–150,000) of variants. Competent bioinformatic skills, including in-depth knowledge of the R programming language, are required. A high-performance computing cluster (one that is capable of running mutation-calling algorithms and other aspects of the analysis at scale) is also required, especially if handling large datasets.

biochemical research methods
Cactus: a user-friendly and reproducible ATAC-Seq and mRNA-Seq analysis pipeline for data preprocessing, differential analysis, and enrichment analysis

Jérôme Salignon,Lluís Millan-Ariño,Maxime Garcia,Christian G. Riedel

DOI: https://doi.org/10.1101/2023.05.11.540110

2024-05-06

Abstract:The ever decreasing cost of Next-Generation Sequencing coupled with the emergence of efficient and reproducible analysis pipelines has rendered genomic methods more accessible. However, downstream analyses are basic or missing in most workflows, creating a significant barrier for non-bioinformaticians. To help close this gap, we developed Cactus, an end-to-end pipeline for analyzing ATAC-Seq and mRNA-Seq data, either separately or jointly. Its Nextflow-, container-, and virtual environment-based architecture ensures efficient and reproducible analyses. Cactus preprocesses raw reads, conducts differential analyses between conditions, and performs enrichment analyses in various databases, including DNA-binding motifs, ChIP-Seq binding sites, chromatin states, and ontologies. We demonstrate the utility of Cactus in a multi-modal and multi-species case study as well as by showcasing its unique capabilities as compared to other ATAC-Seq pipelines. In conclusion, Cactus can assist researchers in gaining comprehensive insights from chromatin accessibility and gene expression data in a quick, user-friendly, and reproducible manner.

Bioinformatics
Sequence modeling and design from molecular to genome scale with Evo

Eric Nguyen,Michael Poli,Matthew G. Durrant,Armin W. Thomas,Brian Kang,Jeremy Sullivan,Madelena Y. Ng,Ashley Lewis,Aman Patel,Aaron Lou,Stefano Ermon,Stephen A. Baccus,Tina Hernandez-Boussard,Christopher Ré,Patrick D. Hsu,Brian L. Hie

DOI: https://doi.org/10.1101/2024.02.27.582234

2024-03-06

Abstract:The genome is a sequence that completely encodes the DNA, RNA, and proteins that orchestrate the function of a whole organism. Advances in machine learning combined with massive datasets of whole genomes could enable a biological foundation model that accelerates the mechanistic understanding and generative design of complex molecular interactions. We report Evo, a genomic foundation model that enables prediction and generation tasks from the molecular to genome scale. Using an architecture based on advances in deep signal processing, we scale Evo to 7 billion parameters with a context length of 131 kilobases (kb) at single-nucleotide, byte resolution. Trained on 2.7M prokaryotic and phage genomes, Evo can generalize across the three fundamental modalities of the central dogma of molecular biology to perform zero-shot function prediction that is competitive with, or outperforms, leading domain-specific language models. Evo also excels at multi-element generation tasks, which we demonstrate by generating synthetic CRISPR-Cas molecular complexes and entire transposable systems for the first time. Using information learned over whole genomes, Evo can also predict gene essentiality at nucleotide resolution and can generate coding-rich sequences up to 650 kb in length, orders of magnitude longer than previous methods. Advances in multi-modal and multiscale learning with Evo provides a promising path toward improving our understanding and control of biology across multiple levels of complexity.

Synthetic Biology
De novo genomic analyses for non-model organisms: an evaluation of methods across a multi-species data set

Sonal Singhal

DOI: https://doi.org/10.48550/arXiv.1211.1737

IF: 4.31

2012-11-08

Genomics

Abstract:High-throughput sequencing (HTS) is revolutionizing biological research by enabling scientists to quickly and cheaply query variation at a genomic scale. Despite the increasing ease of obtaining such data, using these data effectively still poses notable challenges, especially for those working with organisms without a high-quality reference genome. For every stage of analysis - from assembly to annotation to variant discovery - researchers have to distinguish technical artifacts from the biological realities of their data before they can make inference. In this work, I explore these challenges by generating a large de novo comparative transcriptomic dataset for a clade of lizards and constructing a pipeline to analyze these data. Then, using a combination of novel metrics and an externally validated variant data set, I test the efficacy of my approach, identify areas of improvement, and propose ways to minimize these errors. I find that with careful data curation, HTS can be a powerful tool for generating genomic data for non-model organisms.
Revisiting genomes of non-model species with long reads yields new insights into their biology and evolution

Nadège Guiglielmoni,Laura I. Villegas,Joseph Kirangwa,Philipp H. Schiffer

DOI: https://doi.org/10.3389/fgene.2024.1308527

IF: 3.7

2024-02-07

Frontiers in Genetics

Abstract:High-quality genomes obtained using long-read data not only allow for a better understanding of heterozygosity levels and repeat content, and for more accurate gene annotation and prediction than those obtained with short-read technologies, but also make it possible to understand haplotype divergence. Advances in long-read sequencing technologies in recent years have made it possible to produce such high-quality assemblies for non-model organisms. This allows us to revisit genomes that have been problematic to scaffold to chromosome scale with previous generations of data and assembly software. Nematoda, one of the most diverse and speciose animal phyla within metazoans, remains poorly studied, and many previously assembled genomes are fragmented. Using long reads obtained with Nanopore R10.4.1 and PacBio HiFi, we generated highly contiguous assemblies of a diploid nematode of the Mermithidae family, for which no closely related genomes are available to date, as well as a collapsed assembly and a phased assembly for a triploid nematode from the Panagrolaimidae family. Both genomes had been analysed before, but the fragmented assemblies had scaffold sizes comparable to the length of long reads prior to assembly. Our new assemblies illustrate how long-read technologies allow for a much better representation of species genomes. We are now able to conduct more accurate downstream assays based on more complete gene and transposable element predictions.

genetics & heredity
Gene modelling and annotation for the Hawaiian bobtail squid, Euprymna scolopes

Thea F. Rogers,Gözde Yalçın,John Briseno,Nidhi Vijayan,Spencer V. Nyholm,Oleg Simakov

DOI: https://doi.org/10.1038/s41597-023-02903-8

2024-01-07

Scientific Data

Abstract:Coleoid cephalopods possess numerous complex, species-specific morphological and behavioural adaptations, e.g., a uniquely structured nervous system that is the largest among the invertebrates. The Hawaiian bobtail squid (Euprymna scolopes) is one of the most established cephalopod species. With the recent publication of its chromosomal-scale genome assembly and regulatory genomic data, it also emerges as a key model for cephalopod gene regulation and evolution. However, the latest genome assembly has been lacking a native gene model set. Our manuscript describes the generation of new long-read transcriptomic data and a new reference annotation for E. scolopes, built by combining these data with a plethora of publicly available transcriptomic and protein sequence data.

multidisciplinary sciences
Effects of a cognitive-behavioral intervention program on the health of caregivers of people with autism spectrum disorder

N. Ruiz-Robledillo,L. Moya‐Albiol

DOI: https://doi.org/10.1016/J.PSI.2015.01.001

RNA-based Phylogenetic Methods: Application to Mammalian Mitochondrial RNA Sequences

Cendrine Hudelot,Vivek Gowri-Shankar,Howsun Jow,Magnus Rattray,Paul G. Higgs

DOI: https://doi.org/10.48550/arXiv.q-bio/0404031

2004-04-23

Populations and Evolution

Abstract:The PHASE software package allows phylogenetic tree construction with a number of evolutionary models designed specifically for use with RNA sequences that have conserved secondary structure. Evolution in the paired regions of RNAs occurs via compensatory substitutions, hence changes on either side of a pair are correlated. Accounting for this correlation is important for phylogenetic inference because it affects the likelihood calculation. In the present study we use the complete set of tRNA and rRNA sequences from 69 complete mammalian mitochondrial genomes. The likelihood calculation uses two evolutionary models simultaneously for different parts of the sequence: a paired-site model for the paired sites and a single-site model for the unpaired sites. We use Bayesian phylogenetic methods, with a Markov chain Monte Carlo algorithm to obtain the most probable trees and the posterior probabilities of clades. The results are well resolved for almost all the important branches on the mammalian tree. They support the arrangement of mammalian orders within the four supra-ordinal clades that have been identified by studies of much larger data sets mainly comprising nuclear genes. Groups such as the hedgehogs and the murid rodents, which have been problematic in previous studies with mitochondrial proteins, appear in their expected positions with the other members of their orders. Our choice of genes and evolutionary model appears to be more reliable and less subject to biases caused by variation in base composition than previous studies with mitochondrial genomes.
From reads to operational taxonomic units: an ensemble processing pipeline for MiSeq amplicon sequencing data

Mohamed Mysara,Mercy Njima,Natalie Leys,Jeroen Raes,Pieter Monsieurs

DOI: https://doi.org/10.1093/gigascience/giw017

IF: 7.658

2017-02-01

GigaScience

Abstract:The development of high-throughput sequencing technologies has provided microbial ecologists with an efficient approach to assess bacterial diversity at an unprecedented depth, particularly with the recent advances in the Illumina MiSeq sequencing platform. However, analyzing such high-throughput data poses important computational challenges, requiring specialized bioinformatics solutions at different stages of the processing pipeline, such as assembly of paired-end reads, chimera removal, correction of sequencing errors, and clustering of those sequences into Operational Taxonomic Units (OTUs). Individual algorithms addressing each of those challenges have been combined into various bioinformatics pipelines, such as mothur, QIIME, LotuS, and USEARCH. Using a set of well-described bacterial mock communities, state-of-the-art pipelines for Illumina MiSeq amplicon sequencing data are benchmarked on the number of sequences retained, computational cost, error rate, and quality of the OTUs. In addition, a new pipeline called OCToPUS is introduced, which optimally combines different algorithms. Huge variability is observed between the different pipelines with respect to the monitored performance parameters; in general, the number of retained reads is found to be inversely proportional to the quality of the reads. By contrast, OCToPUS achieves the lowest error rate, the minimum number of spurious OTUs, and the closest correspondence to the existing community, while retaining the highest number of reads compared with the other pipelines. The newly introduced pipeline translates Illumina MiSeq amplicon sequencing data into high-quality and reliable OTUs, with improved performance and accuracy compared to the currently existing pipelines.
