A high-quality chromosome-level genome assembly of the Chinese medaka Oryzias sinensis

Zhongdian Dong,Jiangman Wang,Guozhu Chen,Yusong Guo,Na Zhao,Zhongduo Wang,Bo Zhang
DOI: https://doi.org/10.1038/s41597-024-03173-8
2024-03-29
Scientific Data
Abstract: Oryzias sinensis, also known as Chinese medaka or Chinese ricefish, is a commonly used animal model for aquatic environmental assessment in the wild as well as for gene function validation and toxicology research in the lab. Here, a high-quality chromosome-level genome assembly of O. sinensis was generated using single-tube long fragment read (stLFR) reads, Nanopore long reads, and Hi-C sequencing data. The genome is 796.58 Mb, and a total of 712.17 Mb of the assembled sequences were anchored to 23 pseudo-chromosomes. A final set of 22,461 genes was annotated, with 98.67% being functionally annotated. The Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness of the genome assembly and gene annotation reached 95.1% (93.3% single-copy) and 94.6% (91.7% single-copy), respectively. Furthermore, we also used ATAC-seq to profile transposase-accessible chromatin and the functional enrichment of the associated genomic regions in Oryzias sinensis. This study offers a new, improved foundation for future genomics research in Chinese medaka.
multidisciplinary sciences
What problem does this paper attempt to address?
This paper addresses shortcomings in the available genome assembly of the Chinese medaka (Oryzias sinensis, also called Chinese ricefish), in particular its quality and completeness. Using single-tube long fragment read (stLFR) reads, Nanopore long reads and Hi-C sequencing data, the researchers generated a high-quality chromosome-level genome assembly of the species. The improved assembly is intended to support future genomics research on Chinese medaka, especially in aquatic environmental assessment, gene function validation and toxicology research.

### Main Objectives:

1. **Generate a high-quality genome assembly**: Combine multiple sequencing technologies (stLFR, Nanopore long reads and Hi-C) to construct a chromosome-level genome assembly.
2. **Improve the accuracy of gene annotation**: Improve the completeness and accuracy of the genome annotation by annotating repetitive elements and gene structures in detail.
3. **Evaluate the quality of the genome assembly**: Use Benchmarking Universal Single-Copy Orthologs (BUSCO) to assess the completeness of both the genome assembly and the gene annotation.
4. **Explore chromatin accessibility**: Use ATAC-seq to identify transposase-accessible regions of the Chinese medaka genome and the functional enrichment of the genes associated with them.

### Background and Significance:

As a model organism, Chinese medaka is widely used in aquatic environmental assessment and laboratory research. However, the existing genome assembly is of lower quality and its annotation is incomplete. By providing a high-quality genome assembly, this study lays a foundation for future genomics research and for a deeper understanding of the biology and ecological adaptability of Chinese medaka.
### Method Overview:

- **Genome assembly**: Assemble the genome from stLFR reads, Nanopore long reads and Hi-C sequencing data.
- **Gene annotation**: Combine homology-based alignment, de novo prediction and transcriptome-assisted methods for gene prediction and annotation.
- **Repetitive element annotation**: Use multiple tools (such as RepeatMasker, RepeatProteinMask, RepeatModeler, etc.) to identify repetitive elements.
- **Non-coding RNA annotation**: Identify non-coding RNAs such as rRNA, tRNA, miRNA and snRNA.
- **Chromatin accessibility analysis**: Identify open (transposase-accessible) regions of the genome with ATAC-seq and perform functional enrichment analysis.

### Result Highlights:

- **Genome size**: The assembled genome is 796.58 Mb, of which 712.17 Mb is anchored to 23 pseudo-chromosomes.
- **Gene annotation**: A total of 22,461 genes were annotated, 98.67% of which were functionally annotated.
- **BUSCO evaluation**: BUSCO completeness reached 95.1% (93.3% single-copy) for the genome assembly and 94.6% (91.7% single-copy) for the gene annotation.
- **Chromatin accessibility**: Open regions of the genome were identified with ATAC-seq, and functional enrichment analysis was carried out on them.

In conclusion, this study provides a high-quality genome assembly of Chinese medaka that serves as basic data for future genomics research.
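
As a quick sanity check on the headline figures, the minimal Python sketch below derives a few quantities that follow arithmetically from the reported numbers: the share of the assembly anchored to pseudo-chromosomes, the approximate count of functionally annotated genes, and the duplicated BUSCO fraction. The last of these assumes the usual BUSCO convention that complete = single-copy + duplicated. The derived values are not reported in the source, and all function and constant names are hypothetical, chosen only for illustration.

```python
# Figures quoted in the abstract and summary (Mb = megabases).
ASSEMBLY_MB = 796.58      # total assembled genome size
ANCHORED_MB = 712.17      # sequence anchored to 23 pseudo-chromosomes
TOTAL_GENES = 22_461      # annotated genes
FUNCTIONAL_PCT = 98.67    # percent of genes with a functional annotation
# BUSCO (complete %, single-copy %) for each evaluated set.
BUSCO = {"assembly": (95.1, 93.3), "annotation": (94.6, 91.7)}


def anchoring_rate(anchored_mb: float, assembly_mb: float) -> float:
    """Percent of the assembly placed on pseudo-chromosomes."""
    return 100.0 * anchored_mb / assembly_mb


def approx_functional_genes(total_genes: int, functional_pct: float) -> int:
    """Approximate gene count implied by a rounded percentage."""
    return round(total_genes * functional_pct / 100.0)


def duplicated_pct(complete_pct: float, single_copy_pct: float) -> float:
    """Duplicated BUSCOs, assuming complete = single-copy + duplicated."""
    return round(complete_pct - single_copy_pct, 1)


print(f"anchored: {anchoring_rate(ANCHORED_MB, ASSEMBLY_MB):.2f}%")    # 89.40%
print(f"unanchored: {ASSEMBLY_MB - ANCHORED_MB:.2f} Mb")               # 84.41 Mb
print(f"functionally annotated genes: "
      f"~{approx_functional_genes(TOTAL_GENES, FUNCTIONAL_PCT)}")      # ~22162
for name, (complete, single) in BUSCO.items():
    print(f"{name}: duplicated BUSCOs = {duplicated_pct(complete, single)}%")
```

Because the percentages in the source are rounded, the derived gene count is only approximate. The low implied duplication (about 1.8% for the assembly and 2.9% for the annotation) is consistent with the high single-copy rates the paper reports.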