Chromosome-level genome assembly and annotation of the Yunling cattle with PacBio and Hi-C sequencing data

Zaichao Wei, Lilian Zhang, Lutao Gao, Jian Chen, Lin Peng, Linnan Yang
DOI: https://doi.org/10.1038/s41597-024-03066-w
2024-02-24
Scientific Data
Abstract: Yunling cattle is a new beef cattle breed developed in Yunnan Province, China, by crossing Brahman, Murray Grey and Yunnan Yellow cattle. The breed adapts well to tropical and subtropical climates. It shows good reproductive ability and growth rate under high temperature and high humidity, has strong resistance to internal and external parasites, and offers good beef performance. In this study, we generated a high-quality chromosome-level genome assembly of a male Yunling cattle using a combination of short-read sequencing, PacBio HiFi sequencing and Hi-C scaffolding technologies. The genome assembly (3.09 Gb) is anchored to 31 chromosomes (29 autosomes plus the X and Y chromosomes), with a contig N50 of 35.97 Mb and a scaffold N50 of 112.01 Mb. It contains 1.62 Gb of repetitive sequences and 20,660 protein-coding genes. This first construction of the Yunling cattle genome provides a valuable genetic resource that will facilitate further study of the genetic diversity of bovine species and accelerate Yunling cattle breeding efforts.
multidisciplinary sciences
What problem does this paper attempt to address?
This paper addresses the construction of a high-quality, chromosome-level genome assembly for Yunling cattle. Specifically, the researchers combined short-read sequencing, PacBio HiFi sequencing and Hi-C technology to generate a high-precision genome map of a male Yunling cattle. The main objectives of the study are:

1. **Construct a high-quality genome assembly**:
   - Combine multiple sequencing technologies (short-read sequencing, PacBio HiFi sequencing and Hi-C data) to produce a chromosome-level genome assembly.
   - The assembly is approximately 3.09 Gb and is anchored to 31 chromosomes (29 autosomes plus one X chromosome and one Y chromosome). It has a contig N50 of 35.97 Mb and a scaffold N50 of 112.01 Mb.
2. **Annotate the genome**:
   - Identify and annotate repetitive sequences. These total 1.62 Gb, or 52.26% of the genome.
   - Predict and functionally annotate 20,660 protein-coding genes. Of these, 92.8% could be annotated in at least one of five databases (NR, SwissProt, KOG, GO and KEGG).
3. **Evaluate the quality of the genome assembly**:
   - Use tools such as BUSCO and CEGMA to assess the completeness of the assembly. BUSCO shows that 95.78% of the conserved mammalian single-copy genes are present, indicating high completeness.
   - Verify accuracy by aligning the short-read data back to the assembly. 99.03% of the reads aligned reliably.
4. **Provide genetic resources**:
   - The assembly is a valuable resource for future studies of genetic diversity and for breeding work in Yunling cattle.
   - It supports comparative genomic analyses across Bos species and advances breeding research.
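The contig and scaffold N50 values above are standard assembly contiguity metrics. N50 is the length L such that sequences of length L or longer together cover at least half of the total assembly length. The sketch below computes it from a list of sequence lengths. It does not reproduce the authors' pipeline. The function name `n50` and the toy lengths are illustrative assumptions, not Yunling data. Contig N50 is computed over contig lengths and scaffold N50 over scaffold lengths. For simplicity, the toy scaffolds here ignore gap (N) lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    ordered = sorted(lengths, reverse=True)
    # Accumulate from the longest sequence down until half the total is covered.
    for length in ordered:
        running += length
        if running * 2 >= total:
            break
    return length


if __name__ == "__main__":
    # Hypothetical toy assembly (lengths in Mb), not the Yunling data.
    contigs = [40, 35, 20, 5]
    # The same contigs joined into two scaffolds (40+35, 20+5); gaps ignored.
    scaffolds = [75, 25]
    print("contig N50:", n50(contigs))      # contig N50: 35
    print("scaffold N50:", n50(scaffolds))  # scaffold N50: 75
```

Scaffolding joins contigs into longer units, so scaffold N50 is at least as large as contig N50. This matches the gap between 35.97 Mb and 112.01 Mb reported above.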
In summary, this paper uses multiple sequencing technologies to construct a high-quality genome assembly of Yunling cattle, then annotates and evaluates it in detail. The result is an important genetic resource for subsequent research and breeding.