High-quality faba bean reference transcripts generated using PacBio and Illumina RNA-seq data

Na Zhao, Enqiang Zhou, Yamei Miao, Dong Xue, Yongqiang Wang, Kaihua Wang, Chunyan Gu, Mengnan Yao, Yao Zhou, Bo Li, Xuejun Wang, Libin Wei
DOI: https://doi.org/10.1038/s41597-024-03204-4
2024-04-10
Scientific Data
Abstract: The genome of faba bean was first published in 2023. To promote future molecular breeding studies, we improved the quality of the faba bean genome based on high-density genetic maps and the Illumina and PacBio RNA-seq datasets. Two high-density genetic maps were used to order and orient the faba bean scaffolds, culminating in an increase in chromosome length (14.28 Mbp) and a decrease in the number of scaffolds by 45. In gene model mining and optimisation, the PacBio and Illumina RNA-seq datasets from 37 samples allowed for the identification and correction of 121,606 transcripts, and the data facilitated the prediction of 15,640 alternative splicing events, 2,148 lncRNAs, and 1,752 fusion transcripts, thus allowing for a clearer understanding of the gene structures underlying the faba bean genome. Moreover, a total of 38,850 new genes including 56,188 transcripts were identified compared with the reference genome. Finally, the genetic data of the reference genome were integrated to form a comprehensive and complete faba bean transcriptome sequence of 103,267 transcripts derived from 54,753 uni-genes.
multidisciplinary sciences