Building a Sequence Map of the Pig Pan-Genome from Multiple De Novo Assemblies and Hi-C Data

Xiaomeng Tian, Ran Li, Weiwei Fu, Yan Li, Xihong Wang, Ming Li, Duo Du, Qianzi Tang, Yudong Cai, Yiming Long, Yue Zhao, Mingzhou Li, Yu Jiang
DOI: https://doi.org/10.1007/s11427-019-9551-7
2019-01-01
Science China Life Sciences
Abstract: Pigs were domesticated independently in the Near East and China, indicating that a single reference genome from one individual cannot represent the full spectrum of divergent sequences in pigs worldwide. Therefore, 12 de novo pig assemblies from Eurasia were compared in this study to identify sequences missing from the reference genome. As a result, 72.5 Mb of non-redundant sequences (∼3% of the genome) were found to be absent from the reference genome (Sscrofa11.1) and were defined as pan-sequences. Of the pan-sequences, 9.0 Mb were dominant in Chinese pigs, in contrast with their low frequency in European pigs. One sequence dominant in Chinese pigs contained the complete genic region of tazarotene-induced gene 3 (TIG3), which is involved in fatty acid metabolism. Using flanking sequences and Hi-C based methods, 27.7% of the sequences could be anchored to the reference genome. Supplementing the reference with these sequences could contribute to the accurate interpretation of the 3D chromatin structure. A web-based pan-genome database was further provided to serve as a primary resource for exploring genetic diversity and to promote pig breeding and biomedical research.
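The figures quoted in the abstract can be related to one another with a little arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, it assumes the 27.7% anchoring rate applies to the full 72.5 Mb of pan-sequences, and because "∼3%" is an approximation, the genome size it implies is only a rough estimate.

```python
# Figures taken from the abstract; all names here are illustrative.
PAN_SEQUENCE_MB = 72.5        # non-redundant sequence absent from Sscrofa11.1
CHINESE_DOMINANT_MB = 9.0     # pan-sequence dominant in Chinese pigs
ANCHORED_FRACTION = 0.277     # anchored via flanking sequences and Hi-C
APPROX_GENOME_SHARE = 0.03    # "~3% of the genome" (approximate)


def summarize_pan_sequences():
    """Derive secondary quantities from the abstract's headline numbers."""
    # Assumption: the 27.7% is measured against the 72.5 Mb total.
    anchored_mb = PAN_SEQUENCE_MB * ANCHORED_FRACTION
    unanchored_mb = PAN_SEQUENCE_MB - anchored_mb
    # Share of the pan-sequence that is dominant in Chinese pigs.
    chinese_share = CHINESE_DOMINANT_MB / PAN_SEQUENCE_MB
    # Rough genome size implied by the approximate 3% figure.
    implied_genome_mb = PAN_SEQUENCE_MB / APPROX_GENOME_SHARE
    return {
        "anchored_mb": anchored_mb,
        "unanchored_mb": unanchored_mb,
        "chinese_share": chinese_share,
        "implied_genome_mb": implied_genome_mb,
    }


if __name__ == "__main__":
    for name, value in summarize_pan_sequences().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, roughly 20 Mb of the pan-sequence is placed on the reference and roughly 52 Mb remains unplaced, while the sequence dominant in Chinese pigs makes up about one eighth of the total.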