regBase: Whole Genome Base-Wise Aggregation and Functional Prediction for Human Non-Coding Regulatory Variants
Shijie Zhang, Yukun He, Huanhuan Liu, Haoyu Zhai, Dandan Huang, Xianfu Yi, Xiaobao Dong, Zhao Wang, Ke Zhao, Yao Zhou, Jianhua Wang, Hongcheng Yao, Hang Xu, Zhenglu Yang, Pak Chung Sham, Kexin Chen, Mulin Jun Li
DOI: https://doi.org/10.1093/nar/gkz774
IF: 14.9
2019-01-01
Nucleic Acids Research
Abstract: Predicting functional or pathogenic regulatory variants in the human non-coding genome facilitates the interpretation of disease causation. While numerous prediction methods are available, their performance is inconsistent or restricted to specific tasks, which creates demand for a comprehensive integration of those methods. Here, we compile a whole-genome base-wise aggregation, regBase, that incorporates the largest set of prediction scores. Building on different assumptions of causality, we train three composite models to score functional, pathogenic and cancer driver non-coding regulatory variants, respectively. We demonstrate the superior and stable performance of our models using independent benchmarks and show that they successfully fine-map causal regulatory variants at specific loci or at base-wise resolution. We believe that the regBase database, together with the three composite models, will be useful in different areas of human genetic studies, such as annotation-based causal variant fine-mapping, pathogenic variant discovery and cancer driver mutation identification. regBase is freely available at https://github.com/mulinlab/regBase.