Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing
Li Tai Fang, Bin Zhu, Yongmei Zhao, Wanqiu Chen, Zhaowei Yang, Liz Kerrigan, Kurt Langenbach, Maryellen de Mars, Charles Lu, Kenneth Idler, Howard Jacob, Yuanting Zheng, Luyao Ren, Ying Yu, Erich Jaeger, Gary P. Schroth, Ogan D. Abaan, Keyur Talsania, Justin Lack, Tsai-Wei Shen, Zhong Chen, Seta Stanbouly, Bao Tran, Jyoti Shetty, Yuliya Kriga, Daoud Meerzaman, Cu Nguyen, Virginie Petitjean, Marc Sultan, Margaret Cam, Monika Mehta, Tiffany Hung, Eric Peters, Rasika Kalamegham, Sayed Mohammad Ebrahim Sahraeian, Marghoob Mohiyuddin, Yunfei Guo, Lijing Yao, Lei Song, Hugo Y. K. Lam, Jiri Drabek, Petr Vojta, Roberta Maestro, Daniela Gasparotto, Sulev Kõks, Ene Reimann, Andreas Scherer, Jessica Nordlund, Ulrika Liljedahl, Roderick V. Jensen, Mehdi Pirooznia, Zhipan Li, Chunlin Xiao, Stephen T. Sherry, Rebecca Kusko, Malcolm Moos, Eric Donaldson, Zivana Tezak, Baitang Ning, Weida Tong, Jing Li, Penelope Duerken-Hughes, Claudia Catalanotti, Shamoni Maheshwari, Joe Shuga, Winnie S. Liang, Jonathan Keats, Jonathan Adkins, Erica Tassone, Victoria Zismann, Timothy McDaniel, Jeffrey Trent, Jonathan Foox, Daniel Butler, Christopher E. Mason, Huixiao Hong, Leming Shi, Charles Wang, Wenming Xiao, Meredith Ashby, Ozan Aygun, Xiaopeng Bian, Thomas M. Blomquist, Pierre Bushel, Fabien Campagne, Qingrong Chen, Tao Chen, Xin Chen, Yun-Ching Chen, Han-Yu Chuang, Youping Deng, Ben Ernest, Don Freed, Paul Giresi, Ping Gong, Ana Granat, Meijian Guan, Yan Guo, Christos Hatzis, Susan Hester, Jennifer A. Hipp, Parthav Jailwala, Wendell Jones, Bindu Kanakamedala, Samir Lababidi, Eunice Lee, Jian-Liang Li, You Li, Sharon Liang, Xuelu Liu, Tim McDaniel, Timothy Mercer, Urvashi Mehra, Corey Miles, Chris Miller, Ali Moshrefi, Aparna Natarajan, Jai Pandey, Brian N. Papas, Anand Pathak, Maurizio Polano, Arati Raziuddin, Wolfgang Resch, Fayaz Seifuddin, Steve T. Sherry, Tieliu Shi, Louis M. Staudt, Jeff Trent, Tiffany Truong, Cristobal Juan Vera, Ashley Walton, Jing Wang, Jingya Wang, Mingyi Wang, James C. Willey, Leihong Wu, Xiaojian Xu, Chunhua Yan, Gokhan Yavas, Chaoyang Zhang
DOI: https://doi.org/10.1038/s41587-021-00993-6
IF: 46.9
2021-09-01
Nature Biotechnology
Abstract: The lack of samples for generating standardized DNA datasets for setting up a sequencing pipeline or benchmarking the performance of different algorithms limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor–normal genomic DNA (gDNA) samples derived from a breast cancer cell line (which is highly heterogeneous, with an aneuploid genome, and enriched in somatic alterations) and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, they not only minimize potential biases from technologies, assays and informatics when setting up a sequencing pipeline, but also provide a unique resource for benchmarking 'tumor-only' or 'matched tumor–normal' analyses.
biotechnology & applied microbiology