Accurate Whole Human Genome Sequencing Using Reversible Terminator Chemistry.

David R. Bentley, Shankar Balasubramanian, Harold P. Swerdlow, Geoffrey P. Smith, John Milton, Clive G. Brown, Kevin P. Hall, Dirk J. Evers, Colin L. Barnes, Helen R. Bignell, Jonathan M. Boutell, Jason Bryant, Richard J. Carter, R. Keira Cheetham, Anthony J. Cox, Darren J. Ellis, Michael R. Flatbush, Niall A. Gormley, Sean J. Humphray, Leslie J. Irving, Mirian S. Karbelashvili, Scott M. Kirk, Heng Li, Xiaohai Liu, Klaus S. Maisinger, Lisa J. Murray, Bojan Obradovic, Tobias Ost, Michael L. Parkinson, Mark R. Pratt, Isabelle M. J. Rasolonjatovo, Mark T. Reed, Roberto Rigatti, Chiara Rodighiero, Mark T. Ross, Andrea Sabot, Subramanian V. Sankar, Aylwyn Scally, Gary P. Schroth, Mark E. Smith, Vincent P. Smith, Anastassia Spiridou, Peta E. Torrance, Svilen S. Tzonev, Eric H. Vermaas, Klaudia Walter, Xiaolin Wu, Lu Zhang, Mohammed D. Alam, Carole Anastasi, Ify C. Aniebo, David M. D. Bailey, Iain R. Bancarz, Saibal Banerjee, Selena G. Barbour, Primo A. Baybayan, Vincent A. Benoit, Kevin F. Benson, Claire Bevis, Phillip J. Black, Asha Boodhun, Joe S. Brennan, John A. Bridgham, Rob C. Brown, Andrew A. Brown, Dale H. Buermann, Abass A. Bundu, James C. Burrows, Nigel P. Carter, Nestor Castillo, Maria Chiara E. Catenazzi, Simon Chang, R. Neil Cooley, Natasha R. Crake, Olubunmi O. Dada, Konstantinos D. Diakoumakos, Belen Dominguez-Fernandez, David J. Earnshaw, Ugonna C. Egbujor, David W. Elmore, Sergey S. Etchin, Mark R. Ewan, Milan Fedurco, Louise J. Fraser, Karin V. Fuentes Fajardo, W. Scott Furey, David George, Kimberley J. Gietzen, Colin P. Goddard, George S. Golda, Philip A. Granieri, David E. Green, David L. Gustafson, Nancy F. Hansen, Kevin Harnish, Christian D. Haudenschild, Narinder I. Heyer, Matthew M. Hims, Johnny T. Ho, Adrian M. Horgan, Katya Hoschler, Steve Hurwitz, Denis V. Ivanov, Maria Q. Johnson, Terena James, T. A. Huw Jones, Gyoung-Dong Kang, Tzvetana H. Kerelska, Alan D. Kersey, Irina Khrebtukova, Alex P. Kindwall, Zoya Kingsbury, Paula I. Kokko-Gonzales, Anil Kumar, Marc A. Laurent, Cynthia T. Lawley, Sarah E. Lee, Xavier Lee, Arnold K. Liao, Jennifer A. Loch, Mitch Lok, Shujun Luo, Radhika M. Mammen, John W. Martin, Patrick G. McCauley, Paul McNitt, Parul Mehta, Keith W. Moon, Joe W. Mullens, Taksina Newington, Zemin Ning, Bee Ling Ng, Sonia M. Novo, Michael J. O’Neill, Mark A. Osborne, Andrew Osnowski, Omead Ostadan, Lambros L. Paraschos, Lea Pickering, Andrew C. Pike, Alger C. Pike, D. Chris Pinkard, Daniel P. Pliskin, Joe Podhasky, Victor J. Quijano, Come Raczy, Vicki H. Rae, Stephen R. Rawlings, Ana Chiva Rodriguez, Phyllida M. Roe, John Rogers, Maria C. Rogert Bacigalupo, Nikolai Romanov, Anthony Romieu, Rithy K. Roth, Natalie J. Rourke, Silke T. Ruediger, Eli Rusman, Raquel M. Sanches-Kuiper, Martin R. Schenker, Josefina M. Seoane, Richard J. Shaw, Mitch K. Shiver, Steven W. Short, Ning L. Sizto, Johannes P. Sluis, Melanie A. Smith, Sohna, Eric J. Spence, Kim Stevens, Neil Sutton, Lukasz Szajkowski, Carolyn L. Tregidgo, Gerardo Turcatti, Stephanie vandeVondele, Yuli Verhovsky, Selene M. Virk, Suzanne Wakelin, Gregory C. Walcott, Jingwen Wang, Graham J. Worsley, Juying Yan, Ling Yau, Mike Zuerlein, Jane Rogers, James C. Mullikin, Matthew E. Hurles, Nick J. McCooke, John S. West, Frank L. Oaks, Peter L. Lundberg, David Klenerman, Richard Durbin, Anthony J. Smith
DOI: https://doi.org/10.1038/nature07517
IF: 64.8
2008-01-01
Nature
Abstract: DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally used long (400–800 base pair) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intraspecies genetic variation. Here we report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high-quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30× average depth of paired 35-base reads. We characterize four million single-nucleotide polymorphisms and four hundred thousand structural variants, many of which were previously unknown. Our approach is effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.

The power of the latest massively parallel synthetic DNA sequencing technologies is demonstrated in two major collaborations that shed light on the nature of genomic variation with ethnicity. The first describes the genomic characterization of an individual from the Yoruba ethnic group of west Africa. The second reports a personal genome of a Han Chinese, the group comprising 30% of the world's population. These new resources can now be used in conjunction with the Venter, Watson and NIH reference sequences. A separate study looked at genetic ethnicity on the continental scale, based on data from 1,387 individuals from more than 30 European countries. Overall there was little genetic variation between countries, but the differences that do exist correspond closely to the geographic map. Statistical analysis of the genome data places 50% of the individuals within 310 km of their reported origin. As well as its relevance for testing genetic ancestry, this work has implications for evaluating genome-wide association studies that link genes with diseases.
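
To give a sense of scale for the ">30× average depth of paired 35-base reads" quoted in the abstract, the following is a minimal Python sketch of the standard relationship depth = (number of reads × read length) / genome size. The genome size of 3 billion bases is an assumed round figure, not a value taken from the paper, and the function names are hypothetical. The sketch ignores unmappable reads, duplicates and uneven coverage, so it gives only a lower bound on the raw read count.

```python
import math


def reads_for_depth(depth, genome_size, read_length):
    """Smallest read count with reads * read_length / genome_size >= depth."""
    return math.ceil(depth * genome_size / read_length)


def average_depth(num_reads, read_length, genome_size):
    """Average depth of coverage implied by a given number of reads."""
    return num_reads * read_length / genome_size


# Assumption: a round 3 Gb human genome (illustrative, not from the paper).
GENOME_SIZE = 3_000_000_000
READ_LENGTH = 35      # read length stated in the abstract
TARGET_DEPTH = 30     # lower bound on depth stated in the abstract

reads = reads_for_depth(TARGET_DEPTH, GENOME_SIZE, READ_LENGTH)
pairs = math.ceil(reads / 2)  # reads are paired, so two reads per fragment

if __name__ == "__main__":
    print(f"reads needed: {reads:,}")
    print(f"read pairs needed: {pairs:,}")
    print(f"implied depth: {average_depth(reads, READ_LENGTH, GENOME_SIZE):.2f}x")
```

Under these assumptions, reaching 30× requires on the order of 2.6 billion 35-base reads, which is consistent with the abstract's statement that each experiment yields several billion bases and that multiple experiments are combined for a whole genome.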