GENCODE: reference annotation for the human and mouse genomes in 2023
Adam Frankish, Sílvia Carbonell-Sala, Mark Diekhans, Irwin Jungreis, Jane E Loveland, Jonathan M Mudge, Cristina Sisu, James C Wright, Carme Arnan, If Barnes, Abhimanyu Banerjee, Ruth Bennett, Andrew Berry, Alexandra Bignell, Carles Boix, Ferriol Calvet, Daniel Cerdán-Vélez, Fiona Cunningham, Claire Davidson, Sarah Donaldson, Cagatay Dursun, Reham Fatima, Stefano Giorgetti, Carlos García Giron, Jose Manuel Gonzalez, Matthew Hardy, Peter W Harrison, Thibaut Hourlier, Zoe Hollis, Toby Hunt, Benjamin James, Yunzhe Jiang, Rory Johnson, Mike Kay, Julien Lagarde, Fergal J Martin, Laura Martínez Gómez, Surag Nair, Pengyu Ni, Fernando Pozo, Vivek Ramalingam, Magali Ruffier, Bianca M Schmitt, Jacob M Schreiber, Emily Steed, Marie-Marthe Suner, Dulika Sumathipala, Irina Sycheva, Barbara Uszczynska-Ratajczak, Elizabeth Wass, Yucheng T Yang, Andrew Yates, Zahoor Zafrulla, Jyoti S Choudhary, Mark Gerstein, Roderic Guigo, Tim J P Hubbard, Manolis Kellis, Anshul Kundaje, Benedict Paten, Michael L Tress, Paul Flicek
DOI: https://doi.org/10.1093/nar/gkac1071
2023-01-06
Abstract: GENCODE produces high-quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.