Eight quick tips for including chromosome X in genome-wide association studies

Justin Bellavance, Linda Wang, Sarah A. Gagliano Taliun
DOI: https://doi.org/10.1371/journal.pcbi.1012160
2024-06-07
PLoS Computational Biology
Abstract: All individuals carry at least 1 copy of chromosome X. Despite being a relatively long chromosome with more than 150 million base pairs [1], similar in length to chromosome 8, association testing of genetic variants on chromosome X is still not routinely conducted. Genome-wide association studies (GWAS) have been used to identify a vast range of genomic loci of interest for a variety of complex human diseases and traits by detecting genetic variants that are statistically associated with a given disease/trait [2,3]. However, a lack of testing for variants on the X chromosome limits our ability to identify vital loci and subsequently understand potential mechanisms linked to this chromosome. A call for the inclusion of chromosome X in genome-wide association analyses was made in 2013. At that time, a scan of published GWAS from 2010 and 2011 showed that only 33% of the studies had tested variants on the X chromosome [4]. Despite this call, the representation of this chromosome has not improved according to a 2023 study: of the 136 publications that submitted at least 1 summary statistics file to the NHGRI-EBI GWAS Catalog in 2021, only 25% reported chromosome X results [5]. Several characteristics make this chromosome unique compared to the autosomes, and these can pose analytical challenges in association testing. Such challenges include how to account for X inactivation in individuals with an XX karyotype, how to model the hemizygous state of genotypes in individuals with an XY karyotype, and how to best code genotypes in the 2 pseudo-autosomal regions, known as PAR1 and PAR2, which are short stretches at either end of the X with high homology to the Y chromosome. The non-pseudo-autosomal region (nonPAR) denotes the sequence of the X chromosome that lies between them.
Furthermore, many widely used software tools that take GWAS summary statistics as input ignore chromosome X information [6,7]. This practice can make it difficult and unintuitive for researchers to run association testing on the X chromosome. Routine inclusion of chromosome X in GWAS and downstream analyses will enhance our understanding of the genetic contributors to complex diseases and traits. Here, we propose 8 tips to help move towards the inclusion of X in GWAS, providing a set of concrete actions that can be taken to overcome the obstacles preventing routine analysis of this chromosome.
biochemical research methods, mathematical & computational biology