Calculating and interpreting FST in the genomics era

Menno Jeroen de Jong, Cock van Oosterhout, Rus Hoelzel, Axel Janke
Abstract: The relative genetic distance between populations is commonly measured using the fixation index (FST). Because FST is traditionally inferred from allele frequency differences, the question arises of how it can be estimated and interpreted when analysing genomic datasets with low sample sizes. Here, we advocate an elegant solution first put forward by Hudson et al. (1992): FST = (Dxy - pixy)/Dxy, where Dxy and pixy denote mean sequence dissimilarity between and within populations, respectively. This multi-locus FST metric can be derived from allele frequency data, but also from sequence alignment data alone, even when sample sizes are low and/or unequal. As with other FST metrics, the numerator denotes net divergence (Da), which is equivalent to the f2-statistic and Nei's D. In terms of demographic inference, net divergence measures the difference between the increases in Dxy and pixy since the population split, owing to a reduction of coalescence times within populations as a result of genetic drift. Because different combinations of changes in Dxy and pixy can produce identical FST estimates, no universal relationship exists between FST and population split time. Still, in the case of recent population separation, when novel mutations are negligible, FST estimates can be accurately converted into coalescent units (i.e., split time in multiples of 2Ne). This in turn makes it possible to quantify gene tree discordance, without the need for multispecies coalescent-based analyses, using the formula: P(discordance) = 2/3*(1 - FST). To facilitate the use of the Hudson FST metric, we implemented new utilities in the R package SambaR.
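As an illustration of the two formulas quoted in the abstract, the following Python sketch computes FST = (Dxy - pixy)/Dxy from a small sequence alignment and then applies P(discordance) = 2/3*(1 - FST). It is not the SambaR implementation: the function names (`p_distance`, `hudson_fst`, `p_discordance`) are hypothetical, dissimilarity is taken as the simple proportion of differing sites, and pixy is assumed here to be the unweighted mean of the two within-population diversities. The toy sequences are invented purely to show the arithmetic.

```python
from itertools import combinations, product


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def hudson_fst(pop_x, pop_y):
    """Return (fst, dxy, pi_within) for two lists of aligned sequences.

    Assumption: pi_within is the unweighted mean of the two
    within-population mean pairwise dissimilarities.
    Each population needs at least two sequences, and dxy must be > 0.
    """
    # Mean dissimilarity over all between-population pairs (Dxy).
    dxy = mean(p_distance(a, b) for a, b in product(pop_x, pop_y))
    # Mean dissimilarity over all within-population pairs.
    pi_x = mean(p_distance(a, b) for a, b in combinations(pop_x, 2))
    pi_y = mean(p_distance(a, b) for a, b in combinations(pop_y, 2))
    pi_within = (pi_x + pi_y) / 2
    fst = (dxy - pi_within) / dxy
    return fst, dxy, pi_within


def p_discordance(fst):
    """Gene tree discordance probability as given in the abstract.

    Per the abstract, this conversion is meant for recent population splits,
    where novel mutations are negligible.
    """
    return 2 / 3 * (1 - fst)


# Toy alignment (invented data): two sequences per population, four sites.
pop_x = ["AAAA", "AAAT"]
pop_y = ["TTAA", "TTAT"]

fst, dxy, pi_within = hudson_fst(pop_x, pop_y)
print(f"Dxy={dxy:.3f} pixy={pi_within:.3f} FST={fst:.3f} "
      f"P(discordance)={p_discordance(fst):.3f}")
```

Because the estimate only needs pairwise dissimilarities, the same function works with unequal sample sizes, for example three sequences in one population and two in the other.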