The Lorenz's side of Gene Set Enrichment Analysis: Similarity and divergences of the Gene Set Enrichment Analysis from the measure of wealth inequality

Stefano Maria Pagnotta
DOI: https://doi.org/10.1101/2024.11.23.624984
2024-11-25
Abstract: In the age of microarrays (about 20 years ago), the poor quality of expression measurements prompted the development of methods for associating a set of significant genes with their biological meaning. A fundamental methodological shift was to consider the entire gene profile, which summarizes the differential expression of genes measured in two conditions. A preliminary proposal was to mimic the two-sample Kolmogorov-Smirnov test, but the winning idea was to introduce a system of weights in constructing the empirical distribution functions. From these premises, Gene Set Enrichment Analysis (GSEA) has emerged since 2005 as the best-known methodology. The method is always referred to as being based on a weighted Kolmogorov-Smirnov test. Sometimes, what is introduced as an innovation that improves a standard methodology turns out to be a well-known tool in a different field. While the accumulation of counts generates the empirical distribution function, the accumulation of weights, as defined in GSEA, leads to a function known as the Lorenz curve, introduced in 1905. This tool is a cornerstone of welfare studies, where it measures the concentration or equal distribution of wealth in populations. This paper reviews the essentials of the Lorenz curve and of Gene Set Enrichment Analysis. It shows that the test statistic of the latter is linked to the null hypothesis comparing two Lorenz curves. Viewing the enrichment procedure in this new light makes the analytical tools and the conceptual formulation of the methodology consistent.
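The contrast drawn in the abstract between accumulating counts and accumulating weights can be illustrated with a minimal sketch. It assumes the commonly described GSEA running sum (hits weighted by the absolute ranking score raised to a power p, misses weighted equally) and the textbook Lorenz curve (cumulative share of the total over values sorted in ascending order). The function names `lorenz_curve`, `gsea_running_sum`, and `enrichment_score` are hypothetical and are not taken from the paper or from any GSEA software.

```python
def lorenz_curve(values):
    """Cumulative share of the total held by the k smallest values, k = 0..n."""
    xs = sorted(values)            # ascending order, as in welfare studies
    total = sum(xs)
    shares = [0.0]
    running = 0.0
    for x in xs:
        running += x               # accumulate amounts, not counts
        shares.append(running / total)
    return shares


def gsea_running_sum(scores, in_set, p=1.0):
    """Running sum P_hit - P_miss along a ranked gene list.

    scores: ranking metric per gene, assumed already sorted in ranked order.
    in_set: booleans marking membership in the gene set.
    p:      weight exponent; p = 0 reduces the hit curve to an
            empirical distribution function (plain counts).
    """
    n = len(scores)
    n_hit = sum(in_set)
    weights = [abs(s) ** p if hit else 0.0 for s, hit in zip(scores, in_set)]
    total_w = sum(weights)
    p_hit = 0.0
    p_miss = 0.0
    running = []
    for w, hit in zip(weights, in_set):
        if hit:
            p_hit += w / total_w            # cumulative share of weight
        else:
            p_miss += 1.0 / (n - n_hit)     # cumulative share of counts
        running.append(p_hit - p_miss)
    return running


def enrichment_score(running):
    """Largest deviation from zero of the running sum (sign preserved)."""
    return max(running, key=abs)


# Toy ranked list: the two top-ranked genes belong to the set.
scores = [4, 3, 2, 1, -1, -2]
in_set = [True, True, False, False, False, False]
running = gsea_running_sum(scores, in_set)
es = enrichment_score(running)
```

In this sketch the hit curve is a cumulative share of weight, which is the same construction as a Lorenz ordinate, except that genes are accumulated in ranked-list order, not in ascending order of the weights. How the paper formalizes this correspondence is not shown here.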
Bioinformatics