Does Pareto's Law apply to evidence distribution in software engineering? An initial report

Hao Tang, You Zhou, Xin Huang, Guoping Rong
DOI: https://doi.org/10.1145/2627508.2627510
2014-01-01
Abstract: Data is the source as well as the raw form of evidence. As an important research methodology in evidence-based software engineering, systematic literature reviews (SLRs) are used to identify and critically appraise the evidence, i.e., empirical studies that report (empirical) data about specific research questions. The 80/20 Rule (or Pareto's Law) describes a 'vital few' phenomenon widely observed in many disciplines over the last century. However, the applicability of Pareto's Law to evidence distribution in software engineering (SE) has not yet been tested. The objective of this paper is to investigate the applicability of Pareto's Law to the evidence distribution in specific topic areas of software engineering (in the form of systematic reviews), which may help us better understand the possible distribution of evidence in software engineering and further improve the effectiveness and efficiency of literature searches. We performed a tertiary study of SLRs in software engineering dated between 2004 and 2012. We then tested Pareto's Law by collecting, analyzing, and interpreting the distribution (over publication venues) of the primary studies reported in the existing SLRs. Our search identified 255 SLRs, 107 of which were included according to the selection criteria. The analysis of the data extracted from these SLRs presents a preliminary view of the evidence (study) distribution in software engineering. The data from the existing SLRs in SE support a nonuniform distribution of evidence. However, the present observation reflects a weaker 'vital few' relation between studies and venues than the 80/20 Rule states. The top referenced venues are suggested to researchers searching for studies in software engineering. It is also worth noting for the community that the primary studies are improperly or incompletely reported in many SLRs.
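As an illustration of the kind of check the abstract describes, the sketch below computes the share of primary studies that fall in the top 20% of publication venues. It is a minimal sketch under stated assumptions, not the authors' procedure: the function name `top_venue_share`, the venue labels, and the counts are all hypothetical and are not taken from the paper's data.

```python
import math


def top_venue_share(venue_counts, venue_fraction=0.2):
    """Return the fraction of studies published in the top `venue_fraction` of venues.

    `venue_counts` maps a venue name to the number of primary studies it published.
    (Hypothetical helper for illustration; not from the paper.)
    """
    if not venue_counts:
        raise ValueError("venue_counts must not be empty")
    # Rank venues from most to least studies.
    counts = sorted(venue_counts.values(), reverse=True)
    # Round up so that at least one venue always counts as the "vital few".
    top_n = max(1, math.ceil(venue_fraction * len(counts)))
    return sum(counts[:top_n]) / sum(counts)


# Hypothetical toy data: 10 venues, 100 studies in total (NOT the paper's data).
toy_venue_counts = {
    "Venue A": 30, "Venue B": 20, "Venue C": 10, "Venue D": 10, "Venue E": 8,
    "Venue F": 7, "Venue G": 5, "Venue H": 4, "Venue I": 3, "Venue J": 3,
}

share = top_venue_share(toy_venue_counts)
print(share)  # 0.5: the top 2 of 10 venues hold 50 of 100 studies
```

In this toy case the top 20% of venues hold 50% of the studies: the distribution is nonuniform, but the concentration is weaker than a strict 80/20 split, which is the kind of pattern the abstract reports.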