Is Least-Squares Inaccurate in Fitting Power-Law Distributions? The Criticism is Complete Nonsense

Xiaoshi Zhong, Muyin Wang, Hongkun Zhang
DOI: https://doi.org/10.1145/3485447.3511995
2022-01-01
Abstract: Under the assumptions of the Gauss-Markov theorem, ordinary least-squares estimation is proved to be the best linear unbiased estimator. In the last two decades, however, some researchers have criticized least-squares as substantially inaccurate in fitting power-law distributions, and this criticism has introduced a strong bias into the research community. In this paper, we conduct extensive experiments to demonstrate that such criticism is complete nonsense. Specifically, we sample discrete and continuous data of different sizes from power-law models and show that, even though the long-tailed noise is sampled from power-law models, it cannot be treated as power-law data. We define the correct way to bin continuous power-law data into data points and propose an average strategy for least-squares to fit power-law distributions. Experiments on both simulated and real-world data show that our proposed method fits power-law data perfectly. We uncover a fundamental flaw in the popular method proposed by Clauset et al. [12]: it tends to discard the majority of the power-law data and fit the long-tailed noise. Experiments also show that the reverse cumulative distribution function is a poor choice for plotting power-law data in practice, because it usually hides the true probability distribution of the data. We hope that our research can correct the bias regarding least-squares fitting of power-law distributions. Source code can be found at https://github.com/xszhong/LSavg.
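To make the two estimators in this debate concrete, here is a minimal Python sketch (standard library only) under stated assumptions. It samples continuous data from a power law with a known exponent, bins it logarithmically, and fits log-density against log-x by ordinary least squares; for comparison it also computes the continuous maximum-likelihood estimate with x_min fixed, the estimator at the core of Clauset et al.'s method (their Kolmogorov-Smirnov search for x_min is omitted). The logarithmic binning, the bin base of 2, the `min_count` cutoff, and all function names are illustrative assumptions. This is not the authors' binning rule or their average strategy, which are defined in the paper and the linked repository.

```python
import math
import random


def sample_power_law(n, alpha, xmin, rng):
    """Draw n samples from a continuous power law p(x) ~ x^(-alpha), x >= xmin,
    by inverse-transform sampling (1 - u lies in (0, 1], so x >= xmin)."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def log_binned_density(data, xmin, base=2.0, min_count=10):
    """Assumed binning scheme (illustrative, not the paper's): logarithmic bins
    [xmin*base^k, xmin*base^(k+1)). Returns (bin_center, density) points, where
    density = count / (n * bin_width) estimates the average of p(x) over the bin.
    Bins with fewer than min_count samples are dropped to limit tail noise."""
    counts = {}
    for x in data:
        k = int(math.floor(math.log(x / xmin, base)))
        counts[k] = counts.get(k, 0) + 1
    n = len(data)
    points = []
    for k in sorted(counts):
        if counts[k] < min_count:
            continue
        lo, hi = xmin * base ** k, xmin * base ** (k + 1)
        center = math.sqrt(lo * hi)  # geometric midpoint of the bin
        points.append((center, counts[k] / (n * (hi - lo))))
    return points


def least_squares_slope(points):
    """Ordinary least-squares slope of log(density) against log(x)."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def mle_alpha(data, xmin):
    """Continuous power-law MLE with xmin known: 1 + n / sum(ln(x / xmin))."""
    return 1.0 + len(data) / sum(math.log(x / xmin) for x in data)


rng = random.Random(42)
true_alpha, xmin = 2.5, 1.0
data = sample_power_law(200_000, true_alpha, xmin, rng)
alpha_ls = -least_squares_slope(log_binned_density(data, xmin))
alpha_mle = mle_alpha(data, xmin)
print(f"true={true_alpha}  least-squares={alpha_ls:.3f}  MLE={alpha_mle:.3f}")
```

Why a straight line is expected here: for a power law, the average density over a bin [a, ba] equals a^(-alpha) times a factor that depends only on alpha and b, so the log-log points lie on a line of slope -alpha up to sampling noise. Dropping sparse tail bins is one simple way to keep that noise from dominating an unweighted least-squares fit.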