A fast two-stage algorithm for computing PageRank and its extensions

Chris Pan-Chi Lee, Gene H. Golub, Stefanos A. Zenios
2003-01-01
Abstract: We present a fast two-stage algorithm for computing the PageRank vector [16]. The algorithm exploits the following observation: the homogeneous discrete-time Markov chain associated with PageRank is lumpable, with the lumpable subset of nodes being the dangling nodes [13]. Time to convergence is only a fraction of what is required by the standard algorithm employed by Google [16]. On data of 451,237 webpages, convergence was achieved in 20% of the time required by the standard algorithm. Our algorithm also replaces a common practice that is, in general, incorrect: ignoring the dangling nodes until the last stages of computation [16] does not necessarily accelerate convergence. In comparison, our algorithm is provably correct, generally applicable, and achieves the desired speedup. The paper ends with a discussion of possible extensions that generalize the divide-and-conquer theme. We describe two variations that incorporate a multi-stage algorithm. In the first variation, the ordinary PageRank vector is computed. In the second, the algorithm computes a generalized version of PageRank in which webpages are divided into several classes, each with a different personalization vector. The latter is a major modeling extension that introduces greater flexibility and a potentially more refined model of web traffic.
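To make the lumpability observation concrete, here is a minimal dense NumPy sketch. It assumes the common convention that every dangling node jumps according to the personalization vector v, so all dangling rows of the transition matrix are identical and can be merged into a single state. Stage 1 runs the power method on the resulting (k+1)-state lumped chain (k nondangling nodes plus one aggregate dangling state). Stage 2 recovers the individual dangling-node probabilities in one step from the stationary equations, pi_D^T = pi_N^T P_ND + sigma v_D^T, where sigma is the total dangling mass found in stage 1. The function names, the damping factor c = 0.85, and this closed-form second stage are illustrative choices, not necessarily the authors' exact procedure; a web-scale implementation would use sparse matrices and never form the dense matrix P.

```python
import numpy as np


def google_matrix(H, c=0.85, v=None):
    """Dense PageRank transition matrix (hypothetical helper).

    Assumption: all-zero rows of H (dangling nodes) are replaced by v,
    and every row teleports according to v with probability 1 - c.
    """
    n = H.shape[0]
    v = np.full(n, 1.0 / n) if v is None else np.asarray(v, dtype=float)
    S = H.astype(float)  # astype returns a copy, so H is not modified
    S[S.sum(axis=1) == 0] = v  # dangling nodes jump according to v
    return c * S + (1 - c) * np.outer(np.ones(n), v)


def power_method(P, tol=1e-12, max_iter=10_000):
    """Left stationary vector x^T P = x^T of a row-stochastic matrix."""
    x = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(max_iter):
        x_new = x @ P
        if np.abs(x_new - x).sum() < tol:  # L1 change between iterates
            return x_new
        x = x_new
    return x


def two_stage_pagerank(H, c=0.85, v=None, tol=1e-12):
    """Sketch of PageRank via lumping all dangling nodes into one state."""
    n = H.shape[0]
    v = np.full(n, 1.0 / n) if v is None else np.asarray(v, dtype=float)
    P = google_matrix(H, c, v)  # dense only for illustration
    dangling = H.sum(axis=1) == 0
    N, D = np.flatnonzero(~dangling), np.flatnonzero(dangling)
    k = len(N)

    # Stage 1: lumped chain; the last state aggregates every dangling node.
    L = np.zeros((k + 1, k + 1))
    L[:k, :k] = P[np.ix_(N, N)]
    L[:k, k] = P[np.ix_(N, D)].sum(axis=1)  # total flow into dangling block
    L[k, :k] = v[N]                         # shared dangling row, N part
    L[k, k] = v[D].sum()                    # shared dangling row, D part
    y = power_method(L, tol)
    pi_N, sigma = y[:k], y[k]

    # Stage 2: dangling columns of pi^T P = pi^T give pi_D directly.
    pi_D = pi_N @ P[np.ix_(N, D)] + sigma * v[D]

    pi = np.empty(n)
    pi[N], pi[D] = pi_N, pi_D
    return pi


if __name__ == "__main__":
    # Toy graph: nodes 3 and 4 have no outlinks (dangling).
    H = np.array([
        [0, 1/2, 1/2, 0, 0],
        [0, 0, 1/2, 1/2, 0],
        [1/3, 0, 0, 1/3, 1/3],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
    ])
    pi = two_stage_pagerank(H)
    ref = power_method(google_matrix(H))  # standard one-stage power method
    print(pi, np.abs(pi - ref).max())
```

The usage block compares the two-stage result against a plain power method on the full matrix. Under these assumptions the stage-1 iterations run on a chain with only k+1 states, which is the intuition behind the savings when dangling nodes make up a large share of the pages.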