skip to main content
article

Inside PageRank

Published:01 February 2005Publication History
Skip Abstract Section

Abstract

Although the interest of a Web page is strictly related to its content and to the subjective readers' cultural background, a measure of the page authority can be provided that only depends on the topological structure of the Web. PageRank is a noticeable way to attach a score to Web pages on the basis of the Web connectivity. In this article, we look inside PageRank to disclose its fundamental properties concerning stability, complexity of computational scheme, and critical role of parameters involved in the computation. Moreover, we introduce a circuit analysis that allows us to understand the distribution of the page score, the way different Web communities interact each other, the role of dangling pages (pages with no outlinks), and the secrets for promotion of Web pages.

References

  1. Bharat, K. and Henzinger, M. R. 1998. Improved algorithms for topic distillation in hyperlinked environments. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 104--111. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Bianchini, M., Fanelli, S., and Gori, M. 2001. Optimal algorithms for well-conditioned nonlinear systems of equations. IEEE Trans. Comput. 50, 7, 689--698. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Björck, A. 1996. Numerical Methods for Least Squares Problems. Society for Industrial and Applied Mathematics.Google ScholarGoogle Scholar
  4. Bomze, I. and Gutjahr, W. 1994. The dinamics of self--evaluation. Appl. Math. Comput. 64, 47--63. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Bomze, I. and Gutjahr, W. 1995. Estimating qualifications in a self-evaluating group. Qual. Quant. 29, 241--250.Google ScholarGoogle ScholarCross RefCross Ref
  6. Borodin, A., Roberts, G. O., Rosenthal, J. S., and Tsaparas, P. 2001. Finding authorities and hubs from link structures on the world wide web. In Proceedings of the 10th International World Wide Web Conference. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Brin, S., Motwani, R., Page, L., and Winograd, T. 1998. What can you do with a web in your pocket? IEEE Bulle. Techn. Comm. Data Eng., IEEE Comput. Soc. 21, 2, 37--47.Google ScholarGoogle Scholar
  8. Brin, S. and Page, L. 1998. The anatomy of a large--scale hypertextual Web search engine. In Proceedings of the 7th World Wide Web Conference (WWW7). Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Brin, S., Page, L., Motwani, R., and Winograd, T. 1999. The PageRank citation ranking: Bringing order to the Web. Tech. Rep. 1999-66, Stanford University. Available on the Internet at http://dbpubs.stanford.edu:8090/pub/1999-66.Google ScholarGoogle Scholar
  10. Cohn, D. and Chang, H. 2000. Learning to probabilistically identify authoritative documents. In Proceedings of 17th International Conference on Machine Learning. Morgan Kaufmann, San Francisco, Calif., 167--174. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Cohn, D. and Hofmann, T. 2001. The missing link---A probabilistic model of document content and hypertext connectivity. In Neural Inf. Proc. Syst. 13.Google ScholarGoogle Scholar
  12. Diligenti, M., Gori, M., and Maggini, M. 2002. Web page scoring systems for horizontal and vertical search. In Proceedings of the 11th World Wide Web Conference (WWW11). Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Golub, G. H. and Van Loan, C. F. 1993. Matrix computation. The Johns Hopkins University Press.Google ScholarGoogle Scholar
  14. Haveliwala, T. H. 1999. Efficient computation of pagerank. Tech. Rep. 1999-66, Stanford University. Available on the Internet at http://dbpubs.stanford.edu:8090/pub/1999-66.Google ScholarGoogle Scholar
  15. Haveliwala, T. H. 2002. Topic sensitive pagerank. In Proceedings of the 11th World Wide Web Conference (WWW11). Available on the Internet at http://dbpubs.stanford.edu:8090/pub/2002-6. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Henzinger, M. 2001. Hyperlink analysis for the Web. IEEE Internet Computing 5, 1, 45--50. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Kleinberg, J. 1999. Authoritative sources in a hyperlinked environment. J. ACM 46, 5, 604--632. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Lempel, R. and Moran, S. 2000. The stochatic approach for link--structure analysis (SALSA) and the TKC effect. In Proceedings of the 9th World Wide Web Conference (WWW9). Elsevier Science, 387--401. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Marchiori, M. 1997. The quest for correct information on the Web: Hyper search engines. Computer Networks and ISDN Systems 29, 1225--1235. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Motwani, R. and Raghavan, P. 1995. Randomized algorithms. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Ng, A. Y., Zheng, A. X., and Jordan, M. I. 2001a. Link analysis, eigenvectors and stability. In Proceedings of International Conference on Research and Development in Information Retrieval (SIGIR 2001). ACM, New York.Google ScholarGoogle Scholar
  22. Ng, A. Y., Zheng, A. X., and Jordan, M. I. 2001b. Stable algorithms for link analysis. In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI'2001).Google ScholarGoogle Scholar
  23. Pringle, G., Allison, L., and Dowe, D. L. 1998. What is tall poppy among the Web pages? Comput. Netwo. ISDN Syst. 30, 369--377. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Richardson, M. and Domingos, P. 2002. The intellingent surfer: probabilistic combination of link and content information in pagerank. In Advances in Neural Information Processing Systems, 14. MIT Press, Cambridge, Mass.Google ScholarGoogle Scholar
  25. Rumelhart, D., Hinton, G., and Williams, R. 1986. Learning representations by back-propagating errors. Nature 323, 533--536.Google ScholarGoogle ScholarCross RefCross Ref
  26. Seneta, E. 1981. Non-negative matrices and Markov chains. Springer-Verlag, New York, Chap. 4, pp. 112--158.Google ScholarGoogle Scholar
  27. Varga, R. S. 1962. Matrix Iterative Analysis. Prentice--Hall, Englewood Cliffs, N.J.Google ScholarGoogle Scholar
  28. Zhang, D. and Dong, Y. 2000. An efficient algorithm to rank web resources. In Proceedings of the 9th International World Wide Web Conference (WWW9). Elsevier Science, Amsterdam, The Netherlands. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Inside PageRank

            Recommendations

            Reviews

            Kipp Jones

            Web searching continues to be a popular topic, of both commercial and academic interest. This paper presents an in-depth mathematical analysis of the properties of Google's PageRank algorithm, which relies on the topological structure of the Web to calculate the relative value of a given Web page. In addition to the rigorous mathematical treatment, the paper provides insights into the implications of these calculations, and how they impact stability, computability, convergence, maintenance, and vulnerability. The PageRank algorithm exploits the inherent properties of the Markovian matrices that represent the Web's structure to calculate individual page ranks. The authors go on to show that it is possible to perform an optimal computation, based on the limited precision requirements imposed by the PageRank formula. This result is an important factor for the scalability of the PageRank algorithm. However, as pointed out by Langville and Meyer [1], "if the holy grail of real-time personalized search is ever to be realized, then drastic speed improvements must be made..." The authors also introduce the notion of energy, and demonstrate the mathematical properties of this concept through the use of circuit analysis on a subgraph or community of Web pages. This analysis leads to some interesting properties, including the energy loss introduced by "dangling" pages, and "outlinks" from a given community. Following this line, the authors also detail mechanisms by which the PageRank can be increased (or decreased), to influence the relative ranking of the pages within the ranking system. Indeed, one of the authors' findings is the fact that the energy of a given target community can be driven to grow linearly with the growth of a "promoting community." This results in a promotion mechanism that is very difficult to detect. Another nugget suggested by this research is that hyperlinks to outside pages should be placed in pages with a small PageRank, in addition to a large number of internal links. This property, as well as others discussed by the authors, can be exploited to retain as much energy as possible in a given community, and may provide some insights for site designers. With this paper, the authors provide a valuable resource for those interested in analyzing the behavior of, and optimizing, large-scale ranking algorithms, such as the one used by Google. The math should keep those interested in the theory busy, while the conclusions provide some practical advice for practitioners in the field. Online Computing Reviews Service

            Access critical reviews of Computing literature here

            Become a reviewer for Computing Reviews.

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image ACM Transactions on Internet Technology
              ACM Transactions on Internet Technology  Volume 5, Issue 1
              February 2005
              297 pages
              ISSN:1533-5399
              EISSN:1557-6051
              DOI:10.1145/1052934
              Issue’s Table of Contents

              Copyright © 2005 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 1 February 2005
              Published in toit Volume 5, Issue 1

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • article

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader