research-article

Distributed Algorithms for Fully Personalized PageRank on Large Graphs

Author:
Wenqing Lin

Tencent, China

Tencent, China
View Profile

Authors Info & Claims

WWW '19: The World Wide Web ConferenceMay 2019Pages 1084–1094https://doi.org/10.1145/3308558.3313555

Published:13 May 2019Publication History

WWW '19: The World Wide Web Conference

Pages 1084–1094

ABSTRACT

Personalized PageRank (PPR) has enormous applications, such as link prediction and recommendation systems for social networks, which often require the fully PPR to be known. Besides, most of real-life graphs are edge-weighted, e.g., the interaction between users on the Facebook network. However, it is computationally difficult to compute the fully PPR, especially on large graphs, not to mention that most existing approaches do not consider the weights of edges. In particular, the existing approach cannot handle graphs with billion edges on a moderate-size cluster. To address this problem, this paper presents a novel study on the computation of fully edge-weighted PPR on large graphs using the distributed computing framework. Specifically, we employ the Monte Carlo approximation that performs a large number of random walks from each node of the graph, and exploits the parallel pipeline framework to reduce the overall running time of the fully PPR. Based on that, we develop several optimization techniques which (i) alleviate the issue of large nodes that could explode the memory space, (ii) pre-compute short walks for small nodes that largely speedup the computation of random walks, and (iii) optimize the amount of random walks to compute in each pipeline that significantly reduces the overhead. With extensive experiments on a variety of real-life graph datasets, we demonstrate that our solution is several orders of magnitude faster than the state-of-the-arts, and meanwhile, largely outperforms the baseline algorithms in terms of accuracy.

References

Eneko Agirre and Aitor Soroa. 2009. Personalizing PageRank for Word Sense Disambiguation. In EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, Athens, Greece, March 30 - April 3, 2009. 33-41. Google ScholarDigital Library
Bahman Bahmani, Kaushik Chakrabarti, and Dong Xin. 2011. Fast personalized PageRank on MapReduce. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, June 12-16, 2011. 973-984. Google ScholarDigital Library
Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified Data Processing on Large Clusters. In 6th Symposium on Operating System Design and Implementation (OSDI 2004), San Francisco, California, USA, December 6-8, 2004. 137-150. Google ScholarDigital Library
Pavlos S. Efraimidis and Paul G. Spirakis. 2006. Weighted random sampling with a reservoir. Inf. Process. Lett.97, 5 (2006), 181-185. Google ScholarDigital Library
Chantat Eksombatchai, Pranav Jindal, Jerry Zitao Liu, Yuchen Liu, Rahul Sharma, Charles Sugnet, Mark Ulrich, and Jure Leskovec. 2018. Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018. 1775-1784. Google ScholarDigital Library
Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. 1999. On Power-law Relationships of the Internet Topology. In SIGCOMM. 251-262. Google ScholarDigital Library
Dániel Fogaras, Balázs Rácz, Károly Csalogány, and Tamás Sarlós. 2005. Towards Scaling Fully Personalized PageRank: Algorithms, Lower Bounds, and Experiments. Internet Mathematics2, 3 (2005), 333-358.Google Scholar
Yasuhiro Fujiwara, Makoto Nakatsuji, Takeshi Yamamuro, Hiroaki Shiokawa, and Makoto Onizuka. 2012. Efficient personalized pagerank with accuracy assurance. In The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '12, Beijing, China, August 12-16, 2012. 15-23. Google ScholarDigital Library
Tao Guo, Xin Cao, Gao Cong, Jiaheng Lu, and Xuemin Lin. 2017. Distributed Algorithms on Exact Personalized PageRank. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14-19, 2017. 479-494. Google ScholarDigital Library
Qirong Ho, Wenqing Lin, Eran Shaham, Shonali Krishnaswamy, The Anh Dang, Jingxuan Wang, Isabel Choo Zhongyan, and Amy She-Nash. 2016. A Distributed Graph Algorithm for Discovering Unique Behavioral Groups from Large-Scale Telco Data. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, CIKM 2016, Indianapolis, IN, USA, October 24-28, 2016. 1353-1362. Google ScholarDigital Library
Jinhong Jung, Namyong Park, Lee Sael, and U. Kang. 2017. BePI: Fast and Memory-Efficient Method for Billion-Scale Random Walk with Restart. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14-19, 2017. 789-804. Google ScholarDigital Library
Jung Hyun Kim, K. Selçuk Candan, and Maria Luisa Sapino. 2013. LR-PPR: locality-sensitive, re-use promoting, approximate personalized pagerank computation. In 22nd ACM International Conference on Information and Knowledge Management, CIKM'13, San Francisco, CA, USA, October 27 - November 1, 2013. 1801-1806. Google ScholarDigital Library
Wei-keng Liao, Alok N. Choudhary, Donald D. Weiner, and Pramod K. Varshney. 2005. Performance Evaluation of a Parallel Pipeline Computational Model for Space-Time Adaptive Processing. The Journal of Supercomputing31, 2 (2005), 137-160. Google ScholarDigital Library
Wenqing Lin, Xiaokui Xiao, and Gabriel Ghinita. 2014. Large-scale frequent subgraph mining in MapReduce. In IEEE 30th International Conference on Data Engineering, Chicago, ICDE 2014, IL, USA, March 31 - April 4, 2014. 844-855.Google ScholarCross Ref
Wenqing Lin, Xiaokui Xiao, Xing Xie, and Xiaoli Li. 2017. Network Motif Discovery: A GPU Approach. IEEE Trans. Knowl. Data Eng.29, 3 (2017), 513-528. Google ScholarDigital Library
Qin Liu, Zhenguo Li, John C. S. Lui, and Jiefeng Cheng. 2016. PowerWalk: Scalable Personalized PageRank via Random Walks with Vertex-Centric Decomposition. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, CIKM 2016, Indianapolis, IN, USA, October 24-28, 2016. 195-204. Google ScholarDigital Library
Peter Lofgren, Siddhartha Banerjee, and Ashish Goel. 2016. Personalized PageRank Estimation and Search: A Bidirectional Approach. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, San Francisco, CA, USA, February 22-25, 2016. 163-172. Google ScholarDigital Library
Peter Lofgren, Siddhartha Banerjee, Ashish Goel, and Seshadhri Comandur. 2014. FAST-PPR: scaling personalized pagerank estimation for large graphs. In The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, New York, NY, USA - August 24 - 27, 2014. 1436-1445. Google ScholarDigital Library
Siqiang Luo, Xiaokui Xiao, Wenqing Lin, and Ben Kao. 2019. Efficient Batch One-Hop Personalized PageRanks. In IEEE 35th International Conference on Data Engineering, Macau SAR, ICDE 2019, China, April 8 - April 12, 2019.Google Scholar
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: a system for large-scale graph processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, June 6-10, 2010. 135-146. Google ScholarDigital Library
Phuong Nguyen, Paolo Tomeo, Tommaso Di Noia, and Eugenio Di Sciascio. 2015. An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data. In Proceedings of the 24th International Conference on World Wide Web Companion, WWW 2015, Florence, Italy, May 18-22, 2015 - Companion Volume. 1477-1482. Google ScholarDigital Library
Atish Das Sarma, Anisur Rahaman Molla, Gopal Pandurangan, and Eli Upfal. 2015. Fast distributed PageRank computation. Theor. Comput. Sci.561(2015), 113-121. Google ScholarDigital Library
Yufei Tao, Wenqing Lin, and Xiaokui Xiao. 2013. Minimal MapReduce algorithms. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22-27, 2013. 529-540. Google ScholarDigital Library
Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. 2006. Fast Random Walk with Restart and Its Applications. In Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 2006), 18-22 December 2006, Hong Kong, China. 613-622. Google ScholarDigital Library
Michael D. Vose. 1991. A Linear Algorithm For Generating Random Numbers With a Given Distribution. IEEE Trans. Software Eng.17, 9 (1991), 972-975. Google ScholarDigital Library
Sibo Wang, Youze Tang, Xiaokui Xiao, Yin Yang, and Zengxiang Li. 2016. HubPPR: Effective Indexing for Approximate Personalized PageRank. PVLDB10, 3 (2016), 205-216. Google ScholarDigital Library
Sibo Wang, Renchi Yang, Xiaokui Xiao, Zhewei Wei, and Yin Yang. 2017. FORA: Simple and Effective Approximate Single-Source Personalized PageRank. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13 - 17, 2017. 505-514. Google ScholarDigital Library
Zhewei Wei, Xiaodong He, Xiaokui Xiao, Sibo Wang, Shuo Shang, and Ji-Rong Wen. 2018. TopPPR: Top-k Personalized PageRank Queries with Precision Guarantees on Large Graphs. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10-15, 2018. 441-456. Google ScholarDigital Library
Wenlei Xie, David Bindel, Alan J. Demers, and Johannes Gehrke. 2015. Edge-Weighted Personalized PageRank: Breaking A Decade-Old Performance Barrier. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10-13, 2015. 1325-1334. Google ScholarDigital Library
Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster Computing with Working Sets. In 2nd USENIX Workshop on Hot Topics in Cloud Computing, HotCloud'10, Boston, MA, USA, June 22, 2010. Google ScholarDigital Library
Wanxin Zhang, Dongsheng Li, Ying Xu, and Yiming Zhang. 2016. Shuffle-efficient distributed Locality Sensitive Hashing on spark. In IEEE Conference on Computer Communications Workshops, INFOCOM Workshops 2016, San Francisco, CA, USA, April 10-14, 2016. 766-767.Google ScholarCross Ref

Recommendations

Equimatchable Graphs on Surfaces

A graph G is equimatchable if each matching in G is a subset of a maximum-size matching and it is factor critical if G-v has a perfect matching for each vertex v of G. It is known that any 2-connected equimatchable graph is either bipartite or factor ...
Read More
A Fully Dynamic Distributed Algorithm for a B-Coloring of Graphs
ISPA '08: Proceedings of the 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications

A b-coloring of a graph $G$ is a proper coloring of the nodes of $G$ such that each color class contains a node that has a neighbor in all other color classes. A fully dynamic algorithm is an algorithm used to support modifications (insertion or ...
Read More
Restrained domination in cubic graphs

Let G =( V , E ) be a graph. A set S V is a restrained dominating set if every vertex in V S is adjacent to a vertex in S and to a vertex in V S . The restrained domination number of G , denoted _r( G ), is the smallest ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
WWW '19: The World Wide Web Conference
May 2019
3620 pages
ISBN:9781450366748
DOI:10.1145/3308558
Editors:
Ling Liu
Georgia Tech, USA
,
Ryen White
Microsoft Research, USA
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 May 2019
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Distributed Algorithm
Graph
Personalized PageRank
Qualifiers
- research-article
- Research
- Refereed limited
Conference

Acceptance Rates
Overall Acceptance Rate1,899of8,196submissions,23%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 16
  Total Citations
  View Citations
- 602
  Total Downloads
- Downloads (Last 12 months)72
- Downloads (Last 6 weeks)12
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format .

View HTML Format

Distributed Algorithms for Fully Personalized PageRank on Large Graphs

WWW '19: The World Wide Web Conference

ABSTRACT

References

Cited By

Recommendations

Equimatchable Graphs on Surfaces

A Fully Dynamic Distributed Algorithm for a B-Coloring of Graphs

Restrained domination in cubic graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

HTML Format

Caption

Distributed Algorithms for Fully Personalized PageRank on Large Graphs

WWW '19: The World Wide Web Conference

ABSTRACT

References

Cited By

Recommendations

Equimatchable Graphs on Surfaces

A Fully Dynamic Distributed Algorithm for a B-Coloring of Graphs

Restrained domination in cubic graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

HTML Format

Share this Publication link

Share on Social Media