ABSTRACT
Personalized PageRank (PPR) has enormous applications, such as link prediction and recommendation systems for social networks, which often require the fully PPR to be known. Besides, most of real-life graphs are edge-weighted, e.g., the interaction between users on the Facebook network. However, it is computationally difficult to compute the fully PPR, especially on large graphs, not to mention that most existing approaches do not consider the weights of edges. In particular, the existing approach cannot handle graphs with billion edges on a moderate-size cluster. To address this problem, this paper presents a novel study on the computation of fully edge-weighted PPR on large graphs using the distributed computing framework. Specifically, we employ the Monte Carlo approximation that performs a large number of random walks from each node of the graph, and exploits the parallel pipeline framework to reduce the overall running time of the fully PPR. Based on that, we develop several optimization techniques which (i) alleviate the issue of large nodes that could explode the memory space, (ii) pre-compute short walks for small nodes that largely speedup the computation of random walks, and (iii) optimize the amount of random walks to compute in each pipeline that significantly reduces the overhead. With extensive experiments on a variety of real-life graph datasets, we demonstrate that our solution is several orders of magnitude faster than the state-of-the-arts, and meanwhile, largely outperforms the baseline algorithms in terms of accuracy.
- Eneko Agirre and Aitor Soroa. 2009. Personalizing PageRank for Word Sense Disambiguation. In EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, Athens, Greece, March 30 - April 3, 2009. 33-41. Google ScholarDigital Library
- Bahman Bahmani, Kaushik Chakrabarti, and Dong Xin. 2011. Fast personalized PageRank on MapReduce. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, June 12-16, 2011. 973-984. Google ScholarDigital Library
- Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified Data Processing on Large Clusters. In 6th Symposium on Operating System Design and Implementation (OSDI 2004), San Francisco, California, USA, December 6-8, 2004. 137-150. Google ScholarDigital Library
- Pavlos S. Efraimidis and Paul G. Spirakis. 2006. Weighted random sampling with a reservoir. Inf. Process. Lett.97, 5 (2006), 181-185. Google ScholarDigital Library
- Chantat Eksombatchai, Pranav Jindal, Jerry Zitao Liu, Yuchen Liu, Rahul Sharma, Charles Sugnet, Mark Ulrich, and Jure Leskovec. 2018. Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018. 1775-1784. Google ScholarDigital Library
- Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. 1999. On Power-law Relationships of the Internet Topology. In SIGCOMM. 251-262. Google ScholarDigital Library
- Dániel Fogaras, Balázs Rácz, Károly Csalogány, and Tamás Sarlós. 2005. Towards Scaling Fully Personalized PageRank: Algorithms, Lower Bounds, and Experiments. Internet Mathematics2, 3 (2005), 333-358.Google Scholar
- Yasuhiro Fujiwara, Makoto Nakatsuji, Takeshi Yamamuro, Hiroaki Shiokawa, and Makoto Onizuka. 2012. Efficient personalized pagerank with accuracy assurance. In The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '12, Beijing, China, August 12-16, 2012. 15-23. Google ScholarDigital Library
- Tao Guo, Xin Cao, Gao Cong, Jiaheng Lu, and Xuemin Lin. 2017. Distributed Algorithms on Exact Personalized PageRank. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14-19, 2017. 479-494. Google ScholarDigital Library
- Qirong Ho, Wenqing Lin, Eran Shaham, Shonali Krishnaswamy, The Anh Dang, Jingxuan Wang, Isabel Choo Zhongyan, and Amy She-Nash. 2016. A Distributed Graph Algorithm for Discovering Unique Behavioral Groups from Large-Scale Telco Data. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, CIKM 2016, Indianapolis, IN, USA, October 24-28, 2016. 1353-1362. Google ScholarDigital Library
- Jinhong Jung, Namyong Park, Lee Sael, and U. Kang. 2017. BePI: Fast and Memory-Efficient Method for Billion-Scale Random Walk with Restart. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14-19, 2017. 789-804. Google ScholarDigital Library
- Jung Hyun Kim, K. Selçuk Candan, and Maria Luisa Sapino. 2013. LR-PPR: locality-sensitive, re-use promoting, approximate personalized pagerank computation. In 22nd ACM International Conference on Information and Knowledge Management, CIKM'13, San Francisco, CA, USA, October 27 - November 1, 2013. 1801-1806. Google ScholarDigital Library
- Wei-keng Liao, Alok N. Choudhary, Donald D. Weiner, and Pramod K. Varshney. 2005. Performance Evaluation of a Parallel Pipeline Computational Model for Space-Time Adaptive Processing. The Journal of Supercomputing31, 2 (2005), 137-160. Google ScholarDigital Library
- Wenqing Lin, Xiaokui Xiao, and Gabriel Ghinita. 2014. Large-scale frequent subgraph mining in MapReduce. In IEEE 30th International Conference on Data Engineering, Chicago, ICDE 2014, IL, USA, March 31 - April 4, 2014. 844-855.Google ScholarCross Ref
- Wenqing Lin, Xiaokui Xiao, Xing Xie, and Xiaoli Li. 2017. Network Motif Discovery: A GPU Approach. IEEE Trans. Knowl. Data Eng.29, 3 (2017), 513-528. Google ScholarDigital Library
- Qin Liu, Zhenguo Li, John C. S. Lui, and Jiefeng Cheng. 2016. PowerWalk: Scalable Personalized PageRank via Random Walks with Vertex-Centric Decomposition. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, CIKM 2016, Indianapolis, IN, USA, October 24-28, 2016. 195-204. Google ScholarDigital Library
- Peter Lofgren, Siddhartha Banerjee, and Ashish Goel. 2016. Personalized PageRank Estimation and Search: A Bidirectional Approach. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, San Francisco, CA, USA, February 22-25, 2016. 163-172. Google ScholarDigital Library
- Peter Lofgren, Siddhartha Banerjee, Ashish Goel, and Seshadhri Comandur. 2014. FAST-PPR: scaling personalized pagerank estimation for large graphs. In The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, New York, NY, USA - August 24 - 27, 2014. 1436-1445. Google ScholarDigital Library
- Siqiang Luo, Xiaokui Xiao, Wenqing Lin, and Ben Kao. 2019. Efficient Batch One-Hop Personalized PageRanks. In IEEE 35th International Conference on Data Engineering, Macau SAR, ICDE 2019, China, April 8 - April 12, 2019.Google Scholar
- Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: a system for large-scale graph processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, June 6-10, 2010. 135-146. Google ScholarDigital Library
- Phuong Nguyen, Paolo Tomeo, Tommaso Di Noia, and Eugenio Di Sciascio. 2015. An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data. In Proceedings of the 24th International Conference on World Wide Web Companion, WWW 2015, Florence, Italy, May 18-22, 2015 - Companion Volume. 1477-1482. Google ScholarDigital Library
- Atish Das Sarma, Anisur Rahaman Molla, Gopal Pandurangan, and Eli Upfal. 2015. Fast distributed PageRank computation. Theor. Comput. Sci.561(2015), 113-121. Google ScholarDigital Library
- Yufei Tao, Wenqing Lin, and Xiaokui Xiao. 2013. Minimal MapReduce algorithms. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22-27, 2013. 529-540. Google ScholarDigital Library
- Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. 2006. Fast Random Walk with Restart and Its Applications. In Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 2006), 18-22 December 2006, Hong Kong, China. 613-622. Google ScholarDigital Library
- Michael D. Vose. 1991. A Linear Algorithm For Generating Random Numbers With a Given Distribution. IEEE Trans. Software Eng.17, 9 (1991), 972-975. Google ScholarDigital Library
- Sibo Wang, Youze Tang, Xiaokui Xiao, Yin Yang, and Zengxiang Li. 2016. HubPPR: Effective Indexing for Approximate Personalized PageRank. PVLDB10, 3 (2016), 205-216. Google ScholarDigital Library
- Sibo Wang, Renchi Yang, Xiaokui Xiao, Zhewei Wei, and Yin Yang. 2017. FORA: Simple and Effective Approximate Single-Source Personalized PageRank. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13 - 17, 2017. 505-514. Google ScholarDigital Library
- Zhewei Wei, Xiaodong He, Xiaokui Xiao, Sibo Wang, Shuo Shang, and Ji-Rong Wen. 2018. TopPPR: Top-k Personalized PageRank Queries with Precision Guarantees on Large Graphs. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10-15, 2018. 441-456. Google ScholarDigital Library
- Wenlei Xie, David Bindel, Alan J. Demers, and Johannes Gehrke. 2015. Edge-Weighted Personalized PageRank: Breaking A Decade-Old Performance Barrier. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10-13, 2015. 1325-1334. Google ScholarDigital Library
- Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster Computing with Working Sets. In 2nd USENIX Workshop on Hot Topics in Cloud Computing, HotCloud'10, Boston, MA, USA, June 22, 2010. Google ScholarDigital Library
- Wanxin Zhang, Dongsheng Li, Ying Xu, and Yiming Zhang. 2016. Shuffle-efficient distributed Locality Sensitive Hashing on spark. In IEEE Conference on Computer Communications Workshops, INFOCOM Workshops 2016, San Francisco, CA, USA, April 10-14, 2016. 766-767.Google ScholarCross Ref
Recommendations
Equimatchable Graphs on Surfaces
A graph G is equimatchable if each matching in G is a subset of a maximum-size matching and it is factor critical if G-v has a perfect matching for each vertex v of G. It is known that any 2-connected equimatchable graph is either bipartite or factor ...
A Fully Dynamic Distributed Algorithm for a B-Coloring of Graphs
ISPA '08: Proceedings of the 2008 IEEE International Symposium on Parallel and Distributed Processing with ApplicationsA b-coloring of a graph $G$ is a proper coloring of the nodes of $G$ such that each color class contains a node that has a neighbor in all other color classes. A fully dynamic algorithm is an algorithm used to support modifications (insertion or ...
Restrained domination in cubic graphs
Let G =( V , E ) be a graph. A set S V is a restrained dominating set if every vertex in V S is adjacent to a vertex in S and to a vertex in V S . The restrained domination number of G , denoted r ( G ), is the smallest ...
Comments