skip to main content
research-article
Open Access

TRIÈST: Counting Local and Global Triangles in Fully Dynamic Streams with Fixed Memory Size

Published:29 June 2017Publication History
Skip Abstract Section

Abstract

“Ogni lassada xe persa.”1-- Proverb from Trieste, Italy.

We present trièst, a suite of one-pass streaming algorithms to compute unbiased, low-variance, high-quality approximations of the global and local (i.e., incident to each vertex) number of triangles in a fully dynamic graph represented as an adversarial stream of edge insertions and deletions.

Our algorithms use reservoir sampling and its variants to exploit the user-specified memory space at all times. This is in contrast with previous approaches, which require hard-to-choose parameters (e.g., a fixed sampling probability) and offer no guarantees on the amount of memory they use. We analyze the variance of the estimations and show novel concentration bounds for these quantities.

Our experimental results on very large graphs demonstrate that trièst outperforms state-of-the-art approaches in accuracy and exhibits a small update time.

References

  1. Nesreen K. Ahmed, Nick Duffield, Jennifer Neville, and Ramana Kompella. 2014. Graph sample and hold: A framework for big-graph analytics. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’14). ACM, 1446--1455. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Ziv Bar-Yossef, Ravi Kumar, and D. Sivakumar. 2002. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’02). SIAM, 623--632. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. 2010. Efficient algorithms for large-scale local triangle counting. ACM Transactions on Knowledge Discovery from Data 4, 3 (2010), 13:1--13:28. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Jonathan W. Berry, Bruce Hendrickson, Randall A. LaViolette, and Cynthia A. Phillips. 2011. Tolerating the community detection resolution limit with edge weighting. Physical Review E 83, 5 (2011), 056119.Google ScholarGoogle ScholarCross RefCross Ref
  5. Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. 2011. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In Proceedings of the 20th International Conference on World Wide Web (WWW’11). ACM, 587--596. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Laurent Bulteau, Vincent Froese, Konstantin Kutzkov, and Rasmus Pagh. 2016. Triangle counting in dynamic graph streams. Algorithmica 76, 1 (Sep. 2016), 259--278. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Luciana S. Buriol, Gereon Frahling, Stefano Leonardi, Alberto Marchetti-Spaccamela, and Christian Sohler. 2006. Counting triangles in data streams. In Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’06). ACM, 253--262. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Òscar Celma Herrada. 2009. Music Recommendation and Discovery in the Long Tail. Technical Report. Universitat Pompeu Fabra.Google ScholarGoogle Scholar
  9. Edith Cohen, Graham Cormode, and Nick Duffield. 2012. Don’t let the negatives bring you down: Sampling from streams of signed updates. ACM SIGMETRICS Performance Evaluation Review 40, 1 (2012), 343--354. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Yahoo! Research Webscope Datasets. 2016. Yahoo! Answers browsing behavior version 1.0. http://webscope.sandbox.yahoo.com. ((Accessed on) September 2016).Google ScholarGoogle Scholar
  11. Lorenzo De Stefani, Alessandro Epasto, Matteo Riondato, and Eli Upfal. 2016. TRIÈST: Counting local and global triangles in fully-dynamic streams with fixed memory size. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’16). ACM, 825--834. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Jean-Claude Deville and Yves Tillé. 2004. Efficient balanced sampling: The cube method. Biometrika 91, 4 (2004), 893--912.Google ScholarGoogle ScholarCross RefCross Ref
  13. Jean-Pierre Eckmann and Elisha Moses. 2002. Curvature of co-links uncovers hidden thematic layers in the World Wide Web. Proceedings of the National Academy of Sciences 99, 9 (2002), 5825--5829.Google ScholarGoogle ScholarCross RefCross Ref
  14. Alessandro Epasto, Silvio Lattanzi, Vahab Mirrokni, Ismail Oner Sebe, Ahmed Taei, and Sunita Verma. 2015b. Ego-net community mining applied to friend suggestion. Proceedings of the VLDB Endowment 9, 4 (2015), 324--335. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Alessandro Epasto, Silvio Lattanzi, and Mauro Sozio. 2015a. Efficient densest subgraph computation in evolving graph. In Proceedings of the 24th International Conference on World Wide Web (WWW’15). ACM, 300--310. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Rainer Gemulla, Wolfgang Lehner, and Peter J. Haas. 2008. Maintaining bounded-size sample synopses of evolving datasets. The VLDB Journal 17, 2 (2008), 173--201. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. A. Hajnal and E. Szemerédi. 1970. Proof of a conjecture of P. Erdős.d. In Combinatorial Theory and Its Applications, II (Proc. Colloq., Balatonfüred, 1969), P. Erdős, A. Rényi, and Vera T. Sós (Eds.). 601--623.Google ScholarGoogle Scholar
  18. Bronwyn H. Hall, Adam B. Jaffe, and Manuel Trajtenberg. 2001. The NBER Patent Citation Data File: Lessons, Insights and Methodological Tools. Technical Report. National Bureau of Economic Research.Google ScholarGoogle Scholar
  19. Rob J. Hyndman and Anne B. Koehler. 2006. Another look at measures of forecast accuracy. International Journal of Forecasting 22, 4 (2006), 679--688.Google ScholarGoogle ScholarCross RefCross Ref
  20. Madhav Jha, C. Seshadhri, and Ali Pinar. 2015. A space-efficient streaming algorithm for estimating transitivity and triangle counts using the birthday paradox. ACM Transactions on Knowledge Discovery from Data 9, 3 (2015), 15:1--15:21. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Hossein Jowhari and Mohammad Ghodsi. 2005. New streaming algorithms for counting triangles in graphs. In Proceeding of the 11th Annual International Conference on Computing and Combinatorics: (COCOON’05). Springer, 710--716.Google ScholarGoogle ScholarCross RefCross Ref
  22. Daniel M. Kane, Kurt Mehlhorn, Thomas Sauerwald, and He Sun. 2012. Counting arbitrary subgraphs in data streams. In Automata, Languages, and Programming, Artur Czumaj, Kurt Mehlhorn, Andrew Pitts, and Roger Wattenhofer (Eds.). Lecture Notes in Computer Science, Vol. 7392. Springer, 598--609. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Mihail N. Kolountzakis, Gary L. Miller, Richard Peng, and Charalampos E. Tsourakakis. 2012. Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Mathematics 8, 1--2 (2012), 161--185.Google ScholarGoogle ScholarCross RefCross Ref
  24. Konstantin Kutzkov and Rasmus Pagh. 2013. On the streaming complexity of computing local clustering coefficients. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining (WSDM’13). ACM, 677--686. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web (WWW’10). ACM, 591--600. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Matthieu Latapy. 2008. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical Computer Science 407, 1 (2008), 458--473. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. 2007. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data 1, 1 (2007), 2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Yongsub Lim and U. Kang. 2015. MASCOT: Memory-efficient and accurate sampling for counting local triangles in graph streams. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’15). ACM, 685--694. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Madhusudan Manjunath, Kurt Mehlhorn, Konstantinos Panagiotou, and He Sun. 2011. Approximate counting of cycles in streams. In European Symposium on Algorithms (ESA’11). Springer, 677--688. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Ron Milo, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii, and Uri Alon. 2002. Network motifs: Simple building blocks of complex networks. Science 298, 5594 (2002), 824--827.Google ScholarGoogle Scholar
  31. Michael Mitzenmacher and Eli Upfal. 2005. Probability and Computing: Randomized Algorithms and Probabilistic Analysis (2nd ed.). Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Rasmus Pagh and Charalampos E. Tsourakakis. 2012. Colorful triangle counting and a MapReduce implementation. Information Processing Letters 112, 7 (Mar. 2012), 277--281. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Ha-Myung Park, Francesco Silvestri, U. Kang, and Rasmus Pagh. 2014. MapReduce triangle enumeration with guarantees. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management (CIKM’14). ACM, 1739--1748. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Ha-Myung Park and Chin-Wan Chung. 2013. An efficient MapReduce algorithm for counting triangles in a very large graph. In Proceedings of the 22nd ACM International Conference on Conference on Information 8 Knowledge Management (CIKM’13). ACM, 539--548. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Ha-Myung Park, Sung-Hyon Myaeng, and U. Kang. 2016. PTE: Enumerating trillion triangles on distributed systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’16). ACM, 1115--1124. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Aduri Pavan, Kanat Tangwongsan, Srikanta Tirthapura, and Kun-Lung Wu. 2013. Counting and sampling triangles from a graph stream. Proceedings of the VLDB Endowment 6, 14 (2013), 1870--1881. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Matthew Skala. 2013. Hypergeometric tail inequalities: Ending the insanity. arXiv preprint 1311.5939 (2013).Google ScholarGoogle Scholar
  38. Siddharth Suri and Sergei Vassilvitskii. 2011. Counting triangles and the curse of the last reducer. In Proceedings of the 20th International Conference on World Wide Web (WWW’11). ACM, 607--614. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. The Koblenz Network Collection (KONECT). 2016. Last.fm song network dataset. Retrieved from http://konect.uni-koblenz.de/networks/lastfm_song.Google ScholarGoogle Scholar
  40. Charalampos E. Tsourakakis, U. Kang, Gary L. Miller, and Christos Faloutsos. 2009. Doulion: Counting triangles in massive graphs with a coin. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09). ACM, 837--846. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Charalampos E. Tsourakakis, Mihail N. Kolountzakis, and Gary L. Miller. 2011. Triangle sparsifiers. Journal of Graph Algorithms and Applications 15, 6 (2011), 703--726.Google ScholarGoogle ScholarCross RefCross Ref
  42. Jeffrey S. Vitter. 1985. Random sampling with a reservoir. ACM Transactions on Mathematical Software 11, 1 (1985), 37--57. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. TRIÈST: Counting Local and Global Triangles in Fully Dynamic Streams with Fixed Memory Size

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in

              Full Access

              • Published in

                cover image ACM Transactions on Knowledge Discovery from Data
                ACM Transactions on Knowledge Discovery from Data  Volume 11, Issue 4
                Special Issue on KDD 2016 and Regular Papers
                November 2017
                419 pages
                ISSN:1556-4681
                EISSN:1556-472X
                DOI:10.1145/3119906
                • Editor:
                • Jie Tang
                Issue’s Table of Contents

                Copyright © 2017 Owner/Author

                Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 29 June 2017
                • Accepted: 1 March 2017
                • Revised: 1 January 2017
                • Received: 1 June 2016
                Published in tkdd Volume 11, Issue 4

                Check for updates

                Qualifiers

                • research-article
                • Research
                • Refereed

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader