Abstract
“Ogni lassada xe persa.”1-- Proverb from Trieste, Italy.
We present trièst, a suite of one-pass streaming algorithms to compute unbiased, low-variance, high-quality approximations of the global and local (i.e., incident to each vertex) number of triangles in a fully dynamic graph represented as an adversarial stream of edge insertions and deletions.
Our algorithms use reservoir sampling and its variants to exploit the user-specified memory space at all times. This is in contrast with previous approaches, which require hard-to-choose parameters (e.g., a fixed sampling probability) and offer no guarantees on the amount of memory they use. We analyze the variance of the estimations and show novel concentration bounds for these quantities.
Our experimental results on very large graphs demonstrate that trièst outperforms state-of-the-art approaches in accuracy and exhibits a small update time.
- Nesreen K. Ahmed, Nick Duffield, Jennifer Neville, and Ramana Kompella. 2014. Graph sample and hold: A framework for big-graph analytics. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’14). ACM, 1446--1455. Google ScholarDigital Library
- Ziv Bar-Yossef, Ravi Kumar, and D. Sivakumar. 2002. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’02). SIAM, 623--632. Google ScholarDigital Library
- Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. 2010. Efficient algorithms for large-scale local triangle counting. ACM Transactions on Knowledge Discovery from Data 4, 3 (2010), 13:1--13:28. Google ScholarDigital Library
- Jonathan W. Berry, Bruce Hendrickson, Randall A. LaViolette, and Cynthia A. Phillips. 2011. Tolerating the community detection resolution limit with edge weighting. Physical Review E 83, 5 (2011), 056119.Google ScholarCross Ref
- Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. 2011. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In Proceedings of the 20th International Conference on World Wide Web (WWW’11). ACM, 587--596. Google ScholarDigital Library
- Laurent Bulteau, Vincent Froese, Konstantin Kutzkov, and Rasmus Pagh. 2016. Triangle counting in dynamic graph streams. Algorithmica 76, 1 (Sep. 2016), 259--278. Google ScholarDigital Library
- Luciana S. Buriol, Gereon Frahling, Stefano Leonardi, Alberto Marchetti-Spaccamela, and Christian Sohler. 2006. Counting triangles in data streams. In Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’06). ACM, 253--262. Google ScholarDigital Library
- Òscar Celma Herrada. 2009. Music Recommendation and Discovery in the Long Tail. Technical Report. Universitat Pompeu Fabra.Google Scholar
- Edith Cohen, Graham Cormode, and Nick Duffield. 2012. Don’t let the negatives bring you down: Sampling from streams of signed updates. ACM SIGMETRICS Performance Evaluation Review 40, 1 (2012), 343--354. Google ScholarDigital Library
- Yahoo! Research Webscope Datasets. 2016. Yahoo! Answers browsing behavior version 1.0. http://webscope.sandbox.yahoo.com. ((Accessed on) September 2016).Google Scholar
- Lorenzo De Stefani, Alessandro Epasto, Matteo Riondato, and Eli Upfal. 2016. TRIÈST: Counting local and global triangles in fully-dynamic streams with fixed memory size. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’16). ACM, 825--834. Google ScholarDigital Library
- Jean-Claude Deville and Yves Tillé. 2004. Efficient balanced sampling: The cube method. Biometrika 91, 4 (2004), 893--912.Google ScholarCross Ref
- Jean-Pierre Eckmann and Elisha Moses. 2002. Curvature of co-links uncovers hidden thematic layers in the World Wide Web. Proceedings of the National Academy of Sciences 99, 9 (2002), 5825--5829.Google ScholarCross Ref
- Alessandro Epasto, Silvio Lattanzi, Vahab Mirrokni, Ismail Oner Sebe, Ahmed Taei, and Sunita Verma. 2015b. Ego-net community mining applied to friend suggestion. Proceedings of the VLDB Endowment 9, 4 (2015), 324--335. Google ScholarDigital Library
- Alessandro Epasto, Silvio Lattanzi, and Mauro Sozio. 2015a. Efficient densest subgraph computation in evolving graph. In Proceedings of the 24th International Conference on World Wide Web (WWW’15). ACM, 300--310. Google ScholarDigital Library
- Rainer Gemulla, Wolfgang Lehner, and Peter J. Haas. 2008. Maintaining bounded-size sample synopses of evolving datasets. The VLDB Journal 17, 2 (2008), 173--201. Google ScholarDigital Library
- A. Hajnal and E. Szemerédi. 1970. Proof of a conjecture of P. Erdős.d. In Combinatorial Theory and Its Applications, II (Proc. Colloq., Balatonfüred, 1969), P. Erdős, A. Rényi, and Vera T. Sós (Eds.). 601--623.Google Scholar
- Bronwyn H. Hall, Adam B. Jaffe, and Manuel Trajtenberg. 2001. The NBER Patent Citation Data File: Lessons, Insights and Methodological Tools. Technical Report. National Bureau of Economic Research.Google Scholar
- Rob J. Hyndman and Anne B. Koehler. 2006. Another look at measures of forecast accuracy. International Journal of Forecasting 22, 4 (2006), 679--688.Google ScholarCross Ref
- Madhav Jha, C. Seshadhri, and Ali Pinar. 2015. A space-efficient streaming algorithm for estimating transitivity and triangle counts using the birthday paradox. ACM Transactions on Knowledge Discovery from Data 9, 3 (2015), 15:1--15:21. Google ScholarDigital Library
- Hossein Jowhari and Mohammad Ghodsi. 2005. New streaming algorithms for counting triangles in graphs. In Proceeding of the 11th Annual International Conference on Computing and Combinatorics: (COCOON’05). Springer, 710--716.Google ScholarCross Ref
- Daniel M. Kane, Kurt Mehlhorn, Thomas Sauerwald, and He Sun. 2012. Counting arbitrary subgraphs in data streams. In Automata, Languages, and Programming, Artur Czumaj, Kurt Mehlhorn, Andrew Pitts, and Roger Wattenhofer (Eds.). Lecture Notes in Computer Science, Vol. 7392. Springer, 598--609. Google ScholarDigital Library
- Mihail N. Kolountzakis, Gary L. Miller, Richard Peng, and Charalampos E. Tsourakakis. 2012. Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Mathematics 8, 1--2 (2012), 161--185.Google ScholarCross Ref
- Konstantin Kutzkov and Rasmus Pagh. 2013. On the streaming complexity of computing local clustering coefficients. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining (WSDM’13). ACM, 677--686. Google ScholarDigital Library
- Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web (WWW’10). ACM, 591--600. Google ScholarDigital Library
- Matthieu Latapy. 2008. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical Computer Science 407, 1 (2008), 458--473. Google ScholarDigital Library
- Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. 2007. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data 1, 1 (2007), 2. Google ScholarDigital Library
- Yongsub Lim and U. Kang. 2015. MASCOT: Memory-efficient and accurate sampling for counting local triangles in graph streams. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’15). ACM, 685--694. Google ScholarDigital Library
- Madhusudan Manjunath, Kurt Mehlhorn, Konstantinos Panagiotou, and He Sun. 2011. Approximate counting of cycles in streams. In European Symposium on Algorithms (ESA’11). Springer, 677--688. Google ScholarDigital Library
- Ron Milo, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii, and Uri Alon. 2002. Network motifs: Simple building blocks of complex networks. Science 298, 5594 (2002), 824--827.Google Scholar
- Michael Mitzenmacher and Eli Upfal. 2005. Probability and Computing: Randomized Algorithms and Probabilistic Analysis (2nd ed.). Cambridge University Press. Google ScholarDigital Library
- Rasmus Pagh and Charalampos E. Tsourakakis. 2012. Colorful triangle counting and a MapReduce implementation. Information Processing Letters 112, 7 (Mar. 2012), 277--281. Google ScholarDigital Library
- Ha-Myung Park, Francesco Silvestri, U. Kang, and Rasmus Pagh. 2014. MapReduce triangle enumeration with guarantees. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management (CIKM’14). ACM, 1739--1748. Google ScholarDigital Library
- Ha-Myung Park and Chin-Wan Chung. 2013. An efficient MapReduce algorithm for counting triangles in a very large graph. In Proceedings of the 22nd ACM International Conference on Conference on Information 8 Knowledge Management (CIKM’13). ACM, 539--548. Google ScholarDigital Library
- Ha-Myung Park, Sung-Hyon Myaeng, and U. Kang. 2016. PTE: Enumerating trillion triangles on distributed systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’16). ACM, 1115--1124. Google ScholarDigital Library
- Aduri Pavan, Kanat Tangwongsan, Srikanta Tirthapura, and Kun-Lung Wu. 2013. Counting and sampling triangles from a graph stream. Proceedings of the VLDB Endowment 6, 14 (2013), 1870--1881. Google ScholarDigital Library
- Matthew Skala. 2013. Hypergeometric tail inequalities: Ending the insanity. arXiv preprint 1311.5939 (2013).Google Scholar
- Siddharth Suri and Sergei Vassilvitskii. 2011. Counting triangles and the curse of the last reducer. In Proceedings of the 20th International Conference on World Wide Web (WWW’11). ACM, 607--614. Google ScholarDigital Library
- The Koblenz Network Collection (KONECT). 2016. Last.fm song network dataset. Retrieved from http://konect.uni-koblenz.de/networks/lastfm_song.Google Scholar
- Charalampos E. Tsourakakis, U. Kang, Gary L. Miller, and Christos Faloutsos. 2009. Doulion: Counting triangles in massive graphs with a coin. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09). ACM, 837--846. Google ScholarDigital Library
- Charalampos E. Tsourakakis, Mihail N. Kolountzakis, and Gary L. Miller. 2011. Triangle sparsifiers. Journal of Graph Algorithms and Applications 15, 6 (2011), 703--726.Google ScholarCross Ref
- Jeffrey S. Vitter. 1985. Random sampling with a reservoir. ACM Transactions on Mathematical Software 11, 1 (1985), 37--57. Google ScholarDigital Library
Index Terms
- TRIÈST: Counting Local and Global Triangles in Fully Dynamic Streams with Fixed Memory Size
Recommendations
TRIÈST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size
KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data MiningWe present TRIEST, a suite of one-pass streaming algorithms to compute unbiased, low-variance, high-quality approximations of the global and local (i.e., incident to each vertex) number of triangles in a fully-dynamic graph represented as an adversarial ...
Adaptive stratified reservoir sampling over heterogeneous data streams
Reservoir sampling is a known technique for maintaining a random sample of a fixed size over a data stream of an unknown size. While reservoir sampling is suitable for applications demanding a sample over the whole data stream, it is not designed for ...
Sampling from Large Graphs with a Reservoir
NBIS '14: Proceedings of the 2014 17th International Conference on Network-Based Information SystemsSampling is a process of choosing a suitable representative subset from a population and uniformity is a basic requirement of representative ness. A sampling process produces a uniform random sample when all possible samples of the same size have the ...
Comments