Abstract
While advances in computing resources have made processing enormous amounts of data possible, human ability to identify patterns in such data has not scaled accordingly. Efficient computational methods for condensing and simplifying data are thus becoming vital for extracting actionable insights. In particular, while data summarization techniques have been studied extensively, only recently has summarizing interconnected data, or graphs, become popular. This survey is a structured, comprehensive overview of the state-of-the-art methods for summarizing graph data. We first broach the motivation behind and the challenges of graph summarization. We then categorize summarization approaches by the type of graphs taken as input and further organize each category by core methodology. Finally, we discuss applications of summarization on real-world graphs and conclude by describing some open problems in the field.
- Bijaya Adhikari, Yao Zhang, Aditya Bharadwaj, and B. Aditya Prakash. 2017. Condensing temporal networks using propagation. In Proceedings of the 2017 SIAM International Conference on Data Mining. 417--425Google Scholar
- Charu Aggarwal and Karthik Subbian. 2014. Evolutionary network analysis: A survey. ACM Comput. Surv. 47, 1 (2014), 10:1--10:36. Google ScholarDigital Library
- Charu C. Aggarwal. 2015. Data Mining: The Textbook. Springer. Google ScholarDigital Library
- Charu C. Aggarwal and S. Yu Philip. 2005. Online analysis of community evolution in data streams. In Proceedings of the SIAM international Conference on Data Mining (SDM’05).Google Scholar
- Charu C. Aggarwal and Haixun Wang. 2010. A survey of clustering algorithms for graph data. In Managing and Mining Graph Data. Springer, 275--301.Google ScholarDigital Library
- Amr Ahmed, Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, and Alexander J. Smola. 2013. Distributed large-scale natural graph factorization. In Proceedings of the 22nd International Conference on World Wide Web (WWW’13). 37--48. Google ScholarDigital Library
- Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. 2012. Graph sketches: Sparsification, spanners, and subgraphs. In Proceedings of the 31st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. 5--14. Google ScholarDigital Library
- Sebastian E. Ahnert. 2013. Power graph compression reveals dominant relationships in genetic transcription networks. Molec. BioSyst. 9, 11 (2013), 2681--2685.Google ScholarCross Ref
- Alfred V. Aho, M. R. Garey, and Jeffrey D. Ullman. 1972. The transitive reduction of a directed graph. Siam J. Comput. 1, 2 (1972), 131--137.Google ScholarDigital Library
- Leman Akoglu, Duen Horng Chau, U. Kang, Danai Koutra, and Christos Faloutsos. 2012. OPAvion: Mining and visualization in large graphs. In Proceedings of the 2012 SIGMOD Conference. ACM, 717--720. Google ScholarDigital Library
- Leman Akoglu, Mary McGlohon, and Christos Faloutsos. 2010. OddBall: Spotting anomalies in weighted graphs. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’10). Google ScholarDigital Library
- Leman Akoglu, Hanghang Tong, Jilles Vreeken, and Christos Faloutsos. 2012. Fast and reliable anomaly detection in categorical data. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM’12). ACM. Google ScholarDigital Library
- Charles J. Alpert, Andrew B. Kahng, and So-Zen Yao. 1999. Spectral partitioning with multiple eigenvectors. Discr. Appl. Math. 90, 1 (1999), 3--26. Google ScholarDigital Library
- Alexandr Andoni and Piotr Indyk. 2008. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51, (2008), 117--122. Google ScholarDigital Library
- Alberto Apostolico and Guido Drovandi. 2009. Graph compression by BFS. Algorithms 2, 3 (2009), 1031--1044.Google ScholarCross Ref
- Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos E. Papalexakis, and Danai Koutra. 2014. Com2: Fast automatic discovery of temporal (“comet”) communities. In Advances in Knowledge Discovery and Data Mining. Lecture Notes in Computer Science, Vol. 8444. Springer, 271--283.Google Scholar
- Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. 2006. Group formation in large social networks: Membership, growth, and evolution. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’06). ACM, 44--54. Google ScholarDigital Library
- Mathieu Bastian, Sebastien Heymann, and Mathieu Jacomy. 2009. Gephi: An open source software for exploring and manipulating networks. In Proceedings of the International AAAI Conference on Weblogs and Social Media.Google Scholar
- Joshua D. Batson, Daniel A. Spielman, Nikhil Srivastava, and Shang-Hua Teng. 2013. Spectral sparsification of graphs: Theory and algorithms. Commun. ACM 56, 8 (2013), 87--94. Google ScholarDigital Library
- Enrico Bertini and Giuseppe Santucci. 2004. By chance is not enough: Preserving relative density through non uniform sampling. In Proceedings of the Information Visualisation Conference. Google ScholarDigital Library
- Paolo Boldi and Sebastiano Vigna. 2004. The webgraph framework I: Compression techniques. In Proceedings of the International World Wide Web Conference. 595--602. Google ScholarDigital Library
- Michael Bostock, Vadim Ogievetsky, and Jeffrey Heer. 2011. D3 data-driven documents. IEEE Trans. Vis. Comput. Graph. 17, 12 (2011), 2301--2309. Google ScholarDigital Library
- Ivan Brugere, Brian Gallagher, and Tanya Y. Berger-Wolf. 2016. Network structure inference, a survey: Motivations, methods, and applications. ACM Comput. Surv. 51, 2, Article 24. http://arxiv.org/abs/1610.00782. Google ScholarDigital Library
- Gregory Buehrer and Kumar Chellapilla. 2008. A scalable pattern mining approach to web graph compression with communities. In Proceedings of the 2008 International Conference on Web Search and Data Mining. ACM, 95--106. Google ScholarDigital Library
- Gemma Casas-Garriga. 2005. Summarizing sequential data with closed partial orders. In Proceedings of the SIAM International Conference on Data Mining (SDM’05). 380--391.Google ScholarCross Ref
- Šelja Čebirić, François Goasdoué, and Ioana Manolescu. 2015. Query-oriented summarization of RDF graphs. Proc. VLDB Endow. 8, 12 (2015), 2012--2015. Google ScholarDigital Library
- Deepayan Chakrabarti, Spiros Papadimitriou, Dharmendra S. Modha, and Christos Faloutsos. 2004. Fully automatic cross-associations. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’04). 79--88. Google ScholarDigital Library
- Varun Chandola and Vipin Kumar. 2005. Summarization -- Compressing data into an informative representation. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM’05). 98--105. Google ScholarDigital Library
- Duen Horng Chau, Aniket Kittur, Jason I. Hong, and Christos Faloutsos. 2011. Apolo: Making sense of large network data by combining rich user interaction and machine learning. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’11).Google ScholarDigital Library
- Chen Chen, Cindy X. Lin, Matt Fredrikson, Mihai Christodorescu, Xifeng Yan, and Jiawei Han. 2009. Mining graph patterns efficiently via randomized summaries. Proc. VLDB Endow. 2, 1 (2009), 742--753. Google ScholarDigital Library
- Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Michael Mitzenmacher, Alessandro Panconesi, and Prabhakar Raghavan. 2009. On compressing social networks. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’09). 219--228. Google ScholarDigital Library
- Yongwook Choi and Wojciech Szpankowski. 2012. Compression of graphical structures: Fundamental limits, algorithms, and experiments. IEEE Trans. Inf. Theory 58, 2 (2012), 620--638. Google ScholarDigital Library
- Rudi Cilibrasi and Paul Vitányi. 2005. Clustering by compression. IEEE Trans. Inf. Theory 51, 4 (2005), 1523--1545. Google ScholarDigital Library
- Diane J. Cook and Lawrence B. Holder. 1994. Substructure discovery using minimum description length and background knowledge. J. Artif. Intell. Res. 1 (1994), 231--255. Google ScholarCross Ref
- Graham Cormode and S. Muthukrishnan. 2005a. An improved data stream summary: The count-min sketch and its applications. J. Algor. 55, 1 (2005), 58--75. Google ScholarDigital Library
- Graham Cormode and S. Muthukrishnan. 2005b. Summarizing and mining skewed data streams. In Proceedings of the SIAM international Conference on Data Mining (SDM’05).Google Scholar
- Corinna Cortes, Daryl Pregibon, and Chris Volinsky. 2001. Communities of interest. In Proceedings of the 4th International Conference on Advances in Intelligent Data Analysis. 105--114. Google ScholarDigital Library
- Uros Damnjanovic, Virginia Fernandez Arguedas, Ebroul Izquierdo, and José M. Martínez. 2008. Event detection and clustering for surveillance video summarization. In Proceedingsof the 9th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS’08). IEEE, 63--66. Google ScholarDigital Library
- Pedro O. S. Vaz de Melo, Leman Akoglu, Christos Faloutsos, and Antonio Alfredo Ferreira Loureiro. 2010. Surprising patterns for the call duration distribution of mobile phone users. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD’10). 354--369. Google ScholarDigital Library
- Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI’04). 10. Google ScholarDigital Library
- Pravallika Devineni, Danai Koutra, Michalis Faloutsos, and Christos Faloutsos. 2015. If walls could talk: Patterns and anomalies in Facebook wallposts. In Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM’15). 367--374. Google ScholarDigital Library
- Inderjit Dhillon, Yuqiang Guan, and Brian Kulis. 2005. A fast kernel-based multilevel algorithm for graph clustering. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’05). ACM, 629--634. Google ScholarDigital Library
- Laxman Dhulipala, Igor Kabiljo, Brian Karrer, Giuseppe Ottaviano, Sergey Pupyrev, and Alon Shalita. 2016. Compressing graphs and indexes with recursive graph bisection. arXiv:1602.08820. Google ScholarDigital Library
- P. Drineas, R. Kannan, and M. W. Mahoney. 2006. Fast Monte Carlo algorithms for matrices III: Computing a compressed approximate matrix decomposition. SIAM J. Comput. 36, 1 (2006), 184--206. Google ScholarDigital Library
- Cody Dunne and Ben Shneiderman. 2013. Motif simplification: Improving network visualization readability with fan, connector, and clique glyphs. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI’13). ACM, 3247--3256. Google ScholarDigital Library
- P. Elias. 2006. Universal codeword sets and representations of the integers. IEEE Trans. Inf. Theor. 21, 2 (2006), 194--203. Google ScholarDigital Library
- Wenfei Fan, Jianzhong Li, Xin Wang, and Yinghui Wu. 2012. Query preserving graph compression. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, 157--168. Google ScholarDigital Library
- Wenfei Fan, Xin Wang, and Yinghui Wu. 2013. Diversified top-k graph pattern matching. Proc. VLDB Endow. 6, 13 (2013), 1510--1521. Google ScholarDigital Library
- Jing Feng, Xiao He, Nina Hubig, Christian Böhm, and Claudia Plant. 2013. Compression-based graph mining exploiting structure primitives. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM’13). IEEE, 181--190.Google ScholarCross Ref
- Jure Ferlez, Christos Faloutsos, Jure Leskovec, Dunja Mladenic, and Marko Grobelnik. 2008. Monitoring network evolution using MDL. In Proceedings of the 24th International Conference on Data Engineering (ICDE’08). 1328--1330. Google ScholarDigital Library
- Wenjie Fu, Le Song, and Eric P. Xing. 2009. Dynamic mixed membership blockmodel for evolving networks. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML’09). ACM, 329--336. Google ScholarDigital Library
- Johannes Gehrke, Edward Lui, and Rafael Pass. 2003. Towards privacy for social networks: A zero-knowledge based definition of privacy. In Proceedings of the 8th Conference on Theory of Cryptography. 432--449. Google ScholarDigital Library
- Mina Ghashami, Edo Liberty, and Jeff M. Phillips. 2016. Efficient frequent directions algorithm for sparse matrices. arXiv:1602.00412.Google Scholar
- Sean Gilpin, Tina Eliassi-Rad, and Ian Davidson. 2013. Guided learning for role discovery (gLRD): Framework, algorithms, and applications. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’13). ACM, 113--121. Google ScholarDigital Library
- Oshini Goonetilleke, Danai Koutra, Timos Sellis, and Kewen Liao. 2017. Edge labeling schemes for graph data. In Proceedings of the Scientific and Statistical Database Management Conference (SSDBM’17). ACM, Article 12, 12 pages. Google ScholarDigital Library
- Robert Görke, Pascal Maillard, Christian Staudt, and Dorothea Wagner. 2010. Modularity-driven clustering of dynamic graphs. In Proceedings of the 9th International Symposium on Experimental Algorithms (SEA’10). 436--448. Google ScholarDigital Library
- Szymon Grabowski and Wojciech Bieniecki. 2014. Tight and simple web graph compression for forward and reverse neighbor queries. Discr. Appl. Math. 163 (2014), 298--306. Google ScholarDigital Library
- Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’16). ACM. Google ScholarDigital Library
- Carlos Guestrin, Daphne Koller, Chris Gearhart, and Neal Kanodia. 2003. Generalizing plans to new environments in relational MDPs. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI’03). Google ScholarDigital Library
- Mohammad Al Hasan, Nesreen K. Ahmed, and Jennifer Neville. 2013. Network Sampling: Methods and Applications. Retrieved from https://www.cs.purdue.edu/homes/neville/courses/NetworkSampling-KDD13-final.pdf.Google ScholarDigital Library
- Nasrin Hassanlou, Maryam Shoaran, and Alex Thomo. 2013. Probabilistic graph summarization. In Web-Age Information Management. Springer, 545--556. Google ScholarDigital Library
- Mark Heimann, Haoming Shen, and Danai Koutra. 2018. Node representation learning for multiple networks: The case of graph alignment. arXiv:1802.06257.Google Scholar
- Keith Henderson, Brian Gallagher, Tina Eliassi-Rad, Hanghang Tong, Sugato Basu, Leman Akoglu, Danai Koutra, Christos Faloutsos, and Lei Li. 2012. RolX: Structural role extraction 8 mining in large graphs. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’12). ACM, 1231--1239. Google ScholarDigital Library
- Keith Henderson, Brian Gallagher, Lei Li, Leman Akoglu, Tina Eliassi-Rad, Hanghang Tong, and Christos Faloutsos. 2011. It’s who you know: Graph mining using recursive structural features. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’11). ACM, 663--671. Google ScholarDigital Library
- Wilko Henecka and Matthew Roughan. 2015. Lossy compression of dynamic, weighted graphs. In Proceedings of the 2015 3rd International Conference on Future Internet of Things and Cloud (FiCloud’15). 427--434. Google ScholarDigital Library
- Shawndra Hill, Deepak Agarwal, Robert Bell, and Chris Volinsky. 2006. Building an effective representation for dynamic networks. J. Comput. Graph. Stat. 15 (2006), 1--25.Google ScholarCross Ref
- Christian Hübler, Hans-Peter Kriegel, Karsten Borgwardt, and Zoubin Ghahramani. 2008. Metropolis algorithms for representative subgraph sampling. In Proceedings of the 2008 8th IEEE International Conference on Data Mining (ICDM’08). IEEE, 283--292. Google ScholarDigital Library
- Di Jin and Danai Koutra. 2017a. Exploratory analysis of graph data by leveraging domain knowledge. In Proceedings of the 2017 IEEE International Conference on Data Mining. 187--196.Google ScholarCross Ref
- Di Jin, Aristotelis Leventidis, Haoming Shen, Ruowang Zhang, Junyue Wu, and Danai Koutra. 2017. PERSEUS-HUB: Interactive and collective exploration of large-scale graphs. Informatics 4, 3 (2017), 22.Google ScholarCross Ref
- Lisa Jin and Danai Koutra. 2017b. ECOviz: Comparative visualization of time-evolving network summaries. In Proceedings of the ACM SIGKDD 2017 Workshop on Interactive Data Exploration and Analytics.Google Scholar
- U. Kang, Jay-Yoon Lee, Danai Koutra, and Christos Faloutsos. 2014. Net-ray: Visualizing and mining web-scale graphs. In Proceedings of the 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’14).Google Scholar
- U. Kang, Hanghang Tong, Jimeng Sun, Ching-Yung Lin, and Christos Faloutsos. 2011. Gbase: A scalable and general graph management system. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’11). ACM, 1091--1099. Google ScholarDigital Library
- U. Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. 2009. PEGASUS: A peta-scale graph mining system—implementation and observations. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM’09). Google ScholarDigital Library
- George Karypis and Vipin Kumar. 1999. Multilevel k-way hypergraph partitioning. In Proceedings of the 36th Annual ACM/IEEE Design Automation Conference. 343--348. Google ScholarDigital Library
- David Kempe, Jon Kleinberg, and Éva Tardos. 2003. Maximizing the spread of influence through a social network. In Proceedings of the Conference of the ACM Special Interest Group on Knowledge Discovery and Data Mining. ACM. Google ScholarDigital Library
- Kifayat Ullah Khan, Waqas Nawaz, and Young-Koo Lee. 2014. Set-based unified approach for attributed graph summarization. In Proceedings of the IEEE 4th International Conference on Big Data and Cloud Computing (BdCloud’14). IEEE, 378--385. Google ScholarDigital Library
- Mikko Kivel, Alex Arenas, Marc Barthelemy, James P. Gleeson, Yamir Moreno, and Mason A. Porter. 2014. Multilayer networks. J. Complex Netw. 2, 3 (2014), 203--271.Google ScholarCross Ref
- Arne Koopman and Arno Siebes. 2008. Discovering relational items sets efficiently. In Proceedings of the SIAM International Conference on Data Mining (SDM’08). 108--119.Google ScholarCross Ref
- Arne Koopman and Arno Siebes. 2009. Characteristic relational patterns. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’09). 437--446. Google ScholarDigital Library
- Danai Koutra, Abhilash Dighe, Smriti Bhagat, Udi Weinsberg, Stratis Ioannidis, Christos Faloutsos, and Jean Bolot. 2017. PNP: Fast path ensemble method for movie design. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’17). Google ScholarDigital Library
- Danai Koutra and Christos Faloutsos. 2017. Individual and Collective Graph Mining: Principles, Algorithms, and Applications. Synthesis Lectures on Data Mining and Knowledge Discovery. Morgan 8 Claypool. Google ScholarDigital Library
- Danai Koutra, Di Jin, Yuanchi Ning, and Christos Faloutsos. 2015. Perseus: An interactive large-scale graph mining and visualization tool. Proc. VLDB Endow. 8, 12, 1924--1927. Google ScholarDigital Library
- Danai Koutra, U. Kang, Jilles Vreeken, and Christos Faloutsos. 2014b. VoG: Summarizing and understanding large graphs. In Proceedings of the SIAM international Conference on Data Mining (SDM’14). 91--99.Google ScholarCross Ref
- Danai Koutra, U. Kang, Jilles Vreeken, and Christos Faloutsos. 2014a. VoG: Summarizing and understanding large graphs. In Proceedings of the SIAM international Conference on Data Mining (SDM’14). SIAM.Google ScholarCross Ref
- Danai Koutra, Vasileios Koutras, B. Aditya Prakash, and Christos Faloutsos. 2013. Patterns amongst competing task frequencies: Super-linearities, and the almond-DG model. In Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’13). 201--212.Google ScholarCross Ref
- Danai Koutra, Evangelos E. Papalexakis, and Christos Faloutsos. 2012. TensorSplat: Spotting latent anomalies in time. In Proceedings of the 2012 16th Panhellenic Conference on Informatics (PCI’12). IEEE, 144--149. Google ScholarDigital Library
- Danai Koutra, Neil Shah, Joshua Vogelstein, Brian Gallagher, and Christos Faloutsos. 2015. DeltaCon: A principled massive-graph similarity function with attribution. ACM Trans. Knowl. Discov. Data 10, 3, Article 28. Google ScholarDigital Library
- Danai Koutra, Joshua Vogelstein, and Christos Faloutsos. 2013. DeltaCon: A principled massive-graph similarity function. In Proceedings of the SIAM International Conference on Data Mining (SDM’13). 162--170.Google ScholarCross Ref
- Luigi Laura, Stefano Leonardi, Guido Caldarelli, and Paolo De Los Rios. 2002. A multi-layer model for the web graph. In On-Line Proceedings of the 2nd International Workshop on Web Dynamics.Google Scholar
- Matthijs Leeuwen van Leeuwen, Jilles Vreeken, and Arno Siebes. 2006. Compression picks the item sets that matter. In Proceedings of the Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD’06). 585--592.Google ScholarCross Ref
- Kristen LeFevre and Evimaria Terzi. 2010. GraSS: Graph structure summarization. In Proceedings of the SIAM International Conference on Data Mining (SDM’10). 454--465.Google ScholarCross Ref
- Jure Leskovec, Lars Backstrom, Ravi Kumar, and Andrew Tomkins. 2008. Microscopic evolution of social networks. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’08). 462--470. Google ScholarDigital Library
- Jure Leskovec and Christos Faloutsos. 2007. Scalable modeling of real graphs using Kronecker multiplication. In Proceedings of the 24th International Conference on Machine Learning (ICML’07). 497--504. Google ScholarDigital Library
- Jure Leskovec, Jon M. Kleinberg, and Christos Faloutsos. 2005. Graphs over time: Densification laws, shrinking diameters and possible explanations. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 177--187. Google ScholarDigital Library
- Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. 2014. Mining of Massive Datasets. Cambridge University Press. Google ScholarDigital Library
- Cheng-Te Li and Shou-De Lin. 2009. Egocentric information abstraction for heterogeneous social networks. In Proceedings of the International Conference on Advances in Social Network Analysis and Mining (ASONAM’09). IEEE, 255--260. Google ScholarDigital Library
- Edo Liberty. 2013. Simple and deterministic matrix sketching. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’13). ACM, 581--588. Google ScholarDigital Library
- Yongsub Lim, U. Kang, and Christos Faloutsos. 2014. SlashBurn: Graph compression and mining beyond caveman communities. IEEE Trans. Knowl. Data Eng. 26, 12 (2014), 3077--3089.Google ScholarCross Ref
- Shou-De Lin, Mi-Yen Yeh, and Cheng-Te Li. 2013. Sampling and summarization for social networks. In Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’13).Google Scholar
- Xuemin Lin, Qing Liu, Yidong Yuan, and Xiaofang Zhou. 2003. Multiscale histograms: Summarizing topological relations in large spatial datasets. In Proceedings of the International Conference on Very Large Databases (VLDB’03). 814--825. Google ScholarDigital Library
- Yu-Ru Lin, Hari Sundaram, and Aisling Kelliher. 2008. Summarization of social activity over time: People, actions and concepts in dynamic networks. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM’08). 1379--1380. Google ScholarDigital Library
- Bing Liu, Wynne Hsu, and Yiming Ma. 1999. Pruning and summarizing the discovered associations. In Proceedings of the 5th ACM SIGKDD International Conference Knowledge Discovery and Data Mining (KDD’99). 145--154. Google ScholarDigital Library
- Chunyang Liu and Ling Chen. 2016. Summarizing uncertain transaction databases by probabilistic tiles. In Proceedings of the International Joint Conference on Neural Networks (IJCNN’16). IEEE, 4375--4382.Google ScholarCross Ref
- Wei Liu, Andrey Kan, Jeffrey Chan, James Bailey, Christopher Leckie, Jian Pei, and Ramamohanarao Kotagiri. 2012. On compressing weighted time-evolving graphs. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM’12). ACM, 2319--2322. Google ScholarDigital Library
- Yike Liu, Neil Shah, and Danai Koutra. 2015. An empirical comparison of the summarization power of graph clustering methods. arXiv:1511.06820.Google Scholar
- Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M. Hellerstein. 2012. Distributed graphlab: A framework for machine learning and data mining in the cloud. Proc. VLDB Endow. 5, 8 (2012), 716--727. Google ScholarDigital Library
- Antonio Maccioni and Daniel J. Abadi. 2016. Scalable pattern matching over compressed graphs via dedensification. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’16). ACM, 1755--1764. Google ScholarDigital Library
- Arun S. Maiya and Tanya Y. Berger-Wolf. 2010. Sampling community structure. In Proceedings of the 25th International Conference Conference on the World Wide Web (WWW’10). ACM, 701--710. Google ScholarDigital Library
- M. Mampaey, J. Vreeken, and N. Tatti. 2011. Summarizing Data with Itemsets Using Maximum Entropy Models. Technical Report 2011/02. University of Antwerp.Google Scholar
- Sebastian Maneth and Fabian Peternek. 2016. Compressing graphs by grammars. In Proceedings of the 2016 IEEE 32nd International Conference on Data Engineering (ICDE’16). IEEE, 109--120.Google ScholarCross Ref
- Koji Maruhashi, Fan Guo, and Christos Faloutsos. 2011. Multiaspectforensics: Pattern mining on large-scale heterogeneous networks with tensor analysis. In Proceedings of the 2011 International Conference on Advances in Social Networks Analysis and Mining. 203--210. Google ScholarDigital Library
- Hossein Maserrat and Jian Pei. 2010. Neighbor query friendly compression of social networks. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’10). Google ScholarDigital Library
- Hossein Maserrat and Jian Pei. 2012. Community preserving lossy compression of social networks. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM’12). IEEE, 509--518. Google ScholarDigital Library
- Michael Mathioudakis, Francesco Bonchi, Carlos Castillo, Aristides Gionis, and Antti Ukkonen. 2011. Sparsification of influence networks. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’11). 529--537. Google ScholarDigital Library
- Yasir Mehmood, Nicola Barbieri, Francesco Bonchi, and Antti Ukkonen. 2013. Csi: Community-level social influence analysis. In Machine Learning and Knowledge Discovery in Databases. Springer, 48--63.Google Scholar
- Pauli Miettinen and Jilles Vreeken. 2011. Model order selection for boolean matrix factorization. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’11). 51--59. Google ScholarDigital Library
- Pauli Miettinen and Jilles Vreeken. 2014. mdl4bmf: Minimum description length for Boolean matrix factorization. ACM Trans. Knowl. Discov. Data 8, 4 (2014), 1--30. Google ScholarDigital Library
- Saket Navlakha, Rajeev Rastogi, and Nisheeth Shrivastava. 2008. Graph summarization with bounded error. In Proceedings of the ACM Special Interest Group on Management of Data (SIGMOD’08). 419--432. Google ScholarDigital Library
- Mark E. J. Newman and Michelle Girvan. 2004. Finding and evaluating community structure in networks. Phys. Rev. E 69, 2 (2004), 026113+.Google Scholar
- Carlos Ordonez, Norberto Ezquerra, and Cesar A. Santana. 2006. Constraining and summarizing association rules in medical data. Knowl. Inf. Syst. 9, 3 (2006), 259--283. Google ScholarDigital Library
- Themis Palpanas, Michail Vlachos, Eamonn J. Keogh, and Dimitrios Gunopulos. 2008. Streaming time series summarization using user-defined amnesic functions. IEEE Trans. Knowl. Data Eng. 20, 7 (2008), 992--1006. Google ScholarDigital Library
- Jia-Yu Pan, Hyung-Jeong Yang, and Christos Faloutsos. 2004. MMSS: Multi-modal story-oriented video summarization. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM’04). 491--494. Google ScholarDigital Library
- Jian Pei, Daxin Jiang, and Aidong Zhang. 2005. On mining cross-graph quasi-cliques. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’05). 228--238. Google ScholarDigital Library
- David Peleg and Alejandro A. Schäffer. 1989. Graph spanners. J. Graph Theory 13, 1 (1989), 99--116.Google ScholarCross Ref
- Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’14). ACM, 701--710. Google ScholarDigital Library
- Robert Pienta, Acar Tamersoy, Hanghang Tong, and Duen Horng Chau. 2014. MAGE: Matching approximate patterns in richly-attributed graphs. In Proceedings of the 2014 IEEE International Conference on Big Data. 585--590.Google ScholarCross Ref
- B. Aditya Prakash, Jilles Vreeken, and Christos Faloutsos. 2012. Spotting culprits in epidemics: How many and which ones? In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM’12). IEEE. Google ScholarDigital Library
- M. Purohit, B. A. Prakash, C. Kang, Y. Zhang, and V. S. Subrahmanian. 2014. Fast influence-based coarsening for large networks. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’14). ACM, 1296--1305. Google ScholarDigital Library
- Qiang Qu, Siyuan Liu, Christian S. Jensen, Feida Zhu, and Christos Faloutsos. 2014. Interestingness-driven diffusion process summarization in dynamic networks. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD’14). 597--613. Google ScholarDigital Library
- Davood Rafiei and Stephen Curial. 2005. Effectively visualizing large networks through sampling. In Proceedings of the 16th IEEE Visualization Conference (VIS’05). 48.Google Scholar
- Sriram Raghavan and Hector Garcia-Molina. 2003. Representing web graphs. In Proceedings of the 2003 IEEE International Conference on Data Engineering (ICDE’03). IEEE, 405--416.Google Scholar
- Xuguang Ren and Junhu Wang. 2015. Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs. Proc. VLDB Endow. 8, 5 (2015), 617--628. Google ScholarDigital Library
- Leonardo F. R. Ribeiro, Pedro H. P. Saverese, and Daniel R. Figueiredo. 2017. struc2vec: Learning node representations from structural identity. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’17). ACM, 385--394. Google ScholarDigital Library
- Matteo Riondato, David García-Soriano, and Francesco Bonchi. 2014. Graph summarization with quality guarantees. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM’14). IEEE, 947--952. Google ScholarDigital Library
- Ryan Rossi, Brian Gallagher, Jennifer Neville, and Keith Henderson. 2012. Role-dynamics: Fast mining of large dynamic networks. InProceedings of the 25th International Conference Companion on the World Wide Web (WWW’12 Companion). ACM, 997--1006. Google ScholarDigital Library
- T. Safavi, C. Sripada, and D. Koutra. 2017. Scalable hashing-based network discovery. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM’17). 405--414.Google Scholar
- Barna Saha and Pabitra Mitra. 2007. Dynamic algorithm for graph clustering using minimum cut tree. In Proceedings of the SIAM International Conference on Data Mining (SDM’07). 581--586. Google ScholarDigital Library
- Neil Shah, Danai Koutra, Lisa Jin, Tianmin Zou, Brian Gallagher, and Christos Faloutsos. 2017. On summarizing large-scale dynamic graphs. IEEE Data Eng. Bull. 40, 3 (2017), 75--88.Google Scholar
- Neil Shah, Danai Koutra, Tianmin Zou, Brian Gallagher, and Christos Faloutsos. 2015. TimeCrunch: Interpretable dynamic graph summarization. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’15). Google ScholarDigital Library
- P. Shannon, A. Markiel, O. Ozier, N. S. Baliga, J. T. Wang, D. Ramage, N. Amin, B. Schwikowski, and T. Ideker. 2003. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 11 (2003), 2498.Google ScholarCross Ref
- Umang Sharan and Jennifer Neville. 2008. Temporal-relational classifiers for prediction in evolving domains. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM’08). 540--549. Google ScholarDigital Library
- Z. Shen, K.-L. Ma, and T. Eliassi-Rad. 2006. Visual analysis of large heterogeneous social networks by semantic and structural abstraction. IEEE Trans. Vis. Comput. Graph. 12, 6 (2006), 1427--1439. Google ScholarDigital Library
- Lei Shi, Hanghang Tong, Jie Tang, and Chuang Lin. 2015. VEGAS: Visual influence graph summarization on citation networks. IEEE Trans. Knowl. Data Eng. 27, 12 (2015), 3417--3431. Google ScholarDigital Library
- Ben Shneiderman. 2008. Extreme visualization: Squeezing a billion records into a million pixels. In Proceedings of the ACM Special Interest Group on Management of Data (SIGMOD’08). Google ScholarDigital Library
- Mahsa Shoaran, Alex Thomo, and Jens H. Weber-Jahnke. 2013. Zero-knowledge private graph summarization. In Proceedings of the IEEE International Conference on Big Data. IEEE, 597--605.Google Scholar
- Koen Smets and Jilles Vreeken. 2011. The odd one out: Identifying and characterising anomalies. In Proceedings of the SIAM International Conference on Data Mining (SDM’11). 804--815.Google ScholarCross Ref
- Qi Song, Yinhui Wu, and Xin Luna Dong. 2016. Mining summaries for knowledge graph search. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM’16). 1215--1220.Google ScholarCross Ref
- Sucheta Soundarajan, Acar Tamersoy, Elias B. Khalil, Tina Eliassi-Rad, Duen Horng Chau, Brian Gallagher, and Kevin Roundy. 2016. Generating graph snapshots from streaming edge data. In Proceedings of the 25th International Conference Companion on the World Wide Web. 109--110. Google ScholarDigital Library
- Daniel A. Spielman and Nikhil Srivastava. 2011. Graph sparsification by effective resistances. SIAM J. Comput. 40, 6 (2011), 1913--1926. Google ScholarDigital Library
- Olaf Sporns. 2010. Networks of the Brain. MIT Press, Cambridge, MA. Google ScholarDigital Library
- Jimeng Sun, Christos Faloutsos, Spiros Papadimitriou, and Philip S. Yu. 2007. GraphScope: Parameter-free mining of large time-evolving graphs. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’07). ACM, 687--696. Google ScholarDigital Library
- Jimeng Sun, Charalampos E. Tsourakakis, Evan Hoke, Christos Faloutsos, and Tina Eliassi-Rad. 2008. Two heads better than one: Pattern discovery in time-evolving multi-aspect data. Data Min. Knowl. Discov. 17, 1 (2008), 111--128. Google ScholarDigital Library
- Jimeng Sun, Yinglian Xie, Hui Zhang, and Christos Faloutsos. 2007. Less is more: Compact matrix decomposition for large sparse graphs. In Proceedings of the SIAM International Conference on Data Mining (SDM’07).Google ScholarCross Ref
- Yizhou Sun and Jiawei Han. 2012. Mining Heterogeneous Information Networks: Principles and Methodologies. Morgan 8 Claypool. Google ScholarDigital Library
- Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web (WWW’15). 1067--1077. Google ScholarDigital Library
- Nan Tang, Qing Chen, and Prasenjit Mitra. 2016. Graph stream summarization: From big bang to big crunch. In Proceedings of the 2016 International Conference on Management of Data. ACM, 1481--1496. Google ScholarDigital Library
- Chayant Tantipathananandh and Tanya Berger-Wolf. 2011. Finding communities in dynamic social networks. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM’11). IEEE, 1236--1241. Google ScholarDigital Library
- Lini T. Thomas, Satyanarayana R. Valluri, and Kamalakar Karlapalem. 2010. MARGIN: Maximal frequent subgraph mining. ACM Trans. Knowl. Discov. Data 4, 3 (2010), 10:1--10:42. Google ScholarDigital Library
- Yuanyuan Tian, Richard A. Hankins, and Jignesh M. Patel. 2008. Efficient aggregation for graph summarization. In Proceedings of the ACM Special Interest Group on Management of Data (SIGMOD’08). ACM, 567--580. Google ScholarDigital Library
- Yuanyuan Tian and Jignesh M. Patel. 2008. TALE: A tool for approximate large graph matching. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering. 963--972. Google ScholarDigital Library
- Hannu Toivonen, Fang Zhou, Aleksi Hartikainen, and Atte Hinkka. 2011. Compression of weighted graphs. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’11). 965--973. Google ScholarDigital Library
- Hanghang Tong, Christos Faloutsos, Brian Gallagher, and Tina Eliassi-Rad. 2007. Fast best-effort pattern matching in large attributed graphs. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 737--746. Google ScholarDigital Library
- Jilles Vreeken, Matthijs van Leeuwen, and Arno Siebes. 2011. Krimp: Mining itemsets that compress. Data Min. Knowl. Disc. 23, 1 (2011), 169--214. Google ScholarDigital Library
- Bianca Wackersreuther, Peter Wackersreuther, Annahita Oswald, Christian Böhm, and Karsten M. Borgwardt. 2010. Frequent subgraph discovery in dynamic networks. In Proceedings of the 8th Workshop on Mining and Learning with Graphs. ACM, 155--162. Google ScholarDigital Library
- Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’16). Google ScholarDigital Library
- Jianyong Wang and George Karypis. 2004. SUMMARY: Efficiently summarizing transactions for clustering. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM’04). 241--248. Google ScholarDigital Library
- S. Wasserman and J. Galaskiewicz. 1994. Advances in Social Network Analysis: Research in the Social and Behavioral Sciences. SAGE Publications.Google Scholar
- Ye Wu, Zhinong Zhong, Wei Xiong, and Ning Jing. 2014. Graph summarization for attributed graphs. In Proceedings of the International Conference on Information Science, Electronics, and Electrical Engineering (ISEEE’14). IEEE, 503--507.Google ScholarCross Ref
- Yang Xiang, Ruoming Jin, David Fuhry, and Feodor Dragan. 2010. Summarizing transactional databases with overlapped hyperrectangles. Data Min. Knowl. Disc. 23, 2, 215--251. Google ScholarDigital Library
- Kevin S. Xu, Mark Kliger, and Alfred O. Hero III. 2011. Tracking communities in dynamic social networks. In Proceedings of the 4th International Conference on Social Computing, Behavioral-Cultural Modeling, 8 Prediction (SBP’11). 219--226. Google ScholarDigital Library
- Zhiqiang Xu, Yiping Ke, Yi Wang, Hong Cheng, and James Cheng. 2012. A model-based approach to attributed graph clustering. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD’12). ACM, 505--516. Google ScholarDigital Library
- Xifeng Yan, Hong Cheng, Jiawei Han, and Dong Xin. 2005. Summarizing itemset patterns: A profile-based approach. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD’05). 314--323. Google ScholarDigital Library
- Jaewon Yang and Jure Leskovec. 2013. Overlapping community detection at scale: A nonnegative matrix factorization approach. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining (WSDM’13). ACM, 587--596. Google ScholarDigital Library
- Jaewon Yang, Julian McAuley, and Jure Leskovec. 2013. Community detection in networks with node attributes. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM’13). IEEE, 1151--1156.Google ScholarCross Ref
- Liu Yang, Susan T. Dumais, Paul N. Bennett, and Ahmed Hassan Awadallah. 2017. Characterizing and predicting enterprise email reply behavior. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’17). ACM, 235--244. Google ScholarDigital Library
- Jinguo You, Qiuping Pan, Wei Shi, Zhipeng Zhang, and Jianhua Hu. 2013. Towards graph summary and aggregation: A survey. In Social Media Retrieval and Mining. Springer, 3--12.Google Scholar
- Ning Zhang, Yuanyuan Tian, and Jignesh M. Patel. 2010. Discovery-driven graph summarization. In Proceedings of the 2003 IEEE International Conference on Data Engineering (ICDE’10). 880--891.Google Scholar
- Peixiang Zhao, Charu C. Aggarwal, and Min Wang. 2011. gSketch: On query estimation in graph streams. Proc. VLDB Endow. 5, 3 (2011), 193--204. Google ScholarDigital Library
- Yang Zhou, Hong Cheng, and Jeffrey Xu Yu. 2009. Graph clustering based on structural/attribute similarities. Proc. VLDB Endow. 2, 1 (2009), 718--729. Google ScholarDigital Library
- Linhong Zhu, Majid Ghasemi-Gol, Pedro Szekely, Aram Galstyan, and Craig A. Knoblock. 2016. Unsupervised entity resolution on multi-type graphs. In Proceedings of the International Semantic Web Conference. 649--667.Google Scholar
Index Terms
- Graph Summarization Methods and Applications: A Survey
Recommendations
Incremental Lossless Graph Summarization
KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data MiningGiven a fully dynamic graph, represented as a stream of edge insertions and deletions, how can we obtain and incrementally update a lossless summary of its current snapshot? As large-scale graphs are prevalent, concisely representing them is inevitable ...
Graph Interpretation, Summarization and Visualization Techniques: A Review and Open Research Issues
AbstractGraphs has been a ubiquitous way of representing heterogeneous data. There are many studies focused on graph learning highlighting the approaches for graph data extraction, interpretation and graph summarization. Graph data summarization is ...
Opinion summarization methods
Research about some aspect-based opinion summarization methods.A new content selection strategy to produce extractive summaries is proposed.A novel NLG template-based system to generate abstractive summaries is proposed.Extractive and abstractive ...
Comments