skip to main content
research-article

Network Sampling: From Static to Streaming Graphs

Published:01 June 2013Publication History
Skip Abstract Section

Abstract

Network sampling is integral to the analysis of social, information, and biological networks. Since many real-world networks are massive in size, continuously evolving, and/or distributed in nature, the network structure is often sampled in order to facilitate study. For these reasons, a more thorough and complete understanding of network sampling is critical to support the field of network science. In this paper, we outline a framework for the general problem of network sampling by highlighting the different objectives, population and units of interest, and classes of network sampling methods. In addition, we propose a spectrum of computational models for network sampling methods, ranging from the traditionally studied model based on the assumption of a static domain to a more challenging model that is appropriate for streaming domains. We design a family of sampling methods based on the concept of graph induction that generalize across the full spectrum of computational models (from static to streaming) while efficiently preserving many of the topological properties of the input graphs. Furthermore, we demonstrate how traditional static sampling algorithms can be modified for graph streams for each of the three main classes of sampling methods: node, edge, and topology-based sampling. Experimental results indicate that our proposed family of sampling methods more accurately preserve the underlying properties of the graph in both static and streaming domains. Finally, we study the impact of network sampling algorithms on the parameter estimation and performance evaluation of relational classification algorithms.

References

  1. L. A. Adamic and N. Glance. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery. 36--43. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu. 2003. A framework for clustering evolving data streams. In Proceedings of the 29th International Conference on Very Large Data Bases. 81--92. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. C. C. Aggarwal, Y. Li, P. S. Yu, and R. Jin. 2010a. On dense pattern mining in graph streams. In Proceedings of the VLDB Endowment 3, 1--2 (2010), 975--984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. C. Aggarwal, Y. Zhao, and P. Yu. 2010b. On clustering graph streams. In SIAM International Conference on Data Mining. 478--489.Google ScholarGoogle Scholar
  5. C. C. Aggarwal, Y. Zhao, and P. S. Yu. 2011. Outlier detection in graph streams. In IEEE 27th International Conference on Data Engineering. 399--409. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Charu C. Aggarwal. 2006. On biased reservoir sampling in the presence of stream evolution. In Proceedings of the 32nd International Conference on Very Large Data Bases. 607--618. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Charu C. Aggarwal (Ed.). 2007. Data Streams - Models and Algorithms. Advances in Database Systems, Vol. 31. Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. N. K. Ahmed, F. Berchmans, J. Neville, and R. Kompella. 2010a. Time-based sampling of social network activity graphs. In Proceedings of the 8th Workshop on Mining and Learning with Graphs. 1--9. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. N. K. Ahmed, J. Neville, and R. Kompella. 2012a. Space-efficient sampling from social activity streams. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications. 53--60. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Nesreen Ahmed, Jennifer Neville, and Ramana Rao Kompella. 2011. Network sampling via edge-based node selection with graph induction. Technical Report 11-016. Purdue Digital Library.Google ScholarGoogle Scholar
  11. N. K. Ahmed, J. Neville, and R. Kompella. 2010b. Reconsidering the foundations of network sampling. In Proceedings of the 2nd Workshop on Information in Networks.Google ScholarGoogle Scholar
  12. N. K. Ahmed, J. Neville, and R. Kompella. 2012b. Network sampling designs for relational classification. In Proceedings of the International AAAI Conference on Weblogs and Social Media. 1--4.Google ScholarGoogle Scholar
  13. Y. Y. Ahn, S. Han, H. Kwak, S. Moon, and H. Jeong. 2007. Analysis of topological characteristics of huge online social networking services. In Proceedings of the 16th International Conference on World Wide Web. 835--844. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. Al Hasan and M. J. Zaki. 2009. Output space sampling for graph patterns. Proceedings of the VLDB Endowment 2, 1 (2009), 730--741. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. I. Alvarez-Hamelin, L. Dall’Asta, A. Barrat, and A. Vespignani. 2005. k-core decomposition of Internet graphs: Hierarchies, self-similarity and measurement biases. arXiv preprint cs/0511007 (2005).Google ScholarGoogle Scholar
  16. K. Avrachenkov, B. Ribeiro, and D. Towsley. 2010. Improving random walk estimation accuracy with uniform restarts. In Algorithms and Models for the Web-Graph. Lecture Notes in Computer Science, Vol. 6516. Springer, 98--109.Google ScholarGoogle Scholar
  17. B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. 2002a. Models and issues in data stream systems. In Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 1--16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. B. Babcock, M. Datar, and R. Motwani. 2002b. Sampling from a moving window over streaming data. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms. 633--634. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. L. Backstrom and J. Kleinberg. 2011. Network bucket testing. In Proceedings of the 20th International Conference on World Wide Web. 615--624. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. E. Bakshy, I. Rosenn, C. Marlow, and L. Adamic. 2012. The role of social networks in information diffusion. In Proceedings of the 21st International Conference on World Wide Web. 519--528. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Z. Bar-Yossef, R. Kumar, and D. Sivakumar. 2002. Reductions in streaming algorithms with an application to counting triangles in graphs. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms. 623--632. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. E. Baykan, M. Henzinger, S. F. Keller, S. De Castelberg, and M. Kinzler. 2009. A comparison of techniques for sampling web pages. Arxiv preprint arXiv:0902.1604 (2009).Google ScholarGoogle Scholar
  23. Mansurul A. Bhuiyan, Mahmudur Rahman, Mahmuda Rahman, and Mohammad Al Hasan. 2012. GUISE: Uniform sampling of graphlets for large graph analysis. In IEEE 12th International Conference on Data Mining. 91--100. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. A. L. Buchsbaum, R. Giancarlo, and J. R. Westbrook. 2003. On finding common neighborhoods in massive graphs. Theoretical Computer Science 299, 1 (2003), 707--718. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. L. S. Buriol, G. Frahling, S. Leonardi, A. Marchetti-Spaccamela, and C. Sohler. 2006. Counting triangles in data streams. In Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 253--262. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. S. Carmi, S. Havlin, S. Kirkpatrick, Y. Shavitt, and E. Shir. 2007. A model of internet topology using k-shell decomposition. Proceedings of the National Academy of Sciences 104, 27 (2007), 11150--11154.Google ScholarGoogle ScholarCross RefCross Ref
  27. D. Chakrabarti, Y. Zhan, and C. Faloutsos. 2004. R-MAT: A recursive model for graph mining. In SIAM International Conference on Data Mining. 442--446.Google ScholarGoogle Scholar
  28. M. Charikar, K. Chen, and M. Farach-Colton. 2002. Finding frequent items in data streams. In Proceedings of the 29th International Colloquium on Automata, Languages and Programming. 693--703. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. L. Chen and C. Wang. 2010. Continuous subgraph pattern search over certain and uncertain graph streams. IEEE Transactions on Knowledge and Data Engineering 22, 8 (2010), 1093--1109. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. M. Conover, J. Ratkiewicz, M. Francisco, B Gonçalves, A. Flammini, and F. Menczer. 2011. Political polarization on twitter. In Proceedings of the International AAAI Conference on Weblogs and Social Media. 89--96.Google ScholarGoogle Scholar
  31. G. Cormode and S. Muthukrishnan. 2005. Space efficient mining of multigraph streams. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 271--282. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. A. Dasgupta, R. Kumar, and D. Sivakumar. 2012. Social sampling. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 235--243. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. M. De Choudhury, Y. R. Lin, H. Sundaram, K. S. Candan, L. Xie, and A. Kelliher. 2010. How does the data sampling strategy impact the discovery of information diffusion in social media. In Proceedings of the International AAAI Conference on Weblogs and Social Media. 34--41.Google ScholarGoogle Scholar
  34. P. Domingos and G. Hulten. 2000. Mining high-speed data streams. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 71--80. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. W. Fan. 2004a. StreamMiner: A classifier ensemble-based engine to mine concept-drifting data streams. In Proceedings of the 30th International Conference on Very Large Data Bases - Volume 30. 1257--1260. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. W. Fan. 2004b. Systematic data selection to mine concept-drifting data streams. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 128--137. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. W. Fan, Y. Huang, H. Wang, and P. S. Yu. 2004. Active mining of data streams. In SIAM International Conference on Data Mining. 457--461.Google ScholarGoogle Scholar
  38. O. Frank. 1977. Survey sampling in graphs. Journal of Statistical Planning and Inference 1, 3 (1977), 235--264.Google ScholarGoogle ScholarCross RefCross Ref
  39. O. Frank. 1980. Sampling and inference in a population graph. International Statistical Review/Revue Internationale de Statistique 48, 1 (1980), 33--41.Google ScholarGoogle ScholarCross RefCross Ref
  40. O. Frank. 1981. A survey of statistical methods for graph analysis. Sociological Methodology 12 (1981), 110--155.Google ScholarGoogle ScholarCross RefCross Ref
  41. N. Friedman, L. Getoor, D. Koller, and A. Pfeffer. 1999. Learning probabilistic relational models. In Proceedings of the 16th International Joint Conference on Artificial Intelligence. 1300--1309. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy. 2005. Mining data streams: A review. ACM Sigmod Record 34, 2 (2005), 18--26. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. J. Gao, W. Fan, J. Han, and P. S. Yu. 2007. A general framework for mining concept-drifting data streams with skewed distributions. In SIAM International Conference on Data Mining. 3--14.Google ScholarGoogle Scholar
  44. Aurelien Gautreau, Alain Barrat, and Marc Barthélemy. 2009. Microdynamics in stationary complex networks. Proceedings of the National Academy of Sciences 106, 22 (2009), 8847--8852.Google ScholarGoogle ScholarCross RefCross Ref
  45. K. J. Gile and M. S. Handcock. 2010. Respondent-driven sampling: An assessment of current methodology. Sociological Methodology 40, 1 (2010), 285--327.Google ScholarGoogle ScholarCross RefCross Ref
  46. M. Gjoka, M. Kurant, C. Butts, and A. Markopoulou. 2010. Walking in Facebook: A case study of unbiased sampling of OSNs. In IEEE International Conference on Computer Communications. 1--9. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou. 2011. Practical recommendations on crawling online social networks. IEEE Journal on Selected Areas in Communications 29, 9 (2011), 1872--1892.Google ScholarGoogle ScholarCross RefCross Ref
  48. C. Gkantsidis, M. Mihail, and A. Saberi. 2004. Random walks in peer-to-peer networks. In IEEE International Conference on Computer Communications. 1--12.Google ScholarGoogle Scholar
  49. David F. Gleich. 2012. Graph of Flickr Photo-Sharing Social Network Crawled in May 2006. (Feb 2012). https://research.hub.purdue.edu/publications/1002.Google ScholarGoogle Scholar
  50. L. Golab and M. T. Özsu. 2003. Issues in data stream management. ACM Sigmod Record 32, 2 (2003), 5--14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. L. A. Goodman. 1961. Snowball sampling. The Annals of Mathematical Statistics 32, 1 (1961), 148--170.Google ScholarGoogle ScholarCross RefCross Ref
  52. M. Granovetter. 1976. Network sampling: Some first steps. American Journal of Sociology 81, 6 (1976), 1287--1303.Google ScholarGoogle ScholarCross RefCross Ref
  53. S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O’Callaghan. 2003. Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering 15, 3 (2003), 515--528. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. W. H. Haemers. 1995. Interlacing eigenvalues and graphs. Linear Algebra Applications 226 (1995), 593--616.Google ScholarGoogle ScholarCross RefCross Ref
  55. M. H. Hansen and W. N. Hurwitz. 1943. On the theory of sampling from finite populations. The Annals of Mathematical Statistics 14, 4 (1943), 333--362.Google ScholarGoogle ScholarCross RefCross Ref
  56. D. D. Heckathorn. 1997. Respondent-driven sampling: A new approach to the study of hidden populations. Social Problems 2 (1997), 174--199.Google ScholarGoogle ScholarCross RefCross Ref
  57. M. R. Henzinger, A. Heydon, M. Mitzenmacher, and M. Najork. 2000. On near-uniform URL sampling. Computer Networks 33, 1 (2000), 295--308. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. M. Henzinger, P. Raghavan, and S. Rajagopalan. 1999. Computing on data streams. In External Memory Algorithms: Dimacs Workshop External Memory and Visualization, Vol. 50. 107--118. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Christian Hubler, Hans-Peter Kriegel, Karsten M. Borgwardt, and Zoubin Ghahramani. 2008. Metropolis algorithms for representative subgraph sampling. In IEEE 8th International Conference on Data Mining. 283--292. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. G. Hulten, L. Spencer, and P. Domingos. 2001. Mining time-changing data streams. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 97--106. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Lorenzo Isella, Juliette Stehlé, Alain Barrat, Ciro Cattuto, Jean-François Pinton, and Wouter Van den Broeck. 2011. What’s in a crowd? Analysis of face-to-face behavioral networks. Journal of Theoretical Biology 271, 1 (2011), 166--180.Google ScholarGoogle ScholarCross RefCross Ref
  62. Y. Jia, J. Hoberock, M. Garland, and J. Hart. 2008. On the visualization of social and other scale-free networks. IEEE Transactions on Visualization and Computer Graphics 14, 6 (2008), 1285--1292. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. J. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. 1999. The web as a graph: Measurements, models, and methods. In Computing and Combinatorics. Lecture Notes in Computer Science, Vol. 1627. Springer, 1--17. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. E. D. Kolaczyk. 2009. Statistical Analysis of Network Data. Springer, Chapter 5: Sampling and Estimation in Network Graphs, 123--152.Google ScholarGoogle Scholar
  65. Christine Körner and Stefan Wrobel. 2005. Bias-Free hypothesis evaluation in multirelational domains. In Proceedings of the 4th International Workshop on Multi-relational Mining. 33--38. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. V. Krishnamurthy, M. Faloutsos, M. Chrobak, J. H. Cui, L. Lao, and A. G. Percus. 2007. Sampling large internet topologies for simulation purposes. Computer Networks 51, 15 (2007), 4284--4302. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. R. Kumar, J. Novak, and A. Tomkins. 2006. Structure and evolution of online social networks. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 337--357. Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. M. Kurant, A. Markopoulou, and P. Thiran. 2011. Towards unbiased BFS sampling. IEEE Journal on Selected Areas in Communications 29, 9 (2011), 1799--1809.Google ScholarGoogle ScholarCross RefCross Ref
  69. Justin Lafferty. 2012. Facebook Users Worldwide: 3.2 Billion Likes And Comments Per Day. http://allfacebook.com/facebook-marketing-infographic-engagement_b98277. (August 2012).Google ScholarGoogle Scholar
  70. A. Lakhina, J. W. Byers, M. Crovella, and P. Xie. 2003. Sampling biases in IP topology measurements. In IEEE International Conference on Computer Communications. 332--341.Google ScholarGoogle Scholar
  71. L. Lee. 2001. On the effectiveness of the skew divergence for statistical language analysis. In Artificial Intelligence and Statistics. 65--72.Google ScholarGoogle Scholar
  72. S. Lee, P. Kim, and H. Jeong. 2006. Statistical properties of sampled networks. Physical Review E 73 (2006), 016102.Google ScholarGoogle ScholarCross RefCross Ref
  73. SNAP. 2013. Stanford Network Analysis Project. http://snap.stanford.edu/data/.Google ScholarGoogle Scholar
  74. J. Leskovec and C. Faloutsos. 2006. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 631--636. Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. X. Li, P. S. Yu, B. Liu, and S. K. Ng. 2009. Positive unlabeled learning for data stream classification. In SIAM International Conference on Data Mining. 259--270.Google ScholarGoogle Scholar
  76. Xuesong Lu and Stéphane Bressan. 2012. Sampling connected induced subgraphs uniformly at random. In Scientific and Statistical Database Management. Lecture Notes in Computer Science, Vol. 7338. Springer Berlin Heidelberg, 195--212. Google ScholarGoogle ScholarDigital LibraryDigital Library
  77. S. Macskassy and F. Provost. 2007. Classification in networked data: A toolkit and a univariate case study. Journal of Machine Learning Research 8 (2007), 935--983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. A. S. Maiya and T. Y. Berger-Wolf. 2010. Sampling community structure. In Proceedings of the 19th International Conference on World Wide Web. 701--710. Google ScholarGoogle ScholarDigital LibraryDigital Library
  79. A. S. Maiya and T. Y. Berger-Wolf. 2011. Benefits of bias: Towards better characterization of network sampling. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 105--113. Google ScholarGoogle ScholarDigital LibraryDigital Library
  80. Gurmeet Singh, Manku and Rajeev Motwani. 2002. Approximate frequency counts over data streams. In Proceedings of the 28th International Conference on Very Large Data Bases. 346--357. Google ScholarGoogle ScholarDigital LibraryDigital Library
  81. A. McGregor. 2009. Graph mining on streams. Encyclopedia of Database Systems, Springer, 1271--1275.Google ScholarGoogle Scholar
  82. Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement. 29--42. Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. S. Muthukrishnan. 2005. Data streams: Algorithms and applications. Now Publishers Inc.Google ScholarGoogle Scholar
  84. J. Neville, B. Gallagher, and T. Eliassi-Rad. 2009. Evaluating statistical tests for within-network classifiers of relational data. In IEEE 9th International Conference on Data Mining. 397--406. Google ScholarGoogle ScholarDigital LibraryDigital Library
  85. M. E. J. Newman. 2002. Assortative mixing in networks. Physical Review Letters 89, 20 (2002), 208701.Google ScholarGoogle ScholarCross RefCross Ref
  86. M. Papagelis, G. Das, and N. Koudas. 2013. Sampling online social networks. IEEE Transactions on Knowledge and Data Engineering 25, 3 (2013), 662--676. Google ScholarGoogle ScholarDigital LibraryDigital Library
  87. A. H. Rasti, M. Torkjazi, R. Rejaie, N. Duffield, W. Willinger, and D. Stutzbach. 2009. Respondent-driven sampling for characterizing unstructured overlays. In IEEE International Conference on Computer Communications. 2701--2705.Google ScholarGoogle Scholar
  88. S. Redner. 1998. How popular is your paper? An empirical study of the citation distribution. The European Physical Journal B-Condensed Matter and Complex Systems 4, 2 (1998), 131--134.Google ScholarGoogle ScholarCross RefCross Ref
  89. Bruno Ribeiro and Don Towsley. 2010. Estimating and sampling graphs with multidimensional random walks. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement. 390--403. Google ScholarGoogle ScholarDigital LibraryDigital Library
  90. Ryan Rossi and Jennifer Neville. 2012. Time-evolving relational classification and ensemble methods. In Proceedings of the 16th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining-Vol. Part I. 1--13. Google ScholarGoogle ScholarDigital LibraryDigital Library
  91. Ryan A. Rossi, Luke K. McDowell, David W. Aha, and Jennifer Neville. 2012. Transforming graph data for statistical relational learning. Journal of Artificial Intelligence Research 45, 1 (2012), 363--441. Google ScholarGoogle ScholarCross RefCross Ref
  92. A. D. Sarma, S. Gollapudi, and R. Panigrahy. 2008. Estimating pagerank on graph streams. In Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 69--78. Google ScholarGoogle ScholarDigital LibraryDigital Library
  93. P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad. 2008. Collective classification in network data. AI magazine 29, 3 (2008), 93--106.Google ScholarGoogle Scholar
  94. C. Seshadhri, Ali Pinar, and Tamara G. Kolda. 2013. An in-depth analysis of stochastic Kronecker graphs. Journal of the ACM 60, 2 (2013), 1--32. Google ScholarGoogle ScholarDigital LibraryDigital Library
  95. M. P. H. Stumpf, C. Wiuf, and R. M. May. 2005. Subnets of scale-free networks are not scale-free: Sampling properties of networks. Proceedings of the National Academy of Sciences 102, 12 (2005), 4221--4224.Google ScholarGoogle ScholarCross RefCross Ref
  96. D. Stutzbach, R. Rejaie, N. Duffield, S. Sen, and W. Willinger. 2006. On unbiased sampling for unstructured peer-to-peer networks. In Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement. 27--40. Google ScholarGoogle ScholarDigital LibraryDigital Library
  97. B. Taskar, E. Segal, and D. Koller. 2001. Probabilistic classification and clustering in relational data. In Proceedings of the 17th International Joint Conference on Artificial Intelligence - Volume 2. 870--876. Google ScholarGoogle ScholarDigital LibraryDigital Library
  98. N. Tatbul, U. Çetintemel, S. Zdonik, M. Cherniack, and M. Stonebraker. 2003. Load shedding in a data stream manager. In Proceedings of the 29th International Conference on Very Large Data Bases - Volume 29. 309--320. Google ScholarGoogle ScholarDigital LibraryDigital Library
  99. A. Vattani, D. Chakrabarti, and M. Gurevich. 2011. Preserving personalized pagerank in subgraphs. In Proceedings of the 28th International Conference on Machine Learning. 793--800.Google ScholarGoogle Scholar
  100. Bimal Viswanath, Alan Mislove, Meeyoung Cha, and Krishna P. Gummadi. 2009. On the evolution of user interaction in facebook. In Proceedings of the 2nd ACM Workshop on Online Social Networks. 37--42. Google ScholarGoogle ScholarDigital LibraryDigital Library
  101. Jeffrey S. Vitter. 1985. Random sampling with a reservoir. ACM Transactions on Mathematics Software 11 (1985), 37--57. Issue 1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  102. H. Wang, W. Fan, P. S. Yu, and J. Han. 2003. Mining concept-drifting data streams using ensemble classifiers. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 226--235. Google ScholarGoogle ScholarDigital LibraryDigital Library
  103. H. Wang, P. S. Yu, and J. Han. 2010. Data Mining and Knowledge Discovery Handbook. Springer, Chapter 40: Mining Concept-Drifting Data Streams, 789--802.Google ScholarGoogle Scholar
  104. J. K. Watters and P. Biernacki. 1989. Targeted sampling: Options for the study of hidden populations. Social Problems 36, 4 (1989), 416--430.Google ScholarGoogle ScholarCross RefCross Ref
  105. Duncan J. Watts and Steven H. Strogatz. 1998. Collective dynamics of small-world networks. Nature 393, 6684 (1998), 440--442.Google ScholarGoogle Scholar
  106. Christo Wilson, Bryce Boe, Alessandra Sala, Krishna P. N. Puttaswamy, and Ben Y. Zhao. 2009. User interactions in social networks and their implications. In Proceedings of the 4th ACM European Conference on Computer Systems. 205--218. Google ScholarGoogle ScholarDigital LibraryDigital Library
  107. R. Xiang, J. Neville, and M. Rogati. 2010. Modeling relationship strength in online social networks. In Proceedings of the 19th International Conference on World Wide Web. 981--990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  108. S. Ye, J. Lang, and F. Wu. 2010. Crawling online social graphs. In IEEE 12th International Asia-Pacific Web Conference. 236--242. Google ScholarGoogle ScholarDigital LibraryDigital Library
  109. Sooyeon Yoon, Sungmin Lee, Soon-Hyung Yook, and Yup Kim. 2007. Statistical properties of sampled networks by random walks. Physical Review E 75 (2007), 046114.Google ScholarGoogle ScholarCross RefCross Ref
  110. J. Zhang. 2010. Managing and Mining Graph Data. Springer, Chapter 13: A survey on streaming algorithms for massive graphs, 393--420.Google ScholarGoogle Scholar

Index Terms

  1. Network Sampling: From Static to Streaming Graphs

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Knowledge Discovery from Data
      ACM Transactions on Knowledge Discovery from Data  Volume 8, Issue 2
      June 2014
      161 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/2630935
      Issue’s Table of Contents

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 June 2013
      • Accepted: 1 May 2013
      • Revised: 1 April 2013
      • Received: 1 November 2012
      Published in tkdd Volume 8, Issue 2

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader