Abstract
Network sampling is integral to the analysis of social, information, and biological networks. Since many real-world networks are massive in size, continuously evolving, and/or distributed in nature, the network structure is often sampled in order to facilitate study. For these reasons, a more thorough and complete understanding of network sampling is critical to support the field of network science. In this paper, we outline a framework for the general problem of network sampling by highlighting the different objectives, population and units of interest, and classes of network sampling methods. In addition, we propose a spectrum of computational models for network sampling methods, ranging from the traditionally studied model based on the assumption of a static domain to a more challenging model that is appropriate for streaming domains. We design a family of sampling methods based on the concept of graph induction that generalize across the full spectrum of computational models (from static to streaming) while efficiently preserving many of the topological properties of the input graphs. Furthermore, we demonstrate how traditional static sampling algorithms can be modified for graph streams for each of the three main classes of sampling methods: node, edge, and topology-based sampling. Experimental results indicate that our proposed family of sampling methods more accurately preserve the underlying properties of the graph in both static and streaming domains. Finally, we study the impact of network sampling algorithms on the parameter estimation and performance evaluation of relational classification algorithms.
- L. A. Adamic and N. Glance. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery. 36--43. Google ScholarDigital Library
- C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu. 2003. A framework for clustering evolving data streams. In Proceedings of the 29th International Conference on Very Large Data Bases. 81--92. Google ScholarDigital Library
- C. C. Aggarwal, Y. Li, P. S. Yu, and R. Jin. 2010a. On dense pattern mining in graph streams. In Proceedings of the VLDB Endowment 3, 1--2 (2010), 975--984. Google ScholarDigital Library
- C. Aggarwal, Y. Zhao, and P. Yu. 2010b. On clustering graph streams. In SIAM International Conference on Data Mining. 478--489.Google Scholar
- C. C. Aggarwal, Y. Zhao, and P. S. Yu. 2011. Outlier detection in graph streams. In IEEE 27th International Conference on Data Engineering. 399--409. Google ScholarDigital Library
- Charu C. Aggarwal. 2006. On biased reservoir sampling in the presence of stream evolution. In Proceedings of the 32nd International Conference on Very Large Data Bases. 607--618. Google ScholarDigital Library
- Charu C. Aggarwal (Ed.). 2007. Data Streams - Models and Algorithms. Advances in Database Systems, Vol. 31. Springer. Google ScholarDigital Library
- N. K. Ahmed, F. Berchmans, J. Neville, and R. Kompella. 2010a. Time-based sampling of social network activity graphs. In Proceedings of the 8th Workshop on Mining and Learning with Graphs. 1--9. Google ScholarDigital Library
- N. K. Ahmed, J. Neville, and R. Kompella. 2012a. Space-efficient sampling from social activity streams. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications. 53--60. Google ScholarDigital Library
- Nesreen Ahmed, Jennifer Neville, and Ramana Rao Kompella. 2011. Network sampling via edge-based node selection with graph induction. Technical Report 11-016. Purdue Digital Library.Google Scholar
- N. K. Ahmed, J. Neville, and R. Kompella. 2010b. Reconsidering the foundations of network sampling. In Proceedings of the 2nd Workshop on Information in Networks.Google Scholar
- N. K. Ahmed, J. Neville, and R. Kompella. 2012b. Network sampling designs for relational classification. In Proceedings of the International AAAI Conference on Weblogs and Social Media. 1--4.Google Scholar
- Y. Y. Ahn, S. Han, H. Kwak, S. Moon, and H. Jeong. 2007. Analysis of topological characteristics of huge online social networking services. In Proceedings of the 16th International Conference on World Wide Web. 835--844. Google ScholarDigital Library
- M. Al Hasan and M. J. Zaki. 2009. Output space sampling for graph patterns. Proceedings of the VLDB Endowment 2, 1 (2009), 730--741. Google ScholarDigital Library
- J. I. Alvarez-Hamelin, L. Dall’Asta, A. Barrat, and A. Vespignani. 2005. k-core decomposition of Internet graphs: Hierarchies, self-similarity and measurement biases. arXiv preprint cs/0511007 (2005).Google Scholar
- K. Avrachenkov, B. Ribeiro, and D. Towsley. 2010. Improving random walk estimation accuracy with uniform restarts. In Algorithms and Models for the Web-Graph. Lecture Notes in Computer Science, Vol. 6516. Springer, 98--109.Google Scholar
- B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. 2002a. Models and issues in data stream systems. In Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 1--16. Google ScholarDigital Library
- B. Babcock, M. Datar, and R. Motwani. 2002b. Sampling from a moving window over streaming data. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms. 633--634. Google ScholarDigital Library
- L. Backstrom and J. Kleinberg. 2011. Network bucket testing. In Proceedings of the 20th International Conference on World Wide Web. 615--624. Google ScholarDigital Library
- E. Bakshy, I. Rosenn, C. Marlow, and L. Adamic. 2012. The role of social networks in information diffusion. In Proceedings of the 21st International Conference on World Wide Web. 519--528. Google ScholarDigital Library
- Z. Bar-Yossef, R. Kumar, and D. Sivakumar. 2002. Reductions in streaming algorithms with an application to counting triangles in graphs. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms. 623--632. Google ScholarDigital Library
- E. Baykan, M. Henzinger, S. F. Keller, S. De Castelberg, and M. Kinzler. 2009. A comparison of techniques for sampling web pages. Arxiv preprint arXiv:0902.1604 (2009).Google Scholar
- Mansurul A. Bhuiyan, Mahmudur Rahman, Mahmuda Rahman, and Mohammad Al Hasan. 2012. GUISE: Uniform sampling of graphlets for large graph analysis. In IEEE 12th International Conference on Data Mining. 91--100. Google ScholarDigital Library
- A. L. Buchsbaum, R. Giancarlo, and J. R. Westbrook. 2003. On finding common neighborhoods in massive graphs. Theoretical Computer Science 299, 1 (2003), 707--718. Google ScholarDigital Library
- L. S. Buriol, G. Frahling, S. Leonardi, A. Marchetti-Spaccamela, and C. Sohler. 2006. Counting triangles in data streams. In Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 253--262. Google ScholarDigital Library
- S. Carmi, S. Havlin, S. Kirkpatrick, Y. Shavitt, and E. Shir. 2007. A model of internet topology using k-shell decomposition. Proceedings of the National Academy of Sciences 104, 27 (2007), 11150--11154.Google ScholarCross Ref
- D. Chakrabarti, Y. Zhan, and C. Faloutsos. 2004. R-MAT: A recursive model for graph mining. In SIAM International Conference on Data Mining. 442--446.Google Scholar
- M. Charikar, K. Chen, and M. Farach-Colton. 2002. Finding frequent items in data streams. In Proceedings of the 29th International Colloquium on Automata, Languages and Programming. 693--703. Google ScholarDigital Library
- L. Chen and C. Wang. 2010. Continuous subgraph pattern search over certain and uncertain graph streams. IEEE Transactions on Knowledge and Data Engineering 22, 8 (2010), 1093--1109. Google ScholarDigital Library
- M. Conover, J. Ratkiewicz, M. Francisco, B Gonçalves, A. Flammini, and F. Menczer. 2011. Political polarization on twitter. In Proceedings of the International AAAI Conference on Weblogs and Social Media. 89--96.Google Scholar
- G. Cormode and S. Muthukrishnan. 2005. Space efficient mining of multigraph streams. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 271--282. Google ScholarDigital Library
- A. Dasgupta, R. Kumar, and D. Sivakumar. 2012. Social sampling. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 235--243. Google ScholarDigital Library
- M. De Choudhury, Y. R. Lin, H. Sundaram, K. S. Candan, L. Xie, and A. Kelliher. 2010. How does the data sampling strategy impact the discovery of information diffusion in social media. In Proceedings of the International AAAI Conference on Weblogs and Social Media. 34--41.Google Scholar
- P. Domingos and G. Hulten. 2000. Mining high-speed data streams. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 71--80. Google ScholarDigital Library
- W. Fan. 2004a. StreamMiner: A classifier ensemble-based engine to mine concept-drifting data streams. In Proceedings of the 30th International Conference on Very Large Data Bases - Volume 30. 1257--1260. Google ScholarDigital Library
- W. Fan. 2004b. Systematic data selection to mine concept-drifting data streams. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 128--137. Google ScholarDigital Library
- W. Fan, Y. Huang, H. Wang, and P. S. Yu. 2004. Active mining of data streams. In SIAM International Conference on Data Mining. 457--461.Google Scholar
- O. Frank. 1977. Survey sampling in graphs. Journal of Statistical Planning and Inference 1, 3 (1977), 235--264.Google ScholarCross Ref
- O. Frank. 1980. Sampling and inference in a population graph. International Statistical Review/Revue Internationale de Statistique 48, 1 (1980), 33--41.Google ScholarCross Ref
- O. Frank. 1981. A survey of statistical methods for graph analysis. Sociological Methodology 12 (1981), 110--155.Google ScholarCross Ref
- N. Friedman, L. Getoor, D. Koller, and A. Pfeffer. 1999. Learning probabilistic relational models. In Proceedings of the 16th International Joint Conference on Artificial Intelligence. 1300--1309. Google ScholarDigital Library
- M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy. 2005. Mining data streams: A review. ACM Sigmod Record 34, 2 (2005), 18--26. Google ScholarDigital Library
- J. Gao, W. Fan, J. Han, and P. S. Yu. 2007. A general framework for mining concept-drifting data streams with skewed distributions. In SIAM International Conference on Data Mining. 3--14.Google Scholar
- Aurelien Gautreau, Alain Barrat, and Marc Barthélemy. 2009. Microdynamics in stationary complex networks. Proceedings of the National Academy of Sciences 106, 22 (2009), 8847--8852.Google ScholarCross Ref
- K. J. Gile and M. S. Handcock. 2010. Respondent-driven sampling: An assessment of current methodology. Sociological Methodology 40, 1 (2010), 285--327.Google ScholarCross Ref
- M. Gjoka, M. Kurant, C. Butts, and A. Markopoulou. 2010. Walking in Facebook: A case study of unbiased sampling of OSNs. In IEEE International Conference on Computer Communications. 1--9. Google ScholarDigital Library
- M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou. 2011. Practical recommendations on crawling online social networks. IEEE Journal on Selected Areas in Communications 29, 9 (2011), 1872--1892.Google ScholarCross Ref
- C. Gkantsidis, M. Mihail, and A. Saberi. 2004. Random walks in peer-to-peer networks. In IEEE International Conference on Computer Communications. 1--12.Google Scholar
- David F. Gleich. 2012. Graph of Flickr Photo-Sharing Social Network Crawled in May 2006. (Feb 2012). https://research.hub.purdue.edu/publications/1002.Google Scholar
- L. Golab and M. T. Özsu. 2003. Issues in data stream management. ACM Sigmod Record 32, 2 (2003), 5--14. Google ScholarDigital Library
- L. A. Goodman. 1961. Snowball sampling. The Annals of Mathematical Statistics 32, 1 (1961), 148--170.Google ScholarCross Ref
- M. Granovetter. 1976. Network sampling: Some first steps. American Journal of Sociology 81, 6 (1976), 1287--1303.Google ScholarCross Ref
- S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O’Callaghan. 2003. Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering 15, 3 (2003), 515--528. Google ScholarDigital Library
- W. H. Haemers. 1995. Interlacing eigenvalues and graphs. Linear Algebra Applications 226 (1995), 593--616.Google ScholarCross Ref
- M. H. Hansen and W. N. Hurwitz. 1943. On the theory of sampling from finite populations. The Annals of Mathematical Statistics 14, 4 (1943), 333--362.Google ScholarCross Ref
- D. D. Heckathorn. 1997. Respondent-driven sampling: A new approach to the study of hidden populations. Social Problems 2 (1997), 174--199.Google ScholarCross Ref
- M. R. Henzinger, A. Heydon, M. Mitzenmacher, and M. Najork. 2000. On near-uniform URL sampling. Computer Networks 33, 1 (2000), 295--308. Google ScholarDigital Library
- M. Henzinger, P. Raghavan, and S. Rajagopalan. 1999. Computing on data streams. In External Memory Algorithms: Dimacs Workshop External Memory and Visualization, Vol. 50. 107--118. Google ScholarDigital Library
- Christian Hubler, Hans-Peter Kriegel, Karsten M. Borgwardt, and Zoubin Ghahramani. 2008. Metropolis algorithms for representative subgraph sampling. In IEEE 8th International Conference on Data Mining. 283--292. Google ScholarDigital Library
- G. Hulten, L. Spencer, and P. Domingos. 2001. Mining time-changing data streams. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 97--106. Google ScholarDigital Library
- Lorenzo Isella, Juliette Stehlé, Alain Barrat, Ciro Cattuto, Jean-François Pinton, and Wouter Van den Broeck. 2011. What’s in a crowd? Analysis of face-to-face behavioral networks. Journal of Theoretical Biology 271, 1 (2011), 166--180.Google ScholarCross Ref
- Y. Jia, J. Hoberock, M. Garland, and J. Hart. 2008. On the visualization of social and other scale-free networks. IEEE Transactions on Visualization and Computer Graphics 14, 6 (2008), 1285--1292. Google ScholarDigital Library
- J. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. 1999. The web as a graph: Measurements, models, and methods. In Computing and Combinatorics. Lecture Notes in Computer Science, Vol. 1627. Springer, 1--17. Google ScholarDigital Library
- E. D. Kolaczyk. 2009. Statistical Analysis of Network Data. Springer, Chapter 5: Sampling and Estimation in Network Graphs, 123--152.Google Scholar
- Christine Körner and Stefan Wrobel. 2005. Bias-Free hypothesis evaluation in multirelational domains. In Proceedings of the 4th International Workshop on Multi-relational Mining. 33--38. Google ScholarDigital Library
- V. Krishnamurthy, M. Faloutsos, M. Chrobak, J. H. Cui, L. Lao, and A. G. Percus. 2007. Sampling large internet topologies for simulation purposes. Computer Networks 51, 15 (2007), 4284--4302. Google ScholarDigital Library
- R. Kumar, J. Novak, and A. Tomkins. 2006. Structure and evolution of online social networks. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 337--357. Google ScholarDigital Library
- M. Kurant, A. Markopoulou, and P. Thiran. 2011. Towards unbiased BFS sampling. IEEE Journal on Selected Areas in Communications 29, 9 (2011), 1799--1809.Google ScholarCross Ref
- Justin Lafferty. 2012. Facebook Users Worldwide: 3.2 Billion Likes And Comments Per Day. http://allfacebook.com/facebook-marketing-infographic-engagement_b98277. (August 2012).Google Scholar
- A. Lakhina, J. W. Byers, M. Crovella, and P. Xie. 2003. Sampling biases in IP topology measurements. In IEEE International Conference on Computer Communications. 332--341.Google Scholar
- L. Lee. 2001. On the effectiveness of the skew divergence for statistical language analysis. In Artificial Intelligence and Statistics. 65--72.Google Scholar
- S. Lee, P. Kim, and H. Jeong. 2006. Statistical properties of sampled networks. Physical Review E 73 (2006), 016102.Google ScholarCross Ref
- SNAP. 2013. Stanford Network Analysis Project. http://snap.stanford.edu/data/.Google Scholar
- J. Leskovec and C. Faloutsos. 2006. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 631--636. Google ScholarDigital Library
- X. Li, P. S. Yu, B. Liu, and S. K. Ng. 2009. Positive unlabeled learning for data stream classification. In SIAM International Conference on Data Mining. 259--270.Google Scholar
- Xuesong Lu and Stéphane Bressan. 2012. Sampling connected induced subgraphs uniformly at random. In Scientific and Statistical Database Management. Lecture Notes in Computer Science, Vol. 7338. Springer Berlin Heidelberg, 195--212. Google ScholarDigital Library
- S. Macskassy and F. Provost. 2007. Classification in networked data: A toolkit and a univariate case study. Journal of Machine Learning Research 8 (2007), 935--983. Google ScholarDigital Library
- A. S. Maiya and T. Y. Berger-Wolf. 2010. Sampling community structure. In Proceedings of the 19th International Conference on World Wide Web. 701--710. Google ScholarDigital Library
- A. S. Maiya and T. Y. Berger-Wolf. 2011. Benefits of bias: Towards better characterization of network sampling. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 105--113. Google ScholarDigital Library
- Gurmeet Singh, Manku and Rajeev Motwani. 2002. Approximate frequency counts over data streams. In Proceedings of the 28th International Conference on Very Large Data Bases. 346--357. Google ScholarDigital Library
- A. McGregor. 2009. Graph mining on streams. Encyclopedia of Database Systems, Springer, 1271--1275.Google Scholar
- Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement. 29--42. Google ScholarDigital Library
- S. Muthukrishnan. 2005. Data streams: Algorithms and applications. Now Publishers Inc.Google Scholar
- J. Neville, B. Gallagher, and T. Eliassi-Rad. 2009. Evaluating statistical tests for within-network classifiers of relational data. In IEEE 9th International Conference on Data Mining. 397--406. Google ScholarDigital Library
- M. E. J. Newman. 2002. Assortative mixing in networks. Physical Review Letters 89, 20 (2002), 208701.Google ScholarCross Ref
- M. Papagelis, G. Das, and N. Koudas. 2013. Sampling online social networks. IEEE Transactions on Knowledge and Data Engineering 25, 3 (2013), 662--676. Google ScholarDigital Library
- A. H. Rasti, M. Torkjazi, R. Rejaie, N. Duffield, W. Willinger, and D. Stutzbach. 2009. Respondent-driven sampling for characterizing unstructured overlays. In IEEE International Conference on Computer Communications. 2701--2705.Google Scholar
- S. Redner. 1998. How popular is your paper? An empirical study of the citation distribution. The European Physical Journal B-Condensed Matter and Complex Systems 4, 2 (1998), 131--134.Google ScholarCross Ref
- Bruno Ribeiro and Don Towsley. 2010. Estimating and sampling graphs with multidimensional random walks. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement. 390--403. Google ScholarDigital Library
- Ryan Rossi and Jennifer Neville. 2012. Time-evolving relational classification and ensemble methods. In Proceedings of the 16th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining-Vol. Part I. 1--13. Google ScholarDigital Library
- Ryan A. Rossi, Luke K. McDowell, David W. Aha, and Jennifer Neville. 2012. Transforming graph data for statistical relational learning. Journal of Artificial Intelligence Research 45, 1 (2012), 363--441. Google ScholarCross Ref
- A. D. Sarma, S. Gollapudi, and R. Panigrahy. 2008. Estimating pagerank on graph streams. In Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 69--78. Google ScholarDigital Library
- P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad. 2008. Collective classification in network data. AI magazine 29, 3 (2008), 93--106.Google Scholar
- C. Seshadhri, Ali Pinar, and Tamara G. Kolda. 2013. An in-depth analysis of stochastic Kronecker graphs. Journal of the ACM 60, 2 (2013), 1--32. Google ScholarDigital Library
- M. P. H. Stumpf, C. Wiuf, and R. M. May. 2005. Subnets of scale-free networks are not scale-free: Sampling properties of networks. Proceedings of the National Academy of Sciences 102, 12 (2005), 4221--4224.Google ScholarCross Ref
- D. Stutzbach, R. Rejaie, N. Duffield, S. Sen, and W. Willinger. 2006. On unbiased sampling for unstructured peer-to-peer networks. In Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement. 27--40. Google ScholarDigital Library
- B. Taskar, E. Segal, and D. Koller. 2001. Probabilistic classification and clustering in relational data. In Proceedings of the 17th International Joint Conference on Artificial Intelligence - Volume 2. 870--876. Google ScholarDigital Library
- N. Tatbul, U. Çetintemel, S. Zdonik, M. Cherniack, and M. Stonebraker. 2003. Load shedding in a data stream manager. In Proceedings of the 29th International Conference on Very Large Data Bases - Volume 29. 309--320. Google ScholarDigital Library
- A. Vattani, D. Chakrabarti, and M. Gurevich. 2011. Preserving personalized pagerank in subgraphs. In Proceedings of the 28th International Conference on Machine Learning. 793--800.Google Scholar
- Bimal Viswanath, Alan Mislove, Meeyoung Cha, and Krishna P. Gummadi. 2009. On the evolution of user interaction in facebook. In Proceedings of the 2nd ACM Workshop on Online Social Networks. 37--42. Google ScholarDigital Library
- Jeffrey S. Vitter. 1985. Random sampling with a reservoir. ACM Transactions on Mathematics Software 11 (1985), 37--57. Issue 1. Google ScholarDigital Library
- H. Wang, W. Fan, P. S. Yu, and J. Han. 2003. Mining concept-drifting data streams using ensemble classifiers. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 226--235. Google ScholarDigital Library
- H. Wang, P. S. Yu, and J. Han. 2010. Data Mining and Knowledge Discovery Handbook. Springer, Chapter 40: Mining Concept-Drifting Data Streams, 789--802.Google Scholar
- J. K. Watters and P. Biernacki. 1989. Targeted sampling: Options for the study of hidden populations. Social Problems 36, 4 (1989), 416--430.Google ScholarCross Ref
- Duncan J. Watts and Steven H. Strogatz. 1998. Collective dynamics of small-world networks. Nature 393, 6684 (1998), 440--442.Google Scholar
- Christo Wilson, Bryce Boe, Alessandra Sala, Krishna P. N. Puttaswamy, and Ben Y. Zhao. 2009. User interactions in social networks and their implications. In Proceedings of the 4th ACM European Conference on Computer Systems. 205--218. Google ScholarDigital Library
- R. Xiang, J. Neville, and M. Rogati. 2010. Modeling relationship strength in online social networks. In Proceedings of the 19th International Conference on World Wide Web. 981--990. Google ScholarDigital Library
- S. Ye, J. Lang, and F. Wu. 2010. Crawling online social graphs. In IEEE 12th International Asia-Pacific Web Conference. 236--242. Google ScholarDigital Library
- Sooyeon Yoon, Sungmin Lee, Soon-Hyung Yook, and Yup Kim. 2007. Statistical properties of sampled networks by random walks. Physical Review E 75 (2007), 046114.Google ScholarCross Ref
- J. Zhang. 2010. Managing and Mining Graph Data. Springer, Chapter 13: A survey on streaming algorithms for massive graphs, 393--420.Google Scholar
Index Terms
- Network Sampling: From Static to Streaming Graphs
Recommendations
Graph sample and hold: a framework for big-graph analytics
KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data miningSampling is a standard approach in big-graph analytics; the goal is to efficiently estimate the graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population. Unfortunately, ...
Towards Quantifying Sampling Bias in Network Inference
WWW '18: Companion Proceedings of the The Web Conference 2018Relational inference leverages relationships between entities and links in a network to infer information about the network from a small sample. This method is often used when global information about the network is not available or difficult to obtain. ...
Biased sampling from facebook multilayer activity network using learning automata
Although much research has been devoted to unbiased sampling of various networks, bias is not always disadvantageous, but sometimes useful. Especially for many real-world applications such as detecting influential nodes, spam users, and the most ...
Comments