ABSTRACT
Determining the frequencies and the distribution of small subgraph patterns in a large input graph is an important part of many graph based mining tasks such as Frequent Subgraph Mining (FSM) and Motif Detection. Due to the exponential number of such graph patterns the interpretation of the mining results is mostly limited to finding unexpectedly frequent patterns, and in general identifying few particularly interesting patterns to then understand their function in the network. However, the full distribution of patterns itself encodes much more information about the underlying graph. Looking at this pattern distribution could be of particular interest in Social Network Analysis because social networks seem to have more random noise and less hard structural constraints compared to other types of graphs. This makes it unlikely to find meaning in only the most frequent patterns. In this paper we contribute in two ways to the usage of pattern distributions in networks analysis. First, we introduce two different sampling-based algorithms for computing the mentioned pattern distributions. Second we sketch two original ideas which can be used to harness the information in pattern distributions for gaining insights on global and local properties of Social Networks.
- Stephen P. Borgatti and Martin G. Everett. "A Graph-theoretic perspective on centrality". In: Social Networks 28.4 (2006), pp. 466--484.Google ScholarCross Ref
- Thomas F. Coleman and Jorge J. Moré. "Estimation of Sparse Jacobian Matrices and Graph Coloring Blems". In: SIAM Journal on Numerical Analysis 20 (1983), pp. 187--209.Google ScholarCross Ref
- Mohammed Elseidy et al. "GraMi: Frequent Subgraph and Pattern Mining in a Single Large Graph". In: Proc. VLDB Endow. 7.7 (Mar. 2014), pp. 517--528. Google ScholarDigital Library
- Neil Zhenqiang Gong et al. "Predicting Links and Inferring Attributes using a Social-Attribute Network (SAN)". In: CoRR abs/1112.3265 (2011).Google Scholar
- Xu Hong-lin et al. "Social Network Analysis Based on Network Motifs". In: Journal of Applied Mathematics 2014 (2014).Google Scholar
- Jun Huan et al. "SPIN: Mining Maximal Frequent Subgraphs from Graph Databases". In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '04. Seattle, WA, USA: ACM, 2004, pp. 581--586. Google ScholarDigital Library
- Akihiro Inokuchi, Takashi Washio, and Hiroshi Motoda. "An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data". In: Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery. PKDD '00. London, UK, UK: Springer-Verlag, 2000, pp. 13--23. Google ScholarDigital Library
- Zahra Razaghi Moghadam Kashani et al. "Kavosh: a new algorithm for finding network motifs". In: BMC Bioinformatics 10.1 (2009), pp. 1--12.Google Scholar
- N. Kashtan et al. "Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs". In: Bioinformatics 20.11 (July 22, 2004), pp. 1746--1758. Google ScholarDigital Library
- Nikhil S. Ketkar, Lawrence B. Holder, and Diane J. Cook. "Subdue: Compression-based Frequent Pattern Discovery in Graph Data". In: Proceedings of the 1st International Workshop on Open Source Data Mining: Frequent Pattern Mining Implementations. OSDM '05. Chicago, Illinois: ACM, 2005, pp. 71--76. Google ScholarDigital Library
- Michihiro Kuramochi and George Karypis. "Frequent Subgraph Discovery". In: Proceedings of the 2001 IEEE International Conference on Data Mining. ICDM '01. Washington, DC, USA: IEEE Computer Society, 2001, pp. 313--320. Google ScholarDigital Library
- Jure Leskovec and Christos Faloutsos. "Sampling from large graphs". In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM. 2006, pp. 631--636. Google ScholarDigital Library
- David Asher Levin, Yuval Peres, and Elizabeth Lee Wilmer. Markov chains and mixing times. With a chapter on coupling from the past by James G. Propp and David B. Wilson. Providence, R.I. American Mathematical Society, 2009.Google Scholar
- Rong-Hua Li et al. "On random walk based graph sampling". In: 2015 IEEE 31st International Conference on Data Engineering. IEEE. 2015, pp. 927--938.Google Scholar
- Xin Li et al. "NetMODE: Network Motif Detection without Nauty". In: PLoS One 7.12 (Dec. 18, 2012). Ed. by Luís A. Nunes Amaral. 23272055{pmid}, e50093.Google Scholar
- Kai Liu, William K. Cheung, and Jiming Liu. "Stochastic Network Motif Detection in Social Media". In: 2011 IEEE 11th International Conference on Data Mining Workshops. IEEE, Dec. 2011, pp. 949--956. Google ScholarDigital Library
- Xuesong Lu and Stéphane Bressan. "Sampling Connected Induced Subgraphs Uniformly at Random". In: Proceedings of the 24th International Conference on Scientific and Statistical Database Management. SSDBM'12. Chania, Crete, Greece: Springer-Verlag, 2012, pp. 195--212. Google ScholarDigital Library
- Brendan D McKay and Adolfo Piperno. "Practical graph isomorphism, II". In: Journal of Symbolic Computation 60 (2014), pp. 94--112. Google ScholarDigital Library
- Tijana Milenković and Nataša Pržulj. "Uncovering biological network function via graphlet degree signatures." In: Cancer informatics 6 (2008), pp. 257--273.Google ScholarCross Ref
- R. Milo et al. "Network Motifs: Simple Building Blocks of Complex Networks". In: Science 298.5594 (2002), pp. 824--827.Google ScholarCross Ref
- M. E. J. Newman. "Assortative Mixing in Networks". In: Phys. Rev. Lett. 89 (20 Oct. 2002), p. 208701.Google ScholarCross Ref
- Chiara Orsini et al. "Quantifying randomness in real networks". In: Nature communications 6 (2015).Google Scholar
- Tanay Kumar Saha and Mohammad Al Hasan. "FS3: A sampling based method for top-k frequent subgraph mining". In: Statistical Analysis and Data Mining: The ASA Data Science Journal 8.4 (2015), pp. 245--261. Google ScholarDigital Library
- Alistair Sinclair and Mark Jerrum. "Approximate Counting, Uniform Generation and Rapidly Mixing Markov Chains". In: Inf. Comput. 82.1 (July 1989), pp. 93--133. Google ScholarDigital Library
- Duncan J. Watts and Steven H. Strogatz. "Collective dynamics of 'small-world' networks". In: Nature 393.6684 (June 4, 1998), pp. 440--442.Google ScholarCross Ref
- Sebastian Wernicke and Florian Rasche. "FANMOD: a tool for fast network motif detection". In: Bioinformatics 22.9 (2006), pp. 1152--1153. Google ScholarDigital Library
- Xifeng Yan and Jiawei Han. "gSpan: graph-based substructure pattern mining". In: Data Mining, 2002. ICDM 2003. Proceedings. 2002 IEEE International Conference on. IEEE, 2002, pp. 721--724. Google ScholarDigital Library
- Xiaowei Ying, Leting Wu, and Xintao Wu. "A spectrum-based framework for quantifying randomness of social networks". In: Knowledge and Data Engineering, IEEE Transactions on 23.12 (2011), pp. 1842--1856. Google ScholarDigital Library
- Elena Zheleva and Lise Getoor. "To Join or not to Join: The Illusion of Privacy in Social Networks with Mixed Public and Private User Profiles". In: 18th International World Wide Web conference (WWW). Apr. 2009. Google ScholarDigital Library
Recommendations
Social Network Analysis of Popular YouTube Videos via Vertical Quantitative Mining
ASONAM '22: Proceedings of the 2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and MiningFrequent itemset (or frequent pattern) mining is a technique used in big data mining to discover frequently occurring sets of items (such as popular co-purchased merchandise) and has numerous applications in the field of databases. Traditional ...
Expressions for Rényi and Shannon entropies for bivariate distributions
Exact forms of Rényi and Shannon entropies are determined for 27 continuous bivariate distributions, including the Kotz type distribution, truncated normal distribution, distributions with normal and centered normal conditionals, natural exponential ...
Second-order heavy-tailed distributions and tail analysis
This correspondence studies the second-order distributions of heavy-tail distributed random variables (RVs). Two models for the heavy-tailed distributions are considered: power law and epsi-contaminated distributions. Special cases of the models ...
Comments