ABSTRACT
Probabilistic models of network growth have been extensively studied as idealized representations of network evolution. Models, such as the Kronecker model, duplication-based models, and preferential attachment models, have been used for tasks such as representing null models, detecting anomalies, algorithm testing, and developing an understanding of various mechanistic growth processes. However, developing a new growth model to fit observed properties of a network is a difficult task, and as new networks are studied, new models must constantly be developed. Here, we present a framework, called GrowCode, for the automatic discovery of novel growth models that match user-specified topological features in undirected graphs. GrowCode introduces a set of basic commands that are general enough to encode several previously developed models. Coupling this formal representation with an optimization approach, we show that GrowCode is able to discover models for protein interaction networks, autonomous systems networks, and scientific collaboration networks that closely match properties such as the degree distribution, the clustering coefficient, and assortativity that are observed in real networks of these classes. Additional tests on simulated networks show that the models learned by GrowCode generate distributions of graphs with similar variance as existing models for these classes.
Supplemental Material
- L. Akoglu and C. Faloutsos. RTG: a recursive realistic graph generator using random typing. Data Mining and Knowledge Discovery, 19(2):194--209, 2009. Google ScholarDigital Library
- A. Barabási. Emergence of Scaling in Random Networks. Science, 286(5439):509--512, 1999.Google Scholar
- A. Bhan, D. Galas, and T. Dewey. A duplication growth model of gene expression networks. Bioinformatics, 18:1486--1493, 2002.Google ScholarCross Ref
- M. Brameier. On linear genetic programming. PhD thesis, University of Hamburg, 2004.Google Scholar
- B. K. Bulik-Sullivan and P. F. Sullivan. The authorship network of genome-wide association studies. Nature Genetics, 44(2):113--113, 2012.Google ScholarCross Ref
- D. Callaway, J. E. Hopcroft, J. M. Kleinberg, M. E. Newman, and S. H. Strogatz. Are randomly grown graphs really random? Phys. Rev. E, 64:041902--041908, 2001.Google ScholarCross Ref
- A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-Law Distributions in Empirical Data. SIAM Review, 51(4):661, 2009. Google ScholarDigital Library
- K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182--197, 2002. Google ScholarDigital Library
- S. N. Dorogovtsev, J. F. Mendes, and A. N. Samukhin. Structure of growing networks with preferential linking. Phys. Rev. Lett., 85(21), 2000.Google ScholarCross Ref
- P. Erdős and A. Rényi. On the evolution of random graphs. Evolution, 5(1):17--61, 1960.Google Scholar
- T. A. Gibson and D. S. Goldberg. Improving evolutionary models of protein interaction networks. Bioinformatics, 27(3):376--382, 2011. Google ScholarDigital Library
- I. Ispolatov, P. L. Krapivsky, and A. Yuryev. Duplication-divergence model of protein interaction network. Phys. Rev. E, 71(6 Pt 1):061911, 2005.Google Scholar
- M. Kim and J. Leskovec. Multiplicative attribute graph model of real-world networks. Internet Mathematics, 8(1-2):113--160, 2012.Google ScholarCross Ref
- W. K. Kim and E. M. Marcotte. Age-dependent evolution of the yeast protein interaction network suggests a limited role of gene duplication and divergence. PLoS Comput. Biol., 4(11):e1000232, 2008.Google ScholarCross Ref
- R. Kumar, J. Novak, and A. Tomkins. Structure and evolution of online social networks. In Proceedings of SIGKDD Conference on Knowledge Discovery and Data Mining, pages 611--617, 2006. Google ScholarDigital Library
- J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani. Kronecker Graphs: An Approach to Modeling Networks. The Journal of Machine Learning Research, 11:985--1042, 2010. Google ScholarDigital Library
- J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time. In Proceedings of SIGKDD Conference on Knowledge Discovery and Data Mining, pages 177--187, New York, USA, 2005. Google ScholarDigital Library
- S. Luke. Issues in Scaling Genetic Programming: Breeding Strategies, Tree Generation, and Code Bloat. PhD thesis, University of Maryland, College Park, 2000.Google Scholar
- M. Middendorf, E. Ziv, C. Adams, J. Hom, R. Koytcheff, C. Levovitz, G. Woods, L. Chen, and C. Wiggins. Discriminative topological features reveal biological network mechanisms. BMC Bioinformatics, 5:181, 2004.Google ScholarCross Ref
- M. Middendorf, E. Ziv, and C. Wiggins. Inferring network mechanisms: The Drosophila melanogaster protein interaction network. Proc. Natl. Acad. Sci. USA, 102:3192--3197, 2005.Google ScholarCross Ref
- S. Navlakha and C. Kingsford. Network archaeology: Uncovering ancient networks from present-day interactions. PLoS Comput. Biol., 7(4):e1001119, 2011.Google ScholarCross Ref
- G. Palla, L. Lovász, and T. Vicsek. Multifractal network generator. Proc. Natl. Acad. Sci. USA, 107:7640--7645, 2010.Google ScholarCross Ref
- N. Przulj, O. Kuchaiev, A. Stevanovic, and W. Hayes. Geometric evolutionary dynamics of protein interaction networks. Pac. Symp. Biocomput., 15:178--189, 2010.Google Scholar
- R. Solé, R. Pastor-Satorras, E. Smith, and T. B. Kepler. A model of large-scale proteome evolution. Adv. Complex Syst., 5(1):43--54, 2002.Google ScholarCross Ref
- Y. Sun, Y. Yu, and J. Han. Ranking-based clustering of heterogeneous information networks with star network schema. In Proceedings of SIGKDD Conference on Knowledge Discovery and Data Mining, pages 797--806, New York, NY, USA, 2009. ACM. Google ScholarDigital Library
- S. A. Teichmann and M. M. Babu. Gene regulatory network growth by duplication. Nat. Genet., 36:492--496, 2004.Google ScholarCross Ref
- A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani. Modeling of protein interaction networks. Complexus, 1(38):9, 2001.Google Scholar
- D. Watts and S. Strogatz. Collective dynamics of 'small-world' networks. Nature, 363:202--204, 1998.Google Scholar
- R. C. Wilson and P. Zhu. A study of graph spectra for comparing graphs and trees. Pattern Recognition, 41:2833--2841, 2008. Google ScholarDigital Library
- H. Yu, P. Braun, M. A. Yildirim, I. Lemmens, K. Venkatesan, J. Sahalie, T. Hirozane-Kishikawa, F. Gebreab, N. Li, N. Simonis, T. Hao, J.-F. Rual, A. Dricot, A. Vazquez, R. R. Murray, C. Simon, L. Tardivo, S. Tam, N. Svrzikapa, C. Fan, A.-S. De Smet, A. Motyl, M. E. Hudson, J. Park, X. Xin, M. E. Cusick, T. Moore, C. Boone, M. Snyder, F. P. Roth, A.-L. Barabási, J. Tavernier, D. E. Hill, and M. Vidal. High-quality binary protein interaction map of the yeast interactome network. Science, 322(5898):104--110, 2008.Google ScholarCross Ref
Index Terms
- The missing models: a data-driven approach for learning how networks grow
Recommendations
How network visibility and strategic networking leads to the emergence of certain network characteristics: a complex adaptive system approach
ICEC '16: Proceedings of the 18th Annual International Conference on Electronic Commerce: e-Commerce in Smart connected WorldPerson-to-person interactions within an organization form a network of people. Changes of the structural properties of these networks are caused through a variety of dynamic processes among the people. We argue in this paper that there is a feedback ...
Parallelizing Preferential Attachment Models for Generating Large-Scale Social Networks that Cannot Fit into Memory
SOCIALCOM-PASSAT '12: Proceedings of the 2012 ASE/IEEE International Conference on Social Computing and 2012 ASE/IEEE International Conference on Privacy, Security, Risk and TrustSocial network generation is an important problem in social network analysis. The goal is to produce artificial networks that preserve some real world properties of social networks. As one of most popular social network generation algorithms, the ...
New Frontiers of Multi-Network Mining: Recent Developments and Future Trend
KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data MiningNetworks (i.e., graphs) are often collected from multiple sources and platforms, such as social networks extracted from multiple online platforms, team-specific collaboration networks within an organization, and inter-dependent infrastructure networks, ...
Comments