skip to main content
10.1145/2339530.2339541acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

The missing models: a data-driven approach for learning how networks grow

Published:12 August 2012Publication History

ABSTRACT

Probabilistic models of network growth have been extensively studied as idealized representations of network evolution. Models, such as the Kronecker model, duplication-based models, and preferential attachment models, have been used for tasks such as representing null models, detecting anomalies, algorithm testing, and developing an understanding of various mechanistic growth processes. However, developing a new growth model to fit observed properties of a network is a difficult task, and as new networks are studied, new models must constantly be developed. Here, we present a framework, called GrowCode, for the automatic discovery of novel growth models that match user-specified topological features in undirected graphs. GrowCode introduces a set of basic commands that are general enough to encode several previously developed models. Coupling this formal representation with an optimization approach, we show that GrowCode is able to discover models for protein interaction networks, autonomous systems networks, and scientific collaboration networks that closely match properties such as the degree distribution, the clustering coefficient, and assortativity that are observed in real networks of these classes. Additional tests on simulated networks show that the models learned by GrowCode generate distributions of graphs with similar variance as existing models for these classes.

Skip Supplemental Material Section

Supplemental Material

306_m_talk_5.mp4

mp4

151 MB

References

  1. L. Akoglu and C. Faloutsos. RTG: a recursive realistic graph generator using random typing. Data Mining and Knowledge Discovery, 19(2):194--209, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. A. Barabási. Emergence of Scaling in Random Networks. Science, 286(5439):509--512, 1999.Google ScholarGoogle Scholar
  3. A. Bhan, D. Galas, and T. Dewey. A duplication growth model of gene expression networks. Bioinformatics, 18:1486--1493, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  4. M. Brameier. On linear genetic programming. PhD thesis, University of Hamburg, 2004.Google ScholarGoogle Scholar
  5. B. K. Bulik-Sullivan and P. F. Sullivan. The authorship network of genome-wide association studies. Nature Genetics, 44(2):113--113, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  6. D. Callaway, J. E. Hopcroft, J. M. Kleinberg, M. E. Newman, and S. H. Strogatz. Are randomly grown graphs really random? Phys. Rev. E, 64:041902--041908, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  7. A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-Law Distributions in Empirical Data. SIAM Review, 51(4):661, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182--197, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. S. N. Dorogovtsev, J. F. Mendes, and A. N. Samukhin. Structure of growing networks with preferential linking. Phys. Rev. Lett., 85(21), 2000.Google ScholarGoogle ScholarCross RefCross Ref
  10. P. Erdős and A. Rényi. On the evolution of random graphs. Evolution, 5(1):17--61, 1960.Google ScholarGoogle Scholar
  11. T. A. Gibson and D. S. Goldberg. Improving evolutionary models of protein interaction networks. Bioinformatics, 27(3):376--382, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. I. Ispolatov, P. L. Krapivsky, and A. Yuryev. Duplication-divergence model of protein interaction network. Phys. Rev. E, 71(6 Pt 1):061911, 2005.Google ScholarGoogle Scholar
  13. M. Kim and J. Leskovec. Multiplicative attribute graph model of real-world networks. Internet Mathematics, 8(1-2):113--160, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  14. W. K. Kim and E. M. Marcotte. Age-dependent evolution of the yeast protein interaction network suggests a limited role of gene duplication and divergence. PLoS Comput. Biol., 4(11):e1000232, 2008.Google ScholarGoogle ScholarCross RefCross Ref
  15. R. Kumar, J. Novak, and A. Tomkins. Structure and evolution of online social networks. In Proceedings of SIGKDD Conference on Knowledge Discovery and Data Mining, pages 611--617, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani. Kronecker Graphs: An Approach to Modeling Networks. The Journal of Machine Learning Research, 11:985--1042, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time. In Proceedings of SIGKDD Conference on Knowledge Discovery and Data Mining, pages 177--187, New York, USA, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. S. Luke. Issues in Scaling Genetic Programming: Breeding Strategies, Tree Generation, and Code Bloat. PhD thesis, University of Maryland, College Park, 2000.Google ScholarGoogle Scholar
  19. M. Middendorf, E. Ziv, C. Adams, J. Hom, R. Koytcheff, C. Levovitz, G. Woods, L. Chen, and C. Wiggins. Discriminative topological features reveal biological network mechanisms. BMC Bioinformatics, 5:181, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  20. M. Middendorf, E. Ziv, and C. Wiggins. Inferring network mechanisms: The Drosophila melanogaster protein interaction network. Proc. Natl. Acad. Sci. USA, 102:3192--3197, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  21. S. Navlakha and C. Kingsford. Network archaeology: Uncovering ancient networks from present-day interactions. PLoS Comput. Biol., 7(4):e1001119, 2011.Google ScholarGoogle ScholarCross RefCross Ref
  22. G. Palla, L. Lovász, and T. Vicsek. Multifractal network generator. Proc. Natl. Acad. Sci. USA, 107:7640--7645, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  23. N. Przulj, O. Kuchaiev, A. Stevanovic, and W. Hayes. Geometric evolutionary dynamics of protein interaction networks. Pac. Symp. Biocomput., 15:178--189, 2010.Google ScholarGoogle Scholar
  24. R. Solé, R. Pastor-Satorras, E. Smith, and T. B. Kepler. A model of large-scale proteome evolution. Adv. Complex Syst., 5(1):43--54, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  25. Y. Sun, Y. Yu, and J. Han. Ranking-based clustering of heterogeneous information networks with star network schema. In Proceedings of SIGKDD Conference on Knowledge Discovery and Data Mining, pages 797--806, New York, NY, USA, 2009. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. S. A. Teichmann and M. M. Babu. Gene regulatory network growth by duplication. Nat. Genet., 36:492--496, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  27. A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani. Modeling of protein interaction networks. Complexus, 1(38):9, 2001.Google ScholarGoogle Scholar
  28. D. Watts and S. Strogatz. Collective dynamics of 'small-world' networks. Nature, 363:202--204, 1998.Google ScholarGoogle Scholar
  29. R. C. Wilson and P. Zhu. A study of graph spectra for comparing graphs and trees. Pattern Recognition, 41:2833--2841, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. H. Yu, P. Braun, M. A. Yildirim, I. Lemmens, K. Venkatesan, J. Sahalie, T. Hirozane-Kishikawa, F. Gebreab, N. Li, N. Simonis, T. Hao, J.-F. Rual, A. Dricot, A. Vazquez, R. R. Murray, C. Simon, L. Tardivo, S. Tam, N. Svrzikapa, C. Fan, A.-S. De Smet, A. Motyl, M. E. Hudson, J. Park, X. Xin, M. E. Cusick, T. Moore, C. Boone, M. Snyder, F. P. Roth, A.-L. Barabási, J. Tavernier, D. E. Hill, and M. Vidal. High-quality binary protein interaction map of the yeast interactome network. Science, 322(5898):104--110, 2008.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. The missing models: a data-driven approach for learning how networks grow

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2012
      1616 pages
      ISBN:9781450314626
      DOI:10.1145/2339530

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 August 2012

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate1,133of8,635submissions,13%

      Upcoming Conference

      KDD '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader