research-article

The missing models: a data-driven approach for learning how networks grow

Authors:
Robert Patro

University of Maryland, College Park, MD, USA

University of Maryland, College Park, MD, USA
View Profile

,
Geet Duggal

University of Maryland, College Park, MD, USA

University of Maryland, College Park, MD, USA
View Profile

,
Emre Sefer

University of Maryland, College Park, MD, USA

University of Maryland, College Park, MD, USA
View Profile

,
Hao Wang

University of Maryland, College Park, MD, USA

University of Maryland, College Park, MD, USA
View Profile

,
Darya Filippova

University of Maryland, College Park, MD, USA

University of Maryland, College Park, MD, USA
View Profile

,
Carl Kingsford

University of Maryland, College Park, MD, USA

University of Maryland, College Park, MD, USA
View Profile

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data miningAugust 2012Pages 42–50https://doi.org/10.1145/2339530.2339541

Published:12 August 2012Publication History

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 42–50

ABSTRACT

Probabilistic models of network growth have been extensively studied as idealized representations of network evolution. Models, such as the Kronecker model, duplication-based models, and preferential attachment models, have been used for tasks such as representing null models, detecting anomalies, algorithm testing, and developing an understanding of various mechanistic growth processes. However, developing a new growth model to fit observed properties of a network is a difficult task, and as new networks are studied, new models must constantly be developed. Here, we present a framework, called GrowCode, for the automatic discovery of novel growth models that match user-specified topological features in undirected graphs. GrowCode introduces a set of basic commands that are general enough to encode several previously developed models. Coupling this formal representation with an optimization approach, we show that GrowCode is able to discover models for protein interaction networks, autonomous systems networks, and scientific collaboration networks that closely match properties such as the degree distribution, the clustering coefficient, and assortativity that are observed in real networks of these classes. Additional tests on simulated networks show that the models learned by GrowCode generate distributions of graphs with similar variance as existing models for these classes.

Supplemental Material

306_m_talk_5.mp4

mp4

151 MB

Download

References

L. Akoglu and C. Faloutsos. RTG: a recursive realistic graph generator using random typing. Data Mining and Knowledge Discovery, 19(2):194--209, 2009. Google ScholarDigital Library
A. Barabási. Emergence of Scaling in Random Networks. Science, 286(5439):509--512, 1999.Google Scholar
A. Bhan, D. Galas, and T. Dewey. A duplication growth model of gene expression networks. Bioinformatics, 18:1486--1493, 2002.Google ScholarCross Ref
M. Brameier. On linear genetic programming. PhD thesis, University of Hamburg, 2004.Google Scholar
B. K. Bulik-Sullivan and P. F. Sullivan. The authorship network of genome-wide association studies. Nature Genetics, 44(2):113--113, 2012.Google ScholarCross Ref
D. Callaway, J. E. Hopcroft, J. M. Kleinberg, M. E. Newman, and S. H. Strogatz. Are randomly grown graphs really random? Phys. Rev. E, 64:041902--041908, 2001.Google ScholarCross Ref
A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-Law Distributions in Empirical Data. SIAM Review, 51(4):661, 2009. Google ScholarDigital Library
K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182--197, 2002. Google ScholarDigital Library
S. N. Dorogovtsev, J. F. Mendes, and A. N. Samukhin. Structure of growing networks with preferential linking. Phys. Rev. Lett., 85(21), 2000.Google ScholarCross Ref
P. Erdős and A. Rényi. On the evolution of random graphs. Evolution, 5(1):17--61, 1960.Google Scholar
T. A. Gibson and D. S. Goldberg. Improving evolutionary models of protein interaction networks. Bioinformatics, 27(3):376--382, 2011. Google ScholarDigital Library
I. Ispolatov, P. L. Krapivsky, and A. Yuryev. Duplication-divergence model of protein interaction network. Phys. Rev. E, 71(6 Pt 1):061911, 2005.Google Scholar
M. Kim and J. Leskovec. Multiplicative attribute graph model of real-world networks. Internet Mathematics, 8(1-2):113--160, 2012.Google ScholarCross Ref
W. K. Kim and E. M. Marcotte. Age-dependent evolution of the yeast protein interaction network suggests a limited role of gene duplication and divergence. PLoS Comput. Biol., 4(11):e1000232, 2008.Google ScholarCross Ref
R. Kumar, J. Novak, and A. Tomkins. Structure and evolution of online social networks. In Proceedings of SIGKDD Conference on Knowledge Discovery and Data Mining, pages 611--617, 2006. Google ScholarDigital Library
J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani. Kronecker Graphs: An Approach to Modeling Networks. The Journal of Machine Learning Research, 11:985--1042, 2010. Google ScholarDigital Library
J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time. In Proceedings of SIGKDD Conference on Knowledge Discovery and Data Mining, pages 177--187, New York, USA, 2005. Google ScholarDigital Library
S. Luke. Issues in Scaling Genetic Programming: Breeding Strategies, Tree Generation, and Code Bloat. PhD thesis, University of Maryland, College Park, 2000.Google Scholar
M. Middendorf, E. Ziv, C. Adams, J. Hom, R. Koytcheff, C. Levovitz, G. Woods, L. Chen, and C. Wiggins. Discriminative topological features reveal biological network mechanisms. BMC Bioinformatics, 5:181, 2004.Google ScholarCross Ref
M. Middendorf, E. Ziv, and C. Wiggins. Inferring network mechanisms: The Drosophila melanogaster protein interaction network. Proc. Natl. Acad. Sci. USA, 102:3192--3197, 2005.Google ScholarCross Ref
S. Navlakha and C. Kingsford. Network archaeology: Uncovering ancient networks from present-day interactions. PLoS Comput. Biol., 7(4):e1001119, 2011.Google ScholarCross Ref
G. Palla, L. Lovász, and T. Vicsek. Multifractal network generator. Proc. Natl. Acad. Sci. USA, 107:7640--7645, 2010.Google ScholarCross Ref
N. Przulj, O. Kuchaiev, A. Stevanovic, and W. Hayes. Geometric evolutionary dynamics of protein interaction networks. Pac. Symp. Biocomput., 15:178--189, 2010.Google Scholar
R. Solé, R. Pastor-Satorras, E. Smith, and T. B. Kepler. A model of large-scale proteome evolution. Adv. Complex Syst., 5(1):43--54, 2002.Google ScholarCross Ref
Y. Sun, Y. Yu, and J. Han. Ranking-based clustering of heterogeneous information networks with star network schema. In Proceedings of SIGKDD Conference on Knowledge Discovery and Data Mining, pages 797--806, New York, NY, USA, 2009. ACM. Google ScholarDigital Library
S. A. Teichmann and M. M. Babu. Gene regulatory network growth by duplication. Nat. Genet., 36:492--496, 2004.Google ScholarCross Ref
A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani. Modeling of protein interaction networks. Complexus, 1(38):9, 2001.Google Scholar
D. Watts and S. Strogatz. Collective dynamics of 'small-world' networks. Nature, 363:202--204, 1998.Google Scholar
R. C. Wilson and P. Zhu. A study of graph spectra for comparing graphs and trees. Pattern Recognition, 41:2833--2841, 2008. Google ScholarDigital Library
H. Yu, P. Braun, M. A. Yildirim, I. Lemmens, K. Venkatesan, J. Sahalie, T. Hirozane-Kishikawa, F. Gebreab, N. Li, N. Simonis, T. Hao, J.-F. Rual, A. Dricot, A. Vazquez, R. R. Murray, C. Simon, L. Tardivo, S. Tam, N. Svrzikapa, C. Fan, A.-S. De Smet, A. Motyl, M. E. Hudson, J. Park, X. Xin, M. E. Cusick, T. Moore, C. Boone, M. Snyder, F. P. Roth, A.-L. Barabási, J. Tavernier, D. E. Hill, and M. Vidal. High-quality binary protein interaction map of the yeast interactome network. Science, 322(5898):104--110, 2008.Google ScholarCross Ref

Index Terms

The missing models: a data-driven approach for learning how networks grow
1. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

How network visibility and strategic networking leads to the emergence of certain network characteristics: a complex adaptive system approach
ICEC '16: Proceedings of the 18th Annual International Conference on Electronic Commerce: e-Commerce in Smart connected World

Person-to-person interactions within an organization form a network of people. Changes of the structural properties of these networks are caused through a variety of dynamic processes among the people. We argue in this paper that there is a feedback ...
Read More
Parallelizing Preferential Attachment Models for Generating Large-Scale Social Networks that Cannot Fit into Memory
SOCIALCOM-PASSAT '12: Proceedings of the 2012 ASE/IEEE International Conference on Social Computing and 2012 ASE/IEEE International Conference on Privacy, Security, Risk and Trust

Social network generation is an important problem in social network analysis. The goal is to produce artificial networks that preserve some real world properties of social networks. As one of most popular social network generation algorithms, the ...
Read More
New Frontiers of Multi-Network Mining: Recent Developments and Future Trend
KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining

Networks (i.e., graphs) are often collected from multiple sources and platforms, such as social networks extracted from multiple online platforms, team-specific collaboration networks within an organization, and inter-dependent infrastructure networks, ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2012
1616 pages
ISBN:9781450314626
DOI:10.1145/2339530
General Chair:
Qiang Yang
Hong Kong University of Science and Technology
,
Program Chairs:
Deepak Agarwal
LinkedIn
,
Jian Pei
Simon Fraser University
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 12 August 2012
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
algorithms
graph mining
network growth models
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate1,133of8,635submissions,13%
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 5
  Total Citations
  View Citations
- 887
  Total Downloads
- Downloads (Last 12 months)8
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

The missing models: a data-driven approach for learning how networks grow

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

How network visibility and strategic networking leads to the emergence of certain network characteristics: a complex adaptive system approach

Parallelizing Preferential Attachment Models for Generating Large-Scale Social Networks that Cannot Fit into Memory

New Frontiers of Multi-Network Mining: Recent Developments and Future Trend