ABSTRACT
In this paper we propose a novel specialized data structure that we call g-trie, designed to deal with collections of subgraphs. The main conceptual idea is akin to a prefix tree in the sense that we take advantage of common topology by constructing a multiway tree where the descendants of a node share a common substructure. We give algorithms to construct a g-trie, to list all stored subgraphs, and to find occurrences on another graph of the subgraphs stored in the g-trie. We evaluate the implementation of this structure and its associated algorithms on a set of representative benchmark biological networks in order to find network motifs. To assess the efficiency of our algorithms we compare their performance with other known network motif algorithms also implemented in the same common platform. Our results show that indeed, g-tries are a feasible, adequate and very efficient data structure for network motifs discovery, clearly outperforming previous algorithms and data structures.
- R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 20th Int. Conf. Very Large Data Bases, VLDB, pages 487--499, 1994. Google ScholarDigital Library
- I. Albert and R. Albert. Conserved network motifs allow protein-protein interaction prediction. Bioinformatics, 20(18):3346--3352, December 2004. Google ScholarDigital Library
- V. Batagelj and A. Mrvar. Pajek datasets, 2006. http://vlado.fmf.uni-lj.si/pub/networks/data/.Google Scholar
- D. Bu, Y. Zhao, L. Cai, H. Xue, X. Zhu, H. Lu, J. Zhang, S. Sun, L. Ling, N. Zhang, G. Li, and R. Chen. Topological structure analysis of the protein-protein interaction network in budding yeast. Nucl. Acids Res., 31(9):2443--2450, May 2003.Google ScholarCross Ref
- D. Chakrabarti and C. Faloutsos. Graph mining: Laws, generators, and algorithms. ACM Computing Surveys, 38:1, 2006. Google ScholarDigital Library
- L. da F. Costa, F. A. Rodrigues, G. Travieso, and P. R. V. Boas. Characterization of complex networks: A survey of measurements. Advances In Physics, 56:167, 2007.Google ScholarCross Ref
- R. Dobrin, Q. K. Beg, A. Barabasi, and Z. Oltvai. Aggregation of topological motifs in the escherichia coli transcriptional regulatory network. BMC Bioinformatics, 5:10, 2004.Google ScholarCross Ref
- E. Fredkin. Trie memory. Commun. ACM, 3(9):490--499, September 1960. Google ScholarDigital Library
- J. Grochow and M. Kellis. Network motif discovery using subgraph enumeration and symmetry-breaking. pages 92--106. 2007.Google Scholar
- J. Han and M. Kamber. Data Mining, Second Edition: Concepts and Techniques. Morgan Kaufmann, September 2006. Google ScholarDigital Library
- S. Itzkovitz, R. Levitt, N. Kashtan, R. Milo, M. Itzkovitz, and U. Alon. Coarse-graining and self-dissimilarity of complex networks. Phys Rev E Stat Nonlin Soft Matter Phys, 71(1 Pt 2), January 2005.Google Scholar
- N. Kashtan, S. Itzkovitz, R. Milo, and U. Alon. Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics, 20(11):1746--1758, 2004. Google ScholarDigital Library
- M. Kondoh. Building trophic modules into a persistent food web. Proceedings of the National Academy of Sciences, 105(43):16631--16635, 2008.Google ScholarCross Ref
- M. Kuramochi and G. Karypis. Frequent subgraph discovery. Data Mining, IEEE International Conference on, 0:313, 2001. Google ScholarDigital Library
- A. Lancichinetti, S. Fortunato, and F. Radicchi. Benchmark graphs for testing community detection algorithms. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 78(4), 2008.Google Scholar
- D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson. The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations. can geographic isolation explain this unique trait? Behavioral Ecology and Sociobiology, 54(4):396--405, 2003.Google ScholarCross Ref
- B. McKay. Practical graph isomorphism. Cong. Numerantium, 30:45--87, 1981.Google Scholar
- R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 298(5594):824--827, October 2002.Google ScholarCross Ref
- M. Newman. Network data, september, 2009. http://www-personal.umich.edu/~mejn/netdata/.Google Scholar
- M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45, 2003.Google Scholar
- O. Sporns and R. Kotter. Motifs in brain networks. PLoS Biology, 2, 2004.Google Scholar
- S. Wasserman, K. Faust, and D. Iacobucci. Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences). Cambridge University Press, November 1994.Google Scholar
- D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440--442, June 1998.Google ScholarCross Ref
- S. Wernicke. Efficient detection of network motifs. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 3(4):347--359, 2006. Google ScholarDigital Library
- S. Wernicke and F. Rasche. Fanmod: a tool for fast network motif detection. Bioinformatics, 22(9):1152--1153, May 2006. Google ScholarDigital Library
Index Terms
- g-tries: an efficient data structure for discovering network motifs
Recommendations
Towards a faster network-centric subgraph census
ASONAM '13: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and MiningDetermining the frequency of small subgraphs is an important computational task lying at the core of several graph mining methodologies, such as network motifs discovery or graphlet based measurements. In this paper we try to improve a class of ...
G-Tries: a data structure for storing and finding subgraphs
The ability to find and count subgraphs of a given network is an important non trivial task with multidisciplinary applicability. Discovering network motifs or computing graphlet signatures are two examples of methodologies that at their core rely ...
Querying subgraph sets with g-tries
DBSocial '12: Proceedings of the 2nd ACM SIGMOD Workshop on Databases and Social NetworksIn this paper we present an universal methodology for finding all the occurrences of a given set of subgraphs in one single larger graph. Past approaches would either enumerate all possible subgraphs of a certain size or query a single subgraph. We use ...
Comments