skip to main content
article

Feature-based similarity search in graph structures

Authors Info & Claims
Published:01 December 2006Publication History
Skip Abstract Section

Abstract

Similarity search of complex structures is an important operation in graph-related applications since exact matching is often too restrictive. In this article, we investigate the issues of substructure similarity search using indexed features in graph databases. By transforming the edge relaxation ratio of a query graph into the maximum allowed feature misses, our structural filtering algorithm can filter graphs without performing pairwise similarity computation. It is further shown that using either too few or too many features can result in poor filtering performance. Thus the challenge is to design an effective feature set selection strategy that could maximize the filtering capability. We prove that the complexity of optimal feature set selection is Ω(2m) in the worst case, where m is the number of features for selection. In practice, we identify several criteria to build effective feature sets for filtering, and demonstrate that combining features with similar size and selectivity can improve the filtering and search performance significantly within a multifilter composition framework. The proposed feature-based filtering concept can be generalized and applied to searching approximate nonconsecutive sequences, trees, and other structured data as well.

References

  1. Beretti, S., Bimbo, A., and Vicario, E. 2001. Efficient matching and indexing of graph models in content based retrieval. IEEE Trans. Patt. Analys. Mach. Intell. 23, 1089--1105. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Berman, H., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T., Weissig, H., Shindyalov, I., and Bourne, P. 2000. The protein data bank. Nucl. Acids Res. 28, 235--242.Google ScholarGoogle ScholarCross RefCross Ref
  3. Bunke, H. and Shearer, K. 1998. A graph distance metric based on the maximal common subgraph. Patt. Recogn. Lett. 19, 255--259. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Erickson, J. 1996. Lower bounds for fundamental geometric problems. Ph.D. dissertation. University of California at Berkeley, Berkley, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Feige, U. 1998. A threshold of ln n for approximating set cover. J. ACM 45, 634--652. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Garey, M. and Johnson, D. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman & Co., New York, NY. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Giugno, R. and Shasha, D. 2002. Graphgrep: A fast and universal method for querying graphs. Proceedings of the 2002 International Conference on Pattern Recognition (ICPR'02). 112--115.Google ScholarGoogle Scholar
  8. Gravano, L., Ipeirotis, P., Jagadish, H., Koudas, N., Muthukrishnan, S., Pietarinen, L., and Srivastava, D. 2001. Using q-grams in a dbms for approximate string processing. Data Eng. Bull. 24, 28--37.Google ScholarGoogle Scholar
  9. Hagadone, T. 1992. Molecular substructure similarity searching: Efficient retrieval in two-dimensional structure databases. J. Chem. Inf. Comput. Sci. 32, 515--521.Google ScholarGoogle ScholarCross RefCross Ref
  10. Hochbaum, D. 1997. Approximation Algorithms for NP-Hard Problems. PWS Publishing, Boston, MA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Holder, L., Cook, D., and Djoko, S. 1994. Substructure discovery in the subdue system. In Proceedings of the AAAI'94 Workshop on Knowledge Discovery in Databases (KDD'94). 169--180.Google ScholarGoogle Scholar
  12. Kailing, K., Kriegel, H., Schnauer, S., and Seidl, T. 2004. Efficient similarity search for hierarchical data in large databases. In Proceedings of the 9th International Conference on Extending Database Technology (EDBT'04). 676--693.Google ScholarGoogle Scholar
  13. Kanehisa, M. and Goto, S. 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucl. Acids Res. 28, 27--30.Google ScholarGoogle ScholarCross RefCross Ref
  14. Kislicyn, S. S. 1964. On the selection of the kth element of an ordered set by pairwise comparisons. Sibirskii Mat. Z. (in Russian) 5, 557--564.Google ScholarGoogle Scholar
  15. Kuramochi, M. and Karypis, G. 2001. Frequent subgraph discovery. In Proceedings of the 2001 International Conference on Data Mining (ICDM'01). 313--320. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Messmer, B. and Bunke, H. 1998. A new algorithm for error-tolerant subgraph isomorphism detection. IEEE Trans. Patt. Analys. Mach. Intell. 20, 493--504. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Navarro, G. 2001. A guided tour to approximate string matching. ACM Comput. Surv. 33, 31--88. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Nilsson, N. 1980. Principles of Artificial Intelligence. Morgan Kaufmann, Palo Alto, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Padberg, M. 1995. Linear Optimization and Extensions. Springer-Verlag, Berlin, Germany.Google ScholarGoogle Scholar
  20. Petrakis, E. and Faloutsos, C. 1997. Similarity searching in medical image databases. Knowl. Data Eng. 9, 3, 435--447. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Raymond, J., Gardiner, E., and Willett, P. 2002. Rascal: Calculation of graph similarity using maximum common edge subgraphs. Comput. J. 45, 631--644.Google ScholarGoogle ScholarCross RefCross Ref
  22. Shasha, D., Wang, J., and Giugno, R. 2002. Algorithmics and applications of tree and graph searching. In Proceedings of the 21th ACM Symposium on Principles of Database Systems (PODS'02). 39--52. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Srinivasa, S. and Kumar, S. 2003. A platform based on the multi-dimensional data model for analysis of bio-molecular structures. In Proceedings of the 2003 International Conference on Very Large Data Bases (VLDB'03). 975--986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Ukkonen, E. 1992. Approximate string matching with q-grams and maximal matches. Theoret. Comput. Sci. 191--211. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Ullmann, J. 1977. Binary n-gram technique for automatic correction of substitution, deletion, insertion, and reversal errors in words. Comput. J. 20, 141--147.Google ScholarGoogle ScholarCross RefCross Ref
  26. Wang, J., Zhang, K., Jeong, K., and Shasha, D. 1994. A system for approximate tree matching. IEEE Trans. Knowl. Data Eng. 6, 559--571. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Willett, P., Barnard, J., and Downs, G. 1998. Chemical similarity searching. J. Chem. Inf. Comput. Sci. 38, 983--996.Google ScholarGoogle ScholarCross RefCross Ref
  28. Yan, X., Yu, P., and Han, J. 2004. Graph indexing: A frequent structure-based approach. In Proceedings of the 2004 ACM International Conference on Management of Data (SIGMOD'04). 335--346. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Yan, X., Yu, P., and Han, J. 2005. Substructure similarity search in graph databases. In Proceedings of the 2005 ACM International Conference on Management of Data (SIGMOD'05). 766--777. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Feature-based similarity search in graph structures

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image ACM Transactions on Database Systems
            ACM Transactions on Database Systems  Volume 31, Issue 4
            December 2006
            357 pages
            ISSN:0362-5915
            EISSN:1557-4644
            DOI:10.1145/1189769
            Issue’s Table of Contents

            Copyright © 2006 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 1 December 2006
            Published in tods Volume 31, Issue 4

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • article

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader