article

Feature-based similarity search in graph structures

Authors:
Xifeng Yan

University of Illinois at Urbana-Champaign, Urbana, IL

University of Illinois at Urbana-Champaign, Urbana, IL
View Profile

,
Feida Zhu

University of Illinois at Urbana-Champaign, Urbana, IL

University of Illinois at Urbana-Champaign, Urbana, IL
View Profile

,
Philip S. Yu

IBM T. J. Watson Research Center, Hawthorne, NY

IBM T. J. Watson Research Center, Hawthorne, NY
View Profile

,
Jiawei Han

University of Illinois at Urbana-Champaign, Urbana, IL

University of Illinois at Urbana-Champaign, Urbana, IL
View Profile

Authors Info & Claims

ACM Transactions on Database Systems Volume 31 Issue 4pp 1418–1453https://doi.org/10.1145/1189769.1189777

Published:01 December 2006Publication History

ACM Transactions on Database Systems

Abstract

Similarity search of complex structures is an important operation in graph-related applications since exact matching is often too restrictive. In this article, we investigate the issues of substructure similarity search using indexed features in graph databases. By transforming the edge relaxation ratio of a query graph into the maximum allowed feature misses, our structural filtering algorithm can filter graphs without performing pairwise similarity computation. It is further shown that using either too few or too many features can result in poor filtering performance. Thus the challenge is to design an effective feature set selection strategy that could maximize the filtering capability. We prove that the complexity of optimal feature set selection is Ω(2^m) in the worst case, where m is the number of features for selection. In practice, we identify several criteria to build effective feature sets for filtering, and demonstrate that combining features with similar size and selectivity can improve the filtering and search performance significantly within a multifilter composition framework. The proposed feature-based filtering concept can be generalized and applied to searching approximate nonconsecutive sequences, trees, and other structured data as well.

References

Beretti, S., Bimbo, A., and Vicario, E. 2001. Efficient matching and indexing of graph models in content based retrieval. IEEE Trans. Patt. Analys. Mach. Intell. 23, 1089--1105. Google ScholarDigital Library
Berman, H., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T., Weissig, H., Shindyalov, I., and Bourne, P. 2000. The protein data bank. Nucl. Acids Res. 28, 235--242.Google ScholarCross Ref
Bunke, H. and Shearer, K. 1998. A graph distance metric based on the maximal common subgraph. Patt. Recogn. Lett. 19, 255--259. Google ScholarDigital Library
Erickson, J. 1996. Lower bounds for fundamental geometric problems. Ph.D. dissertation. University of California at Berkeley, Berkley, CA. Google ScholarDigital Library
Feige, U. 1998. A threshold of ln n for approximating set cover. J. ACM 45, 634--652. Google ScholarDigital Library
Garey, M. and Johnson, D. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman & Co., New York, NY. Google ScholarDigital Library
Giugno, R. and Shasha, D. 2002. Graphgrep: A fast and universal method for querying graphs. Proceedings of the 2002 International Conference on Pattern Recognition (ICPR'02). 112--115.Google Scholar
Gravano, L., Ipeirotis, P., Jagadish, H., Koudas, N., Muthukrishnan, S., Pietarinen, L., and Srivastava, D. 2001. Using q-grams in a dbms for approximate string processing. Data Eng. Bull. 24, 28--37.Google Scholar
Hagadone, T. 1992. Molecular substructure similarity searching: Efficient retrieval in two-dimensional structure databases. J. Chem. Inf. Comput. Sci. 32, 515--521.Google ScholarCross Ref
Hochbaum, D. 1997. Approximation Algorithms for NP-Hard Problems. PWS Publishing, Boston, MA. Google ScholarDigital Library
Holder, L., Cook, D., and Djoko, S. 1994. Substructure discovery in the subdue system. In Proceedings of the AAAI'94 Workshop on Knowledge Discovery in Databases (KDD'94). 169--180.Google Scholar
Kailing, K., Kriegel, H., Schnauer, S., and Seidl, T. 2004. Efficient similarity search for hierarchical data in large databases. In Proceedings of the 9th International Conference on Extending Database Technology (EDBT'04). 676--693.Google Scholar
Kanehisa, M. and Goto, S. 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucl. Acids Res. 28, 27--30.Google ScholarCross Ref
Kislicyn, S. S. 1964. On the selection of the kth element of an ordered set by pairwise comparisons. Sibirskii Mat. Z. (in Russian) 5, 557--564.Google Scholar
Kuramochi, M. and Karypis, G. 2001. Frequent subgraph discovery. In Proceedings of the 2001 International Conference on Data Mining (ICDM'01). 313--320. Google ScholarDigital Library
Messmer, B. and Bunke, H. 1998. A new algorithm for error-tolerant subgraph isomorphism detection. IEEE Trans. Patt. Analys. Mach. Intell. 20, 493--504. Google ScholarDigital Library
Navarro, G. 2001. A guided tour to approximate string matching. ACM Comput. Surv. 33, 31--88. Google ScholarDigital Library
Nilsson, N. 1980. Principles of Artificial Intelligence. Morgan Kaufmann, Palo Alto, CA. Google ScholarDigital Library
Padberg, M. 1995. Linear Optimization and Extensions. Springer-Verlag, Berlin, Germany.Google Scholar
Petrakis, E. and Faloutsos, C. 1997. Similarity searching in medical image databases. Knowl. Data Eng. 9, 3, 435--447. Google ScholarDigital Library
Raymond, J., Gardiner, E., and Willett, P. 2002. Rascal: Calculation of graph similarity using maximum common edge subgraphs. Comput. J. 45, 631--644.Google ScholarCross Ref
Shasha, D., Wang, J., and Giugno, R. 2002. Algorithmics and applications of tree and graph searching. In Proceedings of the 21th ACM Symposium on Principles of Database Systems (PODS'02). 39--52. Google ScholarDigital Library
Srinivasa, S. and Kumar, S. 2003. A platform based on the multi-dimensional data model for analysis of bio-molecular structures. In Proceedings of the 2003 International Conference on Very Large Data Bases (VLDB'03). 975--986. Google ScholarDigital Library
Ukkonen, E. 1992. Approximate string matching with q-grams and maximal matches. Theoret. Comput. Sci. 191--211. Google ScholarDigital Library
Ullmann, J. 1977. Binary n-gram technique for automatic correction of substitution, deletion, insertion, and reversal errors in words. Comput. J. 20, 141--147.Google ScholarCross Ref
Wang, J., Zhang, K., Jeong, K., and Shasha, D. 1994. A system for approximate tree matching. IEEE Trans. Knowl. Data Eng. 6, 559--571. Google ScholarDigital Library
Willett, P., Barnard, J., and Downs, G. 1998. Chemical similarity searching. J. Chem. Inf. Comput. Sci. 38, 983--996.Google ScholarCross Ref
Yan, X., Yu, P., and Han, J. 2004. Graph indexing: A frequent structure-based approach. In Proceedings of the 2004 ACM International Conference on Management of Data (SIGMOD'04). 335--346. Google ScholarDigital Library
Yan, X., Yu, P., and Han, J. 2005. Substructure similarity search in graph databases. In Proceedings of the 2005 ACM International Conference on Management of Data (SIGMOD'05). 766--777. Google ScholarDigital Library

Index Terms

Feature-based similarity search in graph structures

Recommendations

Efficient processing of graph similarity search

A graph similarity search is to find a set of graphs from a graph database that are similar to a given query graph. Existing works solve this problem by first defining a similarity measure between two graphs, and then presenting a filtering mechanism ...
Read More
Finding top-k similar graphs in graph databases
EDBT '12: Proceedings of the 15th International Conference on Extending Database Technology

Querying similar graphs in graph databases has been widely studied in graph query processing in recent years. Existing works mainly focus on subgraph similarity search and supergraph similarity search. In this paper, we study the problem of finding top-...
Read More
Index Structures for Fast Similarity Search for Binary Vectors

This article reviews index structures for fast similarity search for objects represented by binary vectors (with components equal to 0 or 1). Structures for both exact and approximate search by Hamming distance and other similarity measures are ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Database Systems Volume 31, Issue 4
December 2006
357 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1189769
Issue’s Table of Contents

Copyright © 2006 ACM
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 December 2006
Published in tods Volume 31, Issue 4

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Graph database
complexity
index
similarity search
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 45
  Total Citations
  View Citations
- 1,527
  Total Downloads
- Downloads (Last 12 months)19
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Feature-based similarity search in graph structures

ACM Transactions on Database Systems

Abstract

References

Cited By

Index Terms

Recommendations

Efficient processing of graph similarity search

Finding top-k similar graphs in graph databases

Index Structures for Fast Similarity Search for Binary Vectors

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Feature-based similarity search in graph structures

ACM Transactions on Database Systems

Abstract

References

Cited By

Index Terms

Recommendations

Efficient processing of graph similarity search

Finding top-k similar graphs in graph databases

Index Structures for Fast Similarity Search for Binary Vectors

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media