research-article

Graph Query Reformulation with Diversity

Authors:
Davide Mottin

University of Trento, Trento, Italy

University of Trento, Trento, Italy
View Profile

,
Francesco Bonchi

Yahoo Labs, Barcelona, Spain

Yahoo Labs, Barcelona, Spain
View Profile

,
Francesco Gullo

Yahoo Labs, Barcelona, Spain

Yahoo Labs, Barcelona, Spain
View Profile

KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data MiningAugust 2015Pages 825–834https://doi.org/10.1145/2783258.2783343

Published:10 August 2015Publication History

KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Pages 825–834

ABSTRACT

We study a problem of graph-query reformulation enabling explorative query-driven discovery in graph databases. Given a query issued by the user, the system, apart from returning the result patterns, also proposes a number of specializations (i.e., supergraphs) of the original query to facilitate the exploration of the results.

We formalize the problem of finding a set of reformulations of the input query by maximizing a linear combination of coverage (of the original query's answer set) and diversity among the specializations. We prove that our problem is hard, but also that a simple greedy algorithm achieves a (1/2)-approximation guarantee.

The most challenging step of the greedy algorithm is the computation of the specialization that brings the maximum increment to the objective function. To efficiently solve this step, we show how to compute the objective-function increment of a specialization linearly in the number of its results and derive an upper bound that we exploit to devise an efficient search-space visiting strategy.

An extensive evaluation on real and synthetic databases attests high efficiency and accuracy of our proposal.

References

C. C. Aggarwal and H. Wang. Managing and Mining Graph Data. Springer, 2010. Google ScholarDigital Library
M. K. Anand, S. Bowers, and B. Ludäscher. Techniques for efficiently querying scientific workflow provenance graphs. In EDBT, pages 287--298, 2010. Google ScholarDigital Library
R. A. Baeza-Yates and B. A. Ribeiro-Neto. Modern Information Retrieval - the concepts and technology behind search, Second edition. Pearson Education Ltd., 2011. Google ScholarDigital Library
S. Berretti, A. D. Bimbo, and E. Vicario. Efficient matching and indexing of graph models in content-based retrieval. TPAMI, 23(10):1089--1105, 2001. Google ScholarDigital Library
P. Boldi, F. Bonchi, C. Castillo, and S. Vigna. Query reformulation mining: models, patterns, and applications. Inf. Retr., 14(3), 2011. Google ScholarDigital Library
A. Borodin, H. C. Lee, and Y. Ye. Max-sum diversification, monotone submodular functions and dynamic updates. In PODS, pages 155--166, 2012. Google ScholarDigital Library
J. Cheng, Y. Ke, A. W.-C. Fu, and J. X. Yu. Fast graph query processing with a low-cost index. VLDBJ, 20(4):521--539, 2011. Google ScholarDigital Library
J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: Towards verification free query processing on graph databases. In SIGMOD, 2007. Google ScholarDigital Library
V. Dang and B. W. Croft. Query reformulation using anchor text. In WSDM, pages 41--50, 2010. Google ScholarDigital Library
M. Drosou and E. Pitoura. Disc diversity: Result diversification based on dissimilarity and coverage. PVLDB, 6(1):13--24, 2012. Google ScholarDigital Library
M. Ferrer, E. Valveny, F. Serratosa, I. Bardají, and H. Bunke. Graph-based k-means clustering: A comparison of the set median versus the generalized median graph. In X. Jiang and N. Petkov, editors, CAIP, volume 5702, pages 342--350. 2009. Google ScholarDigital Library
D. S. Hochbaum, editor. Approximation Algorithms for NP-hard Problems. PWS Publishing Co., 1997. Google ScholarDigital Library
J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins, and A. Tropsha. Mining protein family specific residue packing patterns from protein structure graphs. In RECOMB, pages 308--315, 2004. Google ScholarDigital Library
J. Huan, W. Wang, J. Prins, and J. Yang. SPIN: mining maximal frequent subgraphs from graph databases. In KDD, 2004. Google ScholarDigital Library
X. Huang, H. Cheng, J. Yang, J. X. Yu, H. Fei, and J. Huan. Semi-supervised clustering of graph objects: A subgraph mining approach. In DASFAA, pages 197--212, 2012. Google ScholarDigital Library
W. Lin, X. Xiao, J. Cheng, and S. S. Bhowmick. Efficient algorithms for generalized subgraph query processing. In CIKM, 2012. Google ScholarDigital Library
T. Meinl, M. Wörlein, O. Urzova, I. Fischer, and M. Philippsen. The parmol package for frequent subgraph mining. ECEEASST, 1, 2007.Google Scholar
C. Mishra and N. Koudas. Interactive query refinement. In EDBT, pages 862--873, 2009. Google ScholarDigital Library
P. Missier, N. W. Paton, and K. Belhajjame. Fine-grained and efficient lineage querying of collection-based workflow provenance. In EDBT, pages 299--310, 2010. Google ScholarDigital Library
S. Morishita and J. Sese. Transversing itemset lattices with statistical metric pruning. In SIGMOD, pages 226--236, 2000. Google ScholarDigital Library
D. Mottin, A. Marascu, S. B. Roy, G. Das, T. Palpanas, and Y. Velegrakis. A probabilistic optimization framework for the empty-answer problem. PVLDB, 6(14):1762--1773, 2013. Google ScholarDigital Library
S. Nijssen and J. N. Kok. A quickstart in frequent structure mining can make a difference. In KDD, pages 647--652, 2004. Google ScholarDigital Library
S. A. Rahman et al. Small molecule subgraph detector (smsd) toolkit. J. Cheminformatics, 1:1--12, 2009.Google ScholarCross Ref
S. Ranu, M. Hoang, and A. Singh. Answering top-k representative queries on graph databases. In SIGMOD, pages 1163--1174, 2014. Google ScholarDigital Library
S. Y. Rieh and H. Xie. Analysis of multiple query reformulations on the web: the interactive information retrieval context. Inf. Process. Manage., 42(3), 2006. Google ScholarDigital Library
C. E. Scheidegger, H. T. Vo, D. Koop, J. Freire, and C. T. Silva. Querying and re-using workflows with vstrails. In SIGMOD, 2008. Google ScholarDigital Library
H. Shang, X. Lin, Y. Zhang, J. X. Yu, and W. Wang. Connected substructure similarity search. In SIGMOD, pages 903--914, 2010. Google ScholarDigital Library
A. Shokoufandeh and S. J. Dickinson. Graph-theoretical methods in computer vision. In Theoretical Aspects of Computer Science, 2000. Google Scholar
J. R. Ullmann. An algorithm for subgraph isomorphism. J. ACM, 23(1):31--42, 1976. Google ScholarDigital Library
L. G. Valiant. The complexity of computing the permanent. TCS, 8(2):189--201, 1979.Google ScholarCross Ref
X. Yan, H. Cheng, J. Han, and P. S. Yu. Mining significant graph patterns by leap search. In SIGMOD, pages 433--444, 2008. Google ScholarDigital Library
X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. In ICDM, pages 721--724, 2002. Google ScholarDigital Library
X. Yan, P. S. Yu, and J. Han. Graph indexing based on discriminative frequent structure analysis. TODS, 30(4):960--993, 2005. Google ScholarDigital Library
J. Yao, B. Cui, L. Hua, and Y. Huang. Keyword query reformulation on structured data. In ICDE, 2012. Google ScholarDigital Library
D. Yuan and P. Mitra. Lindex: a lattice-based index for graph databases. VLDBJ, 22(2):229--252, 2013. Google ScholarDigital Library

Index Terms

Graph Query Reformulation with Diversity
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
    2. Retrieval tasks and goals

Recommendations

Graph-Query Suggestions for Knowledge Graph Exploration
WWW '20: Proceedings of The Web Conference 2020

We consider the task of exploratory search through graph queries on knowledge graphs. We propose to assist the user by expanding the query with intuitive suggestions to provide a more informative (full) query that can retrieve more detailed and relevant ...
Read More
Knowledge Graphs

In this article, we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After ...
Read More
Diversified Subgraph Query Generation with Group Fairness
WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining

This paper investigates the problem of subgraph query generation with output that satisfies both diversity and fairness constraints. Given a set of groups with associated cardinality requirements, it is to compute subgraph queries with diversified ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2015
2378 pages
ISBN:9781450336642
DOI:10.1145/2783258
General Chairs:
Longbing Cao
University of Technology, Sydney
,
Chengqi Zhang
University of Technology, Sydney
,
Program Chairs:
Thorsten Joachims
Cornell University
,
Geoff Webb
Monash University
,
Dragos D. Margineantu
Boeing Research
,
Graham Williams
Australian Taxation Office
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 10 August 2015
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
graph databases
graph queries
query reformulation
Qualifiers
- research-article
Conference

Acceptance Rates
KDD '15 Paper Acceptance Rate160of819submissions,20%Overall Acceptance Rate1,133of8,635submissions,13%
More
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 25
  Total Citations
  View Citations
- 722
  Total Downloads
- Downloads (Last 12 months)12
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Graph Query Reformulation with Diversity

KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

Graph-Query Suggestions for Knowledge Graph Exploration

Knowledge Graphs

Diversified Subgraph Query Generation with Group Fairness