ABSTRACT
We study a problem of graph-query reformulation enabling explorative query-driven discovery in graph databases. Given a query issued by the user, the system, apart from returning the result patterns, also proposes a number of specializations (i.e., supergraphs) of the original query to facilitate the exploration of the results.
We formalize the problem of finding a set of reformulations of the input query by maximizing a linear combination of coverage (of the original query's answer set) and diversity among the specializations. We prove that our problem is hard, but also that a simple greedy algorithm achieves a (1/2)-approximation guarantee.
The most challenging step of the greedy algorithm is the computation of the specialization that brings the maximum increment to the objective function. To efficiently solve this step, we show how to compute the objective-function increment of a specialization linearly in the number of its results and derive an upper bound that we exploit to devise an efficient search-space visiting strategy.
An extensive evaluation on real and synthetic databases attests high efficiency and accuracy of our proposal.
- C. C. Aggarwal and H. Wang. Managing and Mining Graph Data. Springer, 2010. Google ScholarDigital Library
- M. K. Anand, S. Bowers, and B. Ludäscher. Techniques for efficiently querying scientific workflow provenance graphs. In EDBT, pages 287--298, 2010. Google ScholarDigital Library
- R. A. Baeza-Yates and B. A. Ribeiro-Neto. Modern Information Retrieval - the concepts and technology behind search, Second edition. Pearson Education Ltd., 2011. Google ScholarDigital Library
- S. Berretti, A. D. Bimbo, and E. Vicario. Efficient matching and indexing of graph models in content-based retrieval. TPAMI, 23(10):1089--1105, 2001. Google ScholarDigital Library
- P. Boldi, F. Bonchi, C. Castillo, and S. Vigna. Query reformulation mining: models, patterns, and applications. Inf. Retr., 14(3), 2011. Google ScholarDigital Library
- A. Borodin, H. C. Lee, and Y. Ye. Max-sum diversification, monotone submodular functions and dynamic updates. In PODS, pages 155--166, 2012. Google ScholarDigital Library
- J. Cheng, Y. Ke, A. W.-C. Fu, and J. X. Yu. Fast graph query processing with a low-cost index. VLDBJ, 20(4):521--539, 2011. Google ScholarDigital Library
- J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: Towards verification free query processing on graph databases. In SIGMOD, 2007. Google ScholarDigital Library
- V. Dang and B. W. Croft. Query reformulation using anchor text. In WSDM, pages 41--50, 2010. Google ScholarDigital Library
- M. Drosou and E. Pitoura. Disc diversity: Result diversification based on dissimilarity and coverage. PVLDB, 6(1):13--24, 2012. Google ScholarDigital Library
- M. Ferrer, E. Valveny, F. Serratosa, I. Bardají, and H. Bunke. Graph-based k-means clustering: A comparison of the set median versus the generalized median graph. In X. Jiang and N. Petkov, editors, CAIP, volume 5702, pages 342--350. 2009. Google ScholarDigital Library
- D. S. Hochbaum, editor. Approximation Algorithms for NP-hard Problems. PWS Publishing Co., 1997. Google ScholarDigital Library
- J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins, and A. Tropsha. Mining protein family specific residue packing patterns from protein structure graphs. In RECOMB, pages 308--315, 2004. Google ScholarDigital Library
- J. Huan, W. Wang, J. Prins, and J. Yang. SPIN: mining maximal frequent subgraphs from graph databases. In KDD, 2004. Google ScholarDigital Library
- X. Huang, H. Cheng, J. Yang, J. X. Yu, H. Fei, and J. Huan. Semi-supervised clustering of graph objects: A subgraph mining approach. In DASFAA, pages 197--212, 2012. Google ScholarDigital Library
- W. Lin, X. Xiao, J. Cheng, and S. S. Bhowmick. Efficient algorithms for generalized subgraph query processing. In CIKM, 2012. Google ScholarDigital Library
- T. Meinl, M. Wörlein, O. Urzova, I. Fischer, and M. Philippsen. The parmol package for frequent subgraph mining. ECEEASST, 1, 2007.Google Scholar
- C. Mishra and N. Koudas. Interactive query refinement. In EDBT, pages 862--873, 2009. Google ScholarDigital Library
- P. Missier, N. W. Paton, and K. Belhajjame. Fine-grained and efficient lineage querying of collection-based workflow provenance. In EDBT, pages 299--310, 2010. Google ScholarDigital Library
- S. Morishita and J. Sese. Transversing itemset lattices with statistical metric pruning. In SIGMOD, pages 226--236, 2000. Google ScholarDigital Library
- D. Mottin, A. Marascu, S. B. Roy, G. Das, T. Palpanas, and Y. Velegrakis. A probabilistic optimization framework for the empty-answer problem. PVLDB, 6(14):1762--1773, 2013. Google ScholarDigital Library
- S. Nijssen and J. N. Kok. A quickstart in frequent structure mining can make a difference. In KDD, pages 647--652, 2004. Google ScholarDigital Library
- S. A. Rahman et al. Small molecule subgraph detector (smsd) toolkit. J. Cheminformatics, 1:1--12, 2009.Google ScholarCross Ref
- S. Ranu, M. Hoang, and A. Singh. Answering top-k representative queries on graph databases. In SIGMOD, pages 1163--1174, 2014. Google ScholarDigital Library
- S. Y. Rieh and H. Xie. Analysis of multiple query reformulations on the web: the interactive information retrieval context. Inf. Process. Manage., 42(3), 2006. Google ScholarDigital Library
- C. E. Scheidegger, H. T. Vo, D. Koop, J. Freire, and C. T. Silva. Querying and re-using workflows with vstrails. In SIGMOD, 2008. Google ScholarDigital Library
- H. Shang, X. Lin, Y. Zhang, J. X. Yu, and W. Wang. Connected substructure similarity search. In SIGMOD, pages 903--914, 2010. Google ScholarDigital Library
- A. Shokoufandeh and S. J. Dickinson. Graph-theoretical methods in computer vision. In Theoretical Aspects of Computer Science, 2000. Google Scholar
- J. R. Ullmann. An algorithm for subgraph isomorphism. J. ACM, 23(1):31--42, 1976. Google ScholarDigital Library
- L. G. Valiant. The complexity of computing the permanent. TCS, 8(2):189--201, 1979.Google ScholarCross Ref
- X. Yan, H. Cheng, J. Han, and P. S. Yu. Mining significant graph patterns by leap search. In SIGMOD, pages 433--444, 2008. Google ScholarDigital Library
- X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. In ICDM, pages 721--724, 2002. Google ScholarDigital Library
- X. Yan, P. S. Yu, and J. Han. Graph indexing based on discriminative frequent structure analysis. TODS, 30(4):960--993, 2005. Google ScholarDigital Library
- J. Yao, B. Cui, L. Hua, and Y. Huang. Keyword query reformulation on structured data. In ICDE, 2012. Google ScholarDigital Library
- D. Yuan and P. Mitra. Lindex: a lattice-based index for graph databases. VLDBJ, 22(2):229--252, 2013. Google ScholarDigital Library
Index Terms
- Graph Query Reformulation with Diversity
Recommendations
Graph-Query Suggestions for Knowledge Graph Exploration
WWW '20: Proceedings of The Web Conference 2020We consider the task of exploratory search through graph queries on knowledge graphs. We propose to assist the user by expanding the query with intuitive suggestions to provide a more informative (full) query that can retrieve more detailed and relevant ...
Knowledge Graphs
In this article, we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After ...
Diversified Subgraph Query Generation with Group Fairness
WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data MiningThis paper investigates the problem of subgraph query generation with output that satisfies both diversity and fairness constraints. Given a set of groups with associated cardinality requirements, it is to compute subgraph queries with diversified ...
Comments