skip to main content
10.1145/2783258.2783343acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Graph Query Reformulation with Diversity

Published:10 August 2015Publication History

ABSTRACT

We study a problem of graph-query reformulation enabling explorative query-driven discovery in graph databases. Given a query issued by the user, the system, apart from returning the result patterns, also proposes a number of specializations (i.e., supergraphs) of the original query to facilitate the exploration of the results.

We formalize the problem of finding a set of reformulations of the input query by maximizing a linear combination of coverage (of the original query's answer set) and diversity among the specializations. We prove that our problem is hard, but also that a simple greedy algorithm achieves a (1/2)-approximation guarantee.

The most challenging step of the greedy algorithm is the computation of the specialization that brings the maximum increment to the objective function. To efficiently solve this step, we show how to compute the objective-function increment of a specialization linearly in the number of its results and derive an upper bound that we exploit to devise an efficient search-space visiting strategy.

An extensive evaluation on real and synthetic databases attests high efficiency and accuracy of our proposal.

References

  1. C. C. Aggarwal and H. Wang. Managing and Mining Graph Data. Springer, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. M. K. Anand, S. Bowers, and B. Ludäscher. Techniques for efficiently querying scientific workflow provenance graphs. In EDBT, pages 287--298, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. A. Baeza-Yates and B. A. Ribeiro-Neto. Modern Information Retrieval - the concepts and technology behind search, Second edition. Pearson Education Ltd., 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. S. Berretti, A. D. Bimbo, and E. Vicario. Efficient matching and indexing of graph models in content-based retrieval. TPAMI, 23(10):1089--1105, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. P. Boldi, F. Bonchi, C. Castillo, and S. Vigna. Query reformulation mining: models, patterns, and applications. Inf. Retr., 14(3), 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A. Borodin, H. C. Lee, and Y. Ye. Max-sum diversification, monotone submodular functions and dynamic updates. In PODS, pages 155--166, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. Cheng, Y. Ke, A. W.-C. Fu, and J. X. Yu. Fast graph query processing with a low-cost index. VLDBJ, 20(4):521--539, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: Towards verification free query processing on graph databases. In SIGMOD, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. V. Dang and B. W. Croft. Query reformulation using anchor text. In WSDM, pages 41--50, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. Drosou and E. Pitoura. Disc diversity: Result diversification based on dissimilarity and coverage. PVLDB, 6(1):13--24, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. Ferrer, E. Valveny, F. Serratosa, I. Bardají, and H. Bunke. Graph-based k-means clustering: A comparison of the set median versus the generalized median graph. In X. Jiang and N. Petkov, editors, CAIP, volume 5702, pages 342--350. 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. D. S. Hochbaum, editor. Approximation Algorithms for NP-hard Problems. PWS Publishing Co., 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins, and A. Tropsha. Mining protein family specific residue packing patterns from protein structure graphs. In RECOMB, pages 308--315, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Huan, W. Wang, J. Prins, and J. Yang. SPIN: mining maximal frequent subgraphs from graph databases. In KDD, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. X. Huang, H. Cheng, J. Yang, J. X. Yu, H. Fei, and J. Huan. Semi-supervised clustering of graph objects: A subgraph mining approach. In DASFAA, pages 197--212, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. W. Lin, X. Xiao, J. Cheng, and S. S. Bhowmick. Efficient algorithms for generalized subgraph query processing. In CIKM, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. T. Meinl, M. Wörlein, O. Urzova, I. Fischer, and M. Philippsen. The parmol package for frequent subgraph mining. ECEEASST, 1, 2007.Google ScholarGoogle Scholar
  18. C. Mishra and N. Koudas. Interactive query refinement. In EDBT, pages 862--873, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. P. Missier, N. W. Paton, and K. Belhajjame. Fine-grained and efficient lineage querying of collection-based workflow provenance. In EDBT, pages 299--310, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. S. Morishita and J. Sese. Transversing itemset lattices with statistical metric pruning. In SIGMOD, pages 226--236, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. D. Mottin, A. Marascu, S. B. Roy, G. Das, T. Palpanas, and Y. Velegrakis. A probabilistic optimization framework for the empty-answer problem. PVLDB, 6(14):1762--1773, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. S. Nijssen and J. N. Kok. A quickstart in frequent structure mining can make a difference. In KDD, pages 647--652, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. S. A. Rahman et al. Small molecule subgraph detector (smsd) toolkit. J. Cheminformatics, 1:1--12, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  24. S. Ranu, M. Hoang, and A. Singh. Answering top-k representative queries on graph databases. In SIGMOD, pages 1163--1174, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. S. Y. Rieh and H. Xie. Analysis of multiple query reformulations on the web: the interactive information retrieval context. Inf. Process. Manage., 42(3), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. C. E. Scheidegger, H. T. Vo, D. Koop, J. Freire, and C. T. Silva. Querying and re-using workflows with vstrails. In SIGMOD, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. H. Shang, X. Lin, Y. Zhang, J. X. Yu, and W. Wang. Connected substructure similarity search. In SIGMOD, pages 903--914, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. A. Shokoufandeh and S. J. Dickinson. Graph-theoretical methods in computer vision. In Theoretical Aspects of Computer Science, 2000. Google ScholarGoogle Scholar
  29. J. R. Ullmann. An algorithm for subgraph isomorphism. J. ACM, 23(1):31--42, 1976. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. L. G. Valiant. The complexity of computing the permanent. TCS, 8(2):189--201, 1979.Google ScholarGoogle ScholarCross RefCross Ref
  31. X. Yan, H. Cheng, J. Han, and P. S. Yu. Mining significant graph patterns by leap search. In SIGMOD, pages 433--444, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. In ICDM, pages 721--724, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. X. Yan, P. S. Yu, and J. Han. Graph indexing based on discriminative frequent structure analysis. TODS, 30(4):960--993, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. J. Yao, B. Cui, L. Hua, and Y. Huang. Keyword query reformulation on structured data. In ICDE, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. D. Yuan and P. Mitra. Lindex: a lattice-based index for graph databases. VLDBJ, 22(2):229--252, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Graph Query Reformulation with Diversity

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
        August 2015
        2378 pages
        ISBN:9781450336642
        DOI:10.1145/2783258

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 10 August 2015

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        KDD '15 Paper Acceptance Rate160of819submissions,20%Overall Acceptance Rate1,133of8,635submissions,13%

        Upcoming Conference

        KDD '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader