research-article

Learning Join Queries from User Examples

Authors:
Angela Bonifati (University of Lyon 1, University of Lille 1, Inria LINKS)
Radu Ciucanu (University of Oxford, University of Lille 1, Inria LINKS)
Sławek Staworko (University of Lille 3, Inria LINKS, University of Edinburgh)

ACM Transactions on Database Systems, Volume 40, Issue 4, Article No. 24, pp. 1–38. https://doi.org/10.1145/2818637

Published: 04 January 2016

Abstract

We investigate the problem of learning join queries from user examples. The user is presented with a set of candidate tuples and is asked to label each one as a positive or negative example, depending on whether or not she would like the tuple to be part of the join result. The goal is to quickly infer an arbitrary n-ary join predicate across an arbitrary number m of relations while keeping the number of user interactions to a minimum. We assume no prior knowledge of the integrity constraints across the involved relations. The need to infer a join predicate across multiple relations whose referential constraints are unknown arises in several applications, such as data integration, reverse engineering of database queries, and schema inference. In such scenarios, the number of tuples involved in the join is typically large. We introduce a set of strategies that let us inspect the search space and aggressively prune what we call uninformative tuples, so that we present to the user only the informative ones, that is, those that help her quickly find the goal query she has in mind. In this article, we focus on the inference of joins with equality predicates, and we also allow disjunctive join predicates and projection in the queries. For these settings, we precisely characterize the frontier between tractability and intractability for three problems of interest: consistency checking, learnability, and deciding the informativeness of a tuple. Next, we propose several strategies for presenting tuples to the user in an order that aims to minimize the number of interactions. We show the efficiency of our approach through an experimental study on both benchmark and synthetic datasets.
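The notions of consistency and informativeness mentioned above can be illustrated on the simplest fragment only: a conjunction of equality atoms between two relations R and P, with no disjunction and no projection. The following is a minimal sketch under that assumption, using our own formalization rather than the authors' algorithm; the names `satisfied_atoms`, `most_specific`, `is_consistent`, `classify`, and `informative_tuples` are hypothetical. Each tuple t of R × P is mapped to the set T(t) of atoms it satisfies, and a predicate θ selects t iff θ ⊆ T(t). Every predicate that selects all positives is a subset of their intersection Θ+, so a labeled sample is consistent iff Θ+ is not contained in T(n) for any negative n. A tuple is uninformative when the sample already forces its label.

```python
from itertools import product
from typing import FrozenSet, Iterable, List, Sequence, Tuple

# Assumed encoding: atom (i, j) means "attribute i of R equals attribute j of P".
Atom = Tuple[int, int]
Row = Tuple[object, ...]
Pair = Tuple[Row, Row]  # one tuple of the Cartesian product R x P


def satisfied_atoms(pair: Pair) -> FrozenSet[Atom]:
    """T(t): every equality atom that the product tuple t satisfies."""
    r, p = pair
    return frozenset(
        (i, j) for i, j in product(range(len(r)), range(len(p))) if r[i] == p[j]
    )


def most_specific(positives: Iterable[Pair], arity_r: int, arity_p: int) -> FrozenSet[Atom]:
    """Theta+: intersection of T(p) over the positives (all atoms if there are none)."""
    theta = frozenset(product(range(arity_r), range(arity_p)))
    for pos in positives:
        theta &= satisfied_atoms(pos)
    return theta


def is_consistent(theta_plus: FrozenSet[Atom], negatives: Iterable[Pair]) -> bool:
    """Some predicate selects all positives and no negative iff Theta+ does."""
    return all(not theta_plus <= satisfied_atoms(n) for n in negatives)


def classify(pair: Pair, theta_plus: FrozenSet[Atom], negatives: Sequence[Pair]) -> str:
    """Label an unlabeled tuple, assuming the current sample is consistent."""
    t_atoms = satisfied_atoms(pair)
    if theta_plus <= t_atoms:
        # Every consistent predicate is a subset of Theta+, so each one selects t.
        return "certain+"
    if any((theta_plus & t_atoms) <= satisfied_atoms(n) for n in negatives):
        # The largest candidate selecting t also selects a negative, so all do.
        return "certain-"
    return "informative"


def informative_tuples(candidates: Iterable[Pair], positives: Sequence[Pair],
                       negatives: Sequence[Pair], arity_r: int, arity_p: int) -> List[Pair]:
    """Prune candidates whose label is already implied by the sample."""
    theta_plus = most_specific(positives, arity_r, arity_p)
    return [c for c in candidates if classify(c, theta_plus, negatives) == "informative"]


if __name__ == "__main__":
    # Hypothetical sample over binary relations R(A1, A2) and P(B1, B2).
    positives = [((1, 5), (1, 5))]
    negatives = [((2, 5), (2, 7))]
    theta_plus = most_specific(positives, 2, 2)
    print(sorted(theta_plus))                                 # [(0, 0), (1, 1)]
    print(is_consistent(theta_plus, negatives))               # True
    print(classify(((6, 8), (6, 9)), theta_plus, negatives))  # certain-
    print(classify(((6, 8), (7, 8)), theta_plus, negatives))  # informative
```

In this toy run, the remaining candidate predicates are {A2 = B2} and {A1 = B1, A2 = B2}. Only the second tuple separates them, so it is the one worth showing to the user. Once disjunction or projection is allowed, these checks need not remain this simple, which is why the abstract frames them in terms of a tractability frontier.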

Index Terms

- Information systems
  - Information retrieval
    - Information retrieval query processing

Published in
ACM Transactions on Database Systems, Volume 40, Issue 4
Special Issue: Invited 2014 PODS and EDBT Revised Articles
February 2016
248 pages
ISSN: 0362-5915
EISSN: 1557-4644
DOI: 10.1145/2866579
Editor: Christian S. Jensen, Aalborg University, Denmark
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 4 January 2016
- Accepted: 1 August 2015
- Revised: 1 June 2015
- Received: 1 December 2014
Author Tags
SQL query discovery
incomplete schema
reverse engineering
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total citations: 40
- Total downloads: 625
- Downloads (last 12 months): 13
- Downloads (last 6 weeks): 2