ABSTRACT
The provenance of data has recently been recognized as central to the trust one places in data. It is also important to annotation, to data integration and to probabilistic databases. Three workshops have been held on the topic, and it has been the focus of several research projects and prototype systems. This tutorial will attempt to provide an overview of research in provenance in databases with a focus on recent database research and technology in this area. This tutorial is aimed at a general database research audience and at people who work with scientific data.
Index Terms
- Provenance in databases