Schema matching is the task of providing correspondences between concepts describing the meaning of data in various heterogeneous, distributed data sources. Schema matching is one of the basic operations required by the process of data and schema integration, and thus has a great effect on its outcomes, whether these involve targeted content delivery, view integration, database integration, query rewriting over heterogeneous sources, duplicate data elimination, or automatic streamlining of workflow activities that involve heterogeneous data sources. Although schema matching research has been ongoing for over 25 years, only more recently has a realization emerged that schema matchers are inherently uncertain. Since 2003, work on uncertainty in schema matching has picked up, along with research on uncertainty in other areas of data management. This lecture presents various aspects of uncertainty in schema matching within a single unified framework. We introduce basic formulations of uncertainty and provide several alternative representations of schema matching uncertainty. Then, we cover two common methods that have been proposed to deal with uncertainty in schema matching, namely ensembles and top-K matchings, and analyze them in this context. We conclude with a set of real-world applications.
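As a rough illustration of the two methods named above, the sketch below combines the outputs of two matchers into an ensemble and then ranks the top-K one-to-one matchings. It is a minimal sketch under stated assumptions, not the lecture's formulation: similarity matrices are plain nested lists with values in [0, 1], the ensemble is a simple weighted average, and the top-K search is a brute-force enumeration that is only practical for tiny schemas. The function names `ensemble` and `top_k_matchings` and the example matrices are hypothetical.

```python
from itertools import permutations


def ensemble(matrices, weights):
    """Combine similarity matrices of identical shape by weighted average.

    Assumption: a weighted average is only one of many possible ways
    to aggregate matcher outputs.
    """
    total = sum(weights)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [
            sum(w * m[i][j] for m, w in zip(matrices, weights)) / total
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def top_k_matchings(sim, k):
    """Return the k best one-to-one matchings as (score, assignment) pairs.

    assignment[i] is the target attribute index matched to source
    attribute i. Brute force over all assignments; assumes the number of
    rows does not exceed the number of columns.
    """
    rows, cols = len(sim), len(sim[0])
    scored = []
    for perm in permutations(range(cols), rows):
        score = sum(sim[i][perm[i]] for i in range(rows))
        scored.append((score, perm))
    # Highest total similarity first; ties broken by assignment order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:k]


# Hypothetical outputs of two matchers over two 2-attribute schemas.
name_sim = [[0.9, 0.1], [0.2, 0.6]]
type_sim = [[0.5, 0.5], [0.5, 0.5]]

combined = ensemble([name_sim, type_sim], weights=[3, 1])
ranked = top_k_matchings(combined, k=2)
for score, assignment in ranked:
    print(round(score, 3), assignment)
```

Keeping several ranked matchings rather than only the single best one reflects the idea that the best-scoring matching may still be wrong, so the runners-up remain available as alternatives.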