ABSTRACT
TAX is perhaps the best-known extension of the relational algebra for querying XML databases. One problem with TAX (as with many existing relational DBMSs) is that it does not take the semantics of terms in a TAX database into account when answering queries. Thus, even though TAX answers queries with 100% precision, its recall is relatively low. Our TOSS system improves the recall of TAX via the concept of a similarity enhanced ontology (SEO). Intuitively, an ontology is a set of graphs describing relationships (such as isa and partof) between terms in a database. An SEO also evaluates how similarities between terms (e.g., "J. Ullman", "Jeff Ullman", and "Jeffrey Ullman") affect ontologies. Finally, we show how the algebra proposed in TAX can be extended to take SEOs into account. The result is a system that provides much higher answer quality than TAX alone, where quality is defined as the square root of the product of precision and recall. We experimentally evaluate TOSS on the DBLP and SIGMOD bibliographic databases and show that it has acceptable performance.
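The quality measure named in the abstract (the square root of the product of precision and recall) can be illustrated with a minimal sketch. The function name, the paper identifiers, and the two answer sets below are hypothetical and chosen only for illustration; they are not taken from the TOSS experiments.

```python
import math


def precision_recall_quality(returned, relevant):
    """Return (precision, recall, quality) for a set of returned answers.

    quality = sqrt(precision * recall), as defined in the abstract.
    """
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall, math.sqrt(precision * recall)


# Hypothetical ground truth: four papers that truly answer the query.
relevant = {"p1", "p2", "p3", "p4"}

# Hypothetical exact-match answer: fully precise, but misses name variants.
exact_match = {"p1"}

# Hypothetical similarity-aware answer: finds more papers, one false positive.
similarity_match = {"p1", "p2", "p3", "p5"}

print(precision_recall_quality(exact_match, relevant))
print(precision_recall_quality(similarity_match, relevant))
```

In this made-up case the exact-match answer has perfect precision but low recall, so its quality is lower than that of the similarity-aware answer, which trades a little precision for much higher recall.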