ABSTRACT
In this paper we address the problem of evaluating XPath queries over streaming XML data. We consider a practical XPath fragment called Univariate XPath, which includes the commonly used '/' and '//' axes and allows *-node tests and arbitrarily nested predicates. It is well known that this XPath fragment can be efficiently evaluated in O(|D||Q|) time in the non-streaming environment, where |D| is the document size and |Q| is the query size. However, this is not necessarily true in the streaming environment, since streaming algorithms have to satisfy stricter requirement than non-streaming algorithms, in that all data must be read sequentially in one pass. Therefore, it is not surprising that state-of-the-art stream-querying algorithms have higher time complexity than O(|D||Q|).
In this paper we revisit the XPath stream-querying problem, and show that Univariate XPath can be efficiently evaluated in O|D||Q|) time in the streaming environment. Specifically, we propose two O(|D||Q|)-time stream-querying algorithms, LQ and EQ, which are based on the lazy strategy and on the eager strategy, respectively. To the best of our knowledge, LQ and EQ are the first XPath stream-querying algorithms that achieve O(|D||Q|) time performance. Further, our algorithms achieve O(|D||Q|) time performance without trading off space performance. Instead, they have better buffering-space performance than state-of-the-art stream-querying algorithms. In particular, EQ achieves optimal buffering-space performance. Our experimental results show that our algorithms have not only good theoretical complexity but also considerable practical performance advantages over existing algorithms.
- S. Al--Khalifa, H. V. Jagadish, N. Koudas, J. M. Patel, D. Srivastava, and Y. Wu. Structural joins: a primitive for efficient XML query pattern matching. ICDE Conference, 2002. Google ScholarDigital Library
- Apache. Apache Xerces2-Java XML Parser. http://xerces.apache.org/xerces2-j/.Google Scholar
- Z. Bar-Yossef, M. Fontoura, and V. Josifovski. On the memory requirements of XPath evaluation over XML streams. Proceedings of PODS, 2004. Google ScholarDigital Library
- Z. Bar-Yossef, M. Fontoura, and V. Josifovski. Buffering in query evaluation over XML streams. Proceedings of PODS, 2005. Google ScholarDigital Library
- C. Barton, P. Charles, D. Goyal, M. Raghavachari, M. Fontoura, and V. Josifovski. Streaming XPath processing with forward and backward axes. ICDE Conference, 2003.Google ScholarCross Ref
- S. Boag, D. Chamberlin, M. F. Ferandez, D. Florescu, J. Robie, and J. Simeon. XQuery 1.0: An XML Query Language. W3C, http://www.w3.org/TR/xquery/, 2003.Google Scholar
- D. Brownell and D. Megginson. SAX: Simple API for XML. SAX Project Organization, http://www.saxproject.org/.Google Scholar
- N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: optimal XML pattern matching. SIGMOD, 2002. Google ScholarDigital Library
- C. Y. Chan, P. Felber, M. N. Garofalakis, and R. Rastogi. Efficient filtering of XML documents with XPat expressions. ICDE Conference, 2002. Google ScholarDigital Library
- Y. Chen, S. B. Davidson, and Y. Zheng. An efficient XPath query processor for XML streams. ICDE, 2006. Google ScholarDigital Library
- B. Choi. What are real DTDs like? WebDB Workshop, 2002.Google Scholar
- J. Clark. XML Transformations (XSLT) Version 1.0. W3C, http://www.w3.org/TR/xslt/, 1999.Google Scholar
- J. Clark and S. DeRose. XML Path Language (XPath) Version 1.0. W3C, http://www.w3.org/TR/xpath/, 1999.Google Scholar
- Y. Diao, M. Altinel, M. J. Franklin, H. Zhang, and P. M. Fischer. Path sharing and predicate evaluation for high-performance XML filtering. ACM Transactions on Database Systems (TODS), 28:467--516, 2003. Google ScholarDigital Library
- A. L. Diaz and D. Lovell. IBM's XML generator. http://www.alphaworks.ibm.com/tech/xmlgenerator.Google Scholar
- M. Fernandez and et al. Galax: an implementation of XQuery. http://www.galaxquery.org/.Google Scholar
- D. Florescu, C. Hillery, D. Kossmann, P. Lucas, F. Riccardi, T. Westmann, M. J. Carey, A. Sundararajan,and G. Agrawal. The BEA/XQRL streaming XQuery processor. VLDB Conference, 2003. Google ScholarDigital Library
- G. Gottlob, C. Koch, and R. Pichler. Efficient algorithms for processing XPath queries. VLDB Conference, 2002. Google ScholarDigital Library
- G. Gottlob, C. Koch, and R. Pichler. The complexity of XPath query evaluation. Proceedings of PODS, 2003. Google ScholarDigital Library
- T. J. Green, A. Gupta, G. Miklau, M. Onizuka, and D. Suciu. Processing XML streams with deterministic automata and stream indexes. ACM Transactions on Database Systems (TODS), 29:752--788, 2004. Google ScholarDigital Library
- A. K. Gupta and D. Suciu. Stream processing of XPath queries with predicates. SIGMOD Conference, 2003. Google ScholarDigital Library
- V. Josifovski, M. Fontoura, and A. Barta. Querying XML streams. VLDB Journal, 14(2):197--210, 2005. Google ScholarDigital Library
- M. Kay. Saxon: the XSLT and XQuery processor. http://saxon.sourceforge.net/.Google Scholar
- C. Koch, S. Scherzinger, N. Schweikardt, and B. Stegmaier. Schema-based scheduling of event processors and buffer minimization for queries on structured data streams. VLDB Conference, 2004. Google ScholarDigital Library
- B. Ludascher, P. Mukhopadhyay, and Y. Papakonstantinou. A transducer-based XML query processor. VLDB, 2002. Google ScholarDigital Library
- F. Peng and S. S. Chawathe. XSQ: A streaming XPath engine. http://www.cs.umd.edu/projects/xsq/.Google Scholar
- F. Peng and S. S. Chawathe. XSQ: A streaming XPath engine. ACM Transactions on Database Systems (TODS), 30:577--623, 2005. Google ScholarDigital Library
- A. Schmidt and et al. XMark: an XML benchmark project. http://monetdb.cwi.nl/xml/.Google Scholar
- L. Segoufin. Typing and querying XML documents: some complexity bounds. Proceedings of PODS, 2003. Google ScholarDigital Library
- D. Suciu. XML data repository. http://www.cs.washington.edu/research/xmldatasets/.Google Scholar
- W3C. Section 1.2.2, XML query use cases. http://www.w3.org/TR/xquery-use-cases/, 2006.Google Scholar
Index Terms
Efficient algorithms for evaluating xpath over streams
Recommendations
Filtering XPath expressions for XML access control
XPath is a standard for specifying parts of XML documents and a suitable language for both query processing and access control of XML. In this paper, we use the XPath expression for representing user queries and access control for XML. And we propose an ...
Visual Evaluation of XPath Queries
ICCIS '13: Proceedings of the 2013 International Conference on Computational and Information SciencesOver the past one decade, due to its simplicity and flexibility, Extensible Markup Language (XML) is rapidly gaining in popularity as a universal data format for data exchange and integration on the web. In this paper, we present a novel framework to ...
A Synthetic, Trend-Based Benchmark for XPath
Database Systems for Advanced ApplicationsInterest in querying XML is increasing as it becomes an important medium for data representation and exchange. A core component in most XML query languages is XPath. This paper describes a benchmark for comparing the performance of XPath query ...
Comments