skip to main content
10.1145/1247480.1247512acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

Efficient algorithms for evaluating xpath over streams

Published:11 June 2007Publication History

ABSTRACT

In this paper we address the problem of evaluating XPath queries over streaming XML data. We consider a practical XPath fragment called Univariate XPath, which includes the commonly used '/' and '//' axes and allows *-node tests and arbitrarily nested predicates. It is well known that this XPath fragment can be efficiently evaluated in O(|D||Q|) time in the non-streaming environment, where |D| is the document size and |Q| is the query size. However, this is not necessarily true in the streaming environment, since streaming algorithms have to satisfy stricter requirement than non-streaming algorithms, in that all data must be read sequentially in one pass. Therefore, it is not surprising that state-of-the-art stream-querying algorithms have higher time complexity than O(|D||Q|).

In this paper we revisit the XPath stream-querying problem, and show that Univariate XPath can be efficiently evaluated in O|D||Q|) time in the streaming environment. Specifically, we propose two O(|D||Q|)-time stream-querying algorithms, LQ and EQ, which are based on the lazy strategy and on the eager strategy, respectively. To the best of our knowledge, LQ and EQ are the first XPath stream-querying algorithms that achieve O(|D||Q|) time performance. Further, our algorithms achieve O(|D||Q|) time performance without trading off space performance. Instead, they have better buffering-space performance than state-of-the-art stream-querying algorithms. In particular, EQ achieves optimal buffering-space performance. Our experimental results show that our algorithms have not only good theoretical complexity but also considerable practical performance advantages over existing algorithms.

References

  1. S. Al--Khalifa, H. V. Jagadish, N. Koudas, J. M. Patel, D. Srivastava, and Y. Wu. Structural joins: a primitive for efficient XML query pattern matching. ICDE Conference, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Apache. Apache Xerces2-Java XML Parser. http://xerces.apache.org/xerces2-j/.Google ScholarGoogle Scholar
  3. Z. Bar-Yossef, M. Fontoura, and V. Josifovski. On the memory requirements of XPath evaluation over XML streams. Proceedings of PODS, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Z. Bar-Yossef, M. Fontoura, and V. Josifovski. Buffering in query evaluation over XML streams. Proceedings of PODS, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C. Barton, P. Charles, D. Goyal, M. Raghavachari, M. Fontoura, and V. Josifovski. Streaming XPath processing with forward and backward axes. ICDE Conference, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  6. S. Boag, D. Chamberlin, M. F. Ferandez, D. Florescu, J. Robie, and J. Simeon. XQuery 1.0: An XML Query Language. W3C, http://www.w3.org/TR/xquery/, 2003.Google ScholarGoogle Scholar
  7. D. Brownell and D. Megginson. SAX: Simple API for XML. SAX Project Organization, http://www.saxproject.org/.Google ScholarGoogle Scholar
  8. N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: optimal XML pattern matching. SIGMOD, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. C. Y. Chan, P. Felber, M. N. Garofalakis, and R. Rastogi. Efficient filtering of XML documents with XPat expressions. ICDE Conference, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Y. Chen, S. B. Davidson, and Y. Zheng. An efficient XPath query processor for XML streams. ICDE, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. B. Choi. What are real DTDs like? WebDB Workshop, 2002.Google ScholarGoogle Scholar
  12. J. Clark. XML Transformations (XSLT) Version 1.0. W3C, http://www.w3.org/TR/xslt/, 1999.Google ScholarGoogle Scholar
  13. J. Clark and S. DeRose. XML Path Language (XPath) Version 1.0. W3C, http://www.w3.org/TR/xpath/, 1999.Google ScholarGoogle Scholar
  14. Y. Diao, M. Altinel, M. J. Franklin, H. Zhang, and P. M. Fischer. Path sharing and predicate evaluation for high-performance XML filtering. ACM Transactions on Database Systems (TODS), 28:467--516, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. A. L. Diaz and D. Lovell. IBM's XML generator. http://www.alphaworks.ibm.com/tech/xmlgenerator.Google ScholarGoogle Scholar
  16. M. Fernandez and et al. Galax: an implementation of XQuery. http://www.galaxquery.org/.Google ScholarGoogle Scholar
  17. D. Florescu, C. Hillery, D. Kossmann, P. Lucas, F. Riccardi, T. Westmann, M. J. Carey, A. Sundararajan,and G. Agrawal. The BEA/XQRL streaming XQuery processor. VLDB Conference, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. G. Gottlob, C. Koch, and R. Pichler. Efficient algorithms for processing XPath queries. VLDB Conference, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. G. Gottlob, C. Koch, and R. Pichler. The complexity of XPath query evaluation. Proceedings of PODS, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. T. J. Green, A. Gupta, G. Miklau, M. Onizuka, and D. Suciu. Processing XML streams with deterministic automata and stream indexes. ACM Transactions on Database Systems (TODS), 29:752--788, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. A. K. Gupta and D. Suciu. Stream processing of XPath queries with predicates. SIGMOD Conference, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. V. Josifovski, M. Fontoura, and A. Barta. Querying XML streams. VLDB Journal, 14(2):197--210, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. M. Kay. Saxon: the XSLT and XQuery processor. http://saxon.sourceforge.net/.Google ScholarGoogle Scholar
  24. C. Koch, S. Scherzinger, N. Schweikardt, and B. Stegmaier. Schema-based scheduling of event processors and buffer minimization for queries on structured data streams. VLDB Conference, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. B. Ludascher, P. Mukhopadhyay, and Y. Papakonstantinou. A transducer-based XML query processor. VLDB, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. F. Peng and S. S. Chawathe. XSQ: A streaming XPath engine. http://www.cs.umd.edu/projects/xsq/.Google ScholarGoogle Scholar
  27. F. Peng and S. S. Chawathe. XSQ: A streaming XPath engine. ACM Transactions on Database Systems (TODS), 30:577--623, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. A. Schmidt and et al. XMark: an XML benchmark project. http://monetdb.cwi.nl/xml/.Google ScholarGoogle Scholar
  29. L. Segoufin. Typing and querying XML documents: some complexity bounds. Proceedings of PODS, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. D. Suciu. XML data repository. http://www.cs.washington.edu/research/xmldatasets/.Google ScholarGoogle Scholar
  31. W3C. Section 1.2.2, XML query use cases. http://www.w3.org/TR/xquery-use-cases/, 2006.Google ScholarGoogle Scholar

Index Terms

  1. Efficient algorithms for evaluating xpath over streams

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data
        June 2007
        1210 pages
        ISBN:9781595936868
        DOI:10.1145/1247480
        • General Chairs:
        • Lizhu Zhou,
        • Tok Wang Ling,
        • Program Chair:
        • Beng Chin Ooi

        Copyright © 2007 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 11 June 2007

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • Article

        Acceptance Rates

        Overall Acceptance Rate785of4,003submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader