skip to main content
10.5555/1182635.1164150acmconferencesArticle/Chapter ViewAbstractPublication PagesvldbConference Proceedingsconference-collections
Article

An incrementally maintainable index for approximate lookups in hierarchical data

Published:01 September 2006Publication History

ABSTRACT

Several recent papers argue for approximate lookups in hierarchical data and propose index structures that support approximate searches in large sets of hierarchical data. These index structures must be updated if the underlying data changes. Since the performance of a full index reconstruction is prohibitive, the index must be updated incrementally.We propose a persistent and incrementally maintainable index for approximate lookups in hierarchical data. The index is based on small tree patterns, called pq-grams. It supports efficient updates in response to structure and value changes in hierarchical data and is based on the log of tree edit operations. We prove the correctness of the incremental maintenance for sequences of edit operations. Our algorithms identify a small set of pq-grams that must be updated to maintain the index. The experimental results with synthetic and real data confirm the scalability of our approach.

References

  1. {1} S. Al-Khalifa, H. V. Jagadish, J. M. Patel, Y. Wu, N. Koudas, and D. Srivastava. Structural joins: A primitive for efficient XML query pattern matching. In Proc. of ICDE, pages 141-152. IEEE Computer Society, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. {2} N. Augsten, M. Böhlen, and J. Gamper. Approximate matching of hierarchical data using pq-grams. In Proc. of VLDB, pages 301-312. Morgan Kaufmann Publishers Inc., 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. {3} N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: Optimal XML pattern matching. In Proc. of SIGMOD, pages 310-321. ACM Press, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. {4} G. Cobéna, S. Abiteboul, and A. Marian. Detecting changes in XML documents. In Proc. of ICDE, pages 41-52. IEEE Computer Society, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. {5} B. Cooper, N. Sample, M. J. Franklin, G. R. Hjaltason, and M. Shadmon. A fast index for semistructured data. In Proc. of VLDB, pages 341-350. Morgan Kaufmann Publishers Inc., 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. {6} M. Garofalakis and A. Kumar. XML stream processing using tree-edit distance embeddings. ACM Trans. on Database Systems, 30(1):279-332, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. {7} S. Guha, H. V. Jagadish, N. Koudas, D. Srivastava, and T. Yu. Approximate XML joins. In Proc. of SIGMOD, pages 287-298. ACM Press, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. {8} S. Guha, N. Koudas, D. Srivastava, and T. Yu. Index-based approximate XML joins. In Proc. of ICDE, pages 708-710. IEEE Computer Society, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  9. {9} H. Jiang, H. Lu, W. Wang, and B. C. Ooi. XR-tree: Indexing XML data for efficient structural joins. In Proc. of ICDE, pages 253-263. IEEE Computer Society, 2003.Google ScholarGoogle Scholar
  10. {10} R. M. Karp and M. O. Rabin. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249-260, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. {11} R. Kaushik, P. Bohannon, J. F. Naughton, and P. Shenoy. Updates for structure indexes. In Proc. of VLDB, pages 239-250. Morgan Kaufmann Publishers Inc., 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. {12} K.-H. Lee, Y.-C. Choy, and S.-B. Cho. An efficient algorithm to compute differences between structured documents. IEEE Transactions on Knowledge and Data Engineering (TKDE), 16(8):965-979, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. {13} Q. Li and B. Moon. Indexing and querying XML data for regular path expressions. In Proc. of VLDB, pages 361-370. Morgan Kaufmann Publishers Inc., 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. {14} N. Polyzotis, M. Garofalakis, and Y. Ioannidis. Approximate XML query answers. In Proc. of SIGMOD, pages 263-274. ACM Press, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. {15} C. Qun, A. Lim, and K. W. Ong. D(k)-index: An adaptive structural summary for graph-structured data. In Proc. of SIGMOD, pages 134-144. ACM Press, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. {16} R. Schenkel, A. Theobald, and G. Weikum. Efficient creation and incremental maintenance of the HOPI index for complex XML document collections. In ICDE, pages 360-371. IEEE Computer Society, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. {17} E. Ukkonen. Approximate string-matching with q-grams and maximal matches. Theoretical Computer Science, 92(1):191-211, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. {18} M. Weis and F. Naumann. DogmatiX tracks down duplicates in XML. In Proc. of SIGMOD, pages 431-442. ACM Press, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. {19} R. Yang, P. Kalnis, and A. K. H. Tung. Similarity evaluation on tree-structured data. In Proc. of SIGMOD, pages 754-765. ACM Press, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. {20} K. Zhang and D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing, 18(6):1245-1262, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. An incrementally maintainable index for approximate lookups in hierarchical data

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader