ABSTRACT
Several recent papers argue for approximate lookups in hierarchical data and propose index structures that support approximate searches in large sets of hierarchical data. These index structures must be updated if the underlying data changes. Since the performance of a full index reconstruction is prohibitive, the index must be updated incrementally.We propose a persistent and incrementally maintainable index for approximate lookups in hierarchical data. The index is based on small tree patterns, called pq-grams. It supports efficient updates in response to structure and value changes in hierarchical data and is based on the log of tree edit operations. We prove the correctness of the incremental maintenance for sequences of edit operations. Our algorithms identify a small set of pq-grams that must be updated to maintain the index. The experimental results with synthetic and real data confirm the scalability of our approach.
- {1} S. Al-Khalifa, H. V. Jagadish, J. M. Patel, Y. Wu, N. Koudas, and D. Srivastava. Structural joins: A primitive for efficient XML query pattern matching. In Proc. of ICDE, pages 141-152. IEEE Computer Society, 2002. Google ScholarDigital Library
- {2} N. Augsten, M. Böhlen, and J. Gamper. Approximate matching of hierarchical data using pq-grams. In Proc. of VLDB, pages 301-312. Morgan Kaufmann Publishers Inc., 2005. Google ScholarDigital Library
- {3} N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: Optimal XML pattern matching. In Proc. of SIGMOD, pages 310-321. ACM Press, 2002. Google ScholarDigital Library
- {4} G. Cobéna, S. Abiteboul, and A. Marian. Detecting changes in XML documents. In Proc. of ICDE, pages 41-52. IEEE Computer Society, 2002. Google ScholarDigital Library
- {5} B. Cooper, N. Sample, M. J. Franklin, G. R. Hjaltason, and M. Shadmon. A fast index for semistructured data. In Proc. of VLDB, pages 341-350. Morgan Kaufmann Publishers Inc., 2001. Google ScholarDigital Library
- {6} M. Garofalakis and A. Kumar. XML stream processing using tree-edit distance embeddings. ACM Trans. on Database Systems, 30(1):279-332, 2005. Google ScholarDigital Library
- {7} S. Guha, H. V. Jagadish, N. Koudas, D. Srivastava, and T. Yu. Approximate XML joins. In Proc. of SIGMOD, pages 287-298. ACM Press, 2002. Google ScholarDigital Library
- {8} S. Guha, N. Koudas, D. Srivastava, and T. Yu. Index-based approximate XML joins. In Proc. of ICDE, pages 708-710. IEEE Computer Society, 2003.Google ScholarCross Ref
- {9} H. Jiang, H. Lu, W. Wang, and B. C. Ooi. XR-tree: Indexing XML data for efficient structural joins. In Proc. of ICDE, pages 253-263. IEEE Computer Society, 2003.Google Scholar
- {10} R. M. Karp and M. O. Rabin. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249-260, 1987. Google ScholarDigital Library
- {11} R. Kaushik, P. Bohannon, J. F. Naughton, and P. Shenoy. Updates for structure indexes. In Proc. of VLDB, pages 239-250. Morgan Kaufmann Publishers Inc., 2002. Google ScholarDigital Library
- {12} K.-H. Lee, Y.-C. Choy, and S.-B. Cho. An efficient algorithm to compute differences between structured documents. IEEE Transactions on Knowledge and Data Engineering (TKDE), 16(8):965-979, 2004. Google ScholarDigital Library
- {13} Q. Li and B. Moon. Indexing and querying XML data for regular path expressions. In Proc. of VLDB, pages 361-370. Morgan Kaufmann Publishers Inc., 2001. Google ScholarDigital Library
- {14} N. Polyzotis, M. Garofalakis, and Y. Ioannidis. Approximate XML query answers. In Proc. of SIGMOD, pages 263-274. ACM Press, 2004. Google ScholarDigital Library
- {15} C. Qun, A. Lim, and K. W. Ong. D(k)-index: An adaptive structural summary for graph-structured data. In Proc. of SIGMOD, pages 134-144. ACM Press, 2003. Google ScholarDigital Library
- {16} R. Schenkel, A. Theobald, and G. Weikum. Efficient creation and incremental maintenance of the HOPI index for complex XML document collections. In ICDE, pages 360-371. IEEE Computer Society, 2005. Google ScholarDigital Library
- {17} E. Ukkonen. Approximate string-matching with q-grams and maximal matches. Theoretical Computer Science, 92(1):191-211, 1992. Google ScholarDigital Library
- {18} M. Weis and F. Naumann. DogmatiX tracks down duplicates in XML. In Proc. of SIGMOD, pages 431-442. ACM Press, 2005. Google ScholarDigital Library
- {19} R. Yang, P. Kalnis, and A. K. H. Tung. Similarity evaluation on tree-structured data. In Proc. of SIGMOD, pages 754-765. ACM Press, 2005. Google ScholarDigital Library
- {20} K. Zhang and D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing, 18(6):1245-1262, 1989. Google ScholarDigital Library
Index Terms
- An incrementally maintainable index for approximate lookups in hierarchical data
Recommendations
Szeged index, edge Szeged index, and semi-star trees
A semi-star tree is a star tree whose some edges may be replaced by paths of length more than one. In this paper we present some increasing and decreasing transformations for Szeged index of the semi-star trees. Then we introduce palm semi-star tree and ...
An Efficient Structural Index for Graph-Structured Data
ICIS '08: Proceedings of the Seventh IEEE/ACIS International Conference on Computer and Information Science (icis 2008)To speed up queries over XML and semi-structured data, a number of structural indexes have been proposed. The structural index is usually a labeled directed graph defined by partitioning nodes in the XML datagraph in to equivalence classes and storing ...
Comparison between the zeroth-order Randić index and the sum-connectivity index
The zeroth-order Randić index and the sum-connectivity index are very popular topological indices in mathematical chemistry. These two indices are based on vertex degrees of graphs and attracted a lot of attention in recent years. Recently Li and Li (...
Comments