Article

An incrementally maintainable index for approximate lookups in hierarchical data

Authors:
Nikolaus Augsten

Free University of Bozen-Bolzano, Bozen, Italy

Free University of Bozen-Bolzano, Bozen, Italy
View Profile

,
Michael Böhlen

Free University of Bozen-Bolzano, Bozen, Italy

Free University of Bozen-Bolzano, Bozen, Italy
View Profile

,
Johann Gamper

Free University of Bozen-Bolzano, Bozen, Italy

Free University of Bozen-Bolzano, Bozen, Italy
View Profile

Authors Info & Claims

VLDB '06: Proceedings of the 32nd international conference on Very large data basesSeptember 2006Pages 247–258

Published:01 September 2006Publication History

VLDB '06: Proceedings of the 32nd international conference on Very large data bases

Pages 247–258

ABSTRACT

Several recent papers argue for approximate lookups in hierarchical data and propose index structures that support approximate searches in large sets of hierarchical data. These index structures must be updated if the underlying data changes. Since the performance of a full index reconstruction is prohibitive, the index must be updated incrementally.We propose a persistent and incrementally maintainable index for approximate lookups in hierarchical data. The index is based on small tree patterns, called pq-grams. It supports efficient updates in response to structure and value changes in hierarchical data and is based on the log of tree edit operations. We prove the correctness of the incremental maintenance for sequences of edit operations. Our algorithms identify a small set of pq-grams that must be updated to maintain the index. The experimental results with synthetic and real data confirm the scalability of our approach.

References

{1} S. Al-Khalifa, H. V. Jagadish, J. M. Patel, Y. Wu, N. Koudas, and D. Srivastava. Structural joins: A primitive for efficient XML query pattern matching. In Proc. of ICDE, pages 141-152. IEEE Computer Society, 2002. Google ScholarDigital Library
{2} N. Augsten, M. Böhlen, and J. Gamper. Approximate matching of hierarchical data using pq-grams. In Proc. of VLDB, pages 301-312. Morgan Kaufmann Publishers Inc., 2005. Google ScholarDigital Library
{3} N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: Optimal XML pattern matching. In Proc. of SIGMOD, pages 310-321. ACM Press, 2002. Google ScholarDigital Library
{4} G. Cobéna, S. Abiteboul, and A. Marian. Detecting changes in XML documents. In Proc. of ICDE, pages 41-52. IEEE Computer Society, 2002. Google ScholarDigital Library
{5} B. Cooper, N. Sample, M. J. Franklin, G. R. Hjaltason, and M. Shadmon. A fast index for semistructured data. In Proc. of VLDB, pages 341-350. Morgan Kaufmann Publishers Inc., 2001. Google ScholarDigital Library
{6} M. Garofalakis and A. Kumar. XML stream processing using tree-edit distance embeddings. ACM Trans. on Database Systems, 30(1):279-332, 2005. Google ScholarDigital Library
{7} S. Guha, H. V. Jagadish, N. Koudas, D. Srivastava, and T. Yu. Approximate XML joins. In Proc. of SIGMOD, pages 287-298. ACM Press, 2002. Google ScholarDigital Library
{8} S. Guha, N. Koudas, D. Srivastava, and T. Yu. Index-based approximate XML joins. In Proc. of ICDE, pages 708-710. IEEE Computer Society, 2003.Google ScholarCross Ref
{9} H. Jiang, H. Lu, W. Wang, and B. C. Ooi. XR-tree: Indexing XML data for efficient structural joins. In Proc. of ICDE, pages 253-263. IEEE Computer Society, 2003.Google Scholar
{10} R. M. Karp and M. O. Rabin. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249-260, 1987. Google ScholarDigital Library
{11} R. Kaushik, P. Bohannon, J. F. Naughton, and P. Shenoy. Updates for structure indexes. In Proc. of VLDB, pages 239-250. Morgan Kaufmann Publishers Inc., 2002. Google ScholarDigital Library
{12} K.-H. Lee, Y.-C. Choy, and S.-B. Cho. An efficient algorithm to compute differences between structured documents. IEEE Transactions on Knowledge and Data Engineering (TKDE), 16(8):965-979, 2004. Google ScholarDigital Library
{13} Q. Li and B. Moon. Indexing and querying XML data for regular path expressions. In Proc. of VLDB, pages 361-370. Morgan Kaufmann Publishers Inc., 2001. Google ScholarDigital Library
{14} N. Polyzotis, M. Garofalakis, and Y. Ioannidis. Approximate XML query answers. In Proc. of SIGMOD, pages 263-274. ACM Press, 2004. Google ScholarDigital Library
{15} C. Qun, A. Lim, and K. W. Ong. D(k)-index: An adaptive structural summary for graph-structured data. In Proc. of SIGMOD, pages 134-144. ACM Press, 2003. Google ScholarDigital Library
{16} R. Schenkel, A. Theobald, and G. Weikum. Efficient creation and incremental maintenance of the HOPI index for complex XML document collections. In ICDE, pages 360-371. IEEE Computer Society, 2005. Google ScholarDigital Library
{17} E. Ukkonen. Approximate string-matching with q-grams and maximal matches. Theoretical Computer Science, 92(1):191-211, 1992. Google ScholarDigital Library
{18} M. Weis and F. Naumann. DogmatiX tracks down duplicates in XML. In Proc. of SIGMOD, pages 431-442. ACM Press, 2005. Google ScholarDigital Library
{19} R. Yang, P. Kalnis, and A. K. H. Tung. Similarity evaluation on tree-structured data. In Proc. of SIGMOD, pages 754-765. ACM Press, 2005. Google ScholarDigital Library
{20} K. Zhang and D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing, 18(6):1245-1262, 1989. Google ScholarDigital Library

Index Terms

An incrementally maintainable index for approximate lookups in hierarchical data

Recommendations

Szeged index, edge Szeged index, and semi-star trees

A semi-star tree is a star tree whose some edges may be replaced by paths of length more than one. In this paper we present some increasing and decreasing transformations for Szeged index of the semi-star trees. Then we introduce palm semi-star tree and ...
Read More
An Efficient Structural Index for Graph-Structured Data
ICIS '08: Proceedings of the Seventh IEEE/ACIS International Conference on Computer and Information Science (icis 2008)

To speed up queries over XML and semi-structured data, a number of structural indexes have been proposed. The structural index is usually a labeled directed graph defined by partitioning nodes in the XML datagraph in to equivalence classes and storing ...
Read More
Comparison between the zeroth-order Randić index and the sum-connectivity index

The zeroth-order Randić index and the sum-connectivity index are very popular topological indices in mathematical chemistry. These two indices are based on vertex degrees of graphs and attracted a lot of attention in recent years. Recently Li and Li (...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
VLDB '06: Proceedings of the 32nd international conference on Very large data bases
September 2006
1269 pages
Editors:
Umeshwar Dayal,
Khu-Yong Whang,
David Lomet,
Gustavo Alonso,
Guy Lohman,
Martin Kersten,
Sang K. Cha,
Young-Kuk Kim
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 September 2006
Check for updates
Qualifiers
- Article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 5
  Total Citations
  View Citations
- 324
  Total Downloads
- Downloads (Last 12 months)0
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

An incrementally maintainable index for approximate lookups in hierarchical data

VLDB '06: Proceedings of the 32nd international conference on Very large data bases

ABSTRACT

References

Cited By

Index Terms

Recommendations

Szeged index, edge Szeged index, and semi-star trees

An Efficient Structural Index for Graph-Structured Data

Comparison between the zeroth-order Randić index and the sum-connectivity index