ABSTRACT
Detecting changes by comparing data snapshots is an important requirement for difference queries, active databases, and version and configuration management. In this paper we focus on detecting meaningful changes in hierarchically structured data, such as nested-object data. This problem is much more challenging than the corresponding one for relational or flat-file data. In order to describe changes better, we base our work not just on the traditional “atomic” insert, delete, update operations, but also on operations that move an entire sub-tree of nodes, and that copy an entire sub-tree. These operations allows us to describe changes in a semantically more meaningful way. Since this change detection problem is NP-hard, in this paper we present a heuristic change detection algorithm that yields close to “minimal” descriptions of the changes, and that has fewer restrictions than previous algorithms. Our algorithm is based on transforming the change detection problem to a problem of computing a minimum-cost edge cover of a bipartite graph. We study the quality of the solution produced by our algorithm, as well as the running time, both analytically and experimentally.
- CGM97.S. Chawathe and H. Garcia-Molina. Meaningful change detection in structured data. Available at URL http://w~ra-db, stanford, edu, 1997. Extended version.Google Scholar
- CGMH+94.S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom. The Tsimmis project: Integration of heterogeneous information sources. In Proceedings of lOOth Anniversary Meeting of the information Processing Society of Japan, pages 7-18, Tokyo, Japan, October 1994.Google Scholar
- CRGMW96.S. Chawathe, A. Rajaraman, H. Garcia- Mofina, and J. Widom. Change detection in hierarchically structured information. In Proceedings o} the A CM SIGMOD International Conference on Management o} Data, pages 493-504, Montreal, Quebec, June 1996. Google ScholarDigital Library
- HHS+.M. Haertel, D. Hayes, R. Stallman, L. Tower, P. Eggert., and W. Davison. The GNU diff program. Texinfo system documentation. Available by anonymous FTP from prep. ai .mit. edu.Google Scholar
- Law76.E. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart and Winston, 1976.Google Scholar
- LGM96.W. Labio and H. Garcia-Molina. Efficient snapshot differential algorithms for data warehousing. In Proceedings of the International Conference on Very Large Data Bases, Bombay, India, September 1996. Google ScholarDigital Library
- Mye86.E. Myers. An O(UO) difference algorithm and its variations. Algorithmica, 1(2):251-266, 1986.Google ScholarCross Ref
- PS82.C. Papadimitriou and K. Steiglitz. Combinatorial Optimization. Prentice-Hall, 1982. Google ScholarDigital Library
- Rot.E. Rothberg. The wmatch program for finding a maximum-weight matching for undirected graphs. Live OR collection. Available at URL http ://w~. orsoc, org. uk.Google Scholar
- SWZS94.D. Shasha, J. Wang, K. Zhang, and F. Shih. Exact and approximate algorithms for unordered tree matching. IEEE Transactions on Systems, Man, and Cybernetics, 24(4):668-678, April 1994.Google ScholarCross Ref
- SZ90.D. Shasha and K. Zhang. Fast algorithms for the unit cost editing distance between trees. Journal o} Algorithms, 11:581-621, 1990. Google ScholarDigital Library
- Wag75.R. Wagner. On the complexity of the extended string-to-string correction problem. In Seventh A CM Symposium on the Theory o} Computation, 1975. Google ScholarDigital Library
- WF74.R. Wagner and M. Fischer. The string-to-string correction problem. Journal o} the Association of Computing Machinery, 21(1):168-173, January 1974. Google ScholarDigital Library
- WMG90.S. Wu, U. Manber, and G.Myers. An O(NP) sequence comparison algorithm, ln}ormation Processing Letters, 35:317-323, September 1990. Google ScholarDigital Library
- WU95.J. Widom and J. Ullman. The C3 project: Changes, consistency, and configurations in heterogeneous distributed information systems. Unpublished manuscript; available at URL http://www-db, stan~ord, edu, 1995.Google Scholar
- ZS89.K. Zhang and D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal of Computing, 18(6):1245-1262, 1989. Google ScholarDigital Library
- ZWS95.K. Zhang, J. Wang, and D. Shasha. On the editing distance between undirected acyclic graphs. International Journal of Foundations of Computer Science, 1995.Google Scholar
Index Terms
- Meaningful change detection in structured data
Recommendations
Meaningful change detection in structured data
Detecting changes by comparing data snapshots is an important requirement for difference queries, active databases, and version and configuration management. In this paper we focus on detecting meaningful changes in hierarchically structured data, such ...
Efficient change detection in tree-structured data
HSI'03: Proceedings of the 2nd international conference on Human.society@internetWe present X-tree Diff, a change detection algorithm for tree-structured data such as XML/HTML documents. X-tree Diff uses a specially designed data structure, called X-tree. Nodes of X-tree have a special hash-valued field representing the structure ...
A change detection system for unordered XML data using a relational model
The dramatic increase in the evolution of XML data available on the Internet requires a change detection system to keep track of important changes occurring during their life time. In this paper, we introduce a novel approach of detecting changes ...
Comments