skip to main content
10.1145/253260.253266acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article
Free Access

Meaningful change detection in structured data

Authors Info & Claims
Published:01 June 1997Publication History

ABSTRACT

Detecting changes by comparing data snapshots is an important requirement for difference queries, active databases, and version and configuration management. In this paper we focus on detecting meaningful changes in hierarchically structured data, such as nested-object data. This problem is much more challenging than the corresponding one for relational or flat-file data. In order to describe changes better, we base our work not just on the traditional “atomic” insert, delete, update operations, but also on operations that move an entire sub-tree of nodes, and that copy an entire sub-tree. These operations allows us to describe changes in a semantically more meaningful way. Since this change detection problem is NP-hard, in this paper we present a heuristic change detection algorithm that yields close to “minimal” descriptions of the changes, and that has fewer restrictions than previous algorithms. Our algorithm is based on transforming the change detection problem to a problem of computing a minimum-cost edge cover of a bipartite graph. We study the quality of the solution produced by our algorithm, as well as the running time, both analytically and experimentally.

References

  1. CGM97.S. Chawathe and H. Garcia-Molina. Meaningful change detection in structured data. Available at URL http://w~ra-db, stanford, edu, 1997. Extended version.Google ScholarGoogle Scholar
  2. CGMH+94.S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom. The Tsimmis project: Integration of heterogeneous information sources. In Proceedings of lOOth Anniversary Meeting of the information Processing Society of Japan, pages 7-18, Tokyo, Japan, October 1994.Google ScholarGoogle Scholar
  3. CRGMW96.S. Chawathe, A. Rajaraman, H. Garcia- Mofina, and J. Widom. Change detection in hierarchically structured information. In Proceedings o} the A CM SIGMOD International Conference on Management o} Data, pages 493-504, Montreal, Quebec, June 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. HHS+.M. Haertel, D. Hayes, R. Stallman, L. Tower, P. Eggert., and W. Davison. The GNU diff program. Texinfo system documentation. Available by anonymous FTP from prep. ai .mit. edu.Google ScholarGoogle Scholar
  5. Law76.E. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart and Winston, 1976.Google ScholarGoogle Scholar
  6. LGM96.W. Labio and H. Garcia-Molina. Efficient snapshot differential algorithms for data warehousing. In Proceedings of the International Conference on Very Large Data Bases, Bombay, India, September 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Mye86.E. Myers. An O(UO) difference algorithm and its variations. Algorithmica, 1(2):251-266, 1986.Google ScholarGoogle ScholarCross RefCross Ref
  8. PS82.C. Papadimitriou and K. Steiglitz. Combinatorial Optimization. Prentice-Hall, 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Rot.E. Rothberg. The wmatch program for finding a maximum-weight matching for undirected graphs. Live OR collection. Available at URL http ://w~. orsoc, org. uk.Google ScholarGoogle Scholar
  10. SWZS94.D. Shasha, J. Wang, K. Zhang, and F. Shih. Exact and approximate algorithms for unordered tree matching. IEEE Transactions on Systems, Man, and Cybernetics, 24(4):668-678, April 1994.Google ScholarGoogle ScholarCross RefCross Ref
  11. SZ90.D. Shasha and K. Zhang. Fast algorithms for the unit cost editing distance between trees. Journal o} Algorithms, 11:581-621, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Wag75.R. Wagner. On the complexity of the extended string-to-string correction problem. In Seventh A CM Symposium on the Theory o} Computation, 1975. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. WF74.R. Wagner and M. Fischer. The string-to-string correction problem. Journal o} the Association of Computing Machinery, 21(1):168-173, January 1974. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. WMG90.S. Wu, U. Manber, and G.Myers. An O(NP) sequence comparison algorithm, ln}ormation Processing Letters, 35:317-323, September 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. WU95.J. Widom and J. Ullman. The C3 project: Changes, consistency, and configurations in heterogeneous distributed information systems. Unpublished manuscript; available at URL http://www-db, stan~ord, edu, 1995.Google ScholarGoogle Scholar
  16. ZS89.K. Zhang and D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal of Computing, 18(6):1245-1262, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. ZWS95.K. Zhang, J. Wang, and D. Shasha. On the editing distance between undirected acyclic graphs. International Journal of Foundations of Computer Science, 1995.Google ScholarGoogle Scholar

Index Terms

  1. Meaningful change detection in structured data

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SIGMOD '97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data
          June 1997
          594 pages
          ISBN:0897919114
          DOI:10.1145/253260

          Copyright © 1997 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 1 June 1997

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • Article

          Acceptance Rates

          SIGMOD '97 Paper Acceptance Rate42of202submissions,21%Overall Acceptance Rate785of4,003submissions,20%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader