skip to main content
10.1145/775047.775058acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
Article

Efficiently mining frequent trees in a forest

Published:23 July 2002Publication History

ABSTRACT

Mining frequent trees is very useful in domains like bioinformatics, web mining, mining semistructured data, and so on. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TREEMINER, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called scope-list. We contrast TREEMINER with a pattern matching tree mining algorithm (PATTERNMATCHER). We conduct detailed experiments to test the performance and scalability of these methods. We find that TREEMINER outperforms the pattern matching approach by a factor of 4 to 20, and has good scaleup properties. We also present an application of tree mining to analyze real web logs for usage patterns.

References

  1. S. Abiteboul, H. Kaplan, and T. Milo. Compact labeling schemes for ancestor queries. In ACM Symp. on Discrete Algorithms, January 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Abiteboul and V. Vianu. Regular path expressions with constraints. In ACM Int'l Conf. on Principles of Database Systems, May 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. Agrawal et al. Fast discovery of association rules. In U. Fayyad and et al, editors, Advances in Knowledge Discovery and Data Mining, pages 307--328. AAAI Press, Menlo Park, CA, 1996.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. R. Agrawal and R. Srikant. Mining sequential patterns. In 11th Intl. Conf. on Data Engg., 1995.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Satamoto, and S. Arikawa. Efficient substructure discovery from large semi-structured data. In 2nd SIAM Int'l Conference on Data Mining, April 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M.S. Chen, J.S. Park, and P.S. Yu. Data mining for path traversal patterns in a web environment. In International Conference on Distributed Computing Systems, 1996.]]Google ScholarGoogle Scholar
  7. Z. Chen et al. Counting twig matches in a tree. In 17th Intl. Conf. on Data Engineering, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. R. Cole, R. Hariharan, and P. Indyk. Tree pattern matching and subset matching in deterministic o(n log3n)-time. In l0th Symposium on Discrete Algorithms, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. D. Cook and L. Holder. Substructure discovery using minimal description length and background knowledge. Journal of Artificial Intelligence Research, 1:231--255, 1994.]]Google ScholarGoogle ScholarCross RefCross Ref
  10. L. Dehaspe, H. Toivonen, and R. King. Finding frequent substructures in chemical compounds. In 4th Intl. Conf. Knowledge Discovery and Data Mining, August 1998.]]Google ScholarGoogle Scholar
  11. M. Fernandez and D. Suciu. Optimizing regular path expressions using graph schemas. In IEEE Int'l Conf. on Data Engineering, February 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In 4th European Conference on Principles of Knowledge Discovery and Data Mining, September 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. P. Kilpelainen and H. Mannila. Ordered and unordered tree inclusion. SIAM J. of Computing, 24(2):340--356, 1995.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. Kuramochi and G. Karypis. Frequent subgraph discovery. In 1st IEEE Int'l Conf. on Data Mining, November 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Q. Li and B. Moon. Indexing and querying XML data for regular path expressions. In 27th Int'l Conf. on Very Large Data Bases, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Punin, M. Krishnamoorthy, and M. Zaki. LOGML: Log markup language for web usage mining. In ACM SIGKDD Workshop on Mining Log Data Across All Customer TouchPoints, August 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. R. Cooley, B. Mobasher, and J. Srivastava. Web Mining: Information and Pattern Discovery on the World Wide Web. In 8th IEEE Intl. Conf. on Tools with AI, 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. R. Shamir and D. Tsur. Faster subtree isomorphism. Journal of Algorithms, 33:267--280, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. B. Shapiro and K. Zhang. Comparing multiple rna secondary strutures using tree comparisons. Computer Applications in Biosciences, 6(4):309--318, 1990.]]Google ScholarGoogle Scholar
  20. K. Wang and H. Liu. Discovering typical structures of documents: A road map approach. In ACM SIGIR Conference on Information Retrieval, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. K. Yoshida and H. Motoda. CLIP: Concept learning from inference patterns. Artificial Intelligence, 75(1):63--92, 1995.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. J. Zaki. Efficiently mining trees in a forest. Tech. Report 01--7, CS Dept., RPI, July 2001.]]Google ScholarGoogle Scholar
  23. C. Zhang et al. On supporting containment queries in relational database managment systems. In ACM Int'l Conf. on Management of Data, May 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Efficiently mining frequent trees in a forest

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
            July 2002
            719 pages
            ISBN:158113567X
            DOI:10.1145/775047

            Copyright © 2002 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 23 July 2002

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • Article

            Acceptance Rates

            KDD '02 Paper Acceptance Rate44of307submissions,14%Overall Acceptance Rate1,133of8,635submissions,13%

            Upcoming Conference

            KDD '24

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader