Article

Efficiently mining frequent trees in a forest

Author:
Mohammed J. Zaki

Rensselaer Polytechnic Institute, Troy NY

Rensselaer Polytechnic Institute, Troy NY
View Profile

KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data miningJuly 2002Pages 71–80https://doi.org/10.1145/775047.775058

Published:23 July 2002Publication History

KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 71–80

ABSTRACT

Mining frequent trees is very useful in domains like bioinformatics, web mining, mining semistructured data, and so on. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TREEMINER, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called scope-list. We contrast TREEMINER with a pattern matching tree mining algorithm (PATTERNMATCHER). We conduct detailed experiments to test the performance and scalability of these methods. We find that TREEMINER outperforms the pattern matching approach by a factor of 4 to 20, and has good scaleup properties. We also present an application of tree mining to analyze real web logs for usage patterns.

References

S. Abiteboul, H. Kaplan, and T. Milo. Compact labeling schemes for ancestor queries. In ACM Symp. on Discrete Algorithms, January 2001.]] Google ScholarDigital Library
S. Abiteboul and V. Vianu. Regular path expressions with constraints. In ACM Int'l Conf. on Principles of Database Systems, May 1997.]] Google ScholarDigital Library
R. Agrawal et al. Fast discovery of association rules. In U. Fayyad and et al, editors, Advances in Knowledge Discovery and Data Mining, pages 307--328. AAAI Press, Menlo Park, CA, 1996.]] Google ScholarDigital Library
R. Agrawal and R. Srikant. Mining sequential patterns. In 11th Intl. Conf. on Data Engg., 1995.]] Google ScholarDigital Library
T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Satamoto, and S. Arikawa. Efficient substructure discovery from large semi-structured data. In 2nd SIAM Int'l Conference on Data Mining, April 2002.]] Google ScholarDigital Library
M.S. Chen, J.S. Park, and P.S. Yu. Data mining for path traversal patterns in a web environment. In International Conference on Distributed Computing Systems, 1996.]]Google Scholar
Z. Chen et al. Counting twig matches in a tree. In 17th Intl. Conf. on Data Engineering, 2001.]] Google ScholarDigital Library
R. Cole, R. Hariharan, and P. Indyk. Tree pattern matching and subset matching in deterministic o(n log3n)-time. In l0th Symposium on Discrete Algorithms, 1999.]] Google ScholarDigital Library
D. Cook and L. Holder. Substructure discovery using minimal description length and background knowledge. Journal of Artificial Intelligence Research, 1:231--255, 1994.]]Google ScholarCross Ref
L. Dehaspe, H. Toivonen, and R. King. Finding frequent substructures in chemical compounds. In 4th Intl. Conf. Knowledge Discovery and Data Mining, August 1998.]]Google Scholar
M. Fernandez and D. Suciu. Optimizing regular path expressions using graph schemas. In IEEE Int'l Conf. on Data Engineering, February 1998.]] Google ScholarDigital Library
A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In 4th European Conference on Principles of Knowledge Discovery and Data Mining, September 2000.]] Google ScholarDigital Library
P. Kilpelainen and H. Mannila. Ordered and unordered tree inclusion. SIAM J. of Computing, 24(2):340--356, 1995.]] Google ScholarDigital Library
M. Kuramochi and G. Karypis. Frequent subgraph discovery. In 1st IEEE Int'l Conf. on Data Mining, November 2001.]] Google ScholarDigital Library
Q. Li and B. Moon. Indexing and querying XML data for regular path expressions. In 27th Int'l Conf. on Very Large Data Bases, 2001.]] Google ScholarDigital Library
J. Punin, M. Krishnamoorthy, and M. Zaki. LOGML: Log markup language for web usage mining. In ACM SIGKDD Workshop on Mining Log Data Across All Customer TouchPoints, August 2001.]] Google ScholarDigital Library
R. Cooley, B. Mobasher, and J. Srivastava. Web Mining: Information and Pattern Discovery on the World Wide Web. In 8th IEEE Intl. Conf. on Tools with AI, 1997.]] Google ScholarDigital Library
R. Shamir and D. Tsur. Faster subtree isomorphism. Journal of Algorithms, 33:267--280, 1999.]] Google ScholarDigital Library
B. Shapiro and K. Zhang. Comparing multiple rna secondary strutures using tree comparisons. Computer Applications in Biosciences, 6(4):309--318, 1990.]]Google Scholar
K. Wang and H. Liu. Discovering typical structures of documents: A road map approach. In ACM SIGIR Conference on Information Retrieval, 1998.]] Google ScholarDigital Library
K. Yoshida and H. Motoda. CLIP: Concept learning from inference patterns. Artificial Intelligence, 75(1):63--92, 1995.]] Google ScholarDigital Library
M. J. Zaki. Efficiently mining trees in a forest. Tech. Report 01--7, CS Dept., RPI, July 2001.]]Google Scholar
C. Zhang et al. On supporting containment queries in relational database managment systems. In ACM Int'l Conf. on Management of Data, May 2001.]] Google ScholarDigital Library

Index Terms

Efficiently mining frequent trees in a forest

Recommendations

Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications

Mining frequent trees is very useful in domains like bioinformatics, Web mining, mining semistructured data, etc. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TreeMiner, a novel ...
Read More
Efficiently Mining Frequent Embedded Unordered Trees
Advances in Mining Graphs, Trees and Sequences

Mining frequent trees is very useful in domains like bioinformatics, web mining, mining semi-structured data, and so on. In this paper we introduce SLEUTH, an efficient algorithm for mining frequent, unordered, embedded subtrees in a database of labeled ...
Read More
Efficiently Mining Frequent Embedded Unordered Trees
Advances in Mining Graphs, Trees and Sequences

Mining frequent trees is very useful in domains like bioinformatics, web mining, mining semi-structured data, and so on. In this paper we introduce SLEUTH, an efficient algorithm for mining frequent, unordered, embedded subtrees in a database of labeled ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
July 2002
719 pages
ISBN:158113567X
DOI:10.1145/775047
Conference Chair:
Osmar R. Zaïane
University of Alberta, Canada
,
General Chair:
Randy Goebel
University of Alberta, Canada
,
Program Chairs:
David Hand
Imperial College, UK
,
Daniel Keim
AT&T
,
Raymond Ng
University of British Columbia, Canada
Copyright © 2002 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 July 2002
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- Article
Conference

Acceptance Rates
KDD '02 Paper Acceptance Rate44of307submissions,14%Overall Acceptance Rate1,133of8,635submissions,13%
More
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 271
  Total Citations
  View Citations
- 1,258
  Total Downloads
- Downloads (Last 12 months)23
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.