Article

Carpenter: finding closed patterns in long biological datasets

Authors:
Feng Pan (Natl. University of Singapore)
Gao Cong (Natl. University of Singapore)
Anthony K. H. Tung (Natl. University of Singapore)
Jiong Yang (University of Illinois, Urbana, Champaign)
Mohammed J. Zaki (Rensselaer Polytechnic Institute)

KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, August 2003, Pages 637–642. https://doi.org/10.1145/956750.956832

Published: 24 August 2003

ABSTRACT

The growth of bioinformatics has resulted in datasets with new characteristics. These datasets typically contain a large number of columns and a small number of rows. For example, many gene expression datasets contain 10,000-100,000 columns but only 100-1,000 rows. Such datasets pose a great challenge for existing (closed) frequent pattern discovery algorithms, since these algorithms have an exponential dependence on the average row length. In this paper, we describe a new algorithm called CARPENTER that is specially designed to handle datasets having a large number of attributes and a relatively small number of rows. Several experiments on real bioinformatics datasets show that CARPENTER is orders of magnitude better than previous closed pattern mining algorithms like CLOSET and CHARM.
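The abstract and the "row enumeration" author tag suggest searching over sets of rows rather than sets of columns. The following is a minimal Python sketch of that general idea only, not the CARPENTER algorithm itself: it relies on the standard fact that every closed pattern is the intersection of some set of rows, and it enumerates row sets by brute force with no pruning. The function name, the gene labels, and the toy data are all hypothetical.

```python
from itertools import combinations


def closed_patterns_by_row_enumeration(rows, min_support):
    """Naive illustration (assumed helper, not from the paper).

    Intersects every non-empty combination of rows. Each non-empty
    intersection is a closed pattern; its support is the number of
    rows that contain it. Cost grows exponentially with the number
    of rows, not with the number of columns.
    """
    closed = {}
    n = len(rows)
    for k in range(1, n + 1):
        for combo in combinations(range(n), k):
            # Items shared by every row in this combination.
            items = frozenset.intersection(*(rows[i] for i in combo))
            if not items:
                continue
            # Support counts all rows containing the pattern,
            # which may be more rows than the combination itself.
            support = sum(1 for r in rows if items <= r)
            if support >= min_support:
                closed[items] = support
    return closed


# Toy dataset: three rows (samples), each a set of hypothetical gene labels.
rows = [
    frozenset({"g1", "g2", "g3"}),
    frozenset({"g1", "g2", "g4"}),
    frozenset({"g1", "g3", "g4"}),
]
patterns = closed_patterns_by_row_enumeration(rows, min_support=2)
for items, support in sorted(patterns.items(), key=lambda p: sorted(p[0])):
    print(sorted(items), support)
```

With only a few hundred rows the row-set search space is far smaller than the space of column combinations, which is the setting the abstract targets. A practical algorithm would still need pruning that this sketch leaves out.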

References

R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 1994 Int. Conf. Very Large Data Bases (VLDB'94), pages 487--499, Santiago, Chile, Sept. 1994.
Y. Bastide, R. Taouil, N. Pasquier, G. Stumme, and L. Lakhal. Mining frequent closed itemsets with counting inference. In SIGKDD Explorations, 2(2), Dec. 2000.
K. Beyer and R. Ramakrishnan. Bottom-up computation of sparse and iceberg cubes. In Proc. 1999 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'99), pages 359--370, Philadelphia, PA, June 1999.
J. Han, J. Pei, and Y. Yin. Mining partial periodicity using frequent pattern trees. In Computing Science Technical Report TR-99-10, Simon Fraser University, July 1999.
H. Mannila, H. Toivonen, and A. I. Verkamo. Efficient algorithms for discovering association rules. In Proc. AAAI'94 Workshop Knowledge Discovery in Databases (KDD'94), pages 181--192, Seattle, WA, July 1994.
N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In Proc. 7th Int. Conf. Database Theory (ICDT'99), pages 398--416, Jerusalem, Israel, Jan. 1999.
J. Pei, J. Han, and R. Mao. CLOSET: An efficient algorithm for mining frequent closed itemsets. In Proc. 2000 ACM-SIGMOD Int. Workshop Data Mining and Knowledge Discovery (DMKD'00), pages 11--20, Dallas, TX, May 2000.
P. Shenoy, J. Haritsa, S. Sudarshan, G. Bhalotia, M. Bawa, and D. Shah. Turbo-charging vertical mining of large databases. In Proc. 2000 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'00), pages 22--23, Dallas, TX, May 2000.
J. Wang, J. Han, and J. Pei. Closet+: Searching for the best strategies for mining frequent closed itemsets. In Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03), Washington, D. C., Aug. 2003.
M. Zaki and C. Hsiao. Charm: An efficient algorithm for closed association rule mining. In Proc. of SDM 2002, 2002.
M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In Proc. 1997 Int. Conf. Knowledge Discovery and Data Mining (KDD'97), pages 283--286, Newport Beach, CA, Aug. 1997.

Index Terms

Carpenter: finding closed patterns in long biological datasets
1. Information systems
  1. Information systems applications

Published in
KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
August 2003
736 pages
ISBN: 1581137370
DOI: 10.1145/956750
Conference Chair: Lise Getoor (University of Maryland, College Park)
General Chair: Ted Senator (DARPA)
Program Chairs: Pedro Domingos (University of Washington), Christos Faloutsos (Carnegie Mellon University)
Copyright © 2003 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
closed pattern
frequent pattern
row enumeration
Qualifiers
- Article
Conference

Acceptance Rates
KDD '03 Paper Acceptance Rate: 46 of 298 submissions, 15%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%