Turbo-charging vertical mining of large databases

Authors:
Pradeep Shenoy

Lucent Bell Labs, 600 Mountain Avenue, Murray Hill, NJ; Computer Science and Engg., Indian Institute of Technology, Mumbai 400076, INDIA

Jayant R. Haritsa

Database Systems Lab, SERC, Indian Institute of Science, Bangalore 560012, INDIA; Lucent Bell Labs, 600 Mountain Avenue, Murray Hill, NJ

S. Sudarshan

Computer Science and Engg., Indian Institute of Technology, Mumbai 400076, INDIA

Gaurav Bhalotia

Computer Science and Engg., Indian Institute of Technology, Mumbai 400076, INDIA

Mayank Bawa

Computer Science and Engg., Indian Institute of Technology, Mumbai 400076, INDIA

Devavrat Shah

Computer Science and Engg., Indian Institute of Technology, Mumbai 400076, INDIA


SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data, May 2000, Pages 22–33, https://doi.org/10.1145/342009.335376

Published: 16 May 2000

ABSTRACT

In a vertical representation of a market-basket database, each item is associated with a column of values representing the transactions in which it is present. The association-rule mining algorithms that have been recently proposed for this representation show performance improvements over their classical horizontal counterparts, but are either efficient only for certain database sizes, or assume particular characteristics of the database contents, or are applicable only to specific kinds of database schemas. We present here a new vertical mining algorithm called VIPER, which is general-purpose, placing no special requirements on the underlying database. VIPER stores data in compressed bit-vectors called “snakes” and integrates a number of novel optimizations for efficient snake generation, intersection, counting and storage. We analyze the performance of VIPER for a range of synthetic database workloads. Our experimental results indicate significant performance gains, especially for large databases, over previously proposed vertical and horizontal mining algorithms. In fact, there are even workload regions where VIPER outperforms an optimal, but practically infeasible, horizontal mining algorithm.
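The abstract describes the vertical layout and support counting by bit-vector intersection only in outline. The following is a minimal Python sketch of that general idea, not VIPER itself: each item's column is held as a bit-vector (a Python int), the support of an itemset is the popcount of the AND of its items' vectors, and a simple gap (run-length) encoding stands in for compression. The toy transactions, the function names `to_vertical`, `support`, `gap_encode` and `gap_decode`, and the gap encoding are all hypothetical illustrations; the actual snake format and VIPER's optimizations are not specified in this abstract and are not reproduced here.

```python
from functools import reduce

# Hypothetical toy database in horizontal form: transaction ID = list index.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]


def to_vertical(transactions):
    """Build one bit-vector per item: bit i is set iff transaction i has the item."""
    columns = {}
    for tid, items in enumerate(transactions):
        for item in items:
            columns[item] = columns.get(item, 0) | (1 << tid)
    return columns


def support(columns, itemset):
    """Intersect (AND) the items' bit-vectors, then count the set bits."""
    if not itemset:
        raise ValueError("itemset must be non-empty")
    vectors = [columns.get(item, 0) for item in itemset]
    joint = reduce(lambda a, b: a & b, vectors)
    return bin(joint).count("1")


def gap_encode(vector, length):
    """Hypothetical compression stand-in: record the 0-run length before each 1-bit."""
    gaps, run = [], 0
    for i in range(length):
        if (vector >> i) & 1:
            gaps.append(run)
            run = 0
        else:
            run += 1
    return gaps  # trailing zeros are implied and not stored


def gap_decode(gaps):
    """Rebuild the bit-vector from the stored gap lengths."""
    vector, pos = 0, 0
    for gap in gaps:
        pos += gap
        vector |= 1 << pos
        pos += 1
    return vector


if __name__ == "__main__":
    cols = to_vertical(TRANSACTIONS)
    print(support(cols, {"bread", "milk"}))  # 3 (transactions 0, 3, 4)
    print(gap_encode(cols["bread"], len(TRANSACTIONS)))  # [0, 0, 1, 0]
```

Because each item's column is read independently, counting an itemset touches only the columns of its own items; this is the general property that vertical layouts exploit, while the gap encoding merely hints at why sparse columns compress well.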

Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines
  2. Information systems applications
    1. Data mining

Published in
SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data
May 2000
604 pages
ISBN: 1581132174
DOI: 10.1145/342009
Chairmen:
Maggie Dunham, Southern Methodist Univ.
Jeffrey F. Naughton, Univ. of Wisconsin-Madison
Weidong Chen, Southern Methodist Univ.
Nick Koudas, AT&T Labs

ACM SIGMOD Record Volume 29, Issue 2
June 2000
609 pages
ISSN: 0163-5808
DOI: 10.1145/335191
Editors:
Weidong Chen, Southern Methodist Univ., Dallas, TX
Jeffrey Naughton, Univ. of Wisconsin-Madison, Madison
Philip A. Bernstein, Microsoft

Copyright © 2000 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States

Acceptance Rates
SIGMOD '00 Paper Acceptance Rate: 42 of 248 submissions, 17%
Overall Acceptance Rate: 785 of 4,003 submissions, 20%

Article Metrics
- Total Citations: 194
- Total Downloads: 175
- Downloads (Last 12 months): 69
- Downloads (Last 6 weeks): 11