Article
DOI: 10.1145/342009.335376

Turbo-charging vertical mining of large databases

Published: 16 May 2000

ABSTRACT

In a vertical representation of a market-basket database, each item is associated with a column of values representing the transactions in which it is present. The association-rule mining algorithms that have been recently proposed for this representation show performance improvements over their classical horizontal counterparts, but are either efficient only for certain database sizes, or assume particular characteristics of the database contents, or are applicable only to specific kinds of database schemas. We present here a new vertical mining algorithm called VIPER, which is general-purpose, making no special requirements of the underlying database. VIPER stores data in compressed bit-vectors called “snakes” and integrates a number of novel optimizations for efficient snake generation, intersection, counting and storage. We analyze the performance of VIPER for a range of synthetic database workloads. Our experimental results indicate significant performance gains, especially for large databases, over previously proposed vertical and horizontal mining algorithms. In fact, there are even workload regions where VIPER outperforms an optimal, but practically infeasible, horizontal mining algorithm.
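To make the vertical layout concrete, here is a minimal sketch. It is not VIPER itself, and it does not reproduce the snake format or any of the paper's optimizations. It assumes plain, uncompressed Python integers as bit-vectors: each item gets one bit-vector whose bit t is set when transaction t contains the item, and an itemset's support is obtained by intersecting (bitwise AND) its items' vectors and counting the set bits. The helper names `to_vertical`, `support`, and `zero_runs` and the sample transactions are hypothetical. `zero_runs` only illustrates the gap (run-length) view of a sparse bit-vector, which is the kind of structure run-length codes such as Golomb's exploit; it is not VIPER's compression scheme.

```python
from functools import reduce

def to_vertical(transactions):
    """Hypothetical helper: one bit-vector per item (bit t set iff transaction t has the item).

    Python ints act as uncompressed bit-vectors here, an assumption for illustration.
    """
    vertical = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vertical[item] = vertical.get(item, 0) | (1 << tid)
    return vertical

def support(vertical, itemset):
    """Count transactions containing every item: AND the item vectors, then count 1-bits."""
    if not itemset:
        raise ValueError("itemset must be non-empty")
    vectors = [vertical.get(item, 0) for item in itemset]
    common = reduce(lambda a, b: a & b, vectors)
    return bin(common).count("1")

def zero_runs(vector, n_transactions):
    """Gap view of a bit-vector: length of the 0-run preceding each 1 (illustrative only)."""
    runs, gap = [], 0
    for tid in range(n_transactions):
        if (vector >> tid) & 1:
            runs.append(gap)
            gap = 0
        else:
            gap += 1
    return runs

# Hypothetical market-basket data: one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

vertical = to_vertical(transactions)
print(support(vertical, {"diapers", "beer"}))             # 3
print(support(vertical, {"bread", "milk"}))               # 3
print(zero_runs(vertical["beer"], len(transactions)))     # [1, 0, 0]
```

Counting a candidate itemset this way touches only the columns of the items involved, rather than rescanning every transaction as a horizontal algorithm would; the cost then depends on how large the bit-vectors are, which is why compressing them matters for large databases.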


Published in: SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data (ACM Conferences), May 2000, 604 pages
ISBN: 1581132174
DOI: 10.1145/342009

Copyright © 2000 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance rates: SIGMOD '00 paper acceptance rate 42 of 248 submissions (17%); overall acceptance rate 785 of 4,003 submissions (20%)
