research-article
Open Access

Efficient Discovery of the Most Interesting Associations

Published: 01 June 2013

Abstract

Self-sufficient itemsets have been proposed as an effective approach to summarizing the key associations in data. However, their computation appears highly demanding, because assessing whether an itemset is self-sufficient requires consideration of all partitions of the itemset into pairs of subsets, as well as consideration of all supersets. This article presents the first published algorithm for efficiently discovering self-sufficient itemsets. This branch-and-bound algorithm deploys two powerful pruning mechanisms, based on upper bounds on itemset value and on statistical significance level. The article demonstrates that finding the top-k productive and nonredundant itemsets, with postprocessing to identify those that are not independently productive, can efficiently identify small sets of key associations. We present an extensive evaluation of the strengths and limitations of the technique, including comparisons with alternative approaches to finding the most interesting associations.
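
To make the idea of bound-based pruning concrete, the following is a minimal sketch of a top-k branch-and-bound search over itemsets. It is not the article's algorithm. It assumes leverage (the minimum, over all two-way partitions of an itemset, of the itemset's support minus the product of the two parts' supports) as the itemset value, it uses a deliberately simple upper bound, and it omits the statistical significance test, the nonredundancy check, and the postprocessing for independent productivity. The function names and the toy data are hypothetical.

```python
import heapq
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def leverage(itemset, transactions):
    """Minimum over two-way partitions (A, B) of sup(X) - sup(A) * sup(B)."""
    items = sorted(itemset)
    first, rest = items[0], items[1:]
    sup_x = support(itemset, transactions)
    worst = float("inf")
    # Keep `first` in A so that each unordered partition is visited once;
    # r < len(rest) guarantees that B is never empty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            a = frozenset((first,) + extra)
            b = itemset - a
            gap = sup_x - support(a, transactions) * support(b, transactions)
            worst = min(worst, gap)
    return worst


def upper_bound(sup_x):
    """Assumed simple bound on the leverage of any superset Y of X.

    Every part of Y has support >= sup(Y), so leverage(Y) <= s * (1 - s)
    with s = sup(Y) <= sup(X); s * (1 - s) peaks at 0.25 when s = 0.5.
    """
    return 0.25 if sup_x > 0.5 else sup_x * (1.0 - sup_x)


def top_k_itemsets(transactions, k):
    """Depth-first search for the k itemsets with the highest positive leverage."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    best = []  # min-heap of (leverage, items); best[0] is the weakest entry

    def threshold():
        # Value a new itemset must exceed to enter the current top-k.
        return best[0][0] if len(best) == k else 0.0

    def expand(current, start):
        for i in range(start, len(items)):
            candidate = current | {items[i]}
            if len(candidate) >= 2:
                value = leverage(candidate, transactions)
                if value > threshold():
                    entry = (value, tuple(sorted(candidate)))
                    if len(best) < k:
                        heapq.heappush(best, entry)
                    else:
                        heapq.heapreplace(best, entry)
            # Prune: no superset of `candidate` can beat the threshold.
            if upper_bound(support(candidate, transactions)) > threshold():
                expand(candidate, i + 1)

    expand(frozenset(), 0)
    return sorted(best, reverse=True)


# Hypothetical toy data: eight transactions over the items a to e.
transactions = [
    {"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"},
    {"c"}, {"d", "e"}, {"a", "b", "c"}, {"d", "e"},
]
result = top_k_itemsets(transactions, 2)
```

On this toy data, `result` holds `{a, b}` with leverage 0.25, followed by `{d, e}` with leverage 0.125. Once the heap is full, the threshold rises and whole branches of the search space are skipped without evaluating their members, which is the effect a tighter bound would amplify.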

    Published in

      ACM Transactions on Knowledge Discovery from Data, Volume 8, Issue 3
      June 2014
      160 pages
      ISSN: 1556-4681
      EISSN: 1556-472X
      DOI: 10.1145/2630992

      Copyright © 2013 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Received: 1 September 2012
      • Revised: 1 September 2013
      • Accepted: 1 September 2013
      • Published: 1 June 2013

      Qualifiers

      • research-article
      • Research
      • Refereed
