Efficient Discovery of the Most Interesting Associations

Authors:
Geoffrey I. Webb (Monash University, Australia), Jilles Vreeken (University of Antwerp, Belgium)

ACM Transactions on Knowledge Discovery from Data, Volume 8, Issue 3, Article No. 15, pp. 1–31. https://doi.org/10.1145/2601433

Published: 01 June 2014


Abstract

Self-sufficient itemsets have been proposed as an effective approach to summarizing the key associations in data. However, their computation appears highly demanding, because assessing whether an itemset is self-sufficient requires considering every partition of the itemset into a pair of subsets, as well as all of its supersets. This article presents the first published algorithm for efficiently discovering self-sufficient itemsets. The branch-and-bound algorithm deploys two powerful pruning mechanisms, based on upper bounds on itemset value and on statistical significance level. The article demonstrates that finding the top-k productive and nonredundant itemsets, with postprocessing to identify those that are not independently productive, can efficiently identify small sets of key associations. We present an extensive evaluation of the strengths and limitations of the technique, including comparisons with alternative approaches to finding the most interesting associations.
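
To make the cost of the partition check concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes that itemset value is measured as the smallest difference between an itemset's observed support and the product of the supports of the two parts of any binary partition. That measure, the function names, and the toy transactions are all illustrative assumptions; the sketch also omits the statistical significance tests, the superset checks, and the branch-and-bound pruning described in the abstract.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def binary_partitions(itemset):
    """Yield each split of `itemset` (two or more items) into two non-empty parts once."""
    items = sorted(itemset)
    first, rest = items[0], items[1:]
    # Fixing `first` on the left side avoids yielding mirrored duplicates.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left = frozenset((first,) + extra)
            yield left, frozenset(items) - left

def min_partition_leverage(itemset, transactions):
    """Smallest excess of joint support over the independence estimate (assumed measure)."""
    joint = support(itemset, transactions)
    return min(
        joint - support(a, transactions) * support(b, transactions)
        for a, b in binary_partitions(itemset)
    )

# Hypothetical toy data: six transactions over the items a, b, c.
transactions = [frozenset(t) for t in (
    {"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c"}, {"a"}, set(),
)]
```

An itemset of n items has 2^(n-1) - 1 such partitions, so a naive check of every candidate grows exponentially with itemset size. Under the assumed measure, a positive result from `min_partition_leverage({"a", "b", "c"}, transactions)` means the itemset is more frequent than every pairwise split would predict under independence.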


Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining


Published in

ACM Transactions on Knowledge Discovery from Data Volume 8, Issue 3
June 2014
160 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/2630992

Copyright © 2013 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Revised: 1 September 2013
- Accepted: 1 September 2013
- Published: 1 June 2014
- Received: 1 September 2012
Author Tags
Association mining
interestingness
itemset mining
statistical association mining
Qualifiers
- research-article
- Research
- Refereed