DOI: 10.1145/956750.956764

Information-theoretic co-clustering

Published: 24 August 2003

ABSTRACT

Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log, and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: the simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the co-clustering problem as an optimization problem in information theory: the optimal co-clustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters. We present an innovative co-clustering algorithm that monotonically increases the preserved mutual information by intertwining both the row and column clusterings at all stages. Using the practical example of simultaneous word-document clustering, we demonstrate that our algorithm works well in practice, especially in the presence of sparsity and high dimensionality.
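The alternating scheme the abstract describes can be sketched in NumPy: treat the table as a joint distribution p(X, Y), and repeatedly reassign each row to the row cluster whose induced distribution q(Y | x̂) is closest in KL divergence to p(Y | x), then do the symmetric update for columns. The function names, the eps smoothing, and the random initialization below are illustrative assumptions, not the paper's exact pseudocode; this is a minimal sketch of the idea, not a faithful reimplementation.

```python
import numpy as np

def mutual_info(P):
    """I(X;Y) of a joint distribution given as a 2-D array summing to 1."""
    px = P.sum(axis=1, keepdims=True)   # marginal p(x)
    py = P.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = P > 0
    return float((P[mask] * np.log(P[mask] / (px * py)[mask])).sum())

def kl_to_prototypes(A, B, eps=1e-12):
    """Entry (i, j) = KL(A[i] || B[j]); rows of A (r x n) and B (s x n) are distributions."""
    logA = np.log(A + eps)
    logB = np.log(B + eps)
    return (A * logA).sum(axis=1)[:, None] - A @ logB.T

def itcc(P, k, l, n_iters=20, seed=0, eps=1e-12):
    """Alternating co-clustering sketch: each pass never increases the
    mutual-information loss.  P: m x n joint distribution; returns row
    labels in [0, k) and column labels in [0, l)."""
    rng = np.random.default_rng(seed)
    m, n = P.shape
    rows = rng.integers(k, size=m)
    cols = rng.integers(l, size=n)
    for _ in range(n_iters):
        for axis in (0, 1):
            R = np.eye(k)[rows]                     # m x k row-cluster indicators
            C = np.eye(l)[cols]                     # n x l column-cluster indicators
            Pcc = R.T @ P @ C                       # clustered joint p(xhat, yhat)
            if axis == 0:                           # reassign rows
                py, pyhat = P.sum(axis=0), C.T @ P.sum(axis=0)
                # prototype q(y | xhat) = p(yhat(y) | xhat) * p(y | yhat(y)), shape k x n
                Q = (Pcc / (Pcc.sum(axis=1, keepdims=True) + eps))[:, cols] \
                    * (py / (pyhat[cols] + eps))
                Pyx = P / (P.sum(axis=1, keepdims=True) + eps)      # p(y | x)
                rows = kl_to_prototypes(Pyx, Q).argmin(axis=1)
            else:                                   # reassign columns
                px, pxhat = P.sum(axis=1), R.T @ P.sum(axis=1)
                Q = (Pcc / (Pcc.sum(axis=0, keepdims=True) + eps)).T[:, rows] \
                    * (px / (pxhat[rows] + eps))
                Pxy = (P / (P.sum(axis=0, keepdims=True) + eps)).T  # p(x | y)
                cols = kl_to_prototypes(Pxy, Q).argmin(axis=1)
    return rows, cols
```

On a block-structured table this tends to recover the blocks; the key property, mirrored from the abstract, is that the preserved mutual information I(X̂; Ŷ) never exceeds I(X; Y) and each reassignment step is chosen so it cannot decrease.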


Published in

KDD '03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2003, 736 pages
ISBN: 1581137370
DOI: 10.1145/956750
Copyright © 2003 ACM


Publisher: Association for Computing Machinery, New York, NY, United States



              Acceptance Rates

KDD '03 paper acceptance rate: 46 of 298 submissions, 15%. Overall acceptance rate: 1,133 of 8,635 submissions, 13%.
