Latent Dirichlet Allocation

Published: 01 March 2003

Abstract

We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
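
The three levels named in the abstract (corpus-level parameters, per-document topic proportions, per-word topic assignments) can be illustrated with a small sampling sketch. The code below is a minimal illustration, not the authors' implementation, and it covers only the generative side of the model, not the variational inference or EM estimation. It assumes the commonly stated LDA generative process: draw topic proportions for a document from a Dirichlet, then for each word draw a topic from those proportions and a word from that topic's word distribution. The names `sample_dirichlet` and `generate_document`, the toy vocabulary, the two topics, and the value of `alpha` are all hypothetical choices made for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, topics, vocab, n_words, rng):
    """Sample one document under the assumed LDA generative process.

    alpha  : Dirichlet parameter, one positive entry per topic (corpus level)
    topics : one word distribution per topic, each aligned with vocab (corpus level)
    Returns the sampled topic proportions and the list of sampled words.
    """
    theta = sample_dirichlet(alpha, rng)  # document level: topic proportions
    topic_ids = range(len(topics))
    words = []
    for _ in range(n_words):
        z = rng.choices(topic_ids, weights=theta)[0]   # word level: pick a topic
        w = rng.choices(vocab, weights=topics[z])[0]   # then pick a word from it
        words.append(w)
    return theta, words


# Hypothetical toy setup: four words, two topics with disjoint support.
vocab = ["gene", "dna", "ball", "team"]
topics = [
    [0.5, 0.5, 0.0, 0.0],  # illustrative "biology" topic
    [0.0, 0.0, 0.5, 0.5],  # illustrative "sports" topic
]
alpha = [0.5, 0.5]

rng = random.Random(0)
theta, words = generate_document(alpha, topics, vocab, n_words=8, rng=rng)
print("topic proportions:", theta)
print("document:", words)
```

In this sketch `theta` plays the role of the explicit document representation mentioned in the abstract: two documents with similar `theta` draw their words from similar blends of topics.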

Published in

The Journal of Machine Learning Research, Volume 3, 3/1/2003
1437 pages
ISSN: 1532-4435
EISSN: 1533-7928
Publisher: JMLR.org