DOI: 10.1145/1143844.1143966
Article

Accelerated training of conditional random fields with stochastic gradient methods

Published: 25 June 2006

ABSTRACT

We apply Stochastic Meta-Descent (SMD), a stochastic gradient optimization method with gain vector adaptation, to the training of Conditional Random Fields (CRFs). On several large data sets, the resulting optimizer converges to the same quality of solution over an order of magnitude faster than limited-memory BFGS, the leading method reported to date. We report results for both exact and inexact inference techniques.
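To make the abstract's idea of "gain vector adaptation" concrete, the sketch below applies one common published formulation of the SMD updates (per-parameter gains adapted from the correlation between the current gradient and an exponentially decayed gain-response vector, with a Hessian-vector product) to a toy quadratic objective. The toy objective, function name, and all parameter values are illustrative assumptions, not taken from the paper, which trains CRFs rather than quadratics:

```python
import numpy as np

def smd_minimize(A, theta0, eta0=0.02, mu=0.05, lam=0.99, steps=2000):
    """Sketch of Stochastic Meta-Descent (SMD) on f(theta) = 0.5 * theta^T A theta.

    Each parameter carries its own gain eta_i. Gains are adapted
    multiplicatively from the product of the current gradient g and v, a
    decayed estimate of how theta responds to gain changes. On a quadratic
    the Hessian-vector product is exact (H v = A v); for general models it
    can be computed efficiently without forming the Hessian.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    eta = np.full_like(theta, eta0)   # per-parameter gain vector
    v = np.zeros_like(theta)          # gain-response estimate
    for _ in range(steps):
        g = A @ theta                             # gradient of the quadratic
        # grow a gain when g anti-correlates with v (consistent progress),
        # shrink it (floored at 1/2) when they correlate (overshooting)
        eta *= np.maximum(0.5, 1.0 - mu * g * v)
        theta -= eta * g                          # gradient step with gains
        Hv = A @ v                                # Hessian-vector product
        v = lam * v - eta * (g + lam * Hv)        # update gain response
    return theta

# ill-conditioned toy problem: curvatures 1 and 10
A = np.diag([1.0, 10.0])
theta = smd_minimize(A, [1.0, 1.0])
```

The per-parameter gains let the poorly scaled coordinate take appropriately sized steps without a global learning-rate schedule, which is the mechanism the abstract credits for the speedup over L-BFGS.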



Published in

ICML '06: Proceedings of the 23rd international conference on Machine learning, June 2006, 1154 pages
ISBN: 1595933832
DOI: 10.1145/1143844

Copyright © 2006 ACM


Publisher

Association for Computing Machinery, New York, NY, United States



Acceptance Rates

ICML '06 paper acceptance rate: 140 of 548 submissions (26%)
