Accelerated Training of Conditional Random Fields with Stochastic Gradient Methods

ABSTRACT
We apply Stochastic Meta-Descent (SMD), a stochastic gradient optimization method with gain vector adaptation, to the training of Conditional Random Fields (CRFs). On several large data sets, the resulting optimizer converges to the same quality of solution over an order of magnitude faster than limited-memory BFGS, the leading method reported to date. We report results for both exact and inexact inference techniques.
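The core mechanism behind SMD, a per-parameter gain vector adapted multiplicatively from the correlation between the current gradient and a long-term trace of parameter changes, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the function and hyperparameter names (`smd`, `eta0`, `mu`, `lam`) are placeholders, and the Hessian-vector product is approximated here by a finite difference of gradients, whereas SMD as published computes it exactly and cheaply via curvature matrix-vector products (Pearlmutter, 1994; Schraudolph, 2002).

```python
import numpy as np

def smd(grad_fn, theta, eta0=0.1, mu=0.01, lam=0.99, steps=300, eps=1e-6):
    """Sketch of Stochastic Meta-Descent (SMD) gain adaptation.

    grad_fn returns the gradient of the loss at theta.  Hyperparameter
    values are illustrative only.  The Hessian-vector product H v is
    approximated by a gradient finite difference; the published method
    uses exact H v products instead.
    """
    theta = theta.astype(float).copy()
    eta = np.full_like(theta, eta0)   # per-parameter gain vector
    v = np.zeros_like(theta)          # long-term parameter trace
    for _ in range(steps):
        g = -grad_fn(theta)           # descent direction
        # Multiplicative gain update: gains grow where the trace and the
        # current step agree, shrink (bounded below) where they conflict.
        eta *= np.maximum(0.5, 1.0 + mu * v * g)
        # Finite-difference stand-in for the Hessian-vector product H v.
        Hv = (grad_fn(theta + eps * v) - grad_fn(theta)) / eps
        v = lam * v + eta * (g - lam * Hv)
        theta += eta * g
    return theta
```

On a toy quadratic objective (gradient `theta - target`, a hypothetical example, not one of the paper's data sets), the gains settle near a stable value and `theta` approaches the minimizer; in the paper the same adaptation is driven by stochastic gradients of the CRF log-likelihood over small batches.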