Accelerated Training of Conditional Random Fields with Stochastic Gradient Methods

ABSTRACT
We apply Stochastic Meta-Descent (SMD), a stochastic gradient optimization method with gain vector adaptation, to the training of Conditional Random Fields (CRFs). On several large data sets, the resulting optimizer converges to the same quality of solution over an order of magnitude faster than limited-memory BFGS, the leading method reported to date. We report results for both exact and inexact inference techniques.
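The core mechanism behind SMD, a per-parameter gain vector adapted multiplicatively from the correlation between the current gradient and a long-term trace of parameter changes, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the function and hyperparameter names (`smd`, `eta0`, `mu`, `lam`) are placeholders, and the Hessian-vector product is approximated here by a finite difference of gradients, whereas SMD as published computes it exactly and cheaply via curvature matrix-vector products (Pearlmutter, 1994; Schraudolph, 2002).

```python
import numpy as np

def smd(grad_fn, theta, eta0=0.1, mu=0.01, lam=0.99, steps=300, eps=1e-6):
    """Sketch of Stochastic Meta-Descent (SMD) gain adaptation.

    grad_fn returns the gradient of the loss at theta.  Hyperparameter
    values are illustrative only.  The Hessian-vector product H v is
    approximated by a gradient finite difference; the published method
    uses exact H v products instead.
    """
    theta = theta.astype(float).copy()
    eta = np.full_like(theta, eta0)   # per-parameter gain vector
    v = np.zeros_like(theta)          # long-term parameter trace
    for _ in range(steps):
        g = -grad_fn(theta)           # descent direction
        # Multiplicative gain update: gains grow where the trace and the
        # current step agree, shrink (bounded below) where they conflict.
        eta *= np.maximum(0.5, 1.0 + mu * v * g)
        # Finite-difference stand-in for the Hessian-vector product H v.
        Hv = (grad_fn(theta + eps * v) - grad_fn(theta)) / eps
        v = lam * v + eta * (g - lam * Hv)
        theta += eta * g
    return theta
```

On a toy quadratic objective (gradient `theta - target`, a hypothetical example, not one of the paper's data sets), the gains settle near a stable value and `theta` approaches the minimizer; in the paper the same adaptation is driven by stochastic gradients of the CRF log-likelihood over small batches.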