ABSTRACT
Linear prediction methods, such as least squares for regression and logistic regression and support vector machines for classification, have been extensively used in statistics and machine learning. In this paper, we study stochastic gradient descent (SGD) algorithms on regularized forms of linear prediction methods. This class of methods, related to online algorithms such as the perceptron, is both efficient and very simple to implement. We obtain numerical rates of convergence for such algorithms and discuss their implications. Experiments on text data are provided to demonstrate the numerical and statistical consequences of our theoretical findings.