ABSTRACT
Linear prediction methods, such as least squares for regression and logistic regression and support vector machines for classification, have been extensively used in statistics and machine learning. In this paper, we study stochastic gradient descent (SGD) algorithms on regularized forms of linear prediction methods. This class of methods, related to online algorithms such as the perceptron, is both efficient and very simple to implement. We obtain numerical rates of convergence for such algorithms and discuss their implications. Experiments on text data are provided to demonstrate the numerical and statistical consequences of our theoretical findings.