ABSTRACT
A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of classification functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to the decision boundary. Bounds on the generalization performance based on the leave-one-out method and the VC-dimension are given. Experimental results on optical character recognition problems demonstrate good generalization compared with other learning algorithms.
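To make the abstract concrete, the following Python fragment is a minimal sketch of maximal-margin training in its dual form, where the solution is a linear combination of supporting patterns and swapping the kernel yields linear (Perceptron-like), polynomial, or Radial Basis Function decision rules. It is not the authors' implementation: the names (`train_optimal_margin`, `polynomial_kernel`), the omission of the bias term, and the projected-gradient solver are simplifying assumptions made for the sketch; the paper solves the full constrained quadratic program.

```python
# Minimal sketch (assumed names and solver, not the authors' code) of
# maximal-margin training in dual form. The bias term is omitted and the
# dual quadratic program is solved by projected gradient ascent; in
# practice a proper constrained QP solver would be used.
import numpy as np

def polynomial_kernel(a, b, degree=2):
    """K(a, b) = (a . b + 1)^degree, one of the decision-function families."""
    return (a @ b.T + 1.0) ** degree

def train_optimal_margin(X, y, kernel=polynomial_kernel, lr=0.01, steps=10000):
    """Maximize the dual objective sum(alpha) - 0.5 * alpha' Q alpha,
    with Q_ij = y_i y_j K(x_i, x_j), subject to alpha >= 0 (bias-free
    variant). lr and steps are illustrative values."""
    Q = np.outer(y, y) * kernel(X, X)
    alpha = np.zeros(len(y))
    for _ in range(steps):
        grad = 1.0 - Q @ alpha                      # gradient of the dual
        alpha = np.maximum(0.0, alpha + lr * grad)  # project onto alpha >= 0
    return alpha, alpha > 1e-6                      # weights, supporting mask

def predict(X_train, y, alpha, X_new, kernel=polynomial_kernel):
    """Decision rule: sign of a linear combination of supporting patterns."""
    return np.sign(kernel(X_new, X_train) @ (alpha * y))

# Example: XOR, separable with the degree-2 polynomial kernel.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., 1., 1., -1.])
alpha, support = train_optimal_margin(X, y)
print(predict(X, y, alpha, X))  # -> [-1.  1.  1. -1.]
```

On a toy problem like XOR every pattern lies on the margin, so all four patterns receive nonzero weight; on larger separable problems only the patterns nearest the boundary keep nonzero alpha, which is what makes the paper's leave-one-out bound (expected error rate at most the expected fraction of supporting patterns) informative.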