DOI: 10.1145/130385.130401

A training algorithm for optimal margin classifiers

Authors: Bernhard E. Boser, Isabelle M. Guyon, Vladimir N. Vapnik

Published: 1 July 1992

ABSTRACT

A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of classification functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to the decision boundary. Bounds on the generalization performance based on the leave-one-out method and the VC-dimension are given. Experimental results on optical character recognition problems demonstrate the good generalization obtained when compared with other learning algorithms.
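In modern terms, the procedure the abstract describes is a hard-margin support vector machine: maximizing the margin reduces to a quadratic program with one multiplier per training pattern, and only the patterns with nonzero multipliers (the supporting patterns) enter the final decision function. The sketch below illustrates this under stated assumptions; it uses a polynomial kernel and a general-purpose SciPy solver rather than the paper's own quadratic-programming routine, and the function names and kernel degree are illustrative, not from the paper.

```python
# A minimal sketch of optimal margin training (a hard-margin SVM in today's
# terminology). NOT the paper's own QP routine: the solver (SciPy's SLSQP),
# the kernel degree, and all names here are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

def poly_kernel(X1, X2, degree=2):
    # Polynomial decision functions are one of the classes the abstract
    # mentions (alongside Perceptrons and Radial Basis Functions).
    return (1.0 + X1 @ X2.T) ** degree

def train_optimal_margin(X, y, degree=2):
    """Maximize the margin via the dual QP:
       max_a  sum(a) - 0.5 * a^T Q a   s.t.  a_i >= 0,  sum_i a_i y_i = 0,
       where Q_ij = y_i y_j K(x_i, x_j).  X: (n, d) floats, y: (n,) in {-1, +1}."""
    n = len(y)
    Q = (y[:, None] * y[None, :]) * poly_kernel(X, X, degree)
    res = minimize(
        lambda a: 0.5 * a @ Q @ a - a.sum(),   # negative of the dual objective
        np.zeros(n),
        jac=lambda a: Q @ a - np.ones(n),
        method="SLSQP",
        bounds=[(0.0, None)] * n,
        constraints={"type": "eq", "fun": lambda a: a @ y},
    )
    alpha = res.x
    sv = alpha > 1e-6   # the "supporting patterns": nonzero multipliers
    # For a supporting pattern k, y_k * (sum_i a_i y_i K(x_i, x_k) + b) = 1,
    # so b = y_k - sum_i a_i y_i K(x_i, x_k), since y_k is +/-1.
    k = int(np.argmax(sv))
    b = y[k] - (alpha * y) @ poly_kernel(X, X[k:k + 1], degree).ravel()
    return alpha, b, sv

def predict(X_train, y_train, alpha, b, X_new, degree=2):
    # The decision function is a linear combination of supporting patterns.
    return np.sign((alpha * y_train) @ poly_kernel(X_train, X_new, degree) + b)
```

On separable data, most multipliers vanish at the optimum, so the effective number of parameters tracks the number of supporting patterns rather than the dimension of the feature space, which is the automatic capacity adjustment the abstract describes.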


Published in:

COLT '92: Proceedings of the Fifth Annual Workshop on Computational Learning Theory
July 1992, 452 pages
ISBN: 0-89791-497-X
DOI: 10.1145/130385

Copyright © 1992 ACM


Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance rate: 35 of 71 submissions, 49%
