Abstract
The notion of embedding a class of dichotomies in a class of linear half spaces is central to the support vector machines paradigm. We examine the question of determining the minimal Euclidean dimension and the maximal margin that can be obtained when the embedded class has a finite VC dimension. We show that an overwhelming majority of the family of finite concept classes of any constant VC dimension cannot be embedded in low-dimensional half spaces. (In fact, we show that the Euclidean dimension must be almost as high as the size of the instance space.) We strengthen this result even further by showing that an overwhelming majority of the family of finite concept classes of any constant VC dimension cannot be embedded in half spaces (of arbitrarily high Euclidean dimension) with a large margin. (In fact, the margin cannot be substantially larger than the margin achieved by the trivial embedding.) Furthermore, these bounds are robust in the sense that allowing each image half space to err on a small fraction of the instances does not imply a significant weakening of these dimension and margin bounds. Our results indicate that any universal learning machine, which transforms data into the Euclidean space and then applies linear (or large margin) classification, cannot enjoy any meaningful generalization guarantees that are based on either VC dimension or margin considerations. This failure of generalization bounds applies even to classes for which straightforward empirical risk minimization does enjoy meaningful generalization guarantees.
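For concreteness, the two notions the abstract bounds can be stated as follows. This is a standard formulation of half-space embeddings and margins (in the spirit of Forster, Schmitt, and Simon); the symbols below are illustrative and not taken from the paper itself.

```latex
% An embedding of a concept class C over a finite instance space X
% into half spaces of R^d is a map phi : X -> R^d together with,
% for every concept c in C, a weight vector w_c in R^d such that
% the half space {z : <w_c, z> > 0} reproduces the labeling of c:
\[
  \operatorname{sign}\langle w_c, \varphi(x)\rangle \;=\; c(x)
  \qquad \text{for all } x \in X,\ c \in C.
\]
% The margin of such an embedding is the smallest normalized distance
% of any image point phi(x) to the hyperplane separating it:
\[
  \operatorname{margin}(\varphi) \;=\;
  \min_{c \in C}\;\min_{x \in X}\;
  \frac{\lvert\langle w_c, \varphi(x)\rangle\rvert}
       {\lVert w_c\rVert\,\lVert \varphi(x)\rVert}.
\]
```

The paper's dimension bound concerns the smallest \(d\) for which such a \(\varphi\) exists; the margin bound concerns the largest achievable \(\operatorname{margin}(\varphi)\) over all dimensions \(d\).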
Index Terms
- Limitations of learning via embeddings in euclidean half spaces
Recommendations
Limitations of Learning via Embeddings in Euclidean Half-Spaces
COLT '01/EuroCOLT '01: Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory

This paper considers the embeddability of general concept classes in Euclidean half spaces. By embedding in half spaces we refer to a mapping from some concept class to half spaces so that the labeling given to points in the instance space is retained. ...
Estimating the Optimal Margins of Embeddings in Euclidean Half Spaces

COLT '01/EuroCOLT '01: Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory

Concept classes can canonically be represented by matrices with entries 1 and -1. We use the singular value decomposition of this matrix to determine the optimal margins of embeddings of the concept classes of singletons and of half intervals in ...