Abstract
We propose a novel method of dimensionality reduction for supervised learning problems. Given a regression or classification problem in which we wish to predict a response variable Y from an explanatory variable X, we treat the problem of dimensionality reduction as that of finding a low-dimensional "effective subspace" for X which retains the statistical relationship between X and Y. We show that this problem can be formulated in terms of conditional independence. To turn this formulation into an optimization problem, we establish a general nonparametric characterization of conditional independence using covariance operators on reproducing kernel Hilbert spaces. This characterization allows us to derive a contrast function for estimation of the effective subspace. Unlike many conventional methods for dimensionality reduction in supervised learning, the proposed method requires neither assumptions on the marginal distribution of X nor a parametric model of the conditional distribution of Y. We present experiments that compare the performance of the method with conventional methods.
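As a concrete illustration of the estimation problem the abstract describes, the following is a minimal Python sketch of a kernel-based contrast for an effective subspace spanned by the columns of an orthonormal matrix B. The helper names, the Gaussian kernel with fixed bandwidths, the regularization constant eps, and the specific trace form Tr[G_Y (G_Z + n·eps·I)^{-1}] are illustrative assumptions (this trace form appears in later kernel dimension reduction work), not the paper's exact contrast function.

```python
# A minimal sketch of a KDR-style contrast for an "effective subspace".
# The helper names, the Gaussian kernel, and the trace form
# Tr[Gy (Gz + n*eps*I)^{-1}] are illustrative assumptions, not the
# paper's exact estimator.
import numpy as np

def centered_gauss_gram(Z, sigma):
    """Centered Gaussian Gram matrix K_ij = exp(-||z_i - z_j||^2 / (2 sigma^2))."""
    sq = np.sum(Z ** 2, axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    K = np.exp(-dists / (2.0 * sigma ** 2))
    n = Z.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix: returns HKH
    return H @ K @ H

def kdr_contrast(B, X, Y, sigma_x=1.0, sigma_y=1.0, eps=1e-3):
    """Empirical conditional-covariance-style contrast; smaller values mean
    Y is closer to being conditionally independent of X given B^T X."""
    n = X.shape[0]
    Gz = centered_gauss_gram(X @ B, sigma_x)  # Gram matrix of projected data
    Gy = centered_gauss_gram(Y, sigma_y)
    return np.trace(Gy @ np.linalg.inv(Gz + n * eps * np.eye(n)))

# Toy check: Y depends on X only through its first coordinate, so the
# true one-dimensional effective subspace should score lower (better).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
Y = (np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)).reshape(-1, 1)
B_true = np.array([[1.0], [0.0], [0.0], [0.0]])
B_rand, _ = np.linalg.qr(rng.normal(size=(4, 1)))
print(kdr_contrast(B_true, X, Y), kdr_contrast(B_rand, X, Y))
```

In the full method, estimation would proceed by minimizing such a contrast over orthonormal matrices B (i.e., over the Stiefel manifold), which yields the estimated effective subspace without a parametric model of the conditional distribution of Y.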