Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces

Published: 1 December 2004

Abstract

We propose a novel method of dimensionality reduction for supervised learning problems. Given a regression or classification problem in which we wish to predict a response variable Y from an explanatory variable X, we treat the problem of dimensionality reduction as that of finding a low-dimensional "effective subspace" for X which retains the statistical relationship between X and Y. We show that this problem can be formulated in terms of conditional independence. To turn this formulation into an optimization problem we establish a general nonparametric characterization of conditional independence using covariance operators on reproducing kernel Hilbert spaces. This characterization allows us to derive a contrast function for estimation of the effective subspace. Unlike many conventional methods for dimensionality reduction in supervised learning, the proposed method requires neither assumptions on the marginal distribution of X, nor a parametric model of the conditional distribution of Y. We present experiments that compare the performance of the method with conventional methods.
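To make the abstract's idea concrete, below is a minimal sketch of a contrast function of the kind described: dependence between Y and the projected covariate Z = B'X is measured through centered Gram matrices on reproducing kernel Hilbert spaces, and the projection B is chosen to minimize the resulting criterion. The Gaussian kernels, the kernel widths, the regularized trace objective Tr[G_Y (G_Z + n*eps*I)^{-1}], and the toy data are illustrative assumptions for this sketch, not a reproduction of the paper's implementation; a full method would also optimize B over matrices with orthonormal columns.

import numpy as np

def gram_rbf(data, sigma):
    # Centered Gaussian (RBF) Gram matrix for the rows of `data`.
    sq = np.sum(data**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * data @ data.T
    K = np.exp(-d2 / (2.0 * sigma**2))
    n = data.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return H @ K @ H

def kdr_contrast(X, Y, B, sigma_x=1.0, sigma_y=1.0, eps=1e-3):
    # Regularized trace criterion Tr[G_Y (G_Z + n*eps*I)^{-1}] for Z = X @ B.
    # Smaller values indicate that Z retains more of the X-Y dependence.
    n = X.shape[0]
    Gz = gram_rbf(X @ B, sigma_x)
    Gy = gram_rbf(Y.reshape(n, -1), sigma_y)
    return np.trace(Gy @ np.linalg.inv(Gz + n * eps * np.eye(n)))

# Toy example: Y depends on X only through its first coordinate, so the
# contrast is typically smaller for the "good" direction than for an
# irrelevant one.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)
B_good = np.array([[1.0], [0.0], [0.0]])
B_bad = np.array([[0.0], [0.0], [1.0]])
print(kdr_contrast(X, Y, B_good), kdr_contrast(X, Y, B_bad))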


Published in: The Journal of Machine Learning Research, Volume 5 (1 December 2004), 1571 pages. ISSN 1532-4435, EISSN 1533-7928. Publisher: JMLR.org.