DOI: 10.1145/1273496.1273592
Article

Self-taught learning: transfer learning from unlabeled data

Published: 20 June 2007

ABSTRACT

We present a new machine learning framework called "self-taught learning" for using unlabeled data in supervised classification tasks. We do not assume that the unlabeled data follows the same class labels or generative distribution as the labeled data. Thus, we would like to use a large number of unlabeled images (or audio samples, or text documents) randomly downloaded from the Internet to improve performance on a given image (or audio, or text) classification task. Such unlabeled data is significantly easier to obtain than in typical semi-supervised or transfer learning settings, making self-taught learning widely applicable to many practical learning problems. We describe an approach to self-taught learning that uses sparse coding to construct higher-level features using the unlabeled data. These features form a succinct input representation and significantly improve classification performance. When using an SVM for classification, we further show how a Fisher kernel can be learned for this representation.
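The pipeline the abstract describes (learn a sparse-coding basis from unlabeled data, then use each example's sparse activations as features for a supervised classifier) can be sketched in a few lines. The sketch below is illustrative only: it uses simple ISTA-style soft-thresholding updates and a gradient step on the basis, not the efficient algorithms the authors actually use, and the names `sparse_coding`, `features`, and `soft_threshold` are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of the L1 norm: shrink toward zero by t.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_coding(X, n_bases=32, lam=0.1, n_iter=50, lr=0.01, seed=0):
    """Learn bases B and sparse activations A so that X ~ A @ B,
    from unlabeled data X of shape (n_samples, n_features).
    Illustrative alternating minimization, not the paper's algorithm."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    B = rng.normal(size=(n_bases, d))
    B /= np.linalg.norm(B, axis=1, keepdims=True)
    A = np.zeros((n, n_bases))
    for _ in range(n_iter):
        # ISTA step on activations: gradient of 0.5*||A@B - X||^2 plus L1 shrinkage.
        step = 1.0 / (np.linalg.norm(B, 2) ** 2 + 1e-8)
        A = soft_threshold(A - step * ((A @ B - X) @ B.T), step * lam)
        # Gradient step on the bases, then renormalize each basis vector.
        B -= lr * (A.T @ (A @ B - X))
        B /= np.linalg.norm(B, axis=1, keepdims=True) + 1e-8
    return B, A

def features(x, B, lam=0.1, n_iter=100):
    """Sparse activations of a single (possibly labeled) example x
    under the learned bases B -- the higher-level feature vector."""
    a = np.zeros(B.shape[0])
    step = 1.0 / (np.linalg.norm(B, 2) ** 2 + 1e-8)
    for _ in range(n_iter):
        a = soft_threshold(a - step * ((a @ B - x) @ B.T), step * lam)
    return a
```

In this sketch the basis is fit once on the unlabeled pool, after which `features` maps each labeled training example to its activation vector, which would then be fed to an off-the-shelf classifier such as an SVM.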


Published in

ICML '07: Proceedings of the 24th international conference on Machine learning
June 2007, 1233 pages
ISBN: 9781595937933
DOI: 10.1145/1273496
Copyright © 2007 ACM


Publisher

Association for Computing Machinery, New York, NY, United States



Acceptance Rates

Overall Acceptance Rate: 140 of 548 submissions, 26%
