Self-taught learning: transfer learning from unlabeled data

ABSTRACT
We present a new machine learning framework called "self-taught learning" for using unlabeled data in supervised classification tasks. We do not assume that the unlabeled data follows the same class labels or generative distribution as the labeled data. Thus, we would like to use a large number of unlabeled images (or audio samples, or text documents) randomly downloaded from the Internet to improve performance on a given image (or audio, or text) classification task. Such unlabeled data is significantly easier to obtain than in typical semi-supervised or transfer learning settings, making self-taught learning widely applicable to many practical learning problems. We describe an approach to self-taught learning that uses sparse coding to construct higher-level features using the unlabeled data. These features form a succinct input representation and significantly improve classification performance. When using an SVM for classification, we further show how a Fisher kernel can be learned for this representation.