
Generalized Deep Transfer Networks for Knowledge Propagation in Heterogeneous Domains

Published: 18 November 2016

Abstract

In recent years, deep neural networks have been successfully applied to model visual concepts and have achieved competitive performance on many tasks. Despite this impressive performance, traditional deep networks suffer degraded performance when sufficient training data are lacking. The problem becomes especially severe for deep networks trained on very small datasets, which overfit by capturing nonessential or noisy information in the training set. To this end, we propose novel generalized deep transfer networks (DTNs), capable of transferring label information across heterogeneous domains, from the textual domain to the visual domain. The proposed framework mitigates the problem of insufficient training images by bringing in rich labels from the textual domain. Specifically, to share labels between the two domains, we build parameter-shared and representation-shared layers. These layers generate both domain-specific and shared interdomain features, making the architecture flexible and powerful in jointly capturing complex information from different domains. To evaluate the proposed method, we release a new dataset extended from NUS-WIDE at http://imag.njust.edu.cn/NUS-WIDE-128.html. Experimental results on this dataset show the superior performance of the proposed DTNs compared to existing state-of-the-art methods.
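To make the layer layout concrete, the following is a minimal sketch of the two-branch architecture the abstract describes: domain-specific lower layers feeding shared upper layers. It is written in PyTorch; the layer sizes, the class and branch names, and the use of simple fully connected stacks are illustrative assumptions of this summary, not details taken from the paper.

```python
import torch.nn as nn

class DeepTransferNetwork(nn.Module):
    """Sketch: domain-specific lower layers, shared upper layers (illustrative)."""

    def __init__(self, text_dim=1000, image_dim=4096, hidden=512, n_labels=128):
        super().__init__()
        # Domain-specific lower layers: one stack per modality, so each
        # domain can keep features peculiar to it.
        self.text_branch = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.image_branch = nn.Sequential(nn.Linear(image_dim, hidden), nn.ReLU())
        # Shared upper layers: the same parameters score both domains.
        self.shared = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_labels),
        )

    def forward(self, x, domain):
        h = self.text_branch(x) if domain == "text" else self.image_branch(x)
        return self.shared(h)
```

Because the upper layers are literally shared, a gradient step driven by textual training pairs updates the same parameters that later score images, which is one simple way label information can propagate across heterogeneous domains.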



Reviews

Rajeev Gupta

This paper addresses the transfer of learned knowledge from one domain to another; specifically, it considers knowledge transfer from textual domains to image domains. The input consists of tagged images and their corresponding labels, and the authors' goal is to "transfer the labels from the tag set to the images in the target domain for visual concept classification."

The proposed method combines learning across the two domains: a multilayer neural network is built in which the first few (L1) layers are specific to the individual domains, whereas the last few (L2) layers transfer knowledge from the source domain to the target domain. The authors propose three schemes for transferring the knowledge (sketched in code after this review): (1) representation shared, in which a cost term is defined on the differences between the representations that corresponding inputs produce at various stages; (2) parameter shared, in which a cost term is defined on the differences between the two domains' corresponding state parameters, weights, and labels; and (3) a generalized scheme that sums the two costs. In all three, the network is trained to minimize the corresponding cost.

The authors compare the proposed scheme, using accuracy as the performance measure, against other algorithms such as support vector machines (SVM), stacked autoencoders (SAE) using image representations only, heterogeneous transfer learning (HTL), and a translator from text to image (TTI). The experiments show that, by exploiting the co-occurrence information between texts and images, the proposed algorithms work well even when labeled training samples are insufficient.

The authors take on an important problem and solve it well, with demonstrated performance. There are, however, a couple of issues. First, the paper is difficult to read; it would have been better to introduce the example text and images (which appear in the performance results section) when introducing the problem. Second, the algorithm still needs a lot of training data, and it will not handle correlated data well: if the training data contains only images of cats of a particular size, for example, it will not be good at predicting cats of other sizes.

Online Computing Reviews Service
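As a rough illustration of the three cost schemes the review outlines, here is a minimal PyTorch sketch under stated assumptions: the per-domain upper layers top_txt/top_img, the layer sizes, the multi-label loss, and the weights alpha/beta are all illustrative choices of this summary, not the paper's exact formulation.

```python
import torch.nn as nn
import torch.nn.functional as F

hidden, n_labels = 512, 128            # illustrative sizes
top_txt = nn.Linear(hidden, n_labels)  # hypothetical per-domain upper layer (text)
top_img = nn.Linear(hidden, n_labels)  # hypothetical per-domain upper layer (image)

def representation_shared_cost(h_txt, h_img):
    # (1) Representation shared: penalize differences between the hidden
    # representations produced for a co-occurring text/image pair.
    return F.mse_loss(h_txt, h_img)

def parameter_shared_cost():
    # (2) Parameter shared: softly tie the corresponding weights (and biases)
    # of the two domains' upper layers.
    return ((top_txt.weight - top_img.weight).pow(2).sum()
            + (top_txt.bias - top_img.bias).pow(2).sum())

def generalized_cost(h_txt, h_img, y, alpha=0.5, beta=0.5):
    # (3) Generalized scheme: supervised label losses on both domains plus
    # a weighted sum of the two sharing costs above.
    supervised = (F.binary_cross_entropy_with_logits(top_txt(h_txt), y)
                  + F.binary_cross_entropy_with_logits(top_img(h_img), y))
    return (supervised
            + alpha * representation_shared_cost(h_txt, h_img)
            + beta * parameter_shared_cost())
```

Here y is a multi-label indicator vector shared by the co-occurring pair; minimizing generalized_cost pulls the two domains together both in representation space and in parameter space.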


Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 12, Issue 4s
Special Section on Trust Management for Multimedia Big Data and Special Section on Best Papers of ACM Multimedia 2015
November 2016, 242 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/2997658

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 January 2016
• Revised: 1 September 2016
• Accepted: 1 September 2016
• Published: 18 November 2016
