
Generalized Deep Transfer Networks for Knowledge Propagation in Heterogeneous Domains

Published: 18 November 2016

Abstract

In recent years, deep neural networks have been successfully applied to model visual concepts and have achieved competitive performance on many tasks. Despite this impressive performance, traditional deep networks suffer degraded performance when sufficient training data are lacking. The problem becomes especially severe for deep networks trained on very small datasets, which overfit by capturing nonessential or noisy information in the training set. To this end, we propose novel generalized deep transfer networks (DTNs), capable of transferring label information across heterogeneous domains, from the textual domain to the visual domain. The proposed framework mitigates the problem of insufficient training images by bringing in rich labels from the textual domain. Specifically, to share labels between the two domains, we build parameter-shared and representation-shared layers. These layers generate both domain-specific and shared interdomain features, making the architecture flexible and powerful in jointly capturing complex information from different domains. To evaluate the proposed method, we release a new dataset extended from NUS-WIDE at http://imag.njust.edu.cn/NUS-WIDE-128.html. Experimental results on this dataset show the superior performance of the proposed DTNs compared to existing state-of-the-art methods.
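To make the layer layout concrete, the following is a minimal sketch of the two-branch architecture the abstract describes: domain-specific lower layers feeding shared upper layers. It is written in PyTorch; the layer sizes, the class and branch names, and the use of simple fully connected stacks are illustrative assumptions of this summary, not details taken from the paper.

```python
import torch.nn as nn

class DeepTransferNetwork(nn.Module):
    """Sketch: domain-specific lower layers, shared upper layers (illustrative)."""

    def __init__(self, text_dim=1000, image_dim=4096, hidden=512, n_labels=128):
        super().__init__()
        # Domain-specific lower layers: one stack per modality, so each
        # domain can keep features peculiar to it.
        self.text_branch = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.image_branch = nn.Sequential(nn.Linear(image_dim, hidden), nn.ReLU())
        # Shared upper layers: the same parameters score both domains.
        self.shared = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_labels),
        )

    def forward(self, x, domain):
        h = self.text_branch(x) if domain == "text" else self.image_branch(x)
        return self.shared(h)
```

Because the upper layers are literally shared, a gradient step driven by textual training pairs updates the same parameters that later score images, which is one simple way label information can propagate across heterogeneous domains.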



Reviews

Rajeev Gupta

This paper addresses the transfer of learned knowledge from one domain to another; specifically, it considers knowledge transfer from textual domains to image domains. The input consists of tagged images and their corresponding labels, and the authors' goal is to "transfer the labels from the tag set to the images in the target domain for visual concept classification."

The proposed method combines learning across the two domains: a multilayer neural network is built in which the first few (L1) layers are specific to the individual domains, whereas the last few (L2) layers transfer knowledge from the source domain to the target domain. The authors propose three schemes for transferring the knowledge (sketched in code after this review): (1) representation shared, in which a cost term is defined on the differences between the representations that corresponding inputs produce at various stages; (2) parameter shared, in which a cost term is defined on the differences between the two domains' corresponding state parameters, weights, and labels; and (3) a generalized scheme that sums the two costs. In all three, the network is trained to minimize the corresponding cost.

The authors compare the proposed scheme, using accuracy as the performance measure, against other algorithms such as support vector machines (SVM), stacked autoencoders (SAE) using image representations only, heterogeneous transfer learning (HTL), and a translator from text to image (TTI). The experiments show that, by exploiting the co-occurrence information between texts and images, the proposed algorithms work well even when labeled training samples are insufficient.

The authors take on an important problem and solve it well, with demonstrated performance. There are, however, a couple of issues. First, the paper is difficult to read; it would have been better to introduce the example text and images (which appear in the performance results section) when introducing the problem. Second, the algorithm still needs a lot of training data, and it will not handle correlated data well: if the training data contains only images of cats of a particular size, for example, it will not be good at predicting cats of other sizes.

Online Computing Reviews Service
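As a rough illustration of the three cost schemes the review outlines, here is a minimal PyTorch sketch under stated assumptions: the per-domain upper layers top_txt/top_img, the layer sizes, the multi-label loss, and the weights alpha/beta are all illustrative choices of this summary, not the paper's exact formulation.

```python
import torch.nn as nn
import torch.nn.functional as F

hidden, n_labels = 512, 128            # illustrative sizes
top_txt = nn.Linear(hidden, n_labels)  # hypothetical per-domain upper layer (text)
top_img = nn.Linear(hidden, n_labels)  # hypothetical per-domain upper layer (image)

def representation_shared_cost(h_txt, h_img):
    # (1) Representation shared: penalize differences between the hidden
    # representations produced for a co-occurring text/image pair.
    return F.mse_loss(h_txt, h_img)

def parameter_shared_cost():
    # (2) Parameter shared: softly tie the corresponding weights (and biases)
    # of the two domains' upper layers.
    return ((top_txt.weight - top_img.weight).pow(2).sum()
            + (top_txt.bias - top_img.bias).pow(2).sum())

def generalized_cost(h_txt, h_img, y, alpha=0.5, beta=0.5):
    # (3) Generalized scheme: supervised label losses on both domains plus
    # a weighted sum of the two sharing costs above.
    supervised = (F.binary_cross_entropy_with_logits(top_txt(h_txt), y)
                  + F.binary_cross_entropy_with_logits(top_img(h_img), y))
    return (supervised
            + alpha * representation_shared_cost(h_txt, h_img)
            + beta * parameter_shared_cost())
```

Here y is a multi-label indicator vector shared by the co-occurring pair; minimizing generalized_cost pulls the two domains together both in representation space and in parameter space.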


Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 12, Issue 4s
Special Section on Trust Management for Multimedia Big Data and Special Section on Best Papers of ACM Multimedia 2015
November 2016, 242 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/2997658

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 January 2016
• Revised: 1 September 2016
• Accepted: 1 September 2016
• Published: 18 November 2016
