Abstract
In recent years, deep neural networks have been successfully applied to model visual concepts and have achieved competitive performance on many tasks. Despite this impressive performance, traditional deep networks suffer degraded performance when sufficient training data are lacking. The problem becomes especially severe for deep networks trained on very small datasets, which overfit by capturing nonessential or noisy information in the training set. To address this problem, we propose novel generalized deep transfer networks (DTNs), capable of transferring label information across heterogeneous domains, from the textual domain to the visual domain. The proposed framework can adequately mitigate the problem of insufficient training images by bringing in rich labels from the textual domain. Specifically, to share labels between the two domains, we build parameter-shared and representation-shared layers. These layers generate domain-specific and shared interdomain features, making the architecture flexible and powerful in jointly capturing complex information from different domains. To evaluate the proposed method, we release a new dataset extended from NUS-WIDE at http://imag.njust.edu.cn/NUS-WIDE-128.html. Experimental results on this dataset show the superior performance of the proposed DTNs compared to existing state-of-the-art methods.
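To make the two kinds of features concrete, here is a minimal forward-pass sketch in plain Python. It is not the authors' implementation. The class name `TwoDomainNet`, the layer sizes, the tanh activation, the linear label scorer and the `pair_distance` alignment measure are all hypothetical. Each domain keeps its own first layer, which produces the domain-specific feature. A single parameter-shared layer then maps both domains into one common space, which produces the shared interdomain feature. Because text and image features end up in the same space, a label scorer fitted on text-side features could, in principle, be applied to image-side features. The squared distance between paired shared features is one plausible way to encourage co-occurring text and images to share a representation. It is an assumption and may differ from the loss used in the paper.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small random weights (n_out x n_in) and a zero bias vector."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: List[float], layer: Layer, activation=math.tanh) -> List[float]:
    """Affine map W x + b followed by an elementwise nonlinearity."""
    weights, bias = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


class TwoDomainNet:
    """Hypothetical two-branch encoder for text and image inputs of different sizes."""

    def __init__(self, text_dim, image_dim, hidden_dim, shared_dim, n_labels, seed=0):
        rng = random.Random(seed)
        # Domain-specific layers: each domain keeps its own parameters.
        self.specific = {
            "text": init_layer(text_dim, hidden_dim, rng),
            "image": init_layer(image_dim, hidden_dim, rng),
        }
        # Parameter-shared layer: one weight matrix applied to both domains.
        self.shared = init_layer(hidden_dim, shared_dim, rng)
        # Hypothetical linear label scorer defined on the shared space.
        self.classifier = init_layer(shared_dim, n_labels, rng)

    def encode(self, x, domain):
        """Return (domain-specific feature, shared interdomain feature)."""
        if domain not in self.specific:
            raise ValueError(f"unknown domain: {domain!r}")
        h = dense(x, self.specific[domain])
        z = dense(h, self.shared)
        return h, z

    def label_scores(self, z):
        """Raw label scores from a shared feature, whichever domain produced it."""
        return dense(z, self.classifier, activation=lambda v: v)


def pair_distance(z_text, z_image):
    """Squared Euclidean distance between two shared representations."""
    return sum((a - b) ** 2 for a, b in zip(z_text, z_image))


if __name__ == "__main__":
    net = TwoDomainNet(text_dim=5, image_dim=8, hidden_dim=4, shared_dim=3, n_labels=2)
    _, z_text = net.encode([0.1] * 5, "text")
    _, z_image = net.encode([0.2] * 8, "image")
    print(len(z_text), len(z_image))               # 3 3
    print(pair_distance(z_text, z_image) >= 0.0)   # True
    print(len(net.label_scores(z_image)))          # 2
```

The text and image inputs have different dimensionalities (5 and 8 here). Only the domain-specific layers need to know this. Everything from the shared layer onward is identical for both domains, and that common space is where label information can flow from text to images.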
Index Terms
- Generalized Deep Transfer Networks for Knowledge Propagation in Heterogeneous Domains