ABSTRACT
In recent years, the softmax model and its fast approximations have become the de facto loss functions for deep neural networks in multi-class prediction. This loss has also been adopted in language modeling and recommendation, two fields that fall under the framework of learning from Positive and Unlabeled (PU) data.
In this paper, we highlight the drawbacks of the current family of softmax losses and sampling schemes when applied in a Positive and Unlabeled learning setup. We propose both a Relaxed Softmax (RS) loss and a new negative sampling scheme based on a Boltzmann formulation. We show that the new training objective is better suited for the tasks of density estimation, item similarity, and next-event prediction, and that it drives performance uplifts over the classical softmax on textual and recommendation datasets.
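The paper's exact formulation is not reproduced here, but the two ingredients named in the abstract can be sketched. The following is a minimal numpy illustration, assuming (a) Boltzmann negative sampling means drawing negatives from a temperature-controlled softmax over the model's current scores, and (b) the relaxed softmax normalizes only over the positive and the sampled negatives rather than the full vocabulary. All function names and parameters below are illustrative assumptions, not the authors' API.

```python
import numpy as np

def boltzmann_negative_sampling(scores, num_negatives, temperature=1.0, rng=None):
    """Sample negative indices from a Boltzmann (temperature-controlled softmax)
    distribution over the model's current scores.

    Low temperature concentrates on high-scoring "hard" negatives;
    high temperature approaches uniform sampling.
    """
    rng = rng if rng is not None else np.random.default_rng()
    logits = np.asarray(scores, dtype=float) / temperature
    logits -= logits.max()                  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(probs), size=num_negatives, replace=False, p=probs)

def relaxed_softmax_loss(pos_score, neg_scores):
    """Negative log-likelihood of the positive, normalized only over the
    positive plus the sampled negatives (the "relaxed" normalizer)."""
    logits = np.concatenate(([pos_score], np.asarray(neg_scores, dtype=float)))
    logits -= logits.max()                  # numerical stability
    return np.log(np.exp(logits).sum()) - logits[0]

# Hypothetical usage: 1,000 catalogue items, item 42 is the observed positive.
rng = np.random.default_rng(0)
scores = rng.normal(size=1000)              # stand-in for model scores
pool = np.delete(np.arange(1000), 42)       # exclude the positive from the candidate pool
negs = pool[boltzmann_negative_sampling(scores[pool], 20, temperature=0.5, rng=rng)]
loss = relaxed_softmax_loss(scores[42], scores[negs])
```

In this reading, the temperature is the knob that trades off between easy (near-uniform) and hard (score-concentrated) negatives, which is what makes the sampler better suited to PU data where unlabeled items are not guaranteed true negatives.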
Index Terms
- Relaxed softmax for PU learning