DOI: 10.1145/2020408.2020608
Poster

Serendipitous learning: learning beyond the predefined label space

Published: 21 August 2011

ABSTRACT

Most traditional supervised learning methods learn a model from labeled examples and then use that model to classify unlabeled examples into the same label space defined by the training data. In many real-world applications, however, the label spaces of the labeled (training) and unlabeled (testing) examples can differ. To address this problem, this paper proposes the notion of Serendipitous Learning (SL), defined to cover learning scenarios in which the label space can be enlarged during the testing phase. In particular, a large margin approach to SL is proposed. The basic idea is to leverage the knowledge in the labeled examples to help identify novel (unknown) classes: the large margin formulation combines a classification loss on examples from the known categories with a clustering loss on examples from the unknown categories. An efficient optimization algorithm based on CCCP and the bundle method is proposed to solve the resulting optimization problem. Moreover, an efficient online learning method, shown to have a guaranteed regret bound, is proposed to handle large-scale data in the online setting. An extensive set of experiments on two synthetic datasets and two real-world datasets demonstrates the advantages of the proposed method over several baseline algorithms. One limitation of the proposed method is that the number of unknown classes must be given in advance; it may be possible to remove this constraint by modeling it non-parametrically. We also plan to run experiments on more real-world applications in the future.
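The abstract's central idea, combining a classification loss on known categories with a max-margin-clustering-style loss on unknown categories, can be illustrated with a minimal sketch. This is not the paper's actual formulation (which uses CCCP and the bundle method on a joint large margin objective); it is a hypothetical NumPy objective assuming a linear model with one weight row per class, where unlabeled examples receive pseudo-labels from their current best-scoring class.

```python
import numpy as np

def serendipitous_loss(W, X_lab, y_lab, X_unlab, n_known, lam=0.1):
    """Hypothetical combined objective in the spirit of Serendipitous Learning.

    W       : (n_classes, d) weight matrix; the first n_known rows are the
              known classes, the remaining rows are unknown-class clusters.
    Term 1  : multiclass hinge loss on labeled examples, scored over the
              known classes only.
    Term 2  : max-margin-clustering-style hinge loss on unlabeled examples,
              scored over all classes, using each example's best-scoring
              class as a pseudo-label.
    """
    # classification loss on known categories
    scores_lab = X_lab @ W[:n_known].T                     # (n_lab, n_known)
    idx = np.arange(len(y_lab))
    correct = scores_lab[idx, y_lab]
    wrong = scores_lab.copy()
    wrong[idx, y_lab] = -np.inf                            # mask true class
    cls_loss = np.maximum(0.0, 1.0 - correct + wrong.max(axis=1)).mean()

    # clustering loss on (potentially) unknown categories
    scores_unlab = X_unlab @ W.T                           # (n_unlab, n_classes)
    pseudo = scores_unlab.argmax(axis=1)                   # current assignment
    jdx = np.arange(len(pseudo))
    best = scores_unlab[jdx, pseudo]
    rest = scores_unlab.copy()
    rest[jdx, pseudo] = -np.inf                            # mask assigned class
    clu_loss = np.maximum(0.0, 1.0 - best + rest.max(axis=1)).mean()

    reg = 0.5 * lam * np.sum(W * W)                        # L2 regularizer
    return cls_loss + clu_loss + reg

# toy usage: 3 known classes plus 2 candidate unknown clusters
rng = np.random.default_rng(0)
X_lab = rng.normal(size=(20, 5))
y_lab = rng.integers(0, 3, size=20)
X_unlab = rng.normal(size=(30, 5))
W = rng.normal(size=(5, 5))                                # 3 known + 2 unknown
loss = serendipitous_loss(W, X_lab, y_lab, X_unlab, n_known=3)
```

Because the pseudo-labels depend on `W`, this objective is non-convex; the paper's CCCP-based solver addresses exactly this kind of difference-of-convex structure.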


    • Published in

      cover image ACM Conferences
      KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2011
      1446 pages
      ISBN:9781450308137
      DOI:10.1145/2020408

      Copyright © 2011 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States



Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions, 13%
