DOI: 10.1145/2487575.2487626 · KDD Conference Proceedings · Research article

Indexed block coordinate descent for large-scale linear classification with limited memory

Published: 11 August 2013

ABSTRACT

Linear classification has achieved training complexity linear in the data size. However, in many applications the data contain a large number of samples that do not improve the quality of the model but still incur substantial I/O and memory cost to process. In this paper, we show how a Block Coordinate Descent method based on a Nearest-Neighbor Index can significantly reduce this cost when learning a dual-sparse model. In particular, we employ a truncated loss function to induce a series of convex programs with superior dual sparsity, and solve each dual using Indexed Block Coordinate Descent, which uses Approximate Nearest Neighbor (ANN) search to select active dual variables without incurring I/O cost on irrelevant samples. We prove that, despite the bias and weak guarantees of ANN queries, the proposed algorithm converges globally to the solution defined on the entire dataset, with sublinear complexity per iteration. Experiments under both sufficient and limited memory conditions show that the proposed approach learns many times faster than other state-of-the-art solvers without sacrificing accuracy.
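To make the high-level description above concrete, the following is a minimal Python sketch of the general idea: a dual coordinate descent SVM solver in which the working block of dual variables is selected by a nearest-neighbor query against the current weight vector instead of a full scan of the data. This is an illustrative assumption-laden reading of the abstract, not the authors' implementation: it omits the truncated-loss outer loop, uses scikit-learn's exact NearestNeighbors as a stand-in for the paper's ANN index, and the function name, parameters, and the query heuristic (querying with -w over the signed samples) are hypothetical choices made here for clarity.

```python
# Sketch: block coordinate descent on the dual of an L2-regularized hinge-loss
# SVM, where each block is chosen by a nearest-neighbor query rather than a
# pass over all samples. Assumptions (not from the paper): exact NearestNeighbors
# stands in for an ANN index, and querying with -w over the signed samples
# y_i * x_i is only a rough proxy for "samples closest to the margin".
import numpy as np
from sklearn.neighbors import NearestNeighbors

def indexed_bcd_svm(X, y, C=1.0, block_size=64, n_iters=100, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    alpha = np.zeros(n)              # dual variables
    w = np.zeros(d)                  # primal weights, w = sum_i alpha_i y_i x_i
    sq_norms = np.einsum("ij,ij->i", X, X)

    # Build the index once over the signed samples y_i * x_i, so a query with -w
    # tends to retrieve samples with small margin y_i w^T x_i (the "active" ones).
    signed_X = X * y[:, None]
    index = NearestNeighbors(n_neighbors=block_size).fit(signed_X)

    for _ in range(n_iters):
        if np.allclose(w, 0):
            # No direction to query yet: fall back to a random block.
            block = rng.choice(n, size=block_size, replace=False)
        else:
            _, nbrs = index.kneighbors(-w.reshape(1, -1))
            block = nbrs[0]

        # One pass of standard dual coordinate descent over the selected block.
        for i in block:
            g = y[i] * (X[i] @ w) - 1.0                       # dual gradient at alpha_i
            new_a = np.clip(alpha[i] - g / max(sq_norms[i], 1e-12), 0.0, C)
            w += (new_a - alpha[i]) * y[i] * X[i]             # keep w in sync with alpha
            alpha[i] = new_a
    return w
```

In this sketch the per-iteration cost is dominated by one index query plus updates on a small block, rather than a sweep over all n samples; this mirrors, in spirit, the sublinear per-iteration complexity claimed in the abstract, though the actual index structure and selection rule in the paper differ.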
