ABSTRACT
Linear classification solvers have achieved complexity linear in the data size. However, in many applications, the data contain a large number of samples that do not help improve the quality of the model, yet still cost substantial I/O and memory to process. In this paper, we show how a Block Coordinate Descent method based on a Nearest-Neighbor Index can significantly reduce such cost when learning a dual-sparse model. In particular, we employ a truncated loss function to induce a series of convex programs with superior dual sparsity, and solve each dual problem using Indexed Block Coordinate Descent, which exploits Approximate Nearest Neighbor (ANN) search to select active dual variables without incurring I/O cost on irrelevant samples. We prove that, despite the bias and weak guarantees of ANN queries, the proposed algorithm converges globally to the solution defined on the entire dataset, with sublinear complexity per iteration. Experiments under both sufficient-memory and limited-memory conditions show that the proposed approach learns many times faster than other state-of-the-art solvers without sacrificing accuracy.
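To make the mechanism concrete, here is a minimal sketch, not the authors' algorithm: dual coordinate descent for an L2-regularized L1-loss SVM (in the style of Hsieh et al., 2008) in which each outer step updates only a small block of dual variables selected by an inner-product search against the current weights; the truncated-loss sequence of convex programs from the paper is omitted. The helper `index_query` and the parameters `block` and `outer` are illustrative assumptions: the helper does an exact search here, whereas the sublinear per-iteration cost claimed in the abstract comes from answering the same query with an ANN index (e.g., LSH or cone trees).

```python
import numpy as np

def index_query(Z, q, k):
    """Stand-in for an ANN index: return indices of the k rows of Z
    with the largest inner product with q (computed exactly here)."""
    return np.argpartition(Z @ q, -k)[-k:]

def indexed_bcd_svm(X, y, C=1.0, block=10, outer=200, seed=0):
    """Sketch of block coordinate descent on the SVM dual, touching
    only an index-selected block of dual variables per outer step."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    alpha = np.zeros(n)          # dual variables, 0 <= alpha_i <= C
    w = np.zeros(d)              # primal weights, w = sum_i alpha_i y_i x_i
    Z = y[:, None] * X           # rows z_i = y_i x_i, the indexed objects
    qii = (X * X).sum(axis=1)    # diagonal of the Gram matrix
    for _ in range(outer):
        # The most violating samples minimize y_i w^T x_i, i.e. maximize
        # z_i^T (-w); only these are read, the rest cost no I/O.
        active = index_query(Z, -w, block)
        for i in rng.permutation(active):
            g = y[i] * (w @ X[i]) - 1.0           # dual gradient at i
            new = min(max(alpha[i] - g / max(qii[i], 1e-12), 0.0), C)
            w += (new - alpha[i]) * y[i] * X[i]   # incremental primal update
            alpha[i] = new
    return w
```

Calling `w = indexed_bcd_svm(X, y)` on a small dense dataset returns an SVM weight vector; the design point is that only the rows returned by the index are ever touched, so samples that never enter an active block incur no I/O or memory traffic.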