ABSTRACT
We prove a query complexity lower bound for approximating the top-r dimensional eigenspace of a matrix. We consider an oracle model where, given a symmetric matrix M ∈ ℝ^{d×d}, an algorithm Alg is allowed to make T exact queries of the form w^{(i)} = M v^{(i)} for i in {1, ..., T}, where v^{(i)} is drawn from a distribution which depends arbitrarily on the past queries and measurements {v^{(j)}, w^{(j)}}_{1 ≤ j ≤ i−1}. We show that for every gap ∈ (0, 1/2], there exists a distribution over matrices M for which 1) gap_r(M) = Ω(gap) (where gap_r(M) is the normalized gap between the r-th and (r+1)-st largest-magnitude eigenvalues of M), and 2) any Alg which takes fewer than const × (r log d)/√gap queries fails (with overwhelming probability) to identify a matrix V ∈ ℝ^{d×r} with orthonormal columns for which ⟨V, M V⟩ ≥ (1 − const × gap) ∑_{i=1}^{r} λ_i(M). Our bound requires only that d is a small polynomial in 1/gap and r, and matches the upper bounds of Musco and Musco ’15. Moreover, it establishes a strict separation between convex optimization and “strict-saddle” non-convex optimization, of which PCA is a canonical example: in the former, first-order methods can have dimension-free iteration complexity, whereas in PCA, the iteration complexity of gradient-based methods must necessarily grow with the dimension.
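For concreteness, the oracle model above can be sketched in a few lines of NumPy: the algorithm touches M only through matrix-vector products w = M v. The sketch below uses plain block power iteration, not the faster block Krylov method of Musco and Musco ’15 that the lower bound matches; all function and variable names are our own for illustration.

```python
import numpy as np

def top_r_eigenspace(matvec, d, r, T, rng):
    """Block power iteration in the query model: the matrix is accessed
    only via `matvec`, and each round of the loop spends r of the T queries."""
    V, _ = np.linalg.qr(rng.standard_normal((d, r)))  # random orthonormal start
    for _ in range(T // r):
        W = np.column_stack([matvec(V[:, j]) for j in range(r)])  # r queries
        V, _ = np.linalg.qr(W)  # re-orthonormalize the block
    return V  # d x r with orthonormal columns

# Usage on a symmetric matrix with a planted, well-separated top-r eigenspace.
rng = np.random.default_rng(0)
d, r = 60, 3
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
vals = np.concatenate([np.array([10.0, 9.0, 8.0]), rng.uniform(0.0, 1.0, d - r)])
M = (Q * vals) @ Q.T  # symmetric with eigenvalues `vals`
V = top_r_eigenspace(lambda v: M @ v, d, r, T=60, rng=rng)
# With this large gap, T = 60 queries suffice to capture the top-r energy:
print(np.trace(V.T @ M @ V) >= (1 - 0.01) * vals[:r].sum())
```

The quantity printed is exactly the success criterion from the theorem statement, ⟨V, M V⟩ ≥ (1 − const × gap) ∑ λ_i(M); the lower bound says that when the eigengap is small, no adaptive query strategy can reach it in fewer than const × (r log d)/√gap queries.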
Our argument proceeds via a reduction to estimating a rank-r spike in a deformed Wigner model M = W + λ U U⊤, where W is drawn from the Gaussian Orthogonal Ensemble, U is uniform on the d × r Stiefel manifold, and λ > 1 governs the size of the perturbation. Surprisingly, this ubiquitous random matrix model witnesses the worst-case rate for eigenspace approximation, and the ‘accelerated’ gap^{−1/2} in the rate follows as a consequence of the correspondence between the asymptotic eigengap and the size of the perturbation λ when λ is near the “phase transition” λ = 1. To verify that d need only be polynomial in gap^{−1} and r, we prove a finite sample convergence theorem for the top eigenvalues of a deformed Wigner matrix, which may be of independent interest. We then lower bound the above estimation problem with a novel technique based on Fano-style data-processing inequalities with truncated likelihoods; the technique generalizes the Bayes-risk lower bound of Chen et al. ’16, and we believe it is particularly suited to lower bounds in adaptive settings like the one considered in this paper.
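The planted model and its phase transition are easy to observe numerically. The sketch below samples M = W + λUU⊤ under one standard choice of normalization (W scaled so its bulk spectrum asymptotically fills [−2, 2]); for λ > 1, the classical BBP-style prediction is that each planted eigenvalue separates from the bulk and concentrates near λ + 1/λ. This is an illustration under our stated assumptions, not the paper's finite-sample theorem.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, lam = 600, 2, 2.0  # lam > 1: above the phase transition at lam = 1

# W from the Gaussian Orthogonal Ensemble, normalized so the bulk
# spectrum's edge sits near 2.
G = rng.standard_normal((d, d))
W = (G + G.T) / np.sqrt(2 * d)

# U uniform on the d x r Stiefel manifold (orthonormal columns).
U, _ = np.linalg.qr(rng.standard_normal((d, r)))

M = W + lam * (U @ U.T)
eigs = np.linalg.eigvalsh(M)  # ascending order

# The r planted eigenvalues separate from the bulk edge (near 2) and
# concentrate near lam + 1/lam (= 2.5 here).
print(eigs[-r:])       # the two outliers, each close to 2.5
print(eigs[-(r + 1)])  # the largest bulk eigenvalue, close to 2
```

As λ ↓ 1 the outliers approach the bulk edge and the eigengap closes like (λ − 1)², which is the source of the gap^{−1/2} acceleration in the rate.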
References
- Alekh Agarwal and Leon Bottou. 2014. A lower bound for the optimization of finite sums. arXiv preprint arXiv:1410.0723 (2014).
- Alekh Agarwal, Martin J. Wainwright, Peter L. Bartlett, and Pradeep K. Ravikumar. 2009. Information-theoretic lower bounds on the oracle complexity of convex optimization. In Advances in Neural Information Processing Systems. 1–9.
- Zeyuan Allen-Zhu and Yuanzhi Li. 2016. First Efficient Convergence for Streaming k-PCA: a Global, Gap-Free, and Near-Optimal Rate. arXiv preprint arXiv:1607.07837 (2016).
- Zeyuan Allen-Zhu and Yuanzhi Li. 2016. LazySVD: Even faster SVD decomposition yet without agonizing pain. In Advances in Neural Information Processing Systems. 974–982.
- Greg W. Anderson, Alice Guionnet, and Ofer Zeitouni. 2010. An Introduction to Random Matrices. Vol. 118. Cambridge University Press.
- Anurag Anshu, Naresh B. Goud, Rahul Jain, Srijita Kundu, and Priyanka Mukhopadhyay. 2017. Lifting randomized query complexity to randomized communication complexity. arXiv preprint arXiv:1703.07521 (2017).
- Ery Arias-Castro, Emmanuel J. Candès, and Mark A. Davenport. 2013. On the fundamental limits of adaptive sensing. IEEE Transactions on Information Theory 59, 1 (2013), 472–481.
- Afonso S. Bandeira, Ramon van Handel, et al. 2016. Sharp nonasymptotic bounds on the norm of random matrices with independent entries. The Annals of Probability 44, 4 (2016), 2479–2506.
- Florent Benaych-Georges and Raj Rao Nadakuditi. 2011. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics 227, 1 (2011), 494–521.
- Mireille Capitaine, Catherine Donati-Martin, and Delphine Féral. 2009. The largest eigenvalues of finite rank deformation of large Wigner matrices: convergence and nonuniversality of the fluctuations. Annals of Probability (2009), 1–47.
- Rui M. Castro et al. 2014. Adaptive sensing performance lower bounds for sparse signal detection and support estimation. Bernoulli 20, 4 (2014), 2217–2246.
- Rui M. Castro and Ervin Tánczos. 2017. Adaptive Compressed Sensing for Support Recovery of Structured Sparse Sets. IEEE Transactions on Information Theory 63, 3 (2017), 1535–1554.
- Xi Chen, Adityanand Guntuboyina, and Yuchen Zhang. 2016. On Bayes Risk Lower Bounds. Journal of Machine Learning Research 17 (2016), 1–58.
- Imre Csiszár. 1972. A class of measures of informativity of observation channels. Periodica Mathematica Hungarica 2, 1–4 (1972), 191–213.
- James W. Demmel. 1997. Applied Numerical Linear Algebra. SIAM.
- Delphine Féral and Sandrine Péché. 2007. The largest eigenvalue of rank one deformation of large Wigner matrices. Communications in Mathematical Physics 272, 1 (2007), 185–228.
- Dan Garber, Elad Hazan, Chi Jin, Sham M. Kakade, Cameron Musco, Praneeth Netrapalli, and Aaron Sidford. 2016. Faster eigenvector computation via shift-and-invert preconditioning. In Proceedings of the 33rd International Conference on Machine Learning, ICML. 2626–2634.
- Roger A. Horn and Charles R. Johnson. 2012. Matrix Analysis. Cambridge University Press.
- Kevin G. Jamieson, Robert Nowak, and Ben Recht. 2012. Query complexity of derivative-free optimization. In Advances in Neural Information Processing Systems. 2672–2680.
- Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, and Michael I. Jordan. 2017. How to Escape Saddle Points Efficiently. arXiv preprint arXiv:1703.00887 (2017).
- Ian Jolliffe. 2002. Principal Component Analysis. Wiley Online Library.
- Olav Kallenberg. 2006. Foundations of Modern Probability. Springer Science & Business Media.
- Cameron Musco and Christopher Musco. 2015. Randomized block Krylov methods for stronger and faster approximate singular value decomposition. In Advances in Neural Information Processing Systems. 1396–1404.
- Jelani Nelson, Jakub Pachocki, and Zhengyu Wang. 2017. Optimal lower bounds for universal relation, samplers, and finding duplicates. arXiv preprint arXiv:1703.08139 (2017).
- Arkadii Nemirovskii, David Borisovich Yudin, and Edgar Ronald Dawson. 1983. Problem Complexity and Method Efficiency in Optimization. (1983).
- Andrew Y. Ng, Michael I. Jordan, et al. [n. d.]. On spectral clustering: Analysis and an algorithm.
- Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank citation ranking: Bringing order to the web. Technical Report. Stanford InfoLab.
- Eric Price and David P. Woodruff. 2013. Lower bounds for adaptive sparse recovery. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 652–663.
- Ohad Shamir. 2014. Fundamental limits of online and distributed algorithms for statistical learning and estimation. In Advances in Neural Information Processing Systems. 163–171.
- Ohad Shamir. 2015. Fast stochastic algorithms for SVD and PCA: Convergence properties and convexity. arXiv preprint arXiv:1507.08788 (2015).
- Max Simchowitz, Ahmed El Alaoui, and Benjamin Recht. 2018. Tight Query Complexity Lower Bounds for PCA via Finite Sample Deformed Wigner Law. arXiv preprint (to appear) (2018).
- Daniel A. Spielman. 2007. Spectral graph theory and its applications. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07). IEEE, 29–38.
- Jacob Steinhardt and John C. Duchi. 2015. Minimax rates for memory-bounded sparse linear regression. In COLT. 1564–1587.
- Jacob Steinhardt, Gregory Valiant, and Stefan Wager. 2015. Memory, Communication, and Statistical Queries. In Electronic Colloquium on Computational Complexity (ECCC), Vol. 22. 1–2.
- Roman Vershynin. [n. d.]. High-Dimensional Probability.
- Blake E. Woodworth and Nati Srebro. 2016. Tight complexity bounds for optimizing composite objectives. In Advances in Neural Information Processing Systems. 3639–3647.