research-article · DOI: 10.1145/3188745.3188796

Tight query complexity lower bounds for PCA via finite sample deformed Wigner law

Published: 20 June 2018

ABSTRACT

We prove a query complexity lower bound for approximating the top r-dimensional eigenspace of a matrix. We consider an oracle model in which, given a symmetric matrix M ∈ ℝ^{d × d}, an algorithm Alg is allowed to make T exact queries of the form w^(i) = M v^(i) for i in {1, ..., T}, where v^(i) is drawn from a distribution which depends arbitrarily on the past queries and measurements {v^(j), w^(j)}_{1 ≤ j ≤ i−1}. We show that for every gap ∈ (0, 1/2], there exists a distribution over matrices M for which 1) gap_r(M) = Ω(gap) (where gap_r(M) is the normalized gap between the r-th and (r+1)-st largest-magnitude eigenvalues of M), and 2) any Alg which makes fewer than const × (r log d)/√gap queries fails (with overwhelming probability) to identify a matrix V ∈ ℝ^{d × r} with orthonormal columns for which ⟨V, M V⟩ ≥ (1 − const × gap) ∑_{i=1}^{r} λ_i(M). Our bound requires only that d is a small polynomial in 1/gap and r, and matches the upper bounds of Musco and Musco '15. Moreover, it establishes a strict separation between convex optimization and "strict-saddle" non-convex optimization, of which PCA is a canonical example: in the former, first-order methods can have dimension-free iteration complexity, whereas in PCA the iteration complexity of gradient-based methods must necessarily grow with the dimension.
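The oracle model above can be illustrated by a minimal sketch: plain block power iteration, which accesses M only through exact matrix-vector queries w = M v and outputs an orthonormal V whose quality is measured exactly as in the abstract, by tr(VᵀMV) relative to the sum of the top r eigenvalues. This is an illustrative toy (the test matrix, the helper name `top_r_via_queries`, and all parameter values are choices for this sketch, not the paper's construction, which concerns the optimal block Krylov rate):

```python
import numpy as np

def top_r_via_queries(matvec, d, r, T, rng):
    # Block power iteration: each round issues r exact queries w = M v,
    # where the new queries depend on all past measurements.
    V = np.linalg.qr(rng.standard_normal((d, r)))[0]
    for _ in range(T):
        W = np.column_stack([matvec(V[:, j]) for j in range(r)])
        V = np.linalg.qr(W)[0]  # re-orthonormalize the queried block
    return V

rng = np.random.default_rng(0)
d, r = 200, 3

# A synthetic symmetric matrix with a visible eigengap at position r.
evals = np.linspace(1.0, 0.0, d)
evals[:r] += 0.5
Q = np.linalg.qr(rng.standard_normal((d, d)))[0]
M = (Q * evals) @ Q.T  # eigenvalues `evals`, random eigenvectors

V = top_r_via_queries(lambda v: M @ v, d, r, T=30, rng=rng)

# Quality criterion from the abstract: tr(V^T M V) vs. sum of top-r eigenvalues.
quality = np.trace(V.T @ M @ V) / evals[:r].sum()
```

With a constant-order gap, a modest number of query rounds already drives `quality` close to 1; the lower bound concerns how the required number of queries must scale as the gap shrinks and d grows.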

Our argument proceeds via a reduction to estimating a rank-r spike in a deformed Wigner model M = W + λ U Uᵀ, where W is drawn from the Gaussian Orthogonal Ensemble, U is uniform on the d × r Stiefel manifold, and λ > 1 governs the size of the perturbation. Surprisingly, this ubiquitous random matrix model witnesses the worst-case rate for eigenspace approximation, and the 'accelerated' gap^{−1/2} in the rate follows as a consequence of the correspondence between the asymptotic eigengap and the size of the perturbation λ, when λ is near the "phase transition" λ = 1. To verify that d need only be polynomial in gap^{−1} and r, we prove a finite sample convergence theorem for the top eigenvalues of a deformed Wigner matrix, which may be of independent interest. We then lower bound the above estimation problem with a novel technique based on Fano-style data-processing inequalities with truncated likelihoods; the technique generalizes the Bayes-risk lower bound of Chen et al. '16, and we believe it is particularly suited to lower bounds in adaptive settings like the one considered in this paper.
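The deformed Wigner model is easy to sample numerically. The sketch below draws W from the GOE (normalized so the bulk spectrum approaches the semicircle law on [−2, 2], matching the phase transition at λ = 1), plants a rank-r spike λ U Uᵀ with U uniform on the Stiefel manifold, and compares the top eigenvalue against the classical outlier location λ + 1/λ for λ > 1. The dimensions and λ are illustrative choices, and this checks only the well-known asymptotic outlier formula, not the paper's finite sample theorem:

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, lam = 1500, 2, 1.5  # illustrative sizes; lam > 1 is above the transition

# GOE draw: symmetrize a Gaussian matrix and scale so the bulk
# spectrum converges to the semicircle distribution on [-2, 2].
A = rng.standard_normal((d, d))
W = (A + A.T) / np.sqrt(2 * d)

# Rank-r spike: U uniform on the d x r Stiefel manifold
# (orthonormalized Gaussian columns).
U = np.linalg.qr(rng.standard_normal((d, r)))[0]
M = W + lam * (U @ U.T)

top = np.linalg.eigvalsh(M)[-1]   # largest eigenvalue of the deformed matrix
predicted = lam + 1.0 / lam       # asymptotic outlier location for lam > 1
```

For λ ≤ 1 the spike is swallowed by the bulk and the top eigenvalue sticks near the edge 2, which is why the eigengap (and hence the hardness of the estimation problem) is controlled by how close λ sits to the transition.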

Supplemental Material

8b-4.mp4 (mp4, 42.2 MB)

References

  1. Alekh Agarwal and Leon Bottou. 2014. A lower bound for the optimization of finite sums. arXiv preprint arXiv:1410.0723 (2014).
  2. Alekh Agarwal, Martin J. Wainwright, Peter L. Bartlett, and Pradeep K. Ravikumar. 2009. Information-theoretic lower bounds on the oracle complexity of convex optimization. In Advances in Neural Information Processing Systems. 1–9.
  3. Zeyuan Allen-Zhu and Yuanzhi Li. 2016. First efficient convergence for streaming k-PCA: a global, gap-free, and near-optimal rate. arXiv preprint arXiv:1607.07837 (2016).
  4. Zeyuan Allen-Zhu and Yuanzhi Li. 2016. LazySVD: Even faster SVD decomposition yet without agonizing pain. In Advances in Neural Information Processing Systems. 974–982.
  5. Greg W. Anderson, Alice Guionnet, and Ofer Zeitouni. 2010. An Introduction to Random Matrices. Vol. 118. Cambridge University Press.
  6. Anurag Anshu, Naresh B. Goud, Rahul Jain, Srijita Kundu, and Priyanka Mukhopadhyay. 2017. Lifting randomized query complexity to randomized communication complexity. arXiv preprint arXiv:1703.07521 (2017).
  7. Ery Arias-Castro, Emmanuel J. Candes, and Mark A. Davenport. 2013. On the fundamental limits of adaptive sensing. IEEE Transactions on Information Theory 59, 1 (2013), 472–481.
  8. Afonso S. Bandeira, Ramon van Handel, et al. 2016. Sharp nonasymptotic bounds on the norm of random matrices with independent entries. The Annals of Probability 44, 4 (2016), 2479–2506.
  9. Florent Benaych-Georges and Raj Rao Nadakuditi. 2011. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics 227, 1 (2011), 494–521.
  10. Mireille Capitaine, Catherine Donati-Martin, and Delphine Féral. 2009. The largest eigenvalues of finite rank deformation of large Wigner matrices: convergence and nonuniversality of the fluctuations. Annals of Probability (2009), 1–47.
  11. Rui M. Castro et al. 2014. Adaptive sensing performance lower bounds for sparse signal detection and support estimation. Bernoulli 20, 4 (2014), 2217–2246.
  12. Rui M. Castro and Ervin Tánczos. 2017. Adaptive compressed sensing for support recovery of structured sparse sets. IEEE Transactions on Information Theory 63, 3 (2017), 1535–1554.
  13. Xi Chen, Adityanand Guntuboyina, and Yuchen Zhang. 2016. On Bayes risk lower bounds. Journal of Machine Learning Research 17 (2016), 1–58.
  14. Imre Csiszár. 1972. A class of measures of informativity of observation channels. Periodica Mathematica Hungarica 2, 1–4 (1972), 191–213.
  15. James W. Demmel. 1997. Applied Numerical Linear Algebra. SIAM.
  16. Delphine Féral and Sandrine Péché. 2007. The largest eigenvalue of rank one deformation of large Wigner matrices. Communications in Mathematical Physics 272, 1 (2007), 185–228.
  17. Dan Garber, Elad Hazan, Chi Jin, Sham M. Kakade, Cameron Musco, Praneeth Netrapalli, and Aaron Sidford. 2016. Faster eigenvector computation via shift-and-invert preconditioning. In Proceedings of the 33rd International Conference on Machine Learning (ICML). 2626–2634.
  18. Roger A. Horn and Charles R. Johnson. 2012. Matrix Analysis. Cambridge University Press.
  19. Kevin G. Jamieson, Robert Nowak, and Ben Recht. 2012. Query complexity of derivative-free optimization. In Advances in Neural Information Processing Systems. 2672–2680.
  20. Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, and Michael I. Jordan. 2017. How to escape saddle points efficiently. arXiv preprint arXiv:1703.00887 (2017).
  21. Ian Jolliffe. 2002. Principal Component Analysis. Wiley Online Library.
  22. Olav Kallenberg. 2006. Foundations of Modern Probability. Springer Science & Business Media.
  23. Cameron Musco and Christopher Musco. 2015. Randomized block Krylov methods for stronger and faster approximate singular value decomposition. In Advances in Neural Information Processing Systems. 1396–1404.
  24. Jelani Nelson, Jakub Pachocki, and Zhengyu Wang. 2017. Optimal lower bounds for universal relation, samplers, and finding duplicates. arXiv preprint arXiv:1703.08139 (2017).
  25. Arkadii Nemirovskii, David Borisovich Yudin, and Edgar Ronald Dawson. 1983. Problem Complexity and Method Efficiency in Optimization. (1983).
  26. Andrew Y. Ng, Michael I. Jordan, et al. [n. d.]. On spectral clustering: analysis and an algorithm.
  27. Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank citation ranking: Bringing order to the web. Technical Report. Stanford InfoLab.
  28. Eric Price and David P. Woodruff. 2013. Lower bounds for adaptive sparse recovery. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 652–663.
  29. Ohad Shamir. 2014. Fundamental limits of online and distributed algorithms for statistical learning and estimation. In Advances in Neural Information Processing Systems. 163–171.
  30. Ohad Shamir. 2015. Fast stochastic algorithms for SVD and PCA: convergence properties and convexity. arXiv preprint arXiv:1507.08788 (2015).
  31. Max Simchowitz, Ahmed El Alaoui, and Benjamin Recht. 2018. Tight query complexity lower bounds for PCA via finite sample deformed Wigner law. arXiv preprint (to appear) (2018).
  32. Daniel A. Spielman. 2007. Spectral graph theory and its applications. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS '07). IEEE, 29–38.
  33. Jacob Steinhardt and John C. Duchi. 2015. Minimax rates for memory-bounded sparse linear regression. In COLT. 1564–1587.
  34. Jacob Steinhardt, Gregory Valiant, and Stefan Wager. 2015. Memory, communication, and statistical queries. In Electronic Colloquium on Computational Complexity (ECCC), Vol. 22. 1–2.
  35. Roman Vershynin. [n. d.]. High Dimensional Probability.
  36. Blake E. Woodworth and Nati Srebro. 2016. Tight complexity bounds for optimizing composite objectives. In Advances in Neural Information Processing Systems. 3639–3647.

    • Published in

      STOC 2018: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing
      June 2018
      1332 pages
      ISBN:9781450355599
      DOI:10.1145/3188745

      Copyright © 2018 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


