- Ahn, K.J., Guha, S., McGregor, A. Analyzing graph structure via linear measurements. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, (2012).
- Bhagat, S., Burke, M., Diuk, C., Filiz, I.O., Edunov, S. Three-and-a-half degrees of separation. Facebook Research, 2016; https://research.fb.com/three-and-a-half-degrees-of-separation/
- Bloom, B. Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13, 7 (July 1970), 422--426.
- Broder, A., Mitzenmacher, M. Network applications of Bloom filters: a survey. Internet Mathematics 1, 4 (2005), 485--509.
- Clarkson, K.L., Woodruff, D.P. Low rank approximation and regression in input sparsity time. In Proceedings of the ACM Symposium on Theory of Computing, (2013), 81--90.
- Cormode, G., Korn, F., Muthukrishnan, S., Johnson, T., Spatscheck, O., Srivastava, D. Holistic UDAFs at streaming speeds. In Proceedings of the ACM SIGMOD International Conference on Management of Data, (2004), 35--46.
- Cormode, G., Muthukrishnan, S. An improved data stream summary: the Count-Min sketch and its applications. J. Algorithms 55, 1 (2005), 58--75.
- Flajolet, P., Martin, G.N. Probabilistic counting. In Proceedings of the IEEE Conference on Foundations of Computer Science, (1985), 76--82. Also in J. Computer and System Sciences 31, 182--209.
- Guha, S., Mishra, N., Motwani, R., O'Callaghan, L. Clustering data streams. In Proceedings of the IEEE Conference on Foundations of Computer Science, (2000).
- Heule, S., Nunkesser, M., Hall, A. HyperLogLog in practice: Algorithmic engineering of a state of the art cardinality estimation algorithm. In Proceedings of the International Conference on Extending Database Technology, (2013).
- Jermaine, C. Sampling techniques for massive data. Synopses for massive data: samples, histograms, wavelets and sketches. Foundations and Trends in Databases 4, 1--3 (2012). G. Cormode, M. Garofalakis, P. Haas, and C. Jermaine, Eds. NOW Publishers.
- Morris, R. Counting large numbers of events in small registers. Commun. ACM 21, 10 (Oct. 1978), 840--842.
- Pike, R., Dorward, S., Griesemer, R., Quinlan, S. Interpreting the data: Parallel analysis with Sawzall. Dynamic Grids and Worldwide Computing 13, 4 (2005), 277--298.
- Weinberger, K.Q., Dasgupta, A., Langford, J., Smola, A.J., Attenberg, J. Feature hashing for large-scale multitask learning. In Proceedings of the International Conference on Machine Learning, (2009).
- Whang, K.Y., Vander-Zanden, B.T., Taylor, H.M. A linear-time probabilistic counting algorithm for database applications. ACM Trans. Database Systems 15, 2 (1990), 208.
- Woodruff, D. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science 10, 1--2 (2014), 1--157.