skip to main content
article
Free Access

Random sampling with a reservoir

Published:01 March 1985Publication History
Skip Abstract Section

Abstract

We introduce fast algorithms for selecting a random sample of n records without replacement from a pool of N records, where the value of N is unknown beforehand. The main result of the paper is the design and analysis of Algorithm Z; it does the sampling in one pass using constant space and in O(n(1 + log(N/n))) expected time, which is optimum, up to a constant factor. Several optimizations are studied that collectively improve the speed of the naive version of the algorithm by an order of magnitude. We give an efficient Pascal-like implementation that incorporates these modifications and that is suitable for general use. Theoretical and empirical results indicate that Algorithm Z outperforms current methods by a significant margin.

References

  1. 1 BENTLEY, J.L. Personal communication, Apr. 1983; see {11}.Google ScholarGoogle Scholar
  2. 2 ERNVALL, J. AND NEVALAINEN, O. An algorithm for unbiased random sampling. Comput. J. 25, 1 (Jan. 1982), 45-47.Google ScholarGoogle Scholar
  3. 3 FAN, C. T., MULLER, M. E., AND REZUCHA, I. Development of sampling plans by using sequential (item by item) selection techniques and digital computers. Am. Stat. Assoc. J. 57 (June 1962), 387-402.Google ScholarGoogle Scholar
  4. 4 FELLER, W. An Introduction to Probability Theory and Its Applications, vol. I, 3rd ed. Wiley, 1968.Google ScholarGoogle Scholar
  5. 5 FELLER, W. An Introduction to Probability Theory and Its Applications, vol. II, 2nd ed. Wiley, 1971.Google ScholarGoogle Scholar
  6. 6 JONES, T.G. A note on sampling a tape file. Commun. ACM 5, 6 (June 1962), 343. Google ScholarGoogle Scholar
  7. 7 KAWARASAKI, J., AND SIBUYA, M. Random numbers for simple random sampling without replacement. Keio Math. Sere. Rep. No. 7 (1982), 1-9.Google ScholarGoogle Scholar
  8. 8 KNUTH, D.E. The Art of Computer Programming. vol. 2: Seminumerical Algorithms, 2nd ed. Addison-Wesley, Reading, Mass., 1981. Google ScholarGoogle Scholar
  9. 9 SEDGEWICK, R. Algorithms. Addison-Wesley, Reading, Mass., 1981. Google ScholarGoogle Scholar
  10. 10 VITTER, J.S. Faster methods for random sampling. Commun. ACM 27, 7 (July 1984). 703-718. Google ScholarGoogle Scholar
  11. 11 VITTER, J.S. Optimum algorithms for two random sampling problems. In Proceedings of the 24th Annual IEEE Symposium on Foundations of Computer Science (Tucson, Az., Nov. 7-9), IEEE, New York, 1983 pp. 65-75.Google ScholarGoogle Scholar

Index Terms

  1. Random sampling with a reservoir

                Recommendations

                Reviews

                Prokop Vondracek

                There is presented a fast algorithm for selecting a random sample of n records without replacement for a pool of N records, where the value of N is unknown beforehand. Algorithm Z, which does the sampling, is designed. It uses constant space and does the sampling in an expected time of O( n(1+log( N/ n))), which is optimum up to a constant factor. Algorithm Z outperforms current methods.

                Access critical reviews of Computing literature here

                Become a reviewer for Computing Reviews.

                Comments

                Login options

                Check if you have access through your login credentials or your institution to get full access on this article.

                Sign in

                Full Access

                • Published in

                  cover image ACM Transactions on Mathematical Software
                  ACM Transactions on Mathematical Software  Volume 11, Issue 1
                  March 1985
                  78 pages
                  ISSN:0098-3500
                  EISSN:1557-7295
                  DOI:10.1145/3147
                  Issue’s Table of Contents

                  Copyright © 1985 ACM

                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States

                  Publication History

                  • Published: 1 March 1985
                  Published in toms Volume 11, Issue 1

                  Permissions

                  Request permissions about this article.

                  Request Permissions

                  Check for updates

                  Qualifiers

                  • article

                PDF Format

                View or Download as a PDF file.

                PDF

                eReader

                View online with eReader.

                eReader