Abstract
We introduce fast algorithms for selecting a random sample of n records without replacement from a pool of N records, where the value of N is unknown beforehand. The main result of the paper is the design and analysis of Algorithm Z; it does the sampling in one pass using constant space and in O(n(1 + log(N/n))) expected time, which is optimum, up to a constant factor. Several optimizations are studied that collectively improve the speed of the naive version of the algorithm by an order of magnitude. We give an efficient Pascal-like implementation that incorporates these modifications and that is suitable for general use. Theoretical and empirical results indicate that Algorithm Z outperforms current methods by a significant margin.
- 1 BENTLEY, J.L. Personal communication, Apr. 1983; see {11}.Google Scholar
- 2 ERNVALL, J. AND NEVALAINEN, O. An algorithm for unbiased random sampling. Comput. J. 25, 1 (Jan. 1982), 45-47.Google Scholar
- 3 FAN, C. T., MULLER, M. E., AND REZUCHA, I. Development of sampling plans by using sequential (item by item) selection techniques and digital computers. Am. Stat. Assoc. J. 57 (June 1962), 387-402.Google Scholar
- 4 FELLER, W. An Introduction to Probability Theory and Its Applications, vol. I, 3rd ed. Wiley, 1968.Google Scholar
- 5 FELLER, W. An Introduction to Probability Theory and Its Applications, vol. II, 2nd ed. Wiley, 1971.Google Scholar
- 6 JONES, T.G. A note on sampling a tape file. Commun. ACM 5, 6 (June 1962), 343. Google Scholar
- 7 KAWARASAKI, J., AND SIBUYA, M. Random numbers for simple random sampling without replacement. Keio Math. Sere. Rep. No. 7 (1982), 1-9.Google Scholar
- 8 KNUTH, D.E. The Art of Computer Programming. vol. 2: Seminumerical Algorithms, 2nd ed. Addison-Wesley, Reading, Mass., 1981. Google Scholar
- 9 SEDGEWICK, R. Algorithms. Addison-Wesley, Reading, Mass., 1981. Google Scholar
- 10 VITTER, J.S. Faster methods for random sampling. Commun. ACM 27, 7 (July 1984). 703-718. Google Scholar
- 11 VITTER, J.S. Optimum algorithms for two random sampling problems. In Proceedings of the 24th Annual IEEE Symposium on Foundations of Computer Science (Tucson, Az., Nov. 7-9), IEEE, New York, 1983 pp. 65-75.Google Scholar
Index Terms
- Random sampling with a reservoir
Recommendations
Applications of random sampling in computational geometry, II
SCG '88: Proceedings of the fourth annual symposium on Computational geometryRandom sampling is used for several new geometric algorithms. The algorithms are “Las Vegas,” and their expected bounds are with respect to the random behavior of the algorithms. One algorithm reports all the intersecting pairs of a set of line segments ...
Reservoir-sampling algorithms of time complexity O(n(1 + log(N/n)))
One-pass algorithms for sampling n records without replacement from a population of unknown size n are known as reservoir-sampling algorithms. In this article, Vitter's reservoir-sampling algorithm, algorithm Z, is modified to give a more efficient ...
Applications of random sampling in computational geometry, II
We use random sampling for several new geometric algorithms. The algorithms are "Las Vegas," and their expected bounds are with respect to the random behavior of the algorithms. These algorithms follow from new general results giving sharp bounds for ...
Comments