The theory of inference from simple random samples (SRSs) is fundamental in statistics; many statistical techniques and formulae assume that the data are an SRS. True random samples are rare. In practice, samples are usually drawn using pseudo-random number generators (PRNGs) and algorithms that map a set of pseudo-random numbers into a subset of the population. Most statisticians take for granted that the software they use "does the right thing," producing samples that can be treated as if they were SRSs. In fact, the PRNG and the algorithm for drawing samples matter enormously. We show, using basic counting principles, that some widely used methods cannot generate all SRSs of a given size, and that those that can do not always generate them with equal frequency in simulations. We compare the "randomness" and computational efficiency of commonly used PRNGs with PRNGs based on cryptographic hash functions, which avoid these pitfalls. We judge these PRNGs by their ability to generate SRSs and find in simulations that their relative merits vary by seed, population and sample size, and sampling algorithm. These results are not limited to SRSs; they have implications for resampling and simulation methods more broadly, including the bootstrap, MCMC, and Monte Carlo integration.
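The counting argument can be made concrete with a short sketch. It assumes the sampling algorithm is a deterministic function of the PRNG's starting state. Under that assumption, a generator with b bits of state can produce at most 2^b distinct samples, while there are C(N, n) possible SRSs of size n from a population of N. Once C(N, n) exceeds 2^b, some SRSs can never be drawn. The function name `smallest_unreachable_population` is hypothetical. The state sizes used are illustrative: 32 bits, and 19937 bits for the Mersenne Twister MT19937.

```python
from math import comb


def smallest_unreachable_population(n: int, state_bits: int) -> int:
    """Hypothetical helper: smallest population size N for which the number of
    possible samples of size n, C(N, n), exceeds 2**state_bits. From that N on,
    a PRNG with state_bits bits of state cannot reach every sample."""
    if n < 1:
        raise ValueError("sample size n must be at least 1")
    limit = 2 ** state_bits  # upper bound on the number of distinct PRNG states
    # For n >= 1, C(N, n) is strictly increasing in N once N >= n,
    # so we can find the crossover by doubling, then bisection.
    lo, hi = n, n  # C(n, n) == 1 <= limit
    while comb(hi, n) <= limit:
        hi *= 2
    # Invariant: C(lo, n) <= limit < C(hi, n)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if comb(mid, n) <= limit:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Illustrative state sizes: a 32-bit generator and MT19937's 19937 bits.
    for label, bits in [("32-bit state", 32), ("MT19937 state", 19937)]:
        for n in (10, 100, 1000):
            N = smallest_unreachable_population(n, bits)
            print(f"{label}: sample size {n}, some SRSs unreachable once N >= {N}")
```

Exceeding the bound proves that some samples are unreachable. Staying under it does not prove that the reachable samples occur with equal frequency. The bound also depends on how the generator is seeded. If it is seeded from a 32-bit integer, only 2^32 starting states are reachable, whatever the size of its internal state.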
- Slides
- Start date: 2017-02-07 11:00:00
- End date: 2017-02-07 12:30:00
- Venue: 639 Evans Hall at UC Berkeley
- Address: 639 Evans Hall, Berkeley, CA, 94720