Simple random sample

In statistics, a simple random sample (or SRS) is a subset of individuals (a sample) chosen at random from a larger set (a population), with every individual having the same probability of being chosen. In SRS, each subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals.[1] Simple random sampling is an unbiased sampling technique. It is a basic type of sampling and can be a component of other, more complex sampling methods.

Introduction

The principle of simple random sampling is that every set of items of a given size has the same probability of being chosen. For example, suppose N college students want a ticket for a basketball game, but only X < N tickets are available, so they agree on a fair way to decide who gets to go. Everybody is given a number in the range from 0 to N-1, and random numbers are generated, either electronically or from a table of random numbers. Numbers outside the range from 0 to N-1 are ignored, as are any numbers previously selected. The first X numbers identify the lucky ticket winners.
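The ticket procedure can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `draw_ticket_winners` and its parameters are hypothetical, and the "table of random numbers" is imitated by drawing fixed-width decimal numbers, some of which fall outside the valid range and are ignored.

```python
import random


def draw_ticket_winners(n_students, n_tickets, rng=None):
    """Return n_tickets distinct student numbers from 0..n_students-1."""
    if not 0 <= n_tickets <= n_students:
        raise ValueError("need 0 <= n_tickets <= n_students")
    rng = rng or random.Random()
    # Imitate a random-number table: numbers with as many decimal digits
    # as the largest valid student number (an assumption of this sketch).
    digits = len(str(max(n_students - 1, 0)))
    upper = 10 ** digits
    winners = []
    seen = set()
    while len(winners) < n_tickets:
        number = rng.randrange(upper)
        # Ignore numbers outside the range and numbers already selected.
        if number >= n_students or number in seen:
            continue
        seen.add(number)
        winners.append(number)
    return winners


if __name__ == "__main__":
    print(draw_ticket_winners(n_students=30, n_tickets=5))
```

Because rejected numbers are simply discarded, every student number that is still available has the same chance of being the next winner.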

In small populations and often in large ones, such sampling is typically done "without replacement", i.e., one deliberately avoids choosing any member of the population more than once. Although simple random sampling can be conducted with replacement instead, this is less common and would normally be described more fully as simple random sampling with replacement. Sampling done without replacement is no longer independent, but still satisfies exchangeability, hence many results still hold. Further, for a small sample from a large population, sampling without replacement is approximately the same as sampling with replacement, since the probability of choosing the same individual twice is low.
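To see why the two schemes nearly coincide for a small sample from a large population, one can compute the probability that a sample drawn with replacement contains no repeated individual. The sketch below assumes equally likely, independent draws; the function name `prob_no_repeats` is illustrative only.

```python
def prob_no_repeats(population_size, sample_size):
    """Probability that sample_size draws with replacement are all distinct."""
    p = 1.0
    for i in range(sample_size):
        # The (i+1)-th draw must avoid the i individuals already drawn.
        p *= (population_size - i) / population_size
    return p


if __name__ == "__main__":
    # Small sample, large population: repeats are very unlikely.
    print(prob_no_repeats(1_000_000, 10))
    # Sample comparable to the population: repeats are likely.
    print(prob_no_repeats(1000, 100))
```

When this probability is close to 1, a with-replacement sample almost always looks like a without-replacement sample.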

An unbiased random selection of individuals is important so that if many samples were drawn, the average sample would accurately represent the population. However, this does not guarantee that a particular sample is a perfect representation of the population. Simple random sampling merely allows one to draw externally valid conclusions about the entire population based on the sample.

Conceptually, simple random sampling is the simplest of the probability sampling techniques. It requires a complete sampling frame, which may not be available or feasible to construct for large populations. Even if a complete frame is available, more efficient approaches may be possible if other useful information is available about the units in the population.

Its advantages are that it is free of classification error and requires minimal advance knowledge of the population other than the frame. Its simplicity also makes it relatively easy to interpret data collected in this manner. For these reasons, simple random sampling best suits situations where not much information is available about the population and data collection can be efficiently conducted on randomly distributed items, or where the cost of sampling is small enough to make efficiency less important than simplicity. If these conditions do not hold, stratified sampling or cluster sampling may be a better choice.

Relationship between simple random sample and other methods

Equal probability sampling (epsem)

A sampling method for which each individual unit has the same chance of being selected is called equal probability sampling (epsem for short).

A simple random sample is always an epsem sample, but not all epsem samples are SRS. For example, if a teacher has a class arranged in 5 rows of 6 columns and wants to take a random sample of 5 students, she might pick one of the 6 columns at random. This would be an epsem sample, but not all subsets of 5 pupils are equally likely here, as only the subsets arranged as a single column are eligible for selection. There are also ways of constructing multistage samples that are not SRS but whose final sample is epsem.[2] For example, systematic random sampling produces a sample in which each individual unit has the same probability of inclusion, but different sets of units have different probabilities of being selected.
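The classroom example can be checked by enumeration. The sketch below uses hypothetical names (`column_sampling_summary`, seats identified by (row, column) pairs) and confirms that picking one column at random gives every student inclusion probability 1/6, the same as an SRS of 5 from 30, while only 6 of the possible 5-student subsets can ever occur.

```python
from fractions import Fraction
from math import comb


def column_sampling_summary(rows=5, cols=6):
    """Enumerate the 'pick one column at random' design."""
    # Each possible sample is one full column of seats (row, col).
    samples = [frozenset((r, c) for r in range(rows)) for c in range(cols)]
    p_sample = Fraction(1, cols)  # every column is equally likely
    inclusion = {}
    for sample in samples:
        for student in sample:
            inclusion[student] = inclusion.get(student, 0) + p_sample
    # Number of subsets of size `rows` that an SRS could produce.
    srs_subsets = comb(rows * cols, rows)
    return inclusion, len(samples), srs_subsets


if __name__ == "__main__":
    inclusion, possible, srs_subsets = column_sampling_summary()
    print(set(inclusion.values()))  # one common inclusion probability
    print(possible, "possible samples versus", srs_subsets, "under SRS")
```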

Samples that are epsem are self-weighting, meaning that the inverse of the selection probability is the same for every sampled unit.

Distinction between a systematic random sample and a simple random sample

Consider a school with 1000 students, and suppose that a researcher wants to select 100 of them for further study. All their names might be put in a bucket and then 100 names might be pulled out. Not only does each person have an equal chance of being selected, but we can also easily calculate the probability (P) of a given person being chosen, since we know the sample size (n) and the population size (N):

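1. In the case that any given person can only be selected once (i.e., after selection a person is removed from the selection pool):

$$P = 1 - \frac{N-1}{N} \cdot \frac{N-2}{N-1} \cdots \frac{N-n}{N-(n-1)} = 1 - \frac{N-n}{N} = \frac{n}{N} = \frac{100}{1000} = 10\%$$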

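2. In the case that any selected person is returned to the selection pool (i.e., can be picked more than once):

$$P = 1 - \left(1 - \frac{1}{N}\right)^{n} = 1 - \left(\frac{999}{1000}\right)^{100} = 0.0952\ldots \approx 9.5\%$$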

This means that every student in the school has in any case approximately a 1 in 10 chance of being selected using this method. Further, any combination of 100 students has the same probability of selection.
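The two probabilities can be reproduced numerically. The following sketch uses illustrative function names and computes the without-replacement case with exact fractions, so that the telescoping product visibly reduces to n/N.

```python
from fractions import Fraction


def inclusion_prob_without_replacement(N, n):
    """P(a given person is among n draws from N, no replacement)."""
    not_chosen = Fraction(1)
    for i in range(n):
        # Probability of being missed on draw i+1, given missed so far.
        not_chosen *= Fraction(N - 1 - i, N - i)
    return 1 - not_chosen


def inclusion_prob_with_replacement(N, n):
    """P(a given person is picked at least once in n draws with replacement)."""
    return 1 - (1 - 1 / N) ** n


if __name__ == "__main__":
    print(inclusion_prob_without_replacement(1000, 100))  # exact fraction
    print(inclusion_prob_with_replacement(1000, 100))     # floating point
```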

If a systematic pattern is introduced into random sampling, it is referred to as "systematic (random) sampling". An example would be if the students in the school had numbers attached to their names ranging from 0001 to 1000, and we chose a random starting point, e.g. 0533, and then picked every 10th name thereafter to give us our sample of 100 (starting over with 0003 after reaching 0993). In this sense, this technique is similar to cluster sampling, since the choice of the first unit will determine the remainder. This is no longer simple random sampling, because some combinations of 100 students have a larger selection probability than others – for instance, {3, 13, 23, ..., 993} has a 1/10 chance of selection, while {1, 2, 3, ..., 100} cannot be selected under this method.
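The systematic scheme in this example can be sketched as follows. The function name `systematic_sample` is hypothetical, and the sketch assumes the step divides the population size evenly, as it does for 1000 students and a step of 10.

```python
def systematic_sample(population_size, step, start):
    """IDs run from 1 to population_size; take `start`, then every
    `step`-th ID, wrapping around past the end of the list."""
    count = population_size // step
    return sorted(
        (start - 1 + i * step) % population_size + 1 for i in range(count)
    )


if __name__ == "__main__":
    sample = systematic_sample(1000, 10, start=533)
    print(sample[:3], "...", sample[-1])
    # Only 10 distinct samples exist, one per residue of the start modulo 10.
    distinct = {tuple(systematic_sample(1000, 10, s)) for s in range(1, 1001)}
    print(len(distinct))
```

Since each of the 10 possible samples arises from 100 of the 1000 equally likely starting points, each has probability 1/10, and any set of students that is not one of these 10 has probability 0.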

Sampling a dichotomous population

If the members of the population come in two kinds, say "red" and "black", the number of red elements in a sample of a given size will vary by sample and hence is a random variable whose distribution can be studied. That distribution depends on the numbers of red and black elements in the full population. For a simple random sample with replacement, the distribution is a binomial distribution. For a simple random sample without replacement, one obtains a hypergeometric distribution.
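The two distributions can be compared directly. The sketch below uses the standard probability mass functions; the function and parameter names (`red`, `total`, and so on) are chosen here for illustration.

```python
from math import comb


def binomial_pmf(k, n, red, total):
    """P(k red in n draws WITH replacement)."""
    if k < 0 or k > n:
        return 0.0
    p = red / total
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def hypergeometric_pmf(k, n, red, total):
    """P(k red in n draws WITHOUT replacement)."""
    if k < 0 or k > n:
        return 0.0
    # comb returns 0 when more items are requested than are available.
    return comb(red, k) * comb(total - red, n - k) / comb(total, n)


if __name__ == "__main__":
    # 3 red and 7 black elements, sample of size 4.
    for k in range(5):
        print(k, binomial_pmf(k, 4, 3, 10), hypergeometric_pmf(k, 4, 3, 10))
```

With only 3 red elements in the population, 4 red elements can appear in a with-replacement sample but never in a without-replacement sample.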

Algorithms

Several efficient algorithms for simple random sampling have been developed.[3][4] A naive algorithm is the draw-by-draw algorithm, where at each step an item is removed from the set, each remaining item having equal probability, and put in the sample. This continues until the sample has the desired size k. The drawback of this method is that it requires random access to the set.
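A minimal sketch of the draw-by-draw idea follows. It copies the population into a list to get random access, and the swap-with-last removal is an implementation choice of this sketch rather than part of the algorithm's description.

```python
import random


def draw_by_draw(items, k, rng=None):
    """Remove k items one at a time, each uniformly from those remaining."""
    rng = rng or random.Random()
    pool = list(items)  # random access is required
    if not 0 <= k <= len(pool):
        raise ValueError("need 0 <= k <= population size")
    sample = []
    for _ in range(k):
        i = rng.randrange(len(pool))
        # Swap the chosen item to the end so removal takes constant time.
        pool[i], pool[-1] = pool[-1], pool[i]
        sample.append(pool.pop())
    return sample


if __name__ == "__main__":
    print(draw_by_draw(range(10), 4))
```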

The selection-rejection algorithm developed by Fan et al. in 1962[5] requires a single pass over the data; however, it is a sequential algorithm and requires knowledge of the total count of items n, which is not available in streaming scenarios.
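The text does not spell out the selection rule, so the sketch below uses the form in which sequential selection-rejection is commonly presented: each item is accepted with probability (items still needed) / (items still to come). The function name and signature are assumptions made for illustration.

```python
import random


def selection_rejection(stream, n, k, rng=None):
    """One pass over `stream`, which must contain exactly n items."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    rng = rng or random.Random()
    sample = []
    for seen, item in enumerate(stream):
        needed = k - len(sample)
        if needed == 0:
            break  # sample is complete, the rest can be skipped
        remaining = n - seen  # items not yet examined, including this one
        # Accept with probability needed / remaining.
        if rng.random() * remaining < needed:
            sample.append(item)
    return sample


if __name__ == "__main__":
    print(selection_rejection(iter(range(100)), n=100, k=5))
```

When the number of items still needed equals the number remaining, the acceptance probability is 1, so the sample always reaches size k.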

A very simple random sort algorithm was proved correct by Sunter in 1977.[6] The algorithm assigns a random number drawn from the uniform distribution on (0, 1) as a key to each item, then sorts all items by key and selects the k items with the smallest keys.
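The random sort approach takes only a few lines. In this sketch the item's position is stored next to its key so that ties between keys, which are extremely unlikely, never cause the items themselves to be compared; that detail is an implementation choice, not part of the method.

```python
import random


def random_sort_sample(items, k, rng=None):
    """Attach a uniform random key to each item and keep the k smallest."""
    rng = rng or random.Random()
    keyed = [(rng.random(), index, item) for index, item in enumerate(items)]
    keyed.sort()  # orders by key, then by index
    return [item for _, _, item in keyed[:k]]


if __name__ == "__main__":
    print(random_sort_sample(["a", "b", "c", "d", "e"], 2))
```

Sorting costs more than a single pass, but each key is generated independently of the others, which makes the key-assignment step easy to distribute.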

J. Vitter in 1985[7] proposed reservoir sampling algorithms, which are widely used. These algorithms do not require knowledge of the population size n in advance, and they use an amount of space that depends only on the sample size, not on n.
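The simplest reservoir scheme, often called Algorithm R, can be sketched as follows. It is shown only to illustrate the idea; the faster variants in the cited paper skip over items instead of drawing a random number for each one.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform sample of size k from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for seen, item in enumerate(stream):
        if seen < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item number seen+1 replaces a random slot with
            # probability k / (seen + 1).
            j = rng.randrange(seen + 1)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # The generator's length is never asked for.
    print(reservoir_sample((x * x for x in range(1000)), 5))
```

If the stream holds fewer than k items, this sketch returns all of them.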

Random sampling can also be accelerated by sampling from the distribution of gaps between samples[8] and skipping over the gaps.
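The gap idea can be illustrated by drawing, for each selected item, the number of items to skip before it. The sketch below finds each gap by inverse-transform sampling with a simple linear search; it is not the optimized method of the cited paper, and the function name `sample_by_gaps` is hypothetical. It uses one random number per selected item rather than one per population item.

```python
import random


def sample_by_gaps(n, k, rng=None):
    """Return the sorted indices of k items chosen from range(n)."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    rng = rng or random.Random()
    selected = []
    position = 0   # next index not yet considered
    remaining = n  # number of indices not yet considered
    needed = k
    while needed > 0:
        v = rng.random()
        gap = 0
        skippable = remaining - needed  # items that may still be skipped
        tail = skippable / remaining    # P(gap >= 1)
        while tail > v:
            gap += 1
            skippable -= 1
            remaining -= 1
            tail *= skippable / remaining  # now P(gap >= gap + 1)
        position += gap
        selected.append(position)
        position += 1
        remaining -= 1
        needed -= 1
    return selected


if __name__ == "__main__":
    print(sample_by_gaps(1000, 10))
```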

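See also

  •   Multistage sampling
  •   Nonprobability sampling
  •   Opinion poll
  •   Quantitative marketing research
  •   Sampling design
  •   Bernoulli sampling
  •   Poisson sampling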

References

  1. ^ Yates, Daniel S.; David S. Moore; Daren S. Starnes (2008). The Practice of Statistics, 3rd Ed. Freeman. ISBN 978-0-7167-7309-2.
  2. ^ Peters, Tim J., and Jenny I. Eachus. "Achieving equal probability of selection under various random sampling strategies." Paediatric and perinatal epidemiology 9.2 (1995): 219-224.
  3. ^ Tillé, Yves (2006-01-01). Sampling Algorithms. Springer Series in Statistics. Springer. doi:10.1007/0-387-34240-0. ISBN 978-0-387-30814-2.
  4. ^ Meng, Xiangrui (2013). "Scalable Simple Random Sampling and Stratified Sampling" (PDF). Proceedings of the 30th International Conference on Machine Learning (ICML-13): 531–539.
  5. ^ Fan, C. T.; Muller, Mervin E.; Rezucha, Ivan (1962-06-01). "Development of Sampling Plans by Using Sequential (Item by Item) Selection Techniques and Digital Computers". Journal of the American Statistical Association. 57 (298): 387–402. doi:10.1080/01621459.1962.10480667. ISSN 0162-1459.
  6. ^ Sunter, A. B. (1977-01-01). "List Sequential Sampling with Equal or Unequal Probabilities without Replacement". Applied Statistics. 26 (3): 261–268. doi:10.2307/2346966. JSTOR 2346966.
  7. ^ Vitter, Jeffrey S. (1985-03-01). "Random Sampling with a Reservoir". ACM Trans. Math. Softw. 11 (1): 37–57. CiteSeerX 10.1.1.138.784. doi:10.1145/3147.3165. ISSN 0098-3500.
  8. ^ Vitter, Jeffrey S. (1984-07-01). "Faster methods for random sampling". Communications of the ACM. 27 (7): 703–718. CiteSeerX 10.1.1.329.6400. doi:10.1145/358105.893. ISSN 0001-0782.

External links

  •   Media related to Random sampling at Wikimedia Commons
