Much of the data needed for information purposes are collected through sampling. A sample is a set of values taken from a population to represent that population without the need to measure every member. Sampling saves time, and in many cases it is impossible to identify all the individuals in a population anyway. The results of a sample (its statistics) may not match the parameters of the population exactly, but given an adequate sample size and an unbiased sample they can be very close to the population values.
The first action is to define the sampling frame, i.e. to identify the population from which the sample will be taken. The population may be, for example, people, manufactured items, or a working period (as in activity sampling).
Random sampling is akin to a lottery and is the best method in statistical terms because it is the most likely to reduce the effects of bias. A random sample can be taken in several ways, such as drawing well-mixed numbers from a container without deliberate selection, or using random numbers taken from tables published in textbooks or generated by a computer spreadsheet.
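As a minimal sketch of the lottery idea, the following Python draws ten items at random, without replacement, from a numbered sampling frame. The frame of 100 items and the seed value are assumptions made purely for illustration.

```python
import random

# Hypothetical sampling frame: 100 numbered items (an assumption for illustration).
sampling_frame = list(range(1, 101))

# A fixed seed makes the draw repeatable; omit it for a fresh draw on each run.
rng = random.Random(42)

# Draw ten items without replacement; every item is equally likely to be chosen.
random_sample = rng.sample(sampling_frame, k=10)
print(sorted(random_sample))
```

Because the selection is left entirely to the random number generator, no deliberate choice by the person sampling can influence which items are taken.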
Systematic sampling (the constant skip method) is a non-random method in which every nth value is taken. To reduce the chance of bias, something must be random within the sampling frame, so if the sampling method is not random then the data themselves must be random.
For example, in activity sampling (see Related Topics), a snap observation of a task being performed may be made every twenty minutes, provided that this interval does not coincide with cycles of the work (e.g. the work cycle does not divide evenly into twenty minutes, as a five-minute cycle would).
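The risk of an observation interval coinciding with a work cycle can be sketched in Python. The function names, the start time of zero and the cycle lengths of five and seven minutes are all assumptions chosen for illustration; the code simply lists which points in the work cycle a twenty-minute systematic schedule would ever observe.

```python
def systematic_times(start, interval, count):
    """Return observation times (in minutes) taken at a constant interval."""
    return [start + i * interval for i in range(count)]

def phases_observed(times, cycle_length):
    """Return the distinct points within the work cycle that the times fall on."""
    return sorted({t % cycle_length for t in times})

# Twelve snap observations, one every twenty minutes (hypothetical schedule).
times = systematic_times(start=0, interval=20, count=12)

# A five-minute cycle divides evenly into twenty minutes, so every
# observation lands on the same point in the cycle: a biased picture.
print(phases_observed(times, cycle_length=5))

# A seven-minute cycle does not, so the observations spread over the cycle.
print(phases_observed(times, cycle_length=7))
```

With the five-minute cycle only one point of the cycle is ever seen, whereas with the seven-minute cycle all seven minute positions are covered over the twelve observations.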
Stratified sampling is a method of using natural divisions (strata) of a sampling frame such as age, social class, type of machine or equipment, or time periods (such as days). A sample is taken from each division, which ensures that all sections of the population are represented. Taking the activity sampling example again, if random times are used the situation could arise where certain hours of the day had, by chance, significantly more observations than others. If this were a problem then the strata could be "hours in the day", with each hour divided into a fixed number of random or systematic times.
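The "hours in the day" example might be sketched as follows. The working day of 09:00 to 17:00, the three observations per hour and the function name are assumptions for illustration only; the point is that each hour (stratum) receives the same fixed number of random times.

```python
import random

def stratified_times(hours, per_hour, rng):
    """Pick a fixed number of distinct random minutes within each hour (stratum)."""
    schedule = []
    for hour in hours:
        # Random selection inside the stratum, without repeating a minute.
        minutes = sorted(rng.sample(range(60), k=per_hour))
        schedule.extend((hour, minute) for minute in minutes)
    return schedule

# Hypothetical working day: eight one-hour strata, three observations in each.
schedule = stratified_times(hours=range(9, 17), per_hour=3, rng=random.Random(7))
for hour, minute in schedule:
    print(f"{hour:02d}:{minute:02d}")
```

No hour can be over- or under-represented by chance, because the number of observations per hour is fixed in advance and only their timing within the hour is random.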
The results of sampling can produce a picture of the whole population, say in the form of a frequency distribution or, more specifically, its parameters. These parameters include averages (e.g. mean, median, mode and others) and measures of how the distribution is dispersed (e.g. range, mean deviation, standard deviation; see Related Topics).
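As an illustration, the averages and measures of dispersion named above can be calculated for a small sample with Python's standard library. The ten data values are invented for the example, and the standard deviation shown is the sample form (dividing by n - 1), which is an assumption about which form is wanted.

```python
import statistics

# Hypothetical sample of ten measurements (invented for illustration).
sample = [12, 15, 11, 15, 14, 13, 15, 12, 16, 14]

# Averages
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)

# Measures of dispersion
value_range = max(sample) - min(sample)
mean_deviation = sum(abs(x - mean) for x in sample) / len(sample)
std_dev = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)

print(mean, median, mode, value_range, mean_deviation, std_dev)
```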
Because a sample is usually a relatively small fraction of the population, it may not describe the population and its parameters very accurately, so estimates of those parameters will contain statistical errors. For example, the mean of the sample may not mirror the mean of the population exactly because not every value in the population was measured. In this case there will be an error associated with the sample mean, and its likely size is expressed by the standard error of the mean.
The form of the standard error depends on the type of data (see Related Topics).
An example statement describing the estimated population mean is:
"the estimated population mean = the sample mean ±X standard errors"
The constant X is the number of standard errors needed to give the level of confidence required in the result. It is obtained from statistical tables published in textbooks on statistical method.
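The statement above can be sketched in Python under some stated assumptions: the data are measurements, so that the standard error of the mean is taken as the sample standard deviation divided by the square root of the sample size; and X is read from the normal distribution (about 1.96 for 95% confidence) rather than from printed tables. For small samples, tables of the t distribution would give a somewhat larger X. The function name and data values are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) limits: sample mean +/- X standard errors."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    # Standard error of the mean for measured data (assumed form): s / sqrt(n).
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # X from the normal distribution, replacing a lookup in printed tables.
    x = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = x * standard_error
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample of ten measurements (invented for illustration).
sample = [12, 15, 11, 15, 14, 13, 15, 12, 16, 14]
lower, upper = mean_confidence_interval(sample)
print(f"estimated population mean lies between {lower:.2f} and {upper:.2f}")
```

Raising the confidence level increases X and so widens the interval, while a larger sample reduces the standard error and narrows it.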
Figure: A schematic diagram to illustrate and compare two methods of sampling. Task: to take a sample of ten items from a random population by (a) random and (b) systematic sampling. In each case ten items are sampled.