# Glossary for Statistical Samples

Definitions and explanations of relevant terminology generally associated with statistical samples

A statistical sample is a limited number of observations selected from a population on a systematic or random basis. After mathematical analysis, these observations yield generalizations about the population.

A *population* is the entire set of events or participants in which a researcher is interested, for example all twelve-year-old children in a country. Populations vary greatly in size. Because it is often not possible to measure the whole population, studies use samples. A *sample* is a selection of participants or observations from the full population that is actually measured. A *random sample* is preferred: every member of the population has an equal chance of being selected for the sample. A sample is *representative* if a given characteristic occurs as frequently in the sample as in the population. Often, however, the sample is not a perfect representation of the population. Note also the difference between a parameter and a statistic: a measure that refers to the whole population is called a *parameter*, whereas a measure that refers to the sample is called a *statistic*. Statistics are thus estimates of parameters. The difference between a sample statistic and the corresponding population parameter is called *sampling error*.

Often, a probability sample (also called a chance sample) is used. Such a sample can be obtained in several ways.

With *simple random sampling*, a sample is chosen in such a way that each possible sample has an equal chance of being selected from the population. For example, when a researcher wants to select a sample of 100 participants from a population of 5000 participants, and each combination of 100 participants has an equal chance of being selected, the result is a *simple random sample*. To select such a sample, the researcher should use a *sampling frame*: a list of the whole population from which the sample will be drawn. Participants are selected randomly from this list. A disadvantage of simple random sampling is that it requires knowing beforehand how many participants the population contains and listing all of them in the sampling frame. In some situations, forming a sampling frame is impossible. In such situations, *systematic sampling* is chosen: every *n*th person is selected for the sample. For example, every 10th person who enters a building is selected to participate.
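The two procedures above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names, the numbered sampling frame of 5000 participant IDs, and the fixed seeds are all assumptions made for the example. For systematic sampling, a list stands in for the order in which people arrive.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def systematic_sample(units, k, seed=None):
    """Take every k-th unit, starting at a random offset in [0, k)."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return units[start::k]


# Hypothetical sampling frame: participant IDs 1..5000.
frame = list(range(1, 5001))

srs = simple_random_sample(frame, 100, seed=42)
every_tenth = systematic_sample(frame, 10, seed=42)

print(len(srs), len(every_tenth))
```

Note that the simple random sample needs the complete list up front, while the systematic sample only needs the units in some order, which is why it remains usable when no sampling frame exists.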

*Stratified random sampling* is a variant of simple random sampling. Here, participants are not selected directly from the population; the population is first subdivided into multiple strata. A *stratum* is a part of the population that shares a certain characteristic. For example, we can subdivide the population into men and women, or into three age categories (20-29, 30-39 and 40-49). Next, participants are chosen randomly from each stratum. With this procedure, researchers can ensure that an equal number of participants is drawn from each stratum. Because equal numbers do not necessarily reflect the composition of the population, researchers often use a *proportional sampling method*, in which individuals are selected from each stratum proportionally. That means that the percentage of participants drawn from a certain stratum matches the proportion in which this stratum occurs in the population.
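A minimal sketch of proportional stratified sampling follows. The population of 1000 people, the sizes of the three age strata, and the helper name `proportional_stratified_sample` are invented for illustration. Because each stratum's share is rounded, the total sample size can differ slightly from the requested size when the proportions do not divide evenly.

```python
import random
from collections import defaultdict


def proportional_stratified_sample(population, stratum_of, n, seed=None):
    """Sample from each stratum in proportion to its share of the population."""
    rng = random.Random(seed)

    # Step 1: subdivide the population into strata.
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)

    # Step 2: draw randomly within each stratum, sized by its proportion.
    total = len(population)
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / total)
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical population: (age group, person id) pairs.
population = (
    [("20-29", i) for i in range(500)]
    + [("30-39", i) for i in range(300)]
    + [("40-49", i) for i in range(200)]
)

sample = proportional_stratified_sample(population, lambda u: u[0], n=100, seed=1)
sizes = {name: len(members) for name, members in sample.items()}
print(sizes)
```

Here the strata make up 50%, 30% and 20% of the population, so a sample of 100 contains 50, 30 and 20 participants from the respective age groups.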

When it is difficult to obtain information beforehand about how many and which participants make up the population, *cluster sampling* is frequently used. Here, the researcher does not draw individuals from the population, but clusters of possible participants. These clusters are often based on naturally occurring groups, such as regions within a country. Cluster sampling is often combined with *multistage sampling*. With multistage sampling, large clusters are selected first. Next, smaller clusters within these large clusters are selected. This continues until a sample emerges with randomly chosen participants from each selected cluster.
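As a rough sketch, a two-stage version of this procedure first draws clusters at random and then draws participants at random within each selected cluster. The ten regions with 50 people each, the identifiers, and the function name `two_stage_sample` are assumptions made for the example.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: draw clusters at random. Stage 2: draw units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }


# Hypothetical clusters: 10 regions with 50 inhabitants each.
clusters = {
    f"region_{r}": [f"r{r}_p{i}" for i in range(50)]
    for r in range(10)
}

sample = two_stage_sample(clusters, n_clusters=3, n_per_cluster=5, seed=1)
for region, people in sample.items():
    print(region, people)
```

Only the list of clusters and the members of the selected clusters are needed, so no complete list of individuals in the population has to exist.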

In some situations, it is not useful or not possible to select a probability sample. In those situations, a *nonprobability sample* is drawn. In that case, the researchers do not know to what extent their sample is representative of the population. Many psychological studies are conducted with samples that are not representative of the population. Nevertheless, these samples are very useful for certain studies. Nonprobability samples are appropriate for studies in which the aim is to test hypotheses rather than to describe the population. Confidence in the validity of the findings increases when different samples (on the same topic) yield similar results. There are three types of nonprobability samples:

- *Convenience sampling*: A convenience sample is a sample in which researchers use participants that are directly available. The main advantage of a convenience sample is that recruiting participants is much easier than it would be for a representative sample.
- *Quota sampling*: With a quota sample, the researcher determines beforehand which percentages should be met. The sample is drawn based on these percentages. For example, a researcher might decide to select exactly 20 men and 20 women for a study, instead of randomly drawing 40 participants from the population without paying attention to gender.
- *Purposive sampling*: With a purposive sample, the researchers have strong ideas about which participants are typical of the population. Based on these ideas, they select which participants may take part in their study. The problem with purposive sampling is that it is highly subjective.

It is difficult to obtain a fully representative sample. There are different ways in which a sample can fail to be representative. These are called sampling errors or *bias*, and they may result in misleading research outcomes. Sampling error (bias) refers to the deviation of your result from the true parameter. Imagine that you checked the grades of all your fellow students and calculated an average score of 7.4. Someone else, who had less time than you, took a sample of 100 students from the total population and found an average score of 7.6. The true parameter is then 7.4 and the sampling error (or bias) is 0.2.

Two types of bias exist: *systematic* and *non-systematic*. Non-systematic bias always occurs. It is the result of sampling variance. For example, psychology students from one year are not the same as psychology students from another year, which may result in a different mean of the measured variable. However, the larger the number of participants in your sample, the smaller the influence of non-systematic bias is expected to be. It is very difficult (if not impossible) to control non-systematic sampling errors; systematic sampling errors (or systematic bias), on the other hand, can be controlled by the researcher.
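The claim that non-systematic error shrinks as the sample grows can be checked with a small simulation. The sketch below rests on invented assumptions: a population of 10,000 grades drawn from a normal distribution around 7.4, sample sizes of 10 and 1000, and 200 repeated draws per sample size.

```python
import random
import statistics


def mean_abs_sampling_error(population, n, repeats, rng):
    """Average absolute gap between the sample mean and the population mean."""
    true_mean = statistics.fmean(population)
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, n)
        errors.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(errors)


rng = random.Random(0)

# Hypothetical population of grades centred on 7.4.
population = [rng.gauss(7.4, 1.0) for _ in range(10_000)]

small = mean_abs_sampling_error(population, n=10, repeats=200, rng=rng)
large = mean_abs_sampling_error(population, n=1000, repeats=200, rng=rng)

print(f"n=10:   {small:.3f}")
print(f"n=1000: {large:.3f}")
```

Every individual sample still deviates from the true mean, but the typical size of that deviation is clearly smaller for the larger samples.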

Systematic bias can arise from the following causes:

- *Selection bias*: The way in which the participants are selected causes a biased view. For example, psychology students may have a higher IQ than the total population of students. Another example can be found in internet questionnaires: people without internet access are automatically excluded from such a study.
- *Non-response bias*: A biased view arises because the people who are willing to participate in your study differ from the people who do not respond. For example, suppose an IQ test for psychology students is voluntary. People who consider themselves clever may be more tempted to participate in the IQ test than people who consider themselves not so clever. On average, therefore, measured IQ could be higher than the real IQ level of the population.
- *Response bias*: A biased view arises because the answers that are given are not in accordance with the truth. For example, students do not feel like participating in an IQ test, but the test is mandatory. As a result, these students might fill in some answers at random. In this example, measured IQ could be lower than the real IQ level of the population.
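The non-response example can be illustrated with a short simulation. Everything in it is an assumption made for the sake of the sketch: IQ scores drawn from a normal distribution with mean 100 and standard deviation 15, and a made-up response model in which the chance of volunteering rises linearly with IQ.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical population of 10,000 students with IQ ~ N(100, 15).
population = [rng.gauss(100, 15) for _ in range(10_000)]


def response_probability(iq):
    """Assumed model: higher IQ means a higher chance of volunteering."""
    return min(1.0, max(0.0, 0.5 + (iq - 100) / 100))


# Each student decides independently whether to take the voluntary test.
respondents = [iq for iq in population if rng.random() < response_probability(iq)]

population_mean = statistics.fmean(population)
respondent_mean = statistics.fmean(respondents)

print(f"population mean: {population_mean:.1f}")
print(f"respondent mean: {respondent_mean:.1f}")
```

Because the respondents are not a random subset of the population, their mean overestimates the population mean, and drawing more volunteers would not remove this error.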

A large sample does not guarantee a representative sample. The way in which the sample is drawn is at least as important as the sample size. However, there are guidelines that tell you how large your sample should be at a minimum. In general, the smaller the population, the larger the proportion of it that has to be included in your sample. For example, if the population consists of 50 people, you need approximately 49 to obtain representative results. A rule of thumb is that, for small populations (<500), you select at least 50% for the sample. For large populations (>5000), you select 17-27%. If the population exceeds 250,000, the required sample size hardly increases (between 1060 and 1840 observations).

In sum: the smaller the population, the larger the required sample ratio.

As mentioned before, you can never be sure that your results are exactly in accordance with the true population parameter. To indicate this uncertainty, you can calculate a *confidence interval*: a range of values below and above the estimated parameter in which the true parameter is likely to lie. For example, if a 95% confidence interval runs from 30 to 33, you can say with 95% confidence that the true population parameter is somewhere between 30 and 33. The sample size influences the confidence interval: the larger the sample size, the narrower the confidence interval. That implies that a larger sample allows a more precise estimate of the parameter.
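The relation between sample size and interval width can be sketched as follows. The example uses a normal approximation with z = 1.96 for a 95% interval around a sample mean. That choice, the simulated population centred on 31.5, and the sample sizes of 50 and 5000 are all assumptions made for illustration; for small samples a t-based interval would usually be more appropriate.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% interval for the mean: mean +/- z * standard error."""
    n = len(sample)
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return mean - z * standard_error, mean + z * standard_error


rng = random.Random(7)

# Hypothetical population with a true mean near 31.5.
population = [rng.gauss(31.5, 5.0) for _ in range(100_000)]

low_small, high_small = mean_confidence_interval(rng.sample(population, 50))
low_large, high_large = mean_confidence_interval(rng.sample(population, 5000))

print(f"n=50:   width {high_small - low_small:.2f}")
print(f"n=5000: width {high_large - low_large:.2f}")
```

The standard error is divided by the square root of the sample size, so a sample 100 times larger gives an interval that is roughly 10 times narrower.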

1. What is the difference between a parameter and a statistic?

2. Which three kinds of non-probability sample exist?

3. In a study about patients in psychiatric institutions in The Netherlands, a sample is drawn as follows: First, one draws at random a number of institutions from the full list of Dutch psychiatric institutions. Then, a number of patients is drawn at random from each of the selected institutions. What kind of sampling procedure is described here?
