Statistics Magazine: Understanding statistical samples

In short: Statistical samples

A statistical sample is a small group of people or things that is used to represent a larger group. This is often done because it is not possible or practical to measure the entire group.
If the sample is representative of the larger group, then the results of the analysis of the sample can be used to make inferences about the larger group.

Understanding statistical samples

A statistical sample is a limited number of observations selected from a population on a systematic or random basis. After mathematical analysis, these observations yield generalizations about the population.

Basic terminology

A population is the entire set of events or participants in which a researcher is interested, for example all twelve-year-old children in a country. Populations can vary greatly in size. Because it is often not possible to measure the whole population, studies use samples. A sample is a selection of participants or observations from the full population that is actually measured. A random sample is preferred, which means that all members of the population have an equal chance of being selected for the sample. A sample is representative if a given characteristic occurs as frequently in the sample as in the population.

Often, however, the sample is not a perfect representation of the population. Note here the difference between a parameter and a statistic: a measure that refers to the whole population is called a parameter, and a measure that refers to the sample is called a statistic. Statistics are thus estimates of parameters. The difference between a sample statistic and the corresponding population parameter is called sampling error.
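The distinction between a parameter and a statistic can be sketched in a few lines of Python. The population below (heights of 10,000 children) is invented for illustration, as are all names in the code; the point is only that the parameter is computed from everyone and the statistic from the sample.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: heights (cm) of 10,000 twelve-year-olds.
population = [rng.gauss(150, 8) for _ in range(10_000)]

# Parameter: a value computed from the whole population.
parameter = statistics.mean(population)

# Statistic: the same value computed from a random sample of 200 children.
sample = rng.sample(population, 200)
statistic = statistics.mean(sample)

# Sampling error: how far the statistic lands from the parameter.
sampling_error = statistic - parameter
print(f"parameter={parameter:.2f} statistic={statistic:.2f} error={sampling_error:+.2f}")
```

In real research the parameter is usually unknown, which is why the sampling error can only be estimated rather than computed directly as it is here.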

Often, a probability (chance) sample is used. Such a sample can be drawn in several ways.

Types of sampling

Simple random sampling

With simple random sampling, a sample is chosen in such a way that each possible sample has an equal chance of being selected from the population. For example, when a researcher wants to select 100 participants from a population of 5000 and each combination of 100 participants has an equal chance of being selected, the result is a simple random sample. To select such a sample, the researcher should use a sampling frame: a list of the whole population from which the sample will be drawn. Participants are selected randomly from this list. A disadvantage of simple random sampling is that it requires knowing beforehand which and how many participants make up the population, so that the sampling frame can be compiled.

In some situations, forming a sampling frame is impossible. In such situations, systematic sampling is chosen: every nth person is selected to participate in the sample. For example, every 10th person who enters a building is selected to participate.
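Both procedures can be sketched with Python's standard `random` module. The function names and the visitor list are hypothetical, and starting the systematic sample at a random offset is a common convention assumed here rather than something the text prescribes.

```python
import random

def simple_random_sample(frame, n, rng):
    """Draw n units so that every subset of size n is equally likely."""
    return rng.sample(frame, n)

def systematic_sample(stream, k, rng):
    """Select every k-th unit from a stream, starting at a random offset."""
    start = rng.randrange(k)  # assumed convention: random starting point
    return [unit for i, unit in enumerate(stream) if i % k == start]

rng = random.Random(1)

# Sampling frame: a list of IDs for the whole population of 5000.
frame = list(range(1, 5001))
srs = simple_random_sample(frame, 100, rng)

# No frame available: take every 10th visitor entering a building.
visitors = [f"visitor_{i}" for i in range(1, 201)]
every_10th = systematic_sample(visitors, 10, rng)

print(len(srs), len(every_10th))
```

Note that the systematic version never needs the full list in advance; it only needs to see people in the order in which they arrive.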

Stratified random sampling

Stratified random sampling is a variant of simple random sampling. Here, participants are not selected directly from the population; the population is first subdivided into multiple strata. A stratum is a part of the population that shares a certain characteristic. For example, we can subdivide the population into men and women, or into three age categories (20-29, 30-39 and 40-49). Next, participants are chosen randomly from each stratum. With this procedure, researchers can ensure that an equal number of participants is drawn from each stratum. When the strata differ in size, researchers often use a proportional sampling method instead, in which individuals are selected from each stratum proportionally. That means that the percentage of participants from a certain stratum matches the proportion in which this stratum occurs in the population.
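A minimal sketch of proportional stratified sampling, assuming a made-up population of 1000 people spread over the three age categories mentioned above. The function name, the data layout and the stratum sizes are all illustrative assumptions.

```python
import random
from collections import Counter, defaultdict

def proportional_stratified_sample(population, stratum_of, n, rng):
    """Draw about n units, allocating them to strata by population share."""
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        # Share of the sample equals the stratum's share of the population.
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(2)

# Hypothetical population: (age category, person id) pairs.
population = (
    [("20-29", i) for i in range(500)]
    + [("30-39", i) for i in range(300)]
    + [("40-49", i) for i in range(200)]
)

sample = proportional_stratified_sample(population, lambda u: u[0], 100, rng)
counts = Counter(category for category, _ in sample)
print(counts)
```

Because each stratum's allocation is rounded separately, the total can differ slightly from the requested size when the shares do not divide evenly; the numbers chosen here avoid that.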

Cluster sampling

When it is difficult to obtain information beforehand about how many and which participants are present in the population, cluster sampling is frequently used. Here, the researcher does not draw individuals from the population, but clusters of possible participants. These clusters are often naturally occurring groups, such as regions within a country. Cluster sampling is often combined with multistage sampling. With multistage sampling, large clusters are selected first. Next, smaller clusters within these large clusters are selected. This continues until a sample emerges with randomly chosen participants from each selected cluster.
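The idea can be sketched as a two-stage procedure: first draw clusters at random, then draw participants at random within each chosen cluster. The regions, the participant labels and the function name below are hypothetical.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: pick clusters at random. Stage 2: pick members within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        # Take the whole cluster if it is smaller than the requested number.
        sample[name] = rng.sample(members, min(n_per_cluster, len(members)))
    return sample

rng = random.Random(7)

# Hypothetical population: 12 regions with 50 possible participants each.
clusters = {
    f"region_{r}": [f"r{r}_p{i}" for i in range(50)]
    for r in range(12)
}

drawn = two_stage_cluster_sample(clusters, n_clusters=3, n_per_cluster=10, rng=rng)
for region, people in drawn.items():
    print(region, len(people))
```

Only a list of clusters is needed up front; individual participants have to be listed only for the clusters that were actually selected.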

Nonprobability samples

In some situations, it is not useful or not possible to select a chance sample. In those situations, a nonprobability sample is drawn. In that case, the researchers do not know to what extent their sample is representative of the population. Many psychological studies are conducted with samples that are not representative of the population. Nevertheless, these samples are very useful for certain studies. Nonprobability samples are appropriate for studies that aim to test hypotheses rather than to describe the population. Confidence in the validity of the findings increases when different samples (on the same topic) produce similar results. There are three types of nonprobability samples:

Convenience sampling: A convenience sample is a sample in which researchers use participants that are directly available. A main advantage of a convenience sample is that it is much easier to recruit participants than it would be with a representative sample.
Quota sampling: With a quota sample, the researcher determines beforehand what percentages should be met. The sample is drawn based on these percentages. For example, a researcher might say that he wants to select exactly 20 men and 20 women for his study instead of randomly drawing 40 participants from the population without paying attention to gender.
Purposive sampling: With a purposive sample, the researchers have strong ideas about which participants are typical for the population. Based on these ideas, they select which participants may participate in their study. The problem with purposive sampling is that it is highly subjective.
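As a small illustration of quota sampling, the sketch below accepts whoever becomes available until each predetermined quota is full, using the 20 men and 20 women from the example above. The arrival list, the `gender` field and the function name are assumptions made for the example.

```python
def quota_sample(arrivals, quotas):
    """Accept participants in arrival order until every quota is filled."""
    remaining = dict(quotas)
    accepted = []
    for person in arrivals:
        group = person["gender"]
        if remaining.get(group, 0) > 0:
            accepted.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas met, stop recruiting
    return accepted

# Hypothetical stream of available people: one man for every two women.
arrivals = [
    {"id": i, "gender": "man" if i % 3 == 0 else "woman"}
    for i in range(200)
]

participants = quota_sample(arrivals, {"man": 20, "woman": 20})
n_men = sum(p["gender"] == "man" for p in participants)
n_women = sum(p["gender"] == "woman" for p in participants)
print(n_men, n_women)
```

The quotas fix the composition of the sample, but who ends up in each group still depends on who happened to be available, which is why this remains a nonprobability sample.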

Reducing sampling errors (bias)

It is difficult to draw a fully representative sample. There are different ways in which a sample can fail to be representative. These are called sampling errors or bias, and they may result in misleading research outcomes. Sampling error (bias) refers to the deviation of your result from the true parameter. Imagine that you checked the grades of all your fellow students and calculated that the average score was 7.4. Now imagine someone else who had less time than you and took a sample of 100 students out of the total population. Those 100 students, he finds, score 7.6 on average. The true parameter is 7.4, so the sampling error (or bias) is 0.2.

Two types of bias exist: systematic and non-systematic. Non-systematic bias always occurs and is the result of sampling variance. For example, the psychology students of one year are not the same as those of another year, which may result in a different mean for the measured variable. In general, however, the larger the number of participants in your sample, the smaller the influence of non-systematic bias. Non-systematic sampling errors are very difficult (if not impossible) to control; systematic sampling errors (or systematic bias), on the other hand, can be controlled by the researcher.
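The effect of sample size on non-systematic error can be made visible with a small simulation. The grade population below is invented (mean 7.4, as in the example above, with an assumed spread of 1.2), and the sample sizes and number of repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of 20,000 grades centred on 7.4.
population = [rng.gauss(7.4, 1.2) for _ in range(20_000)]

def spread_of_sample_means(n, repeats=500):
    """Draw many samples of size n and measure how much their means vary."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(25)
spread_large = spread_of_sample_means(400)

print(f"n=25:  sample means vary by about {spread_small:.3f}")
print(f"n=400: sample means vary by about {spread_large:.3f}")
```

The sample means scatter around the true parameter in both cases, but far less so for the larger samples. Nothing in this simulation reduces systematic bias, which depends on how the sample is drawn rather than on how large it is.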

Systematic bias can arise by means of the following different causes:

Selection bias: The way in which the participants are selected causes a biased view. For example, psychology students may have a higher IQ than the total population of students. Another example can be found in internet questionnaires: people without internet access are automatically excluded from such a study.
Non-response bias: A biased view arises because the people who are willing to participate in your study differ from the people who do not respond. For example, suppose an IQ test for psychology students is voluntary. People who consider themselves clever may be more tempted to participate in the IQ test than people who consider themselves not so clever. On average, therefore, the measured IQ could be higher than the real IQ level of the population.
Response bias: A biased view arises because the answers that are given are not in accordance with the truth. For example, students do not feel like participating in an IQ test, but the test is mandatory. As a result, these students might fill in some answers at random. In this example, the measured IQ could be lower than the real IQ level of the population.
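The non-response example can be simulated to show how a voluntary test shifts the measured mean. The response model below, in which the chance of participating rises with IQ, is purely an assumption for illustration and not an empirical claim.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical population of IQ scores.
population_iq = [rng.gauss(100, 15) for _ in range(50_000)]

def responds(iq):
    """Assumed response model: participation chance rises from 20% to 80% with IQ."""
    p = min(0.8, max(0.2, 0.5 + (iq - 100) / 100))
    return rng.random() < p

respondents = [iq for iq in population_iq if responds(iq)]

true_mean = statistics.mean(population_iq)
measured_mean = statistics.mean(respondents)
print(f"true mean={true_mean:.1f} measured mean={measured_mean:.1f}")
```

Under this assumption the respondents score higher on average than the population as a whole, and drawing more volunteers would not remove that gap.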

Sample size

A large sample is no guarantee of a representative sample. The way in which the sample is drawn is at least as important as its size. There are, however, guidelines for the minimum sample size. In general, the smaller the population, the larger the proportion that has to be included in the sample. For example, if the population consists of 50 people, you need approximately 49 of them to obtain representative results. A rule of thumb is that for small populations (<500) you select at least 50% for the sample, and for large populations (>5000) 17-27%. Once the population exceeds 250,000, the required sample size hardly increases (between 1,060 and 1,840 observations).

In sum: the smaller the population, the larger the required sample ratio.
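The text does not say where its guideline figures come from. One common textbook approach, Cochran's sample-size formula for a proportion combined with a finite population correction, produces numbers of the same order when a 3% margin of error and 95% to 99% confidence are assumed. Those settings, and the function name, are assumptions made for this sketch.

```python
import math

def required_sample_size(population_size, z=1.96, margin=0.03, p=0.5):
    """Sample size for estimating a proportion, with finite population correction.

    z: normal quantile for the confidence level (1.96 ~ 95%, 2.576 ~ 99%)
    margin: accepted margin of error (assumed 3% here)
    p: expected proportion; 0.5 is the most conservative choice
    """
    n0 = z**2 * p * (1 - p) / margin**2            # size for an infinite population
    n = n0 / (1 + (n0 - 1) / population_size)      # finite population correction
    return math.ceil(n)

for size in (50, 500, 5_000, 250_000):
    n95 = required_sample_size(size, z=1.96)
    n99 = required_sample_size(size, z=2.576)
    print(f"N={size:>7}: n={n95} to {n99} ({n95 / size:.1%} to {n99 / size:.1%})")
```

The required fraction shrinks as the population grows, while the absolute number levels off, which is the pattern summarized above.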

Confidence interval (CI)

As mentioned before, you can never be sure that your results match the true population parameter exactly. To express this uncertainty, you can calculate a confidence interval: a range of values below and above the estimated parameter in which the true parameter is likely to lie. For example, if a 95% confidence interval runs from 30 to 33, you can say with 95% confidence that the true population parameter is somewhere between 30 and 33. The sample size influences the confidence interval: the larger the sample, the narrower the interval. That implies that a larger sample allows a more precise estimate of the parameter.
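A rough sketch of a 95% confidence interval for a mean, using the normal approximation (estimate plus or minus 1.96 standard errors). The population values are invented, and for small samples a t-based interval would normally be preferred; the normal quantile is used here only to keep the example short.

```python
import math
import random
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Approximate confidence interval for the mean (normal approximation)."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error
    return m - z * se, m + z * se

rng = random.Random(5)

# Hypothetical population with a true mean of about 31.5.
population = [rng.gauss(31.5, 6) for _ in range(50_000)]

small = mean_confidence_interval(rng.sample(population, 50))
large = mean_confidence_interval(rng.sample(population, 2000))

print(f"n=50:   {small[0]:.2f} to {small[1]:.2f}")
print(f"n=2000: {large[0]:.2f} to {large[1]:.2f}")
```

The interval based on 2000 observations is much narrower than the one based on 50, because the standard error shrinks with the square root of the sample size.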

Glossary, practice questions and video with statistical samples

Glossary for Statistical Samples

Definitions and explanations of relevant terminology generally associated with statistical samples

What is a population in statistics?

What is a sample in statistics?

What is a random sample?

What is a representative sample?

What is a simple random sample?

What is a cluster sample?

What is a convenience sample?

What is a quota sample?

What is a purposive sample?

What is sampling error?

What is sampling bias?

What is the difference between sampling error and sampling bias?

What is non-systematic bias?

What is systematic sampling error (or systematic bias)?

What is the best sample size for quantitative research?

What is the confidence interval?

Practice Questions for Statistical Samples

Questions

1. What is the difference between a parameter and a statistic?

2. Which three kinds of nonprobability samples exist?

3. In a study about patients in psychiatric institutions in The Netherlands, a sample is drawn as follows: First, one draws at random a number of institutions from the full list of Dutch psychiatric institutions. Then, a number of patients is drawn at random from each of the selected institutions. What kind of sampling procedure is described here?

4. A researcher wants to know to what extent alcohol use is associated with study results. She puts a note on the bulletin board to ask students who drink to participate in the study. 33 students sign up. What kind of sampling procedure is described here?

5. What is meant by selection bias?

Answers

1. What is the difference between a parameter and a statistic?

A parameter refers to a value that describes the population. A statistic refers to a value that describes the sample.

2. Which three kinds of nonprobability samples exist?

Convenience sample
Quota sample
Purposive sample

3. In a study about patients in psychiatric institutions in The Netherlands, a sample is drawn as follows: First, one draws at random a number of institutions from the full list of Dutch psychiatric institutions. Then, a number of patients is drawn at random from each of the selected institutions. What kind of sampling procedure is described here?

A cluster sample.

When it is difficult to obtain information beforehand about how many and what kind of individuals are present in the population, cluster sampling is commonly used. In this case, the researcher does not draw individuals directly from the population, but from clusters of possible participants, such as regions within a country. Often, 'multistage sampling' is used with cluster sampling. With multistage sampling, one selects large clusters first. Then, smaller clusters within the large clusters are selected. This process continues until a sample is drawn, with participants randomly drawn from each cluster.

4. A researcher wants to know to what extent alcohol use is associated with study results. She puts a note on the bulletin board to ask students who drink to participate in the study. 33 students sign up. What kind of sampling procedure is described here?

A nonprobability sample (specifically, a convenience sample).

In some situations it is impossible or impractical to draw a chance sample. In that case, a 'nonprobability' sample is drawn. With nonprobability samples, one does not know how representative the sample is of the population. Many psychological studies, for example, are based on samples that may not be representative of the population.

5. What is meant by selection bias?

Selection bias means that the way in which the participants are selected may lead to a biased picture. Consider online questionnaires, for example: people who do not have access to the internet are automatically excluded from the study.

Video for understanding statistical samples

Types of Sampling Methods (4.1)


Extra clarification with basic concepts of sampling methods

Source and more information: http://wwww.simplelearningpro.com
