How do you perform significance tests? – Chapter 6

6.1 What are the five components of a significance test?
6.2 How do you perform a significance test for a mean?
6.3 How do you perform a significance test for a proportion?
6.4 Which errors can be made in significance tests?
6.5 Which limitations do significance tests have?
6.6 How can you calculate the probability of type II error?
6.7 How is the binomial distribution used in significance tests for small samples?

6.1 What are the five components of a significance test?

A hypothesis is a prediction that a population parameter has a certain value or falls within a certain interval. A distinction can be made between two kinds of hypotheses. The null hypothesis (H₀) is the assumption that a parameter takes a particular value. Opposed to it is the alternative hypothesis (H_a), the assumption that the parameter falls in a range of other values. Usually the null hypothesis means no effect. A significance test (also called a hypothesis test, or simply a test) determines whether the data provide enough evidence to support the alternative hypothesis. It does so by comparing point estimates of parameters with the values expected under the null hypothesis.

Significance tests consist of five parts:

Assumptions. Each test makes assumptions about the type of data (quantitative/categorical), the required randomization, the population distribution (for instance the normal distribution) and the sample size.
Hypotheses. Each test has a null hypothesis and an alternative hypothesis.
Test statistic. This indicates how far the estimate lies from the parameter value of H₀. Often, this is expressed as the number of standard errors between the estimate and the value of H₀.
P-value. This gives the weight of evidence against H₀. The smaller the P-value, the stronger the evidence that H₀ is incorrect and that H_a is correct.
Conclusion. This is an interpretation of the P-value and a decision on whether or not H₀ should be rejected.

6.2 How do you perform a significance test for a mean?

Significance tests for quantitative variables usually concern the population mean µ. The five parts of a significance test come into play here.

The assumptions are that the data come from a random sample and that the population distribution is normal.

The hypotheses are usually two-sided, meaning that the alternative hypothesis contains values on both sides of the null hypothesis value. Usually the null hypothesis is H₀: µ = µ₀, in which µ₀ is a particular value for the population mean. This hypothesis says that there is no effect. The alternative hypothesis then contains all other values and looks like this: H_a: µ ≠ µ₀.

The test statistic is the t-score. The formula is as follows:

$t = \frac{\bar{y} - \mu_0}{se}$, in which $se = \frac{s}{\sqrt{n}}$.
The sample mean ȳ estimates the population mean µ. If H₀ is true, then the mean of the sampling distribution of ȳ equals µ₀ (and lies in the middle of that distribution). A value of ȳ far in the tail of the distribution gives strong evidence against H₀. The further ȳ is from µ₀, the larger the absolute value of the t-score and the stronger the evidence against H₀.
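
The t-score formula above can be sketched in a few lines of Python. This is a minimal illustration, not a full test: the data values and the function name `t_statistic` are made up for the example, and the degrees of freedom (n − 1) are returned because software needs them to look up the P-value in the t-distribution.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """Return (t, se, df) for testing H0: mu = mu0."""
    n = len(sample)
    y_bar = statistics.mean(sample)   # sample mean
    s = statistics.stdev(sample)      # sample standard deviation (divides by n - 1)
    se = s / math.sqrt(n)             # standard error of the sample mean
    t = (y_bar - mu0) / se            # number of standard errors between y_bar and mu0
    return t, se, n - 1

# Hypothetical data, chosen only for illustration.
sample = [1, 2, 3, 4, 5]
t, se, df = t_statistic(sample, mu0=2)
print(f"t = {t:.3f}, se = {se:.3f}, df = {df}")
```

The P-value itself is left out here because the Python standard library has no t-distribution; statistical software provides it.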

The P-value is the probability, assuming H₀ is true, of a test statistic at least as extreme as the one observed. For a two-sided test this probability is the area in both tails of the t-distribution. Software can find the P-value.

To draw conclusions, the P-value needs to be interpreted. The smaller the P-value, the stronger the evidence against H₀.

For two-sided significance tests, the confidence interval and the significance test should lead to the same conclusion. When a 95% confidence interval for µ contains µ₀, the P-value is bigger than 0.05. When the interval doesn't contain µ₀, the P-value is smaller than 0.05.

In two-sided tests the rejection region lies in both tails of the sampling distribution. In most cases a two-sided test is performed. However, in some cases the researcher already expects the effect to go in a particular direction, for instance that a particular type of meat will cause people to gain weight. Sometimes it's physically impossible for the effect to go in the opposite direction. In these cases a one-sided test can be used, which is a more direct way to test a specific idea. A one-sided test has its rejection region in only one tail; which one depends on the alternative hypothesis. If the alternative hypothesis says that there will be weight gain after consumption of a certain product, then the rejection region is in the right tail. For two-sided tests the alternative hypothesis is H_a: µ ≠ µ₀ (so the population mean can be anything but a certain value); for one-sided tests it is H_a: µ > µ₀ or H_a: µ < µ₀ (so the population mean needs to be either bigger or smaller than a certain value).

All researchers agree that one-sided and two-sided tests are two different things. Some researchers prefer a two-sided test, because it requires stronger evidence before the null hypothesis is rejected. Other researchers prefer one-sided tests because they test a very specific hypothesis. They say a one-sided test is more sensitive: a small effect in the predicted direction reaches significance more easily in a one-sided test than in a two-sided test. Generally, if the direction of the effect is unknown, a two-sided test is appropriate.

The hypotheses are expressed in parameters for the population (such as µ), never in statistics about the sample (such as ȳ), because retrieving information about the population is the end goal.

Usually H₀ is rejected when P is smaller than or equal to 0.05 or 0.01. This cut-off is called the alpha level or significance level and is denoted α. The smaller the alpha level, the more careful the researcher is being, and the stronger the evidence must be before the null hypothesis is rejected.

Two-sided tests are robust: even when the population distribution isn't normal, confidence intervals and tests based on the t-distribution still work well. However, significance tests don't work well for one-sided tests with a small sample and a very skewed population.

6.3 How do you perform a significance test for a proportion?

Significance tests for proportions work much like significance tests for means. For categorical variables, the sample proportion is used to test a hypothesis about the population proportion.

The assumptions are that the data come from a random sample and that the sample is large enough for the sampling distribution of the sample proportion to be approximately normal. If the value under H₀ is π₀ = 0.50 (this means that the population is divided exactly in half, 50-50%), then the sample size needs to be at least 20.

The null hypothesis says that there is no effect, so H₀: π = π₀. The alternative hypothesis of a two-sided test contains all other values, H_a: π ≠ π₀.

The test statistic for proportions is the z-score. The formula for the z-score used as a test statistic for a significance test of a proportion is:

$z = \frac{\hat{\pi}-\pi_0}{se_0}$, in which $se_0 = \sqrt{\frac{\pi_0(1-\pi_0)}{n}}$

The z-score measures how many standard errors the sample proportion is from the value of the null hypothesis. In other words, the z-score indicates how large the deviation from H₀ is relative to the sampling variability.

The P-value can be found with software, an internet app or a table. It is the probability, if H₀ is true, of a sample proportion at least as far from π₀ as the one observed. For one-sided tests the P-value is the tail probability beyond the observed z in the direction of H_a; for two-sided tests this tail probability needs to be doubled.
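
The z-score and the one-sided and two-sided P-values can be sketched as follows. This is a minimal example under stated assumptions: the counts (60 successes in 100 observations) are hypothetical, and the function name `proportion_z_test` and its `alternative` argument are invented for the illustration. The standard normal distribution comes from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, pi0, alternative="two-sided"):
    """Return (z, P-value) for H0: pi = pi0 using the normal approximation."""
    pi_hat = successes / n
    se0 = math.sqrt(pi0 * (1 - pi0) / n)   # standard error computed under H0
    z = (pi_hat - pi0) / se0
    std_normal = NormalDist()              # mean 0, standard deviation 1
    if alternative == "greater":           # Ha: pi > pi0, right tail
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":            # Ha: pi < pi0, left tail
        p = std_normal.cdf(z)
    else:                                  # Ha: pi != pi0, both tails
        p = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p

# Hypothetical sample: 60 of 100 observations fall in the category of interest.
z, p_two = proportion_z_test(successes=60, n=100, pi0=0.50)
_, p_one = proportion_z_test(successes=60, n=100, pi0=0.50, alternative="greater")
print(f"z = {z:.2f}, two-sided P = {p_two:.4f}, one-sided P = {p_one:.4f}")
```

With these numbers the sample proportion lies two standard errors above π₀, and the two-sided P-value is exactly twice the one-sided one.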

Drawing conclusions works the same way for proportions as for means. The smaller the P-value, the stronger the evidence against H₀. The null hypothesis is rejected when P is smaller than or equal to α, with an alpha level usually around 0.05. Even when the data are consistent with H₀, many researchers will not 'accept' it; to avoid drawing conclusions that are too strong they merely 'do not reject' H₀.

6.4 Which errors can be made in significance tests?

To give people more insight into the findings of a significance test, it's better to report the P-value than to state merely whether H₀ was rejected. This idea comes from Fisher. The collection of values of the test statistic for which the null hypothesis is rejected is called the rejection region.

Testing hypotheses is an inferential process. This means that a limited amount of information is used to draw a general conclusion. It's possible that a researcher concludes that the null hypothesis should be rejected while the treatment doesn't really have an effect. The cause is that samples aren't identical to populations, so an extreme sample may happen to be selected. Rejecting the null hypothesis while it is true is called a type I error. It can have big consequences. However, there is only a small chance that a type I error occurs. The alpha level is the probability of a type I error; it usually doesn't exceed 5% and is sometimes limited to 2.5% or 1%. But smaller alpha levels also mean that more evidence is needed to reject the null hypothesis.

A type II error occurs when a researcher doesn't reject the null hypothesis while it is false; a type I error occurs when the null hypothesis is rejected while it is true. If the probability of a type I error decreases, the probability of a type II error increases.
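
That the alpha level is the probability of a type I error can be checked with a small simulation. The sketch below is an illustration under assumptions not in the text: it repeatedly draws samples from a population in which H₀: π = 0.50 really is true, runs a two-sided z-test for a proportion on each, and counts how often H₀ is rejected. The sample size, number of simulated studies, seed and function name are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist

def type_i_error_rate(n=100, pi0=0.50, alpha=0.05, n_studies=2000, seed=1):
    """Fraction of simulated studies that reject H0 although H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    se0 = math.sqrt(pi0 * (1 - pi0) / n)
    rejections = 0
    for _ in range(n_studies):
        # Draw one sample from a population where pi really equals pi0.
        successes = sum(rng.random() < pi0 for _ in range(n))
        z = (successes / n - pi0) / se0
        if abs(z) >= z_crit:                       # two-sided rejection region
            rejections += 1
    return rejections / n_studies

rate = type_i_error_rate()
print(f"Observed type I error rate: {rate:.3f}")
```

The observed rate should land close to α, though not exactly on it, because the counts are discrete and the simulation is itself a random sample.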

If P is smaller than or equal to 0.05, then H₀ is rejected at α = 0.05. The values of µ₀ for which a two-sided test does not reject H₀ at α = 0.05 are exactly the values inside the 95% confidence interval for µ.

6.5 Which limitations do significance tests have?

It is important to notice that statistical significance and practical significance are not the same. Finding a significant effect doesn't mean that it's an important finding. The size of P simply indicates how much evidence exists against H₀, and not how far the parameter lies from H₀.

It's misleading to only report research that found significant effects. The same research may have been done 20 times, but only once with a significant effect, which may have been found by coincidence.
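
A short calculation shows why one significant result out of 20 is unremarkable. Assuming (this is an assumption for the illustration) that the 20 studies are independent and that H₀ is true in every one of them, each study has probability α = 0.05 of a type I error:

```python
alpha = 0.05     # significance level of each individual test
n_tests = 20     # number of independent studies of the same (non-existent) effect

# Probability that at least one of the tests is significant purely by chance.
p_at_least_one = 1 - (1 - alpha) ** n_tests

# Expected number of chance findings among the tests.
expected_false_positives = alpha * n_tests

print(f"P(at least one significant result) = {p_at_least_one:.3f}")
print(f"Expected number of false positives = {expected_false_positives:.1f}")
```

Under these assumptions the probability of at least one chance finding is about 0.64, and on average one of the 20 studies is significant by coincidence alone.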

A significant effect doesn't say whether a treatment has a big effect. To get a better appreciation of the size of a significant effect, the effect size can be calculated. The difference between the sample mean and the value of the population mean under the null hypothesis (ȳ – µ₀) is divided by the population standard deviation (estimated in practice by the sample standard deviation s). An effect size of 0.2 or less is generally considered too small to be practically significant.
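
The contrast between statistical and practical significance can be made concrete with a small sketch. All numbers below are hypothetical (a sample mean of 102 against µ₀ = 100, a standard deviation of 15 and a sample of 2000), and both function names are invented for the example.

```python
import math

def effect_size(sample_mean, mu0, sd):
    """Distance between the sample mean and mu0, in standard deviations."""
    return (sample_mean - mu0) / sd

def standard_errors_from_h0(sample_mean, mu0, sd, n):
    """Distance between the sample mean and mu0, in standard errors."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))

# Hypothetical study with a very large sample.
d = effect_size(sample_mean=102.0, mu0=100.0, sd=15.0)
stat = standard_errors_from_h0(sample_mean=102.0, mu0=100.0, sd=15.0, n=2000)
print(f"effect size = {d:.2f}, test statistic = {stat:.2f}")
```

Here the test statistic is almost six standard errors from H₀, which is highly significant, while the effect size stays well below 0.2. A large sample makes even a small difference statistically significant.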

For interpreting the practical consequences of a study, the confidence interval is more important than a significance test. Often H₀ contains only one value while other values might be plausible too. That's why a confidence interval with a range of values gives more information.

Other ways that significance tests can mislead:

Sometimes results are only reported when they are regarded as statistically significant.
Statistical significance can be coincidence.
The P-value is not the probability that H₀ is true: H₀ is either true or false, not something in between.
Real effects usually are smaller than the effects in research that gets a lot of attention.

Publication bias is when research with small effects isn't even published.

6.6 How can you calculate the probability of type II error?

A type II error can only occur when H₀ is false, that is, for parameter values in the range of H_a. Every value within H_a has its own P(type II error), the probability that a type II error occurs if that value is the true one. This probability is calculated using software. The software compares the sampling distribution under the null hypothesis with the sampling distribution under the alternative value and finds the part of the latter that falls outside the rejection region. The probability of a type II error decreases when the parameter value is further away from the null hypothesis, when the sample gets bigger and when the probability of a type I error increases.

The power of a test is the probability that the test will reject the null hypothesis when it is false. So the power is about finding an effect that is real. The formula for the power at a certain parameter value is: power = 1 – P(type II error). If the probability of a type II error decreases, the power increases.
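
What such software does can be sketched for a one-sided test of a proportion, using the normal approximation. This is a minimal illustration under assumptions, not the procedure of any particular package: the function name `type_ii_error` and the values π₀ = 1/3, n = 116 and the alternative values 0.40 and 0.50 are chosen only for the example.

```python
import math
from statistics import NormalDist

def type_ii_error(pi0, pi1, n, alpha=0.05):
    """P(type II error) for H0: pi = pi0 vs Ha: pi > pi0 when the true value is pi1."""
    std_normal = NormalDist()
    # Step 1: smallest sample proportion that leads to rejecting H0.
    se0 = math.sqrt(pi0 * (1 - pi0) / n)
    cutoff = pi0 + std_normal.inv_cdf(1 - alpha) * se0
    # Step 2: probability of staying below that cutoff when pi1 is the true value.
    se1 = math.sqrt(pi1 * (1 - pi1) / n)
    return std_normal.cdf((cutoff - pi1) / se1)

beta_near = type_ii_error(pi0=1/3, pi1=0.40, n=116)   # true value close to H0
beta_far = type_ii_error(pi0=1/3, pi1=0.50, n=116)    # true value far from H0
power_far = 1 - beta_far

print(f"P(type II error) at pi = 0.40: {beta_near:.3f}")
print(f"P(type II error) at pi = 0.50: {beta_far:.3f}, power: {power_far:.3f}")
```

Calling the function with a larger `n` or a larger `alpha` gives a smaller P(type II error), in line with the three factors listed above.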

6.7 How is the binomial distribution used in significance tests for small samples?

Estimating proportions with small samples is difficult. For the outcome of a small sample with categorical discrete variables, like tossing a coin, a sampling distribution can be made. This is called the binomial distribution. A binomial distribution is only applicable when:

Every observation falls within one of two categories.
The probabilities of the two categories are the same for every observation.
The observations are independent.

The symbol π is the probability of category 1, the symbol x in this case is the binomial variable. The probability of x observations in category 1 is:

$P(x)=\frac{n!}{x!(n-x)!}\pi^x (1-\pi)^{n-x}$

The symbol n! is called n factorial; it is the product 1 × 2 × 3 × ... × n. The binomial distribution is only symmetrical for π = 0.50. The mean is µ = nπ and the standard deviation is $\sigma=\sqrt{n\pi(1-\pi)}$.
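
The binomial formula, its mean and its standard deviation can be computed directly. The sketch below uses `math.comb` for n! / (x!(n − x)!); the values n = 10 and π = 0.50 and the function name `binomial_pmf` are assumptions made for the illustration.

```python
import math

def binomial_pmf(x, n, pi):
    """Probability of exactly x observations in category 1 out of n."""
    return math.comb(n, x) * pi**x * (1 - pi)**(n - x)

n, pi = 10, 0.50
probs = [binomial_pmf(x, n, pi) for x in range(n + 1)]

mean = n * pi                        # mu = n * pi
sd = math.sqrt(n * pi * (1 - pi))    # sigma = sqrt(n * pi * (1 - pi))

for x, p in enumerate(probs):
    print(f"P({x}) = {p:.4f}")
print(f"mean = {mean}, standard deviation = {sd:.3f}")
```

For π = 0.50 the printed probabilities mirror each other around x = 5, which is the symmetry mentioned above; choosing another value of π makes the list skewed.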

So even for tiny samples, with fewer than 10 observations in each category, a significance test can be done by using the binomial distribution instead of the normal approximation. An example is a test of H₀: π = 0.50 against H_a: π < 0.50.
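
A one-sided binomial test of this kind can be sketched by adding up binomial probabilities. The example below assumes a hypothetical sample of n = 10 with x = 2 observations in category 1; the function name `binomial_p_value_less` is invented for the illustration. For H_a: π < π₀, the P-value is the probability of x or fewer observations in category 1 when H₀ is true.

```python
import math

def binomial_p_value_less(x, n, pi0=0.50):
    """Exact P-value for H0: pi = pi0 against Ha: pi < pi0."""
    return sum(
        math.comb(n, k) * pi0**k * (1 - pi0)**(n - k)
        for k in range(x + 1)          # outcomes 0, 1, ..., x
    )

# Hypothetical small sample: 2 of 10 observations in category 1.
p_value = binomial_p_value_less(x=2, n=10)
print(f"P-value = {p_value:.4f}")
```

Here the P-value is 56/1024, just above 0.05, so H₀ would not be rejected at α = 0.05.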

This content is related to:

Statistical methods for the social sciences - Agresti - 5th edition, 2018 - Summary (EN)

Summary of Statistical methods for the social sciences by Agresti, 5th edition, 2018. Summary in English.

Follow the author: Annemarie JoHo
