Sometimes you can’t correct problems in your data.
This is especially irksome if you have a small sample and can’t rely on the central limit theorem to get you out of trouble.
- The historical solution is a small family of models called non-parametric tests or assumption-free tests that make fewer assumptions than the linear model.
The four most common non-parametric procedures:
- the Mann-Whitney test
- the Wilcoxon signed-rank test
- Friedman’s test
- the Kruskal-Wallis test
All four tests overcome distributional problems by ranking the data.
Ranking the data: finding the lowest score and giving it a rank of 1, then finding the next highest score and giving it a rank of 2, and so on.
This process results in high scores being represented by large ranks, and low scores being represented by small ranks.
The model is then fitted to the ranks and not to the raw scores.
- By using ranks we eliminate the effect of outliers.
There are two choices to compare the distributions in two conditions containing scores from different entities:
- the Mann-Whitney test
- the Wilcoxon rank-sum test
Both tests are equivalent.
There is also a second Wilcoxon test that does something different.
If there’s no difference between the groups and you rank the data from lowest to highest, ignoring the group to which a person belonged, then you should find a similar number of high and low ranks in each group.
- if you added up the ranks, then you’d expect the summed total of ranks in each group to be about the same.
If there is a difference between the groups, then you should not find a similar number of high and low ranks in each group.
- if you added up the ranks, then you’d expect the summed total of ranks in each group to be different.
The Mann-Whitney and Wilcoxon rank-sum tests use the principles above.
- when the groups have unequal numbers of participants in them, the test statistic (Ws) for the Wilcoxon rank-sum test is simply the sum of ranks in the group that contains fewer people.
- when the group sizes are equal, its value is the smaller of the two summed ranks.
Starting at the lowest score, we assign potential ranks starting with 1 and going up the number of scores we have.
- these are potential ranks because sometimes the same score occurs more than once in a data set.
- tied ranks: if a score occurs more than once in a data set, give each occurrence the average of the potential ranks they would otherwise have received. (For example, if the score 6 occurs twice and would have received the ranks 4 and 5, both get the rank 4.5, the average of these values.)
Once you’ve ranked the data, add up the ranks for each of the two groups.
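The ranking steps above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure any particular package uses; the function name `rank_with_ties` and the scores are made up for the example.

```python
def rank_with_ties(scores):
    """Return ranks (1 = lowest score); tied scores share the average of their potential ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted score equals the current one
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Potential ranks are pos+1 .. end+1; tied scores get their mean
        mean_rank = (pos + 1 + end + 1) / 2
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


# Hypothetical scores from two groups (the score 6 occurs twice)
group_a = [3, 6, 9, 12]
group_b = [5, 6, 7, 2]

# Rank all scores together, ignoring group membership
ranks = rank_with_ties(group_a + group_b)

# Then add up the ranks within each group
sum_a = sum(ranks[:len(group_a)])
sum_b = sum(ranks[len(group_a):])
print(ranks, sum_a, sum_b)
```

The two 6s would have received the potential ranks 4 and 5, so both end up with 4.5.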
Wilcoxon rank-sum test
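The mean ($\bar{W}_s$) and the standard error ($SE_{\bar{W}_s}$) of this test statistic can be calculated from the sample sizes of each group.
$n_1$ is the sample size of group 1 (the group whose ranks are summed to give $W_s$)
$n_2$ is the sample size of group 2
$\bar{W}_s = \frac{n_1(n_1 + n_2 + 1)}{2}$
$SE_{\bar{W}_s} = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$
$z = \frac{X - \bar{X}}{s} = \frac{W_s - \bar{W}_s}{SE_{\bar{W}_s}}$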
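As a rough sketch, the conversion of the rank-sum statistic to a z-score could look like this in Python. It uses the expected rank sum n1(n1 + n2 + 1)/2 and the standard error sqrt(n1 n2 (n1 + n2 + 1)/12); the function name and the input values are hypothetical.

```python
import math


def rank_sum_z(w_s, n1, n2):
    """z-score for the Wilcoxon rank-sum statistic.

    n1 is the size of the group whose ranks were summed to give w_s.
    """
    mean_w = n1 * (n1 + n2 + 1) / 2                  # expected sum of ranks
    se_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # standard error
    return (w_s - mean_w) / se_w


# Hypothetical example: two groups of 10, rank sum of 90.5
z = rank_sum_z(90.5, 10, 10)
print(round(z, 2))
```

A negative z here simply means the summed ranks fall below the value expected if the groups did not differ.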
The Mann-Whitney test
Basically the same, but uses a test statistic U, which has a direct relationship with the Wilcoxon test statistic.
$R_1$ is the sum of ranks for group 1
$U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$
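The formula for U can be checked with a short Python sketch. The function name and the rank sums are assumptions made for illustration; the second call applies the same formula to the other group to show that the two U values add up to n1 × n2.

```python
def mann_whitney_u(rank_sum, n_own, n_other):
    """U for the group with n_own scores, given that group's sum of ranks."""
    return n_own * n_other + n_own * (n_own + 1) / 2 - rank_sum


# Hypothetical rank sums for two groups of four scores each
u_group1 = mann_whitney_u(21.5, 4, 4)
u_group2 = mann_whitney_u(14.5, 4, 4)
print(u_group1, u_group2, u_group1 + u_group2)
```

Because the two values always sum to n1 × n2, either one carries the same information as the Wilcoxon rank sum it was computed from.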
Output from the Mann-Whitney test
The Mann-Whitney test works by looking at differences in the ranked positions of scores in different groups.
- The first part of the output is a graph summarizing the data after they have been ranked.
- SPSS shows us the distribution of ranks in the two groups and the mean rank in each condition.
With all non-parametric tests, the output contains a summary table that you need to double-click to open the model viewer window.
The model viewer is divided into two panels:
- the left-hand panel shows the summary table of any analysis you have done
- the right-hand panel shows the details of the analysis
The group that has the highest mean rank should have a greater number of high scores within it.
Underneath the graph a table shows the test statistics for the Mann-Whitney test, the Wilcoxon procedure and the corresponding z-score.
- the rows labelled Asymptotic Sig and Exact Sig tell us the probability that a test statistic of at least that magnitude would occur if there were no difference between groups
Calculating effect size
r: effect size estimate
$r = \frac{z}{\sqrt{N}}$
z: the z-score that SPSS produces
N: the size of the study (number of total observations)
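A minimal sketch of this conversion in Python, with made-up values for z and N:

```python
import math


def effect_size_r(z, n_total):
    """Effect size r from a z-score and the total number of observations."""
    return z / math.sqrt(n_total)


# Hypothetical example: z = -1.10 from a study with 20 observations in total
r = effect_size_r(-1.10, 20)
print(round(r, 2))
```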
For the Mann-Whitney test, report only the test statistic (denoted by U) and its significance. Include the effect size and report exact values of p.
- The Mann-Whitney test and Wilcoxon rank-sum test compare two conditions when different participants take part in each condition and the resulting data have unusual cases or violate any assumption in chapter 6.
- Look at the row labelled Asymptotic Sig or Exact Sig (if your sample is small). If the value is less than 0.05 then the two groups are significantly different.
- The values of the mean ranks tell you how the groups differ
- The group with the highest scores will have the highest mean rank
- Report the U-statistic (or Ws if you prefer), the corresponding z and the significance value. Also report the medians and their corresponding ranges (or draw a boxplot)
- Calculate the effect size and report this too
The Wilcoxon signed-rank test: used in situations where you want to compare two sets of scores that are related in some way (they come from the same entities).
Theory of the Wilcoxon signed-rank test
The Wilcoxon signed-rank test is based on ranking the differences between scores in the two conditions you’re comparing.
- Once the differences have been ranked, the sign of the difference (positive or negative) is assigned to the rank.
Doing the test
- First, we calculate the difference between scores of the two measurements.
- If the difference is zero, we exclude this score from ranking.
- We make a note of the sign of the difference (positive or negative) and then rank the differences (starting with the smallest), ignoring whether they are positive or negative.
- The ranking process is the same as above, and we deal with tied scores in the same way.
- Finally, we collect together the ranks that came from a positive difference between the conditions, and add them up to get the sum of positive ranks (T+).
- We also add up the ranks that came from negative differences between the conditions to get the sum of negative ranks (T-)
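To calculate the significance of the test statistic (T), we look at the mean ($\bar{T}$) and the standard error ($SE_{\bar{T}}$), where n is the number of differences that were ranked (zero differences are excluded).
$\bar{T} = \frac{n(n+1)}{4}$
$SE_{\bar{T}} = \sqrt{\frac{n(n+1)(2n+1)}{24}}$
$z = \frac{T - \bar{T}}{SE_{\bar{T}}}$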
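The steps of the signed-rank test can be sketched in Python as follows. This is an illustration under two assumptions that the notes do not fix: differences are taken as first measurement minus second measurement, and the sum of positive ranks (T+) is used as the test statistic T. The function names and the scores are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their potential ranks."""
    ordered = sorted(values)
    return [
        sum(i + 1 for i, v in enumerate(ordered) if v == x) / ordered.count(x)
        for x in values
    ]


def signed_rank(first, second):
    """Return (T+, T-, z) for two related sets of scores."""
    # Differences between the two measurements; zero differences are excluded
    diffs = [a - b for a, b in zip(first, second) if a != b]
    # Rank the differences by size, ignoring their sign
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    n = len(diffs)
    mean_t = n * (n + 1) / 4
    se_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t_plus - mean_t) / se_t  # assumption: T = T+
    return t_plus, t_minus, z


# Hypothetical scores for six entities measured twice
first = [10, 12, 9, 15, 8, 11]
second = [8, 12, 10, 11, 5, 14]
t_plus, t_minus, z = signed_rank(first, second)
print(t_plus, t_minus, round(z, 3))
```

One pair has a difference of zero, so only five differences are ranked and n = 5 in the formulas for the mean and standard error.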
If you have split the file, the first set of results obtained will be for the first group.
If you double-click this table to enter the model viewer you will see a histogram of the distribution of differences.
- These differences are the scores of the second measurement subtracted from the scores on the first measurement
- These scores correspond to the values in the Difference column
- The histogram is colour-coded based on whether the ranks are positive or negative
- Positive ranks appear in brown bars
- Negative ranks appear in blue bars
Calculating the effect size
The effect size can be calculated in the same way as for the Mann-Whitney test.
Writing the results
For the Wilcoxon test, we report the test statistic (denoted by the letter T), its exact significance and an effect size
- The Wilcoxon signed-rank test compares two conditions when the scores are related (come from the same participants) and the resulting data have unusual cases or violate any assumption in chapter 6.
- Look at the row labelled Asymptotic Sig (2-sided test). If the value is less than 0.05 then the two conditions are significantly different.
- Look at the histogram and numbers of positive or negative differences to tell you how the groups differ (the greater number of differences in a particular direction tells you the direction of the result)
- Report the T-statistic, the corresponding z, the exact significance value and effect size. Also report the medians and their corresponding ranges (or draw a boxplot)
The Kruskal-Wallis test: compares groups or conditions containing independent scores.
Assesses the hypothesis that multiple independent groups come from different populations.
Theory of the Kruskal-Wallis test
The Kruskal-Wallis test is used with ranked data.
- To begin with, scores are ordered from lowest to highest, ignoring the group to which the score belongs.
- The lowest score is assigned the rank of 1, the next highest a rank of 2 and so on.
- Once ranked, the scores are collected back into their groups and their ranks are added within each group
- The sum of ranks within each group is denoted by Ri
Once the sum of ranks has been calculated within each group, the test statistic H is calculated.
$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$
$N$ is the total sample size
$n_i$ is the sample size within each group
$k$ is the number of groups
This test statistic has a distribution from the family of chi-square distributions, with $k - 1$ degrees of freedom.
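A minimal Python sketch of the H calculation, assuming no correction for tied ranks (statistical packages may apply one). The term 3(N + 1) is subtracted once, after the sum over groups. Function names and data are made up for illustration.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their potential ranks."""
    ordered = sorted(values)
    return [
        sum(i + 1 for i, v in enumerate(ordered) if v == x) / ordered.count(x)
        for x in values
    ]


def kruskal_wallis_h(groups):
    """H statistic without tie correction; groups is a list of lists of scores."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)        # rank all scores, ignoring group
    n_total = len(pooled)
    total = 0.0
    start = 0
    for g in groups:
        r_i = sum(ranks[start:start + len(g)])  # sum of ranks within this group
        total += r_i ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical data: three groups with no overlap at all
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 2))
```

With three groups, the resulting H would be compared against a chi-square distribution with 2 degrees of freedom.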
The Kruskal-Wallis test tells us that, overall, groups come from different populations.
It doesn’t tell us which groups differ.
- The simplest way to break down the overall effect is to compare all pairs of groups → pairwise comparisons.
Output from the Kruskal-Wallis test
Double-click on the summary table to open up the model viewer, which contains:
- In the left pane the summary table
- In the right pane the detailed output.
- The boxplot of the data can help to see which groups differ.
- A table containing the Kruskal-Wallis test statistic, H
- The associated degrees of freedom
- The significance
The right-hand pane of the model viewer shows the main output by default (the Independent Samples Test View).
- We can change what is visible using the drop-down list labelled View.
- Clicking on this drop-down list reveals options including Pairwise Comparisons or Homogeneous Subsets
The column labelled Adj.Sig contains the adjusted p-values.
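To give a feel for what an adjusted p-value is, here is a sketch of the Bonferroni correction, one common way of adjusting for multiple comparisons. Whether the Adj.Sig column is computed exactly this way is an assumption; the notes only say that the p-values are adjusted. The p-values below are invented.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted p-values for the six pairwise comparisons among four groups
unadjusted = [0.004, 0.020, 0.300, 0.012, 0.450, 0.030]
adjusted = bonferroni_adjust(unadjusted)
print([round(p, 3) for p in adjusted])
```

Under this correction a comparison counts as significant at the 5% level only if its adjusted value is below 0.05.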
Testing for trends: the Jonckheere-Terpstra test
Jonckheere-Terpstra test: tests for an ordered pattern to the medians of the groups you’re comparing.
It does the same thing as the Kruskal-Wallis test (tests for a difference between the medians of the groups), but it incorporates information about whether the order of the groups is meaningful.
- You should use this test when you expect the groups you’re comparing to produce a meaningful order of medians.
Two options of the test:
- Smallest to largest: tests whether the first group differs from the second group, which in turn differs from the third group, which in turn differs from the fourth and so on until the last group.
- Largest to smallest: tests whether the last group differs from the group before, which in turn differs from the group before that, and so on until the first group.
The test determines whether the medians of the groups ascend or descend in the order specified by the coding variable.
The coding variable must code groups in the order that you expect the medians to change.
Calculating effect size
There isn’t an easy way to convert a Kruskal-Wallis test statistic that has more than 1 degree of freedom to an effect size, r.
You could use the significance value of the Kruskal-Wallis test statistic to find an associated value of z from a table of probability values for the normal distribution.
From this you could use the conversion to r.
$r = \frac{z}{\sqrt{N}}$
For the Jonckheere-Terpstra test the same conversion applies: $r_{J\text{-}T} = \frac{z}{\sqrt{N}}$
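Instead of a printed table, the z-value associated with a significance value can be looked up with Python's standard library. The sketch below assumes the significance value is two-tailed, which the notes do not state; the function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def z_from_two_tailed_p(p):
    """Absolute z whose two-tailed probability under the standard normal is p."""
    return NormalDist().inv_cdf(1 - p / 2)


def effect_size_r(z, n_total):
    """Effect size r from a z-score and the total number of observations."""
    return z / math.sqrt(n_total)


# Hypothetical example: p = 0.05 from a study with 80 observations
z = z_from_two_tailed_p(0.05)
r = effect_size_r(z, 80)
print(round(z, 2), round(r, 2))
```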
Writing and interpreting the results
For the Kruskal-Wallis test, report the test statistic H, its degrees of freedom and its significance.
Report the follow-up statistics as well.
- The Kruskal-Wallis test compares several conditions when different participants take part in each condition and the resulting data have unusual cases or violate any assumption in chapter 6.
- Look at the row labelled Asymptotic Sig. A value less than 0.05 is typically taken to mean that the groups are significantly different.
- Pairwise comparisons compare all possible pairs of groups with a p-value that is corrected so that the error rate across all tests remains at 5%
- If you predict that the medians will increase or decrease across your groups in a specific order then test this with the Jonckheere-Terpstra test
- Report the H-statistic, the degrees of freedom and the significance value for the main analysis. For any follow-up tests, report an effect size, the corresponding z and the significance value. Also report the medians and their corresponding ranges (or draw a boxplot).
Friedman’s ANOVA: tests differences between three or more conditions when the scores across conditions are related (usually the same entities have provided scores in all conditions).
- Friedman’s ANOVA is used to counteract the presence of unusual cases or when one of the assumptions from chapter 6 has been violated.
Theory of Friedman’s ANOVA
Friedman’s ANOVA works on ranked data.
Once the sum of ranks has been calculated for each group, the test statistic $F_r$ is calculated:
$F_r = \left[\frac{12}{Nk(k+1)} \sum_{i=1}^{k} R_i^2\right] - 3N(k+1)$
$R_i$ is the sum of ranks for each group
$N$ is the total sample size
$k$ is the number of conditions
When the number of people tested is greater than 10, this test statistic has a chi-square distribution with $k - 1$ degrees of freedom.
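The calculation of Fr can be sketched in Python as below. In Friedman's ANOVA the scores are ranked within each entity (across its k conditions) before the ranks are summed per condition. The sketch treats N as the number of entities and applies no tie correction; function names and data are invented for illustration.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their potential ranks."""
    ordered = sorted(values)
    return [
        sum(i + 1 for i, v in enumerate(ordered) if v == x) / ordered.count(x)
        for x in values
    ]


def friedman_fr(rows):
    """Fr statistic; rows holds one list of k scores per entity."""
    n = len(rows)          # number of entities
    k = len(rows[0])       # number of conditions
    rank_sums = [0.0] * k
    for row in rows:
        # Rank this entity's scores across the k conditions
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)


# Hypothetical scores: four entities, three conditions
scores = [[5, 7, 9], [4, 8, 6], [3, 5, 7], [6, 4, 9]]
fr = friedman_fr(scores)
print(fr)
```

The example has only four entities, so it illustrates the arithmetic only; the chi-square approximation mentioned above applies to larger samples.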
Output from Friedman’s ANOVA
Double-click the summary table to display more details in the model viewer window.
We now see:
- The summary table
- Some histograms
- Show the distribution of ranks across the groups
- A table containing
- The test statistic $F_r$
- The degrees of freedom
- The associated p-value
Following up Friedman’s ANOVA
We can follow up a Friedman’s ANOVA by comparing all groups, or using a step-down procedure.
Calculating an effect size
We can do a series of Wilcoxon tests from which we extract a z-score.
Then get an effect size r from the Wilcoxon signed-rank test for each comparison.
$r = \frac{z}{\sqrt{N}}$
Writing and interpreting results
For Friedman’s ANOVA we report the test statistic, denoted by $\chi^2_F$, its degrees of freedom and its significance.
- Friedman’s ANOVA compares several conditions when the data are related and the resulting data have unusual cases or violate any assumption of chapter 6
- Look at the row labelled Asymptotic Sig. If the value is less than 0.05 then typically people conclude that the conditions are significantly different
- You can follow up the main analysis with pairwise comparisons.
- These tests compare all possible pairs of conditions using a p-value that is adjusted such that the overall Type I error rate remains 5%
- Report the $\chi^2_F$ statistic, the degrees of freedom and the significance value for the main analysis. For any follow-up tests, report an effect size, the corresponding z and the significance value.
- Report the medians and their ranges (or draw a boxplot)