Non-parametric tests can be used when the assumptions of parametric tests have been violated. They make fewer assumptions about the data and are robust. However, a non-parametric test has less power than its parametric counterpart when the sampling distribution is normally distributed.
Ranking the data means giving the lowest score a rank of 1, the next lowest score a rank of 2, and so on. This reduces the effect of outliers, but it ignores the difference in magnitude between scores. When two or more scores are equal, there are tied ranks: each tied score receives the average of the ranks those scores would otherwise occupy (e.g. ranks 3 and 4 both become rank 3.5).
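As a quick check of the tie-averaging rule, SciPy's `rankdata` function (assuming SciPy is available) assigns tied scores their average rank by default; the scores below are made up:

```python
from scipy.stats import rankdata

scores = [8, 3, 5, 5, 12]   # two tied scores of 5
ranks = rankdata(scores)    # average-rank method is the default
# the two 5s would occupy ranks 2 and 3, so each gets 2.5
print(ranks)
```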
There are several alternatives to the four most used non-parametric tests:
- Kolmogorov-Smirnov Z
It tests whether two groups have been drawn from the same population. It has more power than the Mann-Whitney test when the sample sizes are less than 25 per group.
- Moses Extreme Reactions
It tests the variability of scores across the two groups and is a non-parametric form of Levene's test.
- Wald-Wolfowitz runs
It looks at clusters of scores to determine whether the groups differ. If there is no difference, the ranks should be randomly interspersed.
- Sign test
It does the same as the Wilcoxon signed-rank test but is based only on the direction of the difference; the magnitude of the change is ignored. It lacks power unless the sample size is small.
- McNemar's test
It uses nominal rather than ordinal data. It is useful when looking for changes in people's scores: it compares the number of people who changed their response in one direction with the number who changed in the opposite direction.
- Marginal homogeneity
It is an extension of McNemar's test and is similar to the Wilcoxon test.
- Friedman's 2-way ANOVA by ranks (k samples)
It is a non-parametric ANOVA; when used to compare two groups it has low power compared with the Wilcoxon signed-rank test.
- Median test
It assesses whether samples are drawn from populations with the same median.
- Jonckheere-Terpstra
It tests for trends in the data, i.e. an ordered pattern in the medians of the groups. It does the same as the Kruskal-Wallis test but incorporates the order of the groups, so it should be used when a meaningful order of medians is expected.
- Kendall's W
It tests the agreement between raters and ranges between 0 and 1.
- Cochran's Q
It is a Friedman test on dichotomous data.
The effect size for both the Wilcoxon rank-sum test and the Mann-Whitney test can be calculated using the following formula:

r = z / √N

where z is the z-score of the test statistic and N denotes the total sample size.
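A minimal sketch of this conversion, r = z/√N, with made-up numbers:

```python
import math

z = -1.78  # hypothetical z-score from a Mann-Whitney test
N = 20     # hypothetical total sample size across both groups
r = z / math.sqrt(N)   # effect size
print(round(r, 3))     # -0.398
```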
WILCOXON RANK-SUM TEST
This test can be used to compare the distributions of two conditions containing scores from different entities. It uses the difference in rank sums between the two conditions. The test statistic, Ws, is the lower of the two rank sums or, if the sample sizes are unequal, the rank sum of the group with the smaller sample size.
The mean and the standard error of Ws can be calculated in the following way (n1 and n2 are the group sizes):

mean(Ws) = n1(n1 + n2 + 1) / 2

SE(Ws) = √(n1 n2 (n1 + n2 + 1) / 12)

The z-score can then be calculated using the standard z-score formula: z = (Ws − mean(Ws)) / SE(Ws).
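A worked sketch with made-up scores, following the steps above (n1 and n2 are the group sizes; SciPy's `rankdata` does the ranking):

```python
import math
from scipy.stats import rankdata

group1 = [12, 15, 11, 18, 20]  # hypothetical scores, condition 1
group2 = [14, 22, 21, 25, 19]  # hypothetical scores, condition 2
n1, n2 = len(group1), len(group2)

ranks = rankdata(group1 + group2)            # rank all scores together
R1, R2 = ranks[:n1].sum(), ranks[n1:].sum()  # rank sum per group
Ws = min(R1, R2)          # equal group sizes: take the lower rank sum

mean_Ws = n1 * (n1 + n2 + 1) / 2
se_Ws = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (Ws - mean_Ws) / se_Ws
print(Ws, mean_Ws, round(z, 2))   # 19.0 27.5 -1.78
```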
MANN-WHITNEY TEST
This test can be used to compare the distributions of two conditions containing scores from different entities. It uses the difference in rank sums between the two conditions. The test statistic can be calculated in the following way:

U = n1 n2 + n1(n1 + 1) / 2 − R1

where R1 denotes the sum of ranks for group 1.
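The same made-up data run through the U formula. Note that recent SciPy versions of `mannwhitneyu` report U for the first sample as R1 − n1(n1+1)/2, so the two conventions always sum to n1·n2:

```python
from scipy.stats import rankdata, mannwhitneyu

group1 = [12, 15, 11, 18, 20]  # hypothetical scores, condition 1
group2 = [14, 22, 21, 25, 19]  # hypothetical scores, condition 2
n1, n2 = len(group1), len(group2)

R1 = rankdata(group1 + group2)[:n1].sum()  # sum of ranks for group 1
U = n1 * n2 + n1 * (n1 + 1) / 2 - R1       # formula above
print(U)                                   # 21.0

res = mannwhitneyu(group1, group2)
assert U + res.statistic == n1 * n2        # the two U conventions are complementary
```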
WILCOXON SIGNED-RANK TEST
This test is used when two sets of related scores are compared (e.g. a within-subject design). It ranks the absolute differences between the scores in the two conditions, and each rank is then given the sign (plus or minus) of its difference. Scores with a difference of zero are excluded from the ranking. The sum of negative ranks and the sum of positive ranks are calculated; the sum of positive ranks, T+, is the test statistic.
The mean and the standard error can be calculated in the following way (n is the number of non-zero differences):

mean(T) = n(n + 1) / 4

SE(T) = √(n(n + 1)(2n + 1) / 24)

The z-score can be calculated using the standard formula for the z-score. The effect size uses the same formula as the effect size for the Mann-Whitney test.
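A worked sketch with hypothetical before/after scores, following the steps above:

```python
import math
from scipy.stats import rankdata

before = [5, 6, 8, 4, 9, 7]    # hypothetical scores, condition 1
after  = [6, 9, 7, 8, 10, 8]   # hypothetical scores, condition 2

# differences, excluding zeros
diffs = [a - b for a, b in zip(after, before) if a - b != 0]
n = len(diffs)
ranks = rankdata([abs(d) for d in diffs])  # rank the absolute differences
T_plus  = sum(r for d, r in zip(diffs, ranks) if d > 0)  # test statistic
T_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)

mean_T = n * (n + 1) / 4
se_T = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (T_plus - mean_T) / se_T
print(T_plus, T_minus, round(z, 2))   # 18.5 2.5 1.68
```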
KRUSKAL-WALLIS TEST
This test compares more than two independent conditions. It assesses the hypothesis that the independent groups come from different populations. The test ranks all the data together. The test statistic can be calculated in the following way:

H = [12 / (N(N + 1))] Σ (Ri² / ni) − 3(N + 1)

where Ri is the sum of ranks for group i, ni is the size of group i, and N is the total sample size. H follows a chi-square distribution with k − 1 degrees of freedom, where k is the number of groups.
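A quick numeric check of H against SciPy's `kruskal`; the two agree here because the made-up data contain no ties, so SciPy's tie correction has no effect:

```python
from scipy.stats import rankdata, kruskal

groups = [[1, 2, 4], [3, 6, 8], [5, 7, 9]]  # hypothetical, no tied scores
N = sum(len(g) for g in groups)

all_ranks = rankdata([x for g in groups for x in g])
H = -3 * (N + 1)
start = 0
for g in groups:
    Ri = all_ranks[start:start + len(g)].sum()  # rank sum for this group
    H += 12 / (N * (N + 1)) * Ri**2 / len(g)
    start += len(g)

print(round(H, 4))
assert abs(H - kruskal(*groups).statistic) < 1e-9
```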
The conclusion of this test is whether the groups differ or not, but it does not show which groups differ. This can be checked with pairwise comparisons, which compare all pairs of groups, or with a stepped procedure: two groups are compared at a time until one differs; groups that are equivalent to each other form a subset, and the non-equivalent group is carried into further comparisons. This creates homogeneous subsets of groups.
An overall effect size for the Kruskal-Wallis test is not useful. However, the effect sizes for the pairwise tests are, and they use the same formula as the effect size of the Mann-Whitney test.
FRIEDMAN’S ANOVA
This tests the difference between three or more conditions when the scores across conditions are related. The ranking is done per person rather than per group; therefore, every person has k ranks, one for each condition.
The test statistic uses the following formula:

Fr = [12 / (Nk(k + 1))] Σ Ri² − 3N(k + 1)

where Ri denotes the sum of ranks for each group (condition), N is the number of participants, and k is the number of conditions.
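A worked sketch with hypothetical data (4 people × 3 conditions), checked against SciPy's `friedmanchisquare`; the rows contain no ties, so the tie-corrected SciPy statistic matches the plain formula:

```python
from scipy.stats import rankdata, friedmanchisquare

# rows = people, columns = conditions (made-up scores, no ties within a row)
data = [[2, 4, 6],
        [1, 5, 3],
        [4, 2, 7],
        [3, 6, 9]]
N, k = len(data), len(data[0])

ranks = [rankdata(row) for row in data]            # k ranks per person
R = [sum(r[j] for r in ranks) for j in range(k)]   # rank sum per condition

Fr = 12 / (N * k * (k + 1)) * sum(Rj**2 for Rj in R) - 3 * N * (k + 1)
print(R, Fr)   # [5.0, 8.0, 11.0] 4.5

cols = list(zip(*data))   # SciPy expects one sample per condition
assert abs(Fr - friedmanchisquare(*cols).statistic) < 1e-9
```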