Categorical outcome variables can also be predicted, that is, we can predict the category into which an entity falls. Categorical variables are analysed using frequencies: the number of cases that fall into each combination of categories. The chi-squared test assesses whether there is a relationship between two categorical variables by comparing the observed frequencies with the frequencies that would be expected if the variables were unrelated. The deviation in each cell of the contingency table is squared and standardized by dividing by the expected frequency, and these values are added together. The chi-squared test uses the following formula:

$$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Here O_ij is the observed frequency and E_ij the expected frequency in row i and column j. The expected frequency has the following formula:

$$E_{ij} = \frac{\text{row total}_i \times \text{column total}_j}{n}$$

where n is the total number of observations. The degrees of freedom of the chi-squared distribution are (r-1)(c-1), where r is the number of rows and c the number of columns. The chi-squared distribution only approximates the sampling distribution of the chi-squared statistic, so the expected frequency in each cell needs to be greater than 5. If this is not the case, Fisher’s exact test can be used.

The likelihood ratio statistic is an alternative to the chi-square statistic. It compares the probability of obtaining the observed data with the probability of obtaining the same data under the null hypothesis. The likelihood ratio statistic uses the following formula:

$$L\chi^2 = 2 \sum_{i,j} O_{ij} \ln\left(\frac{O_{ij}}{E_{ij}}\right)$$

It is evaluated against the chi-squared distribution with the same degrees of freedom and is the preferred test if the sample size is small.

In 2 x 2 tables, the chi-square statistic tends to produce type-I errors. This can be corrected for by using Yates’ continuity correction, which uses the following formula:

$$\chi^2_{\text{Yates}} = \sum_{i,j} \frac{(|O_{ij} - E_{ij}| - 0.5)^2}{E_{ij}}$$

In short, the chi-square test tests whether there is a significant association between two categorical variables.

ASSUMPTIONS WHEN ANALYSING CATEGORICAL DATA

One assumption of the chi-square test is independence of cases: each person, item or entity must contribute to only one cell of the contingency table. Another assumption concerns the expected frequencies. In 2 x 2 tables, no expected value should be below 5. In larger tables, no more than 20% of the expected values should be below 5 and all expected values should be greater than 1. Not meeting this assumption leads to a reduction in test power. The residual is the...
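
The statistics described above (expected frequencies, the chi-squared statistic with and without Yates’ correction, the likelihood ratio statistic, the degrees of freedom and the check on expected frequencies) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the 2 x 2 table of counts are hypothetical and chosen only for this example, and the code assumes that every row and column total is greater than zero.

```python
import math


def expected_counts(observed):
    """Expected frequency per cell: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi2(observed, yates=False):
    """Sum of (O - E)^2 / E over all cells.

    With yates=True, 0.5 is subtracted from |O - E| before squaring,
    following the correction formula given in the text.
    """
    expected = expected_counts(observed)
    stat = 0.0
    for o_row, e_row in zip(observed, expected):
        for o, e in zip(o_row, e_row):
            deviation = abs(o - e)
            if yates:
                deviation -= 0.5
            stat += deviation ** 2 / e
    return stat


def likelihood_ratio(observed):
    """2 * sum of O * ln(O / E); empty cells contribute nothing."""
    expected = expected_counts(observed)
    total = 0.0
    for o_row, e_row in zip(observed, expected):
        for o, e in zip(o_row, e_row):
            if o > 0:
                total += o * math.log(o / e)
    return 2 * total


def degrees_of_freedom(observed):
    """(r - 1)(c - 1) for a table with r rows and c columns."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


def check_expected(observed):
    """Return the smallest expected value and the share of cells below 5."""
    cells = [e for row in expected_counts(observed) for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return min(cells), share_below_5


# Hypothetical counts, invented purely for illustration.
table = [[10, 20],
         [20, 10]]

print("expected:", expected_counts(table))
print("chi-squared:", pearson_chi2(table))
print("chi-squared (Yates):", pearson_chi2(table, yates=True))
print("likelihood ratio:", likelihood_ratio(table))
print("df:", degrees_of_freedom(table))
print("min expected, share below 5:", check_expected(table))
```

Each statistic would then be compared with the chi-squared distribution for the computed degrees of freedom. In practice a statistics library would normally be used instead of hand-written functions; for instance, SciPy provides `scipy.stats.chi2_contingency` and `scipy.stats.fisher_exact`.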

