What is ANOVA? – Chapter 12

12.1 How do dummy variables replace categories?

To include a categorical variable in a model without assigning a ranking to its categories, dummy variables are an option. These are artificial 0/1 variables that indicate which category an observation belongs to:

z1 = 1 and z2 = 0 : observations of category 1 (men)

z1 = 0 and z2 = 1 : observations of category 2 (women)

z1 = 0 and z2 = 0 : observations of category 3 (transgender and other identities)

The model is: E(y) = α + β1z1 + β2z2. The means follow from the model: μ1 = α + β1, μ2 = α + β2 and μ3 = α. Three categories require only two dummy variables, because an observation with z1 = 0 and z2 = 0 must fall in category 3.
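A minimal sketch of this coding in Python. The data values and the helper names `dummy_code` and `predict` are made up for illustration. The sketch relies on the fact that, with one dummy per non-reference category, the least-squares fit reproduces the sample mean of each group.

```python
from statistics import mean

# Hypothetical responses per category; category 3 is the reference group.
groups = {
    1: [4.0, 6.0, 5.0],
    2: [7.0, 9.0, 8.0],
    3: [2.0, 3.0, 4.0],
}

def dummy_code(category):
    """Return (z1, z2) for a category label 1, 2 or 3."""
    return (1 if category == 1 else 0, 1 if category == 2 else 0)

# The fitted coefficients follow directly from the group sample means.
means = {c: mean(ys) for c, ys in groups.items()}
alpha = means[3]             # mean of the reference category
beta1 = means[1] - means[3]  # difference between category 1 and category 3
beta2 = means[2] - means[3]  # difference between category 2 and category 3

def predict(category):
    """Estimated E(y) = alpha + beta1*z1 + beta2*z2 for a category."""
    z1, z2 = dummy_code(category)
    return alpha + beta1 * z1 + beta2 * z2

for c in groups:
    print(c, dummy_code(c), predict(c))
```

Each coefficient is a difference from the reference category, which is why testing β1 = β2 = 0 amounts to testing equal means.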

A significance test using the F-distribution tests whether the means are the same. The null hypothesis H0 : μ1 = μ2 = μ3 is the same as H0 : β1 = β2 = 0. A large F gives a small P and therefore strong evidence against the null hypothesis.

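The F-test is robust against moderate violations of normality and against moderate differences in the standard deviations. It is less trustworthy for very skewed data, particularly in small samples. Whatever the shape of the distribution, the inference relies on randomization, so that remains important.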

12.2 How do you make multiple comparisons of means?

A small P doesn't say which means differ or by how much. Confidence intervals give more information. A confidence interval can be constructed for every mean, or for the difference between two means. A confidence interval for the difference between the population means μi and μj is:

(ȳi − ȳj) ± t · s · √(1/ni + 1/nj)

Here s is the square root of the within-groups mean square, the pooled estimate of the standard deviation. The degrees of freedom of the t-score are df = N − g, in which g is the number of categories and N is the combined sample size (n1 + n2 + … + ng). When the confidence interval doesn't contain 0, this is evidence of a difference between the two means.
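The interval for a difference between two means can be sketched as follows, using the standard form (ȳi − ȳj) ± t · s · √(1/ni + 1/nj) with s the pooled standard deviation. The data are hypothetical, the function names are illustrative, and the t-score is passed in by hand (here 2.447, the table value for 95% confidence with df = 6) rather than computed.

```python
from math import sqrt
from statistics import mean

def pooled_sd(groups):
    """Square root of the within-groups mean square, SSE / (N - g)."""
    n_total = sum(len(g) for g in groups)
    sse = sum(sum((y - mean(g)) ** 2 for y in g) for g in groups)
    return sqrt(sse / (n_total - len(groups)))

def diff_interval(groups, i, j, t_crit):
    """Interval for mu_i - mu_j; t_crit must match df = N - g."""
    diff = mean(groups[i]) - mean(groups[j])
    spread = sqrt(1 / len(groups[i]) + 1 / len(groups[j]))
    margin = t_crit * pooled_sd(groups) * spread
    return diff - margin, diff + margin

# Hypothetical data: three groups of three observations, so df = 9 - 3 = 6.
groups = [[4.0, 6.0, 5.0], [7.0, 9.0, 8.0], [2.0, 3.0, 4.0]]
T_CRIT = 2.447  # assumed t-table value for 95% confidence, df = 6

low, high = diff_interval(groups, 1, 0, T_CRIT)
print(low, high)
```

With these made-up numbers the interval lies entirely above 0, which would count as evidence that the second group's population mean is higher than the first's.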

With many groups that have equal population means, a confidence interval may still suggest a difference, because the chance of at least one error grows with the number of comparisons. Multiple comparison methods control the probability that all intervals in a set of comparisons contain the true differences. With 95% confidence for the whole set, the probability that at least one interval is in error is 5%; this is the multiple comparison error rate. One such method is the Bonferroni method, which divides the desired overall error rate by the number of comparisons (5% / 4 comparisons = 1.25% per comparison). Another option is Tukey's method, which is calculated with software and uses the so-called Studentized range, a special kind of distribution. The advantage of Tukey's method is that it gives narrower confidence intervals than Bonferroni's method.
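The Bonferroni arithmetic is simple enough to sketch directly. The function names and group labels below are illustrative. The first call reproduces the 5% / 4 example from the text, the second shows how many pairwise comparisons four groups produce.

```python
from itertools import combinations

def bonferroni_rate(overall_error_rate, n_comparisons):
    """Error rate to use for each single comparison."""
    return overall_error_rate / n_comparisons

def pairwise_comparisons(labels):
    """All unordered pairs of groups, g * (g - 1) / 2 in total."""
    return list(combinations(labels, 2))

# Example from the text: 5% spread over 4 comparisons.
per_comparison = bonferroni_rate(0.05, 4)
confidence_each = 1 - per_comparison  # confidence level of each interval

# Hypothetical labels: four groups give six pairwise comparisons.
pairs = pairwise_comparisons(["A", "B", "C", "D"])
per_pair = bonferroni_rate(0.05, len(pairs))

print(per_comparison, confidence_each, len(pairs), per_pair)
```

Because the per-comparison rate shrinks as the number of pairs grows, the individual intervals become wider. This is the price paid for controlling the overall error rate.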

12.3 What is one-way ANOVA?

Analysis of variance (ANOVA) is an inferential method to compare the means of multiple groups. It tests whether a quantitative response variable is independent of a categorical explanatory variable. The categorical explanatory variables are called factors in ANOVA. The test is basically an F-test. The assumptions are the same: a normal distribution of the response in each group, equal standard deviations for the groups, and independent random samples. The null hypothesis is H0 : μ1 = μ2 = … = μg and the alternative hypothesis is Ha : at least two means differ.

The F-test uses two estimates of the variance σ². The between-groups estimate is based on the variability of each sample mean ȳi around the overall mean ȳ. The within-groups estimate is based on the variability of the observations within each group around that group's own mean (ȳ1, ȳ2, etc.). Generally, the bigger the variability between the sample means and the smaller the variability within the groups, the more evidence that the population means are unequal. The test statistic is F = between-groups estimate / within-groups estimate. When F increases, P decreases.

In an ANOVA table, the mean squares (MS) are the between-groups estimate and the within-groups estimate of the population variance σ². The between-groups estimate is the sum of squares between the groups (the regression SS) divided by df1. The within-groups estimate is the sum of squares within the groups (the remaining SS, or SSE) divided by df2. Together the between-groups SS and the SSE make up the TSS, the total sum of squares.

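The degrees of freedom of the within-groups estimate are: df2 = N (total sample size) − g (number of groups). The estimate of the variance from the within-groups sum of squares is:

s² = SSE / (N − g) = [(n1 − 1)s1² + (n2 − 1)s2² + … + (ng − 1)sg²] / (N − g)

The degrees of freedom of the between-groups estimate are: df1 = g − 1. The estimate of the variance from the between-groups sum of squares is:

[n1(ȳ1 − ȳ)² + n2(ȳ2 − ȳ)² + … + ng(ȳg − ȳ)²] / (g − 1)

When this value increases relative to the within-groups estimate, the sample means lie further apart than the null hypothesis predicts, and the evidence that the population means differ becomes stronger.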
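The quantities in a one-way ANOVA table can be sketched in plain Python. The data and the function name `one_way_anova` are hypothetical. The P-value is left out because it needs the F-distribution, which statistical software provides.

```python
from statistics import mean

def one_way_anova(groups):
    """Sums of squares, degrees of freedom, mean squares and F."""
    all_y = [y for g in groups for y in g]
    n_total, g_count = len(all_y), len(groups)
    grand_mean = mean(all_y)

    # Between-groups SS: group means around the overall mean, weighted by n_i.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups SS (SSE): observations around their own group mean.
    ss_within = sum(sum((y - mean(g)) ** 2 for y in g) for g in groups)
    # Total SS: observations around the overall mean.
    tss = sum((y - grand_mean) ** 2 for y in all_y)

    df1, df2 = g_count - 1, n_total - g_count
    ms_between = ss_between / df1
    ms_within = ss_within / df2
    return {
        "ss_between": ss_between, "ss_within": ss_within, "tss": tss,
        "df1": df1, "df2": df2,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }

# Hypothetical data: three groups of three observations.
table = one_way_anova([[4.0, 6.0, 5.0], [7.0, 9.0, 8.0], [2.0, 3.0, 4.0]])
for name, value in table.items():
    print(name, value)
```

The total sum of squares is computed separately here so that the partition TSS = between-groups SS + SSE can be checked rather than assumed.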

For a distribution very different from the normal distribution, the nonparametric Kruskal-Wallis test is an option. This test ranks the data and therefore also works for distributions that are far from normal.
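A rough sketch of the ranking idea behind the Kruskal-Wallis test, using the usual statistic H = 12 / (N(N + 1)) · Σ Ri² / ni − 3(N + 1), where Ri is the rank sum of group i. The data and function names are illustrative. Tied values get average ranks, but the tie correction that software normally applies is omitted, and no P-value is computed.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based ranks in the run
        for k in order[pos:end + 1]:
            ranks[k] = shared
        pos = end + 1
    return ranks

def kruskal_wallis_h(groups):
    """H statistic from the rank sums of each group (no tie correction)."""
    pooled = [y for g in groups for y in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Hypothetical, fully separated groups: the ranks do not overlap at all.
h = kruskal_wallis_h([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(h)
```

Only the ordering of the observations enters the calculation, so extreme values have no more influence than any other value in the same rank position.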

12.4 What is two-way ANOVA?

One-way ANOVA works for a quantitative dependent variable and the categories of a single explanatory variable. Two-way ANOVA works for two categorical explanatory variables. Each factor has a null hypothesis about its main effect on the response variable, while controlling for the other factor. The F statistic for the main effect of a factor is: MS of that factor / residual MS. Each MS is calculated by dividing the sum of squares by the degrees of freedom. Because two-way ANOVA is complex, software is used that shows the MS and the degrees of freedom in an ANOVA table.

ANOVA can be carried out as a regression with dummy variables. For instance, in research on the grocery spending of vegetarians and vegans, taking into account how someone identifies:

v1 = 1 if the subject is vegetarian, 0 if the subject isn't

v2 = 1 if the subject is vegan, 0 if the subject isn't

If someone is neither vegetarian nor vegan, they fall in the remaining category (meat eaters).

k = 1 if the subject identifies as budget-minded, 0 if the subject doesn't

Then the model is: E(y) = α + β1v1 + β2v2 + β3k. The prediction equation follows from the estimated coefficients. A confidence interval for a coefficient estimates the difference between a category and the reference category, controlling for the other factor.

In practice, two-way ANOVA needs to be checked for interaction effects first, using an expanded model: E(y) = α + β1v1 + β2v2 + β3k + β4(v1 × k) + β5(v2 × k).
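To see what the interaction terms add, the two models can be compared with made-up coefficients. The numbers and the helper names `expected_spending` and `budget_effect` are purely illustrative and are not estimates from any data.

```python
def expected_spending(v1, v2, k, coef):
    """E(y) = a + b1*v1 + b2*v2 + b3*k + b4*(v1*k) + b5*(v2*k)."""
    a, b1, b2, b3, b4, b5 = coef
    return a + b1 * v1 + b2 * v2 + b3 * k + b4 * (v1 * k) + b5 * (v2 * k)

def budget_effect(v1, v2, coef):
    """Difference in E(y) between budget-minded (k=1) and not (k=0)."""
    return expected_spending(v1, v2, 1, coef) - expected_spending(v1, v2, 0, coef)

# Hypothetical coefficients (alpha, beta1, ..., beta5).
additive = (100.0, -10.0, -20.0, -15.0, 0.0, 0.0)      # beta4 = beta5 = 0
interaction = (100.0, -10.0, -20.0, -15.0, 5.0, -5.0)

diets = {"meat eater": (0, 0), "vegetarian": (1, 0), "vegan": (0, 1)}
for name, (v1, v2) in diets.items():
    print(name, budget_effect(v1, v2, additive), budget_effect(v1, v2, interaction))
```

In the additive model the effect of being budget-minded is β3 for every diet. With interaction it becomes β3 + β4 for vegetarians and β3 + β5 for vegans, which is why the interaction is checked before interpreting main effects.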

The sum of squares of one of the (dummy) variables is called the partial sum of squares or Type III sum of squares. This is the variability in y that is explained by a certain variable when the other variables are already in the model.

ANOVA with multiple factors is factorial ANOVA. The advantage of factorial ANOVA and two-way ANOVA compared to one-way ANOVA is that it's possible to study the interaction of effects.

12.5 How does ANOVA with repeated measures work?

In research, samples sometimes depend on each other, as with repeated measures taken at different moments in time on the same subjects. The subjects can then be treated as a factor as well. Three moments (for instance before, during and after treatment) give three pairs of means to compare, which requires multiple comparison methods. The Bonferroni method divides the overall error probability over the separate confidence intervals.

An assumption of ANOVA with repeated measures is sphericity. This means that the variances of the differences between all possible pairs of repeated measurements are the same. If the standard deviations and the correlations are themselves the same, then there is compound symmetry, which is a stronger condition. Software tests for sphericity with Mauchly's test. If sphericity is lacking, then software uses the Greenhouse-Geisser adjustment of the degrees of freedom to allow for an F-test.
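What sphericity refers to can be made concrete by computing the variance of the difference scores for every pair of measurement moments. This is only a descriptive sketch on hypothetical scores, with an illustrative function name. It is not Mauchly's test and it produces no P-value.

```python
from itertools import combinations
from statistics import variance

# Hypothetical scores: one row per subject, columns = before, during, after.
scores = [
    [10.0, 12.0, 15.0],
    [8.0, 11.0, 13.0],
    [12.0, 13.0, 18.0],
    [9.0, 12.0, 14.0],
]

def difference_variances(rows):
    """Sample variance of the within-subject differences for each pair of moments."""
    n_moments = len(rows[0])
    result = {}
    for a, b in combinations(range(n_moments), 2):
        diffs = [row[a] - row[b] for row in rows]
        result[(a, b)] = variance(diffs)
    return result

for pair, var in difference_variances(scores).items():
    print(pair, var)
```

Under sphericity these variances would be roughly equal in the population. Clearly unequal values in a large sample are the kind of pattern that Mauchly's test is designed to detect.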

The advantage of using the same subjects is that certain factors are held constant; this is called blocking.

A factor has fixed effects when the categories in the study are all the categories of interest, such as the selected treatments. A factor has random effects when its categories are a random sample of all possible categories, like the random people who happen to become research subjects.

12.6 How does two-way ANOVA with repeated measures of a factor work?

In research with repeated measures, more fixed effects can be involved. An example of a within-subjects factor is time (before/during/after treatment), because its categories are compared using the same subjects. The subjects are crossed with the factor. A between-subjects factor is different, for example the kind of treatment, because it compares the experiences of different subjects. Then subjects are nested in the factor.

Due to these two kinds of factors, the SSE consists of two kinds of errors. To analyze every difference between two categories, a confidence interval is required. With the two kinds of errors, a single residual term can't be used for all comparisons. What can be used instead are multiple one-way ANOVA F-tests with Bonferroni's method.

Multivariate analysis of variance (MANOVA) is a method that can handle multivariate responses and that makes fewer assumptions. The disadvantage of making fewer assumptions is that it has lower power.

A disadvantage of repeated measures in general is that it requires data from all subjects at all moments. A model that has both fixed effects and random effects is called a mixed model.

 

Follow the author: Annemarie JoHo