15. Analysis of Variance

Some situations and experiments require processes to be compared at more than two levels. Data from such experiments can be analysed using analysis of variance (ANOVA).

15.1. Comparing Population Means

There are other ways than ANOVA to compare population means, but these assume either paired observations or independent random samples, and they can compare only two population means at a time. ANOVA can compare more than two populations at once, and it does so on the basis of assessments of variation.

15.2. One-Way ANOVA

The procedure for testing the equality of several population means, using independent random samples from each population, is called a one-way ANOVA. This procedure assumes that all the populations involved have a common variance.

In this procedure, the total sum of squares (SST) is made up of a within-group sum of squares (SSW) and a between-groups sum of squares (SSG): SST = SSW + SSG.

This decomposition forms the basis of the one-way ANOVA. SST measures the total variability of the sample observations around their overall mean, and it splits into the variability within the groups and the variability between the group means.
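The decomposition can be written out explicitly. The notation below is introduced only for illustration and does not appear in the original summary. Here x_ij is observation j in group i, n_i is the size of group i, x̄_i is the mean of group i, and x̄ is the overall mean of all n = n_1 + … + n_K observations.

```latex
\begin{aligned}
SST &= \sum_{i=1}^{K}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}\right)^2 \\
SSW &= \sum_{i=1}^{K}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_i\right)^2 \\
SSG &= \sum_{i=1}^{K} n_i\left(\bar{x}_i-\bar{x}\right)^2
\end{aligned}
```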

If the null hypothesis is true (all population means are equal), then both SSW and SSG can be used to estimate the common population variance. This is done by dividing each sum of squares by its appropriate number of degrees of freedom.

The resulting mean squares both provide unbiased estimates of the common population variance when the null hypothesis is true. A between-groups mean square that is substantially larger than the within-group mean square therefore indicates that the null hypothesis is false. The test of the null hypothesis is thus based on the ratio of mean squares:

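$$F = \frac{MSG}{MSW}$$

where $MSW = \frac{SSW}{n-K}$ and $MSG = \frac{SSG}{K-1}$. Here K is the number of groups and n is the total number of observations. Assume that the population variances are equal and the population distributions are normal. Under these assumptions, and when the null hypothesis is true, this ratio follows an F distribution with K – 1 and n – K degrees of freedom.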

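The closer the ratio is to 1, the less indication there is that the null hypothesis is false. Large values of F are evidence against it: the null hypothesis is rejected at significance level α if F exceeds the critical value $F_{K-1,\,n-K,\,\alpha}$.

These results are also summarized in a one-way ANOVA table, which has the following format: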


Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares   F-ratio
Between groups        SSG              K – 1                MSG            MSG/MSW
Within groups         SSW              n – K                MSW
Total                 SST              n – 1
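The quantities in the one-way ANOVA table can be computed directly from the raw samples. The Python sketch below assumes each group is supplied as a plain list of numbers, and the function name one_way_anova is hypothetical. It stops at the F-ratio, because the Python standard library has no F distribution. The critical value $F_{K-1,\,n-K,\,\alpha}$ would still come from an F table or a statistics package.

```python
from statistics import fmean

def one_way_anova(groups):
    """Return SST, SSW, SSG, MSG, MSW and F for a list of samples (one list per group)."""
    k = len(groups)                    # K: number of groups
    n = sum(len(g) for g in groups)    # n: total number of observations
    grand_mean = fmean(x for g in groups for x in g)
    group_means = [fmean(g) for g in groups]

    # Within-group variability: deviations from each group's own mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # Between-groups variability: group means around the overall mean, weighted by group size.
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Total variability around the overall mean.
    sst = sum((x - grand_mean) ** 2 for g in groups for x in g)

    msg = ssg / (k - 1)   # between-groups mean square, K - 1 degrees of freedom
    msw = ssw / (n - k)   # within-group mean square, n - K degrees of freedom
    return {"SST": sst, "SSW": ssw, "SSG": ssg, "MSG": msg, "MSW": msw, "F": msg / msw}

result = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result["F"])  # 27.0
```

In this small example SST = 60, which splits into SSW = 6 and SSG = 54. That is the decomposition SST = SSW + SSG.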



Summary: Statistics for Business and Economics

12. Multiple Regression

Simple regression (see chapter 11) predicts a dependent variable as a function of a single independent variable, but often multiple variables are at play. Multiple regression is used to determine the simultaneous effect of several independent variables on a dependent variable. As in simple regression, the model is fitted using the least squares principle.

      12.1. The model

As with simple regression, the first step in model development is model specification: the selection of the model variables and the functional form of the model. This is influenced by the model objectives, namely (1) predicting the dependent variable and/or (2) estimating the marginal effect of each independent variable. The second objective is hard to achieve in a model with multiple independent variables, however. These variables are related not only to the dependent variable but also to each other, which leaves a web of effects that is not easily untangled.

An error term, ε, is added to the multiple regression model, giving the population model Y = β0 + β1X1 + β2X2 + … + βKXK + ε. The error term recognizes that none of the relationships described in the model will hold exactly. It also recognizes that there are likely to be variables that affect the dependent variable but are not included in the model.

      12.2. Estimating Coefficients

      Multiple regression coefficients are calculated with the least squares procedure. However, again this is more complicated than with simple regression, as the independent variables not only affect the dependent variable but also each other. It is not possible to identify the unique effect of each independent variable on the dependent variable. This means that the higher the correlations between two or more of the independent variables in a model are, the less reliable the estimated regression coefficients are.

There are five assumptions for standard multiple regression. The first four are the same as those made for simple regression (see chapter 11). The fifth states that it is not possible to find a set of numbers c0, c1, …, cK, not all zero, such that c0 + c1x1i + … + cKxKi = 0 for every observation i. This assumption excludes cases in which there is an exact linear relationship among the independent variables, for example between a pair of them. In most cases this assumption will not be violated if the model is properly specified.

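In simple regression, the least squares procedure finds the line that best represents the set of points. Multiple regression with two independent variables instead finds the plane that best represents the points in three-dimensional space, with each variable represented by its own dimension. With more than two independent variables, this generalizes to a hyperplane.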
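The least squares calculation can be sketched in pure Python. This is a minimal sketch that assumes the normal-equations approach, (X'X)b = X'y, solved by Gaussian elimination. Statistical packages typically use numerically more stable methods, such as QR decomposition. The function name least_squares is hypothetical. The singularity check corresponds to the fifth assumption above: if the independent variables satisfy an exact linear relationship, X'X cannot be inverted and no unique set of coefficients exists.

```python
def least_squares(xs, y):
    """Fit y = b0 + b1*x1 + ... + bK*xK by least squares.

    xs: list of observations, each a list of K independent-variable values.
    y:  list of dependent-variable values.
    Returns [b0, b1, ..., bK].
    """
    # Design matrix with a leading column of ones for the intercept b0.
    X = [[1.0] + list(row) for row in xs]
    p = len(X[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the augmented matrix [X'X | X'y].
    a = [row[:] + [v] for row, v in zip(xtx, xty)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:  # absolute tolerance; a sketch, not scale-aware
            raise ValueError("X'X is singular: exact linear relationship among the independent variables")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (a[i][p] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    return b

# Data generated exactly from y = 1 + 2*x1 + 3*x2, so the fit recovers those coefficients.
print(least_squares([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], [1, 3, 4, 6, 8]))
```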

It is important to be aware that in a multiple regression it is not possible to know which independent variable predicts which change in the dependent variable. After all, each estimated slope coefficient is affected by the correlations among all the independent variables and the dependent variable. This also means that any multiple regression coefficient depends on all the independent variables in the model, which is why these coefficients are referred to as conditional coefficients. This is the case in all multiple regression models, unless two independent variables have a sample correlation of exactly zero (which is very unlikely). Because of this …


      13. Several Aspects of Regression Analysis

      This chapter focusses on topics that add to the understanding of regression analysis. This includes alternative specifications for these models, and what happens in the situations where basic regression assumptions are violated.

      13.1. Developing models

The goal when developing a model is to approximate the complex reality as closely as possible with a relatively simple model, which can then be used to provide insight into that reality. It is impossible to represent all of the influences in the real situation in a model; instead, only the most influential variables are selected.

      Building a statistical model has 4 stages:

1. Model Specification: This step involves selecting the variables (dependent and independent), the algebraic form of the model, and the required data. To do this correctly, it is important to understand the theory and context underlying the model. This stage may require serious study and analysis, and it is crucial to the integrity of the model.
2. Coefficient Estimation: This step involves using the available data to estimate the coefficients and/or parameters of the model. Which estimates are desirable depends on the objective of the model. Roughly, there are two goals:
  1. Predicting the mean of the dependent variable: Here a small standard error of the estimate, se, is desirable. The correlations between the independent variables should be stable, and the independent variables should have a wide spread, because this keeps the prediction variance small.
  2. Estimating one or more coefficients: Here a number of problems arise, because there is always a trade-off between estimator bias and variance, and a proper balance must be found. Including an independent variable that is highly correlated with other independent variables decreases bias but increases variance; excluding it decreases variance but increases bias. This happens because both these correlations and the spread of the independent variables influence the standard deviation of the slope coefficients, sb.
3. Model Verification: This step involves checking whether the model still portrays reality accurately. This is important because the simplifications and assumptions made while constructing the model can make it (too) inaccurate. The regression assumptions, the model specification, and the selected data should all be examined. If something is wrong here, return to step 1.
4. Interpretation and Inference: This step involves drawing conclusions from the outcomes of the model. Here it is important to remain critical, because inferences drawn from these outcomes can only be accurate if the previous three steps have been completed properly. If the outcomes differ from expectations or previous findings, consider carefully whether this is due to the model or whether you really have found something new.

      13.2. Further Application of Dummy Variables

Dummy variables were introduced in chapter 12 as a way to include categorical variables in regression analysis. Further uses for these variables will be …

      15. Analysis of Variance


In addition to the F test of the one-way ANOVA, a minimum significant difference (MSD) between two sample means can be calculated. It serves as evidence for concluding whether the corresponding population means differ.

With s_p being the estimate of variance (…), …


      16. Predictions with Time-Series Data

      Time series data involves measurements that are ordered over time, in which the sequence of observations is important. Most procedures for data analysis cannot be used for this data, as these procedures are based on the assumption that the errors are independent. Thus, different forms of analysis are needed.

      The main goal of analysing time-series data is to make predictions. An important assumption here is that the relations between variables remain constant.

      16.1. Time-Series Components

      Most time-series have the following four components:

1. Trend component: Values grow or decrease steadily over long periods of time.
2. Seasonality component: A pattern tied to the season (for example, the quarter of the year) repeats itself every year.
3. Cyclical component: An oscillatory or cyclical pattern that is not related to seasonal behaviour.
4. Irregular component: No series is regular enough to be explained by these predictable components alone, so each series also has an irregular component (similar to the random error term in regression).

      Analysis of time-series data involves constructing a formal model in which most of these components are explicitly or implicitly present, in order to describe the behaviour of the data series. In building this model the series components can either be regarded as being fixed over time, or as steadily evolving over time.

      16.2. Moving Averages

Moving averages are the basis for many practical adjustment procedures. They can be used to remove the irregular component or to smooth out the seasonal component:

• Removing the irregular component: Each observation is replaced with the average of itself and its neighbours. The idea is that this reduces the effect of the irregular component on each data point.
• Smoothing the seasonal component: For quarterly data, four-period moving averages are calculated. Each of these averages contains exactly one observation from each quarter, so the seasonal fluctuations average out. A four-period average falls between two time periods, so the averages are shifted in time relative to the original series. This is corrected by centring them, that is, by averaging each pair of adjacent four-period averages. The specific procedure always depends on how stable the seasonal pattern is assumed to be, and on whether seasonality is thought to be additive or multiplicative (in the latter case, use logarithms).
  If the seasonal pattern is assumed to be stable, a further seasonal-adjustment approach can be used: the seasonal index method. Here the original series is expressed as a percentage of the centred 4-point moving average series.

Additionally, moving averages are very suitable for detecting cyclical components and/or trends.
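The moving-average procedures above can be sketched in Python. The sketch assumes quarterly data in a plain list, and all function names are hypothetical. The last function only produces the ratios (observation as a percentage of the centred moving average). In the seasonal index method these ratios would typically then be averaged per quarter to obtain the seasonal indices.

```python
def moving_average(series, window):
    """Simple moving average: each value is the mean of `window` consecutive observations."""
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]

def centred_four_point_ma(series):
    """Centred 4-point moving average for quarterly data.

    A 4-point average falls between two periods, so adjacent pairs of
    4-point averages are averaged to line the result up with a period.
    Element t of the result corresponds to original period index t + 2.
    """
    ma4 = moving_average(series, 4)
    return [(a + b) / 2 for a, b in zip(ma4, ma4[1:])]

def seasonal_index_ratios(series):
    """Each observation as a percentage of the centred 4-point moving average."""
    centred = centred_four_point_ma(series)
    return [100 * series[t + 2] / c for t, c in enumerate(centred)]

quarterly = [90, 110, 120, 80] * 3    # stable seasonal pattern around a level of 100
print(centred_four_point_ma(quarterly))  # the seasonal pattern is smoothed away: all 100.0
print(seasonal_index_ratios(quarterly))
```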

      16.3. Predictions using smoothing

There are various prediction methods, and the choice should always depend on the resources, the objectives, and the available data.

Simple exponential smoothing is a basic prediction method that is appropriate when the series is non-seasonal and has no consistent trend. It predicts future values on the basis of an estimate of the current level of the time series. This estimate is a weighted average of current and past values, in which most weight is given to the most recent observations (with decreasing weight …
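A minimal Python sketch of simple exponential smoothing follows. It assumes a smoothing constant alpha between 0 and 1 and starts the level at the first observation, which is one common choice among several. The function name is hypothetical. Expanding the recursion shows that the weights on past observations, alpha, alpha(1 − alpha), alpha(1 − alpha)², …, decrease geometrically, which matches the description above.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed levels; the last level is the forecast for all future periods.

    level_t = alpha * x_t + (1 - alpha) * level_{t-1}, starting from level_1 = x_1.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    levels = [series[0]]  # assumed starting value: the first observation
    for x in series[1:]:
        levels.append(alpha * x + (1 - alpha) * levels[-1])
    return levels

levels = simple_exponential_smoothing([10, 20, 30], alpha=0.5)
print(levels[-1])  # 22.5, the forecast for the next period(s)
```

A larger alpha makes the forecast react faster to recent observations; a smaller alpha smooths more heavily.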


      17. Sampling

There are various ways of sampling a population, depending on the research and analysis goals.

      17.1. Stratified Sampling

Stratified sampling involves dividing the population into strata (subgroups) according to a specific identifiable characteristic, in such a way that each member of the population belongs to exactly one stratum. Stratified random sampling is the process of selecting independent simple random samples from each stratum. A question that arises here is how to allocate the sampling effort among the strata. There are various possibilities:

• Proportional allocation: The proportion of the sample taken from a stratum is the same as that stratum's proportion of the population. This is used if little or nothing is known about the population and there are no strong requirements for the information to be produced.
• Optimal allocation: More sampling effort is allocated to strata with a higher population variance. This is used if the objective is to estimate an overall population parameter (such as the mean, total, or proportion) as precisely as possible. The method is optimal only with this goal in mind.
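Both allocation rules can be expressed as short formulas. In the Python sketch below, "optimal allocation" is implemented as Neyman allocation, n_j = n · N_j σ_j / Σ N_i σ_i. That formula assumes the cost of sampling one member is the same in every stratum. The function names are hypothetical, and the results are not rounded to whole sample members.

```python
def proportional_allocation(n, stratum_sizes):
    """n_j = n * N_j / N: each stratum's share of the sample equals its share of the population."""
    total = sum(stratum_sizes)
    return [n * size / total for size in stratum_sizes]

def optimal_allocation(n, stratum_sizes, stratum_sds):
    """Neyman allocation n_j = n * N_j * sigma_j / sum_i(N_i * sigma_i).

    Larger and more variable strata receive more of the sample
    (assumes equal sampling cost per member in every stratum).
    """
    weights = [size * sd for size, sd in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

sizes, sds = [200, 300, 500], [10, 10, 40]
print(proportional_allocation(100, sizes))  # [20.0, 30.0, 50.0]
print(optimal_allocation(100, sizes, sds))  # [8.0, 12.0, 80.0]
```

The third stratum is both the largest and by far the most variable, so optimal allocation shifts sampling effort towards it.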

Analysing the results of stratified random samples is relatively straightforward. Each stratum sample mean (x̄j) is an unbiased estimator of the corresponding stratum population mean (μj). The stratum means can be combined to estimate the overall population mean. They can also be used to estimate the population total, because the total is the product of the population mean and the number of population members.
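The combination step can be sketched as follows. Each stratum sample mean is weighted by that stratum's share of the population, N_j / N, and the population total is estimated as N times the estimated mean. The function name and input layout (stratum sizes plus one list of sample values per stratum) are assumptions for illustration.

```python
def stratified_estimates(stratum_sizes, stratum_samples):
    """Estimate the overall population mean and total from a stratified random sample.

    Mean estimate:  sum_j (N_j / N) * xbar_j
    Total estimate: N * mean estimate
    """
    N = sum(stratum_sizes)
    means = [sum(s) / len(s) for s in stratum_samples]  # stratum sample means xbar_j
    mean_est = sum(size * m for size, m in zip(stratum_sizes, means)) / N
    return mean_est, N * mean_est

mean_est, total_est = stratified_estimates([100, 300], [[2, 4], [10, 12, 14]])
print(mean_est, total_est)  # 9.75 3900.0
```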

      17.2. Other Ways to Sample

      Various other sampling methods are:

• Cluster Sampling: This method can be used when a population can be subdivided into small geographical units, or clusters. A simple random sample of clusters is selected, and every member of the selected clusters is contacted for data. This method requires very little prior information about the population.
• Two-Phase Sampling: In this method the regular data collection is preceded by a pilot study that uses a smaller sample. This costs more time, but it allows methods and procedures to be improved and can provide preliminary estimates for the main study.
• Non-random sampling: There are two main methods:
  • Non-probabilistic (convenience) sampling: Sample members are selected because they are convenient to reach. This often means that the sample is not representative of the population and lacks proper statistical validity.
  • Quota sampling: Specified numbers of people with certain characteristics (race, age, gender, etc.) are contacted. This usually produces quite accurate estimates of population parameters, but the reliability of these estimates cannot be determined because the sample was not randomly chosen.
      Bullets Statistics for Business and Economics

      12. Multiple Regression

      • Regression objectives are either to predict the value of the dependent variable, or to estimate the marginal effect of each independent variable.
      • A population multiple regression model is a model that includes multiple independent variables.
• Standard multiple regression assumptions include the four standard simple regression assumptions, plus a fifth: it is not possible to find a set of numbers c0, c1, …, cK, not all zero, such that c0 + c1x1i + … + cKxKi = 0 for every observation i. In other words, there is no exact linear relationship among the independent variables.
• Multiple regression models include an error term, ε, that represents variability caused by variables not included in the model.
• In multiple regression, coefficients are estimated using least squares, but these estimates become less reliable the higher the correlations between the independent variables are.
• Any regression coefficient in a multiple regression model depends on all the independent variables in the model, and such coefficients are therefore referred to as conditional coefficients.
• The mean square regression, MSR = SSR/K, is the explained variability per independent variable. Dividing it by the mean square error, MSE = SSE/(n – K – 1), gives the F statistic for testing whether the independent variables jointly influence the dependent variable.
• In a multiple regression model the total sum of squares (SST; or sample variability) can be split into the sum of squares regression (SSR; or explained variability) and the sum of squares error (SSE; or unexplained variability): SST = SSR + SSE. This is referred to as sum-of-squares decomposition.
• The coefficient of determination, R², is the proportion of the variability in the dependent variable that is explained by the regression model. It describes the strength of the linear relationship between the independent variables and the dependent variable, and is calculated as R² = 1 – SSE/SST.
• Adding more independent variables never decreases R², even when the added variables are irrelevant, which can be misleading. This can be avoided by using the adjusted coefficient of determination, R̄² = 1 – [SSE/(n – K – 1)] / [SST/(n – 1)].
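• The coefficient variance estimator for bj, s²bj, is calculated as s²bj = s²e / [(1 – R²j) Σ(xji – x̄j)²]. Here s²e = SSE/(n – K – 1), and R²j is the coefficient of determination from regressing xj on the other independent variables.
  The square root of s²bj is the coefficient standard error.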
• Non-linear relationships can be modelled within multiple regression by transforming variables, namely in quadratic models and logarithmic models.
      • Dummy variables can be used to represent categorical data in a regression model, and have a value of either 0 or 1.
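The difference between R² and the adjusted coefficient of determination can be seen in a short Python sketch. It uses the standard formulas R² = 1 – SSE/SST and R̄² = 1 – [SSE/(n – K – 1)] / [SST/(n – 1)], with n observations and K independent variables. The function names and the example numbers are hypothetical.

```python
def r_squared(sse, sst):
    """Coefficient of determination R^2 = 1 - SSE/SST."""
    return 1 - sse / sst

def adjusted_r_squared(sse, sst, n, k):
    """Adjusted R^2 = 1 - [SSE/(n - K - 1)] / [SST/(n - 1)] for n observations, K independent variables."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))

# Same SSE and SST, but one extra (useless) independent variable:
print(r_squared(20, 100))                   # R^2 is unchanged by the extra variable
print(adjusted_r_squared(20, 100, 25, 4))   # about 0.76
print(adjusted_r_squared(20, 100, 25, 5))   # lower: the extra variable is penalized
```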

       

      13. Additional Topics in Regression Analysis

      • Models are developed through four steps: model specification (selecting the variables, the algebraic form, and the data), coefficient estimation, model verification (checking whether the model is still accurate), and interpretation and inference.
• Dummy variables can represent more than two categories when several dummy variables are used. The rule is: number of dummy variables = number of categories – 1.
• In time-series data, the value of the dependent variable in one period is often related to its values in earlier periods. An earlier value of the dependent variable used as an independent variable is referred to as a lagged dependent variable.
• Not including important independent variables in a model can make any conclusions drawn from this model faulty.
• Multicollinearity is the phenomenon of highly correlated independent variables. It leads to unreliable and potentially misleading coefficient estimates.
• Correlations between error terms are called autocorrelated errors. They lead to biased estimates of the coefficient standard errors, null hypotheses being falsely rejected, and confidence intervals that are too narrow. Autocorrelation can be formally tested with the Durbin-Watson test.
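The Durbin-Watson statistic itself is simple to compute from the regression residuals: d = Σ(e_t – e_(t–1))² / Σe_t². The Python sketch below computes only d; the function name is hypothetical. Deciding whether d indicates autocorrelation still requires the lower and upper critical bounds (d_L, d_U) from a Durbin-Watson table, which the sketch does not provide.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2.

    Values near 2 suggest no first-order autocorrelation; values well below 2
    suggest positive autocorrelation, values well above 2 negative autocorrelation.
    """
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den

print(durbin_watson([1, -1, 1, -1]))  # 3.0: residuals alternate in sign (negative autocorrelation)
print(durbin_watson([1, 1, 1, 1]))    # 0.0: residuals move together (strong positive autocorrelation)
```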

      15. Analysis of Variance

• An analysis of variance (ANOVA) can be used to analyze data at more …
Practice Questions: Statistics for Business and Economics


      12. Multiple Regression

      1. A study was conducted to assess the influence of various factors on the start of new firms in the agricultural industry. For a sample of 70 countries the following model was estimated:

ŷ = -59.31 + 4.983x1 + 2.198x2 + 3.816x3 - 0.310x4 - 0.886x5 + 3.215x6 + 0.85x7
               (1.156)     (0.210)     (2.063)     (0.330)     (3.055)     (1.568)    (0.354)

R² = 0.766

where:

ŷ = new business starts in the industry

      x1 = population in millions

      x2 = industry size

      x3 = measure of economic quality of life

      x4 = measure of political quality of life

      x5 = measure of environmental quality of life

      x6 = measure of health and educational quality of life

      x7 = measure of social quality of life

      The numbers in parentheses under the coefficients are the estimated coefficient standard errors.

      a. Interpret the estimated regression coefficients.

      b. Interpret the coefficient of determination.

      c. Find a 90% confidence interval for the increase in new business starts resulting from a one-unit increase in the economic quality of life, with all other variables unchanged.

      d. Test, against a two-sided alternative at the 5% level, the null hypothesis that, all else remaining equal, the environmental quality of life does not influence new business starts.

      e. Test, against a two-sided alternative at the 5% level, the null hypothesis that, all else remaining equal, the health and educational quality of life does not influence new business starts.

      f. Test the null hypothesis that, taken together, these seven independent variables do not influence new business starts.

      2. Based on 25 years of annual data, an attempt was made to explain savings in Japan. The model fitted was as follows:

      y = b0 + b1x1 + b2x2 + e

      where

      y = change in real deposit rate

      x1 = change in real per capita income

      x2 = change in real interest rate

      The least squares parameter estimates (with standard errors in parentheses) were (Ghatak and Deadman 1989) as follows:

b1 = 0.0974 (0.0215)    b2 = 0.374 (0.209)

      The adjusted coefficient of determination was as follows:

R̄² = 0.91

      a. Find and interpret a 99% confidence interval for b1.

      b. Test, against the alternative that it is positive, the null hypothesis that b2 is 0.

      c. Find the coefficient of determination.

      d. Test the null hypothesis that b1 = b2 = 0.

      e. Find and interpret the coefficient of multiple correlation.

      3. Based on data from 63 countries, the following model was estimated by least squares:

ŷ = 0.58 - 0.052x1 - 0.005x2    R² = 0.17
            (0.019)    (0.042)

      where:

ŷ = growth rate in real gross domestic product

x1 = real income per capita …

Author: Dara Yapp