Inferential Statistics, Howell Chapter 7,8,18



7.1 Sampling Distribution of the Mean

7.2 Testing Hypotheses about Means – σ (pop. standard deviation) known (usually not the case)

7.3 Testing a Sample Mean vs Pop Mean when σ is unknown – The one sample t-test

7.4 Confidence intervals

7.5 Other


7.1 Sampling Distribution of the Mean



  • Used for measurement or quantitative data (instead of categorical data)
  • To analyse differences between groups of subjects or relationships between 2+ variables


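Sampling distribution of the mean: The sampling distribution we obtain when the statistic computed for each sample is the mean (a sampling distribution can be set up for any statistic)


Central Limit Theorem: Basis for the sampling distribution of the mean. For a pop. with mean µ and variance σ², the sampling distribution of the mean has mean µ and variance σ²/n, and it approaches a normal distribution as the sample size n increases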







  • If pop. is skewed, sample sizes of n = 30+ may be needed for the sampling distribution to approximate a normal distribution


Uniform (rectangular) distribution:     mean = (min + max) / 2 (= range / 2 if the minimum is 0)    standard dev = range / √12








If we take samples from this population, the sampling distribution of the mean will approximate a normal distribution better with sample sizes of n = 30 than with n = 5

+ the larger the sample size, the smaller the standard deviation of the sampling distribution
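
The effect of sample size can be checked with a small simulation. The sketch below is only an illustration: the uniform population from 0 to 100, the number of samples and the function name `sample_means` are assumptions, not values from the book.

```python
import random
import statistics

def sample_means(n, n_samples=5000, low=0.0, high=100.0, seed=1):
    """Means of n_samples random samples of size n from a uniform population."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.uniform(low, high) for _ in range(n)])
            for _ in range(n_samples)]

pop_sd = (100.0 - 0.0) / 12 ** 0.5   # range / sqrt(12) for the assumed population

for n in (5, 30):
    means = sample_means(n)
    observed_se = statistics.stdev(means)   # sd of the simulated sampling distribution
    expected_se = pop_sd / n ** 0.5         # sigma / sqrt(n)
    print(f"n={n}: mean of means={statistics.fmean(means):.2f}, "
          f"observed SE={observed_se:.2f}, expected SE={expected_se:.2f}")
```

The observed standard deviation of the sample means should lie close to σ / √n and should be clearly smaller for n = 30 than for n = 5.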

7.2 Testing Hypotheses about Means – σ (pop. standard deviation) known (usually not the case)


  • We can do this by using the z score and table (but not if we do not have the σ of the pop)
  • Usually we do not know the variance of the population we take samples from
  • t-tests are designed for this scenario
  • Central limit theorem states:
    If we take samples of size n from a pop with mean µ (e.g. µ = 50) and variance σ², then the sampling distribution of the mean has mean µ, variance = σ²/n  and  standard dev = σ / √n



Standard error: standard deviation of the sampling distribution → σ / √n




Applied in practice:    z = (X̄ − µ) / (σ / √n)



                !!!!! To test a sample mean vs a pop. mean using t-tests, the sampling distribution of the mean needs to approximate a normal distribution !!!!!
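
A minimal sketch of the z test for a sample mean when σ is known. The numbers in the example call are made up for illustration and `z_test` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_test(sample_mean, pop_mean, pop_sd, n):
    """z statistic and two-tailed p value for a sample mean with known sigma."""
    standard_error = pop_sd / n ** 0.5            # sigma / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Illustrative values only: sample of n = 25 with mean 56, population mean 50, sigma 10
z, p = z_test(sample_mean=56.0, pop_mean=50.0, pop_sd=10.0, n=25)
print(f"z = {z:.2f}, two-tailed p = {p:.4f}")
```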

7.3 Testing a Sample Mean vs Pop Mean when σ is unknown – The one sample t-test


  • σ is not known → has to be estimated using the sample standard deviation. (replace σ with s)
  • z becomes t → can no longer use z-tables but use Student's t distribution
  • If we used z, we would get too many significant results, thus make more than 5% type I errors (reject Ho even though it is true)


Sampling distribution of s²:

  • t-test uses s² as an (unbiased) estimate of σ²
  • Problem: The sampling distribution of s² is positively skewed, especially with small samples (any single s² is more likely to underestimate σ² than to overestimate it)

    The t value obtained with s is therefore likely to be larger than the z value obtained with σ

T-statistic formula:    t = (X̄ − µ) / (s / √n)



                Remember: a sample mean can only be tested against the pop. mean with t if the sample size is big enough
                                        → because: the sampling distribution of the mean needs to be approximately normal




                Student's t distribution:

  • Works with degrees of freedom (df): n – 1 (number of observations in sample – 1)
  • Because: s² is based on the deviations (X − X̄), which must sum to 0 (∑(X − X̄) = 0) → once n – 1 deviations are known, the last one is already determined
  • Skewness of the sampling distribution of s² disappears as the df / sample size increases, and t approaches z
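
The one-sample t statistic and its df can be computed as in the sketch below. The scores are invented for illustration, and the result still has to be compared with a critical value from a t table (2.262 for df = 9 at α = .05, two-tailed).

```python
import statistics

def one_sample_t(scores, pop_mean):
    """t statistic and degrees of freedom for a one-sample t test."""
    n = len(scores)
    s = statistics.stdev(scores)          # sample sd, uses n - 1 in the denominator
    standard_error = s / n ** 0.5         # s / sqrt(n) replaces sigma / sqrt(n)
    t = (statistics.fmean(scores) - pop_mean) / standard_error
    return t, n - 1

scores = [52, 48, 55, 60, 47, 58, 53, 51, 49, 57]   # made-up sample
t, df = one_sample_t(scores, pop_mean=50)
print(f"t({df}) = {t:.3f}")
```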


7.4 Confidence intervals


  • Given to convey meaning of experimental results beyond the hypothesis test


Point estimate: A single value used as an estimate of a parameter. E.g. the sample mean is a point estimate of the pop. mean


Confidence interval: Interval estimate constructed so that a stated proportion (e.g. 95%) of such intervals, over repeated sampling, include the true pop. mean

  • we want to know how big or small the pop. mean could be without being rejected by our data.

Confidence limits: Borders of the confidence interval


Method: Rearrange the formula for the one-sample t test and solve it for µ instead of t






General formula for confidence intervals (credible intervals):    CI = X̄ ± t(α/2) · s / √n, with t(α/2) the two-tailed critical t on n – 1 df
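
A short sketch of the confidence interval on a mean. The data are invented, and the critical value `t_crit` is passed in by hand because it has to be read from a t table; 2.262 is the two-tailed .05 value for df = 9.

```python
import statistics

def confidence_interval(scores, t_crit):
    """Confidence limits: mean +/- t_crit * s / sqrt(n)."""
    n = len(scores)
    mean = statistics.fmean(scores)
    standard_error = statistics.stdev(scores) / n ** 0.5
    return mean - t_crit * standard_error, mean + t_crit * standard_error

scores = [52, 48, 55, 60, 47, 58, 53, 51, 49, 57]   # made-up sample, n = 10
lower, upper = confidence_interval(scores, t_crit=2.262)  # table value for df = 9
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Any value of µ inside the limits would not be rejected by a two-tailed t test at α = .05 on these data.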





Confidence Intervals visualised:



                How to identify extreme cases (population estimates are unreliable)

  • apply this new formula, because the sample size is small and thus the sampling distribution of the variance is skewed
  • Remember: Problem with small samples is that we may calculate a disproportionately large z score
  • Now, instead of using z scores to determine if the score is unlikely (like we learned in the first course), we use this corrected formula: the standard deviation is made bigger, so that the t value will be smaller.


!!! works with degrees of freedom:  n – 1 !!!


7.5 Other


                Bootstrapping: Done to estimate the variability of any sample statistic over repeated sampling

  • Sampling with replacement from obtained data, instead of from population




7.4 Hypothesis Tests applied to Means – Two matched samples

7.5 Hypothesis Tests applied to Means – Two independent samples

7.6 Heterogeneity of Variance: The Behrens-Fisher Problem


7.4 Hypothesis Tests applied to Means – Two matched samples


                Matched samples: (also: repeated measures, related samples, correlated samples, paired samples or dependent samples.) The same subjects respond on two occasions. One set of scores always tells you something about the other set of scores, because they are matched.


                Matched-sample t-test: Tests the difference between the two means

                                                                 (Variables should be independent → may plot the points to check this)

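  • Set up Ho: µ1 = µ2
  • Scores may be combined into difference or gain scores: X1 – X2 = D (diff.) (p199, 7.3), and Ho can be formulated as µD = µ1 - µ2 = 0
  • Create the t test from these difference scores:    t = (D̄ − 0) / (sD / √n), where D̄ and sD are the mean and standard deviation of the difference scores
  • Calculate     df = n - 1 (n = number of pairs)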
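
The steps above can be sketched as follows. The before/after scores are invented, and `matched_t` is a hypothetical helper name.

```python
import statistics

def matched_t(x1, x2):
    """Matched-sample t test on difference scores D = X1 - X2 (Ho: mu_D = 0)."""
    diffs = [a - b for a, b in zip(x1, x2)]
    n = len(diffs)                                  # number of pairs
    standard_error = statistics.stdev(diffs) / n ** 0.5
    t = (statistics.fmean(diffs) - 0) / standard_error
    return t, n - 1

before = [10, 12, 9, 14, 11, 13]    # made-up scores, occasion 1
after = [12, 15, 9, 17, 12, 16]     # same subjects, occasion 2
t, df = matched_t(before, after)
print(f"t({df}) = {t:.3f}")
```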


Missing Data: 2 ways of dealing with this:

1. Exclude cases with missing scores

2. Run one t-test on the complete pairs and one on the cases with a missing score, then combine these and compare the result with special tables.


7.5 Hypothesis Tests applied to Means – Two independent samples


                Sampling distribution of differences between means:

  • We sample independently from each population
  • The sum or difference of two independent normally distributed variables is itself normally distributed.
  • Variances should be σ²1 = σ²2 = σ²
  • (remember however that t tests are robust = more or less unaffected by small departures from the assumptions)


                Variance sum law: The variance of a sum or difference of two independent variables is equal to the sum of their variances.


      !!! Variances of the 2 samples have to be equal or at least similar !!!!

                                                      (e.g. before the experiment, we always check that the samples are as similar as possible, so we may attribute differences to our experiment and not to error variance)

                                                      If sample sizes differ → use pooling (see below)






                                                                          Formula of the variance sum law:    σ²(X̄1 − X̄2) = σ²(X̄1) + σ²(X̄2) = σ²1 / n1 + σ²2 / n2


2 independent variables combined into the sampling distribution of mean differences.

Pop σ is known → use z score and table

Standard error of differences between means (stand. dev. of the sampling distribution)


                T-test statistic of sampling distribution of mean difference: (pooling)    t = ((X̄1 − X̄2) − (µ1 − µ2)) / s(X̄1 − X̄2)

Under Ho, µ1 - µ2 = 0, therefore we may drop the term from the formula

Pop σ is not known → use t score and table (also df)



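                Pooling of variance (used when sample sizes differ) + (only when variances are homogeneous)


  1. Step: Weighted average of s²1 and s²2 → use the degrees of freedom as weights:    s²p = ((n1 – 1) s²1 + (n2 – 1) s²2) / (n1 + n2 – 2)

  2. Step: Use the pooled variance estimate in the t formula:    t = (X̄1 − X̄2) / √(s²p (1/n1 + 1/n2))


Don't forget: df = n1 + n2 – 2 on the t-table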




Degrees of freedom: Because we estimate two variances, we lose 1 df for each, thus df = (n1 – 1) + (n2 – 1) = n1 + n2 – 2

  • only holds for independent samples (example calculations: p211, p216)
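
A sketch of the pooled-variance t test for two independent samples of unequal size. The two groups are invented data and `pooled_t` is a hypothetical helper name.

```python
import statistics

def pooled_t(group1, group2):
    """Independent-samples t test with pooled variance; returns t, df, pooled variance."""
    n1, n2 = len(group1), len(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    # Step 1: average the two variances, weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    # Step 2: standard error of the difference between means
    standard_error = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    t = (statistics.fmean(group1) - statistics.fmean(group2)) / standard_error
    return t, n1 + n2 - 2, pooled_var

group1 = [4, 6, 5, 7, 8]        # made-up data, n1 = 5
group2 = [1, 3, 2, 4, 5, 3]     # made-up data, n2 = 6
t, df, pooled_var = pooled_t(group1, group2)
print(f"pooled variance = {pooled_var:.3f}, t({df}) = {t:.3f}")
```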


7.6 Heterogeneity of Variance: The Behrens-Fisher Problem


                Heterogeneous variances: Use t′ computed with the separate variance estimates → t′ is not necessarily distributed as t on n1 + n2 – 2 df

  • Behrens-Fisher Problem: finding the distribution of t′ (they tried to create a table for this distribution but they couldn't calculate the t for high degrees of freedom)






  • Welch-Satterthwaite solution: compare t′ with the ordinary t table on adjusted df′ (df′ is estimated from the data and taken to the nearest integer) → df′ is bounded as: Min (n1 – 1, n2 – 1) ≤ df′ ≤ (n1 + n2 – 2)
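
A sketch of t′ with separate variances and the Welch-Satterthwaite df′. The formula for df′ used here is the standard Welch-Satterthwaite approximation; the data and the name `welch_t` are illustrative assumptions.

```python
import statistics

def welch_t(group1, group2):
    """t' with separate variance estimates and Welch-Satterthwaite df'."""
    n1, n2 = len(group1), len(group2)
    v1 = statistics.variance(group1) / n1     # s1^2 / n1
    v2 = statistics.variance(group2) / n2     # s2^2 / n2
    t_prime = (statistics.fmean(group1) - statistics.fmean(group2)) / (v1 + v2) ** 0.5
    df_prime = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_prime, df_prime

group1 = [4, 6, 5, 7, 8]        # made-up data
group2 = [1, 3, 2, 4, 5, 3]     # made-up data
t_prime, df_prime = welch_t(group1, group2)
print(f"t' = {t_prime:.3f}, df' = {df_prime:.2f} (use {round(df_prime)} in the t table)")
```

For these data df′ falls between Min(n1 – 1, n2 – 1) = 4 and n1 + n2 – 2 = 9, as the bound above requires.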



Testing for heterogeneity: Test the difference between the variances of our samples, s²1 and s²2

  • By replacing each value of X with its absolute deviation from the group mean


 dij = |Xij - X̄j|


  • Or by the squared deviation


 dij = (Xij - X̄j)²


  • Then run a normal two-sample t-test on the dij values
  • If t turns out to be significant, we may conclude that the 2 samples differ in their variances
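
The procedure can be sketched with absolute deviations and a pooled two-sample t test on them. The data and helper names are invented for illustration; the resulting t is compared with the t table on n1 + n2 – 2 df.

```python
import statistics

def pooled_t(a, b):
    """Ordinary two-sample t test with pooled variance."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    standard_error = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / standard_error, n1 + n2 - 2

def abs_deviations(group):
    """d_ij = |X_ij - group mean| for every score in the group."""
    mean = statistics.fmean(group)
    return [abs(x - mean) for x in group]

group1 = [4, 6, 5, 7, 8]          # made-up data, small spread
group2 = [1, 9, 2, 10, 3, 11]     # made-up data, large spread
t, df = pooled_t(abs_deviations(group1), abs_deviations(group2))
print(f"t({df}) on absolute deviations = {t:.3f}")
```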


Testing for homogeneity: Run a test for homogeneity (not yet learned?)

  • If the variances are homogeneous, then pool the variance estimates; if not, use the separate variance estimates.




7.0 Confidence Intervals

One sample case

Two sample case

7.1 Effect size

Cohen´s d

One sample case

Two sample case

8.0 Power

Factors affecting power

Calculation of power

Estimating the required sample size

8.1 Noncentrality parameter δ

8.2 Retrospective Power


7.0 Confidence Intervals


One sample case



    Solve t formula for µ instead of t:    CI = X̄ ± t(α/2) · s / √n


Two sample case


  • Solve t formula for µ1 − µ2 instead of t (like in the one sample case)
  • Use the difference between the means and the standard error of differences between means instead of the mean and the SE of the mean:    CI = (X̄1 − X̄2) ± t(α/2) · s(X̄1 − X̄2)







7.1 Effect size


  • Used when we examine differences between 2 related measures.
  • Confidence limits on effect size based on previous research are biased (narrower confidence limits than true), because only significant findings are published




Cohen´s d




Reports the difference between means in standard deviation units:    d = (µ1 − µ2) / σ




One sample case


Estimate of d (as in example from the book, p. 204)



Two sample case
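
For the two sample case, one common estimate of d divides the difference between the sample means by the pooled standard deviation. Using the pooled sd as the standardizer is an assumption here (other choices, such as the sd of a control group, exist), and the data are invented.

```python
import statistics

def cohens_d(group1, group2):
    """Estimate of d: mean difference in pooled standard deviation units."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    return (statistics.fmean(group1) - statistics.fmean(group2)) / pooled_var ** 0.5

group1 = [4, 6, 5, 7, 8]        # made-up data
group2 = [1, 3, 2, 4, 5, 3]     # made-up data
d = cohens_d(group1, group2)
print(f"d = {d:.2f}")           # the means differ by about 2 standard deviations
```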



8.0 Power


                Power: Probability of correctly rejecting a false Ho. More power = higher probability of rejecting Ho when it is false

                                Power = 1 - β







Figure 8.4



Factors affecting power


  • Alpha (α). The larger α, the more power
  • Distance between means. The larger the distance between the mean under Ho and the mean under H1, the greater the power
  • Sample size (n). If n increases, std. err. decreases → overlap between sampling distr. decreases, thus higher power
  • Variance (σ²). If σ² decreases → overlap betw. sampling distr. decreases, thus higher power

                           →  variance of the sampling distr. is bound to sample size, because σ²X̄ = σ² / n

Calculation of power



    Because overlap is what determines power, we may use Cohen's d to assess how far the means differ, and thus infer power from the size of d.


  • 3 methods to estimate d:

1. Prior research findings

2. Personal assessment of what difference would be important

3. Use of Cohen's table



    Combine effect size (d) with sample size (n) → find delta (δ), e.g. δ = d√n in the one sample case




Estimating the required sample size

8.1 Noncentrality parameter δ





  • If Ho = true, t is distributed around zero

    If Ho = not true, t is distributed around δ (noncentrality parameter) → expresses the degree of wrongness of Ho
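
A rough sketch of how d, n, δ and power hang together in the one sample case. It uses δ = d√n and a normal approximation to the distribution of the test statistic instead of the power table, so the values are approximate; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def delta_one_sample(d, n):
    """Noncentrality parameter for the one sample case: delta = d * sqrt(n)."""
    return d * math.sqrt(n)

def power_from_delta(delta, alpha=0.05):
    """Approximate two-tailed power, treating the test statistic as normal around delta."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return (1 - nd.cdf(z_crit - delta)) + nd.cdf(-z_crit - delta)

def required_n_one_sample(d, delta):
    """Sample size needed to reach a given delta: n = (delta / d)^2, rounded up."""
    return math.ceil((delta / d) ** 2)

print(round(power_from_delta(delta_one_sample(d=0.5, n=25)), 3))  # power for d = .5, n = 25
print(required_n_one_sample(d=0.5, delta=2.8))                    # n for power of about .80
```

With δ = 0 (Ho true) the function returns α, and with δ = 2.80 it returns a power of about .80 at α = .05.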


8.2 Retrospective Power


                A priori power: Power that is calculated before an experiment. Based on estimates of population parameters (means, variances, correlations, proportions)


                Retrospective (or post-hoc) power: Calculated after the experiment.  Done with the G*Power tool (p. 244)

                Purpose:  Help to design future research, evaluate studies in the literature (meta-analysis)



18.0 Recap


            Parametric tests


T-Test: - uses sample variance as pop var. estimate. → assumption that the population from which the sample is drawn is normal.

            Non-parametric tests / Distribution free tests


  • Fall under the resampling tests (base conclusions on drawing a large number of samples under the assumption that Ho = true) → then they compare the obtained sample result with the resampled results


  • Some resampling procedures deal with raw scores, rather than with ranks
    → Bootstrapping + Randomization tests

→ Used when we are uncertain of assumptions (e.g. normal distr. of population)
→ Used also when we do not have good parametric tests (e.g. Conf. Int. on a median)



- Require more general (less restrictive) assumptions

- Lower power

- Are sensitive to medians rather than means

- Less specific

- Largely unaffected by outliers




→ Ho is usually that the 2 populations are identical; if the populations are symmetric or have a similar shape, rejecting it points to different central tendencies



                Bootstrapping: Interested in e.g. a median, whose sampling distribution and SE cannot be derived analytically

                        → procedure is with replacement


                Permutation tests: → procedure without replacement


                Rank-randomization tests: Wilcoxon's test and permutation test (draw every possible permutation only once)

18.1 Bootstrapping


                Use:       - Population distribution is not normal or unknown

                                - To estimate pop. parameters rather than testing hypotheses

                                - If we want a confidence interval on a statistic other than the mean

                                - sampling is done with replacement

18.2 Bootstrapping with one sample


                Finding a 95% confidence interval (example, p.661)


  • Assumption: population distr. = sample distribution
  • Draw a large number of samples (with replacement) under this assumption with n = 20
  • Determine which values encompass the middle 95% → compute the median of each sample, sort the medians and cut off the lowest and highest 2.5%
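
The three steps can be sketched as a percentile bootstrap for the median. The 20 scores, the number of resamples and the function name are illustrative assumptions, not the book's example.

```python
import random
import statistics

def bootstrap_median_ci(data, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap confidence limits for the median."""
    rng = random.Random(seed)
    n = len(data)
    # Resample n scores WITH replacement from the obtained data, n_boot times
    medians = sorted(statistics.median(rng.choices(data, k=n)) for _ in range(n_boot))
    cut = int(n_boot * (1 - level) / 2)       # number of medians cut off at each end
    return medians[cut], medians[n_boot - cut - 1]

data = [12, 15, 9, 22, 17, 14, 30, 11, 16, 13,
        19, 25, 10, 18, 14, 21, 12, 16, 27, 15]   # made-up sample, n = 20
lower, upper = bootstrap_median_ci(data)
print(f"95% bootstrap CI on the median: {lower} to {upper}")
```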

















18.6 Wilcoxon´s Rank-Sum Test


            Use: Analogue to the t-test for two independent samples, but it tests a broader Ho.

→ Ho = 2 samples are drawn at random from identical populations (not just pop. with the same mean)

→ if Ho is rejected, this is usually taken to mean that the 2 pop. had different central tendencies


            How it works:


  • Assign ranks to the observations of the 2 independent samples (ranked together as one group)
  • Add the ranks for each sample = W test statistic
  • Check with the W table if significant or not


















Add Rank-Scores to get W test statistic















Now compare Ws (sum of ranks of the smaller sample!!!) to the W table, which shows the smallest value that can be expected by chance if Ho = true.


  • Scores of the smaller sample can be big, which means that if Ho = false, the sum of the ranks would be larger than chance expectation instead of smaller.
  • In that case calculate W′s = 2W̄ - Ws (2W̄ is given in the W table)



2W̄ = n1 (n1 + n2 + 1)


  • Use W′s or Ws (whichever is smaller) to compare it to the table.
  • Two tailed test: Double the value of α








The normal approximation


                The distribution of Ws approaches normal when the sample sizes increase








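Parameters of the Ws distribution:    mean = n1 (n1 + n2 + 1) / 2    standard error = √(n1 n2 (n1 + n2 + 1) / 12)



                We can use z, because Ws is approximately normally distributed for large samples:    z = (Ws − mean) / standard error


  • Use z to calculate the probability of obtaining a Ws as low as the one we got.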
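
A sketch of the normal approximation, using the standard mean and standard error of the rank sum. The input values are invented and deliberately small so the arithmetic is easy to follow; in practice the approximation is meant for larger samples.

```python
from statistics import NormalDist

def rank_sum_z(ws, n1, n2):
    """z for Ws under Ho and the one-tailed probability of a Ws this low or lower."""
    mean = n1 * (n1 + n2 + 1) / 2
    standard_error = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (ws - mean) / standard_error
    return z, NormalDist().cdf(z)

z, p_lower = rank_sum_z(ws=11, n1=4, n2=5)     # illustrative values only
print(f"z = {z:.3f}, one-tailed p = {p_lower:.4f}")
```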


Example (p.672)



            Treatment of Ties


  • When the data contain tied scores, a test that relies on ranks is distorted
  • Assign the tied ranks so that Ho gets hard to reject

Mann-Whitney U Statistic


  • Competitor of Wilcoxon's rank-sum test
  • U and W differ only by a constant
  • U and W can be converted into each other, so the W table can be used









18.7 Wilcoxon´s Matched-Pairs Signed-Ranks Test


  • Used when the sample scores do not appear to reflect a normally distributed population.
  • Nonparametric analogue to the t-test for matched samples
  • Tests Ho that the distribution of difference scores (in the population) is symmetric about zero.



How to use:


  1. Calculate difference scores
  2. Rank all differences without regard to the sign
  3. Sum the ranks of the positive and of the negative differences separately
  4. The smaller of the two sums (ignoring the sign) is the T test statistic
  5. Evaluate against the T table


Relevant T Score

















  • If a difference score is 0, eliminate the participant from consideration
  • Assign tied ranks to tied difference scores
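
The procedure can be sketched as follows. The paired scores are invented, zero differences are dropped, and ties receive the average of the ranks they occupy (an assumption about the tie convention).

```python
def average_ranks(values):
    """Rank from 1 upward; tied values get the mean of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def signed_rank_T(x1, x2):
    """Wilcoxon T: smaller of the positive and negative rank sums, plus the n used."""
    diffs = [a - b for a, b in zip(x1, x2) if a != b]     # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])        # rank without regard to sign
    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(positive, negative), len(diffs)

before = [10, 12, 9, 14, 11, 13, 15, 8]     # made-up scores
after = [12, 15, 9, 13, 16, 17, 21, 15]     # same subjects, second occasion
T, n = signed_rank_T(before, after)
print(f"T = {T} with n = {n} non-zero differences")
```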


The normal Approximation


  • Large sample size = T is approx. normally distributed, with mean = n (n + 1) / 4 and standard error = √(n (n + 1)(2n + 1) / 24)













18.8 The Sign Test


  • Gain even more freedom from assumptions than Wilcoxon test
  • Lose power


How to

  1. Give difference scores a + or – sign
  2. Count the + (or –) signs and calculate the probability of that outcome (with binomial distribution tables, p = .5 under Ho). E.g. p(13) of 16
  3. Alternatively, use the χ² test (Chi-Square) p.678
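
The binomial probability in step 2 can also be computed directly instead of being read from a table. The sketch assumes Ho: p = .5 and sums the probabilities of the observed count and all more extreme counts; `sign_test_p` is a hypothetical helper name.

```python
from math import comb

def sign_test_p(n_plus, n):
    """One- and two-tailed binomial probabilities for the sign test under p = .5."""
    k = max(n_plus, n - n_plus)                       # the more frequent sign
    one_tailed = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return one_tailed, min(1.0, 2 * one_tailed)

one_tailed, two_tailed = sign_test_p(n_plus=13, n=16)   # 13 of 16 differences positive
print(f"one-tailed p = {one_tailed:.4f}, two-tailed p = {two_tailed:.4f}")
```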