Inferential Statistics, Howell Chapters 7, 8, 18
Contents
7.1 Sampling Distribution of the Mean
7.2 Testing Hypotheses about Means – σ (pop. standard deviation) known (usually not the case)
7.3 Testing a Sample Mean vs Pop. Mean when σ is unknown – The one-sample t-test
7.1 Sampling Distribution of the Mean
Function:
• Used for measurement (quantitative) data instead of categorical data
• To analyse differences between groups of subjects or relationships between 2+ variables
Sampling distribution of the mean: The sampling distribution obtained when the statistic computed on each sample is the mean (a sampling distribution can be set up for any statistic)
Central Limit Theorem: Basis for setting up the sampling distribution of the mean
• If the pop. is skewed, sample sizes of n = 30+ are needed for the sampling distribution to approximate a normal distribution
Uniform (rectangular) distribution: mean = (min + max) / 2 (= range / 2 if min = 0), standard dev = range / √12
• If we take samples from this population, the sampling distribution approximates a normal distribution better with sample sizes of n = 30 than with n = 5
• The larger the sample size, the smaller the standard deviation of the sampling distribution
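The shrinking spread of the sampling distribution can be checked with a small simulation. This is only a sketch: the uniform population from 0 to 100, the number of draws and the function name sample_means are made up for illustration.

```python
import random
import statistics

def sample_means(n, draws=5000, low=0.0, high=100.0, seed=1):
    """Means of `draws` random samples of size n from a uniform(low, high) population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.uniform(low, high) for _ in range(n))
            for _ in range(draws)]

# Population SD of a uniform distribution: range / sqrt(12)
sigma = (100.0 - 0.0) / 12 ** 0.5

for n in (5, 30):
    means = sample_means(n)
    observed = statistics.stdev(means)   # SD of the simulated sampling distribution
    expected = sigma / n ** 0.5          # standard error predicted by the CLT
    print(n, round(observed, 2), round(expected, 2))
```

The observed and predicted values should be close to each other, and both are clearly smaller for n = 30 than for n = 5.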
7.2 Testing Hypotheses about Means – σ (pop. standard deviation) known (usually not the case)
• If σ is known, we can test the sample mean using the z score and z table (not possible if we do not have the σ of the pop.)
• Usually we do not know the variance of the population we take samples from
• t-tests are designed for this scenario
• Central limit theorem states:
If we take samples of size n from a pop. with mean µ (e.g. µ = 50) and variance σ², the sampling distribution of the mean has mean µ, variance σ²/n and standard dev σ/√n
Standard error: standard deviation of the sampling distribution → σ/√n
Applied in practice: z = (X̄ − µ) / (σ/√n)
!!!!! To test a sample mean vs a pop. mean using t-tests, the sampling distribution needs to approximate a normal distribution !!!!!
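A minimal sketch of the z test for a sample mean when σ is known. The numbers (sample mean 53, µ = 50, σ = 10, n = 25) and the function name z_test are invented for illustration; NormalDist from the standard library stands in for the z table.

```python
from statistics import NormalDist

def z_test(sample_mean, mu, sigma, n):
    """z statistic and two-tailed p value for a sample mean, sigma known."""
    se = sigma / n ** 0.5                      # standard error of the mean
    z = (sample_mean - mu) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))     # area in both tails
    return z, p

z, p = z_test(sample_mean=53.0, mu=50.0, sigma=10.0, n=25)
print(round(z, 2), round(p, 3))   # z = 1.5, two-tailed p is about 0.134
```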
7.3 Testing a Sample Mean vs Pop. Mean when σ is unknown – The one-sample t-test
• σ is not known → has to be estimated using the sample standard deviation (replace σ with s)
• z becomes t → can no longer use z tables; use Student's t distribution instead
• If we used z, we would get too many significant results and thus make more than 5% type I errors (reject H0 even though it is true)
Sampling distribution of s²:
• t-test uses s² as an (unbiased) estimate of σ²
• Problem: The sampling distribution of s² is positively skewed, especially with small samples (any single s² is more likely to underestimate σ² than to overestimate it)
→ The t value obtained from s is likely to be larger than the z value obtained from σ
t statistic formula: t = (X̄ − µ) / (s/√n)
Remember: the t statistic can only be used to compare the sample mean to the pop. mean if the sample size is big enough
→ because: the sampling distribution of the mean needs to be approximately normal
Student's t distribution:
• Works with degrees of freedom (df): n − 1 (number of observations in sample − 1)
• Because: s² = ∑(X − X̄)² / (n − 1), and the deviations X − X̄ must sum to 0 → once n − 1 deviations are known, the last one is already determined
• The skewness of the sampling distribution of s² (and with it the difference between t and z) disappears as df / sample size increases
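A small worked sketch of the one-sample t statistic, t = (X̄ − µ) / (s/√n) with df = n − 1. The ten scores and µ = 50 are made up; the function name one_sample_t is only illustrative.

```python
import statistics

def one_sample_t(scores, mu):
    """t statistic and df for testing a sample mean against a population mean."""
    n = len(scores)
    s = statistics.stdev(scores)               # sample SD (n - 1 in the denominator)
    t = (statistics.fmean(scores) - mu) / (s / n ** 0.5)
    return t, n - 1

scores = [52, 48, 55, 60, 47, 53, 58, 51, 49, 57]   # made-up data, mean = 53
t, df = one_sample_t(scores, mu=50)
print(round(t, 3), df)   # t is about 2.145 on 9 df
```

With 9 df the two-tailed critical value at α = .05 in the t table is 2.262, so this made-up result would not be significant.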
7.4 Confidence intervals
• Given to convey the meaning of experimental results beyond the hypothesis test
Point estimate: A single value used to estimate a parameter. E.g. the sample mean is a point estimate of the pop. mean
Confidence interval: Interval estimate; with a 95% interval, 95% of the intervals constructed this way over repeated sampling would include the true pop. mean
• We want to know how big or small the pop. mean could be without us rejecting it.
Confidence limits: Borders of the confidence interval
Method: Rearrange the formula for the one-sample t-test. This time solve it not for t but for µ
General formula for confidence intervals (credible intervals): CI = X̄ ± t(α/2, df) · s/√n
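The interval X̄ ± t · s/√n can be sketched as follows. The data are made up, and the critical value 2.262 (two-tailed, α = .05, df = 9) is read from a t table and passed in by hand, since this sketch uses only the standard library.

```python
import statistics

def mean_confidence_interval(scores, t_crit):
    """Confidence limits on the mean: mean +/- t_crit * standard error."""
    n = len(scores)
    mean = statistics.fmean(scores)
    se = statistics.stdev(scores) / n ** 0.5
    return mean - t_crit * se, mean + t_crit * se

scores = [52, 48, 55, 60, 47, 53, 58, 51, 49, 57]   # made-up data, n = 10
low, high = mean_confidence_interval(scores, t_crit=2.262)
print(round(low, 2), round(high, 2))   # about 49.84 and 56.16
```

Any hypothesised µ inside these limits (e.g. µ = 50) would not be rejected by the corresponding two-tailed t-test.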
[Figure: confidence intervals visualised]
How to identify extreme cases (when population estimates are unreliable)
• Apply a corrected t formula, because the sample size is small and thus the sampling distribution of the variance is skewed
• Remember: The problem with small samples is that we may calculate a disproportionately large z score
• Instead of using z scores to determine whether a score is unlikely (as in the first course), we use the corrected formula: the standard deviation is made bigger, so that the t value becomes smaller
!!! works with degrees of freedom: n − 1 !!!
7.5 Other
Bootstrapping: Done to estimate the variability of any sample statistic over repeated sampling
• Sampling with replacement from the obtained data instead of from the population
Contents
7.4 Hypothesis Tests applied to Means – Two matched samples
7.5 Hypothesis Tests applied to Means – Two independent samples
7.6 Heterogeneity of Variance: The Behrens-Fisher Problem
7.4 Hypothesis Tests applied to Means – Two matched samples
Matched sample: (also: repeated measures, related samples, correlated samples, paired samples or dependent samples) The same subjects respond on two occasions. One set of scores always tells you something about the other set of scores, because they are matched.
Matched-sample t-test: Tests the difference between the means
(The pairs should be independent of one another → may plot the points to check this)
• Set up H0: µ1 = µ2
• Scores may be combined into difference or gain scores: X1 − X2 = D (diff.) (p. 199, 7.3), and H0 can be formulated as µD = µ1 − µ2 = 0
• Create the t-test from these difference scores: t = (D̄ − 0) / (sD/√n)
• Calculate df = n − 1 (n = number of pairs)
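The steps above can be sketched as a one-sample t-test on the difference scores. The two sets of scores are made up and the function name matched_t is only illustrative.

```python
import statistics

def matched_t(x1, x2):
    """Matched-sample t: one-sample t-test on the difference scores D = X1 - X2."""
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)                                  # number of pairs
    se = statistics.stdev(d) / n ** 0.5         # standard error of the mean difference
    return statistics.fmean(d) / se, n - 1

x1 = [10, 12, 9, 14, 11, 13]   # made-up scores, occasion 1
x2 = [8, 11, 9, 10, 9, 12]     # same subjects, occasion 2
t, df = matched_t(x1, x2)
print(round(t, 3), df)   # t is about 2.988 on 5 df
```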
Missing Data: 2 ways of dealing with this:
1. Exclude cases with missing scores
2. Run one t-test on the complete pairs and one on the cases with a missing score, then combine and compare these using special tables.
7.5 Hypothesis Tests applied to Means – Two independent samples
Sampling distribution of differences between means:
• We sample independently from each population
• The sum or difference of two independent normally distributed variables is itself normally distributed.
• Variances should be equal: σ²1 = σ²2 = σ²
• (remember however that t-tests are robust = more or less unaffected by small departures from the assumptions)
Variance sum law: The variance of a sum or difference of two independent variables is equal to the sum of their variances.
!!! Variances of the 2 samples have to be equal or at least similar !!!
(e.g. before the experiment, we always check that the samples are as similar as possible, so that we may attribute differences to our experiment and not to error variance)
If the sample sizes differ → use pooling (see below)
Formula of the variance sum law (2 independent variables combined into the sampling distribution of mean differences): σ²(X̄1 − X̄2) = σ²1/n1 + σ²2/n2
Standard error of differences between means (stand. dev.): σ(X̄1 − X̄2) = √(σ²1/n1 + σ²2/n2)
Pop. σ is known → use z score and table: z = ((X̄1 − X̄2) − (µ1 − µ2)) / √(σ²1/n1 + σ²2/n2)
Pop. σ is not known → use t score and table (also df): t = ((X̄1 − X̄2) − (µ1 − µ2)) / √(s²1/n1 + s²2/n2); with pooling, s²1 and s²2 are replaced by the pooled variance estimate
Under H0, µ1 − µ2 = 0, therefore we may drop this term from the formula
Pooling of variance (used when sample sizes differ) + (only when variances are homogeneous)
• Step 1: Take a weighted average of s²1 and s²2 → weight each by its degrees of freedom
• Step 2: Pooled variance estimate: s²p = ((n1 − 1)s²1 + (n2 − 1)s²2) / (n1 + n2 − 2), then t = (X̄1 − X̄2) / √(s²p(1/n1 + 1/n2))
Don't forget: n1 + n2 − 2 df on the t table
Degrees of freedom: Each sample variance costs 1 df ((n1 − 1) + (n2 − 1)), thus subtract 2 from n1 + n2
• only applies to independent samples (example calculations: p. 211, p. 216)
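A sketch of the pooled-variance t-test for two independent samples of unequal size. The data and the function name pooled_t are invented for illustration, and homogeneous variances are assumed.

```python
import statistics

def pooled_t(x1, x2):
    """Two-sample t with pooled variance; returns t, df and the pooled variance."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)
    # Weighted average of the two variances, weights = degrees of freedom
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5       # SE of the difference between means
    t = (statistics.fmean(x1) - statistics.fmean(x2)) / se
    return t, n1 + n2 - 2, sp2

x1 = [5, 7, 6, 9, 8]        # made-up group 1, n1 = 5
x2 = [3, 4, 6, 5, 2, 4]     # made-up group 2, n2 = 6
t, df, sp2 = pooled_t(x1, x2)
print(round(t, 3), df, round(sp2, 3))   # t is about 3.324 on 9 df, pooled variance about 2.222
```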
7.6 Heterogeneity of Variance: The Behrens-Fisher Problem
Heterogeneous variances: Use t′ (computed with the separate, unpooled variances) → not necessarily distributed as t on n1 + n2 − 2 df
• Behrens-Fisher problem: (they tried to create a table for this distribution but could not calculate t′ for high degrees of freedom)
• Welch-Satterthwaite solution: compute adjusted df′ (usually not a whole number, so rounded to the nearest integer) → df′ is bounded by: Min(n1 − 1, n2 − 1) ≤ df′ ≤ (n1 + n2 − 2)
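A sketch of t′ with the Welch-Satterthwaite df′. The formula used here, df′ = (s²1/n1 + s²2/n2)² / ((s²1/n1)²/(n1 − 1) + (s²2/n2)²/(n2 − 1)), is the standard one and is not spelled out in these notes; the data and the function name welch_t are made up.

```python
import statistics

def welch_t(x1, x2):
    """t' with unpooled variances and the Welch-Satterthwaite df'."""
    n1, n2 = len(x1), len(x2)
    a = statistics.variance(x1) / n1            # s1^2 / n1
    b = statistics.variance(x2) / n2            # s2^2 / n2
    t = (statistics.fmean(x1) - statistics.fmean(x2)) / (a + b) ** 0.5
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

x1 = [5, 7, 6, 9, 8]            # made-up group with small variance
x2 = [1, 9, 4, 12, 2, 8]        # made-up group with large variance
t, df = welch_t(x1, x2)
print(round(t, 3), round(df, 2))   # t' is about 0.525, df' about 6.52
```

Here df′ falls between Min(n1 − 1, n2 − 1) = 4 and n1 + n2 − 2 = 9, as the bound requires.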
Testing for heterogeneity: Test the difference between the variances of our samples, s²1 and s²2
• By replacing each value of X with its absolute deviation from its group mean
dij = |Xij − X̄j|
• Or by the squared deviation
dij = (Xij − X̄j)²
• Then run a normal two-sample t-test on the dij values
• If t turns out to be significant, we may conclude that the 2 samples differ in their variances
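The procedure above can be sketched like this: convert each score to its deviation from the group mean, then run an ordinary pooled two-sample t-test on the deviations. The data and the function names are made up for illustration.

```python
import statistics

def deviation_scores(x, squared=False):
    """Absolute (or squared) deviation of each score from its group mean."""
    m = statistics.fmean(x)
    return [(v - m) ** 2 if squared else abs(v - m) for v in x]

def two_sample_t(x1, x2):
    """Ordinary pooled-variance t-test for two independent samples."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * statistics.variance(x1)
           + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    return (statistics.fmean(x1) - statistics.fmean(x2)) / se, n1 + n2 - 2

x1 = [5, 7, 6, 9, 8]            # made-up group with small spread
x2 = [1, 9, 4, 12, 2, 8]        # made-up group with large spread
t, df = two_sample_t(deviation_scores(x1), deviation_scores(x2))
print(round(t, 3), df)   # t is about -3.043 on 9 df
```

A negative t here just means the first group has the smaller average deviation, i.e. the smaller variability.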
Testing for homogeneity: Run a test for homogeneity (not yet learned?)
• If the variances are homogeneous, pool the variance estimates; if not, use t′ with df′.
7.0 Confidence Intervals
One sample case
• Solve the t formula for µ instead of t: CI = X̄ ± t(α/2, df) · s/√n
Two sample case
• Solve the t formula for µ1 − µ2 instead of t (like in the one sample case)
• Use the difference between the means and the standard error of differences between means instead of the mean and the SE of the mean: CI = (X̄1 − X̄2) ± t(α/2, df) · s(X̄1 − X̄2)
7.1 Effect size
• Used when we examine differences between 2 related measures.
• Confidence limits on effect sizes based on previous research are biased (narrower than the true limits)
• Because only significant findings are published
• Reports the difference in standard deviation units
One sample case
Estimate of d (as in the example from the book, p. 204)
Two sample case
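A sketch of how d expresses a difference in standard deviation units. The choice of standardiser is an assumption made here (sample SD in the one sample case, pooled SD in the two sample case), which is a common option but not necessarily the one used in the book's example; data and function names are made up.

```python
import statistics

def d_one_sample(scores, mu):
    """Difference between sample mean and mu in sample-SD units (assumed standardiser)."""
    return (statistics.fmean(scores) - mu) / statistics.stdev(scores)

def d_two_sample(x1, x2):
    """Difference between two means in pooled-SD units (assumed standardiser)."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * statistics.variance(x1)
           + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    return (statistics.fmean(x1) - statistics.fmean(x2)) / sp2 ** 0.5

scores = [52, 48, 55, 60, 47, 53, 58, 51, 49, 57]    # made-up data
d1 = d_one_sample(scores, mu=50)
d2 = d_two_sample([5, 7, 6, 9, 8], [3, 4, 6, 5, 2, 4])
print(round(d1, 3), round(d2, 3))   # about 0.678 and 2.012
```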
8.0 Power
Power: Probability of correctly rejecting a false H0. More power = higher probability of rejecting H0 when it is false
Power = 1 − β (β = probability of a type II error)
Figure 8.4
Factors affecting power
• Alpha (α). The larger α, the more power
• Distance between means. The larger the distance between the mean under H0 and the mean under H1, the bigger the power
• Sample size (n). If n increases, the std. err. decreases → overlap between the sampling distributions decreases, thus higher power
• Variance (σ²). If σ² decreases → overlap between the sampling distributions decreases, thus higher power
→ the variance of the sampling distribution is tied to the sample size, because σ²X̄ = σ²/n
Calculation of power
Because the overlap determines power, we may use Cohen's d to assess how far the means differ, and thus infer power from the size of d.
• 3 methods to estimate d:
1. Prior research findings
2. Personal assessment of what difference would be important
3. Use of Cohen's table (conventions for small, medium and large effects)
Combine effect size (d) with sample size (n) → find delta (δ), e.g. δ = d√n in the one sample case
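A sketch of how δ translates into power for the one sample case, using the normal approximation instead of a power table. The values d = 0.5, n = 25 and the function name power_one_sample are made up; exact power based on the noncentral t distribution would differ slightly.

```python
from statistics import NormalDist

def power_one_sample(d, n, alpha=0.05):
    """delta = d * sqrt(n) and approximate two-tailed power (normal approximation)."""
    nd = NormalDist()
    delta = d * n ** 0.5
    z_crit = nd.inv_cdf(1 - alpha / 2)          # critical z for a two-tailed test
    # Probability of landing in either rejection region when H1 is true
    power = nd.cdf(delta - z_crit) + nd.cdf(-delta - z_crit)
    return delta, power

delta, power = power_one_sample(d=0.5, n=25)
print(round(delta, 2), round(power, 2))   # delta = 2.5, power about 0.71
```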
Estimating the required sample size
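The power calculation can be turned around to find n: pick the desired power, find the δ that gives it, and solve δ = d√n for n. The sketch below covers the one sample case under the normal approximation; d = 0.5, power = .80 and the function name required_n are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist

def required_n(d, power=0.80, alpha=0.05):
    """Smallest n with the desired power in the one sample case (normal approximation)."""
    nd = NormalDist()
    # delta needed: critical z plus the z that cuts off `power` of the H1 distribution
    delta = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return ceil((delta / d) ** 2)               # from delta = d * sqrt(n)

n_needed = required_n(d=0.5)
print(n_needed)   # 32
```

For power = .80 and α = .05 the required δ is about 2.80, so n = (2.80 / 0.5)² is about 31.4, rounded up to 32.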
8.1 Noncentrality parameter δ
Summary:
• If H0 is true, t is distributed around zero
• If H0 is not true, t is distributed around δ (the noncentrality parameter) → δ expresses the degree of wrongness of H0
8.2 Retrospective Power
A priori power: Power that is calculated before an experiment. Based on estimates of population parameters (means, variances, correlations, proportions)
Retrospective (or post hoc) power: Calculated after the experiment. Done with the G*Power tool (p. 244)
Purpose: Helps to design future research and to evaluate studies in the literature (meta-analysis)
18.0 Recap
Parametric tests
t-test: uses the sample variance as an estimate of the pop. variance → assumes that the population from which the sample is drawn is normal.
Nonparametric tests / Distribution-free tests
• Fall under the resampling tests (base conclusions on drawing a large number of samples under the assumption that H0 is true) → they then compare the obtained sample result with the resampled results
• Some resampling procedures deal with raw scores rather than with ranks
→ Bootstrapping + Randomization tests
→ Used when we are uncertain of the assumptions (e.g. normal distr. of the population)
→ Also used when we do not have good parametric tests (e.g. conf. int. on a median)
Advantages
• Require only general assumptions
• Are sensitive to medians rather than means
• Unaffected by outliers
Disadvantages
• Lower power
• Less specific
→ Under H0 it is usually assumed that the 2 populations are symmetric or have a similar shape
Bootstrapping: Used e.g. when we are interested in the median, whose sampling distribution and SE cannot be derived analytically
→ procedure samples with replacement
Permutation tests: → procedure samples without replacement
Rank-randomization tests: Wilcoxon's test and permutation test (draw every possible permutation only once)
18.1 Bootstrapping
Use:
• Population distribution is not normal or is unknown
• To estimate pop. parameters rather than to test hypotheses
• If we want a confidence interval on a statistic other than the mean
• Builds the sampling distribution by sampling with replacement
18.2 Bootstrapping with one sample
Finding a 95% confidence interval (example, p. 661)
• Assumption: population distr. = sample distribution
• Draw a large number of samples with replacement under this assumption, each with n = 20, and compute the median of each
• Determine which values encompass the middle 95% → sort the medians and cut off the lowest and highest 2.5%
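The steps above can be sketched as a percentile bootstrap for the median. The 20 scores, the number of resamples and the function name bootstrap_median_ci are made up; this is not the book's data.

```python
import random
import statistics

def bootstrap_median_ci(data, reps=10000, level=0.95, seed=1):
    """Percentile bootstrap confidence limits on the median."""
    rng = random.Random(seed)
    n = len(data)
    # Resample n scores WITH replacement, take the median, repeat, then sort
    medians = sorted(statistics.median(rng.choices(data, k=n)) for _ in range(reps))
    cut = int(reps * (1 - level) / 2)           # number of medians cut off each tail
    return medians[cut], medians[reps - cut - 1]

data = [12, 15, 9, 20, 22, 25, 18, 30, 27, 11,
        14, 16, 19, 21, 13, 17, 24, 28, 10, 23]   # made-up sample, n = 20
low, high = bootstrap_median_ci(data)
print(low, high)
```

The exact limits depend on the random seed and on the number of resamples, so they will vary a little from run to run if the seed is changed.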
18.6 Wilcoxon's Rank-Sum Test
Use: Analogue to the t-test for two independent samples, but it tests a broader H0.
→ H0 = the 2 samples are drawn at random from identical populations (not just pop. with the same mean)
→ if H0 is rejected, this usually means that the 2 pop. had different central tendencies
How it works:
• Assign ranks to all observations of the 2 independent samples combined
• Add the ranks within each sample; Ws = sum of the ranks in the smaller sample = W test statistic
• Compare Ws (of the smaller sample!!!) to the W table, which shows the smallest value that can be expected by chance if H0 is true.
• The scores of the smaller sample may be the larger ones; then, if H0 is false, the sum of its ranks would be larger than chance expectation instead of smaller.
• In that case calculate W′s = 2W̄ − Ws; 2W̄ is given in the W table
2W̄ = n1(n1 + n2 + 1)
• Use W′s or Ws (whichever is smaller) to compare with the table.
• Two-tailed test: Double the tabled value of α
The normal approximation
Ws distribution approaches normal when the sample size increases
Parameters of the Ws distribution: mean = n1(n1 + n2 + 1) / 2, standard error = √(n1 n2 (n1 + n2 + 1) / 12)
We can use z, because Ws is approximately normally distributed: z = (Ws − mean) / standard error
• Use z to calculate the probability of obtaining a Ws as low as the one we got.
Example (p. 672)
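A sketch of the rank-sum statistic and its normal approximation, using the standard mean n1(n1 + n2 + 1)/2 and standard error √(n1 n2 (n1 + n2 + 1)/12). The data are made up (not the book's example), contain no ties, and are far too few for the approximation to be trusted; they only illustrate the arithmetic.

```python
from statistics import NormalDist

def rank_sum(small, large):
    """Ws, W's and the normal-approximation z for two independent samples (no ties)."""
    n1, n2 = len(small), len(large)
    pooled = sorted(small + large)
    rank = {v: i + 1 for i, v in enumerate(pooled)}   # assumes no tied scores
    ws = sum(rank[v] for v in small)                  # rank sum of the smaller sample
    ws_prime = n1 * (n1 + n2 + 1) - ws                # W's = 2 * mean - Ws
    mean = n1 * (n1 + n2 + 1) / 2
    se = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (min(ws, ws_prime) - mean) / se
    return ws, ws_prime, z, NormalDist().cdf(z)       # last value: one-tailed p

small = [12, 15, 9, 20]          # made-up smaller sample, n1 = 4
large = [22, 25, 18, 30, 27]     # made-up larger sample, n2 = 5
ws, ws_prime, z, p = rank_sum(small, large)
print(ws, ws_prime, round(z, 3), round(p, 4))   # Ws = 11, W's = 29, z about -2.204
```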
Treatment of Ties
• When the data contain tied scores, a test that relies on ranks is distorted
• Assign ranks to tied scores so that H0 becomes harder to reject
Mann-Whitney U Statistic
• Competitor of Wilcoxon's test
• U and W differ only by a constant
• U and W can be converted into each other with the W table
18.7 Wilcoxon's Matched-Pairs Signed-Ranks Test
• Used when the sample scores do not appear to reflect a normally distributed population.
• Nonparametric analogue to the t-test for matched samples
• Tests H0 that the distribution of difference scores (in the population) is symmetric about zero.
How to use:
• Calculate difference scores
• Rank all differences without regard to the sign
• Sum the ranks of the positive differences and the ranks of the negative differences separately
• The smaller of the two sums (ignoring its sign) is the T test statistic
• Evaluate against the T table
[Figure: relevant T score]
Ties
• If a difference is 0, eliminate that participant from consideration
• If differences are tied, assign tied ranks
The normal approximation
• Large sample size = T is approx. normally distributed
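A sketch of the signed-ranks procedure, including the treatment of zero differences and tied ranks. The normal-approximation parameters used here, mean = n(n + 1)/4 and standard error = √(n(n + 1)(2n + 1)/24), are the standard ones and are not spelled out in these notes; the data and function names are made up.

```python
def average_ranks(values):
    """Ranks 1..N; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1     # mean of positions i+1 .. j+1
        i = j + 1
    return ranks

def signed_ranks(x1, x2):
    """T (smaller rank sum), both rank sums, and the normal-approximation z."""
    diffs = [a - b for a, b in zip(x1, x2) if a != b]   # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    t_pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    n = len(diffs)
    T = min(t_pos, t_neg)
    z = (T - n * (n + 1) / 4) / (n * (n + 1) * (2 * n + 1) / 24) ** 0.5
    return T, t_pos, t_neg, z

x1 = [10, 12, 9, 14, 11, 13, 15, 8]     # made-up scores, occasion 1
x2 = [8, 11, 9, 10, 14, 12, 9, 13]      # same subjects, occasion 2
T, t_pos, t_neg, z = signed_ranks(x1, x2)
print(T, t_pos, t_neg, round(z, 3))   # T = 10, rank sums 18 and 10, z about -0.676
```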
18.8 The Sign Test
• Gains even more freedom from assumptions than the Wilcoxon test
• Loses power
How to
• Give each difference score a + or − sign
• Count the signs and calculate the probability of that outcome (with binomial distribution tables), e.g. p(13 or more) of 16
• Alternatively use the χ² (chi-square) test, p. 678
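The binomial probability for the sign test can be computed directly instead of being looked up. A minimal sketch for the 13-of-16 case, assuming a one-tailed test with p = .5 for each sign under H0; the function name sign_test_p is made up.

```python
from math import comb

def sign_test_p(successes, n):
    """One-tailed P(X >= successes) when X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n

p = sign_test_p(13, 16)
print(round(p, 4))   # about 0.0106
```

For a two-tailed test this probability would be doubled.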