The spine of statistics - summary of chapter 2 of Statistics by A. Field (5th edition)

Statistics
Chapter 2
The spine of statistics

What is the spine of statistics?

The spine of statistics (SPINE) is an acronym for:

  • Standard error
  • Parameters
  • Interval estimates (confidence intervals)
  • Null hypothesis significance testing
  • Estimation


Statistical models

Testing hypotheses involves building statistical models of the phenomenon of interest.
Scientists build (statistical) models of real-world processes to predict how these processes operate under certain conditions. The models need to be as accurate as possible so that the predictions we make about the real world are accurate too.
The degree to which a statistical model represents the data collected is known as the fit of the model.

The data we observe can be predicted from the model we choose to fit plus some amount of error.

Populations and samples

Scientists are usually interested in finding results that apply to an entire population of entities.
Populations can be very general or very narrow.
Usually, scientists strive to infer things about general populations rather than narrow ones.

We collect data from a smaller subset of the population known as a sample, and use these data to infer things about the population as a whole.
The bigger the sample, the more likely it is to reflect the whole population.

P is for parameters

Statistical models are made up of variables and parameters.
Parameters are not measured and are (usually) constants believed to represent some fundamental truth about the relations between variables in the model (for example, the mean or the median).

We can predict values of an outcome variable based on a model. The form of the model changes, but there will always be some error in prediction, and there will always be parameters that tell us about the shape or form of the model.

To work out what the model looks like, we estimate the parameters.

The mean as a statistical model

The mean is a hypothetical value and not necessarily one that is observed in the data.

Estimates are denoted with a hat (^) above the symbol.

Assessing the fit of a model: sums of squares and variance revisited.

The error or deviance for a particular entity is the score predicted by the model for that entity subtracted from the corresponding observed score.

Degrees of freedom (df): the number of scores used to compute the total adjusted for the fact that we’re trying to estimate the population value.
The degrees of freedom relate to the number of observations that are free to vary.

We can use the sum of squared errors and the mean squared error to assess the fit of a model.
The mean squared error is the variance.
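The deviance, sum of squared errors, and variance described above can be sketched in plain Python. The scores here are hypothetical example data, and the model is simply the mean:

```python
scores = [22, 40, 53, 57, 93]  # hypothetical example data

n = len(scores)
mean = sum(scores) / n                 # the model's prediction for every entity

errors = [x - mean for x in scores]    # deviance = observed score - predicted score
ss = sum(e ** 2 for e in errors)       # sum of squared errors
variance = ss / (n - 1)                # mean squared error, using df = n - 1
sd = variance ** 0.5                   # standard deviation

print(mean, ss, variance, sd)
```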

Estimating parameters

The equation for the mean estimates that parameter in a way that minimizes the error.

Although the equations for estimating the parameters will differ from that of the mean, they are based on the principle of minimizing error: they will give you the parameter that has the least error given the data you have.

Method of least squares (or ordinary least squares OLS): the principle of minimizing the sum of squared errors.
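A quick way to see the least-squares principle is to check numerically that the sum of squared errors is smallest at the mean: any other candidate estimate gives a larger total squared error. The scores are hypothetical:

```python
scores = [22, 40, 53, 57, 93]  # hypothetical example data
mean = sum(scores) / len(scores)

def sse(b):
    """Sum of squared errors if we use b as the estimate for every score."""
    return sum((x - b) ** 2 for x in scores)

# The SSE at the mean is lower than at nearby candidate estimates.
assert sse(mean) < sse(mean - 1)
assert sse(mean) < sse(mean + 1)
print(sse(mean))
```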

Standard error

The standard deviation tells us about how well the mean represents the sample data.

Sampling variation: samples vary because they contain different members of the population.

Sampling distribution: the frequency distribution of sample means from the same population. It is only hypothetical.
The sampling distribution of the mean tells us about the behaviour of samples from the population. It is centred at the same value as the mean of the population.
If we took the average of all the sample means, we get the value of the population mean.

We can use the sampling distribution to tell us how representative a sample is of the population.

Standard error of the mean (SE), or standard error: the standard deviation of sample means.

  • hypothetically, the standard error could be calculated by taking the difference between each sample mean and the overall mean, squaring the differences, adding them up, dividing by the number of samples, and then taking the square root of this value to get the standard deviation of the sample means.
  • in the real world, we compute the standard error from a mathematical approximation.

Central limit theorem: tells us that as samples get large, the sampling distribution approaches a normal distribution with a mean equal to the population mean and a standard deviation of σ/√N (where σ is the population standard deviation and N is the sample size).
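The sampling distribution and the σ/√N approximation can be illustrated with a small simulation. The population here is hypothetical and deliberately non-normal (uniform), to emphasise that the central limit theorem does not require a normal population:

```python
import random
import statistics

random.seed(42)

# Hypothetical population: uniform, i.e. not normal.
population = [random.uniform(0, 100) for _ in range(100_000)]

# Draw many samples of the same size and record each sample mean.
sample_size = 50
means = [statistics.mean(random.sample(population, sample_size))
         for _ in range(2_000)]

# The standard deviation of the sample means (the standard error)
# is close to the value predicted by sigma / sqrt(N).
empirical_se = statistics.stdev(means)
predicted_se = statistics.pstdev(population) / sample_size ** 0.5
print(empirical_se, predicted_se)  # the two values should be close
```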

When the sample is relatively small (fewer than 30), the sampling distribution is not normal. It is a t-distribution.

Any parameter that can be estimated in a sample has a hypothetical sampling distribution and standard error.

In short:

The standard error of the mean is the standard deviation of sample means. As such, it is a measure of how representative of the population a sample mean is likely to be. A large standard error (relative to the sample mean) means that there is a lot of variability between the means of different samples and so the sample mean we have might not be representative of the population mean. A small standard error indicates that most sample means are similar to the population mean.

(Confidence) interval

  • we usually use a sample value as an estimate of a parameter in the population.
  • the estimate of a parameter will differ across samples.
  • we can use the standard error to get some idea of the extent to which these estimates differ across samples.
  • we can use this information to calculate boundaries within which we believe the population value will fall. → confidence intervals.

Applies to any parameter.

Calculating confidence intervals

Point estimate: a single value from the sample.

Interval estimate: use our sample value as the midpoint, but set a lower and upper limit around it.

A confidence interval shows us how often, in the long run, an interval contains the true value of the parameter we’re trying to estimate.
Confidence intervals are limits constructed such that, for a certain percentage of samples, the true value of the population parameter falls within those limits.

To calculate the confidence interval, we need to know the limits within which 95% of the sample means will fall.
We know that (in large samples) the sampling distribution of means will be normal, and the standard normal distribution is defined to have a mean of 0 and a standard deviation of 1.

The limits of a 95% confidence interval correspond to z-values of −1.96 and +1.96.
In samples above about 30, the sampling distribution will be normally distributed.

The mean is always the centre of the confidence interval.
We assume the confidence interval contains the true mean.
If the interval is small, the sample mean must be very close to the true mean.

Calculating other confidence intervals

If we want to compute confidence intervals for a value other than 95%, we need to look up the value of z for the percentage we want.
The values of z are multiplied by the standard error to calculate the confidence interval.
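A sketch of the calculation for a large sample, using hypothetical data and the z-value 1.96 for a 95% interval (a base list is repeated just to get a larger n for the example):

```python
import statistics

# Hypothetical data, n = 50.
scores = [98, 102, 95, 101, 100, 97, 103, 99, 96, 104] * 5

n = len(scores)
mean = statistics.mean(scores)
se = statistics.stdev(scores) / n ** 0.5   # standard error of the mean

z = 1.96                                   # z for 95%; a 99% interval would use 2.58
lower = mean - z * se
upper = mean + z * se
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```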

Calculating confidence intervals in small samples

For small samples, the sampling distribution is not normal. It has a t-distribution.
The t-distribution is a family of probability distributions that change shape as the sample size gets bigger.
It is the same principle as z, but instead we use the value for t.

n − 1 is the degrees of freedom, which tells us which of the t-distributions to use.
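The same calculation for a small hypothetical sample, swapping z for t. The critical t-value here is hard-coded from a standard t-table; a statistics library such as SciPy would give it as `t.ppf(0.975, df)`:

```python
import statistics

scores = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]  # hypothetical small sample, n = 10

n = len(scores)
df = n - 1                       # degrees of freedom pick which t-distribution to use
mean = statistics.mean(scores)
se = statistics.stdev(scores) / n ** 0.5

# For df = 9, the two-tailed 95% critical value of t is about 2.262
# (taken from a t-table; this replaces the 1.96 used for large samples).
t_crit = 2.262
lower = mean - t_crit * se
upper = mean + t_crit * se
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```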

Showing confidence intervals visually

Confidence intervals provide us with information about a parameter.

A confidence interval for the mean is a range of scores such that the population mean will fall within this range in 95% of samples.
The confidence interval is NOT an interval within which we are 95% confident that the population mean will fall.

Null hypothesis significance testing

Fisher’s p-value

Scientists tend to use 5% as a threshold for confidence: only when there is less than a 5% chance of getting the result we have if no effect exists are we confident enough to accept that the effect is genuine.

Types of hypothesis

Null hypothesis: an effect is absent

Alternative hypotheses: there is an effect.

The null hypothesis is the baseline against which we evaluate how plausible our alternative hypothesis is.

Hypotheses can be:

  • Directional
    States that an effect will occur, and also states the direction of that effect
  • Non-directional
    States that an effect will occur, but doesn’t state the direction of the effect.

The process of NHST

The p-value represents a long-run probability.

We can never be completely sure that one of the hypotheses is correct.

Test statistics

NHST relies on fitting a model to the data and then evaluating the probability of this model, given the assumption that no effect exists.

Systematic variation: variation that can be explained by the model that we’ve fitted to the data.

Unsystematic variation: variation that cannot be explained by the model that we’ve fitted. It is error, or variation not attributable to the effect we’re investigating.

To test whether the model fits the data, we compare the systematic variation against the unsystematic variation.
In effect, we look at a signal-to-noise ratio.

Effect/error
The best way to test a parameter is to look at the size of the parameter relative to the background noise that produced it: the ratio of how big a parameter is to how much it can vary across samples.

The ratio of effect relative to error is a test statistic.
The exact form of the equation changes depending on which test statistic you’re calculating.
They all represent the same thing: signal-to-noise, or the amount of variance explained by the model we’ve fitted to the data compared to the variance that can’t be explained by the model.

A test statistic: a statistic for which we know how frequently different values occur.

As test statistics get bigger, the probability of them occurring becomes smaller.
If this probability falls below a certain value (p < 0.05), we presume that the test statistic is as large as it is because our model explains a sufficient amount of variation to reflect a genuine effect in the real world.
The test statistic is said to be statistically significant.
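The effect/error idea can be made concrete with a one-sample t statistic, one common instance of this signal-to-noise ratio. The data and the null value are hypothetical:

```python
import statistics

scores = [5.1, 4.8, 6.2, 5.5, 5.9, 4.9, 6.1, 5.4]  # hypothetical data
null_value = 5.0                                    # value expected if no effect exists

n = len(scores)
effect = statistics.mean(scores) - null_value       # systematic variation (signal)
error = statistics.stdev(scores) / n ** 0.5         # unsystematic variation (noise)
t = effect / error                                  # test statistic = effect / error
print(t)  # the bigger t is, the less probable it is under the null hypothesis
```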

One- and two-tailed tests

Hypotheses can be directional.

One-tailed test: a statistical model that tests a directional hypothesis

Two-tailed test: a statistical model that tests a non-directional hypothesis.

If the result of a one-tailed test is in the opposite direction to what you expected, you cannot and must not reject the null hypothesis.

Type I and type II errors

There are two types of errors we can make when we test hypotheses.

  • Type I error: occurs when we believe that there is a genuine effect in our population when, in fact, there isn’t.
    α level
  • Type II error: occurs when we believe that there is no effect in the population when, in reality, there is.
    β level

There is a trade-off between these two errors.
If we lower the probability of accepting an effect as genuine, we increase the probability that we’ll reject an effect that does genuinely exist.

Inflated error rates

If a test uses a 0.05 level of significance, then the chance of making a type I error on that test is only 5%.

In practice, however, we often need to conduct several tests on the same data.

Familywise or experimentwise error rate: the error rate across all the tests conducted; the more tests, the greater the chance of a type I error.

Familywise error = 1 − 0.95^n (for n tests at α = 0.05)

The most popular way to correct for this (the Bonferroni correction) is to divide alpha by the number of comparisons.
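A sketch of both calculations: the familywise error rate for n tests, and the correction that divides alpha by the number of comparisons:

```python
alpha = 0.05

for n_tests in (1, 3, 10):
    familywise = 1 - (1 - alpha) ** n_tests   # familywise error = 1 - 0.95^n
    corrected = alpha / n_tests               # alpha divided by the number of tests
    print(n_tests, round(familywise, 3), corrected)
```

With 10 tests at α = 0.05, the familywise error rate is already about 0.40, while the corrected per-test alpha drops to 0.005.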

Statistical power

Statistical power: the ability of a test to find an effect.

β level: the type II error rate.

The power of a test is 1 − β.

We typically aim to achieve a power of 0.8.

The power of a statistical test depends on:

  • how big the effect is (effect size)
  • how strict we are about deciding that an effect is significant
  • sample size

Calculate the power of a test:
Given that we’ve conducted our experiment, we will have already selected a value of alpha, we can estimate the effect size based on our sample data, and we will know how many participants we used. We can calculate the power. If this value is 0,8 or more, we can be confident that we have achieved sufficient power to detect any effects might have existed.

Confidence intervals and statistical significance

Guidelines:

  • 95% confidence intervals that just about touch end-to-end represent a p-value for testing the null hypothesis of no difference of approximately 0.01.
  • If there is a gap between the upper end of one 95% confidence interval and the lower end of another, then p < 0.01.
  • A p-value of 0.05 is represented by moderate overlap between the bars.

Sample size and statistical significance

There is a connection between the sample size and the p-value associated with a test statistic.

If the sample gets larger, the standard error (and therefore the confidence interval) gets smaller.

The significance of a test is directly linked to the sample size. The same effect will have different p-values in different-sized samples. Small differences can be deemed significant in large samples, and large effects might be deemed non-significant in small samples.
