USING DATA TO ANSWER STATISTICAL QUESTIONS
The information we gather with experiments and surveys is collectively called data. Statistics is the art and science of learning from data. Statistical problem solving consists of four things:
- Formulate a statistical question
- Collect data
- Analyse data
- Interpret results
The three main components of statistics for answering a statistical question are:
- Design
Stating the goal and/or statistical question of interest and planning how to obtain data that will address them. (e.g: how do you conduct an experiment to determine the effects of ‘X’) - Description
Summarizing and analysing the data that are obtained (e.g: summarizing people’s tv-habits in ‘hours of tv watched per day’) - Inference
Making decisions and predictions based on the data for answering the statistical question. (predicting the outcome of an election, based on the description of the data)
Probability is a framework for quantifying how likely various possible outcomes are.
SAMPLE VERSUS POPULATION
The entities that are measured in a study are called the subjects. This usually means people, but it can also be schools, countries or days. The population is the set of all the subjects of interest. In practice, we usually have data for only some of the subjects who belong to that population. These subjects are called a sample.
Descriptive statistics refers to methods for summarizing the collected data. The summaries usually consist of graphs and numbers such as averages and percentages. Inferential statistics are used when data are available from a sample only, but we want to make a decision or prediction about the entire population. Inferential statistics refers to methods of making decisions or predictions about a population, based on data obtained from a sample of that population.
A parameter is a numerical summary of the population. A statistic is a numerical summary of a sample taken from the population. The true parameter values are almost always unknown, thus we use sample statistics to estimate the parameter values.
A sample is random when everyone in the population has the same chance of being included in the sample. Random sampling allows us to make powerful inferences about populations. The margin of error is a measure of the expected variability from one random sample to the next random sample.
The formula for calculating the approximate margin of error is:
. In this case, ‘n’ is the number of subjects.