Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 3 summary
THE ASSOCIATION BETWEEN TWO CATEGORICAL VARIABLES
When analysing data, the first step is to distinguish between the response variable and the explanatory variable. The response variable is the outcome variable on which comparisons are made. If the explanatory variable is categorical, it defines the groups to be compared with respect to values of the response variable. If the explanatory variable is quantitative, it defines the different numerical values at which the response variable is compared, so we examine how the response changes as the explanatory variable changes. The explanatory variable should explain the response variable (e.g. survival status is the response variable and smoking status is the explanatory variable).
An association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable.
A contingency table is a display for two categorical variables. Conditional proportions are proportions that are computed conditional on a given category of the other variable (e.g. the proportion of ‘no’ within one group only). A conditional proportion should therefore always be stated as conditional on something. It can also be expressed as a percentage. A proportion computed from the totals (e.g. the percentage of ‘no’ among all observations) is called a marginal proportion.
If there is a clear explanatory/response relationship, that relationship dictates which way we compute the conditional proportions: we condition on the categories of the explanatory variable. Conditional proportions are useful in determining whether there is an association. If the conditional proportions of the response variable differ between the groups, the variables are associated. If they are the same in every group, the variables are independent of each other.
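As a rough illustration of conditional and marginal proportions, the sketch below uses a small contingency table with smoking status as the explanatory variable and survival status as the response variable. The counts are hypothetical and chosen only to make the arithmetic easy to follow; the function names are likewise illustrative.

```python
# Hypothetical counts (not real data): rows are the explanatory variable
# (smoking status), columns are the response variable (survival status).
counts = {
    "smoker":     {"dead": 40, "alive": 160},
    "non-smoker": {"dead": 30, "alive": 270},
}

def conditional_proportions(table):
    """Proportion of each response category within each explanatory group."""
    result = {}
    for group, row in table.items():
        total = sum(row.values())
        result[group] = {category: n / total for category, n in row.items()}
    return result

def marginal_proportions(table):
    """Proportion of each response category over the whole table."""
    grand_total = sum(sum(row.values()) for row in table.values())
    categories = list(next(iter(table.values())).keys())
    return {
        c: sum(row[c] for row in table.values()) / grand_total
        for c in categories
    }

cond = conditional_proportions(counts)
marg = marginal_proportions(counts)

for group, props in cond.items():
    print(group, props)
print("marginal", marg)
```

In this made-up table the proportion of deaths is 0.2 among smokers and 0.1 among non-smokers. Because the conditional proportions differ between the groups, the table suggests an association.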
THE ASSOCIATION BETWEEN TWO QUANTITATIVE VARIABLES
We examine a scatterplot to study the association between two quantitative variables. An association can be positive or negative. With a positive association, y tends to go up as x goes up. With a negative association, y tends to go down as x goes up.
Correlation describes the strength of the linear association. The correlation (r) summarizes the direction of the association between two quantitative variables and the strength of its linear trend. It takes a value between -1 and +1. A positive value for r indicates a positive association and a negative value for r indicates a negative association. The closer r is to -1 or +1, the closer the data points fall to a straight line and the stronger the linear association is. The closer r is to 0, the weaker the linear association is.
The properties of the correlation:
- The correlation always falls between -1 and +1.
- A positive correlation indicates a positive association and a negative correlation indicates a negative association.
- The value of the correlation does not depend on the variables’ units (e.g. euros or dollars).
- Two variables have the same correlation no matter which is treated as the response variable and which is treated as the explanatory variable.
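The properties listed above can be checked numerically. The sketch below is only an illustration: the data are hypothetical, the exchange rate of 1.1 is an arbitrary assumption, and `pearson_r` is a helper written for this example using an algebraically equivalent form of the usual correlation formula.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation computed as Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: price in euros and number of units sold.
price_eur = [10.0, 12.0, 15.0, 19.0, 24.0]
units_sold = [95, 90, 82, 70, 61]

r = pearson_r(price_eur, units_sold)

# Changing the unit (assumed rate: 1 euro = 1.1 dollars) leaves r unchanged.
price_usd = [1.1 * p for p in price_eur]
r_usd = pearson_r(price_usd, units_sold)

# Swapping the roles of the two variables leaves r unchanged as well.
r_swapped = pearson_r(units_sold, price_eur)

print(round(r, 4), round(r_usd, 4), round(r_swapped, 4))
```

All three printed values should agree, and the value is negative because sales fall as the price rises in this made-up data set.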
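The correlation r can be calculated as follows:
$$r = \frac{1}{n-1} \sum z_x z_y = \frac{1}{n-1} \sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right)$$
Here n is the number of points, $\bar{x}$ and $\bar{y}$ are the means, and $s_x$ and $s_y$ are the standard deviations for x and y. The sum is taken over all n observations.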
Consider the scatterplot divided into four quadrants by the mean of x and the mean of y. The product of the z-scores for any point in the upper-right quadrant is positive. The product is also positive for each point in the lower-left quadrant. Such points contribute to a positive correlation. The product of the z-scores for any point in the upper-left or lower-right quadrant is negative. Such points contribute to a negative correlation.
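To make the role of the z-score products concrete, the following sketch computes the correlation as the sum of the products of the z-scores divided by n - 1. The five data points are hypothetical, and the helper names are assumptions made for this example.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values using the sample standard deviation (n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def correlation_with_products(xs, ys):
    """Return r and the per-point products of z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    products = [a * b for a, b in zip(zx, zy)]
    return sum(products) / (len(xs) - 1), products

# Hypothetical data with mean x = 3 and mean y = 4.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r, products = correlation_with_products(x, y)
print(round(r, 3))  # 0.775
for xi, yi, p in zip(x, y, products):
    print((xi, yi), round(p, 3))
```

The point (1, 2) lies in the lower-left quadrant and the point (5, 5) in the upper-right quadrant, so both have positive products and pull r upward. The remaining points lie exactly on one of the mean lines, so their products are zero.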
PREDICTING THE OUTCOME OF A VARIABLE
The regression line predicts the value for the response variable y as a straight-line function of the value x of the explanatory variable. The equation for the regression line has the form:
$$\hat{y} = a + bx$$
‘a’ denotes the y-intercept and ‘b’ denotes the slope. A regression equation is often called a prediction equation. The prediction error is the difference between the actual y and the predicted y (written $\hat{y}$). The prediction error can be calculated as follows:
$$\text{prediction error} = y - \hat{y}$$
The outcomes of the prediction error formula are called residuals. The summary measure to evaluate regression lines is the residual sum of squares:
$$\sum (\text{residual})^2 = \sum (y - \hat{y})^2$$
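Choosing the line that has the minimum residual sum of squares is called the least squares method. This gives us the regression line, with slope b and y-intercept a. The regression formulas for the y-intercept and the slope are:
$$a = \bar{y} - b\bar{x} \quad \text{and} \quad b = r\left(\frac{s_y}{s_x}\right)$$
where r is the correlation, $\bar{x}$ and $\bar{y}$ are the means, and $s_x$ and $s_y$ are the standard deviations of x and y.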
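A minimal sketch of the least squares method, assuming the standard formulas in which the slope is the correlation times the ratio of the standard deviations (s_y / s_x) and the y-intercept is the mean of y minus the slope times the mean of x. The data are hypothetical and the function names are chosen for this example only.

```python
from statistics import mean, stdev

def least_squares(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Correlation as the sum of z-score products divided by n - 1.
    r = sum(
        ((x - mean_x) / s_x) * ((y - mean_y) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)
    b = r * s_y / s_x        # slope
    a = mean_y - b * mean_x  # y-intercept
    return a, b

def residual_sum_of_squares(a, b, xs, ys):
    """Sum of squared prediction errors (y - y-hat) for the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
rss = residual_sum_of_squares(a, b, x, y)

# Any other line, for instance y-hat = 2.0 + 0.7x, has a larger sum of squares.
rss_other = residual_sum_of_squares(2.0, 0.7, x, y)

print(round(a, 3), round(b, 3), round(rss, 3), round(rss_other, 3))
```

For these points the fitted line is approximately ŷ = 2.2 + 0.6x, and its residuals sum to zero up to rounding error, which is a general property of the least squares line.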
The slope can’t be used to determine the strength of the association, because the slope depends on the units of the variables. A slope measured in dollars differs from a slope measured in euros, so the slope alone says nothing about the strength of the association. Correlation and regression methods serve different purposes, but there are strong connections between them:
- They are both appropriate when the relationship between two quantitative variables can be approximated by a straight line.
- The correlation and the slope of the regression line have the same sign. If one is positive, so is the other one. If one is negative, so is the other one. If one is zero, the other is also zero.
r² is the proportion of the variation in the y-values that is accounted for by the linear relationship of y with x.
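One way to see this interpretation is to compare the variation of y around its mean with the variation left over around the regression line. The sketch below uses hypothetical data and computes the slope as Sxy / Sxx, which is an equivalent form of the usual least squares slope. The variable names are assumptions for this example.

```python
from math import sqrt
from statistics import mean

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x, mean_y = mean(x), mean(y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

# Least squares line y-hat = a + b*x.
b = sxy / sxx
a = mean_y - b * mean_x

# Variation of y around its mean versus variation around the line.
total_ss = syy
residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
proportion_explained = (total_ss - residual_ss) / total_ss

r = sxy / sqrt(sxx * syy)
print(round(proportion_explained, 3), round(r ** 2, 3))  # 0.6 0.6
```

Here the total variation is 6 and the residual variation is 2.4, so the line accounts for 60% of the variation in y, which matches r².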
CAUTIONS IN ANALYZING ASSOCIATIONS
Extrapolation refers to using a regression line to predict y values for x values outside the observed range of data. This is not always a good method to predict future data, because if the trend changes in the future, extrapolation gives poor predictions. Predictions about the future using time series data are called forecasts.
Regression outliers are outliers that are well removed from the trend that the rest of the data follow. An observation is influential if it has a large effect on results of a regression analysis. For an observation to be influential, two conditions must hold:
- Its x value is relatively low or high compared to the rest of the data.
- The observation is a regression outlier, falling quite far from the trend that the rest of the data follow.
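The two conditions above can be illustrated with a small experiment. In this sketch, which uses hypothetical data and a helper named `fit_line` written for the example, one extra point is added in two ways: once with an extreme x value far from the trend, and once far from the trend but at the mean of x.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data lying close to the line y = 2x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a0, b0 = fit_line(x, y)

# Extreme x value AND far below the trend: both conditions hold.
a1, b1 = fit_line(x + [20], y + [5.0])

# Far above the trend but at the mean of x: only the second condition holds.
a2, b2 = fit_line(x + [3], y + [12.0])

print(round(b0, 3), round(b1, 3), round(b2, 3))
```

In this example the first added point pulls the slope from about 2 down to nearly 0, whereas the second leaves the slope unchanged and only shifts the intercept.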
Correlation and the regression line are non-resistant: they are prone to distortion by outliers. Correlation does not imply causation. Also, an association does not imply causation.
A third variable that is not measured in a study (or perhaps not even known to the researchers) but that influences the association between the response variable and the explanatory variable is referred to as a lurking variable.
The direction of an association between two variables can change after we include a third variable and analyse the data at separate levels of that variable. This is known as Simpson’s Paradox (e.g. a positive correlation between crime rate and education changes to a negative correlation when the data are considered at separate levels of urbanization).
A lurking variable may be a common cause of both the explanatory and the response variable. There could also be multiple causes. Some things are merely associated because they both have a time trend (e.g. if two things both show a rising trend over the course of 10 years, they will be positively associated with each other).
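The reversal can be reproduced with a few invented numbers. In the sketch below the education and crime values are entirely hypothetical, with urbanization playing the role of the third variable, and `pearson_r` is a helper written for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation computed as Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical (education, crime rate) values at two levels of urbanization.
rural_education, rural_crime = [1, 2, 3], [5.0, 4.2, 3.1]
urban_education, urban_crime = [6, 7, 8], [10.0, 9.1, 8.3]

r_rural = pearson_r(rural_education, rural_crime)
r_urban = pearson_r(urban_education, urban_crime)

# Ignoring urbanization and pooling all observations.
r_pooled = pearson_r(
    rural_education + urban_education,
    rural_crime + urban_crime,
)

print(round(r_rural, 3), round(r_urban, 3), round(r_pooled, 3))
```

Within each level of urbanization the correlation is negative, yet the pooled correlation is positive, because the urban group has both higher education and higher crime values in this made-up data set.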
When two explanatory variables are both associated with a response variable but are also associated with each other, confounding occurs. It is difficult to determine which one really causes the response variable. The difference between a confounding variable and a lurking variable is that a lurking variable is not measured. A lurking variable has potential for confounding.
Wrong Chapter Thijs Mulder contributed on 22-10-2021 15:46
Thank you for the great summaries. They are really helpful. Just a heads up: this is chapter 3 from Agresti, it should be chapter 6.
Kind regards