Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Book summary
THE ASSOCIATION BETWEEN TWO CATEGORICAL VARIABLES
When analysing data, the first step is to distinguish between the response variable and the explanatory variable. The response variable is the outcome variable on which comparisons are made. If the explanatory variable is categorical, it defines the groups to be compared with respect to values of the response variable. If the explanatory variable is quantitative, it defines the different numerical values at which the response variable is compared. The explanatory variable should explain the response variable (e.g. survival status is the response variable and smoking status is the explanatory variable).
An association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable.
A contingency table is a display for two categorical variables. Conditional proportions are proportions computed conditional on a given category of the other variable (e.g. the proportion of ‘yes’ responses within one group of the explanatory variable). A conditional proportion must therefore always be conditional on something, and it can also be expressed as a percentage. A proportion based on the totals of the table (e.g. the percentage of ‘no’ responses among all observations) is called a marginal proportion.
If there is a clear explanatory/response relationship, it dictates which way we compute the conditional proportions: within each category of the explanatory variable. Conditional proportions are useful in determining whether there is an association. If the conditional proportions of the response are the same in every category of the explanatory variable, the two variables are independent; if they differ, the variables are associated.
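As a rough illustration of conditional and marginal proportions, the sketch below works through a small contingency table. The counts, the group labels and the function names are invented for this example and do not come from the book.

```python
# Hypothetical counts: outer keys are categories of the explanatory variable,
# inner keys are categories of the response variable.
table = {
    "group A": {"yes": 30, "no": 70},
    "group B": {"yes": 10, "no": 90},
}

def conditional_proportions(table):
    """Proportion of each response category within each explanatory group."""
    result = {}
    for group, counts in table.items():
        row_total = sum(counts.values())
        result[group] = {resp: n / row_total for resp, n in counts.items()}
    return result

def marginal_proportions(table):
    """Proportion of each response category over the whole table."""
    grand_total = sum(sum(counts.values()) for counts in table.values())
    responses = next(iter(table.values())).keys()
    return {
        resp: sum(counts[resp] for counts in table.values()) / grand_total
        for resp in responses
    }

cond = conditional_proportions(table)
marg = marginal_proportions(table)
print(cond)  # conditional proportions per group
print(marg)  # marginal proportions of the response
```

In this made-up table the conditional proportion of ‘yes’ differs between the two groups, which is what an association looks like; identical conditional proportions would point to independence.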
THE ASSOCIATION BETWEEN TWO QUANTITATIVE VARIABLES
We examine a scatterplot to study the association. An association can be positive or negative. With a positive association, y tends to go up as x goes up. With a negative association, y tends to go down as x goes up.
Correlation describes the strength of the linear association. The correlation (r) summarizes the direction of the association between two quantitative variables and the strength of its linear trend. It can take a value between -1 and 1. A positive value for r indicates a positive association and a negative value for r indicates a negative association. The closer r is to 1 or -1, the closer the data points fall to a straight line and the stronger the linear association is. The closer r is to 0, the weaker the linear association is.
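The properties of the correlation: r always falls between -1 and 1, its value does not depend on the units of the variables, and it is the same no matter which variable is treated as the response variable.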
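The correlation r can be calculated as follows:
$$r = \frac{1}{n-1} \sum \left(\frac{x - \bar{x}}{s_x}\right) \left(\frac{y - \bar{y}}{s_y}\right) = \frac{1}{n-1} \sum z_x z_y$$
n is the number of points, $\bar{x}$ and $\bar{y}$ are the means, and $s_x$ and $s_y$ are the standard deviations for x and y. The sum is taken over all n observations.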
Divide the scatterplot into four quadrants at the point of the means. The product of the z-scores for any point in the upper-right quadrant is positive. The product is also positive for each point in the lower-left quadrant. Such points contribute to a positive correlation. The product of the z-scores for any point in the upper-left or lower-right quadrant is negative. Such points contribute to a negative correlation.
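The z-score view of the correlation can be sketched in a few lines of Python. The data values and the function name `correlation` are made up for illustration; only the standard library is used.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)
    ]
    return sum(products) / (n - 1)

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r_rescaled = correlation([100 * x for x in xs], ys)  # same data, other units for x
print(round(r, 4), round(r_rescaled, 4))
```

Rescaling x leaves r unchanged in this sketch, because z-scores remove the units of measurement.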
PREDICTING THE OUTCOME OF A VARIABLE
The regression line predicts the value for the response variable y as a straight-line function of the value x of the explanatory variable. The equation for the regression line has the form:
$$\hat{y} = a + bx$$
‘a’ denotes the y-intercept and ‘b’ denotes the slope. A regression equation is often called a prediction equation. The prediction error is the difference between the actual y and the predicted y ($\hat{y}$). The prediction error can be calculated as follows:
$$\text{prediction error} = y - \hat{y}$$
The outcomes of the prediction error formula are called residuals. The summary measure to evaluate regression lines is the residual sum of squares:
$$\sum (\text{residual})^2 = \sum (y - \hat{y})^2$$
Choosing the line that has the minimum residual sum of squares is called the least squares method. This gives us the regression line. The slope equals b. The y-intercept equals a. The regression formulas for the slope and the y-intercept are:
$$b = r \left(\frac{s_y}{s_x}\right)$$
and
$$a = \bar{y} - b\bar{x}$$
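A minimal sketch of the least squares method, assuming the slope is computed from the correlation and the standard deviations, and the y-intercept from the means. The data and the function names are hypothetical.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r from the z-score products."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)

def least_squares(xs, ys):
    """Return (a, b): y-intercept and slope of the regression line."""
    b = correlation(xs, ys) * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)
    return a, b

def residual_sum_of_squares(xs, ys, a, b):
    """Sum of squared prediction errors for the line y_hat = a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
rss = residual_sum_of_squares(xs, ys, a, b)

# Any other line should do worse; here the slope is nudged slightly.
rss_other = residual_sum_of_squares(xs, ys, a, b + 0.1)
print(a, b, rss, rss_other)
```

The residuals of the least squares line sum to zero (up to rounding), and nudging the slope away from b increases the residual sum of squares.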
The slope can’t be used to determine the strength of the association, because the slope depends on the units of the variables. A slope expressed in dollars differs from a slope expressed in euros for the same data, so the size of the slope says nothing about the strength of the association. Correlation and regression methods serve different purposes, but there are strong connections between them: the slope and the correlation always have the same sign, and r² is the proportion of the variation in the y-values that is accounted for by the linear relationship of y with x.
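The sketch below illustrates both points on made-up data: changing the units of y changes the slope but not the correlation, and r² matches the share of the variation in y that the regression line accounts for. The exchange rate `RATE`, the data and the function names are assumptions for this example only.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r from the z-score products."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)

def slope_intercept(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b * x."""
    b = correlation(xs, ys) * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)
    return a, b

def explained_variation(xs, ys):
    """1 - (residual sum of squares) / (total sum of squares around the mean)."""
    a, b = slope_intercept(xs, ys)
    y_bar = mean(ys)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    tss = sum((y - y_bar) ** 2 for y in ys)
    return 1 - rss / tss

# Hypothetical data: y measured in dollars, then converted to euros.
xs = [1, 2, 3, 4, 5]
ys_dollars = [2, 4, 5, 4, 5]
RATE = 0.9  # made-up exchange rate, for illustration only
ys_euros = [RATE * y for y in ys_dollars]

r_dollars = correlation(xs, ys_dollars)
r_euros = correlation(xs, ys_euros)
_, b_dollars = slope_intercept(xs, ys_dollars)
_, b_euros = slope_intercept(xs, ys_euros)

r2_direct = r_dollars ** 2
r2_variation = explained_variation(xs, ys_dollars)
print(b_dollars, b_euros, r_dollars, r_euros, r2_direct, r2_variation)
```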
CAUTIONS IN ANALYZING ASSOCIATIONS
Extrapolation refers to using a regression line to predict y values for x values outside the observed range of data. This is not always a good method to predict future data, because if the trend changes in the future, extrapolation gives poor predictions. Predictions about the future using time series data are called forecasts.
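Regression outliers are observations that fall well away from the trend that the rest of the data follow. An observation is influential if it has a large effect on the results of a regression analysis. For an observation to be influential, two conditions must hold: its x value is relatively low or high compared to the rest of the data, and the observation is a regression outlier, falling quite far from the trend that the rest of the data follow.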
Correlation and the regression line are non-resistant: they are prone to distortion by outliers. Neither a correlation nor any other kind of association implies causation.
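To see how non-resistant the correlation is, the following sketch adds a single made-up observation with an extreme x value that falls far from the trend of the other points. All numbers and names are invented for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r from the z-score products."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)

# Hypothetical data with a positive trend.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# One extra observation: extreme x value and far below the trend.
xs_with_outlier = xs + [20]
ys_with_outlier = ys + [0]

r_without = correlation(xs, ys)
r_with = correlation(xs_with_outlier, ys_with_outlier)
print(round(r_without, 3), round(r_with, 3))
```

In this example one influential point is enough to turn a clearly positive correlation into a negative one.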
A lurking variable is a third variable, usually not measured in a study (and perhaps not even known to the researchers), that influences the association between the response variable and the explanatory variable.
The direction of an association between two variables can change after we include a third variable and analyse the data at separate levels of that variable. This is known as Simpson’s Paradox (e.g. a positive correlation between crime rate and education changed to a negative correlation when the data were considered at separate levels of urbanization).
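Simpson’s Paradox can be reproduced with a tiny invented data set. The numbers below are not the crime and education data from the book; they are made up so that the pooled correlation is positive while the correlation within each level of the third variable is negative.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r from the z-score products."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)

# Hypothetical (x, y) observations at two levels of a third variable.
groups = {
    "level 1": ([1, 2, 3], [3, 2, 1]),
    "level 2": ([6, 7, 8], [8, 7, 6]),
}

# Correlation within each level of the third variable.
r_within = {name: correlation(xs, ys) for name, (xs, ys) in groups.items()}

# Correlation when the third variable is ignored and the data are pooled.
pooled_xs = [x for xs, _ in groups.values() for x in xs]
pooled_ys = [y for _, ys in groups.values() for y in ys]
r_pooled = correlation(pooled_xs, pooled_ys)

print(r_within, round(r_pooled, 3))
```

Within each level y falls as x rises, but level 2 has higher values of both x and y, so ignoring the level makes the overall association look positive.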
A lurking variable may be a common cause of both the explanatory and the response variable. There could also be multiple causes. Some things are merely associated because they both have a time trend (e.g. two things that both show a rising trend over the course of 10 years will be positively associated with each other).
When two explanatory variables are both associated with a response variable but are also associated with each other, confounding occurs. It is difficult to determine which one really causes the response variable. The difference between a confounding variable and a lurking variable is that a lurking variable is not measured. A lurking variable has potential for confounding.
Wrong Chapter Thijs Mulder contributed on 22-10-2021 15:46
Thank you for the great summaries. They are really helpful. Just a heads up: this is chapter 3 from Agresti, it should be chapter 6.
Kind regards