Statistical methods for the social sciences - Agresti - 5th edition, 2018 - Summary (EN)
Many scientific studies investigate more than two variables, which requires multivariate methods. Much research focuses on causal relationships between variables, but establishing causality is difficult: a relationship that appears causal may in fact be produced by another variable. Statistical control is the method of checking whether an association between variables changes or disappears when the influence of other variables is removed. In a causal relationship, x → y, the explanatory variable x causes the response variable y. This relationship is asymmetric, because y need not cause x.
There are three criteria for a causal relationship:
1. Association between the variables
2. Appropriate time order
3. Elimination of alternative explanations
An association is necessary for a causal relationship, but it is not sufficient. The logical time order is usually clear, with the explanatory variable preceding the response variable. Apart from x and y, other variables may provide an alternative explanation. In observational studies it can almost never be proved that one variable causes another. A single outlier or anecdote that contradicts the pattern is usually not enough evidence to rule out causality. Causality is easier to establish with randomized experiments than with observational studies, because randomization assigns subjects to groups at random and fixes the time order before the experiment starts.
Eliminating alternative explanations is often tricky. One way to assess the influence of other variables is to control them: to remove their influence or hold them at a constant value. Controlling means making sure that the control variables (the other variables) no longer influence the association between x and y. A randomized experiment controls other variables implicitly: because subjects are assigned to groups at random, the other variables tend to balance out across the groups.
Statistical control is different from experimental control. In statistical control, subjects with similar values on the control variable are grouped together. Observational studies in social science often form such groups based on socio-economic status, education, or income.
The association between two quantitative variables is shown in a scatterplot. Controlling this association for a categorical variable is done by comparing the groups defined by that variable, for example by comparing means or by examining the relationship within each group.
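To make this concrete, here is a minimal Python sketch with made-up data (the variable names education, income, and gender are purely illustrative): it compares the group means and the within-group correlations between the two quantitative variables for each category of the control variable. A scatterplot drawn separately per group would be the graphical counterpart.

```python
# Minimal sketch with simulated data: controlling the association between two
# quantitative variables (education, income) for a categorical variable
# (gender) by comparing means and correlations within each category.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
gender = rng.choice(["F", "M"], size=n)
education = rng.normal(14, 2, size=n)
income = 2.0 * education + np.where(gender == "M", 5.0, 0.0) + rng.normal(0, 4, size=n)
df = pd.DataFrame({"gender": gender, "education": education, "income": income})

# Overall (marginal) association between the two quantitative variables
print("overall correlation:", round(df["education"].corr(df["income"]), 2))

# Controlled association: examine the relationship separately per category
for level, part in df.groupby("gender"):
    print(level,
          "n =", len(part),
          "mean income =", round(part["income"].mean(), 1),
          "corr(education, income) =", round(part["education"].corr(part["income"]), 2))
```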
The association between two categorical variables is shown in a contingency table. Controlling this association for a third variable is done by showing each value of the third variable in a separate contingency table, called a partial table.
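For example, the following sketch (Python with pandas, hypothetical yes/no and high/low categories) builds the marginal contingency table and then one partial table per value of the control variable:

```python
# Minimal sketch with hypothetical data: partial tables for the association
# between two categorical variables x1 and y, one table per level of x2.
import pandas as pd

df = pd.DataFrame({
    "x1": ["yes", "no", "yes", "no", "yes", "no", "yes", "no"],
    "y":  ["high", "low", "high", "high", "low", "low", "high", "low"],
    "x2": ["A", "A", "A", "A", "B", "B", "B", "B"],   # control variable
})

# Bivariate (marginal) contingency table, ignoring x2
print(pd.crosstab(df["x1"], df["y"]))

# Partial tables: one contingency table for each value of the control variable
for level, part in df.groupby("x2"):
    print(f"\npartial table for x2 = {level}")
    print(pd.crosstab(part["x1"], part["y"]))
```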
In practice the association usually does not disappear completely when a variable is controlled; it merely becomes weaker.
A lurking variable is a variable that is not measured but that does influence the relationship under study; sometimes researchers are simply unaware that it exists.
In multivariate relationships, the response variable y has multiple explanatory variables and control variables, written as x1, x2, etc.
In a spurious association, both the explanatory variable x1 and the response variable y depend on a third variable, x2. The association between x1 and y disappears when x2 is controlled, so there is no causal relationship between x1 and y. A spurious association can be diagrammed as x1 ← x2 → y.
In a chain relationship the explanatory variable x1 causes a third variable x2, which in turn causes the response variable y, diagrammed as x1 → x2 → y. The third variable x2 is also called the intervening variable or the mediator. In chain relationships, too, the association disappears when x2 is controlled; the sketch below illustrates this for both structures.
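Both structures can be mimicked with a small simulation. The sketch below (Python, with invented data-generating equations) estimates the x1-y correlation before and after removing the linear influence of x2 from both variables, which is one numerical way of controlling x2 (a partial correlation); in both scenarios the controlled correlation is approximately zero.

```python
# Minimal sketch (hypothetical setup): simulate a spurious structure
# (x1 <- x2 -> y) and a chain structure (x1 -> x2 -> y), then compare the
# x1-y correlation before and after removing the linear influence of x2.
import numpy as np

rng = np.random.default_rng(1)
n = 5000

def corr_before_after(x1, x2, y):
    """Marginal x1-y correlation, and the correlation of the residuals of
    x1 and y after regressing each on x2 (i.e. controlling x2)."""
    marginal = np.corrcoef(x1, y)[0, 1]
    res_x1 = x1 - np.polyval(np.polyfit(x2, x1, 1), x2)
    res_y = y - np.polyval(np.polyfit(x2, y, 1), x2)
    controlled = np.corrcoef(res_x1, res_y)[0, 1]
    return round(marginal, 2), round(controlled, 2)

# Spurious: x2 is a common cause of x1 and y, no direct x1 -> y effect
x2 = rng.normal(size=n)
x1 = x2 + rng.normal(scale=0.5, size=n)
y = x2 + rng.normal(scale=0.5, size=n)
print("spurious (marginal, controlled):", corr_before_after(x1, x2, y))

# Chain: x1 causes the mediator x2, which causes y; again no direct effect
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.5, size=n)
y = x2 + rng.normal(scale=0.5, size=n)
print("chain (marginal, controlled):", corr_before_after(x1, x2, y))
```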
The difference between a spurious relationship and a chain relationship is the causal order. In a spurious relationship x2 precedes both x1 and y. In a chain relationship x2 intervenes between x1 and y.
In reality, response variables often have more than one cause; y is then said to have multiple causes. Sometimes these causes are independent, but usually they are connected. For instance, x1 may have a direct effect on y as well as an indirect effect on y via x2, diagrammed as x1 → y together with x1 → x2 → y.
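A simulation sketch can separate the two effects. In the hypothetical model below, a simple regression of y on x1 estimates the total effect (direct plus indirect), while a regression that also includes x2 isolates the direct effect; the coefficients 0.5, 0.8 and 1.0 are invented for illustration.

```python
# Minimal sketch (hypothetical setup): x1 affects y directly and also
# indirectly through x2. Regressing y on x1 alone gives the total effect;
# adding x2 to the model isolates the direct effect of x1.
import numpy as np

rng = np.random.default_rng(5)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=n)      # indirect path x1 -> x2
y = 0.5 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)

total = np.polyfit(x1, y, 1)[0]                    # total effect, about 0.5 + 0.8 * 1.0
X = np.column_stack([np.ones(n), x1, x2])
direct = np.linalg.lstsq(X, y, rcond=None)[0][1]   # direct effect of x1, about 0.5
print("total effect:", round(total, 2), "direct effect:", round(direct, 2))
```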
With a suppressor variable, there appears to be no association between x1 and y until x2 is controlled, at which point an association appears; x2 is then called a suppressor variable. This happens, for example, when x2 is positively correlated with y and negatively correlated with x1. So even when two variables seem unassociated, it is wise to control for other variables.
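The sketch below (Python, with invented equations) simulates such a suppressor: the marginal x1-y correlation is close to zero, but it becomes clearly positive once the linear influence of x2 is removed from both variables.

```python
# Minimal sketch (hypothetical setup): a suppressor variable. y depends
# positively on both x1 and x2, but x2 is negatively correlated with x1, so
# the marginal x1-y correlation is near zero until x2 is controlled.
import numpy as np

rng = np.random.default_rng(3)
n = 5000
x1 = rng.normal(size=n)
x2 = -x1 + rng.normal(scale=0.7, size=n)      # x2 negatively related to x1
y = x1 + x2 + rng.normal(scale=0.5, size=n)   # x2 positively related to y

print("marginal corr(x1, y):", round(np.corrcoef(x1, y)[0, 1], 2))   # near 0

# Remove the linear influence of x2 from both variables, then correlate
res_x1 = x1 - np.polyval(np.polyfit(x2, x1, 1), x2)
res_y = y - np.polyval(np.polyfit(x2, y, 1), x2)
print("corr(x1, y) controlling for x2:",
      round(np.corrcoef(res_x1, res_y)[0, 1], 2))                    # clearly positive
```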
Statistical interaction between x1 and x2 in their effect on y occurs when the effect of x1 on y changes across different values of x2. The explanatory variables x1 and x2 are also called predictors.
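A minimal simulation of interaction, with hypothetical group labels, is shown below: the slope of y on x1 is generated to differ between the two levels of x2, and fitting a separate line per group recovers that difference.

```python
# Minimal sketch (hypothetical setup): statistical interaction. The effect
# (slope) of x1 on y differs between two groups defined by x2, so a single
# overall x1 effect would be misleading.
import numpy as np

rng = np.random.default_rng(4)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.choice(["group A", "group B"], size=n)
slope = np.where(x2 == "group A", 0.5, 2.0)   # the x1 effect depends on x2
y = slope * x1 + rng.normal(scale=0.5, size=n)

for level in ["group A", "group B"]:
    mask = x2 == level
    b, a = np.polyfit(x1[mask], y[mask], 1)   # slope and intercept within group
    print(level, "estimated effect of x1 on y:", round(b, 2))
```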
Many structures are possible for multivariate associations. One possibility is an association that reverses direction (from positive to negative, or vice versa) when a variable is controlled; this is called Simpson's paradox.
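The hypothetical counts below illustrate the reversal: within each level of the control variable the "treated" success proportion is higher, yet in the aggregated table the "untreated" proportion is higher, because the treated group contains mostly hard cases.

```python
# Minimal sketch of Simpson's paradox with hypothetical counts: the direction
# of the x1-y association in the partial tables reverses in the marginal table.
import pandas as pd

counts = pd.DataFrame({
    "x2":        ["easy", "easy", "hard", "hard"],   # control variable
    "x1":        ["treated", "untreated", "treated", "untreated"],
    "successes": [18, 70, 32, 7],
    "n":         [20, 80, 80, 20],
})

# Success proportions within each partial table (treated higher in both)
counts["prop"] = counts["successes"] / counts["n"]
print(counts)

# Success proportions in the aggregated (marginal) table (untreated higher)
agg = counts.groupby("x1")[["successes", "n"]].sum()
agg["prop"] = agg["successes"] / agg["n"]
print(agg)
```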
Confounding occurs when two explanatory variables both affect a response variable and are also associated with each other. Omitted variable bias is a risk when a confounding variable is overlooked. Detecting confounding variables is a major challenge in social science.
Controlling x2 when studying the x1-y association also has consequences for inference. Restricting the analysis to a particular value of x2 shrinks the sample size, so confidence intervals become wider and test statistics smaller. A chi-squared test, for example, yields a smaller value because of the smaller sample.
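The effect of sample size on the chi-squared statistic can be seen directly: in the sketch below (hypothetical 2x2 tables), halving every cell count while keeping the same proportions roughly halves the chi-squared value and weakens the evidence against independence.

```python
# Minimal sketch: the same cell proportions with half the sample size give a
# chi-squared statistic about half as large, so the evidence against
# independence weakens when controlling shrinks the sample in a partial table.
from scipy.stats import chi2_contingency

full = [[40, 60], [60, 40]]          # hypothetical 2x2 table
half = [[20, 30], [30, 20]]          # same proportions, half the counts

for name, table in [("full", full), ("half", half)]:
    chi2, p, dof, _ = chi2_contingency(table, correction=False)
    print(name, "chi2 =", round(chi2, 2), "p =", round(p, 4))
```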
When a categorical variable is controlled, separate contingency tables are constructed for its categories. For an ordinal or quantitative control variable, the values are usually grouped so that at least three or four tables result.
Often the parameter of interest is estimated at several values of the control variable. Instead of the usual confidence interval for a single difference between proportions or means, a confidence interval can be calculated for the difference between the estimates at two values of the control variable. In its usual large-sample form, this interval is (estimate2 − estimate1) ± z · √(se1² + se2²), where se1 and se2 are the standard errors of the two estimates.
If 0 does not lie within this interval, the parameter values differ across the levels of the control variable. If instead the x1-y association is similar in the partial analyses, a single summary measure of its strength, controlling for the control variable, can be reported; this is called a partial association.
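As a worked sketch of this interval, the numbers below are hypothetical estimates of the same difference of proportions at two levels of a control variable, together with their standard errors; the code applies (estimate2 − estimate1) ± z·√(se1² + se2²).

```python
# Minimal sketch (hypothetical numbers): a 95% confidence interval for the
# difference between two estimates of the same parameter at two levels of a
# control variable.
import math

est1, se1 = 0.12, 0.04    # estimate and standard error at x2 = level 1
est2, se2 = 0.25, 0.05    # estimate and standard error at x2 = level 2

z = 1.96                  # critical value for 95% confidence
diff = est2 - est1
se_diff = math.sqrt(se1**2 + se2**2)
ci = (diff - z * se_diff, diff + z * se_diff)
print("difference:", round(diff, 3), "95% CI:", tuple(round(c, 3) for c in ci))
# If 0 lies outside this interval, the x1-y association differs across the
# two levels of x2.
```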