Confounding and deconfounding: or, slaying the lurking variable - Pearl - 2018 - Article

The biblical story of Daniel encapsulates in a profound way how experimental science is conducted today. When King Nebuchadnezzar brought back thousands of captives, he ordered his officials to pick out the children who were without blemish and skillful in all wisdom. But there was a problem: his favorite, a boy named Daniel, refused for religious reasons to touch the food the King gave them. The King's officials were terrified of this problem and of how the King would react. So Daniel proposed an experiment: for ten days give him and his companions only vegetables, and feed another group of children the King's meat and wine. After ten days the two groups were compared. Daniel and his companions looked healthier than the children on the King's diet, and partly because of this Daniel went on to become one of the most important people in the kingdom. The story contains a causal question: will a vegetarian diet cause the servants to be healthier? Daniel in turn proposes a methodology to answer this question: compare the two groups after ten days of the experiment. After a suitable amount of time, you can see a difference between the two groups. Nowadays this is called a controlled experiment.

You cannot go back in time and see what would have happened to Daniel if he had eaten the meat and wine instead of the healthy diet. But because you can compare Daniel with a group of people who receive a different treatment, you can still see what happens when people are given a different diet. For this to work, the groups need to be representative of the population and comparable with each other.

But Daniel did not think of one thing: confounding bias. Suppose that Daniel's group was healthier than the control group to start with; their robust appearance after ten days of eating the healthy diet would then have nothing to do with the diet itself. Confounding bias occurs when a variable influences both who is selected for the treatment and the outcome of the experiment. Such a confounder is sometimes known as the 'lurking third variable'.

Statisticians both over- and underestimate the importance of adjusting for possible confounding variables. They overrate it in the sense that they often control for many more variables than they need to, and even for variables that they should not control for. The idea is 'the more things you control for, the stronger your study seems', because it gives a feeling of specificity and precision. But sometimes you can control for too much.

Statisticians also underestimate the importance of controlling for possible confounding variables, in the sense that they are loath to talk about causality at all, even when the controlling has been done correctly.

In this chapter you will get to know why you can safely use RCTs (randomized controlled trials) to estimate the causal effect X -> Y without falling prey to confounding bias.

What is meant by the 'chilling fear of confounding'?

In 1998, an important study showed an association between regular walking and reduced death rates among retired men. The researcher wanted to know whether the men who exercised more lived longer. He found that the death rate over a twelve-year period was twice as high among 'casual walkers' (less than a mile a day) as among 'intense walkers' (more than two miles a day). But you have to keep in mind the influence a confounding variable might have.

The classic causal diagram (walking <- age -> mortality) shows that age is a confounder of walking and mortality. Physical condition could perhaps be a confounder as well, and once you start thinking this way you can go on and on about possible confounders. However, even when the researchers adjusted the death rate for age, they found that the difference between casual and intense walkers was still large.

The skillful interrogation of nature: why do RCTs work?

An RCT is often considered the gold standard of a clinical trial, and the person to thank for this is R.A. Fisher. The questions he asked were 'aimed at establishing causal relationships', and what gets in the way is confounding. Nature is like a genie that answers exactly the question we pose, not necessarily the one we intend to ask. Around 1923-1924 Fisher began to realize that the only experimental design the genie could not defeat was a random one. When you run an experiment, you may sometimes get lucky and apply the treatment to the most fertile subplots. But by generating a new random assignment each time you perform the experiment, you can guarantee that the great majority of the time you will be neither lucky nor unlucky. Randomized trials are now the gold standard, but in Fisher's time a deliberately randomized experiment horrified him and his statistical colleagues. Fisher nevertheless realized that an uncertain answer to the right question is much better than a highly certain answer to the wrong question.

When you ask the genie the wrong question, you will never find out what you want to know. If you ask the right question, getting an answer that is occasionally wrong is much less of a problem. So, randomization brings two benefits:

  1. It eliminates confounding bias (see the sketch after this list).
  2. It enables the researcher to quantify his uncertainty. 
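
To make the first benefit concrete, here is a minimal simulation sketch in Python (the numbers and variable names are my own, not taken from the book): a hidden variable Z pushes people both into the treatment and toward a good outcome, so the naive observational comparison overstates the effect, while random assignment cuts the Z -> X arrow and recovers the true effect.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Hypothetical data-generating process: Z (e.g. baseline health) raises both
    # the chance of choosing the treatment and the chance of a good outcome.
    # The true causal effect of the treatment on the outcome is +0.10.
    z = rng.binomial(1, 0.5, n)

    # Observational world: healthier people self-select into the treatment.
    x_obs = rng.binomial(1, 0.2 + 0.6 * z)
    y_obs = rng.binomial(1, 0.3 + 0.10 * x_obs + 0.40 * z)

    # Randomized world: a coin flip assigns the treatment, cutting the Z -> X arrow.
    x_rct = rng.binomial(1, 0.5, n)
    y_rct = rng.binomial(1, 0.3 + 0.10 * x_rct + 0.40 * z)

    naive = y_obs[x_obs == 1].mean() - y_obs[x_obs == 0].mean()
    rct = y_rct[x_rct == 1].mean() - y_rct[x_rct == 0].mean()

    print(f"observational difference: {naive:.3f}")   # inflated by confounding (~0.34 here)
    print(f"randomized difference:    {rct:.3f}")     # close to the true +0.10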

In a nonrandomized study, the experimenter must rely on her knowledge of the subject matter. If she is confident that her causal model accounts for a sufficient set of deconfounders and she has gathered data on them, then she can estimate the effect in an unbiased way. The danger is that she might have missed a confounding factor, so her estimate may therefore be biased.

RCTs are still preferred to observational studies where they are feasible. But in some cases intervention may be physically impossible or unethical, or you may have difficulty recruiting subjects for inconvenient experimental procedures and end up with only volunteers who do not quite represent the intended population. In such cases you have to fall back on observational studies, and on deconfounding them correctly.

What is the new paradigm of confounding?

While confounding is widely recognized as one of the central problems in research, a review of the literature reveals little consistency among the definitions of 'confounding' or 'confounder'. Why has the understanding of confounding advanced so little since Fisher? Because, lacking a principled understanding of confounding, scientists could not say anything meaningful in observational studies, where physical control over treatments is infeasible. So how was confounding defined then, and how is it defined now? It is easier to answer the second question with the information we have now: confounding can simply be defined as anything that leads to a discrepancy between P(Y | X) (the conditional probability of the outcome given the treatment) and P(Y | do(X)) (the interventional probability). Why was this so difficult? Because confounding is not a statistical notion. It stands for the discrepancy between what we want to assess (the causal effect) and what we actually do assess using statistical methods. If you cannot mathematically articulate what you want to assess, you cannot expect to define what constitutes a discrepancy.
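
As a minimal illustration of that discrepancy (the probabilities below are hypothetical, chosen only for the sketch), the following computes both quantities from the same model with one binary confounder Z; the interventional quantity is obtained with the adjustment formula, weighting Z by its distribution in the whole population rather than among the treated.

    # A minimal sketch with a single binary confounder Z (hypothetical probabilities).
    # Structure: Z -> X, Z -> Y, X -> Y.

    p_z = {1: 0.5, 0: 0.5}                      # P(Z)
    p_x_given_z = {1: 0.8, 0: 0.2}              # P(X=1 | Z=z)
    p_y_given_xz = {                            # P(Y=1 | X=x, Z=z)
        (1, 1): 0.8, (1, 0): 0.4,
        (0, 1): 0.7, (0, 0): 0.3,
    }

    # Conditional probability P(Y=1 | X=1): weight Z by how common it is AMONG the treated.
    p_x1 = sum(p_x_given_z[z] * p_z[z] for z in (0, 1))
    p_y_given_x1 = sum(p_y_given_xz[(1, z)] * p_x_given_z[z] * p_z[z] for z in (0, 1)) / p_x1

    # Interventional probability P(Y=1 | do(X=1)): weight Z by its distribution in the
    # whole population (the adjustment formula).
    p_y_do_x1 = sum(p_y_given_xz[(1, z)] * p_z[z] for z in (0, 1))

    print(f"P(Y=1 | X=1)     = {p_y_given_x1:.3f}")   # 0.720, inflated by confounding
    print(f"P(Y=1 | do(X=1)) = {p_y_do_x1:.3f}")      # 0.600, the causal quantity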

The concept of confounding has evolved around two related conceptions: incomparability and the lurking third variable. Both of these concepts have resisted formalization, because how do we know which variables are relevant to consider and which are not? You can say it is common sense, but many scientists have struggled to identify the important things to consider.

What are the two surrogate definitions of confounding? They fall into two main categories: declarative and procedural. An old procedural definition goes by the scary name of 'noncollapsibility': you compare the relative risk with the relative risk after adjusting for the potential confounder. A difference between the two indicates confounding, and in that case you should use the adjusted risk estimate.
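
A small sketch of this procedural check, using hypothetical counts of my own (not data from any study mentioned here): the crude relative risk is compared with a relative risk standardized to the overall distribution of the potential confounder Z; under this definition, a gap between the two flags Z as a confounder.

    # Hypothetical (cases, total) counts for exposed and unexposed, within strata of Z.
    strata = {
        "Z=0": {"exposed": (5, 100),   "unexposed": (45, 900)},
        "Z=1": {"exposed": (180, 900), "unexposed": (20, 100)},
    }

    def risk(cases, total):
        return cases / total

    # Crude relative risk, ignoring Z.
    exp_cases = sum(s["exposed"][0] for s in strata.values())
    exp_total = sum(s["exposed"][1] for s in strata.values())
    unexp_cases = sum(s["unexposed"][0] for s in strata.values())
    unexp_total = sum(s["unexposed"][1] for s in strata.values())
    rr_crude = risk(exp_cases, exp_total) / risk(unexp_cases, unexp_total)

    # Adjusted relative risk: standardize both arms to the overall Z distribution.
    totals = {z: s["exposed"][1] + s["unexposed"][1] for z, s in strata.items()}
    n = sum(totals.values())
    risk_exp_adj = sum(risk(*s["exposed"]) * totals[z] / n for z, s in strata.items())
    risk_unexp_adj = sum(risk(*s["unexposed"]) * totals[z] / n for z, s in strata.items())
    rr_adjusted = risk_exp_adj / risk_unexp_adj

    print(f"crude RR:    {rr_crude:.2f}")     # ~2.85
    print(f"adjusted RR: {rr_adjusted:.2f}")  # 1.00: the whole crude association is due to Z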

The declarative definition is 'the classic epidemiological definition of confounding', and it consists of three parts: a confounder of X (treatment) and Y (outcome) is a variable Z that is (1) associated with X in the population at large and (2) associated with Y among people who have not been exposed to the treatment X. In recent years a third condition has been added: (3) Z should not lie on the causal path between X and Y. But this idea is a bit confusing, I would say.

Consider the case where Z is not itself on the causal path from X to Y, but is a proxy for a mediator M that is (X -> M -> Y, with M -> Z). Z is not a perfect measure of M, so when you control for Z, some of the influence of X on Y may still 'leak through'. Controlling for Z is nevertheless a mistake: the bias may be smaller than if you had controlled for M itself, but it is still there. That is why Cox (1958) warned that you should only control for Z if you have a 'strong a priori reason' to believe that it is not affected by X. This is nothing more than a causal assumption.
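
A rough simulation of this 'leak-through' (the linear model and its coefficients are my own assumptions, not from the book): all of X's effect on Y runs through a mediator M, and Z is a noisy measurement of M. Regressing Y on X alone recovers the total effect; adding the proxy Z partially blocks the mediated path and biases the estimate; adding M itself blocks it completely.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000

    # Hypothetical linear model: X -> M -> Y, and Z is a noisy proxy of M.
    x = rng.normal(size=n)
    m = 1.0 * x + rng.normal(size=n)        # mediator
    z = m + rng.normal(size=n)              # imperfect measurement of M
    y = 0.5 * m + rng.normal(size=n)        # all of X's effect runs through M (total effect = 0.5)

    def ols_coef_of_x(*extra):
        """Coefficient on X from an OLS fit of Y on X plus any extra regressors."""
        design = np.column_stack([x, *extra, np.ones(n)])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        return beta[0]

    print(f"Y ~ X     : {ols_coef_of_x():.3f}")     # ~0.50, the total causal effect
    print(f"Y ~ X + Z : {ols_coef_of_x(z):.3f}")    # ~0.25, biased: the proxy partly blocks the path
    print(f"Y ~ X + M : {ols_coef_of_x(m):.3f}")    # ~0.00, the mediator blocks the effect entirely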

Later, Robins and Greenland set out to express their conception of confounding in terms of potential outcomes. Ideally, each person in the treatment group would be exchangeable with a person in the control group, so that the outcome would be the same if you switched the treatments and controls; in that case confounding would be absent. Using this idea, Robins and Greenland showed that both the declarative and the procedural definition can give the wrong answer.

What do the do-operator and the back-door criterion mean?

To understand the back-door criterion you first need an idea of how information flows in a causal diagram. Think of the diagram as a network of pipes that convey information from a starting point X to a finish Y. The do-operator erases all the arrows that come into X and in this way prevents any information about X from flowing in the noncausal direction. What if you have longer pipes with more junctions, for example:

A -> ... -> F -> G -> ... -> I -> J?

The answer is very simple: if a single junction on the path is blocked, then J cannot 'find out' anything about A through that path. So you have many options for blocking communication between A and J. A back-door path is any path from X to Y that starts with an arrow pointing into X; X and Y will be deconfounded if we block every back-door path. So you can almost treat deconfounding as a game. The goal of the game is to specify a set of variables that deconfounds X and Y. In other words: they should not be descendants of X, and they should block all the back-door paths.

One configuration gives rise to a new kind of bias, called M-bias (after the shape of its diagram). In the M-graph there is only one back-door path, and it is already blocked by a collider at B, so you do not need to control for anything else. It is incorrect to call a variable such as B a confounder merely because it is associated with X and Y: B only becomes a confounder when you control for it! And when the variables involved are things such as smoking and miscarriage, this is obviously no longer a game but serious business (see the sketch below).
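
Here is a sketch of the 'deconfounding game' (assuming the networkx package is available; the function names and the example graph are mine, not the book's): it enumerates the paths between X and Y, keeps the ones that start with an arrow into X, and tests whether a candidate set blocks them all. Run on the M-graph, it confirms that the empty set already satisfies the back-door criterion, while conditioning on the collider B opens the back-door path.

    import networkx as nx

    def path_is_blocked(dag, path, given):
        """One path is blocked if some non-collider on it is in `given`, or some
        collider on it has neither itself nor any descendant in `given`."""
        for i in range(1, len(path) - 1):
            prev_node, node, next_node = path[i - 1], path[i], path[i + 1]
            is_collider = dag.has_edge(prev_node, node) and dag.has_edge(next_node, node)
            if is_collider:
                if not (({node} | nx.descendants(dag, node)) & given):
                    return True   # unconditioned collider blocks the path
            elif node in given:
                return True       # conditioned chain or fork blocks the path
        return False

    def satisfies_backdoor(dag, x, y, given):
        """Back-door criterion: `given` contains no descendant of x and blocks
        every path from x to y that starts with an arrow pointing INTO x."""
        if given & ({x} | nx.descendants(dag, x)):
            return False
        skeleton = dag.to_undirected()
        for path in nx.all_simple_paths(skeleton, x, y):
            starts_backdoor = dag.has_edge(path[1], x)   # first edge points into x
            if starts_backdoor and not path_is_blocked(dag, path, given):
                return False
        return True

    # The M-graph from this section: the only back-door path X <- A -> B <- C -> Y
    # is already blocked by the collider at B.
    m_graph = nx.DiGraph([("A", "X"), ("A", "B"), ("C", "B"), ("C", "Y"), ("X", "Y")])
    print(satisfies_backdoor(m_graph, "X", "Y", set()))   # True: nothing needs to be controlled
    print(satisfies_backdoor(m_graph, "X", "Y", {"B"}))   # False: conditioning on B opens the path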
