Correlation vs. Causation
Ice cream sales and drowning rates both rise in summer. Does eating ice cream cause drowning? Obviously not — both are driven by hot weather, which leads people to swim and to buy cold treats. This example sounds silly when stated plainly, but the underlying confusion it illustrates is one of the most common reasoning errors in journalism, social media, and even published research. Two things happening together — a correlation — is often treated as proof that one causes the other. It is not.
Correlation means two variables tend to change together — when one goes up, the other tends to go up (or down). Causation means one variable directly produces a change in the other. Correlation is common and easy to detect. Causation is harder to establish and requires ruling out alternative explanations.
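Correlation is usually measured with the Pearson correlation coefficient, r. It runs from +1 (the two variables rise together perfectly) through 0 (no straight-line relationship) to -1 (one falls exactly as the other rises). Here is a minimal from-scratch sketch in Python. The function name `pearson_r` is our own, chosen for illustration; Python 3.10+ also provides `statistics.correlation` for the same job. Swapping the arguments gives the same value, so r by itself says nothing about which variable, if either, is the cause.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term: do x and y sit above/below their means at the same time?
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return cov / math.sqrt(var_x * var_y)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: perfectly correlated
```

The formula is symmetric in x and y. It records *that* two variables move together, never *why*.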
The Confounding Variable
In the ice cream and drowning example, hot weather is what researchers call a confounding variable. This is a third factor that influences both of the things you are measuring. It creates an apparent relationship between them that does not reflect any direct effect of one on the other. Confounding variables are everywhere, and finding them is one of the central challenges of scientific research.
Consider a study that finds people who carry lighters are more likely to develop lung cancer than people who do not. Does carrying a lighter cause cancer? No — the confounding variable is smoking. Smokers carry lighters; smoking causes cancer. The lighter is along for the ride. Identifying confounders requires thinking carefully about other factors that could plausibly explain both measurements.
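Another way to control for a confounder is stratification: compare like with like. The sketch below builds a hypothetical population in which every probability is made up for illustration. In it, smoking raises both the chance of carrying a lighter and the chance of cancer, while the lighter itself does nothing. Across everyone, lighter carriers show a much higher cancer rate. Within smokers alone, and within non-smokers alone, the gap disappears.

```python
import random

def simulate_people(n=100_000, seed=1):
    """Hypothetical population of (smoker, carries_lighter, cancer) records.
    Smoking drives both lighter-carrying and cancer; the lighter has no effect."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        smoker = rng.random() < 0.20
        lighter = rng.random() < (0.90 if smoker else 0.10)
        cancer = rng.random() < (0.15 if smoker else 0.01)  # depends on smoking only
        people.append((smoker, lighter, cancer))
    return people

def rates_by_lighter(group):
    """Cancer rate among lighter carriers and among non-carriers within `group`."""
    carriers = [cancer for _, lighter, cancer in group if lighter]
    others = [cancer for _, lighter, cancer in group if not lighter]
    return sum(carriers) / len(carriers), sum(others) / len(others)

people = simulate_people()
smokers = [p for p in people if p[0]]
non_smokers = [p for p in people if not p[0]]

# Unstratified comparison first, then the same comparison inside each smoking group.
for label, group in [("everyone", people), ("smokers", smokers), ("non-smokers", non_smokers)]:
    with_lighter, without_lighter = rates_by_lighter(group)
    print(f"{label:12} lighter: {with_lighter:.3f}   no lighter: {without_lighter:.3f}")
```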
Another classic example: countries with more television sets per household have longer life expectancy on average. Does watching TV lengthen your life? Probably not. Wealth is the confounder — richer countries can afford more televisions and also provide better healthcare, nutrition, and sanitation. Television ownership tracks wealth; wealth drives health outcomes.
Reverse Causation
Even when one variable genuinely does cause changes in the other, the direction of causation can be the opposite of what people assume. A study might find that people who see doctors frequently have worse health outcomes than those who rarely visit. Does visiting the doctor cause poor health? No: people visit doctors because they are already unhealthy. The causation runs from poor health to doctor visits, not the other way around. This error is called reverse causation, and it is especially common when data is collected at a single point in time rather than tracked over years.
Journalists often use causal language to describe correlational findings. Headlines like 'Coffee linked to longer life,' 'Social media causes depression,' or 'Exercise boosts test scores' are frequently based on observational studies that can only show correlation. The word 'linked' is a hedge — it signals correlation without claiming cause — but readers often miss this and remember the causal version of the story.
How Researchers Establish Causation
Demonstrating causation, as opposed to merely documenting a correlation, is hard. Scientists use several strategies:
- Randomized controlled trials: randomly assign people to groups, so that confounders (including ones nobody thought to measure) tend to be spread evenly across the groups, especially in large samples.
- Natural experiments: find situations where circumstances created something close to random assignment without researchers arranging it, such as a policy change that affected some towns but not others.
- Longitudinal studies: track the same individuals over years to see which variables change first, establishing temporal order.
- Mechanistic evidence: explain the biological or physical process that would connect cause and effect.
The more of these boxes a study ticks, the stronger the causal claim.
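The sketch below illustrates why random assignment matters. It uses a hypothetical exercise program and test scores, with all numbers invented. A `motivation` score acts as a confounder: it raises test scores, and when people choose for themselves, it also makes them more likely to join. The program's real effect (`TRUE_EFFECT`) is set to 5 points. Comparing self-selected joiners with non-joiners overstates that effect considerably, while a coin-flip assignment recovers it closely.

```python
import math
import random
import statistics

TRUE_EFFECT = 5.0  # hypothetical: the program really adds 5 points

def estimated_effect(randomized, n=10_000, seed=3):
    """Naive estimate: mean score of participants minus mean score of non-participants."""
    rng = random.Random(seed)
    joined, skipped = [], []
    for _ in range(n):
        # Confounder: raises scores AND, without randomization, raises participation.
        motivation = rng.gauss(0, 1)
        if randomized:
            joins = rng.random() < 0.5  # coin flip ignores motivation entirely
        else:
            joins = rng.random() < 1 / (1 + math.exp(-2 * motivation))  # motivated people opt in
        score = 70 + TRUE_EFFECT * joins + 8 * motivation + rng.gauss(0, 5)
        (joined if joins else skipped).append(score)
    return statistics.mean(joined) - statistics.mean(skipped)

print(f"self-selected groups: estimated effect {estimated_effect(randomized=False):.1f}")
print(f"randomized groups:    estimated effect {estimated_effect(randomized=True):.1f}")
```

The randomized version never looks at `motivation` when assigning people. That is the point: the same protection extends to confounders nobody has thought of or measured.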
A study finds that students who own more than 50 books at home have higher reading scores. A parent concludes that buying 50 books will raise their child's scores. What is the most likely problem with this conclusion?
What is a confounding variable?
Find the Confounder
- Step 1: Here are three correlations. For each one, brainstorm at least two possible confounding variables that could explain the relationship without any direct causation.
- Correlation A: Cities with more libraries have lower crime rates.
- Correlation B: Teenagers who play musical instruments have higher GPAs on average.
- Correlation C: People who own dogs visit the doctor less frequently.
- Step 2: For each correlation, write one sentence describing how you would design a study to test whether the relationship is truly causal.
- Step 3: Find one real headline online that implies causation. Write the headline and explain whether the underlying study is likely observational (able to show only correlation) or experimental (able to support a causal claim), and why.