Correlation is a statistic that measures the strength and direction of the association between two or more variables.
Causation, on the other hand, is a relationship that describes cause and effect.
“Correlation does not imply causation” is a famous maxim that warns against a very common mistake: seeing a strong correlation and assuming causality. A strong correlation may appear without causation in the following cases:
- Lurking variable: An unobserved variable that affects both variables of interest, causing them to exhibit a strong correlation, even when there is no direct relationship between them.
- Confounding variable: A variable whose effect cannot be separated from that of one or more of the variables of interest. As a result, we cannot tell whether the observed result is caused by variation in the variable of interest or in the confounding variable.
- Spurious correlation: Sometimes, purely by coincidence, variables are correlated even though there is no plausible logical relationship between them.
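
The lurking-variable case can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the variable names (`temperature`, `ice_cream_sales`, `sunburn_cases`) and all coefficients are hypothetical, chosen so that both quantities depend on temperature but not on each other.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical lurking variable: daily temperature.
temperature = [rng.gauss(25, 5) for _ in range(5000)]

# Both quantities depend on temperature plus independent noise;
# neither one depends on the other (coefficients are invented).
ice_cream_sales = [2.0 * t + rng.gauss(0, 5) for t in temperature]
sunburn_cases = [1.5 * t + rng.gauss(0, 5) for t in temperature]

r_raw = pearson(ice_cream_sales, sunburn_cases)

# Subtract temperature's contribution. This is possible only because
# we wrote the simulation and therefore know the coefficients.
ice_resid = [s - 2.0 * t for s, t in zip(ice_cream_sales, temperature)]
burn_resid = [c - 1.5 * t for c, t in zip(sunburn_cases, temperature)]
r_adjusted = pearson(ice_resid, burn_resid)

print(f"raw correlation:            {r_raw:.2f}")
print(f"after removing temperature: {r_adjusted:.2f}")
```

In this toy setup the raw correlation is strong, yet it nearly vanishes once temperature is accounted for. In real data the lurking variable is unobserved by definition, so this adjustment is usually not available.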
Causation is tricky to infer. The usual solution is to set up a randomized experiment, in which the candidate cause is isolated and tested. Unfortunately, in many fields running such an experiment is impractical or not viable, so logic and domain knowledge become crucial to reaching reasonable conclusions.
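
To see why randomization helps, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: a hidden trait called `motivation` drives the outcome, the treatment has no real effect at all, and the helper names (`outcome`, `mean_difference`) are hypothetical.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable


def outcome(motivation, treated):
    # In this made-up world the treatment does nothing (true effect = 0);
    # the outcome depends only on motivation plus noise.
    true_effect = 0.0
    return 10.0 * motivation + true_effect * treated + rng.gauss(0, 1)


def mean_difference(assign):
    """Mean outcome of treated minus untreated under an assignment rule."""
    treated_outcomes, control_outcomes = [], []
    for _ in range(20000):
        motivation = rng.random()  # hidden trait in [0, 1)
        treated = assign(motivation)
        y = outcome(motivation, treated)
        (treated_outcomes if treated else control_outcomes).append(y)
    return mean(treated_outcomes) - mean(control_outcomes)


# Observational data: more motivated people are more likely to opt in.
observational = mean_difference(lambda m: rng.random() < m)

# Randomized experiment: a coin flip decides, independent of motivation.
randomized = mean_difference(lambda m: rng.random() < 0.5)

print(f"observational difference: {observational:.2f}")
print(f"randomized difference:    {randomized:.2f}")
```

Under these assumptions the observational comparison shows a sizeable gap between the groups even though the treatment is inert, while the randomized comparison stays close to zero. Random assignment breaks the link between the hidden trait and who receives the treatment.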