Hypothesis Testing and p-values
Hypothesis testing is the standard framework for evaluating whether an observed effect in data reflects something real or could plausibly be due to random chance alone. The procedure starts with a null hypothesis, usually the assumption that there is no effect or no difference. You then compute a p-value: the probability of observing data at least as extreme as what you actually observed, assuming the null hypothesis is true. If the p-value is small (typically below 0.05), you reject the null hypothesis and consider the result statistically significant. The threshold of 0.05 is a convention, not a magic number; stricter standards such as 0.01 or 0.001 are sometimes used.
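One way to make "at least as extreme, assuming the null is true" concrete is a permutation test: if the null hypothesis of no difference were true, group labels would be interchangeable, so we can shuffle them many times and see how often a difference as large as the observed one appears. Here is a minimal sketch using only the standard library; the data and function name are illustrative.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    The p-value is the fraction of label shufflings that produce a
    difference in means at least as extreme as the observed one --
    i.e. how surprising the data would be if "no difference" were true.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Illustrative data: two small samples with clearly separated means
a = [5.1, 5.3, 4.9, 5.6, 5.2]
b = [6.0, 6.2, 5.9, 6.4, 6.1]
print(permutation_p_value(a, b))  # small: shuffles rarely separate groups this much
```

With well-separated groups like these, almost no shuffling reproduces the observed gap, so the p-value comes out far below 0.05.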
P-values are widely misused. They do not tell you the probability that the null hypothesis is true. They do not tell you how big or important an effect is. They do not tell you whether a finding will replicate. They are simply a measure of how surprising the data would be under the assumption of no effect. A small p-value can come from a real effect, from a large sample detecting a tiny effect, or from a fluke. P-hacking (manipulating analyses until a p-value drops below 0.05) has been a major problem in scientific research, contributing to the so-called replication crisis, in which many published findings turn out not to replicate.
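A little arithmetic shows why p-hacking works so reliably. Each test on pure noise has a 5% chance of crossing the 0.05 threshold, so running many tests and reporting only the "significant" one inflates the false-positive rate dramatically:

```python
# With k independent tests on pure noise, each at threshold alpha,
# the chance of at least one "significant" result is 1 - (1 - alpha)**k.
alpha = 0.05
for k in (1, 5, 20, 100):
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:>3} tests: P(at least one p < {alpha}) = {p_any:.2f}")
# 1 test  -> 0.05
# 20 tests -> 0.64
```

Twenty comparisons on random noise give roughly a two-in-three chance of at least one "statistically significant" result, which is exactly why preregistering analyses matters.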
Modern statisticians increasingly emphasize effect sizes, confidence intervals, and replication, alongside (or instead of) p-values. The American Statistical Association issued a 2016 statement clarifying that p-values are useful but should not be treated as definitive measures of truth. Strong scientific work reports the magnitude of effects, the uncertainty around them, and ideally evidence of replication, in addition to p-values. Reading scientific findings critically means looking past the p-value to ask whether the underlying effect is large enough to matter, whether it has been replicated, and whether the analysis was preregistered (planned before data collection).
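Effect sizes and confidence intervals are straightforward to compute. The sketch below, using the same illustrative data as above and only the standard library, reports Cohen's d (the standardized mean difference) and an approximate 95% confidence interval for the raw difference in means; the normal approximation is a simplifying assumption.

```python
import math
import statistics

def cohens_d(a, b):
    """Standardized mean difference (Cohen's d with pooled sample SD)."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * statistics.variance(a) +
                  (n_b - 1) * statistics.variance(b)) / (n_a + n_b - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)

def mean_diff_ci(a, b, z=1.96):
    """Approximate 95% CI for the difference in means (normal approximation)."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    return diff - z * se, diff + z * se

# Illustrative data
a = [5.1, 5.3, 4.9, 5.6, 5.2]
b = [6.0, 6.2, 5.9, 6.4, 6.1]
print(cohens_d(a, b))      # around -3.95: a very large standardized effect
print(mean_diff_ci(a, b))  # interval well below zero, excluding "no difference"
```

Reporting the interval alongside the p-value answers the question the p-value cannot: not just "is there an effect?" but "how big is it, and how uncertain are we?"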
Read a Study
Find a recent scientific study you find interesting. Identify the p-value (often described as "statistically significant" or reported as a specific number). Then look for the effect size: how large was the actual difference, and is it big enough to matter in real life? Many statistically significant findings have effect sizes too small to make a practical difference. The exercise builds the habit of reading beyond the p-value.
Hypothesis testing is one of the most powerful and most misused tools in applied statistics. The next lesson covers Bayesian thinking, an alternative framework that has been gaining ground in modern data analysis.