Glossary

Null Hypothesis

The null hypothesis (H₀) assumes a control and variation perform equally in an A/B test. A p-value below 0.05 is conventionally treated as sufficient evidence to reject H₀ and conclude a real effect is likely.

What is the null hypothesis?

The null hypothesis (H₀) is the default assumption in any statistical test: that there is no real difference between two groups being compared. In A/B testing, H₀ states that your control and variation produce the same outcome — any observed difference is due to chance. The purpose of a test is not to prove H₀ true, but to gather enough evidence to reject it in favor of the alternative hypothesis (H₁).

How the null hypothesis works

Every hypothesis test has two competing statements:

  • H₀ (null): Control and variation perform identically. Any difference is noise.
  • H₁ (alternative): There is a real difference between control and variation.

You analyze your test data and calculate a p-value — the probability of observing a difference at least as extreme as yours, assuming H₀ is true. If p < 0.05 (the conventional threshold), you reject H₀ and conclude that the variation likely caused a real effect.

Crucially, you never "accept" H₀. A non-significant result means you failed to reject it — the test was inconclusive, not that no difference exists.

Formula and test structure

For a two-sample proportion test (the most common A/B test setup):

H₀: p₁ = p₂ (conversion rate for control equals variation)

H₁: p₁ ≠ p₂ (two-tailed) or p₁ < p₂ (one-tailed)

The test statistic (z-score) measures how many standard deviations your observed difference is from zero. Convert that to a p-value and compare against your significance level (α = 0.05 by convention).
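The structure above can be sketched in a few lines of Python. This is a minimal illustration, not a production stats library: `two_prop_ztest` is a hypothetical helper name, and the standard normal CDF is built from `math.erf` in the standard library.

```python
import math

def two_prop_ztest(conv1, n1, conv2, n2):
    """Two-sample proportion z-test with a pooled standard error.

    conv1/conv2: conversion counts; n1/n2: visitors per variant.
    Returns (z, two_tailed_p)."""
    p1, p2 = conv1 / n1, conv2 / n2
    pooled = (conv1 + conv2) / (n1 + n2)        # under H0: p1 = p2
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# 80/1000 conversions in control vs 120/1000 in the variation
z, p = two_prop_ztest(80, 1000, 120, 1000)
# p < 0.05 here, so we would reject H0 at the conventional threshold
```

For a one-tailed test, halve the p-value (and check the sign of z matches the direction of H₁).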

Real-world example

Booking.com reportedly runs more than 1,000 A/B tests concurrently at any given time. Imagine they test a new search-result card layout:

  • H₀: The new card design has no effect on booking conversion rate.
  • Observation: Control converts at 4.20%, variant at 4.35% over 400,000 visitors per variant.
  • Result: z ≈ 3.32, p ≈ 0.0009 (two-tailed).
  • Decision: p < 0.05, so reject H₀ — the new layout produces a statistically significant lift.
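The arithmetic can be checked directly from the rates above (a sketch, not Booking.com's actual tooling; conversion counts are derived from the stated rates and sample size):

```python
import math

n = 400_000
conv_control = 16_800   # 4.20% of 400,000
conv_variant = 17_400   # 4.35% of 400,000

pooled = (conv_control + conv_variant) / (2 * n)
se = math.sqrt(pooled * (1 - pooled) * (2 / n))
z = (conv_variant / n - conv_control / n) / se
p_value = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))
print(round(z, 2), p_value)   # z ≈ 3.32, p well below 0.05
```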

Null hypothesis, Type I, and Type II errors

Two error types matter when working with the null hypothesis:

  • Type I error (false positive): Rejecting H₀ when it is actually true. Your α = 0.05 caps this at 5%.
  • Type II error (false negative): Failing to reject H₀ when H₁ is true. Controlled by your test's power (typically 80%).

Underpowered tests (too small a sample size) frequently miss real effects, causing teams to wrongly conclude the variation "didn't work."
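To see how sample size drives the Type II error rate, here is an approximate power calculation for a two-sided, two-sample proportion test. `approx_power` is a hypothetical helper using the unpooled standard error and the normal approximation; it is a sketch, not a substitute for a proper power analysis tool.

```python
import math

def approx_power(p1, p2, n_per_variant, z_crit=1.96):
    """Approximate power: P(reject H0 | the true rates are p1 and p2)."""
    se = math.sqrt(p1 * (1 - p1) / n_per_variant
                   + p2 * (1 - p2) / n_per_variant)
    z_effect = abs(p2 - p1) / se
    # Normal approximation: power ~= Phi(z_effect - z_crit)
    return 0.5 * (1 + math.erf((z_effect - z_crit) / math.sqrt(2)))

# The same 4.20% -> 4.35% lift at two different sample sizes:
print(approx_power(0.0420, 0.0435, 400_000))  # ~0.91: well powered
print(approx_power(0.0420, 0.0435, 50_000))   # ~0.22: likely to miss the real effect
```

At 50,000 visitors per variant, roughly four out of five tests would fail to detect this genuinely real lift.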

Common misconceptions

  • "A non-significant result proves the variation doesn't work." Wrong. It means you didn't gather enough evidence to reject H₀. The effect may still exist — you just couldn't detect it at your sample size.
  • "A lower p-value means a bigger effect." Wrong. P-values measure evidence against H₀, not the size of the effect. Always report effect size alongside significance.
  • "Peeking at results mid-test is fine." Wrong. Checking significance repeatedly inflates your false-positive rate. Use sequential testing corrections or fix your sample size in advance.
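The peeking problem can be demonstrated with a small simulation under H₀ (no real effect). This sketch models the test statistic as a normalized sum of noise and checks significance at several interim looks; with ten looks, the false-positive rate lands far above the nominal 5%.

```python
import math, random

random.seed(42)

def simulate_fpr(n_sims=2000, looks=10, obs_per_look=50, z_crit=1.96):
    """Fraction of H0-true simulations that hit |z| > z_crit at ANY interim look."""
    false_positives = 0
    for _ in range(n_sims):
        total, count = 0.0, 0
        for _ in range(looks):
            # Under H0, each observation contributes pure noise
            total += sum(random.gauss(0, 1) for _ in range(obs_per_look))
            count += obs_per_look
            if abs(total / math.sqrt(count)) > z_crit:
                false_positives += 1
                break   # a peeking experimenter stops at first "significance"
    return false_positives / n_sims

print(simulate_fpr(looks=1))    # ~0.05: nominal rate with one planned analysis
print(simulate_fpr(looks=10))   # roughly triples or worse with 10 peeks
```

This is why you either fix the sample size in advance or use a sequential method that spends the α budget across looks.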

Related concepts

Deepen your understanding with these related glossary terms: alternative hypothesis, p-value, statistical significance, Type I error, Type II error, sample size.