What You Will Learn
- Why email audiences are unique and require their own testing rather than adopting generic best practices
- The variables to test in priority order — from highest to lowest leverage
- How to set up a valid A/B test (not just send two versions)
- Minimum sample sizes for statistically meaningful results
- How to determine if a result is statistically significant or just noise
- When multivariate testing is appropriate and when it over-complicates
- How to build a systematic testing programme that compounds learnings
Why Test — Not Just Apply Best Practices
Email best practices — the guidelines in this guide and every other email marketing resource — are derived from aggregate data across thousands of senders and millions of subscribers. They represent average behaviour, not your specific audience's behaviour. Your audience may be different: older or younger, more professional or more casual, more price-sensitive or more brand-loyal, more responsive to humour or more responsive to data.
A/B testing tells you what works for your audience specifically. "Best practice says first-name personalisation in subject lines improves open rate" — does it for your subscribers? Test it. "Best practice says shorter emails get higher CTR" — does this hold for your audience? Test it. Only your test data tells you with confidence.
What to Test — Priority Order
| Variable | Metric Affected | Priority | Example |
|---|---|---|---|
| Subject line | Open rate | Highest | Curiosity gap vs specific benefit; question vs statement |
| CTA button text | Click rate | Very high | "Shop Now" vs "Get Yours" vs "See the Collection" |
| Email length | Click rate, conversion rate | High | Short (300 words) vs long (900 words) |
| Primary offer / hero content | Conversion rate | High | Product vs lifestyle hero image; discount vs free shipping offer |
| Send time | Open rate | Medium | Tuesday 9am vs Thursday 2pm |
| From name | Open rate, trust | Medium | "Digital Codex" vs "James at Digital Codex" |
| Preview text | Open rate | Medium | Benefit-focused vs curiosity-gap preview |
| Body copy tone | Click rate | Lower | Professional vs conversational |
| Image vs no image | Click rate, load time | Lower | HTML vs plain text |
Test in priority order — subject lines and CTAs have the highest leverage because they affect the core conversion funnel (open → click). Optimise these before testing lower-leverage variables like button colour.
Setting Up a Valid Test
A valid A/B test changes exactly one variable. All other elements must be identical between Version A and Version B — same send time, same audience segment, same campaign objective. If you change both the subject line and the CTA simultaneously, you cannot determine which change drove the performance difference.
Valid test setup checklist
- One variable changed: ✅
- Audience randomly split (not by segment): ✅
- Both versions sent simultaneously (not on different days): ✅
- Equal send volume to each version: ✅
- Sufficient sample size per version: ✅
- Win metric defined before sending (not selected post-hoc): ✅
Most ESP A/B test features handle the random split and simultaneous sending automatically; what remains in your hands is the choice of variable to test and the win metric.
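If your ESP lacks a built-in A/B feature and you must run the split yourself, the essential property is random assignment rather than a split by segment, signup date or alphabetical order. A minimal sketch in Python, assuming subscriber IDs in a flat list (the function name is illustrative):

```python
import random

def random_split(subscriber_ids, seed=42):
    """Randomly assign subscribers to versions A and B.

    Shuffling first gives every subscriber an equal chance of landing
    in either version; splitting by segment or signup order would
    bias the comparison.
    """
    ids = list(subscriber_ids)
    random.Random(seed).shuffle(ids)  # fixed seed makes the split reproducible
    half = len(ids) // 2
    return ids[:half], ids[half:]     # (version A, version B)

version_a, version_b = random_split(range(10_000))
```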
Sample Size and Test Duration
Too small a sample produces noise — Version B may appear to win simply because a few extra people opened during the test window, not because it is genuinely better. Minimum practical thresholds:
| What You Are Testing | Minimum Events per Variation | Why |
|---|---|---|
| Subject line (open rate) | 200+ opens per variation | Opens are relatively frequent; smaller sample viable |
| CTA / body (click rate) | 100+ clicks per variation | Clicks are less frequent; requires larger total send |
| Conversion (purchase/lead) | 50+ conversions per variation | Conversions are least frequent; very large sends required |
Translating to list size: if your average open rate is 25% and click rate is 3%, a subject-line test (needing 200 opens per variation) requires 800 sends per variation, while a CTA test (needing 100 clicks per variation) requires roughly 3,334 sends per variation (100 clicks ÷ 3% CTR), about 6,700 in total. Small lists may not have sufficient volume for valid click- or conversion-level testing.
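The same arithmetic generalises to any metric. A small helper, assuming you know your historical rate for the event being tested (the function and parameter names are illustrative):

```python
import math

def required_sends(min_events, event_rate, variations=2):
    """Estimate the send volume needed to reach a minimum event count.

    min_events -- minimum events per variation (from the table above)
    event_rate -- your historical rate for that event, e.g. 0.03 for 3% CTR
    variations -- 2 for an A/B test; more for multivariate
    """
    per_variation = math.ceil(min_events / event_rate)
    return per_variation, per_variation * variations

# CTA test at a 3% click rate: 3,334 per variation, 6,668 in total
print(required_sends(min_events=100, event_rate=0.03))
# Subject-line test at a 25% open rate: 800 per variation, 1,600 in total
print(required_sends(min_events=200, event_rate=0.25))
```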
Test duration
Run tests for a minimum of 4 hours, and ideally a full 24 hours to capture your audience's time-of-day patterns. Do not evaluate results before the test window closes: early leaders frequently reverse as more data arrives.
Statistical Significance
Statistical significance measures how unlikely it is that a test result arose from random chance alone. At a 95% confidence level, a difference as large as the one observed would occur by random variation only 5% of the time if the two versions in fact performed identically.
Most ESP A/B test features calculate this automatically and declare a winner. If yours does not, use one of the many free online A/B significance calculators: input the number of sends and the number of opens or clicks for each version.
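The underlying calculation is a standard two-proportion z-test, which is what most of those calculators run. A minimal sketch in Python, taking the same inputs a calculator asks for (the function name is illustrative):

```python
from math import erf, sqrt

def ab_significance(sends_a, events_a, sends_b, events_b):
    """Two-proportion z-test; events are opens or clicks per version."""
    p_a, p_b = events_a / sends_a, events_b / sends_b
    pooled = (events_a + events_b) / (sends_a + sends_b)
    se = sqrt(pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided
    return round((1 - p_value) * 100, 1)  # confidence level, in percent
```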
Interpreting results
- 95%+ confidence: Result is statistically significant — implement the winner
- 85–95% confidence: Directional signal — the winner is probably better, but test again with a larger sample before treating as definitive
- Below 85% confidence: Inconclusive — do not implement based on this test; consider the variables equivalent until a clearer result emerges
A common mistake is declaring a winner from the percentage difference alone: "Version B had a 28% open rate vs Version A's 25%, a 12% relative lift, so Version B wins." Without significance testing, that 3-point difference may be random noise: a 3-point gap on 200 sends is far less reliable than the same gap on 20,000 sends.
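Running that example through the sketch above makes the point concrete:

```python
# 25% vs 28% open rate on 200 sends per version: inconclusive
print(ab_significance(200, 50, 200, 56))              # ~50% confidence
# The same 3-point gap on 20,000 sends per version: decisive
print(ab_significance(20_000, 5_000, 20_000, 5_600))  # >99.9% confidence
```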
Multivariate Testing
Multivariate testing varies several variables simultaneously across multiple versions: for example, the 4 combinations of 2 subject lines × 2 CTAs. It can find the optimal combination faster than a sequence of A/B tests.
The requirement: very large lists. A multivariate test with 4 variations splits your audience four ways, so it needs roughly double the total send volume of a standard A/B test to give each variation the same sample size, and correcting for the extra comparisons pushes the requirement higher still. For most email senders with lists under 50,000, sequential A/B tests are more practical and produce more reliable results than multivariate testing.
Multivariate testing is most appropriate for: landing pages (more traffic available); very large email lists (100,000+) with high-volume sends; and advanced email platforms with built-in multivariate support.
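The volume requirement follows directly from the `required_sends` estimator sketched earlier: each variation needs the same per-variation sample, so total send volume scales with the number of combinations.

```python
# 2 subject lines x 2 CTAs = 4 combinations, judged on click rate
print(required_sends(min_events=100, event_rate=0.03, variations=4))
# -> (3334, 13336): double the total send of the two-version CTA test
```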
Building a Testing Programme
Individual tests answer individual questions. A testing programme compounds learnings over time — each test builds on the previous one, and patterns across tests reveal what your audience systematically responds to.
- Test on a regular cadence. Aim for at least one A/B test per month — not every send, but regularly enough that you accumulate knowledge over time.
- Maintain a test log. Record: what was tested, date, winner, confidence level, performance difference, and action taken (a minimal log structure is sketched after this list). A log of 20+ tests starts showing clear patterns about your audience's preferences.
- Prioritise tests by potential impact. Use the priority order above — test subject lines before button colours. The highest-leverage variables yield more valuable insights.
- Test the same variable multiple times. One subject line test shows which of two specific subject lines won. Ten subject line tests start showing which types of subject lines (curiosity vs specificity vs urgency) consistently win for your audience — a much more valuable insight.
- Implement winners promptly. The point of testing is applying learnings. A test result implemented in next week's campaign has immediate value; a test result filed and forgotten has none.
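A test log needs nothing more sophisticated than a spreadsheet or a CSV file. A minimal sketch, with an illustrative column set and sample values you should adapt to your own programme:

```python
import csv
import os

LOG_FIELDS = ["date", "variable_tested", "version_a", "version_b",
              "winner", "confidence_pct", "relative_lift_pct", "action_taken"]

def log_test(path, result):
    """Append one test result (a dict keyed by LOG_FIELDS) to a CSV log."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(result)

# Hypothetical entry; every value here is illustrative
log_test("test_log.csv", {
    "date": "2025-03-04", "variable_tested": "subject line",
    "version_a": "curiosity gap", "version_b": "specific benefit",
    "winner": "B", "confidence_pct": 96, "relative_lift_pct": 9,
    "action_taken": "benefit-led subjects in next month's campaigns",
})
```

Once the log accumulates, grouping rows by variable tested and by subject-line type is how the cross-test patterns described above become visible.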