← Clarigital·Clarity in Digital Marketing
Email Marketing · Session 9, Guide 17

A/B Testing Email · Variables, Sample Size & Analysis

Email A/B testing compares two versions of an email — differing in one variable — to determine which performs better. Without testing, email optimisation is based on intuition and best practices from other senders that may not apply to your specific audience. This guide covers everything needed to run valid email A/B tests: what to test in priority order, minimum sample sizes, test duration requirements, how to read statistical significance, and how to build a continuous testing programme.

Email Marketing · 2,700 words · Updated Apr 2026

What You Will Learn

  • Why email audiences are unique and require their own testing rather than adopting generic best practices
  • The variables to test in priority order — from highest to lowest leverage
  • How to set up a valid A/B test (not just send two versions)
  • Minimum sample sizes for statistically meaningful results
  • How to determine if a result is statistically significant or just noise
  • When multivariate testing is appropriate and when it over-complicates
  • How to build a systematic testing programme that compounds learnings

Why Test — Not Just Apply Best Practices

Email best practices — the guidelines in this guide and every other email marketing resource — are derived from aggregate data across thousands of senders and millions of subscribers. They represent average behaviour, not your specific audience's behaviour. Your audience may be different: older or younger, more professional or more casual, more price-sensitive or more brand-loyal, more responsive to humour or more responsive to data.

A/B testing tells you what works for your audience specifically. "Best practice says first-name personalisation in subject lines improves open rate" — does it for your subscribers? Test it. "Best practice says shorter emails get higher CTR" — does this hold for your audience? Test it. Only your test data tells you with confidence.

What to Test — Priority Order

| Variable | Metric affected | Priority | Example |
|---|---|---|---|
| Subject line | Open rate | Highest | Curiosity gap vs specific benefit; question vs statement |
| CTA button text | Click rate | Very high | "Shop Now" vs "Get Yours" vs "See the Collection" |
| Email length | Click rate, conversion rate | High | Short (300 words) vs long (900 words) |
| Primary offer / hero content | Conversion rate | High | Product vs lifestyle hero image; discount vs free shipping offer |
| Send time | Open rate | Medium | Tuesday 9am vs Thursday 2pm |
| From name | Open rate, trust | Medium | "Digital Codex" vs "James at Digital Codex" |
| Preview text | Open rate | Medium | Benefit-focused vs curiosity-gap preview |
| Body copy tone | Click rate | Lower | Professional vs conversational |
| Image vs no image | Click rate, load time | Lower | HTML vs plain text |

Test in priority order — subject lines and CTAs have the highest leverage because they affect the core conversion funnel (open → click). Optimise these before testing lower-leverage variables like button colour.

Setting Up a Valid Test

A valid A/B test changes exactly one variable. All other elements must be identical between Version A and Version B — same send time, same audience segment, same campaign objective. If you change both the subject line and the CTA simultaneously, you cannot determine which change drove the performance difference.

Valid test setup checklist

  • One variable changed: ✅
  • Audience randomly split (not by segment): ✅
  • Both versions sent simultaneously (not on different days): ✅
  • Equal send volume to each version: ✅
  • Sufficient sample size per version: ✅
  • Win metric defined before sending (not selected post-hoc): ✅

Most ESP A/B test features handle the random split and simultaneous sending automatically. The variables you control are what you test and the win metric you select.
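If you ever need to create the split yourself — for example, when exporting a list to send through two separate campaigns — the core requirement is a genuinely random, equal-sized division. A minimal sketch (the `split_audience` helper and seed value are illustrative, not part of any ESP's API):

```python
import random

def split_audience(subscribers, seed=42):
    """Randomly split a subscriber list into two equal halves.

    Hypothetical helper: shuffles a copy of the list with a seeded RNG
    (seeded only so the split is reproducible), then cuts it in the middle.
    """
    pool = list(subscribers)
    random.Random(seed).shuffle(pool)
    mid = len(pool) // 2
    return pool[:mid], pool[mid:]

# Example: 1,000 hypothetical addresses split 50/50
group_a, group_b = split_audience(f"user{i}@example.com" for i in range(1000))
```

Shuffling before cutting is what prevents an accidental segment split (e.g. alphabetical order correlating with sign-up date), which would violate the "randomly split" item on the checklist above.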

Sample Size and Test Duration

Too small a sample produces noise — Version B may appear to win simply because a few extra people opened during the test window, not because it is genuinely better. Minimum practical thresholds:

| What you are testing | Minimum events per variation | Why |
|---|---|---|
| Subject line (open rate) | 200+ opens per variation | Opens are relatively frequent; smaller sample viable |
| CTA / body (click rate) | 100+ clicks per variation | Clicks are less frequent; requires larger total send |
| Conversion (purchase/lead) | 50+ conversions per variation | Conversions are least frequent; very large sends required |

Translating to list size: if your average open rate is 25% and click rate is 3%, testing a CTA (needing 100 clicks per variation) requires sending to at least 3,334 subscribers per variation (100 clicks ÷ 3% CTR, rounded up) — roughly 6,700 total. Small lists may not have sufficient volume for valid conversion-level testing.
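The calculation above generalises to any event type: divide the minimum event count by the expected event rate and round up. A short sketch (the `required_sends` function name is our own, and the 3% click rate is the worked example's assumption):

```python
import math

def required_sends(min_events, event_rate):
    """Sends per variation needed to expect at least `min_events`
    (opens, clicks, or conversions) at the given event rate."""
    return math.ceil(min_events / event_rate)

# CTA test: 100 clicks per variation at an assumed 3% click rate
per_variation = required_sends(100, 0.03)  # 3,334 sends per variation
total = 2 * per_variation                  # 6,668 across both versions
```

Running the same function with a 25% open rate and 200 opens, or a 1% conversion rate and 50 conversions, shows quickly whether your list is large enough for the metric you want to test.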

Test duration

Run tests for a minimum of 4 hours; ideally 24 hours, so that subscribers who open at different times of day are all represented. Do not evaluate results before the test window closes — early leaders frequently reverse as more data arrives.

Statistical Significance

Statistical significance estimates how likely it is that a test result reflects a real difference rather than random chance. At a 95% confidence level, a difference as large as the one observed would arise from random sampling variation only about 5% of the time if the two versions actually performed identically — strong evidence, though not certainty, that the winner is genuinely better.

Most ESP A/B test features calculate this automatically and declare a winner. If your ESP does not: use an online A/B significance calculator (numerous free tools available) — input the number of sends and number of opens/clicks for each version.

Interpreting results

  • 95%+ confidence: Result is statistically significant — implement the winner
  • 85–95% confidence: Directional signal — the winner is probably better, but test again with a larger sample before treating as definitive
  • Below 85% confidence: Inconclusive — do not implement based on this test; consider the variables equivalent until a clearer result emerges

A common mistake is declaring a winner based on percentage difference alone: "Version B had 28% open rate vs Version A's 25% — 12% higher, Version B wins." Without significance testing, this 3-point difference may be random noise. A 3-point difference on 200 sends is much less reliable than a 3-point difference on 20,000 sends.
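The sample-size point can be made concrete with a standard two-proportion z-test, which is (approximately) what most significance calculators run under the hood. A minimal sketch — the function name is our own, and the sample sizes below assume the "200 sends vs 20,000 sends" framing means that many sends per version:

```python
import math

def confidence_two_proportions(events_a, sends_a, events_b, sends_b):
    """Approximate confidence that two rates differ (two-sided z-test
    on pooled proportions). Returns a value between 0 and 1."""
    p_a, p_b = events_a / sends_a, events_b / sends_b
    p_pool = (events_a + events_b) / (sends_a + sends_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / sends_a + 1 / sends_b))
    z = abs(p_a - p_b) / se
    p_value = math.erfc(z / math.sqrt(2))  # two-sided tail probability
    return 1 - p_value

# Same 25% vs 28% gap, two very different sample sizes:
small = confidence_two_proportions(25, 100, 28, 100)        # 100 sends each
large = confidence_two_proportions(2500, 10000, 2800, 10000)  # 10,000 each
```

With 100 sends per version the confidence comes out far below the 85% threshold (inconclusive); with 10,000 per version the identical 3-point gap clears 95% comfortably — the difference is the sample, not the rates.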

Multivariate Testing

Multivariate testing tests multiple variables simultaneously across multiple versions — for example, 4 combinations of 2 subject lines × 2 CTAs. It can find the optimal combination faster than sequential A/B tests.

The requirement: very large lists. A multivariate test with 4 variations needs 4× the sample of a single A/B test to achieve the same confidence level. For most email senders with lists under 50,000, sequential A/B tests are more practical and produce more reliable results than multivariate testing.

Multivariate testing is most appropriate for: landing pages (more traffic available); very large email lists (100,000+) with high-volume sends; and advanced email platforms with built-in multivariate support.

Building a Testing Programme

Individual tests answer individual questions. A testing programme compounds learnings over time — each test builds on the previous one, and patterns across tests reveal what your audience systematically responds to.

  • Test on a regular cadence. Aim for at least one A/B test per month — not every send, but regularly enough that you accumulate knowledge over time.
  • Maintain a test log. Record: what was tested, date, winner, confidence level, performance difference, and action taken. A log of 20+ tests starts showing clear patterns about your audience's preferences.
  • Prioritise tests by potential impact. Use the priority order above — test subject lines before button colours. The highest-leverage variables yield more valuable insights.
  • Test the same variable multiple times. One subject line test shows which of two specific subject lines won. Ten subject line tests start showing which types of subject lines (curiosity vs specificity vs urgency) consistently win for your audience — a much more valuable insight.
  • Implement winners promptly. The point of testing is applying learnings. A test result implemented in next week's campaign has immediate value; a test result filed and forgotten has none.
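The test log described above needs no special tooling — a CSV with a fixed set of columns is enough. A sketch under our own assumptions (the field names and the example result are illustrative, not a prescribed schema):

```python
import csv
import io

# Hypothetical log columns — adapt to whatever your ESP reports
FIELDS = ["date", "variable", "version_a", "version_b",
          "winner", "confidence", "lift", "action_taken"]

def append_test(log, **result):
    """Append one test result, filling any missing fields with ''."""
    log.append({f: result.get(f, "") for f in FIELDS})

log = []
append_test(log, date="2026-04-07", variable="subject line",
            version_a="Curiosity gap", version_b="Specific benefit",
            winner="B", confidence="96%", lift="+2.4pts open rate",
            action_taken="Specific-benefit subjects as default")

# Write out as CSV (StringIO here; use open("test_log.csv", "a") in practice)
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(log)
```

Recording the confidence level alongside the winner matters: when you later scan 20+ rows for patterns, an 87% "win" should carry less weight than a 98% one.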

Authentic Sources

Official · Google — Email Sender Guidelines

Engagement metrics that email testing aims to improve.

Official · Google Postmaster Tools

Tracking deliverability impact of email programme improvements from testing.

Official · FTC — CAN-SPAM

Commercial email requirements applicable across all test variations.

Official · ICO — Direct Marketing

GDPR requirements for processing data in email testing programmes.
