What You Will Learn
- What CRO is — and the difference between systematic CRO and random A/B testing
- The full range of research methods — qualitative and quantitative — for CRO
- How to conduct user research, session recordings, and heatmap analysis
- How to use GA4 funnel analysis and page-level data to identify optimisation opportunities
- How to write a conversion hypothesis with a testable prediction
- The ICE and PIE prioritisation frameworks for deciding which hypotheses to test first
- How to design an A/B test — control vs variation, success metrics, and test conditions
- How to implement and monitor a live test
- How to analyse test results correctly — statistical significance, practical significance, and what to do next
- How to build a sustainable CRO programme that generates compounding improvements
What CRO Is
Conversion Rate Optimisation is the practice of systematically improving the proportion of visitors who complete a desired action (convert) on a digital property. The conversion rate is: conversions ÷ total visitors × 100%. A website with 10,000 monthly visitors and 200 purchases has a 2% conversion rate.
The economic significance of conversion rate improvement is multiplicative. Doubling a 2% conversion rate to 4% doubles revenue without any increase in traffic or advertising spend. At 10,000 monthly visitors, 2% = 200 conversions; 4% = 400 conversions — the same traffic generating twice the business outcome. Conversion rate improvement is often described as the highest-ROI marketing activity because it multiplies the return on all existing traffic and advertising investment simultaneously.
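As a minimal illustration of the arithmetic, the sketch below reproduces the example figures and adds an assumed average order value (a hypothetical number, purely for the revenue projection) to show why the improvement multiplies the value of existing traffic:

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Conversion rate as a percentage: conversions / visitors * 100."""
    return conversions / visitors * 100

# Hypothetical figures matching the example above.
visitors = 10_000
conversions = 200
average_order_value = 50.0  # assumed currency and value, for illustration only

baseline_rate = conversion_rate(conversions, visitors)   # 2.0%
improved_rate = baseline_rate * 2                         # 4.0% after doubling
extra_conversions = visitors * (improved_rate - baseline_rate) / 100
extra_revenue = extra_conversions * average_order_value

print(f"Baseline: {baseline_rate:.1f}%  Improved: {improved_rate:.1f}%")
print(f"Extra conversions on the same traffic: {extra_conversions:.0f}")
print(f"Extra revenue at a £{average_order_value:.0f} average order value: £{extra_revenue:,.0f}")
```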
CRO is distinct from good design or user experience (UX). Good UX creates a pleasant experience; CRO creates a converting experience. These often overlap but are not identical — a beautiful page that does not convert has good UX but poor CRO. The measure of CRO success is not how the page looks or feels but whether more users take the desired action.
Research Methods Overview
CRO research is divided into quantitative research (what is happening, how often, where) and qualitative research (why it is happening, what users experience, what barriers they encounter). Both are necessary — quantitative research identifies where problems exist; qualitative research reveals why they exist. Testing without both types of research produces hypotheses that may address the right pages but the wrong problems.
| Research Type | Methods | Answers |
|---|---|---|
| Quantitative | GA4 funnel analysis, heatmaps (click density, scroll), A/B test data, form analytics | "Where" — which pages, which steps, which elements have the highest drop-off or lowest engagement |
| Qualitative | User testing, session recordings, surveys, customer interviews, customer support review | "Why" — what causes users to hesitate, abandon, or not convert; what is confusing or missing |
Qualitative Research Methods
User testing
Ask 5–8 target users to complete a specific task on the site while speaking their thoughts aloud (think-aloud protocol). Observe where they hesitate, what confuses them, what they try to click that does not work, and what questions they have that the page does not answer. Five user tests typically reveal 85% of usability problems on a page — a finding from Nielsen Norman Group's usability research that underpins the standard "5 users" recommendation in usability testing practice.
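That figure is commonly attributed to the problem-discovery model reported by Nielsen and Landauer, in which each additional tester finds roughly a constant share (about 31%) of the remaining problems. A quick sketch of the curve, assuming that published 31% discovery rate:

```python
# Problem-discovery model: share of usability problems found by n testers,
# assuming each tester independently finds about 31% of problems.
L = 0.31

for n in range(1, 9):
    found = 1 - (1 - L) ** n
    print(f"{n} testers -> ~{found:.0%} of problems found")
# Five testers find roughly 85% of problems under this model, which is the
# source of the standard "5 users" guideline.
```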
Session recordings
Session recording tools (Microsoft Clarity — free; Hotjar — freemium) record individual user sessions as anonymised videos showing mouse movement, clicks, scrolls, and form interactions. Reviewing sessions of users who abandoned a specific step reveals the specific moment of hesitation or confusion — the micro-behaviour that quantitative data cannot capture. Look specifically for: rage clicks (rapid repeated clicks on non-clickable elements, indicating frustration); hesitation before form fields (indicating uncertainty about what to enter); scroll patterns that stop short of important content.
On-site surveys
Brief exit surveys (one or two questions triggered when a user shows exit intent or after a specific interaction) can directly ask users about their experience: "What stopped you from completing your purchase today?" "What information were you looking for that you didn't find?" Open-text survey responses from users actively on the site at the moment of decision are among the most valuable sources of conversion barrier insights available.
Quantitative Research Methods
GA4 funnel analysis
Build funnel explorations for every major conversion journey on the site — checkout funnel, lead form completion, sign-up flow, key content consumption journeys. The funnel reveals absolute drop-off counts and rates by step; a step with high absolute drop-off is a high-priority CRO target. See the Funnel Analysis guide in this series for full implementation details.
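As a sketch of how the prioritisation falls out of the step counts (the numbers below are hypothetical, not GA4 API output), step-to-step drop-off rates make the biggest leak obvious:

```python
# Hypothetical funnel step counts exported from a GA4 funnel exploration.
funnel = [
    ("Product page", 10_000),
    ("Add to cart", 3_200),
    ("Begin checkout", 1_900),
    ("Payment details", 1_100),
    ("Purchase", 600),
]

for (step, users), (next_step, next_users) in zip(funnel, funnel[1:]):
    dropped = users - next_users
    drop_rate = dropped / users
    print(f"{step} -> {next_step}: {dropped:,} users lost ({drop_rate:.0%} drop-off)")
```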
Heatmaps
Heatmaps aggregate where users click, how far they scroll, and where they move their cursor across a page — producing visual heat maps that reveal attention and interaction patterns. Click heatmaps identify: elements users click that are not links (frustration); which CTAs receive the most clicks; elements in the visual field that are never interacted with despite their prominence. Scroll heatmaps identify: how far down the page most users scroll; the fold line below which most users do not scroll; whether key content or CTAs are placed below the visible fold for most users.
Form analytics
Form analytics tools (available in Hotjar and Microsoft Clarity) show which form fields have the highest drop-off rate, which fields users leave blank and return to, and which fields take the longest to complete. This data directly identifies the specific form fields causing the most friction — enabling targeted field optimisation rather than complete form redesigns.
Hypothesis Formation
A CRO hypothesis is a specific, testable prediction about how a change to a page element will affect a conversion metric. A well-formed hypothesis has three components: the observation (what the research found); the proposed change (what you will test); and the expected outcome (how the change will affect the conversion metric and why).
Hypothesis template
"We observed [research finding]. We believe that [proposed change] will [expected effect on conversion metric] because [reasoning based on behavioural principle or evidence]."
Examples
- Weak hypothesis: "We will test a new CTA button colour." (No research basis; no specific outcome prediction; does not explain why the change should work.)
- Strong hypothesis: "Session recordings show users repeatedly clicking the product image expecting a zoom feature that does not exist. We believe that adding an image zoom function will increase add-to-cart rate because users who can view product details more clearly will have higher purchase confidence."
- Strong hypothesis: "GA4 funnel analysis shows 68% of users who begin the checkout form abandon it at the credit card entry step. We believe that adding PayPal Express Checkout as an option will increase checkout completion rate because it removes the friction of manually entering card details."
Prioritisation Frameworks
A CRO programme typically generates more hypotheses than can be tested simultaneously. Prioritisation frameworks score hypotheses to identify which to test first.
ICE framework
Score each hypothesis on three dimensions (1–10 scale): Impact — how large a conversion improvement could this produce if successful? Confidence — how strongly does the research evidence support this hypothesis? Ease — how easy is this to implement technically? ICE Score = (Impact + Confidence + Ease) ÷ 3. Test in descending ICE score order.
PIE framework
Score on: Potential — how much improvement is possible? Importance — how much traffic/value does this page have? Ease — how easy to implement? PIE Score = (Potential + Importance + Ease) ÷ 3. The Importance dimension in PIE explicitly weights high-traffic, high-value pages above low-traffic ones — useful for focusing effort on pages that will produce the most absolute conversion gain.
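As a quick illustration of how either framework turns scores into a test order, here is a minimal sketch with made-up hypotheses and scores; PIE is computed the same way, substituting Potential and Importance for Impact and Confidence:

```python
# Hypothetical backlog with 1-10 scores for the ICE dimensions.
backlog = [
    {"name": "Add image zoom on product pages", "impact": 7, "confidence": 8, "ease": 6},
    {"name": "Add PayPal Express to checkout",  "impact": 9, "confidence": 7, "ease": 4},
    {"name": "Change CTA button colour",        "impact": 3, "confidence": 2, "ease": 9},
]

def ice_score(hypothesis):
    # ICE Score = (Impact + Confidence + Ease) / 3
    return (hypothesis["impact"] + hypothesis["confidence"] + hypothesis["ease"]) / 3

# Test in descending ICE score order.
for hypothesis in sorted(backlog, key=ice_score, reverse=True):
    print(f"{ice_score(hypothesis):.1f}  {hypothesis['name']}")
```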
Test Design Principles
- Test one change at a time (A/B, not multivariate) unless you have high traffic. A/B tests (one control, one variation, one change) produce clear, interpretable results. Multivariate tests (multiple simultaneous changes) require significantly higher traffic to reach statistical significance and produce results that are harder to act on. Most sites should default to A/B testing.
- Define the success metric before testing. The primary success metric for the test must be defined before the test launches — not chosen after seeing the results. Changing the metric after seeing results to find a statistically significant outcome is called "p-hacking" — it produces false positives that do not hold up when the winning variation is implemented permanently.
- Calculate required sample size before launching. Statistical power calculations (available via free online calculators) determine how many visitors are needed in each variant before results can be reliably interpreted. Testing with insufficient sample sizes produces unreliable results — a test that shows a 10% improvement with 200 visitors in each variant is not statistically reliable (a sample-size sketch follows this list).
- Run tests for a minimum of one full business cycle. Running a test for only 3 days and stopping when one variant appears to be winning produces false positives due to day-of-week effects. Most tests should run for at least one full week (to cover the complete weekly traffic cycle) and ideally two weeks to smooth out anomalies.
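The sample-size calculation referenced above can be sketched with the standard normal-approximation formula for a two-sided two-proportion test, using only Python's standard library; online calculators may differ slightly depending on the approximation they use:

```python
from math import sqrt, ceil
from statistics import NormalDist

def sample_size_per_variant(p_baseline, p_target, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant to detect a lift from p_baseline
    to p_target, at 95% confidence and 80% power by default."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_baseline + p_target) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_baseline * (1 - p_baseline)
                                 + p_target * (1 - p_target))) ** 2
    return ceil(numerator / (p_target - p_baseline) ** 2)

# Detecting a lift from 2.0% to 2.5% needs on the order of 13,800 visitors per variant.
print(sample_size_per_variant(0.02, 0.025))
```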
Implementation and Monitoring
A/B tests are typically implemented via testing platforms — tools that split traffic between variants and measure the difference in conversion rates. Google Optimize was discontinued in 2023; current alternatives include VWO (Visual Website Optimizer), Optimizely, and AB Tasty. Some platforms (HubSpot for CMS-hosted pages, Unbounce for landing pages) include native A/B testing without separate tool integration.
Monitor live tests for implementation errors in the first 24 hours: verify that traffic is splitting correctly (approximately 50/50); verify that the test variant is rendering correctly across browsers and devices; verify that conversion events are firing correctly in both variants. A test with a broken conversion event in one variant will produce misleading results that are impossible to interpret correctly.
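The first of these checks, the 50/50 split, can be verified with a chi-square test for sample ratio mismatch. A minimal sketch with hypothetical visitor counts follows; with one degree of freedom the p-value can be derived from the normal distribution, so no external libraries are needed:

```python
from math import sqrt
from statistics import NormalDist

def srm_p_value(visitors_a, visitors_b):
    """Chi-square test (1 degree of freedom) that observed traffic matches a 50/50 split."""
    total = visitors_a + visitors_b
    expected = total / 2
    chi2 = ((visitors_a - expected) ** 2 / expected
            + (visitors_b - expected) ** 2 / expected)
    # With 1 degree of freedom, chi-square is the square of a standard normal variable.
    return 2 * (1 - NormalDist().cdf(sqrt(chi2)))

# Hypothetical first-day counts: 5,210 vs 4,790 looks close but is a significant mismatch.
p = srm_p_value(5_210, 4_790)
print(f"p = {p:.4f}")  # far below 0.01: investigate the test setup before trusting results
```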
Analysing Test Results
Statistical significance is the threshold at which the observed results are unlikely to be due to random chance, conventionally set at 95% confidence (p < 0.05). Reaching significance at that level means that, if the change truly had no effect, a difference at least this large would be observed less than 5% of the time. It does not guarantee the improvement is real; it only establishes that chance alone is an unlikely explanation.
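Testing platforms report significance automatically, but a minimal sketch of the underlying calculation (a two-sided two-proportion z-test, with hypothetical visitor and conversion counts) makes clear what the figure does and does not say:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical results: control 2.0% (300/15,000), variation 2.4% (360/15,000).
p = two_proportion_p_value(300, 15_000, 360, 15_000)
print(f"p = {p:.3f}")  # below 0.05: unlikely under the assumption of no true effect
```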
Statistical significance is necessary but not sufficient for acting on test results. Consider also:
- Practical significance. Is the measured improvement large enough to matter? A 0.1% conversion rate improvement from 2.0% to 2.1% may be statistically significant (if traffic is large enough) but may not be worth the cost of implementing the change permanently.
- Segment-level results. A test that shows no overall winner may show strong positive results for mobile users and negative results for desktop users. Always check results by device type, new vs returning users, and key traffic channels before declaring a test inconclusive.
- Secondary metric effects. A change that improves checkout conversion rate but reduces average order value may not be a net positive. Always measure the impact on secondary business metrics alongside the primary success metric.
Building a Sustainable CRO Programme
Individual tests do not build a CRO programme — a documented, iterative process does. Key programme elements:
- Test log. A shared document recording every test run: the hypothesis, test design, duration, results, statistical confidence, and the decision taken (implement, reject, or investigate further). This institutional knowledge base prevents repeating failed tests and accelerates hypothesis development by revealing patterns in what works. A possible record structure is sketched after this list.
- Research cycle. Schedule monthly research sessions (heatmap analysis, session recording review, survey data analysis) to continuously generate new hypotheses rather than depleting the initial research backlog and stopping.
- Velocity target. Set a target for tests run per month and optimise for meeting it. CRO programmes that run 2–4 tests per month generate more improvement than programmes that run 1 large test per quarter — because more tests generate more learnings, and faster learnings compound more quickly.
- Reporting cadence. Monthly reporting of CRO results to business leadership — including tests completed, conversion rate changes, and estimated revenue impact — maintains organisational investment in the programme.
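A test log can be as simple as a shared spreadsheet; as one possible shape for a single record (the field names here are illustrative, not a standard):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class TestLogEntry:
    """One row in the shared test log; field names are illustrative only."""
    name: str
    hypothesis: str
    primary_metric: str
    start: date
    end: date
    control_rate: float
    variation_rate: float
    p_value: float
    decision: str  # "implement", "reject", or "investigate further"
    notes: str = ""

entry = TestLogEntry(
    name="PayPal Express in checkout",
    hypothesis="Removing manual card entry will lift checkout completion",
    primary_metric="checkout completion rate",
    start=date(2024, 3, 4), end=date(2024, 3, 18),
    control_rate=0.32, variation_rate=0.36, p_value=0.012,
    decision="implement",
)
print(entry)
```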
Authentic Sources
Every factual claim in this guide is drawn from official Google documentation, regulatory bodies, or platform-published technical specifications. No third-party blogs or marketing tools are used as primary sources. All content is written in our own words — we learn from official sources and explain them; we never copy.
- GA4 Explorations — funnel analysis, path analysis, and segment comparison for quantitative CRO research.
- Microsoft Clarity — free session recording and heatmap tool from Microsoft; official documentation and product.
- GA4 funnel analysis — quantitative identification of conversion drop-off points.
- Google's A/B testing documentation — reference for test design principles even after the product's discontinuation.