The Most Common A/B Testing Mistakes

A short collection of ideas and pitfalls to keep in mind before designing an A/B test. Shuffle through and bookmark. It will be updated in the future.

The mistakes currently added to the list

  • Never stop tests early
  • Too small a sample size
  • Stopping the test just when you reach 95% confidence
  • Not paying attention to the margin of error
  • Measuring page goals instead of final goals
  • Ending a test too early
  • Testing minute design changes instead of big impact opportunities
  • Not having a testing hypothesis
  • Being tricked by false positives
  • Impatience – it’s dangerous to peek at the results
  • Not optimizing for traffic sources
  • Focusing only on conversion rate
  • Low traffic websites treated as high traffic
  • Sequential testing – just don’t

Never stop tests early

This is the most common beginner mistake. It usually comes from getting pressured by management, falling for the first sign that your new variation is winning, or simply forgetting to do the math in advance. Be meticulous about validity and keep in mind that misinterpreting a test can cause serious business damage.

Too small a sample size

Always determine the needed sample size in advance. This lets you estimate the test duration and keeps you from getting impatient and drawing conclusions too early.

Check out this sample size calculator.

Ballpark figure: 350 conversions per variation (each segment needs to be in its own variation, each channel well …).
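If you want to see what such a calculator does under the hood, here is a minimal sketch. It assumes a two-sided two-proportion z-test, 95% confidence and 80% power. The function names and the example numbers (4% baseline conversion rate, 20% relative lift, 1,500 visitors a day) are made up for illustration, and your calculator of choice may use a slightly different formula.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variation(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Visitors needed in EACH variation (normal approximation, two-sided test).

    relative_lift is the smallest relative change you want to detect and
    must be non-zero (0.2 means a 20% lift over the baseline).
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.8
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def estimated_days(visitors_per_variation, variations, daily_visitors):
    """Days needed to fill all variations at the given total daily traffic."""
    return ceil(visitors_per_variation * variations / daily_visitors)


# Hypothetical inputs: 4% baseline, 20% relative lift, 2 variations, 1,500 visitors/day.
needed = sample_size_per_variation(0.04, 0.20)
print(needed, "visitors per variation")
print(estimated_days(needed, 2, 1500), "days")
```

Note how quickly the number grows when you try to detect a smaller lift. That is exactly why the duration has to be estimated before the test starts.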

Stopping the test just when you reach 95% confidence 

The decision to stop a test early is usually made the moment statistical significance is reached. But if you didn’t estimate the sample size in advance, this can be way too soon. Use the calculator above and be patient!

Not paying attention to the margin of error

A test result will always have a margin of error attached to it.

Let’s say that you have tested a single variation of your landing page with the intention of increasing the conversion rate. After the test is finished, your results are a conversion rate of 4.2% (±1.8%) for the original and 5.0% (±2%) for the variant. It is true that the variant is more likely to be the winner, but the two ranges (2.4% to 6.0% and 3.0% to 7.0%) overlap substantially. In this case, you would definitely need to collect more data.
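A small sketch of the arithmetic behind that example. It assumes the usual normal-approximation margin of error for a conversion rate, and the function names are ours, not taken from any testing tool. Treat the overlap check as a rough sanity check rather than a formal significance test.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(conversions, visitors, confidence=0.95):
    """Half-width of the confidence interval for a conversion rate (as a fraction)."""
    rate = conversions / visitors
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * sqrt(rate * (1 - rate) / visitors)


def intervals_overlap(rate_a, margin_a, rate_b, margin_b):
    """True if [rate_a ± margin_a] and [rate_b ± margin_b] share any values."""
    low_a, high_a = rate_a - margin_a, rate_a + margin_a
    low_b, high_b = rate_b - margin_b, rate_b + margin_b
    return low_a <= high_b and low_b <= high_a


# The example from the text, in percentage points.
print(intervals_overlap(4.2, 1.8, 5.0, 2.0))  # True: keep collecting data

# Hypothetical counts: 50 conversions from 1,000 visitors.
print(round(margin_of_error(50, 1000) * 100, 2), "percentage points")
```

The margin shrinks with the square root of the number of visitors, so halving it takes roughly four times the traffic.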

Measuring page goals instead of final goals

Measuring a simple on-page goal, like a lead, is straightforward but not the best solution. If possible, measure the final goals, the results at the end of the funnel. So in the example of leads, you would also want to measure the sales performance of those leads.

Example: we had a SaaS client for whom we were optimizing a lead collection page. The test showed an almost negligible increase in the variant’s conversion rate. Since we had put a lot of effort into making the landing page content clearer, we were quite surprised by that. But further analysis showed that the variant was producing higher quality leads, which led to more closed deals. The reason was that the clearer copy had a ‘sieve’ effect: users knew better what to expect, and the total CR, measured from visit to closed deal, went up.

Ending a test too early

Your test should run until all of the following are true:

  • At least 7 days have passed. The test needs to cover every day in the calendar week.
  • It has covered 2 buying cycles, if possible. In no case should it be shorter than the duration of a single cycle.
  • Your pre-calculated numbers are reached: unique visitors, number of conversions, statistical significance.
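The rules above can be combined into one number. This is only a sketch: the function name and inputs are hypothetical, and rounding up to whole weeks is our own assumption, added so that every weekday is covered equally often.

```python
from math import ceil


def minimum_test_days(visitors_per_variation, variations, daily_visitors, buying_cycle_days):
    """Shortest duration that satisfies the three rules in the list above."""
    # Days needed to collect the pre-calculated sample across all variations.
    days_for_sample = ceil(visitors_per_variation * variations / daily_visitors)
    # At least 7 days, two buying cycles, and enough days for the sample.
    days = max(7, 2 * buying_cycle_days, days_for_sample)
    # Assumption: round up to full weeks so each weekday appears equally often.
    return ceil(days / 7) * 7


# Hypothetical inputs: 10,000 visitors per variation, 2 variations,
# 1,500 visitors per day in total, 5-day buying cycle.
print(minimum_test_days(10000, 2, 1500, 5))
```

It still cannot tell you whether statistical significance will be reached by then. It only gives you the earliest date worth putting in the calendar.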

Holiday periods are not representative. They cannot be compared to the rest of the year. People buy more stuff, the proportion of impulse buys is greater, the average order value is higher. If you test during holidays, you should only compare the test results with previous holidays!

Testing minute design changes instead of big impact opportunities

Don’t test things that don’t present any learning opportunity.

  1. Test for big impact first,
  2. then make incremental changes.

Use each test as an opportunity to draw behavioural insights.

Not having a testing hypothesis

Just testing random things will lead you nowhere. Create a test hypothesis. Start collecting and managing your insights. This is the only way to make customer understanding compound over time. Get ideas for hypotheses from looking at your data and user behaviour.

Being tricked by false positives

The more variations you run, the higher the risk of a false positive. How do you avoid this? Be sure to use the calculator!
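To get a feel for how fast that risk grows, here is a quick sketch. It assumes each variation is compared against the control independently at the same significance level. The Bonferroni correction shown is one common, conservative remedy, not necessarily what your calculator applies.

```python
def family_wise_error_rate(comparisons, alpha=0.05):
    """Chance of at least one false positive across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons


def bonferroni_alpha(comparisons, alpha=0.05):
    """Stricter per-comparison threshold that keeps the overall risk near alpha."""
    return alpha / comparisons


for variations in (1, 3, 5, 10):
    risk = family_wise_error_rate(variations)
    threshold = bonferroni_alpha(variations)
    print(f"{variations} variations: {risk:.1%} risk, test each at alpha = {threshold:.4f}")
```

With 5 variations, the chance of at least one false winner is already above 20%, even though each comparison is run at the usual 5% level.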

Impatience – it’s dangerous to peek at the results

We all look at the results before the tests are finished. It makes sense to do it: to check if everything is working OK and to detect major errors. But be extra careful when doing it. Don’t draw early conclusions and certainly don’t discuss the results with stakeholders. Wait until you have 1,000 visitors on each variation and 25 conversions, and the statistical significance is 95% or better.
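Those three thresholds can be turned into a simple gate. The sketch below assumes a pooled two-proportion z-test for the significance, and it reads "25 conversions" as 25 per variation. Both are our assumptions, and the function names are hypothetical. Passing the gate is the minimum bar for talking about results, not a replacement for the sample size you calculated in advance.

```python
from math import sqrt
from statistics import NormalDist


def significance(visitors_a, conversions_a, visitors_b, conversions_b):
    """Two-sided significance (1 - p-value) from a pooled two-proportion z-test."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_error == 0:
        return 0.0  # no conversions at all, or everyone converted: nothing to compare
    z = abs(rate_b - rate_a) / std_error
    p_value = 2 * (1 - NormalDist().cdf(z))
    return 1 - p_value


def ready_to_conclude(visitors_a, conversions_a, visitors_b, conversions_b):
    """True only when all three thresholds from the text are met."""
    enough_visitors = min(visitors_a, visitors_b) >= 1000
    enough_conversions = min(conversions_a, conversions_b) >= 25  # assumed per variation
    if not (enough_visitors and enough_conversions):
        return False
    return significance(visitors_a, conversions_a, visitors_b, conversions_b) >= 0.95


# Hypothetical counts: the original first, then the variant.
print(ready_to_conclude(500, 30, 500, 50))     # too few visitors
print(ready_to_conclude(2000, 80, 2000, 84))   # enough data, difference too small
print(ready_to_conclude(2000, 80, 2000, 130))  # all thresholds met
```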

Not optimizing for traffic sources

Treat each traffic source as a separate entity.

  1. Use specific LPs for each unique traffic source.
  2. Prioritize sources. Use the 80/20 rule to focus on the sources that matter the most.
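One way to apply the 80/20 rule is to rank the sources by conversions and keep the top ones until they cover about 80% of the total. A minimal sketch, with made-up source names and numbers:

```python
def top_sources(conversions_by_source, share=0.8):
    """Largest sources that together account for at least `share` of conversions."""
    total = sum(conversions_by_source.values())
    ranked = sorted(conversions_by_source.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0
    for source, conversions in ranked:
        if running >= share * total:
            break  # the sources picked so far already cover the target share
        selected.append(source)
        running += conversions
    return selected


# Hypothetical monthly conversions per traffic source.
conversions = {
    "organic": 500,
    "paid_search": 320,
    "email": 100,
    "social": 50,
    "referral": 30,
}
print(top_sources(conversions))
```

You can rank by revenue instead of conversions if that is the number your business cares about.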

Focusing only on conversion rate

Look at the big picture metrics:

  • CAC (customer acquisition cost)
  • Average order value

This is important to avoid scenarios where the CR goes up but the business results go down (e.g. after lowering the price of a product).

Once you have confirmed that the variant is producing positive business results, move on to conversion rate.
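A quick made-up example of how this can go wrong. Revenue per visitor is used here as a simple stand-in for "business results", and all numbers are hypothetical.

```python
def conversion_rate(visitors, orders):
    return orders / visitors


def revenue_per_visitor(visitors, orders, average_order_value):
    return orders * average_order_value / visitors


# Hypothetical test: the variant sells the product at a lower price.
control = {"visitors": 10000, "orders": 400, "average_order_value": 50}
variant = {"visitors": 10000, "orders": 460, "average_order_value": 40}

for name, arm in (("control", control), ("variant", variant)):
    cr = conversion_rate(arm["visitors"], arm["orders"])
    rpv = revenue_per_visitor(arm["visitors"], arm["orders"], arm["average_order_value"])
    print(f"{name}: CR {cr:.1%}, revenue per visitor {rpv:.2f}")
```

The variant wins on conversion rate and loses on revenue per visitor. If you only looked at CR, you would ship the change that earns less.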

Low traffic websites treated as high traffic

Small websites just don’t have the traffic to allow for testing strategies used on high traffic websites. The only option left is qualitative testing. User surveys and even user testing are the tools for these kinds of sites. It’s going to take a lot more time, but there is basically no way around that.

Sequential testing – just don’t

Sequential testing is the practice of making a change and then watching a metric, with the idea that if it goes up, the hypothesis is confirmed. This approach is seriously flawed, because metrics change all the time due to a bunch of factors that have nothing to do with your change. It is a recipe for mistakes. Besides, with today’s technology and the ease of running simultaneous A/B tests, it takes almost no additional effort to do the tests right.
