
The Most Common A/B Testing Mistakes

31.05.2020.

A short collection of ideas and pitfalls to keep in mind before designing an A/B test. Skim through and bookmark. It will be updated in the future.

The mistakes currently added to the list:

  • Never let people stop tests early
  • Too small sample size
  • Stopping the test just when you reach 95% confidence
  • Not paying attention to the margin of error
  • Measuring page goals instead of final goals
  • Ending a test too early
  • Testing minute design changes instead of big impact opportunities
  • Not having a testing hypothesis
  • Being tricked by false positives
  • Impatience – don’t peek
  • Not optimizing for traffic sources
  • Focusing only on conversion rate
  • Low traffic websites treated as high traffic
  • Sequential testing – just don’t

Never stop tests early

This is the most common beginner mistake: getting pressured by management, falling for the first sign that your new variation is winning, or simply forgetting to do the math in advance. Be meticulous about validity and keep in mind that misinterpreting a test can cause serious business damage.

Too small sample size

Always determine the needed sample size in advance. This lets you estimate the test duration and keeps you from getting impatient and drawing conclusions too early.

Check out this sample size calculator.

Ballpark figure: 350 conversions per variation (every segment counts as its own variation, and so does every channel …).
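
To make the "do the math in advance" step concrete, here is a minimal sketch (my own illustration, not the calculator linked above) of a standard two-proportion sample size estimate. The baseline rate, expected lift, significance level and power below are made-up example values.

# Rough sample size estimate for an A/B test on conversion rate.
# Baseline rate, expected lift, alpha and power are example assumptions.
from math import sqrt, ceil
from scipy.stats import norm

def sample_size_per_variation(p1, p2, alpha=0.05, power=0.80):
    """Visitors needed per variation for a two-sided two-proportion z-test."""
    z_alpha = norm.ppf(1 - alpha / 2)   # about 1.96 for 95% confidence
    z_beta = norm.ppf(power)            # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Baseline 4% conversion rate, hoping to detect a lift to 5%:
print(sample_size_per_variation(0.04, 0.05))  # roughly 6,700-6,800 visitors per variation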

Stopping the test just when you reach 95% confidence 

The decision to stop a test is usually made immediately after statistical significance is reached. But if you have not done a sample size estimation in advance, this can be way too soon. Use the calculator above and be patient!

Not paying attention to the margin of error

A test result will always have a margin of error attached to it.

Let’s say that you have tested a single variation of your landing page with the intention of increasing the conversion rate. After the test is finished, your results are 4.2% (±1.8%) for the original and 5.0% (±2%) for the variant. It is true that the variant has a higher probability of being the winner, but the overlap between the two results is substantial. In this case you would definitely need to collect more data.
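
To see how substantial that overlap is, here is a small sketch (my own illustration, using the example numbers above) that reconstructs the two intervals from rate and margin of error and checks whether they overlap.

# Check whether two test results overlap once their margins of error are applied.
# The numbers mirror the example above: 4.2% (±1.8%) vs 5.0% (±2.0%).
def interval(rate, margin):
    return rate - margin, rate + margin

control = interval(0.042, 0.018)   # (0.024, 0.060)
variant = interval(0.050, 0.020)   # (0.030, 0.070)

overlap = max(control[0], variant[0]) < min(control[1], variant[1])
print(control, variant, "overlap:", overlap)  # overlap: True -> keep collecting data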

Measuring page goals instead of final goals

Measuring a simple, on-page goal, like a lead, is straightforward but not optimal. If possible, measure the final goals – the results at the end of the funnel. So in the example with leads you would also want to measure the sales performance of those leads.

Example: we had a SaaS client for whom we were optimizing a lead collection page. The test showed that the variant had an almost negligible increase in the conversion rate. Since we had put a lot of effort into making the landing page content clearer, we were quite surprised by that. But some more analysis showed that the variant was producing higher quality leads, which led to more closed deals. The reason was that the clearer copy had a ‘sieve’ effect – users knew better what to expect, and the total (end-to-end) conversion rate went up.
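
A hedged illustration of the same idea, with invented funnel numbers: a variant whose landing-page conversion rate is essentially flat can still clearly win on the final goal.

# Hypothetical funnel numbers: landing-page CR is almost flat,
# but the variant's leads close at a higher rate downstream.
funnels = {
    "original": {"visitors": 10_000, "lead_cr": 0.050, "close_rate": 0.10},
    "variant":  {"visitors": 10_000, "lead_cr": 0.051, "close_rate": 0.14},
}

for name, f in funnels.items():
    leads = f["visitors"] * f["lead_cr"]
    deals = leads * f["close_rate"]
    print(f"{name}: {leads:.0f} leads, {deals:.0f} closed deals")
# original: 500 leads, 50 closed deals
# variant:  510 leads, 71 closed deals -> the final goal tells a different story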

Ending a test too early

Your test should run at least until:

  • 7 days have passed. It needs to cover every day of the calendar week.
  • 2 buying cycles have been covered, if possible. In no case should it be shorter than a single cycle.
  • Your pre-calculated numbers are reached: unique visitors, number of conversions, statistical significance (a rough duration estimate from these numbers is sketched below).
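
A rough way to turn the pre-calculated sample size into an expected duration; the traffic and sample-size figures below are example assumptions, not prescriptions.

# Translate a required sample size into an expected test duration.
from math import ceil

required_per_variation = 6_745   # e.g. from a sample size estimate
variations = 2                   # original + one variant
daily_visitors = 1_200           # visitors entering the test per day

days_needed = ceil(required_per_variation * variations / daily_visitors)
print(f"~{days_needed} days")    # ~12 days; still run at least a full calendar week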

Holiday periods are not representative and cannot be compared with the rest of the year. People buy more, the proportion of impulse buys is greater, and the average order value is higher. If you test during holidays, you should compare the results only with previous holidays!

Testing minute design changes instead of big impact opportunities

Don’t test things that don’t present any learning opportunity.

  1. Test for big impact first,
  2. Then make incremental changes.

Use each test as an opportunity to drive behavioural insights.

Not having a testing hypothesis

Just testing random things will lead you nowhere. Create a test hypothesis, and start collecting and managing your insights. This is the only way to make your customer understanding compound over time. Get ideas for hypotheses by looking at your data and user behaviour.

Being tricked by false positives

The more variations you run, the higher the risk of a false positive. How do you avoid this? Be sure to use the calculator!
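
The effect is easy to quantify. Assuming each variation is compared to the control independently at a 5% significance level, the chance of at least one false winner grows quickly with the number of variations – a small sketch:

# Family-wise false positive risk when testing several variations at alpha = 0.05,
# assuming independent comparisons against the control.
alpha = 0.05
for variations in (1, 3, 5, 10):
    p_any_false_positive = 1 - (1 - alpha) ** variations
    print(f"{variations} variation(s): {p_any_false_positive:.1%} chance of a false winner")
# One common (conservative) fix is the Bonferroni correction: test each at alpha / variations.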

Impatience – it’s dangerous to peek at the results

We all look at the results before the tests are finished. It makes sense to do it: to check that everything is working OK and to detect major errors. But be extra careful when doing it. Don’t draw early conclusions, and certainly don’t discuss the results with stakeholders. Wait until you have at least 1,000 visitors and 25 conversions on each variation, and the statistical significance is 95% or better.
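
A quick simulation (my own sketch, with assumed traffic and conversion numbers) shows why peeking is dangerous: in an A/A test with no real difference between variations, stopping at the first peek that looks significant produces far more than the nominal 5% of false winners.

# Simulate A/A tests (no real difference) and 'peek' after every batch of visitors.
# Stopping at the first peek that crosses p < 0.05 inflates the false positive rate.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
true_cr, batch, peeks, alpha = 0.05, 500, 20, 0.05
false_winners = 0
runs = 2_000

for _ in range(runs):
    conv_a = conv_b = n = 0
    for _ in range(peeks):
        conv_a += rng.binomial(batch, true_cr)
        conv_b += rng.binomial(batch, true_cr)
        n += batch
        p1, p2 = conv_a / n, conv_b / n
        pooled = (conv_a + conv_b) / (2 * n)
        se = np.sqrt(pooled * (1 - pooled) * 2 / n)
        if se > 0 and abs(p1 - p2) / se > norm.ppf(1 - alpha / 2):
            false_winners += 1
            break

print(f"False positive rate with peeking: {false_winners / runs:.1%}")  # well above 5%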

Not optimizing for traffic sources

Treat each traffic source as a separate entity:

  1. Use specific LPs for each unique traffic source
  2. Prioritize sources. Use the 80/20 rule to focus on the sources that matter the most.

Focusing only on conversion rate

Look at big picture metrics:

  • CAC 
  • Average order value

This is important to avoid scenarios where the CR goes up but the business results go down (e.g. after decreasing the price of a product).

Once you confirm that the variant is producing positive business results, then move on to the conversion rate.
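
As a hedged illustration with invented numbers: a variant can win on conversion rate yet lose on revenue per visitor once the average order value drops, for example after a price cut.

# Conversion rate up, business result down: compare revenue per visitor, not just CR.
# All figures are hypothetical.
original = {"cr": 0.040, "avg_order_value": 80.0}
variant  = {"cr": 0.048, "avg_order_value": 60.0}   # e.g. after a price cut

for name, v in (("original", original), ("variant", variant)):
    revenue_per_visitor = v["cr"] * v["avg_order_value"]
    print(f"{name}: CR {v['cr']:.1%}, revenue per visitor ${revenue_per_visitor:.2f}")
# original: CR 4.0%, revenue per visitor $3.20
# variant:  CR 4.8%, revenue per visitor $2.88 -> higher CR, worse business result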

Low traffic websites treated as high traffic

Small websites simply don’t have the traffic to allow for the testing strategies used on high-traffic websites. The remaining option is qualitative testing: user surveys and even user testing are the tools for these kinds of sites. It’s going to take a lot more time, but there is basically no way around that.

Sequential testing – just don’t

Sequential testing is the practice of making a change and observing how a metric moves, with the idea that if it goes up the hypothesis is confirmed. This approach is seriously flawed, as metrics change all the time due to a whole bunch of factors. It is a recipe for mistakes. Besides, with today’s tools and the ease of running simultaneous A/B tests, it takes almost no additional effort to do the tests right.