Optimization Maverick Blog: Statistical Significance

You can hire my services

How do you know when your A/B test, split test, multivariate test or any other conversion rate optimization task is complete? How do you know you've found a winning experience that you can confidently roll out to your site and expect the conversions to come rolling in?

Well, the key thing is statistical significance, of course: enough visitors have voted with their feet that you can separate signal from noise.

BUT, and it's a big but, have you allowed your test to play out through key trading times?

At my current company, a number of trading factors mirror those of many other eCommerce businesses:

1. Day of week

2. Visits by hour

Once a CRO test has lived through several cycles of these, plus any other factors specific to our industry (seasonality, for example), we can safely say the new experience has done its tour of duty and is fit to implement.

Reaching a statistically significant result is therefore not enough in itself. Conversion optimization experts must make sure the audience their test has sampled reflects the critical factors that affect it. This matters because one variation may appeal to one season or segment more than others, and a test that has only seen that season or segment will give a skewed result.
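To make the "tour of duty" idea concrete, here is a minimal sketch of a check you could run before calling a test. It is only an illustration: the function name `trading_cycle_gaps`, the input (a list of visit timestamps) and the `min_full_weeks` threshold of 2 are my own assumptions, not something your testing tool provides. It reports which days of the week and hours of the day the test has not yet seen, and how many full weeks it has run.

```python
from datetime import datetime, timedelta


def trading_cycle_gaps(visits, min_full_weeks=2):
    """Report which trading cycles a test has not yet lived through.

    visits: iterable of datetime objects, one per test visit (hypothetical input).
    min_full_weeks: assumed threshold for illustration, not a recommended value.
    """
    visits = sorted(visits)
    if not visits:
        return {
            "missing_weekdays": list(range(7)),
            "missing_hours": list(range(24)),
            "full_weeks": 0,
            "enough_weeks": False,
        }

    seen_weekdays = {v.weekday() for v in visits}  # 0 = Monday ... 6 = Sunday
    seen_hours = {v.hour for v in visits}          # 0 ... 23

    # Whole weeks elapsed between the first and last recorded visit.
    full_weeks = (visits[-1] - visits[0]).days // 7

    return {
        "missing_weekdays": sorted(set(range(7)) - seen_weekdays),
        "missing_hours": sorted(set(range(24)) - seen_hours),
        "full_weeks": full_weeks,
        "enough_weeks": full_weeks >= min_full_weeks,
    }


if __name__ == "__main__":
    # Made-up data: one visit per hour for three days, starting on a Monday.
    start = datetime(2024, 1, 1)
    sample = [start + timedelta(hours=h) for h in range(3 * 24)]
    print(trading_cycle_gaps(sample))
```

A test that comes back with missing weekdays, or with `enough_weeks` set to `False`, has not yet traded through a full cycle, however significant the numbers look. Seasonality would need a longer-range check of its own.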

I've rather overlooked this topic since establishing this blog, but for the sake of completeness I think I should now cover the role of statistical significance in optimisation testing.

One of the biggest headaches in running an A/B test or multivariate test on your website is knowing when your test is complete, or at least heading towards a conclusion. Essentially, how do you separate signal from noise?

Many third-party tools give you metrics for judging a test's conclusiveness. For example, the Maxymiser testing tool displays a 'Chance to beat all' metric for each page combination or test variant within your test.
More importantly, what underpins these metrics is the concept of statistical significance. Essentially, a test result is deemed significant if it is unlikely to have occurred through pure chance. A statistically significant difference means there is statistical evidence that a real difference exists.

Establishing statistical significance between two sets of results allows us to be confident that we have results that can be relied upon.

As an example, say you have an A/B test with two different page designs. Analysing the data gives two results:

Page 1 - 1,529 generations with 118 responses or actions - giving a conversion rate of 7.72%.
Page 2 - 1,434 generations with 106 responses or actions - giving a conversion rate of 7.39%.

Looking at the two results, which do you think is the better? Is page 1 better because it has a higher conversion rate than page 2? Firing those two results through a basic statistical significance calculator (I'm using Google's Optimizer test duration calculator) tells us that the two results are 0.335218 standard deviations apart and are therefore not statistically significant. At the 95% level, the results would need to be roughly 1.96 standard deviations apart. This suggests the difference in conversion rates could easily be down to noise, so plough on with your testing.

If 95% statistical significance is achieved, you can say the test is conclusive with a clear winner. This is indicative of a strong signal and gives you a result based on statistics as opposed to human interpretation.
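If you want to check the arithmetic yourself, here is a minimal sketch of a two-proportion z-test in Python. The function names are my own, and the use of an unpooled standard error is an assumption: the calculator's exact method isn't documented here, although this version does come out at roughly 0.335 for the figures above. The p-value is two-sided, so a result counts as significant at the 95% level when it falls below 0.05.

```python
import math


def conversion_z_score(conv_a, n_a, conv_b, n_b):
    """How many standard errors apart two conversion rates are.

    Uses the unpooled standard error of the difference (an assumption;
    other calculators may pool the two samples instead).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    std_err = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return (p_a - p_b) / std_err


def two_sided_p_value(z):
    """Probability of a gap at least this large arising by chance alone,
    under a normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Page 1: 118 actions from 1,529 generations; page 2: 106 from 1,434.
    z = conversion_z_score(118, 1529, 106, 1434)
    p = two_sided_p_value(z)
    print(f"z = {z:.4f}, p = {p:.4f}, significant at 95%: {p < 0.05}")
```

The normal approximation is reasonable at sample sizes like these, but it becomes unreliable when a variant has only a handful of conversions.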



CRO statistical significance and beyond

What is Statistical Significance?