Thursday, 26 March 2009

What is Statistical Significance?

I've rather overlooked this topic since establishing this blog, but for the sake of completeness I think I should now mention the role of statistical significance in optimisation testing.

One of the biggest headaches in running an AB test or multivariate test on your website is knowing when your test is complete, or at least heading towards a conclusion. Essentially, how do you separate signal from noise?

Many 3rd party tools give you the metrics to determine a test's conclusiveness. For example, the Maxymiser testing tool displays a 'Chance to beat all' metric for each page combination or test variant within your test.
But more importantly, what underpins these metrics is the concept of statistical significance. Essentially, a test result is deemed significant if it is unlikely to have occurred through pure chance. A statistically significant difference means there is statistical evidence that a real difference exists.

Establishing statistical significance between two sets of results allows us to be confident that we have results that can be relied upon.

As an example, suppose you have an AB test with two different page designs. Analysing the data gives two results:

Page 1 - 1,529 generations with 118 responses or actions - giving a conversion rate of 7.72%.
Page 2 - 1,434 generations with 106 responses or actions - giving a conversion rate of 7.39%.

Looking at the two results, which do you think is the better? Is page 1 better because it has a higher conversion rate than page 2? Using statistics and firing those 2 results through a basic statistical significance calculator (I'm using Google's Optimizer test duration calculator) tells us that the two results are 0.335218 standard deviations apart and are therefore not statistically significant. In other words, a difference this small could easily be produced by noise alone, so plough on with your testing. If 95% statistical significance is achieved (roughly 1.96 standard deviations apart for a two-sided test), you can reasonably say the test is conclusive with a clear winner. This is indicative of a strong signal and gives you a result based upon statistics as opposed to human interpretation.
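For anyone who wants to check the arithmetic by hand, here is a minimal sketch of the calculation in Python. I don't know exactly what the calculator does internally, so this assumes a plain two-proportion z-score using the unpooled standard error, which does appear to give a figure of about 0.335 for the two results above. The function names are my own, and the two-sided p-value and the 1.96 cut-off for 95% are the usual normal-approximation conventions rather than anything taken from the tool.

```python
import math


def conversion_rate(actions, generations):
    """Conversion rate as a fraction (e.g. 0.0772 for 7.72%)."""
    return actions / generations


def z_score(actions_a, n_a, actions_b, n_b):
    """Number of standard deviations separating two conversion rates.

    Assumption: unpooled standard error of the difference between
    two independent proportions (normal approximation).
    """
    p_a = conversion_rate(actions_a, n_a)
    p_b = conversion_rate(actions_b, n_b)
    std_error = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return (p_a - p_b) / std_error


def two_sided_p_value(z):
    """Probability of a gap at least this large arising from chance alone."""
    return math.erfc(abs(z) / math.sqrt(2))


# Figures from the example: page 1 vs page 2
rate_1 = conversion_rate(118, 1529)
rate_2 = conversion_rate(106, 1434)
z = z_score(118, 1529, 106, 1434)
p_value = two_sided_p_value(z)

# 1.96 standard deviations corresponds to 95% significance (two-sided)
significant = abs(z) >= 1.96

print(f"Page 1: {rate_1:.2%}  Page 2: {rate_2:.2%}")
print(f"Standard deviations apart: {z:.6f}")
print(f"Two-sided p-value: {p_value:.3f}")
print(f"Significant at 95%: {significant}")
```

The p-value here comes out far above 0.05, which is just another way of saying the gap between the two pages is well within what chance alone would produce at these traffic levels.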