Having run my new Google Experiment for a couple of weeks, I have noticed that the distribution of traffic between the default and the test page has become hugely unequal.
After a bit of trawling around the web I discovered that this was most likely caused by Google applying a multi-armed bandit approach to the test (here).
Google states "a multi-armed bandit is a type of experiment where:
the goal is to find the best, or most profitable action, and
you learn about payoff probabilities as the experiment progresses."
"Once per day, we take a fresh look at your experiment to see how each of the variations has performed, and we adjust the fraction of traffic that each variation will receive going forward. A variation that appears to be doing well gets more traffic, and a variation that is clearly under performing get less."
So essentially Google are no longer retaining an equal traffic share for each variation in the test. If a variation is perceived to be doing worse than another or the default, it gets fewer visitors. Equally, if a test variant is doing better than another variant or the default, it gets a greater share of the traffic. This is fine and will be rewarding for people who want a quick win during experimentation, but it's harder to swallow for those who have been doing conventional A/B or MVT testing up until now. In the past Google have let you adjust test weighting manually, and it would be good to offer an override option whereby the multi-armed bandit approach can be disabled by those who want clarity in their tests.
You can hire my services
I am Ben Lang, an independent web conversion specialist with over 20 years of experience in IT and Digital and 12 years of Conversion Rate Optimization (CRO) know-how.
I provide a full analysis of your website's conversion performance and the execution of tried and tested CRO exercises through A/B testing, split testing or MVT (multivariate testing), deployed to fix your online conversion issues.
Contact me at https://www.benlang.co.uk/ for a day rate or catch up with me on LinkedIn
Google Content Experiments
Google has announced it's moving away from Google Website Optimizer to 'Content Experiments'. This effectively means Optimizer has been fully integrated into the new version of Google Analytics. There are, as ever, positives & negatives involved. First off, it means less coding to be uploaded to your test pages. Great. On the downside, tests are now wholly driven around Google Goals. In the past I've found setting up goals to be hard work and they don't always work, so coupling testing to goals fills me with concern. Finally, I think Google have missed a trick. They had an opportunity to make their testing tool a 'tagless solution' under this approach. If they worked a bit of dynamic JavaScript into their normal analytics tags you would never have to upload test code for each new experiment. This would greatly increase people's ability to conduct testing without further IT input, essentially the holy grail of professional testers. I'm just lining up my first test under this regime; I'll let you know how it goes.
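Just to illustrate the kind of tagless approach I'm imagining (this is purely a sketch of my own idea, not anything Google actually offers; the endpoint, function and field names are all made up), a single site-wide script could fetch the experiment definition at runtime and swap the content in:

```typescript
// Purely hypothetical sketch: one site-wide script that pulls experiment
// definitions at runtime, so no per-test code ever needs uploading.
interface VariantRule {
  selector: string; // element the experiment targets
  html: string;     // replacement markup for this visitor's variant
}

interface ExperimentConfig {
  id: string;
  variantName: string;
  rules: VariantRule[];
}

async function applyExperiments(): Promise<void> {
  // Imaginary endpoint returning the variants this visitor has been bucketed into.
  const res = await fetch(`/experiments/config?page=${encodeURIComponent(location.pathname)}`);
  if (!res.ok) return; // fail open: the visitor just sees the default page

  const experiments: ExperimentConfig[] = await res.json();
  for (const exp of experiments) {
    for (const rule of exp.rules) {
      const el = document.querySelector(rule.selector);
      if (el) el.innerHTML = rule.html; // swap in the variant content
    }
    // Remember which variant was shown so conversions can be tied back to it.
    document.cookie = `exp_${exp.id}=${exp.variantName};path=/`;
  }
}

applyExperiments();
```

Done that way, launching a new experiment would be a configuration change rather than another round of tag deployment.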
First mobile landing page test
I launched a Google Optimizer A/B test at the start of the week to see if I could drive an uplift in Mobile Banking registrations. Essentially the newly launched landing page for mobile banking had a CTA for the registration form, but it was available only on the last tab of a 4-tab page design. Using GWO I added the same CTA to the footer of the page so that it was available across all tabs. Within just a few days the new design was beating the old design hands down, with a +200% uplift in people clicking through to the registration form at a 99% confidence level. I'll add an image of the design soon. Update: the design has sustained its uplift 3 weeks on. We now have a 10.5% conversion rate versus 1.5% for the old design.
Tracking uplift post ab testing
I know, it sounds reckless not to; continuing to track uplift is the responsible thing to do, right? But in reality there's an important aspect of optimisation testing that needs to be taken into consideration here. A test outcome is usually the result of the following variables:
product benefit + moment in time + market position + customer experience
Can you continue to accurately measure and account for each of these factors after you've conducted a test? The honest answer is probably no. Also, I don't know many people with the personal bandwidth to monitor every single test once it's finished. If I wanted to double my workload I certainly would!
The important thing is to ensure that your test has been given enough testing time and traffic volume in the first place before you conclude it, as this post reasonably states: Don't Fool Yourself with A/B Testing. If you've done that, you should have a reasonable level of confidence in its future performance.
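As a rough guide to what 'enough traffic' means, here's a minimal sketch of the standard two-proportion sample-size estimate at 95% confidence and 80% power (my own back-of-envelope addition, not something from the linked post):

```typescript
// Minimal sketch: visitors needed per variation to reliably detect a given
// uplift, using the standard two-proportion sample-size approximation.
function sampleSizePerVariation(
  baselineRate: number,   // e.g. 0.05 for a 5% conversion rate
  relativeUplift: number, // e.g. 0.20 to detect a 20% relative improvement
  zAlpha = 1.96,          // two-sided 95% confidence
  zBeta = 0.84            // 80% power
): number {
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeUplift);
  const pBar = (p1 + p2) / 2;
  const numerator =
    zAlpha * Math.sqrt(2 * pBar * (1 - pBar)) +
    zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2));
  return Math.ceil((numerator ** 2) / ((p2 - p1) ** 2));
}

// A 5% baseline and a hoped-for 20% relative uplift needs roughly 8,100
// visitors per variation before you can sensibly call the test.
console.log(sampleSizePerVariation(0.05, 0.20));
```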
If you still want to track uplift after testing, I would suggest the following options are available to you:
1. Set up a Google Analytics Goal. This gives you the ability to track the performance of a specific customer journey within your normal web analytics. Yes, you have to use Google for this one, but any web metrics tool worth its salt will have the same functionality.
2. Leave your test running. This, to me, is the fail-safe option. Once you have a test winner, up-weight it over the default content but leave a small percentage of your traffic going to the default as a benchmark for continued performance. I usually leave 5% going to the default for a period of time where possible, to ensure I've made the right decision.
3. Run a Follow-up experiment. This is a great feature in Google Optimizer but you can do the same in any other testing tool if you have the resource to do it and there's lingering doubt about the original test outcome.
4. Bespoke tracking. On the pages I optimise I append tracking values to the application form that, when submitted to a sales database, can be used to tie sales back to a specific landing page. Using this approach you can monitor conversion rate performance before, during and after a test. I can't recommend this approach enough, although whether you can implement it depends entirely on the particular design of your online application form. There's a rough sketch of the idea below.
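Here's a minimal sketch of what I mean by that bespoke tagging (the form id, field name and variant code are purely illustrative; your application form and sales database will differ):

```typescript
// Minimal sketch: stamp the application form with the landing page variant so
// each submitted application can be tied back to the page that produced it.
function tagApplicationForm(formId: string, variantCode: string): void {
  const form = document.getElementById(formId) as HTMLFormElement | null;
  if (!form) return;

  // Hidden field travels with the submission into the downstream sales database.
  const field = document.createElement("input");
  field.type = "hidden";
  field.name = "source_variant"; // illustrative field name
  field.value = variantCode;     // e.g. "loans-lp-variantB"
  form.appendChild(field);
}

// Tag the form with whichever variant this page represents.
tagApplicationForm("applyForm", "loans-lp-variantB");
```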
That's about it really. If I think of any other methods for ongoing tracking I'll add them in.
Happy testing!
The conception of Short Wave Testing
Right, well this is really work in progress. I think I've invented a new form of multivariate testing on the web. And for clarity, this has nothing at all to do with Short Wave Radio. However, a couple of caveats to start off with: A) I'm not entirely sure it hasn't been done before, and B) I'm not entirely sure it's a valid test methodology.
Well, hang it, this blog is all about being a testing 'Maverick', so here goes nothing...
First off, let's not get confused by Iterative Wave Testing as used by Optimost. I think I'm right in saying that's where you test the same variants over a sustained period in 'waves' of testing to ensure what you have is validated and statistically significant. All very worthy, good stuff.
What I've been experimenting with is trying a set of test variants in one brief wave of testing and then ditching or culling any negative or lesser-performing variants in favour of an entirely new variant in a new wave of testing, which sees the positive or successful variants carried forward from the last wave. The whole process is repeated for as many waves as it takes to get a robust set of variants that outperform everything else pitted against them. The only qualifying criterion for a variant to be carried forward to the next wave of testing is that it either continues to outperform the original default design or betters the performance of anything that has gone before it, i.e. anything that has been previously removed.
I hope this simple(ish) diagram illustrates how short wave testing works. Below we have 4 test areas on a web page and 4 phases of testing. As we can see in Test Area 1, Variant A is successful enough never to be culled from the test and ultimately becomes the winner for Test Area 1. Test Area 2 shows an initially unsuccessful Variant A that is culled after the first phase of testing and replaced with a new Variant B, which goes on to be the winning variant for Test Area 2. Test Area 3 has a different story: in the end it takes 4 different variants over 4 phases of testing to find one that is positive enough to be declared a winner. And Test Area 4 arrives at a winner in the third phase of testing with Variant C.
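For anyone who prefers logic to diagrams, here's a minimal sketch of the cull-and-replace loop between waves (the names are mine, the uplift figure stands in for whatever your testing tool reports, and I've taken the bar for survival as beating both the default and anything previously culled):

```typescript
// Minimal sketch of the short-wave loop: after each wave, cull any variant that
// is no longer winning, replace it with a fresh idea from the backlog, carry
// the winners forward, and run the next wave.
interface Variant {
  name: string;
  upliftVsDefault: number; // relative uplift vs the original default design
}

interface TestArea {
  name: string;
  current: Variant;   // the variant live in this area for the current wave
  backlog: Variant[]; // fresh ideas waiting to be tried
  culled: Variant[];  // everything removed in earlier waves
}

function cullAndReplace(areas: TestArea[]): void {
  for (const area of areas) {
    // Bar to clear: outperform the default (zero uplift) and at least match
    // the best of anything previously removed from this test area.
    const bestCulled = Math.max(0, ...area.culled.map(v => v.upliftVsDefault));
    const keep = area.current.upliftVsDefault > bestCulled;

    if (!keep && area.backlog.length > 0) {
      area.culled.push(area.current);       // ditch the under-performer
      area.current = area.backlog.shift()!; // swap in a new variant for the next wave
    }
  }
}
```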
Now I'm aware that this form of testing is both labour-intensive and resource-heavy in its undertaking. I was able to do this kind of testing because I was motivated enough to dedicate resource to it and had enough ideas in the locker that I wanted to test for each test area and test wave. I used Google Optimizer to do it and coded the variants myself, and the outcome has been, well, staggering. A sustained uplift in the region of 18% for product purchase has been achieved (a personal best, BTW), and I am reasonably confident in the results because the final variants consistently reported the same uplift over 9 separate waves of testing.
What I'm hoping for now is the counter-argument from my testing peers (drop me a line at farmerfudge@googlemail.com). I'm aware of the shortcomings of this approach but want others to have their say on this kind of testing methodology. Here's my bonfire, feel free to piddle all over it : ) Happy Testing!
UPDATE: One thing worth noting with this testing approach is that if it goes right, your conversion rate for the test variants should improve with each wave as you attain, keep or build on positive-performing variants, but at the same time you will also see a diminishing uplift for each wave. This is because you are continually testing against improved, stronger-performing variants in the test segment. Ultimately, though, you should still see a good uplift against the underlying original default design.
An Offline Call To Action
A recent MVT test using Google Website Optimizer answered the question:
"Exactly what impact does having an offline Call To Action next to an online one have?"
In this test I measured the impact on the click-to-apply rate of a landing page where, using MVT, I served up a link to a pop-up window, showing both a telephone sales number and a branch locator, to a section of the page's visitors.
During the test period, in addition to monitoring the test console results I monitored the Google Analytics report for the pop-up window.
Here's the summary of results:
- 675 visitors saw the default (no offline CTA); 248 of them clicked Apply = 36.7% conversion rate
- 678 visitors saw the offline CTA variant; 205 of them clicked Apply = 30.2% conversion rate
- The offline CTA variant is down -17.7% in conversion rate against the default page
- The offline CTA pop-up received 569 unique views in the test period; therefore 83.9% of people who see an offline CTA will click it
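If you want to sanity-check those figures, they fall straight out of the raw counts; the quick sketch below recomputes them, and the two-proportion z-test at the end is my own addition rather than anything the GWO console reported:

```typescript
// Quick sketch: recompute the headline figures from the raw counts above.
const defaultVisitors = 675, defaultApplies = 248;
const variantVisitors = 678, variantApplies = 205;
const popupUniqueViews = 569;

const defaultRate = defaultApplies / defaultVisitors;             // ≈ 0.367 (36.7%)
const variantRate = variantApplies / variantVisitors;             // ≈ 0.302 (30.2%)
const relativeChange = (variantRate - defaultRate) / defaultRate; // ≈ -0.177 (-17.7%)
const popupClickRate = popupUniqueViews / variantVisitors;        // ≈ 0.839 (83.9%)

// My own addition: a two-proportion z-test to check the drop isn't just noise.
const pooled = (defaultApplies + variantApplies) / (defaultVisitors + variantVisitors);
const se = Math.sqrt(pooled * (1 - pooled) * (1 / defaultVisitors + 1 / variantVisitors));
const z = (variantRate - defaultRate) / se; // ≈ -2.5, i.e. significant at the 95% level

console.log({ defaultRate, variantRate, relativeChange, popupClickRate, z });
```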
Good old button testing
Everyone, at some point, does some optimisation testing of button designs. Some people think it's a trivial exercise to undertake when there's bigger fish to fry. Well, I disagree: button design testing is exactly the kind of thing you can be doing quickly and easily with Google Optimizer or similar. We've done loads of testing on buttons in the past, testing colours, sizes, Apply text and so on, but I read an interesting article from Get Elastic on how unusual button designs can give you an easy uplift in conversion. So over the course of a couple of months I tested all the designs you see here on a landing page. No.1 was the default design, and the winner was... No.3, the 'boxed arrow' design, with a 32% uplift in click-to-apply rate. The arrow-based designs were at the upper end of the winners overall, while the, *cough*, phallus-based designs stole an early lead but didn't win out in the end. Give it a go on your website; it's quick, easy and surprisingly fun.
Google Experiments Follow-Up experiments
First off - What is a follow-up experiment?
Google says:
"When the results of an experiment suggest a winning combination, you can choose to stop that experiment and run another where the only two combinations are the original and the winning combination. The winning combination will get most of the traffic while the original gets the remaining. This way, you can effectively install the winner and check to see how it performs against the original to verify your previous results."
And why should I run a follow-up experiment?
"Running a follow-up experiment will give you two benefits. First, it will enable you to verify the results of your original experiment by running a winning combination alongside the original. Second, it will maximize conversions, by delivering the winning combination to the majority of your users. We encourage you to run follow-up experiments to get the best, most confident results for any changes you make to your site."
But what happens when a follow-up experiment delivers contradictory results?
The screenshot below shows the original MVT test results....
I commenced a follow-up test running the winning variant from this test head-to-head against the original default. And this is what happened...
The blue line is the original design beating the first test's winning variant. This has happened time & again with my follow-up experiments. Then I noticed something. When you set up a follow-up experiment it's easy to overlook the weightings setting, or the 'choose the percentage of visitors that will see your selected combination' option. By default it's set to 95% for your selected combination.
Now, I can't offer any explanation, but from previous testing with other tools such as Maxymiser we've seen that when you up-weight a particular variant in a test in favour of another, invariably its conversion performance goes down, sometimes radically so. I recommend only ever using a 50/50 weighting in a follow-up experiment because, for whatever reason, an unequal weighting seems to skew performance. Just be aware of this possibility and you'll be fine : )
If anyone can offer me a scientific explanation for this behaviour I'm all ears!
By the way, below shows the test after the weightings were reset to a 50/50 split. Bit different from the original follow-up experiment, no?
Google Optimizer - Landing Page A/B Testing
I launched a split test on one of our highly trafficked Current Account landing pages last week.
This basically saw the rather stale current champion page design challenged by a much more creative-led design for the same page. The graphic above shows the readout for the visitor conversion rate, which shows the creative-led design as the outright winner (a 16.9% uplift in click-to-apply rate). However, I have tagged both pages so that we can identify which pages actually result in submitted current account applications, and those results show the original design leading over the creative-led design. This is yet another example of how pretty design may compel people to click an apply button, but if they haven't read the fine detail they're less likely to go through the entire end-to-end transactional process.
The original page design (champion) is shown below on the left and the creative led design (challenger) is shown below on the right.
An unbiased review of Google Experiments
I notice that a lot of reviews of Google Website Optimizer (GWO) have been done by what at first appear to be independent companies & individuals, but at second glance are actually affiliated or partnered with Google in some way. So, this being the case, here's my unbiased opinion & experience of GWO, for what it's worth. And please note: I am not pushing or offering any service here relating to Google.
Anyway, I've been itching to give GWO a trial for the past couple of years. Up until now our websites have not been conducive to implementing Google tags, and besides, we've been using a managed testing service from Maxymiser. However, a few things attract me towards GWO over a paid service:
- First off, it's free. If you have the means to implement Google tracking tags into your web pages, why wouldn't you at least try a test or two?
- Secondly, even if, like our company, you're paying a third-party company to build and run multivariate and A/B tests on your behalf, no matter what contract you're on there's never enough time and resource to run with all the testing concepts and ideas you might like. This is why I've persisted with GWO for the past couple of months: it's relieved a bottleneck in our testing schedule by letting me run quick & dirty tests on the fly alongside our more formal testing.
- Finally, if your business is being an optimization expert, you need to try a variety of tools, especially the most commonly used one, to add to your knowledge base.
We now have a sub-domain where we can do whatever we like on landing pages, including testing and tagging with GWO. I've now conducted three landing page tests, ranging from A/B tests to MVT, and have just launched my fourth:
- My first test was on a Personal Loans page where we tried a dozen different page designs and copy changes to see if we could get more people to apply for a loan. It was a fairly simplistic affair but has yielded a 14.77% uplift in visitor conversion by offering up a different page header and product introduction copy.
- The second test was on a High Interest Current Account landing page (phase 1). This looked at simplifying overly complicated product information and has yielded a 20% uplift in conversion. I've just launched phase 2 of this test, where I champion/challenge the previous winner, a rather stodgy design, against a more creative-led design.
- My third test was recreating a Bank Accounts comparison page from our main website but in a landing page environment. We then employed the usability web experts Foolproof to come up with an improved design for the page and test it (amongst other designs) against the default page. It's still running and yielding a 9% uplift in submitted current account applications to date.

Now the problems with GWO.
For whatever reason, aside from my most recent test, none of the tests have displayed progress data in the reporting interface of the GWO console! Fortunately, in every test I have been able to tag each variant with a bespoke/unique tracking value which appears in our downstream sales database when someone submits an application. Because of this I have been able to run successful tests with reliable MI to go by.
Customer support for GWO is non-existent and you're reliant on forums for any kind of useful troubleshooting information. I have had to solve a number of glitches with the tests run to date myself, at significant cost in time & effort!
Because I have web developer experience I am able to set up a test and build the page variants myself. But this is not the norm, and I can imagine that working with an internal or external agency to implement and update GWO test tags and content might make the whole experience unworkable.
Actually coding the test variants is badly handled, in that they just give you a single unformatted text area to do your HTML editing in, so lots of cutting and pasting from a conventional web editing tool is involved, such as Dreamweaver or Visual Studio (even Notepad is better!).
The positives of GWO
I love the GWO console. Both the reporting interface (below is the reporting interface showing a current MVT test in progress) and the test build interface are simple but highly functional in design. The step-by-step code implementation is really straightforward: you tell GWO what your original, test and conversion pages are and it tells you what code to paste into those pages to get the test going.
It allows you to weight test variants so that they are displayed to a set portion of your test audience, and the test follow-up feature is great too for validating the results of a concluded test.
If Google could come up with a solution that didn't involve placing tags on test and conversion pages then I think the technology could take off like a rocket.
In conclusion
So, besides some teething problems getting up to speed with GWO, it's been a good learning curve and I've been able to turn in some very reasonable test results to complement my other testing.
If you want to see a previous article comparing GWO to other optimisation providers click here