It's not that they don't put in 'language like Bayesian'; it's that it's a different method. Yes, it is an improvement on the t-test straw man they mention, but it's less flexible and less powerful than Bayesian methods. Once you have a posterior, you can ask questions that their p-values/confidence intervals don't address: for example, the probability of an x% increase in conversion rate, or the risk associated with choosing an alternative. Not to mention multi-armed bandits, which not only are expected to arrive at an answer faster, but also maximize conversions along the way.
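For concreteness, here is a minimal sketch of those two posterior questions, assuming a Beta-Binomial model with uniform Beta(1, 1) priors; the counts and the 5% threshold are made up for illustration:

```python
# Posterior questions a p-value/CI doesn't answer, under a
# Beta-Binomial model with uniform priors and made-up counts.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed data: (conversions, visitors) per variation.
a_conv, a_n = 120, 1000
b_conv, b_n = 140, 1000

# Monte Carlo draws from each arm's Beta posterior.
p_a = rng.beta(1 + a_conv, 1 + (a_n - a_conv), size=100_000)
p_b = rng.beta(1 + b_conv, 1 + (b_n - b_conv), size=100_000)

lift = (p_b - p_a) / p_a

# "Probability of an x% increase": here, P(relative lift > 5%).
print("P(lift > 5%):", (lift > 0.05).mean())

# "Risk associated with choosing an alternative": expected loss in
# conversion rate if we ship B and it is actually worse.
print("expected loss of choosing B:", np.maximum(p_a - p_b, 0).mean())
```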
While I do agree that a sequential hypothesis test like the one we implemented in Stats Engine is different from a fully Bayesian method, I wouldn't necessarily call it less powerful. In fact, there are optimality results showing that a properly implemented sequential test minimizes the expected number of visitors needed to correctly reject a null hypothesis of zero difference. I should note that our particular implementation does use some Bayesian ideas as well.
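For anyone curious what a sequential test looks like, here is a minimal sketch of Wald's classic sequential probability ratio test on Bernoulli conversions. It is illustrative only (not the actual Stats Engine implementation, whose details differ), and the rates and error levels are placeholders:

```python
# Wald's SPRT for Bernoulli data: stop as soon as the accumulated
# log-likelihood ratio crosses a decision boundary, rather than
# waiting for a fixed sample size. All parameters are placeholders.
import math

def sprt(observations, p0=0.10, p1=0.12, alpha=0.05, beta=0.20):
    """Process one visitor at a time; stop early when possible."""
    upper = math.log((1 - beta) / alpha)   # cross above: reject H0
    lower = math.log(beta / (1 - alpha))   # cross below: accept H0
    llr = 0.0
    for n, converted in enumerate(observations, start=1):
        # Log-likelihood ratio contribution of one Bernoulli outcome.
        if converted:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "reject H0", n
        if llr <= lower:
            return "accept H0", n
    return "keep sampling", len(observations)
```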
I agree that a benefit of Bayesian analysis is flexibility. Different posterior results are possible with different priors. But in practice this can be a hindrance as well as a benefit: when answers depend on the choice of prior, misusing or misunderstanding the prior can lead to incorrect conclusions.
There is also a feature of Frequentist guarantees that is very attractive specifically for A/B testing. They make statements about the long-run average lift, which is a quantity that many businesses care about: what will my average lift be if I implement a variation after my A/B test?
That said, we have looked at, and continue to look at, Bayesian methods, because we don't feel that we have to be in either a Frequentist or a Bayesian framework, but rather should use the tools best suited to the sorts of statistical questions our customers encounter.
"They make statements about the long-run average lift, which is a quantity that many businesses care about: what will my average lift be if I implement a variation after my A/B test?"
Could you state clearly what this guarantee is? Unless I'm making a stupid mistake, such guarantees are impossible even in principle with frequentist statistics.
You do not want to use a classical bandit for A/B testing. The problem is that most bandit algorithms assume the conversion rate is constant over time - i.e., that Saturday and Tuesday are the same. If Saturday and Tuesday have different conversion rates, this will horribly break a bandit.
This is not a theoretical problem. I have a client who wasted months on this.
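To make the failure mode concrete, here is a toy simulation under made-up rates: each arm is better on different days of the week, but a standard Thompson-sampling bandit pools everything into one Beta posterior per arm, so it has no way to represent, let alone exploit, the weekly pattern:

```python
# Toy simulation: arm 0 converts better on weekdays, arm 1 on
# weekends (made-up rates). A stationary Thompson-sampling bandit
# keeps a single pooled Beta posterior per arm, converges to one
# arm, and then serves it on the wrong days too.
import numpy as np

rng = np.random.default_rng(0)

def true_rate(arm, day):
    weekend = day % 7 in (5, 6)
    if arm == 0:
        return 0.03 if weekend else 0.06   # better on weekdays
    return 0.06 if weekend else 0.03       # better on weekends

wins = np.ones(2)     # pooled Beta(1, 1) posteriors, no notion of "day"
losses = np.ones(2)
pulls = np.zeros((2, 2), dtype=int)        # [arm][weekday=0/weekend=1]

for t in range(70_000):                    # ~1000 visitors/day, 10 weeks
    day = t // 1000
    weekend = int(day % 7 in (5, 6))
    arm = int(np.argmax(rng.beta(wins, losses)))  # Thompson sample
    reward = rng.random() < true_rate(arm, day)
    wins[arm] += reward
    losses[arm] += 1 - reward
    pulls[arm][weekend] += 1

# The optimal policy switches arms by day type; the stationary bandit
# cannot, and the pull counts show it serving one arm on both.
print("pulls [arm, weekday/weekend]:\n", pulls)
```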
I know how to fix this (a Bayesian method, BTW), but I haven't published it. As far as I know, there is very little published research into using Bayesian bandits in messy real-world cases like this.
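To be clear, the sketch below is not the poster's unpublished fix; it is just one idea from the (admittedly thin) non-stationary bandit literature: discounted Thompson sampling, which decays old evidence so the posteriors can track a drifting conversion rate. The discount factor is a made-up value:

```python
# Discounted Thompson sampling (one known approach to non-stationary
# bandits, NOT the poster's method): decay all past evidence before
# each update so recent visitors dominate the Beta posteriors.
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.999                  # per-visitor discount, a made-up value
wins = np.ones(2)              # Beta(1, 1) priors for two arms
losses = np.ones(2)

for t in range(70_000):
    day = t // 1000
    weekend = day % 7 in (5, 6)
    # Same made-up weekday/weekend rates as the simulation above.
    rates = (0.03, 0.06) if weekend else (0.06, 0.03)
    arm = int(np.argmax(rng.beta(wins, losses)))
    reward = rng.random() < rates[arm]
    # Decay everything, then add the fresh observation; the effective
    # memory is ~1/(1 - gamma) visitors, so day-to-day shifts get
    # tracked instead of averaged away.
    wins *= gamma
    losses *= gamma
    wins[arm] += reward
    losses[arm] += 1 - reward
```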