
It's not that they don't put in 'language like Bayesian', it's a different method. Yes, it is an improvement on the t-test straw-man they mention, but it's less flexible and less powerful than Bayesian methods. Once you have a posterior, you can ask questions that their p-values/confidence intervals don't address: for example, the probability of an x% increase in conversion rate, or the risk associated with choosing an alternative. Not to mention multi-armed bandits, which not only are expected to arrive at an answer faster, but also maximize conversions along the way.
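To make the two posterior queries concrete, here's a minimal sketch with made-up counts (the numbers, priors, and Monte Carlo approach are my illustration, not anything from the article): with Beta-Binomial conjugacy you sample from each arm's posterior and answer both questions directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical A/B data: (conversions, visitors) for control and variant.
a_conv, a_n = 120, 1000
b_conv, b_n = 150, 1000

# With a Beta(1, 1) prior, each conversion rate has a Beta posterior.
a_post = rng.beta(1 + a_conv, 1 + a_n - a_conv, size=100_000)
b_post = rng.beta(1 + b_conv, 1 + b_n - b_conv, size=100_000)

# Q1: probability of at least a 10% relative lift from the variant.
p_lift_10 = np.mean(b_post > 1.10 * a_post)

# Q2: risk (expected loss) of shipping B -- the average conversion-rate
# shortfall in the scenarios where A was actually better.
risk_b = np.mean(np.maximum(a_post - b_post, 0.0))

print(p_lift_10, risk_b)
```

Neither quantity falls out of a p-value; both are one line once the posterior samples exist.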


While I do agree that a sequential hypothesis test like the one we implemented in Stats Engine is different from a completely Bayesian method, I wouldn't necessarily call it less powerful. In fact, numerous optimality properties show that a properly implemented sequential test minimizes the expected number of visitors needed to correctly reject a null hypothesis of zero difference. I should note that our particular implementation does use some Bayesian ideas as well.
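For readers unfamiliar with sequential tests, the classic example of that optimality property is Wald's SPRT (a textbook sketch, not Optimizely's actual Stats Engine implementation): accumulate a log-likelihood ratio one visitor at a time and stop the moment it crosses either error-controlled boundary.

```python
import math
import random

def sprt_bernoulli(stream, p0, p1, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test for a Bernoulli rate.

    Accumulates the log-likelihood ratio of H1 (p = p1) vs H0 (p = p0)
    one observation at a time and stops as soon as it crosses a boundary.
    Returns (decision, number of observations used).
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    llr = 0.0
    n = 0
    for x in stream:
        n += 1
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "inconclusive", n

random.seed(0)
# Conversions actually drawn with rate 0.15; test p0 = 0.10 vs p1 = 0.15.
stream = (random.random() < 0.15 for _ in range(100_000))
decision, n_used = sprt_bernoulli(stream, p0=0.10, p1=0.15)
print(decision, n_used)
```

The stopping rule is what buys the "minimum expected visitors" property: easy calls stop early, and only genuinely close cases run long.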

I agree that a benefit of Bayesian analysis is flexibility: different priors yield different posteriors. But in practice this can be a hindrance as well as a benefit. When answers depend on the choice of prior, misusing or misunderstanding that prior can lead to incorrect conclusions.

Frequentist guarantees also have a very attractive feature specifically for A/B testing: they make statements about the long-run average lift, which is a quantity many businesses care about. What will my average lift be if I implement a variation after my A/B test?

That said, we have looked at, and continue to look at, Bayesian methods, because we don't feel that we have to be in either a Frequentist or Bayesian framework, but rather should use the tools best suited to the sorts of statistical questions our customers encounter.

Finally, there have been some very interesting results lately on the connections between sequential testing and bandits! (for example, see here: http://auduno.github.io/SeGLiR/documentation/reference.html )


> They make statements on the long-run average lift, which is a quantity that many businesses care about: what will my average lift be if I implement a variation after my A/B test?

Could you state clearly what this guarantee is? Unless I'm making a stupid mistake, such guarantees are impossible even in principle with frequentist statistics.


You do not want to use a classical bandit for A/B testing. The problem is that most bandit algorithms assume the conversion rate is constant - i.e., that Saturday and Tuesday are the same. If Saturday and Tuesday have different conversion rates, this will horribly break a bandit.

This is not a theoretical problem. I have a client who wasted months on this.

I know how to fix this (a Bayesian method, BTW), but I haven't published it. As far as I know, there is very little published research into using Bayesian bandits in assorted real world cases like this.
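To illustrate where the stationarity assumption lives (this is a generic Thompson sampling sketch with invented day-of-week rates, not the poster's unpublished fix): a Beta-Bernoulli bandit weights every past observation equally forever, so if the better arm flips between weekdays and weekends, the model is fitting a truth that doesn't exist.

```python
import random

random.seed(1)

def day_rates(t):
    """Hypothetical non-stationary truth: arm A converts better on
    'weekends', arm B on 'weekdays'. A stationary bandit assumes
    these rates never change."""
    weekend = (t // 1000) % 7 >= 5   # crude day-of-week from visitor index
    return (0.12, 0.08) if weekend else (0.08, 0.12)

# Thompson sampling with Beta(1, 1) priors. The stationarity assumption
# is baked into the posterior update: old Saturdays and old Tuesdays are
# pooled into one Beta distribution per arm.
succ = [1, 1]
fail = [1, 1]
pulls = [0, 0]
for t in range(14_000):  # two "weeks" of traffic
    samples = [random.betavariate(succ[i], fail[i]) for i in (0, 1)]
    arm = 0 if samples[0] > samples[1] else 1
    p_a, p_b = day_rates(t)
    converted = random.random() < (p_a if arm == 0 else p_b)
    succ[arm] += converted
    fail[arm] += not converted
    pulls[arm] += 1

print(pulls)  # the traffic split the stationary model settled on
```

Because the pooled posteriors converge to the time-averaged rates, the bandit commits hard to whichever arm looks better on average and keeps sending it traffic even on the days the other arm wins.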



