It's not that they don't put in 'language like Bayesian'; it's that it's a different method. Yes, it is an improvement on the t-test straw man they mention, but it's less flexible and less powerful than Bayesian methods. Once you have a posterior, you can ask questions that their p-values/confidence intervals don't address: for example, the probability of an x% increase in conversion rate, or the risk associated with choosing an alternative. Not to mention multi-armed bandits, which not only are expected to arrive at an answer faster, but also maximize conversions along the way.
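For concreteness, here is a minimal sketch of those two posterior questions, assuming a Beta-Binomial model with uniform Beta(1, 1) priors; the counts and the 5% threshold are made up for illustration:

```python
# Posterior questions a p-value/CI doesn't answer, under a
# Beta-Binomial model with uniform priors and made-up counts.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed data: (conversions, visitors) per variation.
a_conv, a_n = 120, 1000
b_conv, b_n = 140, 1000

# Monte Carlo draws from each arm's Beta posterior.
p_a = rng.beta(1 + a_conv, 1 + (a_n - a_conv), size=100_000)
p_b = rng.beta(1 + b_conv, 1 + (b_n - b_conv), size=100_000)

lift = (p_b - p_a) / p_a

# "Probability of an x% increase": here, P(relative lift > 5%).
print("P(lift > 5%):", (lift > 0.05).mean())

# "Risk associated with choosing an alternative": expected loss in
# conversion rate if we ship B and it is actually worse.
print("expected loss of choosing B:", np.maximum(p_a - p_b, 0).mean())
```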
While I do agree that a sequential hypothesis test like the one we implemented in Stats Engine is different from a fully Bayesian method, I wouldn't necessarily call it less powerful. In fact, there are optimality results showing that a properly implemented sequential test minimizes the expected number of visitors needed to correctly reject a null hypothesis of zero difference. I should note that our particular implementation does use some Bayesian ideas as well.
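For anyone curious what a sequential test looks like, here is a minimal sketch of Wald's classic sequential probability ratio test on Bernoulli conversions. It is illustrative only (not the actual Stats Engine implementation, whose details differ), and the rates and error levels are placeholders:

```python
# Wald's SPRT for Bernoulli data: stop as soon as the accumulated
# log-likelihood ratio crosses a decision boundary, rather than
# waiting for a fixed sample size. All parameters are placeholders.
import math

def sprt(observations, p0=0.10, p1=0.12, alpha=0.05, beta=0.20):
    """Process one visitor at a time; stop early when possible."""
    upper = math.log((1 - beta) / alpha)   # cross above: reject H0
    lower = math.log(beta / (1 - alpha))   # cross below: accept H0
    llr = 0.0
    for n, converted in enumerate(observations, start=1):
        # Log-likelihood ratio contribution of one Bernoulli outcome.
        if converted:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "reject H0", n
        if llr <= lower:
            return "accept H0", n
    return "keep sampling", len(observations)
```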
I agree that a benefit of Bayesian analysis is flexibility. Different posterior results are possible with different priors. But in practice this can be a hindrance as well as a benefit: when answers depend on the choice of prior, misusing or misunderstanding the prior can lead to incorrect conclusions.
There is also a feature of Frequentist guarantees that is very attractive specifically for A/B testing. They make statements about the long-run average lift, which is a quantity that many businesses care about: what will my average lift be if I implement a variation after my A/B test?
That said, we have looked at, and continue to look at, Bayesian methods, because we don't feel that we have to be in either a Frequentist or a Bayesian framework, but rather should use the tools best suited to the sorts of statistical questions our customers encounter.
"They make statements about the long-run average lift, which is a quantity that many businesses care about: what will my average lift be if I implement a variation after my A/B test?"
Could you state clearly what this guarantee is? Unless I'm making a stupid mistake, such guarantees are impossible even in principle with frequentist statistics.
You do not want to use a classical bandit for A/B testing. The problem is that most bandit algorithms assume the conversion rate is constant over time - i.e., that Saturday and Tuesday are the same. If Saturday and Tuesday have different conversion rates, this will horribly break a bandit.
This is not a theoretical problem. I have a client who wasted months on this.
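To make the failure mode concrete, here is a toy simulation under made-up rates: each arm is better on different days of the week, but a standard Thompson-sampling bandit pools everything into one Beta posterior per arm, so it has no way to represent, let alone exploit, the weekly pattern:

```python
# Toy simulation: arm 0 converts better on weekdays, arm 1 on
# weekends (made-up rates). A stationary Thompson-sampling bandit
# keeps a single pooled Beta posterior per arm, converges to one
# arm, and then serves it on the wrong days too.
import numpy as np

rng = np.random.default_rng(0)

def true_rate(arm, day):
    weekend = day % 7 in (5, 6)
    if arm == 0:
        return 0.03 if weekend else 0.06   # better on weekdays
    return 0.06 if weekend else 0.03       # better on weekends

wins = np.ones(2)     # pooled Beta(1, 1) posteriors, no notion of "day"
losses = np.ones(2)
pulls = np.zeros((2, 2), dtype=int)        # [arm][weekday=0/weekend=1]

for t in range(70_000):                    # ~1000 visitors/day, 10 weeks
    day = t // 1000
    weekend = int(day % 7 in (5, 6))
    arm = int(np.argmax(rng.beta(wins, losses)))  # Thompson sample
    reward = rng.random() < true_rate(arm, day)
    wins[arm] += reward
    losses[arm] += 1 - reward
    pulls[arm][weekend] += 1

# The optimal policy switches arms by day type; the stationary bandit
# cannot, and the pull counts show it serving one arm on both.
print("pulls [arm, weekday/weekend]:\n", pulls)
```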
I know how to fix this (a Bayesian method, BTW), but I haven't published it. As far as I know, there is very little published research into using Bayesian bandits in messy real-world cases like this.
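To be clear, the sketch below is not the poster's unpublished fix; it is just one idea from the (admittedly thin) non-stationary bandit literature: discounted Thompson sampling, which decays old evidence so the posteriors can track a drifting conversion rate. The discount factor is a made-up value:

```python
# Discounted Thompson sampling (one known approach to non-stationary
# bandits, NOT the poster's method): decay all past evidence before
# each update so recent visitors dominate the Beta posteriors.
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.999                  # per-visitor discount, a made-up value
wins = np.ones(2)              # Beta(1, 1) priors for two arms
losses = np.ones(2)

for t in range(70_000):
    day = t // 1000
    weekend = day % 7 in (5, 6)
    # Same made-up weekday/weekend rates as the simulation above.
    rates = (0.03, 0.06) if weekend else (0.06, 0.03)
    arm = int(np.argmax(rng.beta(wins, losses)))
    reward = rng.random() < rates[arm]
    # Decay everything, then add the fresh observation; the effective
    # memory is ~1/(1 - gamma) visitors, so day-to-day shifts get
    # tracked instead of averaged away.
    wins *= gamma
    losses *= gamma
    wins[arm] += reward
    losses[arm] += 1 - reward
```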