I am surprised by all the negative commentary here. On the whole, companies like Optimizely, RJMetrics, Custora, and others are doing more to push statistical analysis to the mass market than anyone else. These tools are not designed for statisticians or ML practitioners so it makes sense they do not put language like Bayesian, etc. front and center. IMO, the more people using data to make decisions, the better.
It's not that they don't put in 'language like Bayesian'; it's a different method. Yes, it is an improvement on the t-test straw-man they mention, but it's less flexible and powerful than Bayesian methods. Once you have a posterior, you can ask different questions that their p-values/confidence intervals don't address. For example, the probability of an x% increase in conversion rate, or the risk associated with choosing an alternative. Not to mention multi-armed bandits, which are not only expected to arrive at an answer faster but also maximize conversions along the way.
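To make the "questions you can ask of a posterior" concrete, here is a minimal sketch. It assumes a Beta-Binomial model with a uniform Beta(1, 1) prior and uses made-up counts (100/1000 conversions for A, 130/1000 for B); the function names are hypothetical and this is not what any particular vendor implements.

```python
import random

def posterior_samples(conversions, visitors, n_samples, rng, prior_a=1.0, prior_b=1.0):
    """Draw samples from the Beta posterior of a conversion rate."""
    a = prior_a + conversions
    b = prior_b + visitors - conversions
    return [rng.betavariate(a, b) for _ in range(n_samples)]

def prob_relative_lift_at_least(samples_a, samples_b, lift):
    """Estimate P(rate_B >= rate_A * (1 + lift)) from paired posterior draws."""
    hits = sum(1 for pa, pb in zip(samples_a, samples_b) if pb >= pa * (1.0 + lift))
    return hits / len(samples_a)

def expected_loss(chosen, other):
    """Average conversion rate given up if 'chosen' is actually the worse arm."""
    return sum(max(o - c, 0.0) for c, o in zip(chosen, other)) / len(chosen)

# Hypothetical data, fixed seed so the run is repeatable.
rng = random.Random(0)
samples_a = posterior_samples(100, 1000, 20000, rng)
samples_b = posterior_samples(130, 1000, 20000, rng)

p_b_better = prob_relative_lift_at_least(samples_a, samples_b, 0.0)
p_lift_10 = prob_relative_lift_at_least(samples_a, samples_b, 0.10)
loss_if_choose_b = expected_loss(samples_b, samples_a)
loss_if_choose_a = expected_loss(samples_a, samples_b)

print("P(B beats A):", p_b_better)
print("P(B is at least 10% better):", p_lift_10)
print("Expected loss choosing B:", loss_if_choose_b)
print("Expected loss choosing A:", loss_if_choose_a)
```

The same set of draws answers both questions: the probability of at least an x% relative lift, and the expected cost of picking the wrong arm.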
While I do agree that a sequential hypothesis test like the one we implemented in Stats Engine is different than a completely Bayesian method, I wouldn’t necessarily call it less powerful. In fact, numerous optimality properties exist showing that a properly implemented sequential test minimizes the expected number of visitors needed to correctly reject a null hypothesis of zero difference. I should note that our particular implementation does use some Bayesian ideas as well.
I agree that a benefit of Bayesian analysis is flexibility. Different posterior results are possible with different priors. But in practice this can be a hindrance as well as a benefit. When answers depend on a choice of prior, misusing or misunderstanding the prior can lead to incorrect conclusions.
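As a small illustration of how much the prior can matter on limited data, here is a sketch that assumes a Beta prior on a conversion rate and made-up counts (3 conversions in 20 visitors). Both priors and the function name are hypothetical choices for the example.

```python
from fractions import Fraction

def posterior_mean(conversions, visitors, prior_a, prior_b):
    """Posterior mean of a conversion rate under a Beta(prior_a, prior_b) prior."""
    return Fraction(prior_a + conversions, prior_a + prior_b + visitors)

# Same hypothetical data, two different priors.
uniform = posterior_mean(3, 20, 1, 1)      # Beta(1, 1): no strong opinion
skeptical = posterior_mean(3, 20, 2, 98)   # Beta(2, 98): strong belief in about 2%

print("Uniform prior:  ", float(uniform))
print("Skeptical prior:", float(skeptical))
```

With only 20 visitors the two priors give very different estimates; as the visitor count grows, the data dominates and the gap shrinks.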
There is also a very attractive feature of Frequentist guarantees specifically for A/B testing. They make statements on the long-run average lift, which is a quantity that many businesses care about: what will my average lift be if I implement a variation after my A/B test?
That said, we have looked, and continue to look, at Bayesian methods because we don’t feel that we have to be in either a Frequentist or Bayesian framework, but rather use the tools that are best suited to answer the sorts of statistical questions our customers encounter.
They make statements on the long-run average lift, which is a quantity that many businesses care about: what will my average lift be if I implement a variation after my A/B test?
Could you state clearly what this guarantee is? Unless I'm making a stupid mistake, such guarantees are impossible even in principle with frequentist statistics.
You do not want to use a classical bandit for A/B testing. The problem is that most bandit algorithms assume the conversion rate is constant, i.e., that Saturday and Tuesday are the same. If Saturday and Tuesday have different conversion rates, this will horribly break a bandit.
This is not a theoretical problem. I have a client who wasted months on this.
I know how to fix this (a Bayesian method, BTW), but I haven't published it. As far as I know, there is very little published research into using Bayesian bandits in assorted real world cases like this.
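One way the day-of-week problem above can bite, sketched with made-up numbers: if an adaptive algorithm sends different traffic shares to each arm on different days, pooling the data can rank the arms backwards even though one arm is better on every day. The rates and traffic split below are assumptions chosen for illustration; this is not the unpublished fix mentioned above, only a picture of the failure.

```python
# Hypothetical true conversion rates: B beats A on both days.
RATES = {
    "sat": {"A": 0.20, "B": 0.22},
    "tue": {"A": 0.10, "B": 0.11},
}

# Hypothetical allocation: the algorithm happened to favor A on Saturday
# (a high-converting day) and B on Tuesday (a low-converting day).
TRAFFIC = {
    "sat": {"A": 900, "B": 100},
    "tue": {"A": 100, "B": 900},
}

def pooled_rate(arm):
    """Expected conversion rate for an arm when all days are lumped together."""
    visitors = sum(TRAFFIC[day][arm] for day in TRAFFIC)
    conversions = sum(TRAFFIC[day][arm] * RATES[day][arm] for day in TRAFFIC)
    return conversions / visitors

print("Pooled A:", pooled_rate("A"))
print("Pooled B:", pooled_rate("B"))
```

The pooled estimate for A comes out higher than for B, so an estimator that treats the conversion rate as constant would keep favoring the arm that is worse on both days.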
I very much like that people are starting to care about data-driven decisions... However, I find it quite aggravating that these tools don't use the best available methods. Optimizely is celebrating that they built a strange, proprietary solution to a very well studied problem.
The situation to me feels a lot like acupuncture, homeopathic medicine, etc. I agree that these doctors and patients have their hearts in the right place... I just wish they'd channel that energy in a more positive direction. It's frustrating.
While our solution is different than the current industry standard in A/B testing platforms, all the techniques we are using have been around in the statistics literature for decades, and are tried and true. The particular sequential test of power one that we use has been around since the 1970s and goes back to the time of Herbert Robbins. And FDR control has been well documented in the past 25 years, most notably by Yoav Benjamini and Yosef Hochberg. We really are standing on the shoulders of giants.
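For readers unfamiliar with FDR control, here is a minimal sketch of the textbook Benjamini-Hochberg step-up procedure. It is the standard published algorithm, not necessarily how Stats Engine applies it; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max (the step-up part).
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected

# Hypothetical p-values from ten simultaneous comparisons.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(p_values, q=0.05)
print(decisions)
```

Note that five of these p-values are below 0.05, but only the two smallest clear their rank-scaled thresholds, which is the point of controlling the false discovery rate across many comparisons.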
I think our biggest contribution is presenting a principled, powerful mathematical solution in a way that is accessible to practitioners without a formal statistical background. Even if you do have this knowledge, it’s a chance to use these methods without having to reinvent the wheel every time.
There are various methods which could have been used as solutions, and we looked at many different ones to determine a fit to the user model and experience Optimizely is presenting. We are currently doing an AMA on our community portal and I would be happy to discuss potential solutions or any other comments with you there, https://community.optimizely.com/t5/Product-What-s-New/Ask-m...
All Optimizely, VWO, and other such services provide is a WYSIWYG editor and a redirect script. Some pretty (but meaningless) graphs and lots of bullshitting.
More importantly, I'm sure they have people who know that their "A/B tests" most definitely do not work as advertised, so they are misleading their customers on purpose.