Give everything a default traffic weight of, say, 10. Make it alterable as above. Divide traffic among all versions in proportion to their weights.
The reason to do it this way is that when you go from 5 versions to 4, then 4 to 3, then 3 to 2, you just eliminate versions and never have to recalculate percentages to rebalance. Trust me, calculating percentages gets very old, very fast.
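Concretely, the assignment can be as simple as hashing the user into a range whose size is the sum of the weights. A minimal sketch in Ruby, assuming a versions hash of name => weight (version_for and the hash key are made-up names, not any particular library's API):

    require 'zlib'

    # Each version carries a weight (default 10). A user's version is chosen in
    # proportion to the weights, so dropping a version rebalances automatically.
    VERSIONS = { 'a' => 10, 'b' => 10, 'c' => 10 }

    def version_for(user_id, versions = VERSIONS)
      total = versions.values.sum
      # Deterministic hash so a given user always lands in the same bucket.
      point = Zlib.crc32("experiment-1:#{user_id}") % total
      versions.each do |name, weight|
        return name if point < weight
        point -= weight
      end
    end

Changing a weight or deleting a version just changes the total; nothing else needs recalculating.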
While you're at it, add the ability to mark versions as not to be reported on. That lets people run three versions: test, control, and unreported control. You start with most of the traffic in unreported control and ramp up the test by dropping the weight of the unreported control.
+1 on the unreported control. It's a common use case to start an A/B test at a small test percentage, then ramp up over time. Without an unreported control, you can't do that and have valid samples for each group.
(E.g. your sample mix will be invalid if you start at 10 test / 90 control and then ramp up to 30/70, because the users added to test mid-experiment come out of the control pool and the two groups stop being comparable populations. But if you start at 10 test / 10 control / 80 unreported and then ramp to 30/30/40, test and control grow in lockstep and your samples stay valid.)
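In weight terms, the ramp might look like the following (names and numbers purely illustrative); test and control always carry equal weight, so they grow in lockstep and only the unreported pool shrinks:

    # Illustrative ramp-up using weights rather than percentages.
    week_1 = { 'test' => 10, 'control' => 10, 'unreported_control' => 80 }
    week_2 = { 'test' => 20, 'control' => 20, 'unreported_control' => 60 }
    week_3 = { 'test' => 30, 'control' => 30, 'unreported_control' => 40 }
    # Only 'test' vs 'control' are reported on; 'unreported_control' sees the
    # control experience but is excluded from the analysis.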
Let's hypothetically say you have a new feature Foo. Foo is under active development and works on the test and staging environments, but you're concerned it might not be ready for prime time. You first release Foo to your staff's accounts on the production servers. After they've had a while to break Foo, you roll it out to 10% of the user base, selected randomly, while watching your automated instrumentation to see how it reacts (does it blow anything up? do users care about it? does anyone actually use the thing?). After you've proven Foo out, you release it to the entire user base. Should you at some point have a problem with Foo, you want the ability to yank it back from all users while you get back to tinkering on it privately.
Feature flags are a way to do that. By happy coincidence, they share semantics almost verbatim with A/B testing. At a high level of abstraction, the most interesting API is basically User#should_see?(feature_name_goes_here). They typically have a bit more going on in the API than that -- for example, the ability to assign users to groups (like, say, "our employees", "friends & family", "our relentlessly dedicated True Fans (TM) who are willing to suffer the odd bug", "10% of people who signed up last Monday", etc.) and to mark groups as able to see a feature. There is often a UI for managing that.
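As a toy sketch of that shape (written as a standalone store rather than a method on User; the class and method names here are invented, not any particular library's API):

    require 'zlib'

    # A feature is visible to a user if the user belongs to an activated group
    # or falls under the feature's rollout percentage.
    class FeatureFlags
      def initialize
        @groups      = Hash.new { |h, k| h[k] = [] }  # feature => activated group names
        @membership  = Hash.new { |h, k| h[k] = [] }  # group name => user ids
        @percentages = Hash.new(0)                    # feature => rollout percentage
      end

      def add_to_group(group, user_id);      @membership[group] << user_id; end
      def activate_group(feature, group);    @groups[feature] << group;     end
      def activate_percentage(feature, pct); @percentages[feature] = pct;   end

      def should_see?(user_id, feature)
        in_group = @groups[feature].any? { |g| @membership[g].include?(user_id) }
        # Stable hash so a user's answer doesn't flip between requests.
        in_group || Zlib.crc32("#{feature}:#{user_id}") % 100 < @percentages[feature]
      end
    end

    flags = FeatureFlags.new
    flags.add_to_group('staff', 42)
    flags.activate_group(:foo, 'staff')     # staff accounts see Foo first
    flags.activate_percentage(:foo, 10)     # then 10% of everyone
    flags.should_see?(42, :foo)             # => true

Yanking Foo back from everyone is then just a matter of zeroing its percentage and clearing its groups.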
I'm broadly in favor of any technology that increases the number of firms able to make A/B testing a routine practice at their organizations. Tooling like this removes one of the barriers that keeps many shops from getting there. It being available as OSS is therefore unmitigated good news, though I probably won't use it myself.