Many years ago I was working for a large gaming company, and I was the one who developed a very cheap and efficient way to split any cluster of users into A/B groups. The company was extremely happy with how well it worked. However, a year later I did some investigating on my own to see how the business development people were using it and... yeah, pretty much what you said. They were literally brute-forcing different configurations until they (more or less) got the desired results.
Microsoft has a seed finder specifically aimed at avoiding a priori bias in experiment groups, but IMO the main effect is pushing whales (which are possibly bots) into different groups until the bias evens out.
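For anyone who hasn't seen this kind of thing: here's a minimal sketch of what I mean by a seed search (my guess at the general approach, not Microsoft's actual tooling). You hash each user id with a candidate seed to assign buckets, then keep trying seeds until some pre-experiment metric, here hypothetical "whale" spend, looks balanced between the two groups.

```python
import hashlib
import random
import statistics


def bucket(user_id: str, seed: int) -> str:
    """Deterministically assign a user to 'A' or 'B' for a given seed."""
    h = hashlib.sha256(f"{seed}:{user_id}".encode()).digest()
    return "A" if h[0] % 2 == 0 else "B"


def spend_gap(users: dict[str, float], seed: int) -> float:
    """Relative difference in mean spend between the two buckets."""
    a = [s for u, s in users.items() if bucket(u, seed) == "A"]
    b = [s for u, s in users.items() if bucket(u, seed) == "B"]
    ma, mb = statistics.mean(a), statistics.mean(b)
    return abs(ma - mb) / max(ma, mb)


def find_seed(users: dict[str, float], max_gap: float = 0.02, tries: int = 1000) -> int:
    """Try seeds until the pre-experiment spend gap is acceptably small."""
    for seed in range(tries):
        if spend_gap(users, seed) <= max_gap:
            return seed
    raise RuntimeError("no acceptable seed found")


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic population: mostly small spenders plus a handful of whales,
    # which is exactly what makes naive random splits look biased.
    users = {f"user{i}": rng.expovariate(1.0) for i in range(10_000)}
    for i in range(20):
        users[f"whale{i}"] = rng.uniform(500, 5_000)

    seed = find_seed(users)
    print(f"seed {seed}: pre-experiment spend gap {spend_gap(users, seed):.3%}")
```

The point is that almost all of the group-level imbalance comes from how the few whales happen to fall, so "searching for a good seed" is mostly just reshuffling the whales until they cancel out.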
I find it hard to imagine obtaining much bias from a random hash seed in a large group of small-scale users, but I haven't looked at the problem closely.
We definitely saw bias, and it made experiments hard to launch until the system started pre-identifying unbiased population samples ahead of time, so the experiment could just pull pre-vetted users.