I honestly think people blow the effect of "being in the training set" way out of proportion. The internet is riddled with examples of problem/solution posts that many models definitely trained on, yet they still get those problems wrong.
More important would be post-training, where the labs specifically train on the exact question. But that doesn't seem to be happening for most amateur benchmarks, at least. All the models that are good at the pelican-on-a-bicycle test have also been good at whatever other SVG prompt you throw at them.
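For what it's worth, the cheap way to check this yourself is to swap out the subject and the vehicle and see if quality holds up. A minimal sketch of that idea (generate_svg is a placeholder for whatever model API you're actually calling, and the animal/vehicle lists are just made-up variants):

```python
# Probe whether a model memorized "pelican on a bicycle" specifically or can
# draw arbitrary SVG scenes. generate_svg() is a stand-in for your client of choice.
import itertools

subjects = ["pelican", "otter", "flamingo", "capybara"]
vehicles = ["bicycle", "unicycle", "skateboard", "tandem kayak"]

def generate_svg(prompt: str) -> str:
    # Replace with a real call to whatever model you're testing.
    raise NotImplementedError("call your model of choice here")

for subject, vehicle in itertools.product(subjects, vehicles):
    prompt = f"Generate an SVG of a {subject} riding a {vehicle}."
    svg = generate_svg(prompt)
    # Eyeball the outputs: a model that only memorized the pelican prompt
    # should fall apart on the other combinations.
    print(prompt, len(svg))
```

In practice the models that nail the pelican also nail the capybara on a skateboard, which is what makes contamination an unconvincing explanation here.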