One of the key aspects of ARC is that its testing dataset is secret.
The usefulness of the ARC challenge is to figure out how much of the "intelligence" that current models trained on the entire internet is an emergent property and true generalization or how much it is just due to the fact that the training set truly contains an unfathomable amount of examples and thus the models may surprise us with what appears to be genuine insight but it's actually just lookup + interpolation.
The usefulness of the ARC challenge is to figure out how much of the "intelligence" that current models trained on the entire internet is an emergent property and true generalization or how much it is just due to the fact that the training set truly contains an unfathomable amount of examples and thus the models may surprise us with what appears to be genuine insight but it's actually just lookup + interpolation.