Let's say you have a candidate AI and claim that it has passed the above benchmark. How do you prove that? Don't you have to say which tasks it was tested on?