
And what prompt did you give them to generate the program? Did you explicitly tell them that they need to fill the cornered cells? If so, that is not what the benchmark is about. The benchmark asks the LLM to figure out the pattern on its own.

I entered the task into Claude and asked it to write Python code, and it failed to recognize the pattern:

To solve this puzzle, we need to implement a program that follows the pattern observed in the given examples. It appears that the rule is to replace 'O' with 'X' when it's adjacent (horizontally, vertically, or diagonally) to exactly two '@' symbols. Let's write a Python program to solve this:
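For reference, the rule stated in that reply can be sketched roughly as below. This is not Claude's actual program, which is not reproduced in the thread. The function name `apply_rule` and the sample grid are hypothetical, and the rule itself is the guess that the comment above says misses the real pattern.

```python
def apply_rule(grid):
    """Replace 'O' with 'X' where the cell touches exactly two '@' cells.

    Adjacency is 8-way (horizontal, vertical, diagonal), as in the quoted reply.
    `grid` is a list of equal-length strings (an assumed representation).
    """
    rows = len(grid)
    out = []
    for r in range(rows):
        new_row = []
        for c in range(len(grid[r])):
            cell = grid[r][c]
            if cell == "O":
                neighbours = 0
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        if dr == 0 and dc == 0:
                            continue  # skip the cell itself
                        rr, cc = r + dr, c + dc
                        in_bounds = 0 <= rr < rows and 0 <= cc < len(grid[rr])
                        if in_bounds and grid[rr][cc] == "@":
                            neighbours += 1
                if neighbours == 2:
                    cell = "X"
            new_row.append(cell)
        out.append("".join(new_row))
    return out


# Hypothetical 3x3 grid, not one of the benchmark examples.
print(apply_rule(["@@O", "OOO", "OOO"]))
```

A neighbour-count rule like this never looks at where a cell sits relative to a corner, which is presumably why it fails on a task about cornered cells.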



arc reasoning challenge. I'm going to give you 2 example input/output pairs and then a third bare input. Please produce the correct third output.

It used its CoT (chain of thought) to understand the cornering, and then I got it to write a program.

But as I try again, it's not reliable.


> But as I try again, it's not reliable.

This is why I will never try anything like this on a remote server I don't control. All my toy experiments are with local LLMs, so I can make sure they are the same ones day after day.
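One way to check that a local model really is the same one day after day is to pin a checksum of its weights file. A minimal sketch, assuming the model lives in a single file on disk; `file_sha256`, `is_same_model`, and the path and digest passed to them are hypothetical names, not part of any particular LLM tool:

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_same_model(path, pinned_digest):
    """Compare the file's current digest with one recorded on an earlier day."""
    return file_sha256(path) == pinned_digest.lower()
```

Identical weights alone probably don't guarantee identical outputs. The sampling settings (seed, temperature) would need to be fixed as well.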



