
I wonder if the authors can explain the apparent inconsistency between what we now know about R1 and their statement “They don’t engage in logical reasoning” from the first lesson. My simple-minded view of logical reasoning by LLMs is this: a hard question (say, a math puzzle) has an answer that is hard to produce but easy to verify, yet within the realm of knowledge of humans or of the LLM itself, so the “thought” stream lets the LLM increase its confidence, by a self-discovered process that resembles human reasoning, before it starts writing the answer stream. Much of the thought process these LLMs use looks like conventional reasoning and logic, or more generally like higher-level algorithms for gaining confidence in an answer; other parts are not possible for humans to understand (yet?) despite the best efforts by DeepSeek. When combined with tools for the boring parts, these “reasoning” approaches can start to resemble human research processes, as in Deep Research by OpenAI.


I think part of this is that you can't trust the "thinking" output of an LLM to accurately convey what is going on inside the model. The "thought" stream is just more statistically derived tokens based on the corpus. If you take the question "Is A a member of the set {A, B}?", the LLM doesn't internally build a discrete representation of "A" as an object belonging to a two-object set and then arrive at a distinct and absolute answer. The generated token "yes" is just the statistically most likely next token that follows those tokens given its corpus. And logical reasoning is definitionally not a process of "gaining confidence", which is all an LLM can really do so far.


As an example, I have asked tools like DeepSeek to solve fairly simple Sudoku puzzles, and while they output a bunch of stuff that looks like logical reasoning, no system has yet produced a correct answer.

When solving combinatorics puzzles, DeepSeek will again produce stuff that looks convincing, but it often makes incorrect logical steps and ends up with wrong answers.


Then one has to ask: is it producing a facsimile of reasoning with no logic behind it, or is it just reasoning poorly?


Teaching an LLM to solve a full-sized Sudoku is not a goal right now. As an RLHF’er, I’d estimate it would take 10-20 hours to guide a model to the right answer for a single board.

Then you’d need thousands of these for the model (or the next model) to ingest. And each RLHF’er’s work needs checking, which at least doubles the hours per task.

It can’t do it because RLHF’ers haven’t taught models on large enough boards en masse yet.

And there are thousands of pen-and-paper games, each one needing thousands of RLHF’ers to train models on. Each game would start at the smallest non-trivial board size and take a year for a modest jump in board size. Doing this is not in any AI company’s budget.


If it were actually reasoning generally, though, it wouldn't need to be trained on each game. It could be told the rules and figure things out from there.


Even worse, the LLM is supposed to already "know" Sudoku rules. Either that, or it doesn't "know" anything that was scraped from the web...


Here is o3-mini on a simple Sudoku. In general the puzzle can be hard to explore combinatorially even with modern SAT solvers, so I picked one marked as “easy”. It looks to me like it solved it, but I didn't confirm beyond a quick visual inspection.

https://chatgpt.com/share/67aa1bcc-eb44-8007-807f-0a49900ad6...


And thus we have the AI problem in a nutshell. You think it can reason because it can describe the process in well-written language. Anyone who can state the reasoning below clearly "understands" the problem:

> For example, in the top‐left 3×3 block (rows 1–3, columns 1–3) the givens are 7, 5, 9, 3, and 4 so the missing digits {1,2,6,8} must appear in the three blank cells. (Later, other intersections force, say, one cell to be 1 or 6, etc.)

It's good logic. Clearly it "knows" if it can break the problem down like this.

Of course, if we stretch ourselves slightly and actually check beyond a quick visual inspection, we quickly see that it put a second 4 in that first box despite "knowing" it shouldn't. In fact, several of the boxes have duplicate numbers, despite the clear reasoning above.

Does the reasoning just not get used in the solving part? Or maybe a machine built to regurgitate plausible text can also regurgitate plausible reasoning?
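
Checking this mechanically takes only a few lines of Python. A minimal sketch, assuming the model's answer has been transcribed into a 9x9 list of ints (the helper name and the transcription step are assumptions, not anything from the shared chat):

    def violations(grid):
        """Return (unit_name, duplicated_digits) for every row, column and
        3x3 box of a 9x9 grid that contains a repeated digit (0 = blank)."""
        units = []
        for i in range(9):
            units.append((f"row {i + 1}", [grid[i][j] for j in range(9)]))
            units.append((f"column {i + 1}", [grid[j][i] for j in range(9)]))
        for br in range(3):
            for bc in range(3):
                cells = [grid[3 * br + r][3 * bc + c]
                         for r in range(3) for c in range(3)]
                units.append((f"box ({br + 1},{bc + 1})", cells))
        bad = []
        for name, cells in units:
            dups = sorted({v for v in cells if v and cells.count(v) > 1})
            if dups:
                bad.append((name, dups))
        return bad

Running violations(answer) on the transcribed grid would immediately list every unit with duplicates, e.g. the second 4 in the top-left box.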


Thanks for spotting this. The solution is indeed wrong, and I agree that the machine can regurgitate plausible reasoning in principle. If it ran in a loop, I would bet it could probably figure this particular problem out eventually, but I'm not sure that matters much in the end. The only plausible way to do some of these Sudoku puzzles is a SAT solver, and I'm sure that, given the right environment, an LLM could just code and execute one and get the answer. Does that mean it can't "reason" because it couldn't solve this Sudoku puzzle, or know that it made a mistake? I'm not sure I'd go that far, but I agree that my example didn't match my claim.

The model didn't do a careful job and didn't quadruple-check its work as I would have expected from an advanced AI, but remember that this is o3-mini, not something that is supposed to be full-blown AI yet. If you had asked GPT-3.5 for something similar, the answer would have been amusingly simplistic; now it is at least starting to get close.

I now wonder if I made a typo when I copied this puzzle from an image to my phone app, rendering it unsolvable. The model should still have spotted such an error anyway, but of course it is not tuned to perfection.
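
For what it's worth, "code and execute one" doesn't even require a full SAT encoding for a single 9x9 board; a plain backtracking search is enough. A minimal sketch (my own, not anything the model produced; it assumes the givens themselves don't already clash, which the duplicate check above would catch), and it also reports when a transcribed puzzle has no solution:

    def solve(grid):
        """Backtracking Sudoku solver. grid is a 9x9 list of ints, 0 = blank.
        Fills the grid in place and returns True, or returns False if the
        blanks cannot be filled consistently (e.g. a transcription typo)."""
        def candidates(r, c):
            used = set(grid[r]) | {grid[i][c] for i in range(9)}
            br, bc = 3 * (r // 3), 3 * (c // 3)
            used |= {grid[br + i][bc + j] for i in range(3) for j in range(3)}
            return [v for v in range(1, 10) if v not in used]

        empties = [(r, c) for r in range(9) for c in range(9) if grid[r][c] == 0]
        if not empties:
            return True
        # Try the most constrained cell first; this keeps easy puzzles fast
        # without any SAT machinery.
        r, c = min(empties, key=lambda rc: len(candidates(*rc)))
        for v in candidates(r, c):
            grid[r][c] = v
            if solve(grid):
                return True
            grid[r][c] = 0
        return False

If solve(...) returns False on the transcribed givens, that would confirm the typo theory.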


Yeah, I think this was the wrong puzzle to try, according to:

https://sudoku.com/sudoku-solver

A bummer.



