
Maybe a stupid question, but how do you verify the verification program? If an LLM is writing it too, isn't it turtles all the way down, especially with the propensity of AI to modify tests so they pass?




Yes, you’re right, it is turtles all the way down. But a huge part of the prover can be untrusted, or proven correct by a smaller part of the prover. That leaves a small part that cannot prove itself correct: that part is called the “kernel”.
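Roughly, the shape of the idea in toy Python (illustrative only, not any real prover's API): the only way to obtain a Theorem value is through the kernel's inference rules, so the clever-but-untrusted search code outside the kernel cannot fake one.

    # Toy LCF-style kernel sketch. Everything outside these few lines is untrusted:
    # it can search for proofs however it likes, but it can only produce a Theorem
    # by going through the rules below.

    _token = object()  # private capability held by the kernel

    class Theorem:
        def __init__(self, prop, token):
            if token is not _token:
                raise ValueError("theorems can only be constructed by the kernel")
            self.prop = prop  # e.g. ("->", "p", "q") or an atom like "p"

    def axiom_identity(p):
        # rule: |- p -> p
        return Theorem(("->", p, p), _token)

    def modus_ponens(imp, ant):
        # rule: from |- p -> q and |- p, conclude |- q
        op, p, q = imp.prop
        if op != "->" or ant.prop != p:
            raise ValueError("modus ponens does not apply")
        return Theorem(q, _token)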

Kernels are usually verified by humans. A good design can make them very small: 500 to 5000 lines of code. Systems designers brag about how small their kernels are!

The kernel could be proved correct by another system, which introduces another turtle below. Or the kernel can be reflected upwards and proved by the system itself. That is virtually putting the bottom turtle on top of a turtle higher in the stack. It will find some problems, but it still leaves open the possibility that the kernel has a flaw that accepts bad proofs, including a bad proof of the kernel itself.


> how do you verify the verification program?

The program used to check the validity of a proof is called a kernel. It just needs to check one step at a time, and the possible steps it can take are just basic logic rules (a minimal sketch of such a checker follows the list below). People can gain more confidence in its validity by:

- Reading it very carefully (doable since it's very small)

- Having multiple independent implementations and comparing the results

- Proving it in some meta-theory. Here the result is not correctness per se, but relative consistency. (Although it can be argued all other points are about relative consistency as well.)
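To make "one step at a time against basic rules" concrete, here is a minimal sketch in Python (a toy Hilbert-style system with just hypotheses and modus ponens, not any real kernel):

    # A "proof" is a list of lines; each line is justified either as a hypothesis
    # or by modus ponens from two earlier lines. Checking is a single linear scan.

    def check_proof(hypotheses, lines):
        proved = []
        for prop, rule, refs in lines:
            if rule == "hyp":
                ok = prop in hypotheses
            elif rule == "mp":
                i, j = refs                     # indices of earlier lines
                imp, ant = proved[i], proved[j]
                ok = imp == ("->", ant, prop)   # imp must be |- ant -> prop
            else:
                ok = False
            if not ok:
                return False                    # one bad step rejects the proof
            proved.append(prop)
        return True

    # Derive q from hypotheses p and p -> q:
    hyps = {"p", ("->", "p", "q")}
    proof = [
        ("p",              "hyp", ()),
        (("->", "p", "q"), "hyp", ()),
        ("q",              "mp",  (1, 0)),
    ]
    print(check_proof(hyps, proof))  # True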


The proof is verified mechanically. It's very easy to verify that a proof is correct; what's hard is coming up with the proof (it's an NP problem). There can still be gotchas, especially if the statement being proved is complex, but it does help a lot in keeping bugs away.
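To make the asymmetry concrete, a toy Python example with SAT: checking a candidate assignment is a single pass over the clauses, while finding one may mean searching all 2^n assignments in the worst case.

    # Each clause is a list of (variable, wanted_value) pairs; a clause is
    # satisfied if at least one of its literals matches the assignment.

    from itertools import product

    def check(clauses, assignment):
        # linear in the size of the formula
        return all(any(assignment[v] == want for v, want in clause)
                   for clause in clauses)

    def find(clauses, variables):
        # worst case: tries every one of the 2^n assignments
        for values in product([False, True], repeat=len(variables)):
            assignment = dict(zip(variables, values))
            if check(clauses, assignment):
                return assignment
        return None

    clauses = [[("x", True), ("y", False)], [("y", True)]]
    print(find(clauses, ["x", "y"]))  # {'x': True, 'y': True}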

How often can the hardness be exchanged for tediousness, though? Can at least some of those problems be solved by letting the AI try until it succeeds?

To simplify for a moment, consider asking an LLM to come up with tests for a function. The tests pass. But did it come up with exhaustive tests? Did it understand the full intent of the function? How would it know? How would the operator know? (Even if it's wrangling simpler iterative prop/fuzz/etc testing systems underneath...)
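A toy illustration of that gap (hypothetical function and tests, in Python): the hand-picked tests all pass, while a brute-force sweep against the intended property finds the inputs they missed.

    def is_leap(year):
        # intended: the Gregorian leap-year rule
        return year % 4 == 0 and year % 100 != 0   # bug: forgets the 400-year exception

    # The kind of tests that might get written. All of them pass.
    assert is_leap(2024) is True
    assert is_leap(2023) is False
    assert is_leap(1900) is False

    # Sweeping a small domain against the intended rule exposes the gap.
    def spec(year):
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

    print([y for y in range(1600, 2401) if is_leap(y) != spec(y)])  # [1600, 2000, 2400]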

Verification is substantially more challenging.

Currently, even for an expert in the domains of the software to be verified and the process of verification, defining a specification (even a partial one) is both difficult and tedious. Try reading/comparing the specifications of e.g. a pure crypto function, then a storage or clustering algorithm, then seL4.
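For a feel of what even a small specification involves, here is an illustrative predicate in Python (not any prover's syntax): a full functional spec of sorting already needs two separate conditions, and dropping either one silently admits wrong implementations.

    from collections import Counter

    def satisfies_sort_spec(inp, out):
        # "out is in nondecreasing order" alone would accept `return []`;
        # "out is a permutation of inp" alone would accept the identity function.
        ordered = all(out[i] <= out[i + 1] for i in range(len(out) - 1))
        permutation = Counter(out) == Counter(inp)
        return ordered and permutation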

(It's possible that brute force specification generation, iteration, and simplification by an LLM might help. It's possible an LLM could help eat away complexity from the other direction, unifying methods and languages, optimising provers, etc.)


There is tons of work on this question. Super recent: https://link.springer.com/article/10.1007/s10817-025-09743-8


