
It will be interesting to see if there is some way to train a decompilation model based on who we know developed the application, using their previous code as training data. For example, Super Mario 64 and Zelda 64 were fully decompiled, and a handful of other N64 games are in progress. I wonder if we could map which developers worked on these two games (maybe even guess who wrote which module) and then use that to more easily decompile any other game those developers worked on.

If this gets really good, maybe we can dream of having a fully de-obfuscated and open source life. All the layers of binary blobs in a PC can finally be decoded. All the drivers can be open. Why not do the OS as well! We don't have to settle for Linux; we can bring back Windows XP, backport modern security and app compatibility into it, and Microsoft can keep their Windows 11 junk... at least one can dream! :D



If this gets really good, maybe we can dream of having a fully de-obfuscated and open source life. All the layers of binary blobs in a PC can finally be decoded. All the drivers can be open. Why not do the OS as well!

Decompilers already exist and are really good. If an LLM can do the same as these existing decompilers, you can bet the lawyers will consider it an equivalent process. The main problem is legal/political, not technical.


I don't know if I'd call the output of modern decompilers "really good", not for native code anyway. They're just a little better than raw disassembly. Even state-of-the-art decompilers struggle to reconstruct control flow, distinguish data from code, and identify the presence of a variable, let alone its type, and they fundamentally lack context. If an LLM could be used even just to reliably reconstruct symbol information, it would be game-changing.


I wrote my bachelor thesis on something tangential — basically, some researchers found that it was possible in some very specific circumstances to train a classifier to do author attribution (i.e. figure out who wrote the program) based just on the compiled binaries they produced. I don’t think the technique has been used for anything actually useful, but it’s cool to see that individual coding style survives the compilation process, so much so that you can tell one person’s compiled programs apart from another’s.
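A toy sketch of the general idea, assuming you already have the raw code bytes of binaries whose authors are known: profile each author by overlapping byte trigram counts and assign an unknown binary to the author with the most similar profile (cosine similarity). This is not the method from the research mentioned above, which used much richer features; the function names, the trigram size, and the sample bytes are all made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, List

def ngram_profile(data: bytes, n: int = 3) -> Counter:
    """Count overlapping byte n-grams in a chunk of machine code."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def attribute(unknown: bytes, samples: Dict[str, List[bytes]], n: int = 3) -> str:
    """Return the author whose pooled n-gram profile best matches `unknown`."""
    if not samples:
        raise ValueError("need at least one author with sample binaries")
    target = ngram_profile(unknown, n)
    scores = {}
    for author, binaries in samples.items():
        pooled = Counter()  # merge all of this author's samples into one profile
        for blob in binaries:
            pooled.update(ngram_profile(blob, n))
        scores[author] = cosine(target, pooled)
    return max(scores, key=scores.get)

# Tiny made-up byte sequences; real inputs would be whole code sections.
known = {
    "alice": [b"\x55\x48\x89\xe5\xc3"],
    "bob": [b"\x31\xc0\xc3\x31\xc0\xc3"],
}
print(attribute(b"\x55\x48\x89\xe5", known))  # alice
```

Raw byte n-grams are about the crudest stylistic signal there is; anything serious would work on disassembled instructions, control-flow structure, and many more samples per author.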


Do you mean the whole binary or just the text segment/instructions?

Because I think this gets a lot easier if you can look at the symbol table, strings, and code-signing certificate.
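For the strings part, here is a minimal sketch in the spirit of the Unix `strings` utility: scan the whole file for runs of printable ASCII at least four bytes long. Symbol tables and signing certificates need format-specific parsing (ELF, PE, Mach-O) and aren't covered; the function name, the four-byte threshold, and the sample blob are assumptions for illustration.

```python
import re
from typing import List

def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII (0x20-0x7E) at least `min_len` bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Made-up blob standing in for the contents of a binary.
blob = b"\x00\x01libfoo.so\x00ab\x00(c) 1996 Example Corp\xff"
print(extract_strings(blob))  # ['libfoo.so', '(c) 1996 Example Corp']
```

Library names, copyright notices, and error messages recovered this way are often the quickest clues to who built a binary and with what toolchain.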


I doubt the code would be identifiable. It wouldn’t be the actual code written, but it would be very similar. But I assume many elements of code style would be lost, and any semblance of code style would be more or less hallucinated.


If it can generate tests from the decompiled code, we could reimplement it in our own code style. It might be cool to have a bunch of LLMs working together with feedback loops.
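One way to close that loop is differential testing: run the decompiled function and the rewritten one on the same random inputs and collect every disagreement, which could then be fed back to the model. A minimal sketch, assuming both versions can be called from Python on a single 32-bit integer; the two example functions are hypothetical stand-ins, and the LLM side of the loop is left out.

```python
import random
from typing import Callable, List, Tuple

def differential_test(reference: Callable[[int], int],
                      candidate: Callable[[int], int],
                      trials: int = 1000,
                      seed: int = 0) -> List[Tuple[int, int, int]]:
    """Return (input, expected, actual) for every input where the versions disagree."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    mismatches = []
    for _ in range(trials):
        x = rng.randint(-2**31, 2**31 - 1)  # inclusive signed 32-bit range
        expected, actual = reference(x), candidate(x)
        if expected != actual:
            mismatches.append((x, expected, actual))
    return mismatches

# Hypothetical decompiled routine: multiply, truncate to 32 bits, logical shift right.
def decompiled_scale(x: int) -> int:
    return ((x * 3) & 0xFFFFFFFF) >> 1

# Hypothetical rewrite that forgets the 32-bit wraparound.
def rewritten_scale(x: int) -> int:
    return (x * 3) >> 1

failures = differential_test(decompiled_scale, rewritten_scale)
print(len(failures), failures[:1])
```

The mismatching inputs double as regression tests: once the rewrite passes them, they stay in the suite for the next iteration.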



