
This is possibly a silly question but... when it comes to the threat of backdoor vulnerabilities inserted in the compiler itself, which would you say is the safer option? From what I can tell LLVM has a simpler codebase so that's a point in its favor, but I think GCC being a GNU project is less likely to have developers who could be pressured to insert malicious code.

Am I crazy in worrying about this? I know initiatives like reproducible builds are supposed to help solve that kind of threat, but it's still not clear to me how it all fits together.
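(For what it's worth, the core of reproducible builds is simple: independent parties build the same source and compare the artifacts bit for bit, so a single compromised build machine can't silently substitute a backdoored binary. A toy sketch of that check, with the printf lines standing in for two real, independent builds and made-up file names:)

```shell
#!/bin/sh
# Simulate two independent builds of the same source producing artifacts.
# In a real reproducible-builds setup these would come from different
# machines/maintainers building the same pinned source tree.
printf 'identical-binary-bytes' > build-a.bin
printf 'identical-binary-bytes' > build-b.bin

# Byte-for-byte comparison: a match means the toolchain is deterministic
# enough that a tampered build environment would be detectable as a diff.
if cmp -s build-a.bin build-b.bin; then
    echo "reproducible: artifacts match"
else
    echo "MISMATCH: investigate the toolchain"
fi
```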



"when it comes to the threat of backdoor vulnerabilities inserted in the compiler itself, which would you say is the safer option?"

Neither. They're equally unsafe, with a relatively low risk of this specific attack outside of distribution, maybe. Although it's a clever idea, AsyncAwait's ideology argument doesn't work, since spies will simply pose as ideologues. The good news is that the Karger compiler-compiler attack has only happened two or three times that I know of.

What's burned projects many more times and most worth worrying about are security-related compiler errors. They transform your code in a way that removes safety/security checks or adds a security problem (eg timing channel). So the real problem is compiler correctness more than anything. That requires so-called certifying compilers that put lots of effort into ensuring each step or pass is correct. CompCert and CakeML, with their formal verification, are probably the champions there. You could also do rigorous, SQLite-style testing of each aspect of the compiler on top of using a memory-safe language. If you restrict language features, the bootstrapping can be done in an interpreter written and tested the same way, then ported by hand to designed-for-readability assembly.

It didn't stop there, though. Recent work in the verification camp is doing compilation designed to be secure despite crossing multiple abstraction levels, such as source, assembly, or mixed languages. Here's a nice survey on that:

http://theory.stanford.edu/~mp/mp/Publications_files/a125-pa...

The group putting it all together the most is DeepSpec. They have both papers and nice diagrams here:

https://deepspec.org/main


"What's burned projects many more times and most worth worrying about are security-related compiler errors. They transform your code in a way that removes safety/security checks or adds a security problem (eg timing channel)."

I'm not sure the removal of safety or security checks was caused by compiler correctness issues rather than by misunderstanding of a language's semantics / memory model. If we take the removal of a memset (to erase sensitive data) or of erroneous integer overflow checks (because they relied on undefined behavior) as examples, those stem from language / programmer error rather than compiler error. These issues should be fixed at the language level so that, first, a programmer can express his intentions more easily and, second, it is hard to write code which doesn't align with the programmer's intention.


Yeah, most of them were tied to undefined behavior. Example with detection method:

https://pdos.csail.mit.edu/papers/stack:sosp13.pdf

One of the main ones that can happen without undefined behavior, or at least what I was told was without undefined behavior, is optimizations getting rid of "dead" code. It doesn't have to be a memset: just an assignment. That assignment would sometimes get removed because the compiler thought nothing would be done with the assigned data. I never read whether that was in the C specification, since it seemed to be a common problem in optimizations. Here's a recent solution just in case you find it interesting:

https://www.usenix.org/system/files/conference/usenixsecurit...

"These issues should be fixed at the language level so that, first, a programmer can express his intentions more easily and, second, it is hard to write code which doesn't align with the programmer's intention."

Being a fan of Ada, SPARK, and Rust, I couldn't agree with you more. The problem is legacy code, especially useful FOSS, that isn't getting ported any time soon. The OSes, web browsers, and media players come to mind. We need ways to analyze, test, and compile them that mitigate the risks. Hence all these projects targeting things like C.


> GCC being a GNU project is less likely to have developers who could be pressured to insert malicious code.

I'd agree with this, since GNU projects tend to have at least some devs who are in it for the ideology.


> From what I can tell LLVM has a simpler codebase so that's a point in its favor

That may have been true at one point, but I don't think it is any more.


Are you referring to the first or the second part of that sentence?


The simplicity of the LLVM source. Edited for clarity.


LLVM's sources are at least as complicated as GCC's, if not more so. To make matters worse, LLVM functionality is broken up into dozens of libraries, so there's a fair bit more tracing to be done to understand what's going on in Clang than in GCC.



