As much as I hate LLVM, and as many times as I've been burned by bad code it has...

eschaton · on Dec 31, 2011

You lost me at "As much as I hate LLVM."

What's to hate?

saurik · on Dec 31, 2011

I am often seen arguing against LLVM on purely philosophical grounds from my asserted position in the world of open devices and jailbreaking; specifically that it, and Clang, are now products heavily funded (nearly owned) by Apple with the goal of decreasing their reliance on a project (gcc) that was relicensed under GPLv3, an event that caused Apple to immediately retract all of their engineers from contributing code, or even merging newer versions. This opinion has been strongly forged after numerous dealings with Apple's open source release department, having to pester them over and over again to get updated versions of gcc, gdb, and WebCore for their various systems (most specifically the iPhone, for which they like to redact all open-source code).

w0utert · on Dec 31, 2011

So you are basically saying you hate Apple and the way they contribute to OSS (which IMO has been a very significant contribution benefiting many other OSS projects), not LLVM or Clang. Good to have that sorted out... :-/

dextorious · on Dec 31, 2011

"""I am often seen arguing against LLVM on purely philosophical grounds from my asserted position in the world of open devices and jailbreaking; specifically that it, and Clang, are now products heavily funded (nearly owned) by Apple with the goal of decreasing their reliance on a project (gcc) that was relicensed under GPLv3, an event that caused Apple to immediately retract all of their engineers from contributing code, or even merging newer versions"""

No, he means whats a valid, actual, technical reason to hate.

Also, not it's not just the GPLv3 transition --thought that gave Apple migrating away a huge boost--, it's also tons of technical inefficiencies in the ancient design of GCC.

As an example, you couldn't do XCode style AST-aware autocompletion with GCC without tons of hurt.

mikeash · on Dec 31, 2011

Once you invoke undefined behavior, all bets are off. If the result of alloca(0) is undefined (and reading the man page, it appears to be) then any return value is valid. Likewise, the optimizer is allowed to assume that alloca(0) never happens, because once you invoke undefined behavior, anything can happen.

C is a dangerous language. A large part of that danger is to allow compilers to produce efficient code. A major point of "undefined behavior" is to allow compilers to assume that such things can never happen, and thus avoid generating code that has to deal with them.

To take a more sane example:

    int *x = ...;
    *x = 42;
    if(x == NULL)
        foo();

A C compiler would be entirely within its rights to completely delete both the if check and the call too foo() in this code. It's illegal to dereference a NULL pointer, so the compiler may assume that that code path can never happen.

Now, you may set up your system so that 0x0 is actually a valid address that can be written to, with clever mmap tricks or whatever. You may then initialize x with NULL and run this code, expecting the dereference to work, and foo() to be called. From a naive point of view, that's what should happen. However, even though 0x0 is a valid address in this environment, dereferencing it is still undefined behavior according to the language you're using. If you manually wrote the above in assembly you're fine, but the moment you do this in C, all bets are off.

It's the same deal with alloca(0). You can't count on the return value being anything in particular, and simultaneously the compiler can assume that the return value is anything it feels like.

Edit: thought of a more pertinent example. Consider the following:

    int x = INT_MAX;
    int y = 1;
    int z = -INT_MAX - 1;
    if(x + y == z)
        foo();
    else
        bar();

Directly translating this code to assembly and running it on any modern architecture would result in foo() being called (assuming I didn't screw up my arithmetic). The addition of x + y wraps around, and with a two's complement representation this results in the most negative integer being produced. That same integer is generated more directly in z, and the two compare equal.

But! integer overflow produces undefined results in C. The above program is ill-formed, and the compiler is completely within its rights to assume that the conditional is never true, because the moment you wrote x + y you gave up any right to expect any particular value in the result.

Interestingly, the version of clang on my computer optimizes the above to always call foo(), rather than always call bar(). However, either choice would be correct.

One certainly could get used to integer overflow always wrapping around according to two's complement and expect the above code to work. Upon encountering a compiler that optimizes the above to always call bar(), one might first suspect that the compiler is broken. However, it is the code that is broken, and the compiler is correct.

saurik · on Dec 31, 2011

A) I am pretty certain that alloca(0) is not "undefined" per C, as I'm pretty certain the C standard does not mention alloca at all. As far as the C language is concerned, alloca is a function like any other function.

(edit: To be 100% clear of the ramifications of this, even if alloca itself is implemented using horrible undefined black magic, a pedantic C compiler would not have advanced knowledge of that happening inside the function, and could not prematurely optimize it away.)

(Note: I use the term "pedantic" in this edit to describe such a C compiler, as the real-world behavior of practical systems does not conform to the view that "undefined" means "could order a kill strike on your children".)

B) Your example is kind of off, btw, as NULL and a pointer with dynamic value 0x0 do not mean the same thing: I am allowed to deference a pointer that is at the address 0x0, and the dynamic value of NULL need not be 0x0. ;P

(edit: This example was deleted by the poster I am responding to, but was an example involving comparison of a pointer value to NULL being allowed to be optimized to false if it had been previously dereferenced.)

(edit:) C) Your new example is at least internally consistent, but is still specifically relying on behavior that is undefined. The C standard does not define any undefined behavior with respect to calling the function "alloca" any more than it does calling the function "hello".

If the behavior of the function itself is undefined with respect to being passed 0, that does not affect the language's implementation of what to do when calling that function: it does not know how undefined it is.

What is actually going on here is that gcc has an optimized version of alloca that it declares as a "builtin"; this is both to make alloca itself performant (one instruction), but also to allow it to make further optimizations in the function.

These optimizations should be compliant with the C language standard, and in this case they are not. Honestly, in the real world (as opposed to pedantic standard land), that's fine, but this case is just egregiously confusing.

In fact, sufficiently confusing that I can't imagine the developers of llvm-gcc would not consider it a bug; in essence, gcc and llvm's translation layers are being layered, and they are interacting "poorly".

However, as llvm-gcc is a discontinued product, this bug will not get fixed. However, this is currently the best compiler that Apple has provided us for use on their platforms as of Xcode 4.2, which is "unfortunate".

(Note: I just say "unfortunate". It is not necessarily "horrible"; it is simply "unfortunate". There are many bugs in llvm-gcc that are not present in gcc, and it is "unfortunate" that Apple hates GPL3 sufficiently to have not only thrown a ton of money at replacing it, but have now even stopped shipping the old stalwart.)

mikeash · on Dec 31, 2011

A C compiler is not required to know what the alloca() call does, but it is allowed to do so. If your call to alloca() resolves to the one in your system's standard C library, the compiler is allowed to make use of any knowledge it may have about that call, how it works, and what its semantics are. It is then perfectly legal for it to make optimizations based on those semantics.

I know that NULL isn't necessarily 0x0. My example simply assumes that it is, which is usually the case. Note that even if NULL is not 0x0, and the address that is 0x0 is valid, it's still not legal to do * (int * )0 = 0, as (int *)0 is NULL, regardless of the actual underlying value of NULL.

mikeash · on Dec 31, 2011

"This example was deleted by the poster I am responding to, but was an example involving comparison of a pointer value to NULL being allowed to be optimized to false if it had been previously dereferenced."

What are you talking about? I didn't delete anything.

saurik · on Dec 31, 2011

Sorry, I saw the new example appear and I thought it had appeared into the slot where the old one had been. That is my mistake for finding myself in a confusing edit/edit thread and not managing to internally track the diffs well enough.