After Heartbleed, I decided to run a Java servlet container as my direct web server so I can use JSSE rather than OpenSSL or some other SSL stack written in C. It seems to me that, if you can manage it, an SSL stack written in a safe language is not a bad idea. JSSE is pretty modern, though it doesn't support every possible feature (it's missing OCSP stapling at the moment). But it can do forward secrecy and AES-GCM. I get an A- from the Qualys test: for some reason PFS doesn't work with IE, and it doesn't like that my cert has a SHA-1 based signature (I'll go get my cert reissued at some point). Oh, and the SCSV fallback hack is missing. Otherwise it's doing OK. And ... no buffer overflows.
You do realise that JSSE has lots of timing attacks etc. and that's pretty much unfixable in managed code? It's also had numerous bugs in its SSL/TLS implementation (mainly because no one uses it), such as mishandling zero-length extensions (used for flagging support for a feature), failing to handle DH params that aren't a multiple of 64 bits (for no sane reason), etc.
Yes, I know there have been timing attacks, and the ones I know about were fixed. So I'm not sure they're pretty much unfixable.
Regardless, I am more afraid of buffer overflows and general memory management errors than I am of timing attacks. Heartbleed was orders of magnitude easier to exploit than (say) the Bleichenbacher attacks that JSSE has been vulnerable to.
I'm saying they're unfixable because there's no way to tell the JIT to create constant-time code. Certainly Christopher Meyer has said that he's reported some that remain private since they've not been fixed.
Heartbleed was certainly easier to exploit than a timing attack - no disagreement there. But the Java SSL stack is pretty flaky and largely seems to be unused. It has lots of interoperability problems with other implementations and generally seems immature. It's not something I'd trust in production myself.
Right - it is looking more and more like truly constant-time code can only be obtained by using hand-written assembly or hardware. In that eventuality, I guess the core crypto code in the JVM will have to be changed to use hand-written assembly as well. However, SSL is huge and most of it doesn't need to be constant time (certificate parsing, managing network buffers, message parsing etc). That is often where the bugs creep in.
Going with JSSE does have the downside that it's less widely used. Browser makers don't patch it with the latest gizmos like they do with OpenSSL. However, it's at least got a full time development team (unlike OpenSSL until very recently), and ultimately if there are odd conformance bugs lurking the only way they'll be shaken out is by people using it for real. Passing the Qualys test is a good start, but for now my little website is ideal because it is unlikely to ever be popular enough to have serious scaling issues and I can tolerate some incompatibility with odd clients.
i.e. index the character to compare mod the length, so you wrap around if the lengths of the two strings are not the same.
Unfortunately the mod operator does not take constant time on a CPU! If the result is evenly divisible it's faster.
So even in assembly you might not be able to protect from a timing attack. Even if you check your current CPU a new one might be different.
(In this case a better way to do the compare is: if the strings are not the same length, compare the attacker's string against itself, and then return false.)
For this case, could one just introduce another mod operation which produces result + 1? In this case:
($i + 1) % $safeLen
Even so, I guess the only way to eliminate timing attacks would be to write a set of primitive operations (XOR, AND, OR, ORD,...) which are constant-time (on the supported hardware platform) and only use those wherever the private information is handled. Is this possible to do? Are there any attempts at this?
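For what it's worth, the JDK itself has an attempt at exactly this: java.security.MessageDigest.isEqual, which (if I recall correctly) was changed a while back to avoid short-circuiting. A hand-rolled sketch of the same idea, built only from array reads, XOR and OR, is below - purely illustrative, not a vetted implementation, and note the caveat above that % itself may not be constant time:

    import java.nio.charset.StandardCharsets;

    public final class CtCompare {
        // Sketch of a comparison built only from array reads, XOR and OR:
        // no early exit, and the secret is indexed mod its own length so we
        // always loop over the attacker-supplied input (secret assumed non-empty).
        static boolean slowEquals(byte[] attackerSupplied, byte[] secret) {
            int diff = attackerSupplied.length ^ secret.length;
            for (int i = 0; i < attackerSupplied.length; i++) {
                diff |= attackerSupplied[i] ^ secret[i % secret.length];
            }
            return diff == 0;
        }

        public static void main(String[] args) {
            byte[] secret = "hunter2".getBytes(StandardCharsets.UTF_8);
            byte[] guess = "hunter3".getBytes(StandardCharsets.UTF_8);
            System.out.println(slowEquals(guess, secret)); // false
        }
    }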
> (In this case a better way to do the compare is: if the strings are not the same length, compare the attacker's string against itself, and then return false.)
Is there a reason for using the attacker's string rather than the valid string? Edit: it doesn't seem to me that using the valid string would leak its length unless the attacker knew quite a lot about the target machine.
> Edit: it doesn't seem to me that using the valid string would leak its length unless the attacker knew quite a lot about the target machine.
It sounds like you are skeptical that timing attacks can be practically exploited. This is a reasonable skepticism that most people go through on initially learning about them. Unfortunately, there are in fact working exploits for this sort of thing.
No, I definitely realize the seriousness of timing attacks. I'm just curious about this particular implementation detail. I would have guessed that using the attacker's length would reveal more about the system than using a fixed length (unless the length of the valid string changed frequently). I'd like to understand why that isn't the case.
Ah, I see, you're assuming the "valid" string is fixed-width across all requests. I guess in that specific case you might be right. But it's easy for that not to be the case: e.g. if there are several different resources an attacker can request that have different-length strings. For example, an attacker might scan for users with short (= bad) passwords by comparing the time it takes to complete an authentication check against various users. So the safe thing is to go with the attacker's string.
How long it takes to do the comparison reveals the length of the string being checked, so you have to check the string which the attacker knows the length of already.
Question: if you know the attacker doesn't have local access, is it sufficient to simply introduce random response delays to mitigate JVM-JIT level timing differences? Even if you only delayed responses by a few hundred micros on average (with randomness), I imagine all but the smallest amount of fast-path/slow-path entropy would be lost in the noise.
It's pretty easy to filter out the randomness using statistics.
As I understand it, the correct thing to do is to derive a sleep time by hashing the request content along with some secret. This makes the delay "random" from the attacker's point of view, but still deterministic and therefore impossible to filter out.
Note: I am not a security person, just an interested bystander. Take this half-remembered advice with a pinch of salt.
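A minimal sketch of that idea in Java (the key, the algorithm choice and the delay range are all made up for illustration, and as noted in the replies this is not a silver bullet):

    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;

    public final class DeterministicDelay {
        // Hypothetical server-side secret; in reality this would come from
        // configuration or a key store, not a hard-coded literal.
        private static final byte[] DELAY_KEY = "example-delay-key".getBytes();

        // Derive a delay in [0, 1000) microseconds from the request content.
        // The same request bytes always map to the same delay, so an attacker
        // cannot average the jitter away by repeating the identical request.
        static long delayMicros(byte[] requestBytes) throws Exception {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(DELAY_KEY, "HmacSHA256"));
            byte[] digest = mac.doFinal(requestBytes);
            long v = 0;
            for (int i = 0; i < 8; i++) {
                v = (v << 8) | (digest[i] & 0xFF);
            }
            return Math.floorMod(v, 1000L);
        }

        public static void main(String[] args) throws Exception {
            System.out.println(delayMicros("GET /login HTTP/1.1".getBytes()) + " microseconds");
        }
    }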
That won't work if there's an irrelevant parameter in the request that can be varied. Say spaces ignored somewhere or additional parameters ignored, for instance.
1. Please don't store raw passwords and compare them with strcmp().
2. Less than you'd think. e.g., the total count of all 1 and 2 and 3 digit numbers is only 11% of the count of 4 digit numbers. (And that stat gets worse with just lower case letters.) Searching all shorter passwords ends up being an insignificant amount of time compared to searching all correctly sized passwords.
"Right - it is looking more and more like truly constant time code can only be obtained by using hand written assembly or hardware. In that eventuality, I guess the core crypto code in the JVM will have to be changed to use hand written assembly as well, however, SSL is huge and most of it doesn't need to be constant time (certificate parsing, managing network buffers, message parsing etc). That is often where the bugs creep in."
I completely agree with you there. X.509 is a nightmare and it's not an area where constant time matters at all. Being resistant to buffer overruns and general logic errors is much more important there.
X.509 is a nightmare because it is complex, and the political infrastructure is vulnerable.
There is no reason ciphers like djb's that use data-independent code paths couldn't be integrated into X.509 if the will was there. Totally separate issues.
> I'm saying they're unfixable because there's no way to tell the JIT to create constant-time code.
What does that even mean? If you write your fail and success states to follow the same exact code paths (i.e. no branches, breaks, returns, or similar for failed) then you've created a "constant time" function.
The inputted parameters are dynamic, so the JIT compiler cannot optimise code away, and even if it could it would do so equally for both the failed and success states.
> Certainly Christopher Meyer has said that he's reported some that remain private since they've not been fixed.
I tried Googling that but found nothing. Can you link whatever it is you're talking about?
He's talking about a couple of recent very clever timing attacks on JSSE. However they are not comparable to Heartbleed or this SChannel overflow in severity at all:
(1) They require you to have a pre-recorded SSL session you're trying to crack. Random script kiddies won't find it useful.
(2) They require sending a lot of traffic to the server to try and break that single recording (you can get caught quite easily!)
(3) They don't reveal the private key, only per connection premaster secrets (if I understood correctly).
Basically, the first attack was possible because JSSE returned "internal server error" when faced with a particular kind of malformed packet instead of "bad padding" - this binary yes/no signal was enough to allow the attacker to divine some internal state and, with enough queries, retrieve the PMS. The second was a similar trick, but instead of observing a different error code, it exploited the fact that internally the code was throwing an exception, and this made a difference of some microseconds in the response. Over a LAN, with enough queries, that was enough to eventually reveal the PMS. However, when running over a much noisier environment like the internet, the number of queries required goes up quite a bit; I believe they did not test that.
As I said before, they were both fixed, and I think Rich now agrees with me that they were fixed. There may be others that would require compiler support or hand coding of assembly to fix. Certainly there's nothing in the design of Java or the JVM that makes that impossible however. Stuff handling key material is only a small part of the overall SSL picture.
Yeah, like I said in my other comment, I'd misread which issues have been fixed. I still stand by my point that for constant time you need to be close to the metal, but I think we're in agreement there anyway.
I think using native code for the low-level implementation of the ciphers, padding etc. would be a good move from a security point of view and is the only way to get constant time implementation. As you say, there's nothing that prevents that in the design of Java.
Sorry, I realised I'd failed to address your other point:
"What does that even mean? If you write your fail and success states to follow the same exact code paths (i.e. no branches, breaks, returns, or similar for failed) then you've created a "constant time" function."
Well, if you've got a JIT that is analysing which code actually executes, and no control over the optimisations, then how do you achieve this? If the hot path is normally the success condition, for example, then the JIT will optimise that more. The rare and more dangerous condition will be optimised less. Whether this happens will of course depend on the engine, but you can't control that without working at a much lower level than managed code offers.
This is rubbish. If I use custom assembler then that's what gets run. Even if I use C code then I know what the result will be since I can actually check.
> The paths should be identical for both failure and success states. That's fundamentally how you "fix" timing attacks.
That's certainly the ideal. But it's impossible - one result will succeed and the other won't. The aim is to make sure both take the same time, which is an achievable goal, but to do this you need to know what will execute. You might have that guarantee in practice on a particular JVM, for example, but the whole point of using something like a JIT is that it will be smart about optimising stuff based on what actually happens. That's normally great, but this is a situation where all that's needed is predictability, not performance.
> This is rubbish. If I use custom assembler then that's what gets run. Even if I use C code then I know what the result will be since I can actually check.
It is "rubbish?" Nobody does that. Most crypto libraries are written in C or C++.
You can pre-JIT (AOT) managed code and check there too.
> That's certainly the ideal. But it's impossible - one result will succeed the other won't.
It is absolutely possible with high coding standards. Keep in mind you only have to write "correct" code in sections which have access to things like crypto keys (or things derived from them), since that's what's at risk with timing attacks.
> You might have that guarantee in practice on a particular JVM for example, but the whole point of using something like a JIT is that it will be smart about optimising stuff based on what actually happens.
If the code paths are identical (failure and success) what is it optimising out exactly?
> It is "rubbish?" Nobody does that. Most crypto libraries are written in C or C++.
> You can pre-JIT (AOT) managed code and check there too.
Take a look. You'll see that for example openssl uses perl to generate assembly, and nettle also uses native code. This is needed if you want to use things like the AES instructions on modern CPUs.
> If the code paths are identical (failure and success) what is it optimising out exactly?
JITs can create separate versions of a function for different inputs. A good JIT can turn one general case function into multiple optimized special cases using techniques that include eliminating pseudoconstants (variables that always have the same or a small number of values) and skipping computation of never-referenced results. Even normal compilers can do a lot of that.
It is not a good idea to act on the premise that things like the BB e=3 or BB "million message attack" are hard to exploit. They aren't.
The e=3 vulnerability (or its more modern "BERserk" variant) is arguably much easier to exploit than Heartbleed. It allows you to --- offline, in advance --- make your own valid CA certificates.
> You do realise that JSSE has lots of timing attacks etc. and that's pretty much unfixable in managed code?
That makes no sense at all. Timing attacks aren't more or less exploitable in managed code than unmanaged code. Timing attacks are often the result of optimisations within the crypto library which inadvertently give away information, for example a loop which breaks on X != Y instead of setting a failed flag and continuing to iterate through the rest of the array.
Please explain how managed code makes timing attacks more likely.
> Timing attacks are often the result of optimisations within the crypto library which inadvertently give away information, for example a loop which breaks on X != Y instead of setting a failed flag and continuing to iterate through the rest of the array.
I would say this is false. Simple differences in time caused by cache line ejection in table-lookup implementations of AES provide a very strong timing attack. (http://cr.yp.to/antiforgery/cachetiming-20050414.pdf)
In RSA (and in fact in DL-based cryptosystems), modular exponentiation without extreme care leaks tons of timing information about private exponents. 'Blinding' is one way to handle this, but performant solutions typically fiddle at the bit level and exploit CPU guards and features to minimize branch-prediction/cache-line/etc leaks.
In higher-level languages you cannot take that kind of absolute control and care over the crypto implementation, and the JIT adds another layer of obfuscation (though I know of no attack employing that...).
The out for memory-safe languages is to provide built-in crypto operations that have been implemented at a lower level.
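For the curious, the blinding mentioned above looks roughly like this. It's a toy sketch of the algebra only - BigInteger.modPow is itself not constant time, so this is not a safe implementation, just an illustration of why the private exponentiation no longer operates directly on attacker-visible input:

    import java.math.BigInteger;
    import java.security.SecureRandom;

    public final class RsaBlindingSketch {
        // Compute m = c^d mod n, but apply the private exponent to a blinded
        // value so modPow's timing correlates with (c * r^e) rather than c.
        // n, e, d are the usual RSA modulus and public/private exponents.
        static BigInteger blindedDecrypt(BigInteger c, BigInteger d,
                                         BigInteger e, BigInteger n) {
            SecureRandom rng = new SecureRandom();
            BigInteger r;
            do {
                r = new BigInteger(n.bitLength(), rng).mod(n);
            } while (r.signum() == 0 || !r.gcd(n).equals(BigInteger.ONE));

            BigInteger blinded = c.multiply(r.modPow(e, n)).mod(n); // c * r^e
            BigInteger mTimesR = blinded.modPow(d, n);              // (c * r^e)^d = m * r
            return mTimesR.multiply(r.modInverse(n)).mod(n);        // strip the blinding factor
        }

        public static void main(String[] args) {
            // textbook toy parameters: n = 61 * 53, e = 17, d = e^-1 mod 3120
            BigInteger n = BigInteger.valueOf(3233), e = BigInteger.valueOf(17),
                       d = BigInteger.valueOf(2753), c = BigInteger.valueOf(2790);
            System.out.println(blindedDecrypt(c, d, e, n)); // prints 65
        }
    }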
I'm curious now: how do the AES implementations nowadays avoid the timing attack explained in that paper? From what I understood, it's very hard to write an efficient AES implementation without using input-dependent table lookups.
For the most part by bitslicing. Some implementations calculate the S-box explicitly using the algebraic relationships in the finite field but doing so is awfully slow.
I should add here that I met an incredibly intelligent young man named Julian from Dartmouth, doing some work with MIT, who is proving with Coq and a model of a CPU that his implementation of cache lookups for (various) crypto algorithms results in exactly the same cache-line patterns, and that the number of CPU ticks is similarly invariant. Some people go the extra mile.
You say it's "false" but then fail to explain why. None of your examples offer that, and your whole explanation boils down to "managed languages are more complex, therefore worse."
Please point me to the specific native features which mitigate timing attacks. Because the majority of fixes I have seen are purely in altering the libraries themselves using high level constructs to remove hot paths and make it so both failure and success state take a constant time to execute (which has nothing to do with managed/unmanaged code).
The issue isn't that native code has special features that mitigate timing attacks. It's that you can look at native code and predict its side effects more easily than you can with high-level code.
Another important difference between native code and high-level code is that timing leaks in high-level code tend to be larger. For instance, it's very difficult to exploit a memcmp timing leak in practice. But Java's string comparison, depending on your JVM, is exploitable over the Internet.
For what it's worth: I wouldn't select C over Java simply to avoid timing attacks. Side channels in JVM code are a legit concern, but not a dispositive one.
If you're just talking about me then fair enough - the problems I've publicly identified in TLS are pretty much edge cases. Thomas on the other hand has a pretty good track record if you'd care to check.
> whole explanation boils down to "managed languages are more complex, therefore worse."
I hope that's not what I said...
> Please point me to the specific native features which mitigate timing attacks.
How am I supposed to implement bitslicing to vectorize operations in Java? I can't. Fine grained control of code is important for implementations of ciphers that are both fast and side-channel free. Fine grained control isn't something Java can give you, by definition.
I count exactly two countermeasures that apply to high level languages. Of the first they say "We conclude that overall, this approach (by itself) is of very limited value" and of the second "beside the practical difficulties in implementing this, it means that all encryptions have to be as slow as the worst case... neither of these provide protection against prime+probe/etc".
The rest of the countermeasures suggest bitslicing, use of direct calls to hardware instructions, memory alignment tricks, invocation of hardware modes (i.e. to disable caching), forcing cache ejections, normalizing cache states on interrupt processing, etc.
It is purely the case that high level languages do not offer you the flexibility and control to implement side-channel free crypto.
Crypto is brittle. High level languages are awesome for so many things. But bitslicing isn't one of them. The entire premise of high level languages is that you are freed from working directly on the innards pertinent to the specific target architecture. The entire premise of side-channel free crypto is that you need visibility and control of exactly these things.
The overall question is whether bindings or language features that expose direct control of the underlying architecture (such as D) can still be used to implement crypto. The answer is likely yes, though it is uncharted territory that only someone who knows what they are doing should attempt.
Sure. The JIT's code generator isn't attempting to make code constant time, it's designed for optimising for the common case. To make the code constant time you need to be very close to the metal (and even then you really need tools like ctgrind to make sure you've got it right). A JIT isn't the right tool for this particular job (nothing wrong with them for other things of course).
Constant time just means that a fail and a success state take the same amount of time to execute. They both go through the same amount of computation regardless of whether you know from the first instruction that it will eventually fail.
For example:
    const string password = "password";

    static bool isAllowed(string code)
    {
        if (code.Length != password.Length)
            return false;
        for (int x = 0; x < password.Length; x++)
        {
            if (code[x] != password[x])
                return false;
        }
        return true;
    }
This is not constant time because the failure state returns sooner than the success state.
    const string password = "password";

    static bool isAllowed(string codex)
    {
        // Start from the length check so that a longer guess with a matching
        // prefix (e.g. "password123") is not accepted.
        bool allowed = codex.Length == password.Length;
        char[] code = new char[Math.Max(password.Length, codex.Length)];
        codex.CopyTo(0, code, 0, codex.Length);
        for (int x = 0; x < password.Length; x++)
        {
            if (code[x] != password[x])
                allowed = false;
        }
        return allowed;
    }
This is an imperfect constant-time function, as both states (failure/success) return after nearly the same amount of time (although I fully admit that it might still be possible to learn the length of the password).
And the runtime of a managed language is 100% able to change all the timing of that if it's smart enough. It provides a guarantee of outcome, not of timing. There are examples of unexpected optimisations even from static compilers such as gcc (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888 for example). There's nothing stopping the JVM from bailing out early even in your second example. If you'd done something like XORing all the bytes together and then checking the result, then maybe we'd be approaching something a JIT is unlikely to mangle, but the code as is seems ripe for a decent optimiser to me. And remember that according to the spec you're now coding for a perfect optimiser, not just a decent one...
The typical timing attack in this case would be leaking the password length (the time taken in the for-loop), no? Then there's the possibility of the assignment (allowed = false) taking enough time that guessing with, say, "00000" and "x00000" might allow one to verify that the password starts with x (and then build up to the full password).
But more fundamentally, what is string.Length? Are these Pascal-style strings, or is that a call to a method that walks the string (O(n))? Those kind of issues (along with the full nature of the code path taken by something like assignment and memory allocation) are abundantly more clear in assembler (or byte code, assuming a predictable vm. But a vm likely isn't -- as far as I know it would at least never be more predictable than machine code).
If anything running code in a managed environment would be less vulnerable to timing attacks, given the non-deterministic nature of the garbage collector.
If there's timing information leaked by the implementation then non-deterministic interference by things like GC pauses (CPU load, network traffic, etc) are noise that raises the work effort of the attack but does not in general make it impossible. This is why "introduce a random sleep" is a terrible defense to a timing attack. Statistics doesn't care about the cause of the variability, if there are samples coming from different populations it can detect that.
Both, unless I don't understand the distinction you're trying to make.
The problem is that the law of large numbers is on the attacker's side. If the attacker gets N tries his statistical power goes as sqrt(N), which means that to stay safe the spread of your random delay has to be large enough to cover that. That is, if d is the timing difference between the slow path and the fast path and the attacker gets N tries, the standard deviation of your random delay has to be on the order of d * sqrt(N). This is huge even for modest values of N.
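A quick back-of-the-envelope illustration of that (the numbers are arbitrary):

    public final class JitterEstimate {
        public static void main(String[] args) {
            double d = 1e-6; // 1 microsecond difference between fast and slow path
            double n = 1e6;  // a million attacker queries
            // standard deviation of added jitter needed to drown out the signal
            double sigma = d * Math.sqrt(n);
            System.out.println(sigma + " seconds per request"); // 0.001, i.e. ~1 ms
        }
    }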
I'd argue they're both as bad as one another. Since GC is non-deterministic it means you just need more cycles for an accurate result (and plus you're already having to ignore other sources of latency, like network, disk IO, OS lock contention, etc).
Timing attacks are generally a coding problem; both a JIT-ed managed codebase and a native block of code can contain them.
Exactly the reason why I would seriously contemplate running an entire server (OS on up) based purely on managed code. For relatively slow things like web traffic it would likely suffice.
Let's hope Midori is more than just a rumour. Singularity was pretty darn good for a research OS.
Well, if you patch a linux system with GRSecurity buffer overflows and the like are much harder to exploit. I'm still amazed it's not more popular than it is. https://grsecurity.net/
I was hearing about Midori years ago, since then nothing. Singularity is clearly a shelved project at this point. Shame: as you say, it was pretty amazing work. Every time I read a kernel CVE I think .... sigh. If only Singularity had gone further.
We (TechEmpower) have run preliminary tests of JSSE in consideration of a future round of our framework benchmarks project. Although we do not yet have any SSL tests in our project, our preliminary findings were that JSSE was considerably higher-performance than we had been led to believe by popular opinion. I don't have the data in front of me, but using JSSE in lieu of OpenSSL did not affect requests-per-second results enough to make me worry about using JSSE.
Since then, my chief concern with JSSE has been simply not knowing how solid it is from a correctness-of-the-algorithms perspective, but that is something I'm not well versed in. In other words, my concern was just one of uncertainty. Provided a credible security analysis suggests JSSE is at least as secure as (if not more secure than) alternatives in unsafe languages, I will be confident deploying future apps on JSSE.
At runtime modern Java is around 10x slower than modern C++ (assuming both are fully optimised and the benchmark is relatively unbiased).
That isn't significant on modern hardware relative to other bottlenecks (network, IO, etc). Plus people keep adding additional security layers between C/C++ and the CPU which eat away at some of its advantages (e.g. Docker containers, virtual machines, exploit detection libraries, etc).
Java speed matters in certain situations and for certain tasks. For example I wouldn't rewrite an SQL database into Java since performance could definitely be impactful there. But realistically CPU times are such a tiny minority of latency that it stopped being relevant a very long time ago.
> At runtime modern Java is around 10x slower than modern C++ (assuming both are fully optimised and the benchmark is relatively unbiased).
I don't think you can pick a single number like that, it depends very much on what the code is doing.
You probably got this 10x figure from a Google paper published a few years ago (https://days2011.scala-lang.org/sites/days2011/files/ws3-1-H...). Redux: the default out-of-the-box 32-bit JVM (in 2011) was 12x slower than the same algorithm implemented in C++. But with a few simple GC flag changes, it became 3.7x slower. And then the Java version had a few further simple optimisations applied to the code and it became as fast as the original C++ version.
Meanwhile the C++ version was optimised again and became around 3x to 5x faster, but that version relied heavily on Google proprietary data structure code and could not be open sourced (!). So for most programmers on most projects, it seems likely to be a wash.
Meanwhile an alternative benchmark that reimplemented a non-trivial C++ program in Java found it became 1.09 to 1.5 times slower, but that was with Java 6 which is now two generations out of date:
Those comparisons will be again out of date when Java 9+ comes out with value types, official unsafe package and lots of other goodies for performance being worked on.
I'm not aware of any intent on adding unsigned types in Java 9 though. The lack of them is a real PITA, for sure. Not only for performance but it's incredibly easy to forget a & 0xFF here or there and introduce bugs.
The MMAP thing is also a good point. I believe HotSpot will compile the get methods on the MappedByteBuffer down to raw access so it shouldn't matter much for performance, in theory, but the code is still damned ugly. I never understood why they can't expose it as a byte[].
I don't understand why a just-in-time java compiler would produce slower code than an ahead-of-time c++ compiler, especially since the java compiler has much more information available (exact cpu revision, cache and ram sizes, and hot spot profiling).
Java has certain qualities that make it sometimes harder to optimise, which is one reason it relies so heavily on profile guided optimisation. For example, it's very vulnerable to GC stalls if escape analysis can't identify all your variables as eligible for stack allocation. Escape analysis is complex and a little fragile, from what I've heard (I never tested it myself).
That is totally wrong. I have dealt with a lot of code, from numerical computing to general data structures, and I have never seen C++ code run 10x faster than current JVMs. Heck, not even with SIMD. From what I see, for actual applications what you get is at best a 1.5x speed-up and a lot of agony on the C++ side.
That example will be horribly skewed by the JVM startup time, which is a known problem, but doesn't affect your program once it's started. It's pretty meaningless to compare programs that short, e.g. scripts, because no one sane would write them in either C++ or Java.
If you were to reimplement Unix in Java: what would happen to the paradigm of small utilities that do one job well, that you combine into something greater than the sum of its parts? I think you'll find a lot of short programs are written in C++.
Um, if you reimplemented UNIX in Java you'd probably do a much better job than the UNIX creators were able to do.
In particular, running a program from bash would simply classload it into the existing VM, not invoke a whole separate VM, and then it'd be more or less instant, except you'd have the potential for much more flexible APIs and combinations of tools. Look at PowerShell for an example.
I don't have the technical chops to disagree with you. But rest assured I would not do a better job. Last time I checked, Java programs do not like operating within the same VM[0]. Last time I looked at it, it was to get a bunch of people running Eclipse remotely from the same JVM instance on a huge box, in 2012 or something. It was a no-goer, but some experimental JVMs claimed to support it. [1]
Powershell is a terrible example for the performance point I am trying to make, whatever about its flexibility.
In practice big GUI apps like Eclipse are not intended to share a single VM with lots of other things, let alone multiple instances of itself. If people wanted to write tools that did that (i.e. if it was common) then they could, but it's not enforced. For example even if at the Java level you can separate stuff out, native code to handle the GUI framework might not be expecting it.
I know they weren't written to do so obviously but you can see how it's a related example. You can also understand why if it was the sort of thing that could work it would be worthwhile, what with the amount of RAM each of the Eclipse users had devoted to its (in theory duplicated per user) JVM.
I doubt you would even have to go as far as native extensions for GUI before you start running into problems even though the programs are written in managed code.
Can you better explain how in practice Java programs could share a VM in replacing a typical Unix bash environment/userland?
Would they have to use the special IBM JVM? Would bash have to contain grep as a class? e.g.
Reading about "JAR hell" I really don't think it would work very well.
Interestingly in the IBM link provided earlier they load up substantial non GUI servers such as Tomcat, Jetty and JRuby and achieve a startup time that is twice as fast.
They also have hello world:
Hello World (print "HelloWorld" and then sleep):
Multi-tenant JVM: 309
Hand-tuned: 73
Default: 63
Improvement with multitenant: 4.2X to 4.9X
Even with this I think a C++ version would eat it for breakfast. It would be interesting to find out what would happen if a single JVM were loaded on boot and all programs were loaded into that, and then compare the JVM-based Unix on those terms with e.g. Solaris or something.
The website in question doesn't get much traffic at the moment, so performance is not an issue.
Modern JVMs can use AES-NI hardware instructions when available, in some cases at least, so I'd imagine it's not too awful. But I honestly don't know. Benchmarks would be interesting.
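A crude way to get a feel for it on a particular JVM is just to time javax.crypto doing AES-GCM in a loop. This is a throwaway sketch, not a proper benchmark (no warmup separation, and the sizes and iteration count are arbitrary):

    import java.security.SecureRandom;
    import javax.crypto.Cipher;
    import javax.crypto.spec.GCMParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    public final class AesGcmThroughput {
        public static void main(String[] args) throws Exception {
            SecureRandom rng = new SecureRandom();
            byte[] key = new byte[16];         // AES-128
            byte[] data = new byte[16 * 1024]; // 16 KiB of plaintext per operation
            rng.nextBytes(key);
            rng.nextBytes(data);
            SecretKeySpec keySpec = new SecretKeySpec(key, "AES");
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");

            int iterations = 10_000;
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                byte[] iv = new byte[12];      // fresh IV per encryption, as GCM requires
                rng.nextBytes(iv);
                cipher.init(Cipher.ENCRYPT_MODE, keySpec, new GCMParameterSpec(128, iv));
                cipher.doFinal(data);
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            double mibPerSec = iterations * (data.length / (1024.0 * 1024.0)) / seconds;
            System.out.printf("~%.1f MiB/s of AES-GCM on this JVM%n", mibPerSec);
        }
    }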
One of the reasons websites have been slow to roll out SSL for everything is CPU load, so it's a legitimate concern for a high-traffic website. However, most of the ways SSL was sped up involved algorithmic / protocol-level improvements and not micro-optimisations of the code. So I'm not sure that a safe language has a permanent disadvantage here.
This was possibly true 10-20 years ago (SSL is that old). It is no longer reasonable and is usually just an excuse.
That said, in my experience, native SSL/TLS within IIS is incredibly slow/inefficient. Running on 2012R2 (so a modern version), terminating SSL at IIS resulted in me being able to trivially peg the server's CPU using Apache Ab on my laptop. So clearly you need a good implementation.
Another reason was/is that handshakes add noticeable latency (because several packets are sent in both directions), so web browsing experience and consequently, retention and conversion rates will suffer.
The worst barrier was probably the cost and hassle of obtaining browser-accepted certificates though.
Unikernels can be a nice solution for this: it's possible to have a TLS termination proxy as a Mirage kernel sitting directly on Xen that only does TLS using ocaml-tls. It does not support everything either, but it's coming.
No acknowledgement of the source, either in the Technet article or the security bulletin's acknowledgements. I wonder who the private source must be that would ask for anonymity?
To face the reality that there are a ton of XP deployments. It would be in Microsoft's interest to at least notify them of the problem and even incentivize them to upgrade.
Well, XP likely has the same issues in its implementation, given similar origins, especially for older protocols. XP requires SP3 for TLS 1.0 IIRC, and doesn't support anything above TLS 1.0 - neither 1.1 nor 1.2 (current).
There are other issues with browsers (namely that a lot of XP users are still using IE8, which has a host of other issues).
The project I am now on isn't supporting IE8; we're going to load some HTML5/ES6 shims and show a notice to users that it may not work, but given how poor MS's VMs for testing IE8 on XP are, it's really a non-starter. IE8 is about 3-4% of our current traffic, which will likely be displaced by mobile traffic once our site/app is no longer mobile-hostile.
I wouldn't consider XP a viable OS at this point, and many users would be better off with a more recent ubuntu, and wine.
Wonder why they didn't add GCM with ECDHE even though it is already in the Win10 preview. I considered it strange that they added GCM with DHE in the first place when all the rest of the suites are ECDHE.
Does this update contain any additional security-related changes to functionality?
Yes. In addition to the changes that are listed in the Vulnerability Information section of this bulletin, this update includes changes to available TLS cipher suites. This update includes new TLS cipher suites that offer more robust encryption to protect customer information. These new cipher suites all operate in Galois/counter mode (GCM), and two of them offer perfect forward secrecy (PFS) by using DHE key exchange together with RSA authentication.
Agree, this is super strange. No one wants to use DHE because of the performance impact.
Windows 7 and later do already support ECDHE + GCM, but only when combined with ECDSA. In practice, nobody can use ECDSA because old clients still need RSA certificates.
So we continue to wait for TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, which is clearly the cipher suite that everyone wants. Years later, still not available for Windows Server.
The attack vector is a bit vague, and I'm not sure how to bring this news to some of my clients.
Is the exploit only relevant when running services on/to the internet (IIS, Exchange webmail, etc.), or is visiting an https (TLS) website as an end-user enough to trigger the exploit (even in Firefox/Chrome and behind a traditional proxy server)?
Sadly Microsoft does not explain the exact parameters that make this exploit tick - this makes risk assessment hard.
I'm not tptacek, but right now I think the answer is: "We don't know."
There isn't an acknowledged proof-of-concept, so we're not sure that it's exploitable. It hasn't been made clear whether it's wormable, either.
My bet is it will affect XP if it's exploitable, but only for those who added IIS (not default, and not terribly common). It will likely remain unpatched forever, as Microsoft is unlikely to send a patch to an "unsupported" OS again, like they did with the Internet Explorer 0-day [0]
By Microsoft, but I'd bet someone else is definitely going to fix it and distribute a patch. If the amount of effort put forth by the Windows 98SE community in making unofficial patches that let much newer applications work is any indication (look up KernelEx and "98SE unofficial service pack"), XP is going to enjoy an even larger community of unofficial support.
If I recall correctly, SQL Slammer also targeted a server component, but the vulnerable component was also present on some desktop systems, which were then affected.
It wouldn't surprise me if for instance some printer driver listening on a high port also uses the schannel component, and is thus vulnerable.
It is Very Bad Indeed™. It is exploitable to get RCE, so actually worse in impact than both Heartbleed and goto fail. (Other lesser, but still very serious effects like MITM are also implicated.)
IBM reported, MS did code review it seems, MS knew about some of these issues for ~6 months.
Patch immediately, people (like me) are running bindiff/etc and a public exploit won't be too far behind.
Not sure how far back it goes yet. All the way? The changes cover code going all the way back to the first SChannel code push, I think. (If XP is exploitable, this may be the XP killing vuln we've all been waiting for.)
Well it's patched now, but it didn't affect you unless you were running a "Windows Server" although all recent operating systems were affected. If your Windows machine is behind your home router and you were not forwarding ports to it you're probably fine. I doubt this vulnerability was known well enough that enough people were scanning for vulnerable IPs to exploit them.
The window from disclosure of patches to duplication of the exploit is narrowing, and it appears from the bulletin that client connections are affected as well. Furthermore, any computer you take anywhere outside your home router (and can you really trust your home router as a security boundary nowadays?!) will be easy to manipulate into an SChannel connection. Inside your home network, clients are still vulnerable to attack - any javascript/flash ad/referrer can point a computer behind a router at an attacker's server and serve up malicious SChannel packets. That is to say, your home computer can be attacked on outgoing connections, which your router will be happy to allow.
We upgraded to this, only to find it activates some new encryption modes (4 new GCM suites) that seem to cause RST packets when used. Anyone else seen that issue?
(Technical details: if the client offers one of the suites, the server accepts it in the ServerHello, but then RSTs the connection after the client sends its encrypted handshake, and the event log says "none of the cipher suites supported by the client application are supported by the server". Browsers and curl don't use that suite, but Amazon ELB does.)