After Heartbleed, I decided to run a Java servlet container as my direct web server so I can use JSSE rather than OpenSSL or some other SSL stack written in C. It seems to me that, if you can manage it, an SSL stack written in a safe language is not a bad idea. JSSE is pretty modern, though it doesn't support every possible feature (it's missing OCSP stapling at the moment). But it can do forward secrecy and AES-GCM. I get an A- from the Qualys test: for some reason PFS doesn't work with IE, and it doesn't like that my cert has a SHA-1 based signature (I'll go get my cert reissued at some point). Oh, and the SCSV fallback hack is missing. Otherwise it's doing OK. And ... no buffer overflows.
You do realise that JSSE has lots of timing attacks etc. and that's pretty much unfixable in managed code? It's also had numerous bugs in its SSL/TLS implementation (mainly because no one uses it), such as mishandling zero-length extensions (used for flagging support for a feature), failing to handle DH params that aren't a multiple of 64 bits (for no sane reason), etc.
Yes, I know there have been timing attacks, and the ones I know about were fixed. So I'm not sure they're pretty much unfixable.
Regardless, I am more afraid of buffer overflows and general memory management errors than I am of timing attacks. Heartbleed was orders of magnitude easier to exploit than (say) the Bleichenbacher attacks that JSSE has been vulnerable to.
I'm saying they're unfixable because there's no way to tell the JIT to create constant-time code. Certainly Christopher Meyer has said that he's reported some that remain private since they've not been fixed.
Heartbleed was certainly easier to exploit than a timing attack - no disagreement there. But the Java SSL stack is pretty flaky and largely seems to be unused. It has lots of interoperability problems with other implementations and generally seems immature. It's not something I'd trust in production myself.
Right - it is looking more and more like truly constant-time code can only be obtained by using hand-written assembly or hardware. In that eventuality, I guess the core crypto code in the JVM will have to be changed to use hand-written assembly as well. However, SSL is huge and most of it doesn't need to be constant time (certificate parsing, managing network buffers, message parsing etc). That is often where the bugs creep in.
Going with JSSE does have the downside that it's less widely used. Browser makers don't patch it with the latest gizmos like they do with OpenSSL. However, it's at least got a full time development team (unlike OpenSSL until very recently), and ultimately if there are odd conformance bugs lurking the only way they'll be shaken out is by people using it for real. Passing the Qualys test is a good start, but for now my little website is ideal because it is unlikely to ever be popular enough to have serious scaling issues and I can tolerate some incompatibility with odd clients.
i.e. index the character to compare mod the length, so you wrap around if the lengths of the two strings are not the same.
Unfortunately the mod operator does not take constant time on a CPU! If the result is evenly divisible it's faster.
So even in assembly you might not be able to protect from a timing attack. Even if you check your current CPU a new one might be different.
(In this case a better way to do the compare is: if the strings are not the same length, compare the attacker's string against itself, and then return false.)
For this case, could one just introduce another mod operation which produces result + 1? In this case:
($i + 1) % $safeLen
Even so, I guess the only way to eliminate timing attacks would be to write a set of primitive operations (XOR, AND, OR, ORD,...) which are constant-time (on the supported hardware platform) and only use those wherever the private information is handled. Is this possible to do? Are there any attempts at this?
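For what it's worth, the JDK itself has an attempt at exactly this: java.security.MessageDigest.isEqual, which (if I recall correctly) was changed a while back to avoid short-circuiting. A hand-rolled sketch of the same idea, built only from array reads, XOR and OR, is below - purely illustrative, not a vetted implementation, and note the caveat above that % itself may not be constant time:

    import java.nio.charset.StandardCharsets;

    public final class CtCompare {
        // Sketch of a comparison built only from array reads, XOR and OR:
        // no early exit, and the secret is indexed mod its own length so we
        // always loop over the attacker-supplied input (secret assumed non-empty).
        static boolean slowEquals(byte[] attackerSupplied, byte[] secret) {
            int diff = attackerSupplied.length ^ secret.length;
            for (int i = 0; i < attackerSupplied.length; i++) {
                diff |= attackerSupplied[i] ^ secret[i % secret.length];
            }
            return diff == 0;
        }

        public static void main(String[] args) {
            byte[] secret = "hunter2".getBytes(StandardCharsets.UTF_8);
            byte[] guess = "hunter3".getBytes(StandardCharsets.UTF_8);
            System.out.println(slowEquals(guess, secret)); // false
        }
    }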
> (In this case a better way to do the compare is: if the strings are not the same length, compare the attacker's string against itself, and then return false.)
Is there a reason for using the attacker's string rather than the valid string? Edit: it doesn't seem to me that using the valid string would leak its length unless the attacker knew quite a lot about the target machine.
> Edit: it doesn't seem to me that using the valid string would leak its length unless the attacker knew quite a lot about the target machine.
It sounds like you are skeptical that timing attacks can be practically exploited. This is a reasonable skepticism that most people go through on initially learning about them. Unfortunately, there are in fact working exploits for this sort of thing.
No, I definitely realize the seriousness of timing attacks. I'm just curious about this particular implementation detail. I would have guessed that using the attacker's length would reveal more about the system than using a fixed length (unless the length of the valid string changed frequently). I'd like to understand why that isn't the case.
Ah, I see, you're assuming the "valid" string is fixed-width across all requests. I guess in that specific case you might be right. But it's easy for that not to be the case: e.g. if there are several different resources an attacker can request that have different-length strings. For example, an attacker might scan for users with short (= bad) passwords by comparing the time it takes to complete an authentication check against various users. So the safe thing is to go with the attacker's string.
How long it takes to do the comparison reveals the length of the string being checked, so you have to check the string which the attacker knows the length of already.
Question: if you know the attacker doesn't have local access, is it sufficient to simply introduce random response delays to mitigate JVM-JIT level timing differences? Even if you only delayed responses by a few hundred micros on average (with randomness), I imagine all but the smallest amount of fast-path/slow-path entropy would be lost in the noise.
It's pretty easy to filter out the randomness using statistics.
As I understand it, the correct thing to do is to derive a sleep time by hashing the request content along with some secret. This makes the delay "random" from the attacker's point of view, but still deterministic and therefore impossible to filter out.
Note: I am not a security person, just an interested bystander. Take this half-remembered advice with a pinch of salt.
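A minimal sketch of that idea in Java (the key, the algorithm choice and the delay range are all made up for illustration, and as noted in the replies this is not a silver bullet):

    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;

    public final class DeterministicDelay {
        // Hypothetical server-side secret; in reality this would come from
        // configuration or a key store, not a hard-coded literal.
        private static final byte[] DELAY_KEY = "example-delay-key".getBytes();

        // Derive a delay in [0, 1000) microseconds from the request content.
        // The same request bytes always map to the same delay, so an attacker
        // cannot average the jitter away by repeating the identical request.
        static long delayMicros(byte[] requestBytes) throws Exception {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(DELAY_KEY, "HmacSHA256"));
            byte[] digest = mac.doFinal(requestBytes);
            long v = 0;
            for (int i = 0; i < 8; i++) {
                v = (v << 8) | (digest[i] & 0xFF);
            }
            return Math.floorMod(v, 1000L);
        }

        public static void main(String[] args) throws Exception {
            System.out.println(delayMicros("GET /login HTTP/1.1".getBytes()) + " microseconds");
        }
    }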
That won't work if there's an irrelevant parameter in the request that can be varied. Say spaces ignored somewhere or additional parameters ignored, for instance.
1. Please don't store raw passwords and compare them with strcmp().
2. Less than you'd think. e.g., the total count of all 1 and 2 and 3 digit numbers is only 11% of the count of 4 digit numbers. (And that stat gets worse with just lower case letters.) Searching all shorter passwords ends up being an insignificant amount of time compared to searching all correctly sized passwords.
"Right - it is looking more and more like truly constant time code can only be obtained by using hand written assembly or hardware. In that eventuality, I guess the core crypto code in the JVM will have to be changed to use hand written assembly as well, however, SSL is huge and most of it doesn't need to be constant time (certificate parsing, managing network buffers, message parsing etc). That is often where the bugs creep in."
I completely agree with you there. X.509 is a nightmare and it's not an area where constant time matters at all. Being resistant to buffer overruns and general logic errors is much more important there.
X.509 is a nightmare because it is complex, and the political infrastructure is vulnerable.
There is no reason ciphers like djb's that use data-independent code paths couldn't be integrated into X.509 if the will was there. Totally separate issues.
> I'm saying they're unfixable because there's no way to tell the JIT to create constant-time code.
What does that even mean? If you write your fail and success states to follow the same exact code paths (i.e. no branches, breaks, returns, or similar for failed) then you've created a "constant time" function.
The inputted parameters are dynamic, so the JIT compiler cannot optimise code away, and even if it could it would do so equally for both the failed and success states.
> Certainly Christopher Meyer has said that he's reported some that remain private since they've not been fixed.
I tried Googling that but found nothing. Can you link whatever it is you're talking about?
He's talking about a couple of recent very clever timing attacks on JSSE. However they are not comparable to Heartbleed or this SChannel overflow in severity at all:
(1) They require you to have a pre-recorded SSL session you're trying to crack. Random script kiddies won't find it useful.
(2) They require sending a lot of traffic to the server to try and break that single recording (you can get caught quite easily!)
(3) They don't reveal the private key, only per connection premaster secrets (if I understood correctly).
Basically, the first attack was possible because JSSE returned "internal server error" when faced with a particular kind of malformed packet instead of "bad padding" - this binary yes/no signal was enough to allow the attacker to divine some internal state and, with enough queries, retrieve the PMS. The second was a similar trick, but instead of observing a different error code, it exploited the fact that internally the code was throwing an exception, and this made a difference of some microseconds in the response. Over a LAN, with enough queries, that was enough to eventually reveal the PMS. However, when running over a much noisier environment like the internet, the number of queries required goes up quite a bit; I believe they did not test that.
As I said before, they were both fixed, and I think Rich now agrees with me that they were fixed. There may be others that would require compiler support or hand coding of assembly to fix. Certainly there's nothing in the design of Java or the JVM that makes that impossible however. Stuff handling key material is only a small part of the overall SSL picture.
Yeah, like I said in my other comment, I'd misread which issues have been fixed. I still stand by my point that for constant time you need to be close to the metal, but I think we're in agreement there anyway.
I think using native code for the low-level implementation of the ciphers, padding etc. would be a good move from a security point of view and is the only way to get constant time implementation. As you say, there's nothing that prevents that in the design of Java.
Sorry, I realised I'd failed to address your other point:
"What does that even mean? If you write your fail and success states to follow the same exact code paths (i.e. no branches, breaks, returns, or similar for failed) then you've created a "constant time" function."
Well, if you've got a JIT that is analysing which code actually executes, and no control over the optimisations, then how do you achieve this? If the hot path is normally the success condition, for example, then the JIT will optimise that more. The rare and more dangerous condition will be optimised less. Whether this happens will of course depend on the engine, but you can't control that without working at a much lower level than managed code offers.
This is rubbish. If I use custom assembler then that's what gets run. Even if I use C code then I know what the result will be since I can actually check.
> The paths should be identical for both failure and success states. That's fundamentally how you "fix" timing attacks.
That's certainly the ideal. But it's impossible - one result will succeed and the other won't. The aim is to make sure both take the same time, which is an achievable goal, but to do this you need to know what will execute. You might have that guarantee in practice on a particular JVM, for example, but the whole point of using something like a JIT is that it will be smart about optimising stuff based on what actually happens. That's normally great, but this is a situation where all that's needed is predictability, not performance.
> This is rubbish. If I use custom assembler then that's what gets run. Even if I use C code then I know what the result will be since I can actually check.
It is "rubbish?" Nobody does that. Most crypto libraries are written in C or C++.
You can pre-JIT (AOT) managed code and check there too.
> That's certainly the ideal. But it's impossible - one result will succeed the other won't.
It is absolutely possible with high coding standards. Keep in mind you only have to write "correct" code in sections which have access to things like crypto keys (or things derived from them), since that's what's at risk with timing attacks.
> You might have that guarantee in practice on a particular JVM for example, but the whole point of using something like a JIT is that it will be smart about optimising stuff based on what actually happens.
If the code paths are identical (failure and success) what is it optimising out exactly?
> It is "rubbish?" Nobody does that. Most crypto libraries are written in C or C++.
> You can pre-JIT (AOT) managed code and check there too.
Take a look. You'll see that for example openssl uses perl to generate assembly, and nettle also uses native code. This is needed if you want to use things like the AES instructions on modern CPUs.
> If the code paths are identical (failure and success) what is it optimising out exactly?
JITs can create separate versions of a function for different inputs. A good JIT can turn one general case function into multiple optimized special cases using techniques that include eliminating pseudoconstants (variables that always have the same or a small number of values) and skipping computation of never-referenced results. Even normal compilers can do a lot of that.
It is not a good idea to act on the premise that things like the BB e=3 or BB "million message attack" are hard to exploit. They aren't.
The e=3 vulnerability (or its more modern "BERserk" variant) is arguably much easier to exploit than Heartbleed. It allows you to --- offline, in advance --- make your own valid CA certificates.
> You do realise that JSSE has lots of timing attacks etc. and that's pretty much unfixable in managed code?
That makes no sense at all. Timing attacks aren't more or less exploitable in managed code than unmanaged code. Timing attacks are often the result of optimisations within the crypto library which inadvertently give away information, for example a loop which breaks on X != Y instead of setting a failed flag and continuing to iterate through the rest of the array.
Please explain how managed code makes timing attacks more likely.
> Timing attacks are often the result of optimisations within the crypto library which inadvertently give away information, for example a loop which breaks on X != Y instead of setting a failed flag and continuing to iterate through the rest of the array.
I would say this is false. Simple differences in time caused by cache line ejection in table-lookup implementations of AES provide a very strong timing attack. (http://cr.yp.to/antiforgery/cachetiming-20050414.pdf)
In RSA (and in fact in DL-based cryptosystems), modular exponentiation without extreme care leaks tons of timing information about private exponents. 'Blinding' is one way to handle this, but performant solutions typically fiddle at the bit level and exploit CPU guards and features to minimize branch-prediction/cache-line/etc leaks.
In higher-level languages you cannot take that kind of absolute control and care over the crypto implementation, and the JIT adds another layer of obfuscation (though I know of no attack employing that...).
The out for memory-safe languages is to provide built-in crypto operations that have been implemented at a lower level.
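For the curious, the blinding mentioned above looks roughly like this. It's a toy sketch of the algebra only - BigInteger.modPow is itself not constant time, so this is not a safe implementation, just an illustration of why the private exponentiation no longer operates directly on attacker-visible input:

    import java.math.BigInteger;
    import java.security.SecureRandom;

    public final class RsaBlindingSketch {
        // Compute m = c^d mod n, but apply the private exponent to a blinded
        // value so modPow's timing correlates with (c * r^e) rather than c.
        // n, e, d are the usual RSA modulus and public/private exponents.
        static BigInteger blindedDecrypt(BigInteger c, BigInteger d,
                                         BigInteger e, BigInteger n) {
            SecureRandom rng = new SecureRandom();
            BigInteger r;
            do {
                r = new BigInteger(n.bitLength(), rng).mod(n);
            } while (r.signum() == 0 || !r.gcd(n).equals(BigInteger.ONE));

            BigInteger blinded = c.multiply(r.modPow(e, n)).mod(n); // c * r^e
            BigInteger mTimesR = blinded.modPow(d, n);              // (c * r^e)^d = m * r
            return mTimesR.multiply(r.modInverse(n)).mod(n);        // strip the blinding factor
        }

        public static void main(String[] args) {
            // textbook toy parameters: n = 61 * 53, e = 17, d = e^-1 mod 3120
            BigInteger n = BigInteger.valueOf(3233), e = BigInteger.valueOf(17),
                       d = BigInteger.valueOf(2753), c = BigInteger.valueOf(2790);
            System.out.println(blindedDecrypt(c, d, e, n)); // prints 65
        }
    }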
I'm curious now: how do the AES implementations nowadays avoid the timing attack explained in that paper? From what I understood, it's very hard to write an efficient AES implementation without using input-dependent table lookups.
For the most part by bitslicing. Some implementations calculate the S-box explicitly using the algebraic relationships in the finite field but doing so is awfully slow.
I should add here that I met an incredibly intelligent young man named Julian from Dartmouth, doing some work with MIT, who is proving with Coq and a model of a CPU that his implementation of cache lookups for (various) crypto algorithms results in exactly the same cache-line patterns, and that the number of CPU ticks is similarly invariant. Some people go the extra mile.
You say it's "false" but then fail to explain why. None of your examples offer that, and your whole explanation boils down to "managed languages are more complex, therefore worse."
Please point me to the specific native features which mitigate timing attacks. Because the majority of fixes I have seen are purely in altering the libraries themselves using high level constructs to remove hot paths and make it so both failure and success state take a constant time to execute (which has nothing to do with managed/unmanaged code).
The issue isn't that native code has special features that mitigate timing attacks. It's that you can look at native code and predict its side effects more easily than you can with high-level code.
Another important difference between native code and high-level code is that timing leaks in high-level code tend to be larger. For instance, it's very difficult to exploit a memcmp timing leak in practice. But Java's string comparison, depending on your JVM, is exploitable over the Internet.
For what it's worth: I wouldn't select C over Java simply to avoid timing attacks. Side channels in JVM code are a legit concern, but not a dispositive one.
If you're just talking about me then fair enough - the problems I've publicly identified in TLS are pretty much edge cases. Thomas on the other hand has a pretty good track record if you'd care to check.
> whole explanation boils down to "managed languages are more complex, therefore worse."
I hope that's not what I said...
> Please point me to the specific native features which mitigate timing attacks.
How am I supposed to implement bitslicing to vectorize operations in Java? I can't. Fine grained control of code is important for implementations of ciphers that are both fast and side-channel free. Fine grained control isn't something Java can give you, by definition.
I count exactly two countermeasures that apply to high level languages. Of the first they say "We conclude that overall, this approach (by itself) is of very limited value" and of the second "beside the practical difficulties in implementing this, it means that all encryptions have to be as slow as the worst case... neither of these provide protection against prime+probe/etc".
The rest of the countermeasures suggest bitslicing, use of direct calls to hardware instructions, memory alignment tricks, invocation of hardware modes (i.e. to disable caching), forcing cache ejections, normalizing cache states on interrupt processing, etc.
It is purely the case that high level languages do not offer you the flexibility and control to implement side-channel free crypto.
Crypto is brittle. High level languages are awesome for so many things. But bitslicing isn't one of them. The entire premise of high level languages is that you are freed from working directly on the innards pertinent to the specific target architecture. The entire premise of side-channel free crypto is that you need visibility and control of exactly these things.
The overall question is whether bindings or language features that expose direct control of the underlying architecture (such as D) can still be used to implement crypto. The answer is likely yes, though it is uncharted territory that only someone who knows what they are doing should attempt.
Sure. The JIT's code generator isn't attempting to make code constant time, it's designed for optimising for the common case. To make the code constant time you need to be very close to the metal (and even then you really need tools like ctgrind to make sure you've got it right). A JIT isn't the right tool for this particular job (nothing wrong with them for other things of course).
Constant time just means that a fail and a success state take the same amount of time to execute. They both go through the same amount of computation regardless of whether you know from the first instruction that it will eventually fail.
For example:
    const string password = "password";

    static bool isAllowed(string code)
    {
        if (code.Length != password.Length)
            return false;
        for (int x = 0; x < password.Length; x++)
        {
            if (code[x] != password[x])
                return false;
        }
        return true;
    }
This is not constant time because the failure state returns sooner than the success state.
    const string password = "password";

    static bool isAllowed(string codex)
    {
        // Start from the length check so that a longer guess with a matching
        // prefix (e.g. "password123") is not accepted.
        bool allowed = codex.Length == password.Length;
        char[] code = new char[Math.Max(password.Length, codex.Length)];
        codex.CopyTo(0, code, 0, codex.Length);
        for (int x = 0; x < password.Length; x++)
        {
            if (code[x] != password[x])
                allowed = false;
        }
        return allowed;
    }
This is an imperfect constant-time function, as both states (failure/success) return after nearly the same amount of time (although I fully admit that it might still be possible to learn the length of the password).
And the runtime of a managed language is 100% able to change all the timing of that if it's smart enough. It provides a guarantee of outcome, not of timing. There are examples of unexpected optimisations even from static compilers such as gcc (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888 for example). There's nothing stopping the JVM from bailing out early even in your second example. If you'd done something like XORing all the bytes together and then checking the result, then maybe we'd be approaching something a JIT is unlikely to mangle, but the code as is seems ripe for a decent optimiser to me. And remember that according to the spec you're now coding for a perfect optimiser, not just a decent one...
The typical timing attack in this case would be leaking the password length (the time taken in the for-loop), no? Then there's the possibility of the assignment (allowed = false) taking enough time that guessing with, say, "00000" and "x00000" might allow one to verify that the password starts with x (and then build up to the full password).
But more fundamentally, what is string.Length? Are these Pascal-style strings, or is that a call to a method that walks the string (O(n))? Those kind of issues (along with the full nature of the code path taken by something like assignment and memory allocation) are abundantly more clear in assembler (or byte code, assuming a predictable vm. But a vm likely isn't -- as far as I know it would at least never be more predictable than machine code).
If anything running code in a managed environment would be less vulnerable to timing attacks, given the non-deterministic nature of the garbage collector.
If there's timing information leaked by the implementation then non-deterministic interference by things like GC pauses (CPU load, network traffic, etc) are noise that raises the work effort of the attack but does not in general make it impossible. This is why "introduce a random sleep" is a terrible defense to a timing attack. Statistics doesn't care about the cause of the variability, if there are samples coming from different populations it can detect that.
Both, unless I don't understand the distinction you're trying to make.
The problem is that the law of large numbers is on the attacker's side. If the attacker gets N tries his statistical power goes as sqrt(N), which means that to stay safe the spread of your random delay has to be large enough to cover that. That is, if d is the timing difference between the slow path and the fast path and the attacker gets N tries, the standard deviation of your random delay has to be on the order of d * sqrt(N). This is huge even for modest values of N.
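A quick back-of-the-envelope illustration of that (the numbers are arbitrary):

    public final class JitterEstimate {
        public static void main(String[] args) {
            double d = 1e-6; // 1 microsecond difference between fast and slow path
            double n = 1e6;  // a million attacker queries
            // standard deviation of added jitter needed to drown out the signal
            double sigma = d * Math.sqrt(n);
            System.out.println(sigma + " seconds per request"); // 0.001, i.e. ~1 ms
        }
    }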
I'd argue they're both as bad as one another. Since GC is non-deterministic it means you just need more cycles for an accurate result (and plus you're already having to ignore other sources of latency, like network, disk IO, OS lock contention, etc).
Timing attacks are generally a coding problem; both a JIT-ed managed codebase and a native block of code can contain them.
Exactly the reason why I would seriously contemplate running an entire server (OS on up) based purely on managed code. For relatively slow things like web traffic it would likely suffice.
Let's hope Midori is more than just a rumour. Singularity was pretty darn good for a research OS.
Well, if you patch a linux system with GRSecurity buffer overflows and the like are much harder to exploit. I'm still amazed it's not more popular than it is. https://grsecurity.net/
I was hearing about Midori years ago, since then nothing. Singularity is clearly a shelved project at this point. Shame: as you say, it was pretty amazing work. Every time I read a kernel CVE I think .... sigh. If only Singularity had gone further.
We (TechEmpower) have run preliminary tests of JSSE in consideration of a future round of our framework benchmarks project. Although we do not yet have any SSL tests in our project, our preliminary findings were that JSSE was considerably higher-performance than we had been led to believe by popular opinion. I don't have the data in front of me, but using JSSE in lieu of OpenSSL did not affect requests-per-second results enough to make me worry about using JSSE.
Since then, my chief concern with JSSE has been simply not knowing how solid it is from a correctness-of-the-algorithms perspective, but that is something I'm not well versed in. In other words, my concern was just one of uncertainty. Provided a credible security analysis suggests JSSE is at least as secure as (if not more secure than) alternatives in unsafe languages, I will be confident deploying future apps on JSSE.
At runtime modern Java is around 10x slower than modern C++ (assuming both are fully optimised and the benchmark is relatively unbiased).
That isn't significant on modern hardware relative to other bottlenecks (network, IO, etc). Plus people keep adding additional security layers between C/C++ and the CPU which eat away at some of its advantages (e.g. Docker containers, virtual machines, exploit detection libraries, etc).
Java speed matters in certain situations and for certain tasks. For example I wouldn't rewrite an SQL database into Java since performance could definitely be impactful there. But realistically CPU times are such a tiny minority of latency that it stopped being relevant a very long time ago.
> At runtime modern Java is around 10x slower than modern C++ (assuming both are fully optimised and the benchmark is relatively unbiased).
I don't think you can pick a single number like that, it depends very much on what the code is doing.
You probably got this 10x figure from a Google paper published a few years ago (https://days2011.scala-lang.org/sites/days2011/files/ws3-1-H...). Redux: the default out-of-the-box 32-bit JVM (in 2011) was 12x slower than the same algorithm implemented in C++. But with a few simple GC flag changes, it became 3.7x slower. And then the Java version had a few further simple optimisations applied to the code and it became as fast as the original C++ version.
Meanwhile the C++ version was optimised again and became around 3x to 5x faster, but that version relied heavily on Google proprietary data structure code and could not be open sourced (!). So for most programmers on most projects, it seems likely to be a wash.
Meanwhile an alternative benchmark that reimplemented a non-trivial C++ program in Java found it became 1.09 to 1.5 times slower, but that was with Java 6 which is now two generations out of date:
Those comparisons will be again out of date when Java 9+ comes out with value types, official unsafe package and lots of other goodies for performance being worked on.
I'm not aware of any intent on adding unsigned types in Java 9 though. The lack of them is a real PITA, for sure. Not only for performance but it's incredibly easy to forget a & 0xFF here or there and introduce bugs.
The MMAP thing is also a good point. I believe HotSpot will compile the get methods on the MappedByteBuffer down to raw access so it shouldn't matter much for performance, in theory, but the code is still damned ugly. I never understood why they can't expose it as a byte[].
I don't understand why a just-in-time java compiler would produce slower code than an ahead-of-time c++ compiler, especially since the java compiler has much more information available (exact cpu revision, cache and ram sizes, and hot spot profiling).
Java has certain qualities that make it sometimes harder to optimise, which is one reason it relies so heavily on profile guided optimisation. For example, it's very vulnerable to GC stalls if escape analysis can't identify all your variables as eligible for stack allocation. Escape analysis is complex and a little fragile, from what I've heard (I never tested it myself).
That is totally wrong. I have dealt with a lot of code, from numerical computing to general data structures, and I have never seen C++ code run 10x faster than current JVMs. Heck, not even with SIMD. From what I see, for actual applications what you get is at best a 1.5x speed-up and a lot of agony on the C++ side.
That example will be horribly skewed by the JVM startup time, which is a known problem, but doesn't affect your program once it's started. It's pretty meaningless to compare programs that short, e.g. scripts, because no one sane would write them in either C++ or Java.
If you were to reimplement Unix in Java: what would happen to the paradigm of small utilities that do one job well, that you combine into something greater than the sum of its parts? I think you'll find a lot of short programs are written in C++.
Um, if you reimplemented UNIX in Java you'd probably do a much better job than the UNIX creators were able to do.
In particular, running a program from bash would simply classload it into the existing VM, not invoke a whole separate VM, and then it'd be more or less instant, except you'd have the potential for much more flexible APIs and combinations of tools. Look at PowerShell for an example.
I don't have the technical chops to disagree with you. But rest assured I would not do a better job. Last time I checked, Java programs do not like operating within the same VM[0]. Last time I looked at it, it was to get a bunch of people running Eclipse remotely from the same JVM instance on a huge box, in 2012 or something. It was a no-goer, but some experimental JVMs claimed to support it. [1]
Powershell is a terrible example for the performance point I am trying to make, whatever about its flexibility.
In practice big GUI apps like Eclipse are not intended to share a single VM with lots of other things, let alone multiple instances of itself. If people wanted to write tools that did that (i.e. if it was common) then they could, but it's not enforced. For example even if at the Java level you can separate stuff out, native code to handle the GUI framework might not be expecting it.
I know they weren't written to do so obviously but you can see how it's a related example. You can also understand why if it was the sort of thing that could work it would be worthwhile, what with the amount of RAM each of the Eclipse users had devoted to its (in theory duplicated per user) JVM.
I doubt you would even have to go as far as native extensions for GUI before you start running into problems even though the programs are written in managed code.
Can you better explain how in practice Java programs could share a VM in replacing a typical Unix bash environment/userland?
Would they have to use the special IBM JVM? Would bash have to contain grep as a class? e.g.
Reading about "JAR hell" I really don't think it would work very well.
Interestingly in the IBM link provided earlier they load up substantial non GUI servers such as Tomcat, Jetty and JRuby and achieve a startup time that is twice as fast.
They also have hello world:
Hello World (print "HelloWorld" and then sleep):
Multi-tenant JVM: 309
Hand-tuned: 73
Default: 63
Improvement with multitenant: 4.2X to 4.9X
Even with this I think a C++ version would eat it for breakfast. It would be interesting to find out what would happen if a single JVM were loaded on boot and all programs were loaded into that, and then compare the JVM-based Unix on those terms with e.g. Solaris or something.
The website in question doesn't get much traffic at the moment, so performance is not an issue.
Modern JVMs can use AES-NI hardware instructions when available, in some cases at least, so I'd imagine it's not too awful. But I honestly don't know. Benchmarks would be interesting.
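A crude way to get a feel for it on a particular JVM is just to time javax.crypto doing AES-GCM in a loop. This is a throwaway sketch, not a proper benchmark (no warmup separation, and the sizes and iteration count are arbitrary):

    import java.security.SecureRandom;
    import javax.crypto.Cipher;
    import javax.crypto.spec.GCMParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    public final class AesGcmThroughput {
        public static void main(String[] args) throws Exception {
            SecureRandom rng = new SecureRandom();
            byte[] key = new byte[16];         // AES-128
            byte[] data = new byte[16 * 1024]; // 16 KiB of plaintext per operation
            rng.nextBytes(key);
            rng.nextBytes(data);
            SecretKeySpec keySpec = new SecretKeySpec(key, "AES");
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");

            int iterations = 10_000;
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                byte[] iv = new byte[12];      // fresh IV per encryption, as GCM requires
                rng.nextBytes(iv);
                cipher.init(Cipher.ENCRYPT_MODE, keySpec, new GCMParameterSpec(128, iv));
                cipher.doFinal(data);
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            double mibPerSec = iterations * (data.length / (1024.0 * 1024.0)) / seconds;
            System.out.printf("~%.1f MiB/s of AES-GCM on this JVM%n", mibPerSec);
        }
    }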
One of the reasons websites have been slow to roll out SSL for everything is CPU load, so it's a legitimate concern for a high-traffic website. However, most of the ways SSL was sped up involved algorithmic / protocol-level improvements and not micro-optimisations of the code. So I'm not sure that a safe language has a permanent disadvantage here.
This was possibly true 10-20 years ago (SSL is that old). It is no longer reasonable and is usually just an excuse.
That said, in my experience, native SSL/TLS within IIS is incredibly slow/inefficient. Running on 2012R2 (so a modern version), terminating SSL at IIS resulted in me being able to trivially peg the server's CPU using Apache Ab on my laptop. So clearly you need a good implementation.
Another reason was/is that handshakes add noticeable latency (because several packets are sent in both directions), so web browsing experience and consequently, retention and conversion rates will suffer.
The worst barrier was probably the cost and hassle of obtaining browser-accepted certificates though.
Unikernels can be a nice solution for this: it's possible to have a TLS termination proxy as a Mirage kernel sitting directly on Xen that only does TLS using ocaml-tls. It does not support everything either, but it's coming.
No acknowledgement of the source, either in the Technet article or the security bulletin's acknowledgements. I wonder who the private source must be that would ask for anonymity?
To face the reality that there are a ton of XP deployments. It would be in Microsoft's interest to at least notify them of the problem and even incentivize them to upgrade.
Well, XP likely has the same issues in its implementation, given similar origins, especially for older protocols. XP requires SP3 for TLS 1.0 IIRC, and doesn't support anything above TLS 1.0 - neither 1.1 nor 1.2 (current).
There are other issues with browsers (namely that a lot of XP users are still using IE8, which has a host of other issues).
The project I am now on isn't supporting IE8; we're going to load some HTML5/ES6 shims and show a notice to users that it may not work, but given how poor MS's VMs for testing IE8 on XP are, it's really a non-starter. IE8 is about 3-4% of our current traffic, which will likely be displaced by mobile traffic once our site/app is no longer mobile-hostile.
I wouldn't consider XP a viable OS at this point, and many users would be better off with a more recent ubuntu, and wine.
Wonder why they didn't add GCM with ECDHE even though it is already in the Win10 preview. I considered it strange that they added GCM with DHE in the first place when all the rest of the suites are ECDHE.
Does this update contain any additional security-related changes to functionality?
Yes. In addition to the changes that are listed in the Vulnerability Information section of this bulletin, this update includes changes to available TLS cipher suites. This update includes new TLS cipher suites that offer more robust encryption to protect customer information. These new cipher suites all operate in Galois/counter mode (GCM), and two of them offer perfect forward secrecy (PFS) by using DHE key exchange together with RSA authentication.
Agree, this is super strange. No one wants to use DHE because of the performance impact.
Windows 7 and later do already support ECDHE + GCM, but only when combined with ECDSA. In practice, nobody can use ECDSA because old clients still need RSA certificates.
So we continue to wait for TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, which is clearly the cipher suite that everyone wants. Years later, still not available for Windows Server.
The attack vector is a bit vague, and I'm not sure how to bring this news to some of my clients.
Is the exploit only relevant when running services on/to the internet (IIS, Exchange webmail, etc.), or is visiting an https (TLS) website as an end-user enough to trigger the exploit (even in Firefox/Chrome and behind a traditional proxy server)?
Sadly Microsoft does not explain the exact parameters that make this exploit tick - this makes risk assessment hard.
I'm not tptacek, but right now I think the answer is: "We don't know."
There isn't an acknowledged proof-of-concept, so we're not sure that it's exploitable. It hasn't been made clear whether it's wormable, either.
My bet is it will affect XP if it's exploitable, but only for those who added IIS (not default, and not terribly common). It will likely remain unpatched forever, as Microsoft is unlikely to send a patch to an "unsupported" OS again, like they did with the Internet Explorer 0-day [0]
By Microsoft, but I'd bet someone else is definitely going to fix it and distribute a patch. If the amount of effort put forth by the Windows 98SE community in making unofficial patches that let much newer applications work is any indication (look up KernelEx and "98SE unofficial service pack"), XP is going to enjoy an even larger community of unofficial support.
If I recall correctly, SQL Slammer also targeted a server component, but the vulnerable component was also present on some desktop systems, which were then affected.
It wouldn't surprise me if for instance some printer driver listening on a high port also uses the schannel component, and is thus vulnerable.
It is Very Bad Indeed™. It is exploitable to get RCE, so actually worse in impact than both Heartbleed and goto fail. (Other lesser, but still very serious effects like MITM are also implicated.)
IBM reported, MS did code review it seems, MS knew about some of these issues for ~6 months.
Patch immediately, people (like me) are running bindiff/etc and a public exploit won't be too far behind.
Not sure how far back it goes yet. All the way? The changes cover code going all the way back to the first SChannel code push, I think. (If XP is exploitable, this may be the XP killing vuln we've all been waiting for.)
Well it's patched now, but it didn't affect you unless you were running a "Windows Server" although all recent operating systems were affected. If your Windows machine is behind your home router and you were not forwarding ports to it you're probably fine. I doubt this vulnerability was known well enough that enough people were scanning for vulnerable IPs to exploit them.
The window from disclosure of patches to duplication of the exploit is narrowing, and it appears from the bulletin that client connections are affected as well. Furthermore, any computer you take anywhere outside your home router (and can you really trust your home router as a security boundary nowadays?!) will be easy to manipulate into an SChannel connection. Inside your home network, clients are still vulnerable to attack - any javascript/flash ad/referrer can point a computer behind a router at an attacker's server and serve up malicious SChannel packets. That is to say, your home computer can be attacked on outgoing connections, which your router will be happy to allow.
We upgraded to this, only to find it activates some new encryption modes (4 new GCM suites) that seem to cause RST packets when used. Anyone else seen that issue?
(Technical details: if the client offers one of the suites, the server accepts it in the ServerHello, but then RSTs the connection after the client sends its encrypted handshake, and the event log says "none of the cipher suites supported by the client application are supported by the server". Browsers and curl don't use that suite, but Amazon ELB does.)