This is interesting -- they actually manage to get greater density out of this s...

ansible · on May 6, 2015

This is interesting -- they actually manage to get greater density out of this setup than many traditional rack mount systems offer.

If they are GPU limited...

A full 4U rack of Mac Pros is 8 AMD Fire GPUs (6GB VRAM each), 256GB main RAM, 48 2.7GHz Xeon cores (using the 12-core option), and 4TB of SSD. 10G Ethernet via Thunderbolt2.

Let's set aside differences in GPU and processor performance; we're just looking at the base stats. All for about $36K USD, not including the rack itself.

An alternative is the SuperMicro 4027GR-TR:

http://www.supermicro.com/products/system/4U/4027/SYS-4027GR...

So, maxed out, you've got 8 Nvidia Tesla K80 cards (dual GPU), 1.5TB RAM, 28 2.6GHz Xeon cores, and a lot of storage (24 hot-swap bays). That's in a 4U rack too.

Call it about $13K USD for the server, and $5K per GPU. Plus a little storage, call it about $56K USD with 10G Ethernet.

The SuperMicro system is designed to be remotely managed. Each GPU has double the VRAM of the AMD Fire ones (12GB vs. 6GB).

I don't know the exact performance figures of the AMD Fire vs. the Kepler GK210, but I'm sure the Fire it isn't nearly as good. And you've got twice as many Nvidia chips on top of that.

At some point its going to get cheaper to re-write the software...

skuhn · on May 6, 2015

The Tesla K80 didn't exist when I started this project, but to do some quick math:

K80 gflop/s: 8740 2x FirePro D500 gflop/s: 3500

K80 runs about $4900 a card, whereas the entire Mac Pro (list price) is $4000. So it's 2.5x the performance at easily 2x the cost if not more.

You're right that there is a cost advantage to going with commodity server hardware, but I don't think it's as great as most people think in this particular case. It's also far from free for us to do the necessary engineering work, and not just in terms of money. It would basically mean pressing pause on feature development at a crucial time in the company's life, and that just isn't the right move.

ansible · on May 6, 2015

2x FirePro D500 gflop/s: 3500

That 3500 gflop/s for the D700? It is instead 2200 for the D500.

http://www.amd.com/en-gb/solutions/workstations/d-series

K80 runs about $4900 a card, whereas the entire Mac Pro (list price) is $4000. So it's 2.5x the performance at easily 2x the cost if not more.

The 6GB VRAM version with the D700 costs another $600 USD each.

The K80 has 12GB VRAM per GPU (24GB total per card).

If your code can use the additional memory, that is a huge difference.

Anyway, 3500 gflop/s times 8 is 28 tflop/s for the Mac Pros.

With 8 K80s, you're at 70 tflop/s. Single precision. So that's double the raw performance, and double the memory. Actual performance for a given workload? I wouldn't care to say.

I'd be concerned about thermal issues too. I wouldn't be surprised that the Mac Pro gets throttled after a while when running it hard. The kind of server you can put the K80 in usually has additional (server-grade) cooling.

I'm not disrespecting you guys, if you've got a solution that works, and makes you money, more power to you!

But I stand by my claim that at some point, it will be cheaper to rewrite the software for the render pipeline. Not this year I guess, and who knows, maybe not next year either.

skuhn · on May 6, 2015

Sorry, I do have this evaluation in a spreadsheet somewhere (except against the Tesla K20, K80 wasn't out then), but I just quickly looked up the Mac Pro specs. We do use the D500, so I should have quoted those gflops. There is a benefit to off-the-shelf GPUs, but I don't see it as a make-or-break kind of situation for imgix right now.

I agree that some day in the future, it does seem like it will make sense to bite the bullet and rewrite for Linux. It probably won't solely come down to a cost rationale though, because there are a TON of business risks involved in hitting pause on new features (or doubling team size, or some combination thereof).

Fundamentally I don't believe in doing large projects that have a best case scenario of going unnoticed by your customers (because the external behavior has not changed, unless you screwed up), unless you absolutely have to.

The real reason to migrate to Linux would have to be a combination of at least three things:

  1. Better hardware, in terms of functionality or price/performance
  2. Lower operational overhead
  3. The ability to support features or operations that we can't do any other way

Much more likely, we would adopt a hybrid approach where we still use OS X for certain things and Linux for other things.

ansible · on May 6, 2015

We do use the D500, so I should have quoted those gflops.

Well now I'm curious as to why you aren't using the D700s. The extra gflops seem like a good value to me. Approximately 60% greater GPU performance for a 15% increase in cost, everything else being equal.

But you probably have to get some work done, rather than answer random questions from the Internet. :-)

Good luck!

skuhn · on May 6, 2015

It is intriguing, and we have one D700 Mac Pro for test purposes. At the time we ordered the Pros for the prototype rack that is the subject of this article, we found that other parts of our pipeline were preventing us from taking full advantage of the increased GPU performance. So we ratcheted down to the D500.

Keep in mind that either of them offer significantly higher gflop/s per system than the best GPU ever shipped on a Mac Mini (480 vs 2200 vs 3500).

However, we have fixed bottlenecks in our pipeline as we identified them, so it is probably time to re-evaluate. I actually just had a conversation with an engineer a minute ago who is going to jump on this in the next few days. Higher throughput and better $/gflop is always the goal, just have to make sure we can actually see the improvement in practice.

skuhn · on May 8, 2015

Actually, I realized that we were both wrong on the math.

2200 gflop or 3500 gflop are the specs for just one of the Fire Pro cards. Whoops, I was writing a lot of comments that day.

So a Mac Pro with D700 GPUs has 7000 gflop/s and runs $4600 (list), whereas the Tesla K80 has 8740 gflop/s and runs $4900 or so. Since you still need a whole server to go with the K80, I stand by my thinking that it's not a great deal. We also don't need 12GB of VRAM for our use case, so that's a bit of a waste.

In Nvidia's product line, price/gflop is not at its best in their highest end cards. AWS uses the Nvidia GRID K2, for instance. You're paying a lot for the double precision performance in the Teslas, and imaging doesn't need it.

frugalmail · on May 7, 2015

> it will be cheaper to rewrite the software for the render pipeline.

You don't even have to rewrite it, Linux imagemagick + OpenCV can handle the use cases of cropping and sizing trivially. They can keep the rest of the code (device mappings and CDN related I guess) unless that was implemented using ObjectiveC (this is another thing that I would think is crazy)

skuhn · on May 7, 2015

Cropping and sizing are just two (common) operations that imgix can perform. There's a lot of other stuff as well: http://www.imgix.com/docs/reference

Not to say that it's totally impossible to do these types of operations on ImageMagick, but it wouldn't work nearly as well as our current solution does. ImageMagick is a shockingly awful tool for use in server-land for a variety of reasons, some of which are handled better in GraphicsMagick. IM was the bane of my existence at more than one previous company.

hinkley · on May 6, 2015

Most of the armchair folks on these threads haven't internalized Fred Brooks, especially regarding diminishing returns as the team size grows.

You as the server guy hiring a couple people to figure out how to squeeze another 10% value out of the system by hacking hardware is not fungible with hiring two more devs to try to avoid racking custom hardware. As if two devs could pull that feat off anyway.

exelius · on May 6, 2015

So for $56k USD you get a system that is roughly twice as fast as the 4x Mac Pro solution... but also costs twice as much? The numbers actually don't work out too badly. At least, better than I would have initially assumed. Density is really the only area where the 4x Mac Pro solution loses hands down.

ansible · on May 6, 2015

There's also local storage. The SuperMicro box can host a lot more storage locally than the Mac Pros can do easily (you'd need external Thunderbolt2 drives), and it can make sense to run RAID-10 or something to get more speed.

cmurf · on May 6, 2015

Unfortunate for many who wish otherwise, it's very clearly a market they do not remotely care about.

_dp9d · on May 6, 2015

I think the opposite is true. They made the Rack mountable xServe for almost a decade.

The fact they discontinued it shows that it's clearly not a market - customers didn't want it in enough volume to justify the product.

exelius · on May 6, 2015

It's not that there wasn't a market, it's that the Xserve wasn't sold in the way that companies that buy lots of rack mount gear handle procurement. If they certified specific configurations of commodity hardware for OS X and sold them through existing reseller channels as a new SKU, it would be much easier.

That said, there's probably not much of a market for it anymore since we've gone a few years without an OS X rack mount machine and people have found other solutions.