
Three possible reasons I can think of for doing this over using PCs or Linux servers:

1. Using the same operating system as the developers of the software, plus access to Apple's fantastic imaging libraries.

2. The Mac Pro, whilst expensive, is good value for money. The dual graphics cards inside it are not cheap at all. As servers with GPUs are fairly niche, this might actually be a cheaper solution.

3. The form factor. Even if you could create PCs that are cheaper with the same spec, they'd use more power, possibly require more cooling (the Mac Pro has a great cooling architecture), and take up a lot more space.

I'd be very interested in hearing how they manage updates and provisioning, however. I can't imagine that'd be much fun on OS X, but perhaps there's a way of doing it with OS X Server.



(I'm the datacenter manager at imgix, and I wrote this article)

1. Yeah, the OS X graphics pipeline is at the heart of our desire to use Macs in production. It's also pretty sweet to be able to prototype features in Quartz Composer, and use this whole ecosystem of tools that straight up don't exist on Linux.

2. I mentioned this elsewhere already, but it is actually a pretty good value. The chassis itself is not a terrible expense, and it's totally passive. It really boils down to the fact that we want to use OS X, and the Mac Pros are the best value per gflop in Apple's lineup. They're also still a good value when compared against conventional servers with GPUs, although they do have some drawbacks.

3. I would love it if they weren't little cylinders, but they do seem to handle cooling quite well. The power draw related to cooling for this rack versus a rack of conventional servers is about 1/5th to 1/10th as much.

In terms of provisioning, we're currently using OS X Server's NetRestore functionality to deploy the OS. It's on my to-do list to replicate this functionality on Linux, which should be possible. You can supposedly make ISC DHCPd behave enough like a BSDP server to interoperate with the Mac's EFI loader.
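
If it works, I'd expect it to look something like this hypothetical dhcpd stanza (unverified; the option encoding, image ID, and addresses below are all placeholders):

  # Hypothetical: answer Apple BSDP clients from ISC dhcpd
  class "AppleBSDP" {
    # Macs send a vendor class beginning with "AAPLBSDPC"
    match if substring (option vendor-class-identifier, 0, 9) = "AAPLBSDPC";
    # BSDP replies travel in DHCP option 43 as hex-encoded TLVs;
    # this one would select a boot image ID (illustrative value)
    option vendor-encapsulated-options 08:04:81:00:00:67;
    next-server 192.0.2.10;      # TFTP server with the EFI booter
    filename "booter";
    option root-path "http://192.0.2.10/nbi/NetRestore.dmg";
  }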

We don't generally do software updates in-place, we just reinstall to a new image. However, we have occasionally upgraded OS X versions, which can be done with CLI utilities.


Why not disassemble the cylinders and reassemble them into a rectangular chassis? I'm sure that would give you a denser layout. Sure, it would void the warranty and hurt resale value, but do you really care?


The whole machine's custom built to fit inside the cylindrical case... the best you could do would be to take the outer case off, and then you've just got a slightly smaller cylinder.

Electrically, everything's built around a round "central" PCB using a custom interconnect. You're not going to be able to reassemble the thing into a rectangle and still get a functioning machine (not without tons of custom design work, at least).

See https://www.ifixit.com/Teardown/Mac+Pro+Late+2013+Teardown/2...


This actually came up during the design phase, and it was tempting. However, you'd have to figure out how to connect the boards together, and you'd have to figure out where to put heatsinks and where to direct airflow.

Since we were able to get the Pros to the point where they effectively occupy 1U, there wasn't really any incentive to do a disassembly-style integration. Maybe if Apple announces the next Mac Pro comes as a triangle.

To your other point about the warranty and resale: we do care, but only a little. I budget machines to have a usable lifespan of 3 years, but the reality is that Apple hardware historically retains significant value on the used market for much longer than that. So if we can recoup $500-1000 per machine after 3 years of service, that would be great.


> The power draw related to cooling for this rack versus a rack of conventional servers is about 1/5th to 1/10th as much.

Do you mean your Mac Pros dissipate 1/5th to 1/10th as much heat as other x86 server hardware, or is there some other factor in play that makes your AC 5-10x more power efficient?


I understand "related to cooling" as Mac Pro's cooling in this setup is 5-10x more efficient.


Sorry, just some off-the-cuff math. We use Supermicro FatTwin systems for Linux stuff, and they run a lot of fans at much higher RPMs to maintain proper airflow relative to the Mac Pro design (which runs one fan at pretty low RPMs most of the time).
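
For intuition, fan affinity laws say fan power scales roughly with the cube of speed, so the gap compounds quickly. A toy comparison with made-up wattages and fan counts:

  import Foundation

  let macProFanWatts = 10.0           // hypothetical draw of one slow fan
  let rpmRatio       = 1.5            // server fans at ~1.5x the rpm
  let fanCount       = 4.0            // hypothetical fans per comparable node

  // Power per fan scales ~rpm^3, then multiply by the fan count.
  let serverWatts = macProFanWatts * pow(rpmRatio, 3) * fanCount
  print(serverWatts / macProFanWatts) // ~13.5x with these toy numbers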

As a result, I'm calculating that the Mac Pros draw a lot less power for cooling purposes than the Linux systems due to their chassis design. However, serviceability and other factors are definitely superior on the Supermicro FatTwins.


What's so good about this OS X graphics pipeline that isn't on anything else? I'm now super curious.


Core Image:

https://developer.apple.com/library/mac/documentation/Graphi...

http://en.m.wikipedia.org/wiki/Core_Image

I'm not super familiar with it or the competition, but I assume this is what they're talking about.


So, it's basically the Mesa Intel graphics pipeline?

EDIT:

For the downvoters and the unclear, the relevant bit talks about compiling exactly the instructions needed to change the image. As I understand it, this JIT recompilation of pixel shaders is effectively what was implemented in the Mesa drivers for Intel chipsets.


Compiling the shaders is a big win, since it allows us to do almost all operations in one pass rather than multiple passes. The service is intended to function on-demand and in real time, so latency matters a lot.
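
To illustrate (a minimal sketch, not our production code): chained Core Image filters just build up a recipe, and the kernels get concatenated and JIT-compiled into a single GPU program when the context finally renders.

  import CoreImage
  import Foundation

  let context = CIContext()   // GPU-backed where available
  let input = CIImage(contentsOf: URL(fileURLWithPath: "/tmp/in.jpg"))!

  // No rendering happens here; this only describes the pipeline.
  let output = input
      .applyingFilter("CIGaussianBlur", parameters: [kCIInputRadiusKey: 4.0])
      .applyingFilter("CIColorControls", parameters: [kCIInputSaturationKey: 1.2])

  // Rendering happens now, ideally as one compiled pass over the pixels.
  let rendered = context.createCGImage(output, from: output.extent)
  print(rendered == nil ? "render failed" : "rendered in one shot")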


Thanks for replying and thanks for the article too - great read with some fantastic photography.

Really interesting to hear how you provision servers; I had no idea that OS X Server came with tools for that, but it certainly makes sense. I wouldn't have thought Apple would have put much time or thought into creating tools for large deployments, but I'm glad to hear that they have.


Thanks, the photography was done by our lead designer, Miguel. I am super impressed at what he's been able to capture in an environment that can easily come off as utilitarian and sterile.

He has some other work online that you might enjoy, not related to Macs or imgix: http://photos.miggi.me/


What's the noise level like with these machines? The typical pizza-box servers aren't exactly quiet.


They're pretty much silent relative to datacenter stuff.

One of the goals of the next revision is to have LED power indicators (maybe plugged in to the front USB ports) or LCD panels built into the front of the chassis. Right now you actually can't tell that the rack is powered on unless you walk to the hot aisle and look at the power readouts, it's that quiet.


Is fan failure reported through management APIs?


We wrote a little tool to probe SMC and graph the output, so we know CPU temp and fan speeds and whatnot. If a fan were to fail, it shows up as 0 rpm speed (in my experience thus far), so we can tell and take the host offline.
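
Our tool isn't public, but here's a rough sketch of the same idea using the stock powermetrics utility instead of raw SMC calls (requires root, and the output format varies by model and OS release):

  import Foundation

  // Take one SMC sample and pull out the fan lines.
  let task = Process()
  task.executableURL = URL(fileURLWithPath: "/usr/bin/powermetrics")
  task.arguments = ["--samplers", "smc", "-n", "1"]
  let pipe = Pipe()
  task.standardOutput = pipe
  do {
      try task.run()
  } catch {
      fatalError("could not launch powermetrics: \(error)")
  }
  task.waitUntilExit()

  let text = String(decoding: pipe.fileHandleForReading.readDataToEndOfFile(),
                    as: UTF8.self)
  // A fan stuck at 0 rpm is what we'd alert on and drain the host for.
  for line in text.split(separator: "\n") where line.contains("Fan") {
      print(line)
  }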

Even if you can't see when the fan itself has failed, the CPU core temp should eventually go out of the acceptable range without any forced air at all, which is also helpful to determine that hardware maintenance is required.

So far nothing has actually failed on any of our Mac Pros though. When and if that happens, the entire Pro will get swapped out as one field replaceable unit, and then put in the repair queue.


BSDPy, AutoNBI, and Imagr provide a bleeding-edge OS X deployment solution that runs entirely on Linux. OS images can be generated with AutoDMG, and Munki will keep them configured and updated afterwards.

Pop into ##osx-server on freenode if you want to talk to the devs.


Thanks, I was aware of AutoDMG and Munki, but the rest are news to me. We'll check them out.


> It really boils down to the fact that we want to use OS X,

How the hell did you guys get funding to do this? I can't imagine any sane person wanting to put money behind this. Could I have their contact information?


The real question to me is: why would anyone fund doing this in EC2?

Here's the quick math on cost per gflop, including all network and datacenter costs:

  Mac Pro: $5/gflop
  EC2 g2.2xlarge: $21.19/gflop


Not sure where you got ec2 out of my comment.

I also think you need to redo your math on the price per gflop for a Mac Pro; you seem to be at least half the price of my back-of-the-envelope work. Unless you have some crazy good supplier.


Exposing more detail behind this math is unfortunately not something that I'm ready to do, but I'm pretty comfortable with it in broad strokes. EC2 really is that much more expensive, when you factor in things like network bandwidth.

As I noted elsewhere, I mention EC2 because all of our (funded) competitors run there. We can split hairs over whether I could save 10% on Linux systems vs Mac systems, but the elephant in the room is all of the companies trying to make this sort of service work in EC2. You can't do it and make money at the same time. Even if you can make money at small scale, you will eventually be crushed by your own success.

My overriding goal for imgix's datacenter strategy (and elsewhere in the company) is to build for success. To do that, we have to get the economies of scale right. I believe we have done so.


The choice isn't between a Mac Pro and EC2. You can rack up x86 boxes chock full of GPUs far more easily than Mac Pros.


I mention it because AFAIK, all of imgix's direct competitors run in EC2.


How long will it take to amortize the costs of the hardware based on EC2 g2.2xlarge savings?


Not certain if I understand your question, but I'll take a shot at answering:

I expect a useful life span for any datacenter equipment of 3 years. A Mac Pro's list price is about $4000. We pay less, but I'll use public figures throughout. Using equipment leasing, I can pay that $4000 over the 3-year period, with, let's say, a 5% interest rate and no residual value (to keep this simple). So over 3 years, I spend $4315 in total per machine to get 2200 gflop/s.

Over 3 years with EC2, a g2.2xlarge is $7410 up front (to secure a 57% discount) for 2300 gflop/s.

So I can pay over time, save $3100 over a 3 year period, and probably still resell the Mac Pro for $500 at the end of its life span. That's pretty compelling math to me. There are costs involved with building and operating a datacenter, and that evens things out a bit. What really kills EC2 though is the network bandwidth costs. It is just insane.
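
If you want to check my arithmetic, here it is written out (standard lease amortization, public list prices; our actual numbers differ):

  import Foundation

  let principal   = 4000.0      // Mac Pro list price
  let monthlyRate = 0.05 / 12   // 5% annual interest on the lease
  let months      = 36.0        // 3-year useful life

  // Standard amortized payment: P * r / (1 - (1 + r)^-n)
  let payment = principal * monthlyRate / (1 - pow(1 + monthlyRate, -months))
  print(payment * months)       // ~4316: the $4315 total above

  // Hardware-only $/gflop over 3 years (the $5 and $21.19 figures
  // upthread also fold in network and datacenter costs):
  print(payment * months / 2200)   // Mac Pro: ~$1.96/gflop
  print(7410.0 / 2300)             // g2.2xlarge reserved: ~$3.22/gflop
  print(7410.0 - payment * months) // ~$3094 saved per machine, pre-resale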


It'll be REAL f'in expensive in EC2, that's for sure.


The Mac Pro isn't a great value in the datacenter space. It's a single socket server that's limited to 64 GB of RAM. It's not unusual anymore to throw GPUs in rack mount systems; most of them already have the PCIe bandwidth necessary to support 4 big GPUs so it's often just a matter of getting the right riser cards.

Compare a Mac Pro to an HP DL580 that can hold 4 8-core Xeons (32 cores total) and over 200GB of RAM along with a few FirePro or Titan GPGPUs, and the HP will give you far greater density (though a rack mount system with 4 8-core Xeons and 4 GTX Titans would be a power and cooling nightmare!). That said, the Mac Pro isn't as far behind as I would have expected.

But OS X also kicks ass at multithreading, especially if you use Apple's graphics libraries. It's entirely possible they get much greater performance from OS X than a Linux or Windows based solution could provide.


No sane person is putting GTX cards into a configuration with that level of power density, you'd have reliability issues from day one. This use case is exactly why Nvidia makes Tesla cards.


OS X does not have NUMA. It has some nice libraries for multi-threading, but that doesn't really matter that much when you're saturating your memory bus because the CPUs are doing too many cross-zone memory requests.


NUMA is irrelevant anyway because there are currently no multi-socket OS X machines. Multiple cores on the same package share a memory controller.


While OS X doesn't have something like Linux's NUMA interface to explicitly lock a thread to a core, 10.5 shipped a thread affinity API which allows you to help the scheduler make better placement decisions:

“OS X does not export interfaces that identify processors or control thread placement—explicit thread to processor binding is not supported. Instead, the kernel manages all thread placement. Applications expect that the scheduler will, under most circumstances, run its threads using a good processor placement with respect to cache affinity.

However, the application itself knows the detailed caching characteristics of its threads and its data—in particular, the organization of threads as disjoint sets characterized by their association with (affinity to) distinct shared data.

While threads within such a set exhibit affinity with each other via shared data, they share a disaffinity or negative affinity with respect to other sets. In other words, a set expresses an affinity with an L2 cache and the scheduler should seek to run threads in a set on processors sharing that L2 cache.”

https://developer.apple.com/library/mac/releasenotes/Perform...
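
A minimal sketch of using it, assuming the Mach headers import into Swift the way I remember (the tag is only a hint; the kernel is free to ignore it):

  import Darwin

  // Threads given the same affinity tag should be co-scheduled on
  // cores sharing an L2 cache; distinct tags imply negative affinity.
  var policy = thread_affinity_policy_data_t(affinity_tag: 1)
  let count = mach_msg_type_number_t(
      MemoryLayout<thread_affinity_policy_data_t>.size / MemoryLayout<integer_t>.size)

  withUnsafeMutablePointer(to: &policy) { ptr in
      ptr.withMemoryRebound(to: integer_t.self, capacity: Int(count)) { intPtr in
          _ = thread_policy_set(mach_thread_self(),
                                thread_policy_flavor_t(THREAD_AFFINITY_POLICY),
                                intPtr, count)
      }
  }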


The datacenter manager posted a comment around 9 minutes ago defending the decision (with facts) at the same level as your comment, if you want to check that out.


Well, they're fitting 4 sockets and 8 GPUs in the space that's normally used by 4 sockets only.

Also, if you're trying to sync raw images between OS X clients and the cloud, then you're going to need OS X servers in the cloud.

It'll greatly complicate the clients' workflows if they can't use their built-in raw converters.


At the scale imgix is going for, and given they're already doing a lot of custom architecture work, something like Supermicro's GPGPU chassis [1] would allow the same server density, plus use GPUs and CPUs that are 1-2 generations ahead of Apple's offerings. Regarding raw images, you don't need OS X servers to do that, just programs that can read the raw formats. That could be a windows box, or an OpenCL-enabled program like darktable [2]. Really the biggest issue here is engineering time for porting the app, and given the costs of the hardware they're using, I'd take a good hard look at how long it would take to port the software; I'd bet that they'd save money after deploying a few boxes.

[1] http://www.supermicro.com/products/system/2u/2028/SYS-2028GR...

[2] https://www.darktable.org/


(I'm the datacenter manager at imgix, and I wrote this article)

I mentioned this elsewhere, but considering alternative solutions was definitely a part of this project. Supermicro's GPGPU chassis was one of them, as well as some of the 2U FatTwin options (which we use for all of our other system types).

While it would probably yield long-term cost savings, it definitely isn't something we could realize by deploying just a few systems. It would be a pretty time- and labor-intensive process on the software side, in order to save labor on the operations side, and that labor isn't particularly problematic for us. So, maybe in another few generations of our image renderers this will make sense, but it doesn't today.


Every raw image processor is different. You could use a different raw processor (which would complicate a client workflow), but the results would look different from the native OS X client raw image processor.

If you want a solution that exactly matches OS X client, you need OS X.


Right; which is why I was surprised that OS X stacks up as favorably as it does. But I wouldn't call it a great value since it's still much more expensive than a traditional rack mount setup. From a density perspective it's really not bad at all, which is a testament to how well-engineered the Mac Pro is.


I'd say it's not even as expensive as something like IBM's 4-socket rack servers like the x3850 X6.


In fairness, 4 socket servers are pretty serious money. The difference in cost for 4-socket-capable Xeons alone puts them out of reach for many use cases.

  E5-2658 v2 (dual CPU): $1440 per part
  E5-4650 v2 (quad CPU): $3616 per part

As a result, I stick to 2 socket servers for Linux machines. I think the scaling-out paradigm just works out a lot better, particularly for Internet services.


Yeah, except IBM's x86 hardware has always been stupid expensive for no good reason other than it's IBM. And didn't they spin off their xSeries server business to Lenovo once the market settled on HP systems that cost half what IBM's did?


Even an HP 4 socket system like the DL980 G7 is going to be as expensive.

None of these servers are going to be cheap.


Also the DL980 G7 is an unholy piece of crap. HP doesn't know how to build or fix them. I've gone through countless service requests on just a dozen machines or thereabouts.

It's the worst piece of any kind of hardware I've ever used, hands down.


> 2. The Mac Pro, whilst expensive, is good value for money. The dual graphics cards inside it are not cheap at all. As servers with GPUs are fairly niche, this might actually be a cheaper solution.

I'd actually qualify this ever-so-slightly by saying "It's a good value for money if you need the specific features it offers." Which the OP evidently does! But many of us would prefer something with, say, one video card, one mainstream-ish desktop processor, and one mechanical hard drive, at way lower cost.


Yes, you're absolutely right. We use conventional Linux servers for application, database and storage for exactly this reason. The Mac Pros are a good value for an image rendering machine, but not for general purpose server stuff (in my opinion).

It's also a bit dear for use as a desktop machine, but it is pretty nice to have one hanging out on your desk for a few weeks.


From the site:

"Building on OS X technologies means we’re dependent on Apple hardware for this part of the service, but we aren’t necessarily limited to Mac Minis. Apple’s redesigned Mac Pro seemed like an ideal replacement, as long as we could reliably operate it in a datacenter environment."



