Finally! Time to fly the flannel. I built the first 3 generations described here starting in '04; it's awesome now to have something to point to, especially as FB has led the charge on opening their own gear. And I'm sure it's going to make collaboration and hiring that much easier.
The secrecy was at first a universally-agreed necessity; in simplistic terms, we didn't want MSFT knowing how much money to throw at the problem, or even what the problem was. This was true (and surely still true to an extent) for all of platforms: it was always amusing to see public photo shoots at "google datacenters", which in reality were little more than a stack of google search appliances at a corp location.
The level of detail here is great and really sums up 10 years of a lot of hard-earned findings. I'm thrilled pictures of the hardware are even included, that team is just top notch.
For anyone who isn't a Google or Facebook but has large or growing colo/dc network needs, check out Cumulus Networks. It applies a lot of the SDN ideas (think ssh/puppet-driven config mgmt for your switches as just the start) and topology possibilities seen here; doesn't hurt that JR did a brief stint on firehose :)
I agree, it's nice they have finally come out with some of the stuff they are/were doing. The Jupiter stuff sounds especially tasty for east-west heavy workloads.
The real power savings come not from the switches themselves, but from the application and scheduling architectures the fabric enabled.
Having full bisection bandwidth between any pair of hosts means the bin packing problem is a lot easier. You don't need to (say) make sure your map reduce job is scheduled with one shard per rack because racks only have so much bandwidth. Any host on any rack will do. You can forget racks even exist.
This makes overall utilization of clusters more efficient (tighter bin packing), and the corollary is that you don't need as many clusters and machines. (Not that that has ever stopped Google from building more :)
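To make the scheduling point concrete, here's a toy sketch (my own illustration with made-up host/rack names and bandwidth budgets, not anything from the paper): with a per-rack uplink budget the scheduler has to solve a constrained packing problem that can fail even while hosts sit idle, whereas a flat fabric lets it hand any shard to any free host.

    from collections import defaultdict

    def schedule_rack_aware(shards, hosts, rack_of, rack_bw, shard_bw):
        """Rack-constrained placement: stop filling a rack once its uplink budget is spent."""
        used = defaultdict(float)   # rack -> bandwidth already committed
        placement = {}
        free_hosts = list(hosts)
        for shard in shards:
            for h in free_hosts:
                if used[rack_of[h]] + shard_bw <= rack_bw:
                    placement[shard] = h
                    used[rack_of[h]] += shard_bw
                    free_hosts.remove(h)
                    break
            else:
                raise RuntimeError(f"no rack has bandwidth headroom left for {shard}")
        return placement

    def schedule_flat(shards, hosts):
        """Full bisection bandwidth: any free host will do, so placement is trivial."""
        return dict(zip(shards, hosts))

    # 8 shards, 2 racks of 4 hosts; each rack uplink only has room for 3 shards' traffic.
    hosts = [f"host{i}" for i in range(8)]
    rack_of = {h: f"rack{i // 4}" for i, h in enumerate(hosts)}
    shards = [f"shard{i}" for i in range(8)]

    print(schedule_flat(shards, hosts))   # succeeds: racks are irrelevant
    try:
        schedule_rack_aware(shards, hosts, rack_of, rack_bw=3.0, shard_bw=1.0)
    except RuntimeError as e:
        print(e)                          # fails once both racks hit their uplink budget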
When you actually have to use it, what's amazing is how much congestion and packet loss there is. On paper it looks like a zero-impedance source of data, but in reality it barely keeps up.
I used to work in cluster networking, before I became an SRE. Near as I can tell, no names have been switched around in this paper, and it covers all of the major generations of Google cluster networks up to a fairly recent point in time.
I used to work in platforms (not networking, but we worked pretty closely with them) and the information in the paper was accurate and surprisingly complete.