I think this is why some techniques for rolling deploys set up a health check on each node that the deploy process can toggle. That way, an otherwise healthy node reports its status as "down" while still accepting and finishing in-flight requests, and gets taken out of the HAProxy rotation without a configuration reload. The new code is rolled out on that node, the toggle is flipped back to "healthy", and HAProxy brings the node back into rotation.
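A minimal sketch of the HAProxy side of that (the backend name, server addresses, and /health path are made up): the deploy process only has to make /health return a non-2xx status on the node it wants drained.

    # haproxy.cfg (illustrative): servers are drained via their health check,
    # not by editing this file
    backend app
        option httpchk GET /health
        # fall/rise: how many consecutive checks flip a server's state; a DOWN
        # server stops receiving new requests but in-flight ones finish
        server web1 10.0.0.11:8080 check inter 2s fall 2 rise 2
        server web2 10.0.0.12:8080 check inter 2s fall 2 rise 2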
This is what I've done in the past, not sure why we need to "hack TCP" to get zero downtime.
In addition, I do rolling deploys where I create an entirely new VM and add it to the rotation before removing the old one. That way I don't have to recycle a node that might have some state on it.
The article is about restarting HAProxy without any downtime. HAProxy restarts are needed when adding new service instances or adjusting configuration options. This is a different and much harder problem than gracefully restarting load balanced service instances.
As other posters have mentioned, the solution in this link is actually suboptimal. I showed how the iptables approach adds 1-3s of latency to connections established during the restart, but you can avoid that by getting a little more hardcore.
It should, and it already lets you make some changes over a socket, but supporting complete config updates with the current HAProxy architecture is a lot of work and isn't there yet.
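For example, assuming a "stats socket /var/run/haproxy.sock level admin" line in the config and made-up backend/server names, you can already drain, re-enable, or reweight a server at runtime without any reload:

    echo "disable server app/web1" | socat stdio UNIX-CONNECT:/var/run/haproxy.sock
    echo "enable server app/web1"  | socat stdio UNIX-CONNECT:/var/run/haproxy.sock
    echo "set weight app/web1 50"  | socat stdio UNIX-CONNECT:/var/run/haproxy.sock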
The challenge is that whether HAProxy's own graceful-restart support works depends on which OS and kernel version you're running and which socket features they support, so making sure that's all kosher takes some effort. I took the approach that it's simpler to do something that takes a bit longer and not have to worry about any of that.
The downtime "gap" between step 1 and step 2 is probably way less than 500ms, right? If being down for 500ms is unacceptable, you should probably have multiple HAProxy instances running and just do a rolling deploy. Seems like a bit of over-optimization.
Exactly, with these "high uptime" requirements, isn't it usual to have 2+ HAProxy instances running on different servers bound to the same IP? At least that's how I do it. When you update the config, first update the backup HAProxy, then update the main one. Any requests that take place during the main proxy restart would go to the backup.
Could you please describe how I can bind 2 different servers to the same IP with automatic request balancing (so that when one server is down, the other serves all requests)? I understand how to do it with another server running HAProxy, but your solution seems to work without one.
To have 2 servers with the same IP, we use VRRP (Virtual Router Redundancy Protocol) via keepalived. HAProxy is set up on 2 separate servers/instances, and through VRRP/keepalived they share an IP address (which HAProxy binds to). The servers also have their own unique IP address(es) on top of the shared one, so the shared address doesn't really "belong" to any machine.
If one server goes down, VRRP gives the IP to the other server and that HAProxy takes over.
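A minimal keepalived.conf sketch for the MASTER node (the interface, virtual_router_id, and virtual IP are placeholders; the BACKUP node gets state BACKUP and a lower priority):

    vrrp_script chk_haproxy {
        script "killall -0 haproxy"   # exits 0 as long as an haproxy process is running
        interval 2
    }

    vrrp_instance VI_1 {
        state MASTER
        interface eth0
        virtual_router_id 51
        priority 101                  # the backup node uses e.g. 100
        virtual_ipaddress {
            192.0.2.10/24             # the shared "floating" IP that HAProxy binds to
        }
        track_script {
            chk_haproxy
        }
    }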
If you have a few HAProxy nodes with DNS load balancing, wouldn't the client automatically move on to another HAProxy? It's a tiny lag, but it shouldn't be an error.
That might have been the case if you could expect clients to be well behaved, but you can't. In my experience, hardly any network clients handle that kind of thing properly.
No, once the client has resolved the IP it will either connect to that proxy or fail that particular connection attempt. But you can use the DNS to drain a proxy before restarting it.
I've been a little confused by the claims of how complex the SYN delay solution is. Is it actually all that more complex?
The original blog post wraps a restart command in two iptables invocations (and relies on a hacky sleep interval that may or may not be long enough). The SYN-delay method wraps a restart command in two tc invocations. The concepts are more or less identical in complexity: one tells the kernel "drop SYNs now please" and the other says "delay SYNs for a bit please".
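For reference, the iptables variant described above looks roughly like this (port and paths are illustrative; -sf tells the new HAProxy process to let the old one finish its connections and exit):

    # drop incoming SYNs so clients retransmit instead of getting RSTs
    iptables -I INPUT -p tcp --dport 80 --syn -j DROP
    sleep 1   # the hacky part: hope one second is long enough
    haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid \
            -sf $(cat /var/run/haproxy.pid)
    iptables -D INPUT -p tcp --dport 80 --syn -j DROP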
All the complexity in the qdisc solution is in the one-time setup of the queuing disciplines. I think the biggest drawback of delaying SYNs isn't complexity, but that getting it to work with external load balancers is trickier than with internal load balancers. Honestly, you're right that if an org doesn't restart HAProxy very often, it doesn't make much sense to invest in solving this problem. If it were me, I'd just make sure I was on the latest Linux kernel so that the window during which HAProxy can cause RSTs is as small as possible, and not bother with either the iptables or tc solution.
Does nginx have the same problem of a "small time window" of downtime? My guess was that reloads are handled more gracefully by having a single process bound to the HTTP/S ports and child processes that talk to the upstream servers, so there's no "small time window" while reloading.
No, nginx should not have this problem, as it uses fd passing to gracefully hand off connections.
HAProxy only has this issue on Linux because Linux's SO_REUSEPORT implementation unfortunately introduces a race condition between accept and close. While I haven't personally tested it, HAProxy on one of the BSDs should not have this small window of downtime.
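Not HAProxy's actual code, but a minimal Python illustration of the SO_REUSEPORT mechanism in question: several processes can each run this and listen on the same port, and the kernel spreads incoming connections across their listening sockets. The race described above shows up when one of those processes closes its listener while connections are still queued on it.

    import os
    import socket

    # Add another listener on a port that may already have listeners,
    # relying on SO_REUSEPORT (Linux >= 3.9).
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("0.0.0.0", 8080))
    sock.listen(128)

    while True:
        conn, addr = sock.accept()
        conn.sendall(b"handled by pid %d\n" % os.getpid())
        conn.close()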
Without being dogmatic about it, I feel similarly about fixing things by inserting sleeps.
At a high level, you often see programmers sprinkle sleeps into their code to "fix" race conditions or deadlocks. That doesn't really fix the problem; it just moves it around, and it's usually done because they don't know how to reason about the underlying problem and fix it properly.
You need to sleep long enough that whatever you're waiting for will definitely have finished. Most of the time you have no exact guarantee of that, so you have to pick some N that is relatively large. Inevitably, no matter what N you pick, sooner or later, the thing you're waiting for will take N + 1 and things break. To make it worse, the N + 1 situation often happens because you're getting an unusually large amount of traffic or because something else in the system is already in a failure state. So the breakage tends to come at the worst possible time and exacerbate things.
Meanwhile, if you sleep for N ms somewhere, one thing you can guarantee is that whatever you're doing will take at least N ms. There's no way to make it faster, even if it may have been unnecessary to wait that long. Often not a big deal, but the more developers sprinkle sleeps into their code, the more often you run into bizarre performance bottlenecks as a result.
Network timeouts and similar are kind of a fact of life, so there's no perfect solution. But if you find yourself trying to solve a problem by sleeping for some arbitrary period of time, a little alarm should go off in your head telling you that there's probably a better solution.
Sometimes sleep is the only viable solution when dealing with an external system. I have such a problem with one of my programs that has to access a hardware device. The problem is that the load the hardware can handle varies. To get maximum throughput I have to pound the hardware as fast as possible, and when it fails, sleep for a random but increasing time. If anyone knows of a more elegant solution, please post.
Yeah, that's why I'm not dogmatic about it. With network timeouts, external systems that you can't control, and low level hardware access, sometimes it's the best you can do. The better solution would be for the hardware/system you are interacting with to publish an event or otherwise signal when it is or isn't able to handle more load. If it wasn't designed with back pressure in mind though, you do the best you can, and in your case, exponential backoff is probably it.
Adding jitter to avoid dogpiling is another case where sleeping is perfectly reasonable.
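For what it's worth, a minimal sketch of that kind of exponential backoff with jitter (poll_device and the constants here are made-up names, not anything from the thread):

    import random
    import time

    MAX_SLEEP = 5.0  # seconds; cap the delay so a long outage doesn't stall forever

    def call_with_backoff(poll_device):
        delay = 0.01
        while True:
            try:
                return poll_device()  # hammer the device as fast as possible...
            except IOError:
                # ...and back off, with jitter, only when it pushes back
                time.sleep(random.uniform(0, delay))
                delay = min(delay * 2, MAX_SLEEP)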
What I get wary of is the common pattern of: make a call to some external service, sleep for some amount of time (to "let it finish"), then continue under the assumption that it has completed.
I think we are in massive agreement here. Sleep can be used as crutch inappropriately, but when you have a broken leg a crutch is exactly what you need.
Not only is this from 2014 and not labeled as such, but the shortened title is linkbait. No system gets 100% uptime. This article is about eliminating one small source of downtime for HAProxy systems. Nice, but not exactly a revolution.
Why does HAProxy need a reload on config change? Can't it load the new config, diff it against the running one, and apply only the difference? Or are there alternative approaches, like a web frontend for HAProxy that makes piecemeal config changes without restarting?