We got burned by CloudFront about 18 months ago... we were serving our static assets (CSS, JS, etc.) through CloudFront and had bug reports from some users in Eastern Europe (I forget where, it might have been Slovenia) that our site was displaying without CSS. I got them to check and they couldn't load CSS for GitHub (which used CloudFront) either. We went back to serving directly from S3.
It's an infuriating bug, because I can't see how we could confirm that this kind of thing isn't an issue any more. I'd love to go back to CloudFront but I'm just not confident that it will reach all of our users.
I replied and asked them to run "host" and "ping" against
cdn.lanyrd.net and they sent back the following:
> Host cdn.lanyrd.net not found: 3(NXDOMAIN)
> ping:unknown host cdn.lanyrd.net
I also had an incident a few months later where our assets failed to load for a while for me, sitting at my desk in London - GitHub's assets were affected as well, which led me to suspect it was a CloudFront failure. Unfortunately I don't have any notes from that.
How do you know that wasn't your DNS provider having troubles there? Should have had them do `dig` to see if it was a DNS issue on your end instead of blaming Amazon right off the bat...
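Something like this would have separated the two cases (a sketch assuming dnspython, with 8.8.8.8 as the second opinion):

    import dns.resolver

    def lookup(name, nameserver=None):
        resolver = dns.resolver.Resolver()       # defaults to the system resolver
        if nameserver:
            resolver.nameservers = [nameserver]  # override with a public resolver
        try:
            return [str(r) for r in resolver.query(name, 'A')]
        except dns.resolver.NXDOMAIN:
            return 'NXDOMAIN'

    # If the local resolver says NXDOMAIN but 8.8.8.8 returns records,
    # the user's DNS provider is the problem, not CloudFront.
    print(lookup('cdn.lanyrd.net'))
    print(lookup('cdn.lanyrd.net', '8.8.8.8'))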
It could well have been (that's why I'm sharing the details: so people can make their own mind up). Like I said, this was over a year ago so it's pretty hard to debug-in-hindsight.
Starting with "We got burned by CloudFront..." seems a little harsh when the only piece of actual data you have could just as easily point at your own DNS provider rather than Amazon's systems...
We use S3 as our origin, so using CloudFront makes sense from an ease of use and fastest response perspective. Also, CloudFront offers reserved capacity pricing for yearly commitments above a certain bandwidth level.
I encountered these types of problems on CloudFront-powered sites all the time when I lived in Colorado. I frequently had issues using GitHub, Basecamp, etc. The only solution was to wait a few minutes and try again.
Because with CloudFront there are dozens of edge servers around the world, and problems like the ones I experienced could be caused by a DNS server somewhere pointing someone at an unavailable server. S3 serves from one location (the region where you created the bucket) and hence is less likely to fail in the same way.
Yes, but if that one S3 location is having troubles, all of your users are affected, not just some of them as when CloudFront has trouble at a single location.
Did you get to the root cause of the problem? We are about to trial CloudFront on one of our sites and have discussed the possibility of it causing problems for some users.
Still no gzip support, though. I had to jump through some hoops to get this working: uploading duplicate copies of each file, gzipped ahead of time, that respond to all requests with static headers declaring the content gzipped. It works, but it'd be a LOT better if CloudFront could do that for us.
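For the curious, a rough sketch of that workaround, assuming boto; the bucket and key names are made up. The gzipped duplicate is uploaded with static headers so every request for it gets gzip:

    import gzip
    import shutil
    from boto.s3.connection import S3Connection

    # Pre-compress the asset; the .gz copy lives alongside the original.
    with open('static/site.css', 'rb') as src, gzip.open('static/site.css.gz', 'wb') as dst:
        shutil.copyfileobj(src, dst)

    bucket = S3Connection().get_bucket('my-assets-bucket')  # made-up bucket name
    key = bucket.new_key('static/site.gz.css')              # duplicate, gzipped copy
    key.set_contents_from_filename(
        'static/site.css.gz',
        headers={
            'Content-Type': 'text/css',
            'Content-Encoding': 'gzip',   # static header: every client gets gzip
            'Cache-Control': 'max-age=31536000',
        },
        policy='public-read',
    )
    # Pages then reference the .gz.css URL for clients known to handle gzip.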
When using a custom origin (non-S3), your web server is generally capable of gzip compression. CloudFront will separately request and cache the content in compressed and uncompressed form as needed.
Rackspace Cloud Files supports this. The file "test_javascript.js" was saved non-compressed. It works the other way, too (compressed->uncompressed if the client doesn't support compression):
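The behaviour looks roughly like this (illustrative sketch, not the original output; the container URL and the requests library are my own choices):

    import requests

    url = 'http://c0000.r00.cf1.rackcdn.com/test_javascript.js'  # made-up container URL

    plain = requests.get(url, headers={'Accept-Encoding': 'identity'})
    gzipped = requests.get(url, headers={'Accept-Encoding': 'gzip'})

    print(plain.headers.get('Content-Encoding'))    # None: served as stored, uncompressed
    print(gzipped.headers.get('Content-Encoding'))  # 'gzip': compressed on the fly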
True. Your content needs to be in Cloud Files, not on your own server. The storage and CDN services are tied together into the product; they haven't been separated so you can put the CDN on top of any arbitrary endpoint.
I don't see the requirement of storing the data in cloud files as a very heavy burden, but I'm not the most unbiased source on that.
It sounds like being able to run a bunch of Varnish servers to cache stuff at edge locations around the world. I wonder whether it really works that way, or whether you have to change your web app a lot to work with it.
I implemented exactly this for our application about a year ago. We managed to speed up the average backend response times for the entire site by about 500ms. Unfortunately, the cost of the edge servers plus the anycast routing tech from a third-party vendor was more than the business benefit we saw.
If you set the TTL for a particular origin to 0, CloudFront will still cache the content from that origin. It will then make a GET request with an If-Modified-Since header, thereby giving the origin a chance to signal that CloudFront can continue to use the cached content if it hasn't changed at the origin.
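On the origin side that just means answering CloudFront's revalidation GET with a 304 when nothing has changed. A minimal sketch, assuming Flask and a made-up /page endpoint (a real app would parse and compare the dates properly rather than string-match):

    from flask import Flask, request

    app = Flask(__name__)

    LAST_MODIFIED = 'Tue, 15 May 2012 12:00:00 GMT'  # whenever the content last changed

    @app.route('/page')
    def page():
        # CloudFront revalidates with If-Modified-Since when the TTL is 0.
        if request.headers.get('If-Modified-Since') == LAST_MODIFIED:
            # Content unchanged: CloudFront keeps serving its cached copy.
            return '', 304, {'Last-Modified': LAST_MODIFIED}
        body = '<html><body>dynamic content goes here</body></html>'
        return body, 200, {'Last-Modified': LAST_MODIFIED}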
I wonder how well this works for content that is truly dynamic. Seems like it would necessarily be slower for those pages that change on every request.
Not necessarily. The networks used by CloudFront may outperform direct paths. More importantly, CloudFront edge locations try to maintain a persistent connection to the origin and use a large initial TCP congestion window. This saves you from the delay caused by setting up a TCP connection over a long network path.
Simplified example (ignoring DNS latency, assuming symmetric paths):
User to CloudFront RTT is 30ms
User to Origin RTT is 100ms
CloudFront to Origin RTT is 100ms
It seems clear that User to Origin is faster than User to CloudFront to Origin, but not if you consider TCP mechanics.
If the User makes an HTTP request for a 4KB file to the Origin directly, it will take 100ms to set up the connection, 50ms for the request to reach the Origin, and another 50ms for the first response byte to arrive. Total: 200ms. If the origin does not have a big initial congestion window, it will take another 100ms for the last byte to arrive. Total: 300ms.
If the User makes an HTTP request for a 4KB file through CloudFront, it will take 30ms to set up the TCP connection. The request packet(s) will take 15ms to reach CloudFront. CloudFront tries to maintain a persistent TCP connection to the Origin, which avoids set-up time and slow start. The request to the Origin will take 100ms to complete, and another 15ms to reach the User. Total: 160ms.
Using CloudFront as an intermediary could reduce latency by a lot, even if no caching is going on.
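The same arithmetic as a quick Python sketch (all times in ms, same assumed RTTs):

    USER_TO_CF, USER_TO_ORIGIN, CF_TO_ORIGIN = 30, 100, 100

    # Direct: TCP handshake (one full RTT), request out, first byte back,
    # then one more RTT for the rest of the 4KB with a small initial cwnd.
    direct = USER_TO_ORIGIN + USER_TO_ORIGIN // 2 + USER_TO_ORIGIN // 2 + USER_TO_ORIGIN

    # Via CloudFront: handshake with the nearby edge, request to the edge,
    # one round trip over the edge's already-open connection to the origin,
    # response back to the user.
    via_cloudfront = USER_TO_CF + USER_TO_CF // 2 + CF_TO_ORIGIN + USER_TO_CF // 2

    print(direct, via_cloudfront)   # 300 160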
I bet they use a grace period, like Varnish does. The grace period would let the cache serve stale data for a few seconds or more while the If-Modified-Since call is made and, if necessary, the cache refreshed.
I don't think that's what the post said. Doing so would break the semantics of the 0 second cache time. They must wait for your 304 Not Modified response before serving from their cache.
There's nothing stopping you from using a traditional CDN vendor as an application CDN. I spoke to the folks at Edgecast about this a year or two ago, and they didn't have a problem with it. However, it sounds like Amazon made some specific optimizations for application content which could be more appealing.
Also, CloudFlare offers the same service, but with added security and anti-spam features.
Hrmm... the more services Amazon rolls out, the more tempted I am to go whole hog on AWS.
I only use S3 now - and Heroku - but I am excited about where this is heading.
The future looks bright, and I can't wait until the right application comes along for me to build on top of a fully scalable infrastructure that I only pay for as I use it.
Totally. Especially since the endgame in any web-facing EC2 architecture is to have stateless nodes behind a load balancer that you can scale up and down as required.
Ideally you'd automatically recreate AMI snapshots whenever your code changes. In other words, erm, Heroku.
I use Heroku and S3 too - CloudFront took mere minutes to set up and I got an instant and very noticeable latency decrease. It's a CDN that "just works". I'm using it for all projects now.
My suggestion for almost all businesses is to use AWS for S3, SQS, SWF, etc., and then get dedicated/VPS servers in the same data center. I actually get faster ping times to SQS from my dedicated server than from EC2 (both in US-East).
EC2 is the biggest ripoff going around, while the other AWS services are some of the most awesome out there.
What are some good dedicated hosting options in US-East? I've tried looking them up, but the info is usually buried so deep in the host's site that it's impossible to find.
I have only been with them a month, but I use FastServ. If you search on WebHostingTalk they have offers available, and the guy seems pretty knowledgeable. I am hoping they don't let me down. Another option is the famous ServerBeach, which YouTube used to use as its CDN. Would love to know of others.
"You can see that if you utilize the servers 100%, then EC2 is between 2x and 3.3x more expensive than renting servers Additionally looking at the CUs the EC2 images are less powerful than rented hardware, so you probably need more of them."
Same thoughts here. I could never understand why anyone would go with AWS EC2. It is expensive and slow. You could get much better deals from other dedicated providers or cloud providers. Route 53 is OK and improving, although I would still recommend DNSmadeeasy.
I have been messing around with caching and combining files quite a bit over the last few days, mainly because the latency between me and my web server really adds up, especially as the number of files I need to transfer increases.
The difference between San Diego and Chicago, or San Diego and New York, compared to running my application locally, makes me really want some kind of instant quantum communication. Since we don't have that, I really would like something like CloudFront for pretty much every single application or web page that I make.
Actually, I think it would be better if everyone and every web site had that: a way for sites to be automatically cached on servers local to everyone's city. Wouldn't that be nice?
Which reminds me of the whole concept of content-centric networking.
The hard part about this is that to really be effective it probably means really changing the way things work. It is tough to ease into it.
This could also help reduce the amount of data that needs to be transferred. Maybe we could figure out a way for every website in the world to be compressed against a very large global dictionary shared by every client (or possibly partitioned for local clusters, but that is more complicated...)
Regardless of the level of compression, it would still probably be possible to distribute quite a lot of the trending web content to be cached locally. Maybe it could be a bit like a torrent client for people's desktops, or maybe web application servers could have an installed program that participates in the distribution system and also publishes to it.
Maybe it could be a browser extension or just a userscript (Greasemonkey), though it probably has to be an extension, that would cache and distribute web pages you view. So, for example, as we click on Hacker News headlines we cache those pages on our local machines. Then when another person with the same extension installed clicks on that headline, it first checks their local peers, and if I am in the same city and already have that file, I can give them all of the content in a fraction of the time. If a lot of people used that extension, the web would be much faster, and it would solve a lot of problems.
I wonder if there isn't already a system like that. I mean, there are probably RSS feeds that come off of Hacker News and Reddit, and a reader could actually precache all of that content. But more comprehensively, I bet there is quite a bit of content that large numbers of, say, programmers are constantly accessing that could benefit from that type of system.
Could we make something like gzip, but not limited to a 32KB window: instead, a giant dictionary a GB in size holding the most common sequences from all of today's popular software engineering web sites? Then instead of sending a request to San Francisco or Chicago, I could just send a request to a guy less than a mile away who also happens to be interested in Node.js or whatever.
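zlib already has a miniature version of this idea in its preset-dictionary support, assuming both ends share the dictionary ahead of time; the catch is that the dictionary is still capped by deflate's 32KB window, nothing like the GB-scale global dictionary imagined above:

    import zlib

    # Toy shared dictionary of sequences both sides agree on ahead of time.
    SHARED_DICT = b'function var return document.getElementById addEventListener '

    comp = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS, 9,
                            zlib.Z_DEFAULT_STRATEGY, SHARED_DICT)
    payload = comp.compress(b'var el = document.getElementById("app");') + comp.flush()

    decomp = zlib.decompressobj(zlib.MAX_WBITS, SHARED_DICT)
    print(decomp.decompress(payload))  # original bytes, rebuilt via the shared dictionary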
"Actually, I think it would be better if everyone and every web site had that, a way that sites could automatically be cached in servers local to everyone's city. Wouldn't that be nice?"
I assume this means that versioning of CSS/JS files now works. One of the problems I've experienced with CloudFront is updating files that don't normally change very often (e.g. CSS). Since CloudFront didn't support query strings, bumping a version number like stylesheet.css?ver=201200505 didn't work, but now it should.
We've just been using mod_rewrite to rewrite stylesheet_([0-9]+).css to stylesheet.css for our CloudFront stuff. Our build scripts pop in the file modification time, so CF sees a new URL any time we update a file.
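Roughly this sort of rule (the /static/ prefix here is a guess, not the actual config):

    RewriteEngine On
    # Strip the build-time modification-time stamp so every
    # stylesheet_<mtime>.css URL maps back to the one real file.
    RewriteRule ^/?static/stylesheet_[0-9]+\.css$ /static/stylesheet.css [L]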
In case you should ever choose to email him to convey your distaste over sharing Amazon Web Services news, his email address (published in his HN profile) is: jbarr@amazon.com
I, for one, find AWS technical announcements welcome and relevant. Perhaps you'd rather read about janitors getting degrees?
FWIW, I appreciate seeing Amazon feature announcements on HN. I also believe that the 116 points this post has received indicate that a lot of other people agree it was relevant, and I'll further state that some of the comments posted on this topic have been insightful. Please do not stop ;P.