We got burned by CloudFront about 18 months ago... we were serving our static assets (CSS, JS, etc.) through CloudFront and had bug reports from some users in Eastern Europe (I forget where, it might have been Slovenia) that our site was displaying without CSS. I got them to check and they couldn't load CSS for GitHub (which used CloudFront) either. We went back to serving directly from S3.
It's an infuriating bug, because I can't see how we could confirm that this kind of thing isn't an issue any more. I'd love to go back to CloudFront but I'm just not confident that it will reach all of our users.
I replied and asked them to run "host" and "ping" against
cdn.lanyrd.net and they sent back the following:
> Host cdn.lanyrd.net not found: 3(NXDOMAIN)
> ping:unknown host cdn.lanyrd.net
I also had an incident a few months later where our assets failed to load for a while for me, sitting at my desk in London - GitHub's assets were affected as well, which led me to suspect it was a CloudFront failure. Unfortunately I don't have any notes from that.
How do you know that wasn't your DNS provider having troubles there? Should have had them do `dig` to see if it was a DNS issue on your end instead of blaming Amazon right off the bat...
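Something like this would have separated the two cases (a sketch assuming dnspython, with 8.8.8.8 as the second opinion):

    import dns.resolver

    def lookup(name, nameserver=None):
        resolver = dns.resolver.Resolver()       # defaults to the system resolver
        if nameserver:
            resolver.nameservers = [nameserver]  # override with a public resolver
        try:
            return [str(r) for r in resolver.query(name, 'A')]
        except dns.resolver.NXDOMAIN:
            return 'NXDOMAIN'

    # If the local resolver says NXDOMAIN but 8.8.8.8 returns records,
    # the user's DNS provider is the problem, not CloudFront.
    print(lookup('cdn.lanyrd.net'))
    print(lookup('cdn.lanyrd.net', '8.8.8.8'))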
It could well have been (that's why I'm sharing the details: so people can make their own mind up). Like I said, this was over a year ago so it's pretty hard to debug-in-hindsight.
Starting with "We got burned by CloudFront..." seems a little harsh when the only piece of actual data you have could just as easily point at your own DNS provider rather than Amazon's systems...
We use S3 as our origin, so using CloudFront makes sense from an ease of use and fastest response perspective. Also, CloudFront offers reserved capacity pricing for yearly commitments above a certain bandwidth level.
I encountered these types of problems on CloudFront-powered sites all the time when I lived in Colorado. I frequently had issues using GitHub, Basecamp, etc. The only solution was to wait a few minutes and try again.
Because with CloudFront there are dozens of edge servers around the world, and problems like the ones I experienced could be caused by a DNS server somewhere pointing someone at an unavailable server. S3 serves from one location (the region where you created the bucket) and hence is less likely to fail in the same way.
Yes, but if that one S3 location is having troubles, all of your users are affected, not just some of them as when CloudFront has trouble at a single location.
Did you get to the root cause of the problem? We are about to trial CloudFront on one of our sites and have discussed the possibility of it causing problems for some users.
Still no gzip support, though. I had to jump through some hoops to get this working: uploading duplicate copies of each file, gzipped ahead of time, that respond to all requests with static headers declaring the content gzipped. It works, but it'd be a LOT better if CloudFront could do that for us.
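For the curious, a rough sketch of that workaround, assuming boto; the bucket and key names are made up. The gzipped duplicate is uploaded with static headers so every request for it gets gzip:

    import gzip
    import shutil
    from boto.s3.connection import S3Connection

    # Pre-compress the asset; the .gz copy lives alongside the original.
    with open('static/site.css', 'rb') as src, gzip.open('static/site.css.gz', 'wb') as dst:
        shutil.copyfileobj(src, dst)

    bucket = S3Connection().get_bucket('my-assets-bucket')  # made-up bucket name
    key = bucket.new_key('static/site.gz.css')              # duplicate, gzipped copy
    key.set_contents_from_filename(
        'static/site.css.gz',
        headers={
            'Content-Type': 'text/css',
            'Content-Encoding': 'gzip',   # static header: every client gets gzip
            'Cache-Control': 'max-age=31536000',
        },
        policy='public-read',
    )
    # Pages then reference the .gz.css URL for clients known to handle gzip.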
When using a custom origin (non-S3), your web server is generally capable of gzip compression. CloudFront will separately request and cache the content in compressed and uncompressed form as needed.
Rackspace Cloud Files supports this. The file "test_javascript.js" was saved non-compressed. It works the other way, too (compressed->uncompressed if the client doesn't support compression):
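The behaviour looks roughly like this (illustrative sketch, not the original output; the container URL and the requests library are my own choices):

    import requests

    url = 'http://c0000.r00.cf1.rackcdn.com/test_javascript.js'  # made-up container URL

    plain = requests.get(url, headers={'Accept-Encoding': 'identity'})
    gzipped = requests.get(url, headers={'Accept-Encoding': 'gzip'})

    print(plain.headers.get('Content-Encoding'))    # None: served as stored, uncompressed
    print(gzipped.headers.get('Content-Encoding'))  # 'gzip': compressed on the fly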
True. Your content needs to be in Cloud Files, not on your own server. The storage and CDN services are tied together into the product; they haven't been separated so you can put the CDN on top of any arbitrary endpoint.
I don't see the requirement of storing the data in cloud files as a very heavy burden, but I'm not the most unbiased source on that.
It sounds like being able to run a bunch of Varnish servers to cache stuff at edge locations around the world. I wonder whether it really works that way, or whether you have to change your web app a lot to work with it.
I implemented exactly this for our application about a year ago. We managed to speed up the average backend response times for the entire site by about 500ms. Unfortunately, the cost of the edge servers plus the anycast routing tech from a third-party vendor was more than the business benefit we saw.
If you set the TTL for a particular origin to 0, CloudFront will still cache the content from that origin. It will then make a GET request with an If-Modified-Since header, thereby giving the origin a chance to signal that CloudFront can continue to use the cached content if it hasn't changed at the origin.
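On the origin side that just means answering CloudFront's revalidation GET with a 304 when nothing has changed. A minimal sketch, assuming Flask and a made-up /page endpoint (a real app would parse and compare the dates properly rather than string-match):

    from flask import Flask, request

    app = Flask(__name__)

    LAST_MODIFIED = 'Tue, 15 May 2012 12:00:00 GMT'  # whenever the content last changed

    @app.route('/page')
    def page():
        # CloudFront revalidates with If-Modified-Since when the TTL is 0.
        if request.headers.get('If-Modified-Since') == LAST_MODIFIED:
            # Content unchanged: CloudFront keeps serving its cached copy.
            return '', 304, {'Last-Modified': LAST_MODIFIED}
        body = '<html><body>dynamic content goes here</body></html>'
        return body, 200, {'Last-Modified': LAST_MODIFIED}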
I wonder how well this works for content that is truly dynamic. Seems like it would necessarily be slower for those pages that change on every request.
Not necessarily. The networks used by CloudFront may outperform direct paths. More importantly, CloudFront edge locations try to maintain a persistent connection to the origin and use a large initial TCP congestion window. This saves you from the delay caused by setting up a TCP connection over a long network path.
Simplified example (ignoring DNS latency, assuming symmetric paths):
User to CloudFront RTT is 30ms
User to Origin RTT is 100ms
CloudFront to Origin RTT is 100ms
It seems clear that User to Origin is faster than User to CloudFront to Origin, but not if you consider TCP mechanics.
If the User makes an HTTP request for a 4KB file to the Origin directly, it will take 100ms to set up the connection, 50ms for the request to reach the Origin, and another 50ms for the first response byte to arrive. Total: 200ms. If the origin does not have a big initial congestion window, it will take another 100ms for the last byte to arrive. Total: 300ms.
If the User makes an HTTP request for a 4KB file through CloudFront, it will take 30ms to set up the TCP connection. The request packet(s) will take 15ms to reach CloudFront. CloudFront tries to maintain a persistent TCP connection to the Origin, which avoids set-up time and slow start. The request to the Origin will take 100ms to complete, and another 15ms to reach the User. Total: 160ms.
Using CloudFront as an intermediary could reduce latency by a lot, even if no caching is going on.
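The same arithmetic as a quick Python sketch (all times in ms, same assumed RTTs):

    USER_TO_CF, USER_TO_ORIGIN, CF_TO_ORIGIN = 30, 100, 100

    # Direct: TCP handshake (one full RTT), request out, first byte back,
    # then one more RTT for the rest of the 4KB with a small initial cwnd.
    direct = USER_TO_ORIGIN + USER_TO_ORIGIN // 2 + USER_TO_ORIGIN // 2 + USER_TO_ORIGIN

    # Via CloudFront: handshake with the nearby edge, request to the edge,
    # one round trip over the edge's already-open connection to the origin,
    # response back to the user.
    via_cloudfront = USER_TO_CF + USER_TO_CF // 2 + CF_TO_ORIGIN + USER_TO_CF // 2

    print(direct, via_cloudfront)   # 300 160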
I bet they use a grace period, like Varnish does. The grace period would let the cache serve stale data for a few seconds or more while the If-Modified-Since call is made and, if necessary, the cache refreshed.
I don't think that's what the post said. Doing so would break the semantics of the 0 second cache time. They must wait for your 304 Not Modified response before serving from their cache.
There's nothing stopping you from using a traditional CDN vendor as an application CDN. I spoke to the folks at Edgecast about this a year or two ago, and they didn't have a problem with it. However, it sounds like Amazon made some specific optimizations for application content which could be more appealing.
Also, CloudFlare offers the same service, but with added security and anti-spam features.
Hrmm... the more services Amazon rolls out, the more tempted I am to go whole hog on AWS.
I only use S3 now - and Heroku - but I am excited about where this is heading.
The future looks bright, and I can't wait until the right application comes along for me to build on top of a fully scalable infrastructure that I only pay for as I use it.
Totally. Especially since the endgame in any web-facing EC2 architecture is to have stateless nodes behind a load balancer that you can scale up and down as required.
Ideally you'd automatically recreate AMI snapshots whenever your code changes. In other words, erm, Heroku.
I use Heroku and S3 too - CloudFront took mere minutes to set up and I got an instant and very noticeable latency decrease. It's a CDN that "just works". I'm using it for all projects now.
My suggestion for almost all businesses is to use AWS for S3, SQS, SWF, etc., and then get dedicated/VPS servers in the same data center. I actually get faster ping times to SQS from my dedicated server than from EC2 (both in US-East).
EC2 is the biggest ripoff going around, while the other AWS services are some of the most awesome out there.
What are some good dedicated hosting options in US-East? I've tried looking them up, but the info is usually buried so deep in the host's site that it's impossible to find.
I have only been with them a month, but I use FastServ. If you search on WebHostingTalk they have offers available, and the guy seems pretty knowledgeable. I am hoping they don't let me down. Another option is the famous ServerBeach, which YouTube used to use as its CDN. Would love to know of others.
"You can see that if you utilize the servers 100%, then EC2 is between 2x and 3.3x more expensive than renting servers Additionally looking at the CUs the EC2 images are less powerful than rented hardware, so you probably need more of them."
Same thoughts here. I could never understand why anyone would go with AWS EC2. It is expensive and slow. You could get much better deals from other dedicated providers or cloud providers. Route 53 is OK and improving, although I would still recommend DNSmadeeasy.
I have been messing around with caching and combining files quite a bit over the last few days, mainly because the latency between me and my web server really adds up, especially as the number of files I need to transfer increases.
The difference between San Diego and Chicago, or San Diego and New York, compared to running my application locally, makes me really want some kind of instant quantum communication. Since we don't have that, I really would like something like CloudFront for pretty much every single application or web page that I make.
Actually, I think it would be better if everyone and every web site had that: a way for sites to be automatically cached on servers local to everyone's city. Wouldn't that be nice?
Which reminds me of the whole concept of content-centric networking.
The hard part about this is that to really be effective it probably means really changing the way things work. It is tough to ease into it.
This could also help reduce the amount of data that needs to be transferred. Maybe we could figure out a way for every website in the world to be compressed against a very large global dictionary shared by every client (or possibly partitioned for local clusters, but that is more complicated...)
Regardless of the level of compression, it would still probably be possible to distribute quite a lot of the trending web content to be cached locally. Maybe it could be a bit like a torrent client for people's desktops, or maybe web application servers could have an installed program that participates in the distribution system and also publishes to it.
Maybe it could be a browser extension or just a userscript (Greasemonkey), though it probably has to be an extension, that would cache and distribute web pages you view. So, for example, as we click on Hacker News headlines we cache those pages on our local machines. Then when another person with the same extension installed clicks on that headline, it first checks their local peers, and if I am in the same city and already have that file, I can give them all of the content in a fraction of the time. If a lot of people used that extension, the web would be much faster, and it would solve a lot of problems.
I wonder if there isn't already a system like that. I mean, there are probably RSS feeds that come off of Hacker News and Reddit, and a reader could actually precache all of that content. But more comprehensively, I bet there is quite a bit of content that large numbers of, say, programmers are constantly accessing that could benefit from that type of system.
Could we make something like gzip, but not limited to a 32KB window: instead, a giant dictionary a GB in size holding the most common sequences from all of today's popular software engineering web sites? Then instead of sending a request to San Francisco or Chicago, I could just send a request to a guy less than a mile away who also happens to be interested in Node.js or whatever.
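zlib already has a miniature version of this idea in its preset-dictionary support, assuming both ends share the dictionary ahead of time; the catch is that the dictionary is still capped by deflate's 32KB window, nothing like the GB-scale global dictionary imagined above:

    import zlib

    # Toy shared dictionary of sequences both sides agree on ahead of time.
    SHARED_DICT = b'function var return document.getElementById addEventListener '

    comp = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS, 9,
                            zlib.Z_DEFAULT_STRATEGY, SHARED_DICT)
    payload = comp.compress(b'var el = document.getElementById("app");') + comp.flush()

    decomp = zlib.decompressobj(zlib.MAX_WBITS, SHARED_DICT)
    print(decomp.decompress(payload))  # original bytes, rebuilt via the shared dictionary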
"Actually, I think it would be better if everyone and every web site had that, a way that sites could automatically be cached in servers local to everyone's city. Wouldn't that be nice?"
I assume this means that versioning of CSS/JS files now works. One of the problems I've experienced with CloudFront is updating files that don't normally change very often (e.g. CSS). Since CloudFront didn't support query strings, bumping a version number like stylesheet.css?ver=201200505 didn't work, but now it should.
We've just been using mod_rewrite to rewrite stylesheet_([0-9]+).css to stylesheet.css for our CloudFront stuff. Our build scripts pop in the file modification time, so CF sees a new URL any time we update a file.
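Roughly this sort of rule (the /static/ prefix here is a guess, not the actual config):

    RewriteEngine On
    # Strip the build-time modification-time stamp so every
    # stylesheet_<mtime>.css URL maps back to the one real file.
    RewriteRule ^/?static/stylesheet_[0-9]+\.css$ /static/stylesheet.css [L]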
In case you should ever choose to email him to convey your distaste over sharing Amazon Web Services news, his email address (published in his HN profile) is: jbarr@amazon.com
I, for one, find AWS technical announcements welcome and relevant. Perhaps you'd rather read about janitors getting degrees?
FWIW, I appreciate seeing Amazon feature announcements on HN. I also believe that the 116 points this post has received indicate that a lot of other people agree it was relevant, and I'll further state that some of the comments posted on this topic have been insightful. Please do not stop ;P.