Having just finished benchmarking a new project I'm writing on nearly all EC2 instance types, I have to say nothing beats benchmarking.
Create a spreadsheet, write some benchmarking tools, make sure you aren't actually benchmarking another service the instance depends on (db, API, etc.), and go to town. Vary the number of requests/sec, the number of parallel requests, and the number of app workers/threads/etc.
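A minimal sketch of such a benchmarking tool, assuming a hypothetical `handle_request()` stub standing in for one request to your app. It sweeps the number of parallel workers and records requests/sec for each level, producing rows you can paste into the spreadsheet.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request() -> None:
    """Hypothetical stand-in for one request to the app under test.

    Replace with a real call to your app. Keep external dependencies
    (db, APIs) out of this path, or you end up benchmarking them instead.
    """
    sum(i * i for i in range(10_000))


def run_level(parallel: int, total_requests: int) -> float:
    """Run `total_requests` calls using `parallel` workers; return requests/sec."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        # list() waits for every call and re-raises any worker exception
        list(pool.map(lambda _: handle_request(), range(total_requests)))
    elapsed = time.perf_counter() - start
    return total_requests / elapsed


def sweep(levels, total_requests=200):
    """Return {parallelism: requests/sec} for each parallelism level."""
    return {p: run_level(p, total_requests) for p in levels}


if __name__ == "__main__":
    for parallel, rps in sweep([1, 2, 4, 8]).items():
        print(f"{parallel},{rps:.1f}")  # CSV row: parallelism,reqs/sec
```

Note that the CPU-bound stub will not scale with threads in CPython because of the GIL; a real harness would send requests to the app over the network, where threads mostly wait on I/O.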
Write down all your results, then compute the amount of 'work' (reqs/sec, computations, db inserts/updates, etc.) per dollar of hourly or RI (Reserved Instance) pricing. That should give you good numbers on which instance type is most cost-effective for your workload.
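The work-per-dollar comparison can be sketched as follows. The instance names and numbers are placeholders, not real benchmarks or prices. The RI helper assumes the upfront fee is amortized evenly over the term (8,760 hours per year) with the instance running the whole time.

```python
from dataclasses import dataclass


@dataclass
class Result:
    instance_type: str
    reqs_per_sec: float   # measured throughput from your benchmark
    hourly_price: float   # on-demand or effective RI price, USD/hour


def effective_ri_hourly(upfront: float, hourly: float, term_years: int = 1) -> float:
    """Amortize an RI's upfront fee over its term, assuming 100% utilization."""
    return upfront / (term_years * 8760) + hourly


def work_per_dollar(r: Result) -> float:
    """Requests served per dollar: (reqs/sec * 3600 sec/hour) / (USD/hour)."""
    return r.reqs_per_sec * 3600 / r.hourly_price


def rank(results):
    """Most cost-effective instance type first."""
    return sorted(results, key=work_per_dollar, reverse=True)


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    results = [
        Result("type-a", reqs_per_sec=100.0, hourly_price=0.10),
        Result("type-b", reqs_per_sec=450.0, hourly_price=0.40),
        Result("type-c", reqs_per_sec=800.0, hourly_price=1.00),
    ]
    for r in rank(results):
        print(f"{r.instance_type}: {work_per_dollar(r):,.0f} requests per dollar")
```

With these placeholder numbers the mid-sized "type-b" wins, even though "type-c" has the highest raw throughput, which is exactly the kind of result the spreadsheet is meant to surface.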
Per CloudWatch stats, it varies by machine and region. We have a 6-machine array in three regions receiving round-robin traffic from ELBs. It is network-bound; however, the CPU graphs say otherwise. Here's the output for three machines in three regions: http://i.imgur.com/nadGmDH.png (their siblings follow the same trend). The working set resides in RAM, and the disk and network graphs follow the same pattern. These are instance-store machines sharing as little as possible. Failing over a Multi-AZ RDS instance for maintenance did not change the graphs.
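One way to put numbers on "it varies by machine and region" is to pull the same CloudWatch metric for each host and compare averages. This is a sketch: `fetch_cpu` uses boto3's `get_metric_statistics` and needs credentials, while `summarize` and `outliers` work on datapoints in the shape that call returns. The host labels, the 24-hour window, and the 25% tolerance are arbitrary assumptions.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, median


def fetch_cpu(instance_id: str, region: str, hours: int = 24):
    """Fetch average CPUUtilization datapoints for one instance (needs AWS credentials)."""
    import boto3  # imported lazily so the analysis functions run without it

    cw = boto3.client("cloudwatch", region_name=region)
    end = datetime.now(timezone.utc)
    resp = cw.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=300,  # 5-minute buckets
        Statistics=["Average"],
    )
    return resp["Datapoints"]


def summarize(datapoints_by_host):
    """Map each host label to the mean of its 'Average' datapoints (empty hosts skipped)."""
    return {
        host: mean(dp["Average"] for dp in points)
        for host, points in datapoints_by_host.items()
        if points
    }


def outliers(summary, tolerance=0.25):
    """Hosts whose mean CPU differs from the fleet median by more than `tolerance` (relative)."""
    fleet = median(summary.values())
    return sorted(h for h, v in summary.items() if abs(v - fleet) > tolerance * fleet)
```

The median is used as the fleet baseline so that a single hot machine does not drag the reference point toward itself.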
This is the first time I've heard about AMIs. I was under the impression that Amazon just sold you the hardware and it was up to you to handle the rest.
If I wanted to start with a small instance, which AMI should I choose for a run-of-the-mill Rails stack?
For a simple Rails app, any of the Debian-based systems would suffice, and Ubuntu 12.04 LTS is particularly high-quality.
As for cost, paid AMIs carry licensing and support fees. By using such AMIs you are allocated support resources, e.g. the ability to open tickets and receive personal QA assistance.
Paid AMIs simply add a cents-per-hour charge on top of the regular instance price.
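Since the AMI charge is billed per hour, a quick way to see what it means for an always-on server is to multiply by the hours in a month. The rates below are hypothetical, and 730 hours/month is the usual approximation (8,760 hours / 12).

```python
HOURS_PER_MONTH = 730  # approximation: 8,760 hours per year / 12 months


def monthly_cost(instance_hourly: float, ami_hourly: float = 0.0) -> float:
    """Monthly USD cost of one always-on instance, including any per-hour AMI charge."""
    return (instance_hourly + ami_hourly) * HOURS_PER_MONTH


# Hypothetical rates: $0.06/hour instance plus a $0.02/hour paid AMI.
with_ami = monthly_cost(0.06, ami_hourly=0.02)
without_ami = monthly_cost(0.06)
```

With those hypothetical rates the instance alone is about $43.80/month and the paid AMI adds about $14.60, for roughly $58.40 in total.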
AMIs are kind of hard to avoid if you manage your AWS resources through the web console: create a new instance and the first thing that pops up is "choose an AMI".
This is the first time I've seen that reddit case study. I'm sad that they didn't use the part where they interviewed me 3 years ago and actually got the history part right. :)
(no offense to Keith -- he wasn't there after all)
Considering that an AMI can (and does) shut down at any time, losing everything you've got on disk, I've always been reluctant to use them for databases (and mounting/unmounting EBS volumes while a VM properly relaunches seems too slow, cumbersome, and unreliable to me).
Am I right not to trust them for that task?
AMIs definitely aren't for storing highly variable data. I use AMIs in two ways:
1) I create a base AMI for each group of my servers that have similar environments. For example, I may have a base image v1.3 for general WordPress sites, or v1.1 for Rails + Redis. This greatly reduces the manual setup needed to launch a new server, and I know that once the server is up, it has all the tools and configuration I need to just start deploying my code.
2) I create nightly snapshots of all of my attached volumes and also create a backup AMI of all of my current servers. These are ONLY for recovering servers that are not changing minute-by-minute, or for easily rolling back if something terrible happens and I just need to start fresh.
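The nightly routine in point 2 could be scripted roughly like this. It is a sketch, not the commenter's actual setup: it assumes a boto3 EC2 client is passed in (so it can be tested without AWS), and the snapshot description and AMI naming scheme are made up for illustration. `describe_volumes`, `create_snapshot`, and `create_image` are real EC2 client calls.

```python
from datetime import date
from typing import Optional


def nightly_backup(ec2, instance_id: str, today: Optional[date] = None):
    """Snapshot every EBS volume attached to `instance_id`, then create a backup AMI.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns (list of snapshot IDs, AMI ID).
    """
    stamp = (today or date.today()).isoformat()
    volumes = ec2.describe_volumes(
        Filters=[{"Name": "attachment.instance-id", "Values": [instance_id]}]
    )["Volumes"]
    snapshot_ids = [
        ec2.create_snapshot(
            VolumeId=v["VolumeId"],
            Description=f"nightly {instance_id} {v['VolumeId']} {stamp}",
        )["SnapshotId"]
        for v in volumes
    ]
    # NoReboot=True avoids downtime, but the image is then not guaranteed
    # to be filesystem-consistent; fine for servers that rarely change.
    image_id = ec2.create_image(
        InstanceId=instance_id,
        Name=f"backup-{instance_id}-{stamp}",
        NoReboot=True,
    )["ImageId"]
    return snapshot_ids, image_id
```

Run from cron or a scheduler once a night; pruning old snapshots and AMIs is left out here and would need its own cleanup step.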
You can use an EC2 instance + an AMI + an EBS volume for whatever you'd like, including a database. If the server goes down, you won't lose a thing, since your EBS volume persists. You also don't have to mount/unmount manually; that's all taken care of for you automatically. There have been some rare issues in the past with EBS volumes dying, so for critical data I would definitely consider some form of replication.