All the time spent talking about caching in Rails avoids the real issue: generating complex html in Ruby/Rails is really slow.
My Rails sites spend 85% of the time generating HTML; the other 15% is spent communicating with the db and other external services. And it's not easy to figure out where the slowness is. When I've run performance profiles in the past, something like 50% of the time was spent in GC.
From experience I have to agree with both parts of this - it can be very slow and it's generally not that easy to profile, e.g. the call graph tends to get very complex and therefore difficult to interpret, and as you say, GC can be a big part of the problem which doesn't help.
One thing worth mentioning: anecdotally I've found that ActiveRecord can have an impact on performance far beyond the cost of the database queries, both by directly adding a reasonable amount of overhead and by instantiating a lot of objects internally which in turn triggers a lot of garbage collection. I wouldn't be surprised if that accounts for a fair chunk of what you're seeing.
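To get a rough feel for how much allocation a piece of code causes, something like the following can help. It is only a sketch: `allocation_report` is a hypothetical helper, not part of Rails or ActiveRecord, and the block of hashes is a stand-in for instantiating many model objects. It relies only on Ruby's built-in `GC.stat` and `GC.count`.

```ruby
# Hypothetical helper: reports how many objects a block allocates and how
# many GC runs happen while it executes, using Ruby's built-in GC counters.
def allocation_report
  GC.start # collect earlier garbage first so it does not skew the numbers
  objects_before = GC.stat(:total_allocated_objects)
  runs_before = GC.count
  result = yield
  {
    result: result,
    allocated_objects: GC.stat(:total_allocated_objects) - objects_before,
    gc_runs: GC.count - runs_before
  }
end

# Stand-in for building many model objects while rendering a page.
report = allocation_report do
  Array.new(100_000) { |i| { id: i, username: "user#{i}" } }.size
end

puts "allocated: #{report[:allocated_objects]}, GC runs: #{report[:gc_runs]}"
```

Wrapping a controller action or a view partial in a helper like this would show whether object churn, rather than query time, dominates a request.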
I should also add that addressing performance issues with caching can seriously impact the maintainability of your application, particularly if you end up with complex cache expiry rules and/or systems to pre-populate cache entries. I'd definitely say the priority should be to simplify your app as much as possible first, then optimise, and only resort to caching when that doesn't work.
Agreed 100%. I've worked hard to remove all forms of caching from my sites. It simplifies things tremendously.
I've found that database views and functions are one of the easiest ways to improve performance in Rails.
FOR EXAMPLE
To generate the user information shown above each comment for https://img.skitch.com/20120220-pununfygpsaw1cw5gmjin8e95i.p..., I have to get the user's username, the total number of "points" the user has, the user's profile image, whether the user is an admin (admins are formatted differently), etc. This information is stored across 5 or 6 tables. Enter this view (which calls a few db functions):
CREATE VIEW user_profile_info AS
SELECT users.id AS user_id,
       users.slug AS user_slug,
       users.username,
       user_stats.total_points,
       user_profile_image(users.id) AS profile_image,
       is_user_admin(users.id) AS admin
FROM users
JOIN user_stats ON user_stats.user_id = users.id;
Now, I can have a simple UserProfileInfo ActiveRecord class that wraps this user_profile_info database view.
Then I can do:
@object.comments.includes(:user_profile_info)
and, very efficiently, I get a list of comments and all the user's information.
If I didn't use this approach, I would have to have a complex caching scheme to avoid the multiple sql queries. The goal is to minimize the amount of data that comes over the wire via sql queries (which also reduces the amount of work ActiveRecord has to do to construct these objects in memory).
BTW, I'm starting to write a book about using PostgreSQL effectively with web applications. I've found that there are tons of web developers (especially in Rails) who don't use the full power of PostgreSQL correctly, which leads to slow and buggy code.
I use most of these methods for caching. I have my own little methods for making cache keys. I use Rails.cache all over the place. I have a couple of sweepers. What I really struggle with is testing this stuff. You really can't do it anywhere other than an integration test, and they're ugly. I mostly just test the sweepers. Does anyone have any nice solutions for testing caching, or think it's unnecessary?
I've never had a production Rails project, but do you have any anecdotes or numbers to back up your statement? At first glance, it looks like the win in processing speed outweighs the complexity. So you're implying that this win isn't required. I'm assuming that scaling out is cheaper than the complexity cost of implementing/maintaining such a caching solution?
I haven't had a lot of experience with caching, and I agree with your comment above about database views, but wouldn't the complications be mitigated by using the approach DHH detailed recently? http://37signals.com/svn/posts/3113-how-key-based-cache-expi...
That is, incorporating `updated_at` into the cache key, and using `touch: true` on your AR relations to make sure that caches of affected parent objects get expired too?
Are there other complications I may not have run into yet?
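For anyone who hasn't seen the key-based approach, here is a minimal plain-Ruby sketch of the idea. `Record` and `FragmentCache` are hypothetical stand-ins for an ActiveRecord model and Rails.cache, and the key format is made up for illustration; it is not how Rails itself builds keys.

```ruby
# Hypothetical stand-in for a model: kind, id, updated_at and an optional parent.
Record = Struct.new(:kind, :id, :updated_at, :parent) do
  # Mimics `touch`: bump our own timestamp, then the parent's (like touch: true).
  def touch(now)
    self.updated_at = now
    parent&.touch(now)
  end

  # The key changes whenever updated_at changes, so stale entries are never read again.
  def cache_key
    "#{kind}/#{id}-#{updated_at.to_i}"
  end
end

# Hypothetical stand-in for Rails.cache that also counts misses.
class FragmentCache
  attr_reader :misses

  def initialize
    @store = {}
    @misses = 0
  end

  # Returns the stored value for key, or builds and stores it from the block.
  def fetch(key)
    @store.fetch(key) do
      @misses += 1
      @store[key] = yield
    end
  end
end

cache = FragmentCache.new
post = Record.new("posts", 1, Time.at(1000), nil)
comment = Record.new("comments", 7, Time.at(1000), post)

render = -> { cache.fetch(post.cache_key) { "<div>post #{post.id}</div>" } }

render.call                  # miss: the fragment is built and stored
render.call                  # hit: same key, nothing is rebuilt
comment.touch(Time.at(2000)) # changing the comment touches the post too
render.call                  # miss: the post's key changed
puts cache.misses
```

Nothing is ever explicitly expired here; old entries are simply never asked for again, which is why this approach usually relies on a cache store that evicts unused keys.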
Imagine you have User has_many Comments. When someone's username is updated, you might need to update all the user's cached html comment fragments to include the new username.
However, ActiveRecord doesn't support touch on has_many relations. (It probably shouldn't, as updating the username would mean touching thousands of comment rows, or more.)
Also, if you have to update the database outside of ActiveRecord for any reason, you could be screwed - the cache would become out of sync with the database.
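A toy illustration of that last failure mode, using plain hashes as hypothetical stand-ins for a users row and the cache (no Rails involved, and the key format is invented for the example):

```ruby
# Hypothetical stand-ins: one Hash for a users table row, one for the cache.
user_row = { id: 42, username: "alice", updated_at: 1000 }
cache = {}

# The fragment key is derived from id and updated_at, as in key-based expiry.
fragment_for = lambda do |row|
  key = "users/#{row[:id]}-#{row[:updated_at]}"
  cache[key] ||= "<span>#{row[:username]}</span>"
end

first = fragment_for.call(user_row)

# Simulates a raw SQL UPDATE that changes the username but leaves updated_at alone.
user_row[:username] = "alice2"
stale = fragment_for.call(user_row)

# Simulates an update through the model, which also bumps updated_at.
user_row[:updated_at] = 2000
fresh = fragment_for.call(user_row)

puts first, stale, fresh
```

The second lookup still returns the old username because the key never changed; only bumping `updated_at` produces a new key and a fresh fragment.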
The post is actually easy to read and understand. Funny thing is, in the last couple of weeks, there has been a lot of material (podcasts, posts, screencasts, etc.) on the subject. But it didn't really take off until 37signals mentioned it.
I wrote it in May. It was well received then, but it has gotten more play now since the 37signals post on caching. IMO my guide is the definitive guide to the caching system in Rails.
I believe this is the first time it has made it to HN. I've always gotten very good feedback on it. Thanks for your comments.
New Relic has a series of videos on scaling Rails, several of which are about caching. They're a couple of years old but most of the concepts and method calls are still valid.