Advanced caching in Rails

joevandyk · on Feb 20, 2012

All the time spent talking about caching in Rails avoids the real issue: generating complex HTML in Ruby/Rails is really slow.

My Rails sites spend 85% of their time generating HTML; the other 15% is spent communicating with the db and other external services. And it's not easy to figure out where the slowness is. When I've run performance profiles in the past, something like 50% of the time was spent in GC.
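A rough way to check the GC share for a piece of rendering code is Ruby's built-in GC::Profiler. The sketch below is only an illustration: render_rows is a hypothetical stand-in for a template, and the numbers it prints will vary by Ruby version and machine, so treat them as a hint rather than a real profile.

```ruby
require "benchmark"

# Hypothetical stand-in for a view template: builds many short-lived strings.
def render_rows(count)
  rows = (1..count).map do |i|
    "<tr><td>#{i}</td><td>user-#{i}</td></tr>"
  end
  "<table>#{rows.join}</table>"
end

# Runs the block and returns wall time, GC time and the number of GC runs.
def measure_gc
  GC::Profiler.enable
  GC::Profiler.clear
  runs_before = GC.count
  wall = Benchmark.realtime { yield }
  {
    wall_seconds: wall,
    gc_seconds: GC::Profiler.total_time,
    gc_runs: GC.count - runs_before
  }
ensure
  GC::Profiler.disable
end

report = measure_gc { 50.times { render_rows(2_000) } }
share = report[:wall_seconds] > 0 ? report[:gc_seconds] / report[:wall_seconds] : 0.0
puts format("wall: %.3fs, GC: %.3fs (%.0f%%), GC runs: %d",
            report[:wall_seconds], report[:gc_seconds], share * 100, report[:gc_runs])
```

Wrapping a real controller action or partial in measure_gc instead of the toy loop should give a first impression of how much of the render time is garbage collection.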

pcowans · on Feb 20, 2012

From experience I have to agree with both parts of this: it can be very slow, and it's generally not that easy to profile. The call graph tends to get very complex and therefore difficult to interpret, and as you say, GC can be a big part of the problem, which doesn't help.

One thing worth mentioning: anecdotally I've found that ActiveRecord can have an impact on performance far beyond the cost of the database queries, both by directly adding a reasonable amount of overhead and by instantiating a lot of objects internally which in turn triggers a lot of garbage collection. I wouldn't be surprised if that accounts for a fair chunk of what you're seeing.

pcowans · on Feb 20, 2012

I should also add that addressing performance issues with caching can seriously impact the maintainability of your application, particularly if you end up with complex cache expiry rules and/or systems to pre-populate cache entries. I'd definitely say that the priority should be to first simplify your app as much as possible, then optimise, and only use caching when that doesn't work.

joevandyk · on Feb 20, 2012

Agreed 100%. I've worked hard to remove all forms of caching from my sites. It simplifies things tremendously.

I've found that database views and functions are one of the easiest ways to improve performance in Rails.

FOR EXAMPLE

To generate the user information shown above each comment for https://img.skitch.com/20120220-pununfygpsaw1cw5gmjin8e95i.p..., I have to get the user's username, the total number of "points" the user has, the user's profile image, whether the user is an admin (admins are formatted differently), etc. This information is stored across 5 or 6 tables. Enter this view (which calls a few db functions):

  CREATE VIEW user_profile_info
  AS
    SELECT users.id                     AS user_id,
           users.slug                   AS user_slug,
           users.username,
           user_stats.total_points,
           user_profile_image(users.id) AS profile_image,
           is_user_admin(users.id)      AS admin
    FROM   (users
            JOIN user_stats
              ON (( user_stats.user_id = users.id )));

Now, I can have a simple UserProfileInfo ActiveRecord class that wraps this user_profile_info database view.

Then I can do:

  @object.comments.includes(:user_profile_info)

and, very efficiently, I get a list of comments along with each user's information.

If I didn't use this approach, I would have to have a complex caching scheme to avoid the multiple sql queries. The goal is to minimize the amount of data that comes over the wire via sql queries (which also reduces the amount of work ActiveRecord has to do to construct these objects in memory).

BTW, I'm starting to write a book about using postgresql effectively with web applications. I've found that there are tons of web developers (especially in Rails) who don't use the full power of postgresql correctly, which leads to slow and buggy code.

ruckusing · on Feb 20, 2012

I'd be very interested in purchasing this book. Please keep us updated when it's available.

adman65 · on Feb 20, 2012

Hello, I'm the author. You can ask me questions or respond to me here if you like.

freedrull · on Feb 20, 2012

I use most of these methods for caching. I have my own little methods for making cache keys. I use Rails.cache all over the place. I have a couple of sweepers. What I really struggle with is testing this stuff. You really can't do it anywhere other than an integration test, and they're ugly. I mostly just test the sweepers. Does anyone have any nice solutions for testing caching, or think it's unnecessary?

joevandyk · on Feb 20, 2012

I would try to remove as much caching as possible. It is complicated and hard to test.

johnkchow · on Feb 20, 2012

I've never had a production Rails project, but do you have any anecdotes or numbers to back up your statement? At first glance, it looks like the win in processing speed outweighs the complexity. So you're implying that this win isn't required. I'm assuming that scaling out is cheaper than the complexity cost of implementing/maintaining such a caching solution?

nfm · on Feb 20, 2012

I haven't had a lot of experience with caching, and I agree with your comment above about database views, but wouldn't the complications be mitigated by using the approach DHH detailed recently? http://37signals.com/svn/posts/3113-how-key-based-cache-expi...

That is, incorporating `updated_at` into the cache key, and using `touch: true` on your AR relations to make sure that caches of affected parent objects get expired too?
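As a rough illustration of that idea, here is a plain-Ruby sketch with no Rails involved. Todolist, Todo and FragmentCache are hypothetical stand-ins (not ActiveRecord or Rails.cache), and an integer counter plays the role of `updated_at`. Updating the child touches the parent, which changes the parent's cache key, so the next render misses and rebuilds.

```ruby
# Hypothetical stand-ins for models; not ActiveRecord.
class Record
  attr_reader :id, :updated_at

  def initialize(id)
    @id = id
    @updated_at = 0
  end

  # A counter stands in for the updated_at timestamp column.
  def touch
    @updated_at += 1
  end

  def cache_key
    "#{self.class.name.downcase}/#{id}-#{updated_at}"
  end
end

class Todolist < Record; end

class Todo < Record
  attr_reader :body

  def initialize(id, list, body)
    super(id)
    @list = list
    @body = body
  end

  # Mimics `belongs_to :todolist, touch: true`: saving the child touches the parent.
  def update_body(body)
    @body = body
    touch
    @list.touch
  end
end

# Minimal in-memory cache that counts how often a fragment is rebuilt.
class FragmentCache
  attr_reader :misses

  def initialize
    @store = {}
    @misses = 0
  end

  def fetch(key)
    @store.fetch(key) do
      @misses += 1
      @store[key] = yield
    end
  end
end

cache = FragmentCache.new
list = Todolist.new(1)
todo = Todo.new(7, list, "first")
render = -> { cache.fetch(list.cache_key) { "<div>#{todo.body}</div>" } }

render.call               # miss: stored under "todolist/1-0"
render.call               # hit
todo.update_body("edited")
render.call               # miss again: the key is now "todolist/1-1"
puts cache.misses
```

Note that the old entry is never deleted in this sketch; the approach assumes a store that evicts unused keys on its own.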

Are there other complications I may not have run into yet?

joevandyk · on Feb 20, 2012

Imagine you have User has_many Comments. When someone's username is updated, you might need to update all the user's cached html comment fragments to include the new username.

However, ActiveRecord doesn't support touch on has_many relations. (It probably shouldn't, as updating the username would mean updating/touching thousands (or more) comment rows).

Also, if you have to update the database outside of ActiveRecord for any reason, you could be screwed - the cache would become out of sync with the database.
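The username case can be reproduced without Rails. This is a minimal sketch under assumptions: User and Comment are plain structs, `version` stands in for updated_at, and CACHE is just a hash. It also shows one possible workaround, which is to fold the author's version into the key. That is an illustration, not necessarily what any of these sites do.

```ruby
# Hypothetical stand-ins for the models; `version` plays the role of updated_at.
User = Struct.new(:id, :username, :version)
Comment = Struct.new(:id, :user, :body, :version)

CACHE = {}

# Key built from the comment alone, as in simple key-based expiry.
def comment_only_key(comment)
  "comment/#{comment.id}-#{comment.version}"
end

# Key that also includes the author's version, so a rename changes it.
def comment_and_user_key(comment)
  "#{comment_only_key(comment)}/user/#{comment.user.id}-#{comment.user.version}"
end

# Returns the cached fragment for the key, rendering it only on a miss.
def render_comment(comment, key)
  CACHE[key] ||= "<p><b>#{comment.user.username}</b>: #{comment.body}</p>"
end

user = User.new(1, "joe", 1)
comment = Comment.new(10, user, "hello", 1)

# Warm both cache entries.
render_comment(comment, comment_only_key(comment))
render_comment(comment, comment_and_user_key(comment))

# Rename the user. Only the user changes; the comment is not touched.
user.username = "joseph"
user.version += 1

stale = render_comment(comment, comment_only_key(comment))
fresh = render_comment(comment, comment_and_user_key(comment))
puts stale
puts fresh
```

Neither key helps if a row is changed without bumping its version, which is the out-of-band update case: the key stays the same and the old fragment keeps being served.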

nfm · on Feb 21, 2012

Cheers, a helpful and extremely likely case!

cicloid · on Feb 20, 2012

The post is actually easy to read and understand. Funny thing is, in the last couple of weeks, there has been a lot of material (podcasts, posts, screencasts, etc.) on the subject. But it didn't really take off until 37signals mentioned it.

adman65 · on Feb 20, 2012

I wrote it in May. It was well received then, but it has gotten more play now since the 37signals post on caching. IMO my guide is the definitive guide to the caching system in Rails.

I believe this is the first time it has made it to HN. I've always gotten very good feedback on it. Thanks for your comments.

aoe · on Feb 20, 2012

Is there any screencast other than the Railscasts one?

tmcdonald · on Feb 20, 2012

New Relic has a series of videos on scaling Rails, several of which are about caching. They're a couple of years old but most of the concepts and method calls are still valid.

http://railslab.newrelic.com/scaling-rails

seanp2k2 · on Feb 20, 2012

"worthless applications" like a news site or blog? Yeah, those seem to be pretty unpopular. *eyeroll*

cmer · on Feb 20, 2012

This is the best article on Rails caching I have ever read. Highly recommended.