
"Assuming the maximum length of 140 characters, this means they're storing about 18 GB per day"

That's a huge underestimate. A tweet isn't 140 characters - it's 140 characters plus a huge chunk of surrounding metadata and indexes (who tweeted, when they tweeted, where they tweeted from, was it a reply, did it mention anyone, did it include any hashtags, did it link to anything, was it a retweet, its unique ID, how many users was it delivered to...) - all massively denormalised for performance reasons. See http://www.scribd.com/doc/30146338/map-of-a-tweet for an idea of the data involved.
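To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 18 GB/day and 140-character figures come from the quoted estimate; the one-byte-per-character reading and the 2,500-byte full record size are illustrative assumptions, not figures from Twitter.

```python
# Figures taken from the quoted estimate.
QUOTED_BYTES_PER_DAY = 18e9   # "about 18 GB per day", read as decimal gigabytes
TEXT_BYTES_PER_TWEET = 140    # assumes 1 byte per character, max-length tweets

# Hypothetical: size of one tweet record once metadata is included.
ASSUMED_RECORD_BYTES = 2500


def implied_tweets_per_day(bytes_per_day=QUOTED_BYTES_PER_DAY,
                           bytes_per_tweet=TEXT_BYTES_PER_TWEET):
    """Tweet volume the quoted estimate must have assumed."""
    return bytes_per_day / bytes_per_tweet


def daily_bytes(tweets_per_day, bytes_per_tweet):
    """Daily storage for a given tweet volume and per-tweet size."""
    return tweets_per_day * bytes_per_tweet


tweets = implied_tweets_per_day()
print(f"{tweets / 1e6:.0f} million tweets/day implied by the quote")
print(f"{daily_bytes(tweets, ASSUMED_RECORD_BYTES) / 1e9:.0f} GB/day "
      f"at {ASSUMED_RECORD_BYTES} bytes per tweet record")
```

Even before counting indexes or delivery, the total scales linearly with whatever the real per-tweet record size turns out to be.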

Then there's the fact that a reference to each tweet has to be written into the "inbox" of every user that receives it - so if Tim O'Reilly says something, a reference to that tweet gets written 1,452,801 times, once for each of his followers.
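A rough sketch of that fan-out cost, in Python. The follower count is the one given above; the 8 bytes per inbox reference (a single 64-bit tweet ID) is an assumption for illustration, and real inbox entries may well be larger.

```python
FOLLOWERS = 1_452_801      # follower count cited above
TWEET_TEXT_BYTES = 140     # assumes 1 byte per character

# Hypothetical: each inbox entry is just one 64-bit tweet ID.
ASSUMED_REF_BYTES = 8


def fanout_bytes(followers, ref_bytes=ASSUMED_REF_BYTES):
    """Bytes written to put one tweet's reference in every follower's inbox."""
    return followers * ref_bytes


written = fanout_bytes(FOLLOWERS)
print(f"{written:,} bytes (~{written / 1e6:.1f} MB) of inbox references for one tweet")
print(f"about {written / TWEET_TEXT_BYTES:,.0f}x the size of the tweet text itself")
```

Under these assumptions the delivery writes for a single tweet from a popular account dwarf the tweet's own text by several orders of magnitude.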

On top of that, there's all of the associated stats collection, including link click tracking and a ton of data around who is doing what in the Twitter interface.

This article from last year suggests that Twitter were storing 8TB/day back in October, more than 400 times the 18 GB estimate, and it's only going to have gone up since then: http://techcrunch.com/2010/09/17/twitter-seeing-6-billion-ap...


