
Not to take away from the substance of the article itself, but is anyone else surprised that they have 2 billion "documents", which presumably means active ads/listings? That seems like an awful lot.


MongoDB is being used for historical archiving, not for the live site itself. The big reason is that changing table schemas for very large sets of old data is painful with MySQL. So the 2 billion number would be any ad/listing older than a set amount of time.

The live data is < 1 TB and is still stored in MySQL.


Exactly.

The "set amount of time" typically hovers around 60 days, though our archiving process has been off for several months while the migration took place. So we have some catching up to do--somewhere in the neighborhood of 150 million postings, last I counted.


I've been hearing some good things about Riak lately and their masterless implementation seems quite interesting. Did Riak ever make your radar and, if so, what were the disadvantages that made you choose MongoDB?

If I had to guess based on the video, I'd say the lack of a Perl client, and that you'd probably end up having to roll too many of your own solutions on top of it?


I would have expected more, personally. Craigslist is massive, popular, and has been around a long time. That adds up to a TON of listings.


~2.2 billion is a ton of listings. What you have to realize is that craigslist wasn't in hundreds of cities on day #1. In recent years, we've had tens of millions of "live" ads on the site, but it took a bit of time to grow to that size.


This looks like data warehousing for the archive. The two billion listings probably represent all expired ads ever. There is no way they have 2 billion active ads at any one time.


The above comment is correct.

The archive does have to be accessible to users, though, since they can view listings from many years back.

The entire archive seems to be under 4 TB from what he described in the video (2 billion documents at 2 kilobytes each). They do not retain photos.
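For anyone who wants to check that figure, here is a quick back-of-the-envelope sketch. The helper name `archive_size_tb` is made up for illustration, and it assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and counts only raw document bytes, ignoring indexes, replication, and storage overhead.

```python
def archive_size_tb(doc_count: int, avg_doc_bytes: int) -> float:
    """Estimate raw data size in decimal terabytes (10**12 bytes).

    Hypothetical helper: ignores indexes, replication, and padding.
    """
    return doc_count * avg_doc_bytes / 10**12


# Figures quoted from the video: ~2 billion documents at ~2 KB each.
estimate = archive_size_tb(2_000_000_000, 2_000)
print(f"{estimate:.1f} TB")  # prints "4.0 TB"
```

So the raw data lands right around the 4 TB mark under those assumptions; the real on-disk footprint would depend on how far the average document actually is from 2 KB.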


Yup. You hit the nail on the head.


How much photo data do you handle? How long do you keep it?


The photos are removed once the posting is no longer live on the site (roughly). As for how many, I'd have to dig a bit to find that out...



