Twitter Announces Fire Hose Marketplace: Up to 10k Keyword Filters for 30 Cents

kmfrk · on April 4, 2011

> Thanks to Qwerly integration, when you look at a Twitter @username - Mediasift sees more than just the Twitter profile. It sees @username who has bookmarks saved online, plans for public events to attend, photos shared publicly with friends, check-ins to places around town and much more. Any of those are like columns in a spreadsheet of Twitter search results. Show me Tweets with any of the following keywords by people planning to attend event A and who have been to place B or C. Thanks to Qwerly, Twitter didn't just get a giant new developer search and filter feature - it got integration with a whole lot of other social services.

Thank God I use different pseudonyms on the internet. That just gives me the willies.

---

On another note, this may come at a very opportune time, with the Obama 2012 campaign kicking into gear.

pstack · on April 5, 2011

I don't see a problem with it. If you use twitter, then you clearly don't have a problem living your life in the open and the price of being an attention whore is getting attention. For those of us who don't need to play-by-play our every thought and action in 140 character narration throughout the day to our imagined hordes of clingy worshippers, it won't be a problem. We likely don't use twitter or we don't use the same identity as we do elsewhere. I mean, as much as I'd kind of like to have all of my identities tied together, I don't like the potential problems that may draw. Therefore, I use a different identity at HN than I do at Slashdot which is different than on LinkedIn which is different than on Amazon, which is different than on Steam, which is different than on XBOX Live.

Yeah, people like you and me are still open to having any anonymity data-mined out of us through aggregate manipulation -- but at least it's a simple layer of abstraction.

In the meantime, if someone can get rich using the wealth of public information that every vapid college girl posting a thousand twitpics a night from her cell at the club puts up online, then more power to them.

mduvall · on April 4, 2011

Despite being "dirt cheap" for potential clients such as companies, I find the 30 cent barrier an awkward price point since people who will want the data are probably willing to pay more, and it definitely bars the casual developer from access to social data.

mark_l_watson · on April 4, 2011

There is always the garden hose sample feed from Twitter. Or, use this service for short test intervals.

Seems like data hackers are being taken care of.

Also, if you want a lot of social media cheaply, check out the sample web app for Google Buzz that runs on AppEngine. I ran it last summer with some of my own filters. I could run it about 5 hours a day before I hit the limit of a free AppEngine account - so it would not cost too much to pay to keep a derivative of this example program running 24x7.

dotBen · on April 4, 2011

30c... an hour

which is ((24x7x52)/12)x$0.3 = $218.40 a month
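For anyone who wants to check that figure, here is a minimal Python sketch of the same arithmetic. It assumes the 30 cents is charged for every hour of a 52-week year; the variable names are illustrative and not taken from any pricing page.

```python
# Monthly cost of a stream billed at a flat hourly rate.
# Assumption: the stream runs 24x7 and a month is 1/12 of a 52-week year.
HOURLY_RATE_USD = 0.30  # price per hour quoted in the thread

hours_per_year = 24 * 7 * 52            # 8736 hours
hours_per_month = hours_per_year / 12   # 728 hours
monthly_cost = hours_per_month * HOURLY_RATE_USD

print(f"${monthly_cost:.2f} per month")
```

Using 365 days instead of 52 weeks would nudge the result up slightly, since 52 weeks is only 364 days.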

bigiain · on April 5, 2011

Or from another perspective...

140 million tweets per day[1] / 24hr / $0.30 = 19.4 million tweets per dollar.

_Surely_ there's some valuable data to be gleaned out of 20 million tweets?

[1] wildly assuming TechCrunch's numbers are connected with reality - http://techcrunch.com/2011/03/14/new-twitter-stats-140m-twee...
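The division is easy to reproduce. This is a rough Python sketch, assuming (hypothetically) that the whole firehose were delivered for the hourly price; it comes out at roughly 5.8 million tweets per hour and 19.4 million tweets per dollar.

```python
# Back-of-the-envelope: tweets obtainable per dollar at the hourly price.
# Assumption: the entire daily volume is spread evenly over 24 hours and
# all of it is delivered for the flat hourly rate.
TWEETS_PER_DAY = 140_000_000  # figure cited from TechCrunch in the comment
HOURLY_RATE_USD = 0.30        # price per hour quoted in the thread

tweets_per_hour = TWEETS_PER_DAY / 24
tweets_per_dollar = tweets_per_hour / HOURLY_RATE_USD

print(f"{tweets_per_hour / 1e6:.2f} million tweets per hour")
print(f"{tweets_per_dollar / 1e6:.1f} million tweets per dollar")
```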

PanMan · on April 5, 2011

From their pricing calculator it seems they deliver a maximum of 2k tweets per hour. Way less than your 6 million, and not that useful.

OstiaAntica · on April 5, 2011

This is a vastly better business model than the dickbar.

dmix · on April 5, 2011

Assuming other companies can figure out how to monetize the tweets and can keep paying to access the firehose.

waterlesscloud · on April 5, 2011

I've always said the ultimate Twitter business model is to know what everyone in the world is thinking right now, and to sell that information.

jrockway · on April 4, 2011

Nice. Another service that I'm not going to give free data to anymore.

Splines · on April 4, 2011

I sort of have this feeling too, but Twitter is a business after all, and hosting god-knows-how-many twits can't be cheap.

Sure, they could do something else instead of selling data, but until end users pony up for the service, then they're the product.

slapshot · on April 5, 2011

> until end users pony up for the service, then they're the product.

Exactly. It should come as no surprise when any free site that accumulates a massive amount of data turns around and starts selling that data -- even if users feel like their privacy is being violated.

Twitter really has nothing to sell but data. Same for Facebook and others. They can sell that data indirectly (by allowing targeted advertising) or directly (by selling massive blocks of data for $0.30 an hour), but nobody should be surprised when it happens; it's all that they have to sell.

chc · on April 5, 2011

Actually, God has told us how many tweets they're hosting, and tweets are mercifully short. As of about a month ago, Twitter says it gets about 140 million tweets per day. Assuming the maximum length of 140 characters, this means they're storing about 18 GB per day. At Amazon S3 rates (which are considerably higher than what Twitter pays if they have a working brain anywhere in their corporate structure), that means that their storage costs increase by about $1/day. After five years of storage at that rate, their monthly storage costs (again, at S3 rates) would be around $2000. If they're making less than $2000 per month with that wealth of data, nickel-and-diming developers is a drastically misguided underreaction.

I'm not pretending this is all it takes to run Twitter, but I'd be surprised if storing a few TB a year is a major cost center. (Serving up so many concurrent users seems like a much bigger and more expensive problem — that's an average of 1600 tweets per second, to say nothing of readers, and I suspect tweet rates are very lumpy.)
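The volume and rate figures can be reproduced with a short Python sketch. It assumes one byte per character and counts only the tweet text (no metadata, indexes, or replication); pricing is left out because the applicable S3 rate is not stated here.

```python
# Rough size of the raw tweet text alone.
# Assumptions: every tweet is the full 140 characters, 1 byte per character,
# and nothing but the text is stored.
TWEETS_PER_DAY = 140_000_000
BYTES_PER_TWEET = 140

bytes_per_day = TWEETS_PER_DAY * BYTES_PER_TWEET
gib_per_day = bytes_per_day / 2**30         # binary gigabytes
tb_per_year = bytes_per_day * 365 / 1e12    # decimal terabytes
tweets_per_second = TWEETS_PER_DAY / (24 * 60 * 60)

print(f"{gib_per_day:.1f} GiB per day")
print(f"{tb_per_year:.1f} TB per year")
print(f"{tweets_per_second:.0f} tweets per second on average")
```

That gives about 18 GiB per day, about 7 TB per year, and an average of roughly 1,600 tweets per second.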

simonw · on April 5, 2011

"Assuming the maximum length of 140 characters, this means they're storing about 18 GB per day"

That's a huge underestimation. A tweet isn't 140 characters - it's 140 characters plus a huge chunk of surrounding metadata and indexes (who tweeted, when they tweeted, where they tweeted from, was it a reply, did it mention anyone, did it include any hash tags, did it link to anything, was it a retweet, its unique ID, how many users was it delivered to...) - all massively denormalised for performance reasons. See http://www.scribd.com/doc/30146338/map-of-a-tweet for an idea of the data involved.

Then there's the fact that a reference to each tweet has to be written into the "inbox" of every user that receives it - so if Tim O'Reilly says something, a reference to that tweet gets written 1,452,801 times, once for each of his followers.

On top of that, there's all of the associated stats collection, including link click tracking and a ton of data around who is doing what in the Twitter interface.

This article from last year suggests that Twitter were storing 8TB/day back in October, and it's only going to have gone up since then: http://techcrunch.com/2010/09/17/twitter-seeing-6-billion-ap...

xtacy · on April 5, 2011

I think storage is not just the tweets. It's also the metadata, especially the ReTweets. Considering that interest in tweets dies down pretty quickly after they're posted, I would imagine that caching is extremely important. Caching infrastructure would require a lot of memory, which means paying for RAM, and that costs a lot more.

dotBen · on April 5, 2011

This is what is known as a straw-man argument, and it does nothing to move the conversation forward.

Clearly storing just the 'tweet' contents alone would be unhelpful, because what about the username or any of the other 40+ metadata points a tweet carries?

What about the mechanisms needed to store, sort, search, and send those tweets, and so on? I could go on.

Also, what were you expecting - Twitter to run their business at-cost?

chc · on April 5, 2011

As far as I can tell, the comment I was replying to was about Twitter's storage costs. So the fact that my response focused on storage and how much it costs does not make it a straw man, which would involve arguing against something other than what I was replying to. Moreover, the point of my comment was that Twitter's storage costs are not the interesting part of their operation, so your question "Were you expecting … Twitter to run their business at-cost?" actually is a straw man. You're just repeating what I said, except with slightly less hard data.

gbhn · on April 5, 2011

I heard on This Week in Tech that all the history they sent to Library of Congress was 4TB in size. So yes, the storage isn't the issue.

Splines · on April 5, 2011

True enough. For the record, that was just a tongue-in-cheek phrase that I used. You're right; the bandwidth, machine time, and manpower probably make up the bulk of their costs.

zackattack · on April 4, 2011

When are you going to send pg the bill for your participation here?

jrockway · on April 5, 2011

When he starts selling a list of locations where I take photos?

ekanes · on April 4, 2011

Among other factors, this will probably increase the degree to which your twitter username represents your identity online. I'm not saying that's good or bad, just predicting it'll happen.

hop · on April 5, 2011

Why don't they just charge businesses to use the site? And charge relative to their number of employees or revenue. All on the honor system - you get a badge by your company name when you pay.

waitwhatwhoa · on April 5, 2011

Can someone explain why I can't register the 10k most popular keywords and then resell arbitrary subsets of that stream to interested entities at pennies on the... nickel and a quarter?

flog · on April 5, 2011

Well, it's all in the TOS. You can't resell, and you can't display the tweets.

dbard · on April 5, 2011

Surprised it took Twitter this long to monetize this.

nivertech · on April 5, 2011

Twitter Announces Fire Hose Marketplace: Up to 10k Keyword Filters for 1/2 cent per minute