> Thanks to Qwerly integration, when you look at a Twitter @username - Mediasift sees more than just the Twitter profile. It sees @username who has bookmarks saved online, plans for public events to attend, photos shared publicly with friends, check-ins to places around town and much more. Any of those are like columns in a spreadsheet of Twitter search results. Show me Tweets with any of the following keywords by people planning to attend event A and who have been to place B or C. Thanks to Qwerly, Twitter didn't just get a giant new developer search and filter feature - it got integration with a whole lot of other social services.
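The filter the quote describes is a boolean query over a tweet joined with its author's profile data from other services. Below is a minimal Python sketch. It assumes a hypothetical `Profile`/`Tweet` data model, which is not Mediasift's or Qwerly's actual schema, and it uses naive whitespace tokenization.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical enriched profile: other services' data joined to a Twitter @username."""
    username: str
    events_attending: set = field(default_factory=set)
    places_visited: set = field(default_factory=set)


@dataclass
class Tweet:
    author: Profile
    text: str


def matches(tweet: Tweet, keywords: set, event: str, places: set) -> bool:
    """Any keyword AND attending `event` AND visited at least one of `places`."""
    words = set(tweet.text.lower().split())  # naive tokenization; punctuation sticks to words
    has_keyword = bool(words & {k.lower() for k in keywords})
    attending = event in tweet.author.events_attending
    visited = bool(tweet.author.places_visited & places)
    return has_keyword and attending and visited


alice = Profile("alice", {"event_a"}, {"place_b"})
bob = Profile("bob", {"event_a"}, {"place_d"})
tweets = [Tweet(alice, "Heading to the launch tonight"), Tweet(bob, "launch party!")]
hits = [t for t in tweets if matches(t, {"launch"}, "event_a", {"place_b", "place_c"})]
print([t.author.username for t in hits])  # ['alice']
```

Each enriched attribute works like another column in the quote's spreadsheet analogy, and every new column is one more clause in the predicate.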
Thank God I use different pseudonyms on the internet. That just gives me the willies.
---
On another note, this may come at a very opportune time, just as the Obama 2012 campaign is kicking into gear.
I don't see a problem with it. If you use Twitter, then you clearly don't have a problem living your life in the open, and the price of being an attention whore is getting attention. For those of us who don't need to play-by-play our every thought and action in 140-character narration throughout the day to our imagined hordes of clingy worshippers, it won't be a problem. We likely don't use Twitter, or we don't use the same identity there as we do elsewhere. I mean, as much as I'd kind of like to have all of my identities tied together, I don't like the potential problems that could bring. Therefore, I use a different identity at HN than I do at Slashdot, which is different than on LinkedIn, which is different than on Amazon, which is different than on Steam, which is different than on XBOX Live.
Yeah, people like you and me are still open to having any anonymity data-mined out of us through aggregate manipulation -- but at least it's a simple layer of abstraction.
In the meantime, if someone can get rich using the wealth of public information that every vapid college girl posting a thousand twitpics a night from her cell at the club puts up online, then more power to them.
Although it's "dirt cheap" for potential clients such as companies, I find the 30-cent barrier an awkward price point: people who want the data are probably willing to pay more, and it definitely bars the casual developer from access to social data.
There is always the garden hose sample feed from Twitter. Or, use this service for short test intervals.
Seems like data hackers are being taken care of.
Also, if you want a lot of social media data cheaply, check out the sample web app for Google Buzz that runs on AppEngine. I ran it last summer with some of my own filters. I could run it about 5 hours a day before I hit the limit of a free AppEngine account, so it would not cost too much to keep a derivative of this example program running 24x7.
> until end users pony up for the service, then they're the product.
Exactly. It should come as no surprise when any free site that accumulates a massive amount of data turns around and starts selling that data -- even if users feel like their privacy is being violated.
Twitter really has nothing to sell but data. Same for Facebook and others. They can sell that data indirectly (by allowing targeted advertising) or directly (by selling massive blocks of data for $0.30 an hour), but nobody should be surprised when it happens; it's all that they have to sell.
Actually, God has told us how many tweets they're hosting, and tweets are mercifully short. As of about a month ago, Twitter says it gets about 140 million tweets per day. Assuming the maximum length of 140 characters, this means they're storing about 18 GB per day. At Amazon S3 rates (which are considerably higher than what Twitter pays if they have a working brain anywhere in their corporate structure), each day's tweets add about $1 to their monthly storage bill. After five years of storage at that rate, their monthly storage costs (again, at S3 rates) would be around $2000. If they're making less than $2000 per month with that wealth of data, nickel-and-diming developers is a drastically misguided underreaction.
I'm not pretending this is all it takes to run Twitter, but I'd be surprised if storing a few TB a year is a major cost center. (Serving up so many concurrent users seems like a much bigger and more expensive problem — that's an average of 1600 tweets per second, to say nothing of readers, and I suspect tweet rates are very lumpy.)
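The back-of-the-envelope numbers in the two comments above can be reproduced in a few lines. In this sketch, the tweet volume and size come from the comment. The storage price is a hypothetical flat rate chosen so that one day's tweets cost about $1 per month, which is what the comment's figures imply. It is not a quoted S3 price, and it ignores metadata, indexes, and replication.

```python
GIB = 2**30
SECONDS_PER_DAY = 24 * 60 * 60

# Figures from the comment.
TWEETS_PER_DAY = 140_000_000
BYTES_PER_TWEET = 140  # tweet text only

# Hypothetical flat rate (assumption): makes 18 GiB cost about $1/month.
PRICE_PER_GIB_MONTH = 0.055


def back_of_envelope(years: int = 5) -> dict:
    gib_per_day = TWEETS_PER_DAY * BYTES_PER_TWEET / GIB
    stored_gib = gib_per_day * years * 365  # total text stored after `years`
    return {
        "tweets_per_second": TWEETS_PER_DAY / SECONDS_PER_DAY,
        "gib_per_day": gib_per_day,
        # Each day of tweets adds this much to the monthly storage bill.
        "monthly_bill_added_per_day": gib_per_day * PRICE_PER_GIB_MONTH,
        "monthly_bill_after_years": stored_gib * PRICE_PER_GIB_MONTH,
    }


for name, value in back_of_envelope().items():
    print(f"{name}: {value:,.2f}")
```

With these inputs, the script lands on the comments' ballpark figures: roughly 1,600 tweets per second, about 18 GiB per day, and a monthly bill a little under $2000 after five years.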
> Assuming the maximum length of 140 characters, this means they're storing about 18 GB per day
That's a huge underestimation. A tweet isn't 140 characters - it's 140 characters plus a huge chunk of surrounding metadata and indexes (who tweeted, when they tweeted, where they tweeted from, was it a reply, did it mention anyone, did it include any hash tags, did it link to anything, was it a retweet, its unique ID, how many users was it delivered to...) - all massively denormalised for performance reasons. See http://www.scribd.com/doc/30146338/map-of-a-tweet for an idea of the data involved.
Then there's the fact that a reference to each tweet has to be written into the "inbox" of every user that receives it, so if Tim O'Reilly says something, a reference to that tweet gets written 1,452,801 times, once for each of his followers.
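The inbox scheme described above is commonly called fan-out on write. The minimal in-memory sketch below uses hypothetical class and method names and does not describe Twitter's actual implementation. The tweet text is stored once, and its ID is appended to every follower's inbox. As a result, the cost of a single post grows with the author's follower count.

```python
from collections import defaultdict


class FanOutTimeline:
    """Fan-out-on-write sketch: posting copies a tweet ID into each follower's inbox."""

    def __init__(self):
        self.tweets = {}                  # tweet ID -> text, stored once
        self.followers = defaultdict(set)  # author -> set of followers
        self.inboxes = defaultdict(list)   # user -> tweet IDs, oldest first
        self._next_id = 1

    def follow(self, follower: str, author: str) -> None:
        self.followers[author].add(follower)

    def post(self, author: str, text: str) -> int:
        tweet_id = self._next_id
        self._next_id += 1
        self.tweets[tweet_id] = text
        # One inbox write per follower: 1,452,801 writes for an account that size.
        for follower in self.followers[author]:
            self.inboxes[follower].append(tweet_id)
        return tweet_id

    def timeline(self, user: str) -> list:
        """Newest first, resolving IDs back to the single stored copy of each tweet."""
        return [self.tweets[i] for i in reversed(self.inboxes[user])]


tl = FanOutTimeline()
for user in ("ann", "ben", "cat"):
    tl.follow(user, "tim")
tl.post("tim", "hello")
writes = sum(len(ids) for ids in tl.inboxes.values())
print(writes)              # 3
print(tl.timeline("ann"))  # ['hello']
```

The tradeoff is cheap reads, since a timeline is already assembled, paid for with write amplification for popular authors.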
On top of that, there's all of the associated stats collection, including link click tracking and a ton of data around who is doing what in the Twitter interface.
I think storage is not just the tweets. It's also the metadata, especially the retweets. Considering that activity around a tweet dies down pretty quickly after it's posted, I would imagine that caching is extremely important. Caching infrastructure would require a lot of memory, and RAM costs a lot more than disk.
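The caching argument depends on recency. If most reads hit recent tweets, a small in-memory cache can absorb most of them. Below is a minimal least-recently-used (LRU) cache in Python, built on `collections.OrderedDict`. Keying the cache by tweet ID and the tiny capacity are illustrative assumptions, not a description of Twitter's caching layer.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps at most `capacity` items in memory, evicting the least recently used."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None  # cache miss: a real system would fall back to slower storage
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.put(1, "old tweet")
cache.put(2, "newer tweet")
cache.put(3, "newest tweet")  # evicts tweet 1
print(cache.get(1))  # None
print(cache.get(3))  # newest tweet
```

Every slot in such a cache is RAM, so the hit rate you can buy is bounded by how much memory you are willing to pay for.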
This is what is known as a straw-man argument, and it does nothing to move the conversation forward.
Clearly, storing just the 'tweet' contents would be unhelpful: what about the username, or any of the other 40+ metadata points a tweet carries?
What about the mechanisms needed to store, sort, search, and send those tweets, and so on? I could go on.
Also, what were you expecting, Twitter to run their business at-cost?
As far as I can tell, the comment I was replying to was about Twitter's storage costs. So the fact that my response focused on storage and how much it costs does not make it a straw man, which would involve arguing against something other than what I was replying to. Moreover, the point of my comment was that Twitter's storage costs are not the interesting part of their operation, so your question "Were you expecting … Twitter to run their business at-cost?" actually is a straw man. You're just repeating what I said, except with slightly less hard data.
True enough. For the record, that was just a tongue-in-cheek phrase that I used. You're right; the bandwidth, machine time, and manpower probably make up the bulk of their costs.
Among other factors, this will probably increase the degree to which your twitter username represents your identity online. I'm not saying that's good or bad, just predicting it'll happen.
Why don't they just charge businesses to use the site? And charge relative to their number of employees or revenue. All on the honor system - you get a badge by your company name when you pay.
Can someone explain why I can't register the 10k most popular keywords and then resell arbitrary subsets of that stream to interested entities at pennies on the... nickel and a quarter?
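Mechanically, the reselling scheme in the question is a demultiplexer. One upstream stream is filtered on the union of all registered keywords, then split into per-customer substreams. The Python sketch below assumes whitespace tokenization and hypothetical customer names. It indexes keywords to customers so each tweet is tokenized once, no matter how many customers there are.

```python
from collections import defaultdict
from typing import Iterable


def resell(stream: Iterable[str], subscriptions: dict) -> dict:
    """Split one keyword-filtered stream into per-customer substreams.

    `subscriptions` maps a (hypothetical) customer name to the subset of the
    registered keywords that customer pays for.
    """
    # Index keyword -> customers so each tweet is matched once, not once per customer.
    by_keyword = defaultdict(set)
    for customer, keywords in subscriptions.items():
        for kw in keywords:
            by_keyword[kw.lower()].add(customer)

    out = defaultdict(list)
    for text in stream:
        recipients = set()
        for word in set(text.lower().split()):
            recipients |= by_keyword.get(word, set())
        for customer in recipients:
            out[customer].append(text)  # a tweet can go to several customers
    return dict(out)


upstream = ["new phone launch today", "great coffee downtown", "coffee and a phone"]
subs = {"acme": {"phone"}, "beanco": {"coffee"}}
result = resell(upstream, subs)
print(result["acme"])    # ['new phone launch today', 'coffee and a phone']
print(result["beanco"])  # ['great coffee downtown', 'coffee and a phone']
```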