I'm not the greatest Musk fan, but IMHO his approach of charging those who benefit from Twitter is spot on, and I'm actually rooting for him to find a viable business model that does not rely on selling my attention to the highest bidder.
If you are going to influence people, pay for the reach, and if you are going to mine data, pay for the data. I guess the exact pricing can be adjusted according to market needs, but I agree with the paid-access approach.
Academics don't resell the data to others. In fact, their existing agreements with Twitter require their published datasets (for reproducibility) to be anonymized, precisely to ensure they don't become a commercial goldmine.
Given that most of the article is about the pricing tiers for academic use, based on marketing communications to universities, your comment seems strangely indifferent to the context of this news. These proposed costs are unaffordable precisely because academics are not running a business around the data. If the article were about enterprise data sales, your point would make sense.
As you say, academics are pretty used to paying for access to data, services, materials, etc., but $42k per month for limited access to Twitter sounds more like a "fuck off" price than anything else.
How many researchers can and will pay $42k per month for access? What's the market size here? Is this anything more than a drop in the bucket for Twitter?
It doesn't have to be free, but with every increase there will be less research that can afford to pay for the data, and at the proposed pricing of $500,000 for 0.3% of tweets it seems that no one will be willing to pay.
Except if I want to buy a piece of academic research, they sell articles for as high as $60. Why aren't academics complaining about the absurd costs for the public to access their information?
Academics don't make money from academic publishing. In fact, they often have to pay exorbitant publication fees to journals. There have been many, many HN threads about this part of the publishing industry.
Most academics (at least in Sweden) get most of their articles from their university library. Sci-Hub is also an option if needed. If neither works, they can either ask the library to buy the article or buy it themselves. Either way, that's far less than $42k.
Even then a lot of people are against the high cost
They do? Every single academic in the country supports open access and lobbies their institution to pay the costs of open access. Every researcher will send you a copy of their article if you are paywalled and want to read it. And you, like all academics, know about Sci-Hub, so you could do what most academics do and use Sci-Hub to pirate the article in the first place.
When I was in college, we used it to try and train a sentiment analysis model, since such models are notoriously bad at detecting sarcasm and Twitter was full of sarcasm.
We also used the API to try to determine the most impacted areas after a natural disaster. Basically, it would use the model we trained to read tweets from people who needed help, or who were tweeting about severe damage, and group them by their coordinates.
The first one, I agree, isn't really a positive use, since it's just using other people's data to train a model, but the second one could have been a useful tool to help EMS during a natural disaster.
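For anyone curious what that grouping step looks like in practice, here's a minimal sketch. Everything here is hypothetical: it assumes a classifier has already flagged distress tweets, and that each flagged tweet carries coordinates; the function and field names are made up for illustration.

```python
# Hedged sketch: bucket classifier-flagged distress tweets into a coarse
# lat/lon grid and rank cells by report count, so the densest cells
# (most-affected areas) come first.
from collections import Counter

def grid_cell(lat, lon, cell_deg=0.1):
    """Snap a coordinate to a ~0.1-degree grid cell (roughly 11 km)."""
    return (round(lat / cell_deg) * cell_deg, round(lon / cell_deg) * cell_deg)

def hotspots(flagged_tweets, cell_deg=0.1):
    """Count distress tweets per grid cell, most-affected cells first."""
    counts = Counter(grid_cell(t["lat"], t["lon"], cell_deg) for t in flagged_tweets)
    return counts.most_common()

# Toy data standing in for classifier output (all values invented).
tweets = [
    {"lat": 29.76, "lon": -95.37},  # three reports near the same cell
    {"lat": 29.77, "lon": -95.38},
    {"lat": 29.75, "lon": -95.36},
    {"lat": 30.27, "lon": -97.74},  # one report elsewhere
]
print(hotspots(tweets))
```

A real system would use proper spatial clustering (e.g. DBSCAN) rather than a fixed grid, but the grid version conveys the idea in a few lines.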
Sigh, this is just straight-up wrong. I was an RA who worked on real-time social media analytics software. We were able to pick up on things like likely COVID infection sites, etc.
Academia has flooded the literature with >10,000 research papers based on the Twitter API feed. Virtually none of it is reproducible, it's frequently based on circular logic, the methodologies are unscientific and the conclusions are usually deeply partisan, but it nonetheless gets amplified by the media as "proof" of various false claims.
Count me in the camp of people who is happy Musk is doing this. I've been writing for years about the plague of "social bot" research coming out of academia that's based on the Twitter API:
Maybe your specific work on COVID was good, but it was certainly drowned out by the work that was sharply net negative for both society and science. Academic institutions were clearly never going to get the problem under control, so booting them out whilst allowing search engines and the like to continue accessing the feed seems like a good solution.
This is absurd; you're throwing the baby out with the bathwater. Certainly, it is easy to find social science papers with terrible methodology that use the Twitter API, or that build on the sand of papers with terrible methodology.
But you conclude from that that all academic use of the Twitter API is garbage, which is nonsensical, and that preventing academics from studying Twitter at scale is the ideal solution. Your hyperbolic language (here and in your two Medium articles, which I read thoroughly, along with the SSRN paper you cited*) does nothing for your own credibility.
The main 'methodology' of the SSRN paper is combing through other papers' datasets, contacting some of the identified 'bot' accounts, and establishing that they're operated by real people; the accounts were misidentified as bots when in reality the account operators were just aggressively quote-tweeting, using copy & paste to spread (e.g.) political or QAnon messages 200 times an hour. The authors point out that by really making an effort, Twitter users can tweet spam up to 25 times a minute, with no bots in sight!

While the authors are quite correct to point out that people can be misidentified as bots, this completely ignores the unwanted spamming behavior itself. Pointing out the scientific flaws of 'tools' like Botometer is wholly valid, but the effort to research and develop tools for bot identification is a response to systematic information pollution, and most papers that try to address this issue are careful to offer caveats and qualifications about the limitations of their methods. It is not the fault of academics if media pundits over-simplify the fruits of their research.
Here are some examples of high quality research using data from Twitter:
1. Ephemeral Astroturfing Attacks: The Case of Fake Twitter Trends
A good start! It makes relatively limited claims (they aren't trying to assert whole elections are being distorted by Twitter bots) and is indeed higher quality than the ones I've been citing. It actually makes its data available, which is a step forward. But it's had limited impact (28 citations), and it's also not particularly useful. All they're doing is revealing that there is ordinary spam, hijacking and SEO on Turkish Twitter, which was never in doubt. All social media sites have these problems and the authors were tipped off by some amateur third party that highlights these campaigns. Most of what they find is plain commercial spam, there's also some politics in there related to local Turkish issues like cab drivers protesting against Uber but there's no evidence presented that this is actually having a real impact on politics.
The main question here is why are universities spending grant money on subsidizing Twitter? The only people who can do anything with this paper are Twitter's spam team, there isn't generalizable new scientific knowledge coming out of it.
2. Political Astroturfing on Twitter: How to Coordinate a Disinformation Campaign
This one starts with a big claim, so it can at least say it's doing important research. But I really wonder why you suggested it because it actually agrees with us and even destroys the underlying premise of the entire field! A pretty useful paper that might be worth citing in future articles on the topic, in fact.
Firstly, their conclusion is that "if even a powerful and well-financed organization like the South Korean secret service cannot instigate a successful disinformation campaign, then this may be more difficult than often assumed in public debates". In other words, the supposed problem motivating this entire field of >10,000 papers doesn't actually exist: even government agencies fail to have impact when they try to sway opinions with Twitter.
Secondly, they accept that our criticisms of the field are correct. "We argue that past research’s predominant focus on automated accounts, famously known as “social bots” ... misses its target since reports on recent astroturfing campaigns suggest that they are often at least partially run by actual humans" and "Because a ground truth is rarely available, systematic research into astroturfing campaigns is lacking".
They also dunk on ML models on page three, and admit that "these studies still largely focus on anecdotes and lack a theory-driven framework" i.e. are more like blog posts than scientific research. These were all points being made by Gallwitz, Kreil and myself years ago.
The paper does have issues! Still, they should get some cred for being honest about their findings, albeit on the penultimate page of a 25 page study. The first sentence of the paper is phrased in a misleading way: they assert that astroturfing on Twitter has the potential to influence politics, but their conclusion is that it actually doesn't. That's a problem that you see a lot when reading papers in some fields.
3. QAnon Propaganda on Twitter as Information Warfare
Note that this paper also isn't about bots. It's a complaint about the behavior of real American people. Where is the actual science? Why are you picking this as an example of high quality research? It's not only blatantly partisan, reading more like a Guardian op-ed than a research paper, it starts by citing paper (2), the one that wrecks the whole premise of the field! They are happy to cite it as evidence that they should look for astroturfing instead of bots, but forget to mention that it shows that even an intelligence agency was unable to have any impact on politics by running Twitter campaigns. Yet that doesn't stop them asserting that their line of research is important due to the "innovative misuse of social media towards undermining democratic processes by promotion of magical thinking".
This sort of problem is rampant in published research. I've often seen a paper cite another paper that directly undermines its own conclusions, yet the authors don't address or even mention the conflict. This sort of thing is just deceptive. If they want to cite paper (2), then they need to tackle its conclusion.
The rest of it is just US Democratic Russiagate talking points. Getting into the accuracy of that is a book-sized job and not about science, so I won't do that here; there are many such debates on the internet.
So that's your three papers. One is OK but not very valuable, one ends up (unintentionally?) wrecking the premise of the other ~10,000 papers, and one isn't even scientific research. It's unclear how they were picked, but if these really are the best examples of high quality research from the field then, indeed, who cares if Musk cuts it all off.
I'll try and find time to look at the papers you cite as high quality later today.
> you conclude from that that all academic use of the Twitter API is garbage, which is nonsensical
"All" no, a vast amount of it, yes. Is it nonsensical? Twitter themselves concluded this exact same thing even before Musk, both in public blog posts and internal emails (see the Twitter Files for examples).
But we don't really need to cite Twitter as an authority here. Just try to answer this question: what mechanisms exist that are stopping bad science outside the field of social bot research, and why have those mechanisms failed within it? It can't be peer review, university hiring committees and so on because those are all existing within social studies as well.
> Your hyperbolic language ...
What language do you think is hyperbolic, exactly, and why?
> Pointing out the scientific flaws of 'tools' like Botometer is wholly valid, but the effort to research and develop tools for bot identification are a response to the fact of systematic information pollution
This is exactly the sort of problem I'm talking about: this justification is circular. We do bad bot research because we know there are bots, we know there are bots because we do bad bot research. If there were actually big problems with social bots then it would be easy to find them and research them; we wouldn't see this situation where basically all papers are seeing patterns in noise.
Botometer is a good example of that. You admit that it's "scientifically flawed" but with respect, that language is not "hyperbolic" enough. It's not merely flawed, it's outright useless. It had an FP rate of 50% when tested against a known human dataset. Yet the Botometer paper has been cited over 900 times now (up from ~700 when I previously wrote about it). When exactly does the rest of the world get to call time on this bad behavior by the academy? These people are changing the opinions of world leaders on the back of misinformation, the exact problem they claim to be fighting.
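To make concrete why a 50% false-positive rate is disqualifying rather than merely "flawed", here's a back-of-envelope Bayes calculation. The bot prevalence and true-positive rate below are illustrative assumptions of mine, not figures from any paper; only the 50% FP rate comes from the discussion above.

```python
# Why a 50% false-positive rate makes "bot" labels close to meaningless.
# Assumption: prevalence and true-positive rate are illustrative, not measured.
p_bot = 0.05        # assumed share of accounts that really are bots
tpr   = 0.90        # assumed true-positive rate (generous to the tool)
fpr   = 0.50        # false-positive rate reported against a human dataset

# P(actually a bot | flagged as bot), by Bayes' rule
p_flagged = tpr * p_bot + fpr * (1 - p_bot)
precision = tpr * p_bot / p_flagged
print(f"{precision:.1%} of 'bot' flags would be real bots")  # ~8.7%
```

In other words, under these assumptions roughly nine out of ten accounts the tool flags as bots would actually be humans, which is why an FP rate like that makes every downstream bot-prevalence estimate suspect.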
> It is not the fault of academics if media pundits over-simplify the fruits of their research.
It wasn't media pundits that made academics cite the Botometer paper over 900 times, or write outright deceptive papers like the one I reviewed. The problem here is academia and the institutions need to start taking responsibility for it. Otherwise you're going to get situations like this one: academia will just get cut off from data. People don't have time to try and figure out which little subsections of the academy are following the rules to separate them from the rest.
There are literally only a couple of US universities with that kind of money; for smaller universities, $42k a month for a research project or two doesn't make any financial sense at all. This price is basically just a huge gatekeeper to keep most people out.
Indiana University, a Midwest US state school, has a $3+ billion endowment behind it and ranks 16th right now. The University of Texas is #1 at $42B, and Berkeley is 20th at $2.6B. And those are just state schools: Stanford is at $38B and Yale is at $42B. There is plenty of money out there in university endowments. They just need to spend it on things like research and professors rather than golf courses and sports stadiums.
It is for Twitter. Our attention is a synonym for showing ads. If advertisers step away because they got nervous about the behavior of the new owner then it's better for Twitter to have other sources of income than not.
The free users remain the product, though, don't we? We are the reach and the mined data that the company can sell, but at least maybe there's a chance of not being interrupted.
Ideally, everyone would pay to use the service and nothing would be mined for manipulation but that world is hard to imagine in 2023.
> When does Twitter start paying its users who produce the data?
Never.
Having a business in the capitalist system is about maximising profits.
Musk spent ~$44bn USD or so to buy Twitter (and tried to back out of the deal too). Do you really think Twitter is gonna fairly compensate any of the users any time soon?
You’d be better off migrating to Mastodon. Maybe some instance in that ecosystem will figure out how to use crypto for good, and to compensate its content creators.
>I'm not the greatest Musk fan, but IMHO his approach of charging those who benefit from Twitter is spot on, and I'm actually rooting for him to find a viable business model that does not rely on selling my attention to the highest bidder.
If there is money to be made, he's not going to pass it up. Why not charge and sell your attention to the highest bidder at the same time? It's the literal cable model and it's proven to work for 50 years.
The difficulty is, people can still scrape the data. That data scraping is likely to cost Twitter more than the API did, as they have to serve up the full page.
Yes, you can try to block people doing that, but historically people haven't succeeded.
Scraping data at scale is much harder than you make it sound, especially if the company is trying hard to prevent it. It's much cheaper to just pay for the API.
LinkedIn tries hard to prevent scraping, but there are third parties doing it, and then re-selling it. Each user is presumably paying a fraction of what the scraping cost.
Fully agree. I just wish I knew who these people are, because clearly he has looked at some data suggesting they'll pay up, or that his hosting costs will be significantly lowered.
If you drive everyone important off for good with a ridiculous price before "the market adjusts" (Musk changes his mind), you've done permanent damage to your operation.
you're opposed to selling your attention to the highest bidder but you think it's a good business model to sell all of "your data" to any "market rate" bidder?
If you are going to influence people, pay for the reach, and if you are going to mine data, pay for the data. I guess the exact pricing can be adjusted according to market needs, but I agree with the paid-access approach.