
Twitter's social graph is fairly exposed. You can start with a few trusted "role models" (i.e. you'd want to follow anyone they're following) and build a pretty sizable graph from there, with minimal pruning necessary.
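To make the "start from a few role models" idea concrete, here is a rough sketch of a breadth-first walk over followees. It's not tied to any real Twitter API: `get_followees` is a hypothetical callable you'd supply yourself, and `max_depth` is an assumed knob standing in for the "minimal pruning".

```python
from collections import deque


def build_follow_graph(roots, get_followees, max_depth=2):
    """Breadth-first walk of the followee graph starting from trusted roots.

    get_followees is a hypothetical callable (user -> iterable of users);
    plug in whatever data source you actually have.
    """
    graph = {}
    seen = set(roots)
    queue = deque((root, 0) for root in roots)
    while queue:
        user, depth = queue.popleft()
        if depth >= max_depth:
            # Record the user but don't expand further: crude depth pruning.
            graph.setdefault(user, [])
            continue
        followees = list(get_followees(user))
        graph[user] = followees
        for followee in followees:
            if followee not in seen:
                seen.add(followee)
                queue.append((followee, depth + 1))
    return graph


# Toy data instead of a live API call.
fake_network = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["alice"],
    "dave": ["erin"],
}
graph = build_follow_graph(["alice"], lambda u: fake_network.get(u, []))
```

With the default depth of 2, "dave" shows up as a leaf and "erin" never gets visited, which is the whole point of the depth limit.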

I built a tool that recursively scrapes RSS feeds from the websites linked in Twitter bios. You pass in a trusted "root" user; the tool checks that user's bio and the bio of every account they follow for a website, then looks for anything "rss", "feed", "atom" or "xml"-y on the linked page itself or in the domain's sitemap.xml.
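The feed-hunting step might look roughly like the sketch below. This isn't the actual tool, just one way to do it with the standard library: it takes HTML you've already fetched and collects links that either declare an RSS/Atom type or merely look feed-like. The names `FEED_HINTS`, `FeedLinkParser` and `find_feed_candidates` are made up for illustration, and the sitemap.xml pass is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Substrings that make a URL look "feed-y" (same four as in the comment).
FEED_HINTS = ("rss", "feed", "atom", "xml")
# MIME types sites commonly use in <link rel="alternate"> feed declarations.
FEED_TYPES = ("application/rss+xml", "application/atom+xml")


class FeedLinkParser(HTMLParser):
    """Collects candidate feed URLs from <link> and <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        declared = tag == "link" and (attrs.get("type") or "").lower() in FEED_TYPES
        looks_feedy = any(hint in href.lower() for hint in FEED_HINTS)
        if declared or looks_feedy:
            url = urljoin(self.base_url, href)  # resolve relative links
            if url not in self.candidates:
                self.candidates.append(url)


def find_feed_candidates(html, base_url):
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.candidates


sample = (
    '<html><head><link rel="alternate" type="application/rss+xml" href="/index.xml">'
    '</head><body><a href="/about">About</a>'
    '<a href="https://example.com/feed/">Feed</a></body></html>'
)
candidates = find_feed_candidates(sample, "https://example.com/blog/")
```

The substring match is deliberately loose, so expect false positives; you'd still want to fetch each candidate and check that it actually parses as a feed.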

Surprisingly useful. There's a decent amount of value in Twitter's content, but arguably much more in the followee network of "smart people" and in the websites linked from their profiles and tweets.

Reddit is similar, in a different way: it filters itself into variously useful, well-moderated communities. Taking the top X posts of subreddits A, B, and C is a great heuristic for getting 90% of the value out of Reddit with very little of the toxicity.
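A minimal sketch of the "top X posts of subreddits A, B, C" heuristic, assuming Reddit's public JSON listings at `/r/<subreddit>/top.json` and their `data.children[].data` layout. Double-check the endpoint, rate limits and terms before relying on it. The function names and the User-Agent string are placeholders I made up, and only the parsing and merging run in the example; `fetch_top` is there to show where the network call would go.

```python
import json
import urllib.request


def top_url(subreddit, limit=5, period="week"):
    # Assumed public listing endpoint; verify before depending on it.
    return f"https://www.reddit.com/r/{subreddit}/top.json?t={period}&limit={limit}"


def parse_listing(payload):
    """Pull title, score and permalink out of a listing-shaped dict."""
    return [
        {
            "title": child["data"]["title"],
            "score": child["data"]["score"],
            "permalink": child["data"]["permalink"],
        }
        for child in payload["data"]["children"]
    ]


def fetch_top(subreddit, limit=5, period="week"):
    # Not called below; the User-Agent value is a placeholder.
    request = urllib.request.Request(
        top_url(subreddit, limit, period),
        headers={"User-Agent": "top-posts-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_listing(json.load(response))


def merge_top(listings, n):
    """Flatten per-subreddit listings and keep the n highest-scoring posts."""
    merged = [post for posts in listings for post in posts]
    return sorted(merged, key=lambda post: post["score"], reverse=True)[:n]


# Hand-written payloads standing in for real responses.
fake_a = {"data": {"children": [
    {"data": {"title": "A1", "score": 120, "permalink": "/r/a/1"}},
    {"data": {"title": "A2", "score": 40, "permalink": "/r/a/2"}},
]}}
fake_b = {"data": {"children": [
    {"data": {"title": "B1", "score": 300, "permalink": "/r/b/1"}},
]}}
best = merge_top([parse_listing(fake_a), parse_listing(fake_b)], n=2)
```

Merging by raw score favors big subreddits; taking a fixed top X per subreddit, as the heuristic says, avoids that and is just `parse_listing` without the merge.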

You needn't limit yourself to r/all and the Twitter equivalent! :)


