Wiki for tools to scrape website data (scraperwiki.com)
155 points by araneae on Nov 28, 2010 | 11 comments


http://theinfo.org/ is a good one as well (by Aaron Swartz)

get | process | view

get (http://theinfo.org/get/tools)

process (http://theinfo.org/process/tools)

view (http://theinfo.org/view/tools)


Previous discussion, with 39 comments:

http://news.ycombinator.com/item?id=1584597


If you have not yet started the tutorial, click on at least the first one. The FireBug-ish console they have going on there is snazzy, and so is the Bespin/Skywriter editor.

If this doesn't show what the next generation of web apps looks like, I don't know what would. It remains to be seen, however, how that model holds up to "real" work, which is the same concern I had about Bespin/Skywriter.


The tutorial is certainly nice; however, I'm guessing that running the following snippet in the Ruby tutorial:

  output = `cd \\etc && cat passwd`
  puts output
shouldn't actually be returning the contents of the passwd file.


I reported this to their staff; hopefully they will fix it soon.


They run as nobody:

puts `whoami`

>> nobody
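
A rough sketch of why running as nobody does not by itself hide the file: on most Unix-like systems /etc/passwd is world-readable, so even an unprivileged account can read it. The helper names below (current_user, world_readable?) are made up for illustration, and the sketch assumes a Unix-like host; it says nothing about how ScraperWiki's sandbox is actually configured.

```ruby
require 'etc'

# Name of the account the current process runs as (roughly what `whoami` prints).
# Falls back to the numeric uid if the account has no passwd entry.
def current_user
  entry = Etc.getpwuid(Process.euid)
  entry ? entry.name : Process.euid.to_s
rescue ArgumentError
  Process.euid.to_s
end

# True when the "other" read bit is set, i.e. any local account may read the file.
def world_readable?(path)
  (File.stat(path).mode & 0o004) != 0
end

puts current_user
puts world_readable?('/etc/passwd') if File.exist?('/etc/passwd')
```

If the second line prints true, dropping privileges to nobody is not enough to keep the file private; the sandbox would need something like a chroot or a restricted filesystem view.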


Actually, I think that's just CodeMirror, not Bespin/SkyWriter. It's cool nonetheless, but personally, I would like it if they integrated it with node+jsdom (or even more awesome would be hooking it up to an actual webkit/mozilla instance).

Something like this for personal/private data would be interesting too: a system for people to take their data out of the "walled garden" websites. It could be a browser extension that runs passively, copying information whenever a user views it and storing it locally, to enable true data portability.


I like that concept in theory (as a passive browser extension), but it might be wiser to store it in a user-connected cloud-based account for security reasons.


Interesting. Can it be used to post and request iMacros macros too?

If so, I can donate at least 10 web scraping macros right away.

PS: The software I refer to is http://wiki.imacros.net/Data_Extraction (open source and commercial web scraping browser addon)


Had a play with it and it's lots of fun. I scraped the Hacker News headlines. Is mucking about with HN the new 'Hello World'?

Saw the chaps behind it pitch at Software City in Liverpool this week. Good guys and lots of potential for use in journalism, government and beyond. I think it's partly open source too, which is always a bonus.
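
For anyone curious what the "scrape the Hacker News headlines" exercise mentioned above might look like, here is a minimal Ruby sketch. It runs against a hard-coded sample string rather than the live site, and the markup in SAMPLE_HTML (title links inside td class="title") is an assumption for illustration; the real page may differ. extract_headlines is a made-up helper name, not part of ScraperWiki's API, and the regex approach does not decode HTML entities.

```ruby
# Hypothetical sample markup; the real front page HTML may be structured differently.
SAMPLE_HTML = <<~HTML
  <table>
    <tr><td class="title"><a href="http://scraperwiki.com/">Wiki for tools to scrape website data</a></td></tr>
    <tr><td class="subtext">155 points</td></tr>
    <tr><td class="title"><a href="http://theinfo.org/">theinfo.org</a></td></tr>
  </table>
HTML

# Returns [title, url] pairs for every link directly inside a td with class "title".
def extract_headlines(html)
  pattern = %r{<td class="title">\s*<a href="([^"]*)">(.*?)</a>}m
  html.scan(pattern).map { |href, text| [text.strip, href] }
end

extract_headlines(SAMPLE_HTML).each do |title, url|
  puts "#{title} (#{url})"
end
```

A regex is fine for a toy like this; for anything more than a "Hello World", a real HTML parser would be the safer choice, since small markup changes break patterns like the one above.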


Nice project! You might find cQuery useful for more complex content extraction:

http://cquery.com/



