From hell to HTML: releasing a Python package to easily work with Wikimedia HTM (wikimedia.org)
11 points by todsacerdoti on April 5, 2023 | 1 comment


If anyone's interested in an approach to processing the data set quickly, I got something working and wrote it up when I was curious about turning the content into structured data for database tables.

https://feder001.com/exploring-wikipedia-as-a-database-part-...

https://feder001.com/exploring-wikipedia-as-a-database-part-...



