
The simplest of which would be to turn the images into a multi-page raster PDF, using freely licensed Linux-based command-line tools for PDF generation. That will of course result in a much larger file than OCR output would, but it might be the best preservation method for books with illustrations, unusual fonts, catalogs, mixed text and photos, etc.
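As a rough sketch of what that could look like, assuming the page photos are already JPEGs with zero-padded filenames and that the img2pdf command-line tool is installed (my pick for illustration, not necessarily what the project uses), a small Python wrapper might be:

```python
import subprocess


def build_img2pdf_command(pages, output_pdf):
    """Return the argv list for img2pdf, one PDF page per image.

    Pages are sorted by filename, so names should be zero-padded
    (page_002.jpg, not page_2.jpg) to come out in reading order.
    """
    ordered = sorted(str(p) for p in pages)
    if not ordered:
        raise ValueError("no page images given")
    # img2pdf wraps JPEG data in the PDF without re-encoding it,
    # so the scans are not degraded by a second lossy pass.
    return ["img2pdf", *ordered, "-o", str(output_pdf)]


def images_to_pdf(pages, output_pdf):
    """Run img2pdf; assumes the img2pdf executable is on PATH."""
    subprocess.run(build_img2pdf_command(pages, output_pdf), check=True)
```

Calling `images_to_pdf(["page_001.jpg", "page_002.jpg"], "book.pdf")` would then shell out to `img2pdf page_001.jpg page_002.jpg -o book.pdf`. The function names here are made up for the example.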

I am not clear to what extent the existing workflow de-skews the camera images to deal with page curvature towards the spine.

I think I recall the Internet Archive having an open-source design for something similar to this, and there are other projects that accomplish generally the same thing.


