
Yep, in-house PDF generators should be a good middle ground, but I dunno whether this 'weasyprint' is open source and, if so, whether it is _lean_ open source (no C++, Java, etc.).

When dealing with an ultra-complex file format which cannot be dodged, usually a good way to deal with it is to only use a very simple but coherent subset and enforce this usage with validation tools.

For instance, on the web: noscript/basic (x)html (otherwise you are jailed in the 2.5 web engines of the WHATWG cartel).
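To make "a simple subset enforced by a validation tool" concrete for the web example, here is a minimal sketch built on Python's standard `html.parser`. The rule set (no `script`/`iframe`/`object`/`embed` tags, no `on*` event-handler attributes, no `javascript:` URLs) is an assumption about what "noscript/basic (x)html" means, and `check_html` is a hypothetical helper, not an existing tool.

```python
from html.parser import HTMLParser

# Hypothetical "basic (x)html" subset: these tags are rejected outright.
FORBIDDEN_TAGS = {"script", "iframe", "object", "embed"}


class NoScriptChecker(HTMLParser):
    """Collects every construct that falls outside the allowed subset."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <SCRIPT> is caught too.
        # Self-closing tags (<embed/>) are routed here by the default
        # handle_startendtag, so they get the same checks.
        line, _ = self.getpos()
        if tag in FORBIDDEN_TAGS:
            self.violations.append(f"line {line}: <{tag}> is not allowed")
        for name, value in attrs:
            if name.startswith("on"):
                self.violations.append(f"line {line}: event handler {name}= is not allowed")
            # Attributes without a value arrive as None, hence the guard.
            if value and value.strip().lower().startswith("javascript:"):
                self.violations.append(f"line {line}: javascript: URL is not allowed")


def check_html(source: str) -> list[str]:
    """Return the list of violations; an empty list means the page passes."""
    checker = NoScriptChecker()
    checker.feed(source)
    checker.close()
    return checker.violations
```

Run in CI over every page, such a check keeps the subset from silently growing, which is the point of pairing the subset with a validator.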

With PDF, I dunno much about the format (I did not manage to download the specs easily), but when I have to print some text, I use a very small PDF generator for that (written ~25 years ago, so no UTF-8 for me).
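For readers wondering how small a text-only PDF generator can be, here is a minimal sketch (not the commenter's generator, which isn't shown): one A4 page, the built-in Helvetica font, ASCII only, with the cross-reference table offsets computed by hand. The function name `text_to_pdf`, the page size and the font metrics are illustrative choices.

```python
def text_to_pdf(lines: list[str]) -> bytes:
    """Build a one-page PDF showing `lines` in Helvetica (ASCII only)."""

    def escape(s: str) -> str:
        # Backslash and parentheses are special inside PDF literal strings.
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    # Content stream: 12 pt Helvetica, 14 pt leading, starting near the top of an A4 page.
    ops = ["BT", "/F1 12 Tf", "14 TL", "50 790 Td"]
    for line in lines:
        ops.append(f"({escape(line)}) Tj T*")  # show the line, then move down one line
    ops.append("ET")
    # encode("ascii") raises on non-ASCII input: no UTF-8 here either.
    content = "\n".join(ops).encode("ascii")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed by the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: every entry is exactly 20 bytes long.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Usage is a single call, e.g. `open("hello.pdf", "wb").write(text_to_pdf(["Hello, world"]))`. Sticking to plain text and a standard font is what keeps it this small: no font embedding, no compression, no images.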

But what's important: such an attempt must go hand in hand with reassessing how pertinent our use of these information systems really is, and yes, it will be annoying and much less comfy, and that MUST be acknowledged before even trying.

And big tech is not the only one trying hard to do vendor and developer lock-in.



Hi, WeasyPrint/pydyf dev here!

> usually a good way to deal with it is to only use a very simple but coherent subset and enforce this usage with validation tools

You’re right, that’s exactly what we do. We support a growing subset of HTML and CSS that’s documented. We also use the W3C testing suite for HTML/CSS, and PDF validators, on top of custom unit tests.

> And big tech is not the only one trying hard to do vendor and developer lock-in.

We "only" follow open specifications and refuse vendor-specific features to avoid lock-in (equivalent closed-source tools love that). And we even love the other open-source "competitors": ♥ to Paged.js and Vivliostyle, try them, they’re great too!


"Open" is not enough anymore: it also has to be lean, stable over time, and able to do a good enough, _pertinent_ job (which can be very subjective). In the case of software, that includes the SDK: for instance, if C++ or similar is involved, it should be excluded de facto, for obvious reasons.

It is _EXTREMELY_ hard to justify an honest and permanent income writing software... REALLY HARD.


How about Typst? Don't you consider that competition?


You can learn more about WeasyPrint on its website (https://weasyprint.org/ ). It's an open-source Python package that can be run from the CLI or from Python code. It uses pydyf: "pydyf is a low-level PDF generator written in Python and based on PDF specification 1.7" (from its README at https://github.com/CourtBouillon/pydyf ).


Compile a minimal Python interpreter with tinycc and/or cproc and/or scc, run this pydyf, and you should be good to go :)

Hopefully, its API has a C API bridge for interop.

But pydyf claims to go up to PDF 1.7: this is kind of arrogant, given the complexity of the file format.

That's why such tools are not enough: what's important is to evaluate and settle on a subset of the PDF format, in order to significantly reduce the technical cost of ownership and the exit cost, and maybe also to use such tools to write validation tools that enforce the use of that subset of PDF.
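A deliberately naive sketch of such a validation tool, in Python: the subset is written down as a maximum version plus a list of forbidden PDF names, and anything outside it is reported. The subset itself (PDF 1.4 at most; no encryption, object streams, JavaScript, launch or auto-run actions, XFA forms or embedded files) is a hypothetical example, not one proposed in this thread, and `check_pdf_subset` is an illustrative name. A real validator would parse the object graph instead of scanning bytes.

```python
import re

# Hypothetical house subset. Forbidding object streams (/ObjStm) keeps every
# dictionary visible as plain bytes, which is what makes a byte scan plausible.
MAX_VERSION = (1, 4)
FORBIDDEN_NAMES = [
    b"JavaScript", b"JS", b"Launch", b"OpenAction", b"AA",
    b"EmbeddedFile", b"Encrypt", b"ObjStm", b"XFA",
]
# A PDF name ends at whitespace or at a delimiter, so /AA matches but /AAB does not.
NAME_RE = re.compile(
    rb"/(" + b"|".join(FORBIDDEN_NAMES) + rb")(?![^\s()<>\[\]{}/%])"
)


def check_pdf_subset(data: bytes) -> list[str]:
    """Return the violations found; an empty list means the file passes.

    Naive byte scan: names spelled with #xx escapes or hidden in compressed
    streams are missed, and a literal string containing e.g. "/JS" is a
    false positive.
    """
    header = re.match(rb"%PDF-(\d+)\.(\d+)", data)
    if header is None:
        return ["missing %PDF-x.y header"]
    problems = []
    version = (int(header.group(1)), int(header.group(2)))
    if version > MAX_VERSION:
        problems.append(f"PDF {version[0]}.{version[1]} is newer than allowed")
    # Report each forbidden name once, in a stable order.
    for name in sorted({m.group(1).decode("ascii") for m in NAME_RE.finditer(data)}):
        problems.append(f"forbidden name /{name}")
    return problems
```

The design choice is that the subset lives in two constants rather than in scattered code paths, so narrowing or widening it is a reviewable one-line change.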

Very often, complex file formats (open or not) end up being generated and consumed by one program.

A warning: big tech and its minions will fight super hard against everything that is simple, stable in time and does a good enough job (like noscript/basic (x)html for nearly all online services, as they worked a few years back).


What on earth is "lean open source"?



