It wouldn't work since we were allowed to manipulate certain aspects of text, to...

berkes · on Jan 26, 2022

You could even just move a few words around, depending on the language. Depending on the language, you could even just move a few words around.

Do that randomly, combine the products, and you might get enough entropy to create unique fingerprints for each download. Randomly do that, combine the products, and you might get enough entropy to create unique fingerprints for each download.

(This silly example can create 4 unique fingerprints)
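A minimal sketch of that idea, assuming each sentence has exactly two interchangeable phrasings and that one bit is encoded per sentence. The names `VARIANTS`, `embed`, and `extract` are made up for illustration; the two sentence pairs are the ones from this comment.

```python
# Each sentence has two interchangeable phrasings; picking one encodes one bit.
VARIANTS = [
    (
        "You could even just move a few words around, depending on the language.",
        "Depending on the language, you could even just move a few words around.",
    ),
    (
        "Do that randomly, combine the products, and you might get enough "
        "entropy to create unique fingerprints for each download.",
        "Randomly do that, combine the products, and you might get enough "
        "entropy to create unique fingerprints for each download.",
    ),
]


def embed(fingerprint: int) -> str:
    """Render the text, choosing variant i according to bit i of fingerprint."""
    if not 0 <= fingerprint < 2 ** len(VARIANTS):
        raise ValueError("fingerprint out of range")
    return " ".join(pair[(fingerprint >> i) & 1] for i, pair in enumerate(VARIANTS))


def extract(text: str) -> int:
    """Recover the fingerprint by checking which phrasing of each sentence appears."""
    fingerprint = 0
    for i, pair in enumerate(VARIANTS):
        if pair[1] in text:
            fingerprint |= 1 << i
    return fingerprint


# Two sentences with two phrasings each give 2 ** 2 = 4 distinct copies.
copies = [embed(n) for n in range(2 ** len(VARIANTS))]
```

With n such sentences the number of distinct copies grows as 2 ** n, which is where the "enough entropy" would come from.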

Wowfunhappy · on Jan 26, 2022

When I write, I put a great deal of thought into how to arrange sentences for maximum clarity or effectiveness. I would not appreciate an eBook service messing with that, even if the meaning was unchanged.

In the most extreme case, imagine if this was a book of poetry.

viraptor · on Jan 26, 2022

For PDF you can do this in a much more subtle way. In a typical block of text every individual letter comes with its own kerning adjustment. You can adjust those in a way that's invisible to the reader but still allows fingerprinting. There's probably 1000 different options too - don't think of moving words as in swapping positions in a sentence. (I know parent suggested it, but that's silly)
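As a rough sketch of how kerning could carry bits: the PDF `TJ` operator takes an array that mixes strings with numeric position adjustments in thousandths of a unit of text space. The code below models such an array as a plain Python list and stores one bit in the parity of each adjustment. The parity scheme and the names `embed_bits` and `extract_bits` are assumptions for illustration, not a description of any real product, and reading or writing an actual PDF is not shown.

```python
from typing import List, Union

# Simplified stand-in for a PDF TJ array: strings are glyph runs,
# integers are kerning adjustments in thousandths of a text space unit.
TJArray = List[Union[str, int]]


def embed_bits(tj: TJArray, bits: List[int]) -> TJArray:
    """Nudge each kerning value by at most 1 so that its parity equals the next bit."""
    out: TJArray = []
    remaining = list(bits)
    for item in tj:
        if isinstance(item, int) and remaining:
            bit = remaining.pop(0)
            if item % 2 != bit:
                item += 1
        out.append(item)
    if remaining:
        raise ValueError("not enough kerning values to hold all bits")
    return out


def extract_bits(tj: TJArray, count: int) -> List[int]:
    """Read the first `count` bits back from the parity of the kerning values."""
    return [item % 2 for item in tj if isinstance(item, int)][:count]


def plain_text(tj: TJArray) -> str:
    """The text a reader or a text extractor sees: the strings only."""
    return "".join(item for item in tj if isinstance(item, str))


original: TJArray = ["W", 80, "A", -15, "V", 40, "E"]
marked = embed_bits(original, [1, 0, 1])
```

Here the visible text is identical in both arrays, and each kerning value moves by at most one thousandth of a unit.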

matheusmoreira · on Jan 26, 2022

These probably wouldn't survive extraction of the pure text, would they?

out_of_protocol · on Jan 26, 2022

Replacing characters with identical-looking Unicode characters, adding extra spaces here and there, adding newlines (and more spaces :)), adding random typos, using a dictionary of "safe" word/phrase replacements, etc. And don't forget about formulas, charts, etc.; a pure text version is not too useful on its own.
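A small sketch of the look-alike character trick, assuming the source text is plain Latin and using three Cyrillic letters that render almost identically to Latin `a`, `e`, and `o`. The table and the function names are illustrative only. The `normalize` function shows how easily this particular mark is undone once someone knows to look for it.

```python
from typing import List

# Latin letters and visually similar Cyrillic letters (U+0430, U+0435, U+043E).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}
REVERSE = {v: k for k, v in HOMOGLYPHS.items()}


def embed(text: str, bits: List[int]) -> str:
    """Swap the i-th carrier letter for its look-alike when bit i is 1."""
    out = []
    queue = list(bits)
    for ch in text:
        if ch in HOMOGLYPHS and queue:
            if queue.pop(0):
                ch = HOMOGLYPHS[ch]
        out.append(ch)
    if queue:
        raise ValueError("text too short for this fingerprint")
    return "".join(out)


def extract(text: str, count: int) -> List[int]:
    """Read bits back: a Latin carrier is 0, a Cyrillic look-alike is 1."""
    bits = []
    for ch in text:
        if ch in HOMOGLYPHS:
            bits.append(0)
        elif ch in REVERSE:
            bits.append(1)
    return bits[:count]


def normalize(text: str) -> str:
    """Undo the mark by mapping every look-alike back to its Latin letter."""
    return "".join(REVERSE.get(ch, ch) for ch in text)


original = "plain text version"
marked = embed(original, [1, 0, 1, 1])
```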

hdjjhhvvhga · on Jan 26, 2022

If you deal with fiction and the like where you basically have just text then I think that's correct: it would be trivial to detect the watermarks in various copies by simply comparing them. I was dealing with PDFs containing tables, formulas, illustrations, etc., so a plain-text version would be unusable.

snovv_crash · on Jan 26, 2022

Randomly choose 3 big paragraphs in the entire ebook and add an extra newline in the middle of each, at the end of a random sentence. This would be my choice if I had to do some kind of invisible watermarking, at least.
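For a sense of how many distinct copies this gives, here is a back-of-the-envelope sketch. The figure of 200 big paragraphs is an assumption, the function names are made up, and the count ignores the extra choice of sentence inside each paragraph, which would only increase it.

```python
import math
import random
from typing import List


def fingerprint_capacity(paragraphs: int, marks: int = 3) -> int:
    """Number of ways to choose which paragraphs receive the extra newline."""
    return math.comb(paragraphs, marks)


def pick_paragraphs(download_id: int, paragraphs: int, marks: int = 3) -> List[int]:
    """Pick paragraph indices reproducibly from a per-download seed."""
    rng = random.Random(download_id)
    return sorted(rng.sample(range(paragraphs), marks))


capacity = fingerprint_capacity(200)  # 200 paragraphs is an assumed book size
chosen = pick_paragraphs(download_id=12345, paragraphs=200)
```

Seeding by download ID does not rule out two downloads landing on the same three paragraphs, so a service would presumably record the chosen positions per download rather than rely on the seed alone.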

hdjjhhvvhga · on Jan 26, 2022

This is one of the many things that could be trivially detected and fixed when you have multiple watermarked copies of the same file.
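To illustrate, a short sketch using `difflib` from the Python standard library: diff two copies, list the spans where they disagree, and collapse whitespace so both copies end up identical. The sample strings and function names are made up for the example.

```python
import difflib
from typing import List, Tuple


def differing_spans(copy_a: str, copy_b: str) -> List[Tuple[str, str, str]]:
    """Return (operation, text_in_a, text_in_b) for every span where the copies differ."""
    matcher = difflib.SequenceMatcher(None, copy_a, copy_b, autojunk=False)
    return [
        (tag, copy_a[i1:i2], copy_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def normalize_whitespace(text: str) -> str:
    """Collapse every run of spaces and newlines into a single space."""
    return " ".join(text.split())


# Two copies of the same text, each with its extra newline in a different place.
copy_a = "First sentence. Second sentence.\nThird sentence."
copy_b = "First sentence.\nSecond sentence. Third sentence."

spans = differing_spans(copy_a, copy_b)
```

Every span reported for these two copies is pure whitespace, so normalizing whitespace removes the mark from both.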

afandian · on Jan 26, 2022

Closer to home and a bit more extreme, a few transposed numbers in a scholarly article would be enough to rekindle another autism/vaccine conspiracy theory!

hdjjhhvvhga · on Jan 26, 2022

No, this would not work for a couple of reasons. Manipulating the content itself such as changing the order of words is very dangerous as it can influence the meaning, and if you process things at scale it could lead to devastating consequences. But there are many other aspects of text such as kerning and others (a dozen or so in this particular case) that are virtually invisible to the reader but are detectable by a machine. I'd prefer not to get into the details of the implementation here but of course a dedicated team with enough resources could successfully break it after some time - but I believe it wouldn't make any sense economically.

berkes · on Jan 31, 2022

> as kerning and others

Those can be "removed" by rendering to text and regenerating a PDF, though. Or even with print + scan + OCR.

Neither is trivial, but both are doable.

matheusmoreira · on Jan 26, 2022

This isn't a watermark though. This is giving different people different content. Watermarks don't change the content.

aembleton · on Jan 26, 2022

But steganography does; just in ways that are imperceptible to humans.

matheusmoreira · on Jan 26, 2022

Moving words around is perceptible to humans though. It can even destroy the meaning of content.