
From what I understood, the case against OpenAI wasn't about the summarisation. It was about the fact that the AI was trained on copyrighted work. In the case of Wikipedia, the assumption is that someone purchased the book, read it, and then summarised it.


There are two separate issues.

One is a large volume of pirated content used to train models.

Another is models reproducing copyrighted materials when given prompts.

In other words, there's the input issue and the output issue, and those two issues are separate.


They’re sort of separate. In a sense you could say that the ChatGPT model is a lossily compressed version of its training corpus. We acknowledge that a jpeg of a copyrighted image is a violation. If the model can recite Harry Potter word for word, even imperfectly, this is evidence that the model itself is an encoding of the book (among other things).

You hear people saying that a trained model can’t be a violation because humans can recite poetry, etc, but a transformer model is not human, and very philosophically and economically importantly, human brains can’t be copied and scaled.


They're very separate in terms of what seems to have happened in this case. This lawsuit isn't about memorization or LLMs being archival/compression software (imho, a very far reach) or anything like that. The plaintiffs took a bit of text that was generated by ChatGPT and accused OpenAI of violating their IP rights, using the output as proof. As far as I understand, the method by which ChatGPT arrived at the output, or how Game of Thrones is "stored" within it, is irrelevant: the authors allege that the output text itself is infringing regardless of circumstance, and therefore OpenAI should pay up. If it's eventually found that the short summary does indeed infringe the copyright of the full work, there is absolutely nothing preventing the authors (or someone else who could later refer to this case) from suing someone else who wrote a similar summary, with or without the use of AI.


> You hear people saying that a trained model can’t be a violation because humans can recite poetry, etc

Also worth noting: if a person performs a copyrighted work from memory - like a poem, a play, or a piece of music - that can still be a copyright violation. "I didn't copy anything, I just memorized it" isn't the get-out-of-jail-free card some people think it is.


A jpeg of a copyrighted image can be copyright infringement, but isn't necessarily. A trained model can be copyright infringement, but isn't necessarily. A human reciting poetry can be copyright infringement, but isn't necessarily.

The means of reproduction are immaterial; what matters is whether a specific use is permitted or not. That a reproduction of a work is found to be infringing in one context doesn't mean it is always infringing in all contexts; conversely, that a reproduction is considered fair use doesn't mean all uses of that reproduction will be considered fair.


I would guess that if a poet sued someone who was reciting his poetry commercially, for pay (say, selling tickets specifically for it), the poet might very well win. So reciting poetry probably could be copyright infringement at a certain scale.

And since AI companies are commercial entities, I would lean towards the view that what they are doing in general, even when it doesn't involve repeating specific works, could be infringement too.


That doesn't really make sense. Just because you purchased a book does not mean the copyright goes away for new works based on the book. (For the physical book you bought, the doctrine of first sale gives you some rights, but only in that specific physical copy.) If OpenAI pirated material, that would be a separate issue from whether the output of the LLM is infringing.


I think we have no evidence that someone bought the book and summarized it. And what if an AI bought the book and summarized it - is it fine now?




