
How to deal with "unlearning" is the problem of the org running the illegal models. If I have submitted a GDPR deletion request, you'd better honor it. If it turns out you stole copyrighted content, you should be punished for that. No one cares how much it might cost you to retrain your models. You put yourself in that situation to begin with.


Exactly, I think that's where it leads eventually. And that's what my original comment meant as well: "delete it" rather than using some technique to "unlearn it", unless you can claim the unlearning is 100% accurate.


> No one cares how much it might cost you to retrain your models.

Playing tough? It's misguided, though. "No one cares how much it might cost you to fix the damn internet."

If you wanted to retroactively fix facts, even if that could be achieved on a trained model, the old information would still come back by way of RAG or web search. But we don't ask pure LLMs for facts and news unless we are stupid.

If someone wanted to pirate content, it would be easier to use Google Search or torrents than generative AI. It would be faster, cheaper, and higher quality. AIs are slow, expensive, rate limited, and lossy, and AI providers have built-in checks to prevent copyright infringement.

If someone wanted to build something dangerous, it would be easier to hire a specialist than to chatGPT their way into it. Everything LLMs know is also on Google Search. Achieve security by cleaning up the internet first.

The answer to all AI data issues - PII, copyright, dangerous information - comes back to the issue of Google Search offering links to it, and of websites hosting this information online. You can't fix AI without fixing the internet.


What do you mean, playing tough? These are existing laws that should be enforced. The number of people whose lives were ruined by the American government because they were deemed copyright infringers is insane. The US has made it clear that copyright infringement is unacceptable.

We now have a new class of criminals infringing on copyright on a grand scale via their models, and they seem desperate to avoid prosecution, hence all this bullshit.


1. You are assuming that merely training a model on copyrighted material is a violation. It is not. It may be under certain conditions, but not by default.

2. Why should we aim for harsh punitive measures just because that's what was done in the past?


> 1. You are assuming that merely training a model on copyrighted material is a violation. It is not. It may be under certain conditions, but not by default.

Using copyrighted content for commercial purposes should be a violation, if it isn't already considered one. It's no different from playing copyrighted songs in your restaurant without paying a licensing fee.

> 2. Why should we aim for harsh punitive measures just because that's what was done in the past?

I'd be fine with abolishing, or overhauling, the copyright system. This double standard of harsh penalties for consumers/small companies but not for big tech is bullshit, though.


> Using copyrighted content for commercial purposes should be a violation

So reading a book and using its contents to help you in your job would be a violation too, based on your logic?


A business cannot read a book, and your machine learning model is not given human rights.


> A business cannot read a book

Assume the human read the book as part of their job. Is that using copyrighted material for commercial purposes?

If that doesn't count, then I'm not sure why you brought up "commercial purposes" at all.

> This double standard of harsh penalties for consumers/small companies but not for big tech is bullshit, though.

Consumers and small companies get away with small copyright violations all the time. And those are still bigger violations than having your image be one of millions in a training set.


> Assume the human

Humans have rights. They get to do things that businesses, machine learning models, and automation in general don't.

Just like you can sit in a library and tell people the contents of books when they ask, but if you go ahead and upload everything, you get bullied into suicide by the US government.[1]

> Consumers and small companies get away with small copyright violations all the time

Yeah, because people don't notice, so they don't care. Everyone knows what these big tech criminals are doing.

[1] https://en.wikipedia.org/wiki/Aaron_Swartz


> Humans have rights. They get to do things that businesses, machine learning models, and automation in general don't.

So is that a yes to my question?

If humans are allowed to do it for commercial purposes, and it's entirely about human versus machine, then why did you say "Using copyrighted content for commercial purposes should be a violation" in the first place?

> Just like you can sit in a library and tell people the contents of books when they ask,

You know there's a huge difference between describing a book and uploading the entire contents verbatim, right?

If "tell the contents" means reading the book out loud, that becomes illegal as soon as enough people are listening to make it a public performance.

> but if you go ahead and upload everything you get bullied into suicide by the US government[1]

They did that to a human... So I've totally lost track of what your point is now.


> and it's entirely about human versus machine

It's not. Those were what's called examples. There is of course more to it. Stop trying to pigeonhole a complex discussion into a few talking points. There are many reasons why what OpenAI did is bad, and I gave you a few examples.


I'm not trying to be reductive or nitpick your example, I was trying to understand your original statement and I still don't understand it.

There's a reason I keep asking a very generic "why did you bring it up": it's because I'm not trying to pigeonhole.

But if it's not worth explaining at this point and the conversation should be over, that's okay.


A business is... made of people.




