Hacker News | NoraCodes's comments

Do you think that the use of a hammer is an innate skill, and that woodworkers learn nothing from their craft?

Okay, so let's say the use of a coding agent isn't an innate skill, so the author was gaining experience with the tool.

You - and many other commenters in this thread - misunderstand the legal theory under which AI companies operate. In their view, training their models is allowed under fair use, which means it does not trigger copyright-based licenses at all. You cannot dissuade them with a license.

While I think OP is shortsighted in their desire for an “open source only for permitted use cases” license, it is entirely possible that training will be found to not be fair use, and/or that making and retaining copies for training purposes is not fair use.

Perhaps you can’t dissuade AI companies today, but it is possible that the courts will do so in the future.

But honestly it’s hard for me to care. I do not think the world would be better if “open source except for militaries” or “open source except for people who eat meat” license became commonplace.


The problem is "viral" licences. Must the code generated by an AI trained with GPL code be released with a GPL licence?

Also, can an AI be trained with the leaked source of Windows(R)(C)(TM)?


> Also, can an AI be trained with the leaked source of Windows(R)(C)(TM)?

I think you mean to ask the question "what are the consequences of such extreme and gross violations of copyright?"

Because they've already done it. The question is now only ... what is the punishment, if any? The GPL requires that all materials used to produce a derivative work that is published, made available, performed, etc. are made available at cost.

Does anyone who has a patch in the Linux kernel and can get ChatGPT to reproduce their patch (i.e. every Linux kernel contributor) get access to all of OpenAI's training materials? Ditto for Anthropic, Alphabet, ...

As people keep pointing out when defending copyright here: these AI training companies consciously chose to include that data, at the cost of respecting the "contract" that is the license.

And if they don't have to respect licenses, then what if I run old Disney movies through a matrix (let's say the identity matrix) and publish the results? How about 3 matrices with some nonlinearities? Where is the limit?

Since copyright law cannot be retroactively changed, any update Congress makes to copyright wouldn't affect the outcome for at least a year ...


Open source except for people who have downvoted any of my comments.

I agree with you though. I get sad when I see people abuse the Commons that everyone contributes to, and I understand that some people want to stop contributing to the Commons when they see that. I just disagree - we benefit more from a flourishing Commons, even if there are freeloaders, even if there are exploiters, etc.


Of course, if the code wasn't available in the first place, the AI wouldn't be able to read it.

It wouldn't qualify as "open source", but I wonder if OP could have some sort of EULA (or maybe it would be considered an NDA). Something to the effect of "by reading this source code, you agree not to use it as training data for any AI system or model."

And then something to make it viral. "You further agree not to allow others to read or redistribute this source code unless they agree to the same terms."


My understanding is that you can have such an agreement (basically a kind of NDA) -- but if courts ruled that AI training is fair use, it could never be a copyright violation, only a violation of that contract. Contract violations can only receive economic damages, not the massive statutory penalties that copyright violations can.

Having a license that specifically disallows a legally dubious behavior could make lawsuits much easier in the future, however. (And might also incentivize lawyers to recommend avoiding this code for LLM training in the first place.)

People think that code is loaded into a model, like a massive available array of "copy+paste" snippets.

It's understandable that people think this, but it is incorrect.

As an aside, Anthropic's training was ruled fair use, except for the books they pirated.


Fair use is a defense to copyright violation, but highly dependent on the circumstances in which it happens. There certainly is no blanket "fair use for AI everything".

This is quite literally the opposite of the tragedy of the commons.

I mean, I'll probably ditch the LLM - after all, it's open source so I can just build my own app to receive the messages - but it seems like a neat bit of kit.


Presumably it's more like an errant Ctrl-C.


Yup exactly this. Also Ctrl-W, alt tab, etc.


All of these issues have already been solved in kiosk setups.


This article makes a distinction between "TV and radio" and "digital devices". I wonder how much of the gap between how much older and younger generations say they get news from the latter category is because younger people are more likely to understand the actual meaning of those words? Most TVs are indeed digital devices!


Why do AI companies get to do whatever they want in order to meet their business goals ("liftoff")?


Because a lot of people who are not white are born in Britain, making them native Brits?


[flagged]


There have been people of West African descent in England since the early medieval period.


[flagged]


How can you possibly define "native Brit" in a way that includes, say, Saxons, but excludes West Africans who immigrated in the 1200s, without simply saying "people with light skin"? Or is 1066 your cutoff? Why?


The invocation of Gregory of Nyssa, for one.


I dunno, I associate Gregory of Nyssa with the unequivocal rejection of slavery.


It seems very clear to me that the inclusion of a single religious reference does not justify labelling an entire excerpt as having religious over/undertones ... not sure what I'm missing.


How come ActivityPub gets shit for making people pick a server, but nobody complains about the mythical "average user" who is supposedly incapable of figuring anything out on their own when ATProto services ask them to understand DNS?

(It's because ATProto services targeted at average users are effectively centralized, which means everyone else has to put up with whatever Bluesky says or lose their access to the bulk of the network.)


Bluesky doesn’t ask them to understand DNS, it just gives them a free subdomain to start with. This isn’t very different from how Gmail gives you a gmail.com address. But you can also move it to your own domain later and obviously it’s possible to build user-friendly interfaces for that.


Right, that's exactly my point; Bluesky provides a centralized alternative to using your own domain name. (And moving from a did:plc to something decentralized is no easier than moving from mastodon.social or similar large instance to a smaller one!)

To be clear, I actually think it's a good idea to let people associate their own domains with their accounts, but I find it frustrating that people act like ATProto is the only or first example of open social protocols, as TFA does.

