I fucked with the title a bit. What I linked to was actually a Mastodon post linking to the actual thing. But in my defense, I found it because Cory Doctorow boosted it, so, in a way, I am providing the original source here.
please argue. please do not remove.
It isn’t fair use. See most of the fair use FAQ.
“Fair Use” is often the subject of discussion when talking about online copyright with regard to online video content or music sampling, but it’s notably a flawed defense, as there is generally no legal definition of how much of a given work can be used or referenced. The very first entry in that FAQ says the following:
How do I get permission to use somebody else’s work?
You can ask for it. If you know who the copyright owner is, you may contact the owner directly. If you are not certain about the ownership or have other related questions, you may wish to request that the Copyright Office conduct a search of its records or you may search yourself. See the next question for more details.
All that artists, writers, and others are asking LLM producers to do is (a) ask for permission or (b) attribute the artist’s work in some kind of ledger, respecting the copyright of their work. Every work you make (write/play/draw/whatever) has a copyright that should be respected by companies, is not waived by a EULA or TOS (ever), and must be respected in order for author attribution as a concept to work at all. There is plenty of free, permissively licensed content on the internet that can be used instead to train an LLM, but simply asking for permission or giving attribution would at least be a step in the right direction for these companies and for the industry as a whole.
Defenders of AI will note that the “use” of art in LLMs is limited and thus protected by fair use, but that is debatable based on the content of the FAQ quoted above.
How much of someone else’s work can I use without getting permission?
Under the fair use doctrine of the U.S. copyright statute, it is permissible to use limited portions of a work including quotes, for purposes such as commentary, criticism, news reporting, and scholarly reports. There are no legal rules permitting the use of a specific number of words, a certain number of musical notes, or percentage of a work. Whether a particular use qualifies as fair use depends on all the circumstances. See, Fair Use Index, and Circular 21, Reproductions of Copyrighted Works by Educators and Librarians.
You can see that the use cases above (commentary, criticism, news reporting, and scholarly reports) do not qualify LLM companies to use copyrighted data to train their models for private industry. Additionally, you’ll note that “market disruptive” uses cannot be protected by fair use under its definition, meaning that displacing artists with AI automatically makes LLM use of copyrighted material an infringement of copyright that is not protected by fair use.
Regardless, this will need to be proved in court, and even if it passes certain criteria, it will not apply to all infractions. Fair use is a defense, not a protection, and thus LLM producers will have to spend time in court defending individual infractions. There’s no way for them to cover all copyright infringement with one ruling; it needs to be proved on a case-by-case basis.
IANAL but this is my 2 cents on the matter.
Selling an AI model (or usage of that model) that allows for producing works that are clearly based upon those copyrighted works and would be considered copyright infringement if a person did the same thing is not fair use.
If a person creating the same thing as generative AI would be infringing, then it isn’t magically not infringing because it is on the internet or done by a program. Basically, AI needs to follow the same rules and restrictions as a person would. That does mean that the AI also needs to be trained to not create copyright infringing works if the use of the AI is being sold.
As a downloadable model that anyone can use at no cost? Sure, whatever, that is fine. Then it is on the person who uses it and tries to infringe. But if someone pays a company to use their AI to create infringing work, that is on the company and they are just as at fault as if they sold T shirts that infringed on copyright.
if someone pays a company to use their AI to create infringing work, that is on the company and they are just as at fault as if they sold T shirts that infringed on copyright.
wrong.
Selling an AI model (or usage of that model) that allows for producing works that are clearly based upon those copyrighted works and would be considered copyright infringement if a person did the same thing is not fair use
it is.
I think you might want to elaborate instead of making 4 replies in 3 minutes, each averaging 2.75 words.
instead of making 4 replies in 3 minutes each averaging 2.75 words
this is irrelevant to the truth of my claim.
Yes, but at the time I wrote my reply you had not offered any argument, only what can be summarized as “no”.
I presented exactly as much justification for my claims as the people to whom I was responding.
Again, right now, yes, but when I wrote that, no.
wrong.
I don’t see how selling a model or the use of a model infringes on a specific copyright. whose copyright has been infringed? how can you prove that? take AI out of the question. if you wanted to prove that some other author has infringed the copyright on your novel, how would you do that? if you want to prove that some quote unquote artist has infringed on your copyright, how would you do that? if any of your methods for proving that a person has infringed on your copyright are applicable to an AI, then that’s how you prove it. but if you can’t prove it, if the AI just learned about how style works, if an AI just saw your work but never actually copied it, then it’s not infringing.
If a person creating the same thing as generative AI would be infringing, then it isn’t magically not infringing because it is on the internet or done by a program
no one is arguing otherwise.
That does mean that the AI also needs to be trained to not create copyright infringing works if the use of the AI is being sold.
no it doesn’t.
I think we should have a rule that says if an LLM company invokes fair use on the training inputs then the outputs are public domain.
That’s already been ruled on once.
A recent lawsuit challenged the human-authorship requirement in the context of works purportedly “authored” by AI. In June 2022, Stephen Thaler sued the Copyright Office for denying his application to register a visual artwork that he claims was authored “autonomously” by an AI program called the Creativity Machine. Dr. Thaler argued that human authorship is not required by the Copyright Act. On August 18, 2023, a federal district court granted summary judgment in favor of the Copyright Office. The court held that “human authorship is an essential part of a valid copyright claim,” reasoning that only human authors need copyright as an incentive to create works. Dr. Thaler has stated that he plans to appeal the decision.
Why would companies care about copyright of the output? The value is in the tool to create it. The whole issue to me revolves around the AI company profiting from its service, a service built on a massive library of copyrighted works. It seems clear to me that a large portion of their revenue should go equally to the owners of the works in their database.
You can still copyright AI works, you just can’t name an AI as the author.
That’s just saying you can claim copyright if you lie about authorship. The problem then is, you may step into the realm of fraud.
You don’t have to lie about authorship. You should read the guidance.
Well, what you initially said sounded like fraud, but the incredibly long page indeed doesn’t talk about fraud. However, it also seems a bit vague. What counts as your contribution to the work? Is it being part of the input the model was trained on, “I wrote the prompt”, or making additional changes based on the result?
The vagueness surrounding contributions is particularly troubling. Without clearer guidelines, this seems like a recipe for lawsuits.
deleted by creator
in the ethical sense, everything is fair use. period.
in the legal sense, everything is fair use until it’s proven in court not to be.
deleted by creator
if anybody gets a copy of it, they have no ethical obligation not to share it, and every ethical justification for sharing it.
deleted by creator
this reads like an appeal to ridicule. if you have an objection to what I said please state it.
Every web request costs someone money. If you aren’t paying them, you are being provided a service. They’ve given you knowledge/material in their possession free of charge. You are taking advantage of that goodwill by using the content for purposes not intended. That is a moral failing.
To be clear, the ownership of the material is not important; just the access is immoral, as the harm is already done.
I’ll add the caveat that it can be moral if they’ve specifically told you that you can via the website’s robots.txt file, which websites of consequence all have. But the assumption has to be that they don’t intend this, because that is how consent works.
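For what it’s worth, checking what a site has said in robots.txt is mechanical. Here is a minimal sketch using Python’s standard `urllib.robotparser`. The crawler name `ExampleAIBot`, the rules, and the URLs are all made up for illustration, and a real crawler would fetch the file from the site instead of using a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one (made-up) AI crawler entirely,
# and blocks everyone else only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "SomeBrowser"):
        for url in ("https://example.com/post/1", "https://example.com/private/x"):
            print(agent, url, may_fetch(ROBOTS_TXT, agent, url))
```

Note that this only tells you what the site operator has asked for. robots.txt is a request, and nothing technical stops a crawler from ignoring it.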
the assumption has to be they don’t intend this
why? if someone publishes something on port 80, why should I ever assume they mean anything but for me to have and use that data?
deleted by creator
an appeal to ridicule is also called a horse laugh fallacy. it’s like writing lol instead of actually explaining what’s wrong with the position to which you’re objecting. this response also reads like an appeal to ridicule. if you can’t explain what’s wrong with my position, maybe you shouldn’t be speaking about my position.
I totally agree.
Copyright and patent laws need to die.
What constitutes fair use?
17 U.S.C. § 107
Notwithstanding the provisions of sections 17 U.S.C. § 106 and 17 U.S.C. § 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright.
GenAI training, at least regarding art, is neither criticism, comment, news reporting, scholarship, nor research.
AI training is not done by scientists but by engineers at a corporate entity with a long-term profit goal.
So, by elimination, we can conclude that none of the purposes covered by the fair use doctrine apply to Generative AI training.
Q.E.D.
it is pretty obviously scholarship and research
It is pretty obviously Research and Development of a commercial product in many cases. Not fair use.
there is no stipulation that the research must be non-profit.
Here’s another good one: https://www.eff.org/deeplinks/2023/04/how-we-think-about-copyright-and-ai-art-0
Agreed. I would also argue that trained model weights are not copyrightable.
They aren’t.
Courts have already ruled that copyright requires human creation, and weights are not decided by humans but by the training algorithms.
I didn’t know it was already settled law. But in that case, why are models like Llama still released under licenses? If they are non-copyrightable, the licenses should be unenforceable and therefore irrelevant.