It's the right move, and authors and publishers should be rooting for it. Either your work lives in the corpus of human knowledge, which AIs will increasingly reflect more perfectly over time, or you're forgotten. You've also now got precedent that they have to pay for access to your work like any other human.
As long as they're not violating copyright laws in output, it's fine and good.
Legally acquired copyrighted books.
> Despite siding with the AI company on fair use, Alsup wrote that Anthropic will still face trial for the pirated copies it used to create its massive central library of books used to train AI.
While this may seem like a big issue, it's really to be expected. Asking judges to rule on new technology applied to old laws is like asking a bus driver to design an energy-efficient bus motor. Judges are technicians, not scientists: they apply the law; they can't think creatively about new laws. For that, we have experts (political think-tanks), and I'm sure those think-tanks are kicking into overdrive as they realize the ramifications of this ruling.

This will have an impact, but it's hard to determine what that will be. To some degree, it will disincentivize writing books. If this ruling only applies to SELLING books, then some people will make their books subscription-only and test this law against that (do AI companies need a perpetual subscription if they've trained an AI on a subscription-only book?). If AI companies become the primary consumers of books, and everyone just gets their information from AI, then direct-to-consumer books will cease to be a thing, and authors will sell their books directly to AI companies for $100,000 or $1,000,000.
I don't think there should be many (if any at all) copyright restrictions on training. This doesn't give AIs the right to violate copyright laws in their output. Plus, stopping AIs from training might also stop merely gathering metadata on materials, like word frequency, genre, or protagonist age.
Discussion (166 points, 1 day ago, 197 comments) https://news.ycombinator.com/item?id=44367850
This might lead to some creators and publishers silo-ing off valuable content in tightly controlled environments. Tightly controlled in terms of both DRM used to prevent screen/web scraping and potential contractual obligations restricting use for training if granted access.
Think Blu-Ray DRM but for more than video; it's already happened with publishers and college textbooks.
I say this as an author who has definitely had a lot of my work slurped up by these machine-learning goblins: This was the right call. I learned to write by reading other authors' works, so I'd have to be quite the hypocrite to stop others from learning from me. Still, it makes me sad and tired to know that I'm unwittingly training my own replacement--one that will never be sad or tired itself.
In my view, one real gray area is image/video generation, especially "x in the style of y" kinds of shenanigans. As a society we may need to consider better protections for an artist's or studio's style; otherwise, distinct, novel, and interesting styles will be watered down into a sea of bland mimicry until the sweet release of the heat death of the universe.
“Consistent with copyright’s purpose in enabling creativity and fostering scientific progress, ‘Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different.’”
More smug nonsense emanating from Misanthropic. There is no creativity that is enabled. People tweak the prompts like children until something that was stolen from others emerges.
Most people working on "AI" have never created anything substantial. They rely on utilizing other people's creations. It is very sad that Alsup caves to big tech and issues a vibe ruling.
Huh. We meatbags are not artificial intelligences but organic intelligences, and thus more important than robots (which are not alive; even if we make golems and infuse them with "life", they are not human animals). All of us are in training throughout our lives, so by that logic, training on copyrighted material is fair use.
Edit: I see another commenter, presumably human, clarified: "legally-acquired copyrighted books." Even granting the arguments about AI being potentially helpful to disabled humans, one healthier route is to help each other out directly instead of dividing and conquering with technology in the name of helping. It feels like one of the aims of capitalists is to put each of us into our own Matrix (1999 movie) battery capsule and bleed us dry while we're distracted.
I remember the folks here who were dragging the Internet Archive over controlled digital lending, when it was just trying to be a digital library, as if it were infringing on authors by attempting to take back what publishers took from libraries by requiring them to buy ebook licenses that expired and had to be repurchased time and time again. "Universal access to all knowledge."
Now, with judicial opinions on this fair use firming up, I am hopeful this will allow them to train on every book they’ve ever acquired and release those models to the world.