Can AI infringe the copyright of existing works?

The analogy

Think of a cook who learned the trade by tasting thousands of other people's dishes. As long as they cook their own recipes, inspired but different, no one says anything. The trouble arises at two points. If, to learn, they stole the recipes from cookbooks locked in a safe, they committed a wrong upstream. And if they serve at the table a dish identical to the specialty of the restaurant next door, the copy is visible on the plate, not in their memory.

Generative AI works the same way. It has "tasted" enormous quantities of texts and images in order to learn. That creates two distinct responsibilities: how it learned (a problem for whoever trained it) and what you serve at the table (your problem when you publish). Keeping them separate is the only way to understand where the company's risk ends and yours begins.

How it really works

Copyright upstream: the training

The models learn by ingesting large quantities of works. When those works are protected and used without a license, a legal question arises. In 2025-2026 the topic exploded in court, with over seventy infringement lawsuits against AI companies. The heaviest closed with a 1.5-billion-dollar settlement in the case involving Anthropic, where the judge held that training on lawfully acquired books could count as fair use (the lawful use without permission provided for by U.S. law) but that downloading and keeping whole, pirated works went beyond it. Other lawsuits are still open, including the one brought by the New York Times against OpenAI, which argues that the model can reproduce entire articles almost word for word.

This level belongs to the companies. You don't control it. But it tells you something useful: the provenance of the training data is contested ground, and the outputs can contain recognizable traces of real works.

Copyright downstream: your output

This is where you come in. A model can hand you something that resembles an existing work too closely: a nearly identical passage, an image that traces a signature style or a protected character, a text that reproduces parts of an article. If you publish that result, the infringement becomes yours, regardless of how the company behaved upstream.

The distance between "inspired" and "copied" isn't decided by the AI. It's decided by whoever looks at the final result. That's why the check before publishing isn't optional.

Why the risk isn't theoretical

The New York Times case arises precisely from almost verbatim reproduction. If a model can hand you an entire article, it can also hand you a protected paragraph without warning you that it is one. You paste it in believing it's original. The damage comes later, when someone recognizes their own text in yours.

What you can do in practice

For recognizable texts, run a plagiarism check. Before publishing a generated passage, paste its most distinctive sentences into a search engine in quotation marks: if they appear word for word on an existing site, rewrite them.
For images, use reverse image search. Upload the output to a visual search engine and check that it doesn't trace a protected work or character.
Avoid asking for output "in the style of" a living, identifiable artist. The distinctive style of a recognizable author is the riskiest zone: the result tends to copy rather than take inspiration.
For commercial projects, choose tools that disclose their training data. Some services state that they trained only on licensed or public-domain material: this reduces the risk upstream, and such services often offer contractual guarantees.
Keep proof of your review. Save a record of the checks you ran. If your good faith is ever questioned, being able to show that you verified makes the difference.
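
If you check many passages, the first and last steps above can be partly scripted. What follows is only a rough sketch in Python, not a real plagiarism detector: it picks the longest sentences as a crude stand-in for "most distinctive", wraps each one in quotation marks so you can paste it into a search engine, and prepares a dated record to keep as proof of review. The function names, the length heuristic, and the record fields are all assumptions made for illustration, and the actual search is still done by hand.

```python
import re
from datetime import date


def distinctive_sentences(text: str, top_n: int = 3, min_words: int = 8) -> list[str]:
    """Return up to top_n of the longest sentences in the text.

    Assumption: sentence length is used as a crude proxy for
    distinctiveness; a human should still pick the final candidates.
    """
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    return sorted(candidates, key=lambda s: len(s.split()), reverse=True)[:top_n]


def exact_phrase_query(sentence: str) -> str:
    """Wrap a sentence in double quotes for an exact-phrase search."""
    # Inner quotes and final punctuation are dropped so the phrase stays intact.
    cleaned = sentence.replace('"', "").rstrip(".!?")
    return f'"{cleaned}"'


def review_record(text: str, checked_on: date) -> list[dict]:
    """Build one record per query; fill in found_elsewhere after searching by hand."""
    return [
        {
            "date": checked_on.isoformat(),
            "query": exact_phrase_query(sentence),
            "found_elsewhere": None,  # set to True or False once you have checked
        }
        for sentence in distinctive_sentences(text)
    ]


if __name__ == "__main__":
    draft = (
        "Short opener. "
        "This generated paragraph contains one long and rather unusual sentence to verify. "
        "Done."
    )
    for entry in review_record(draft, date.today()):
        print(entry["date"], entry["query"])
```

Saving these records next to the published piece gives you a dated trace of what you checked and when.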

A common misunderstanding

"If the AI generated it, the responsibility is the AI's, not mine." False, and dangerous. The AI isn't a legal subject: it answers for nothing. Whoever publishes answers. The moment you take an output and put it online, in a book, in a campaign, it's you who introduces that content into the world. If it infringes someone else's work, the lawsuit comes to you. The tool doesn't act as your shield.

Frequently asked questions

If the AI gives me a text that copies an article, whose fault is it?

In practical terms, the responsibility for publishing it is yours, because you are the one who puts it into circulation. The company that trained the model faces its own separate lawsuits, but that doesn't protect you. Verify the most distinctive passages before using them.

Can I ask the AI to write "like" a famous author?

Imitating a generic tone is a tolerable gray area. Tracing the recognizable signature style of a living author, or reproducing their sentences and characters, is the zone where litigation arises. The more identifiable the author, the higher the risk.

Do the tools promising "copyright-risk-free content" really deliver?

Some offer contractual guarantees or an indemnity: if you're sued over their output, they cover you. It's real protection, but it often applies only under certain conditions of use and within set limits. Read what the indemnity covers before trusting the slogan.

Quick answer