German court rules OpenAI violated copyright law by training on licensed music
A German court has ruled that OpenAI breached copyright law by training ChatGPT on licensed musical works without permission. The case was brought by GEMA, Germany’s music rights organisation, and the court ordered OpenAI to pay damages (amount not disclosed). OpenAI says it’s considering an appeal.
GEMA is calling this the first major AI copyright ruling in Europe. While it doesn’t automatically change practice in other countries, it raises the stakes for how AI companies source training data and the licensing obligations that may follow.
Source and discussion: TechCrunch report and the original Reddit thread.
What the ruling says about AI training data
The core dispute is whether AI companies can train models on copyrighted works without a licence. GEMA argued that copyright still applies even when the process is automated. The court agreed.
“The core issue is straightforward. OpenAI used copyrighted material to train its models without getting licenses or permission.”
OpenAI’s position has been that training on publicly available data falls under fair use or similar exceptions. The German court, at least in this instance and for licensed musical works, did not accept that reasoning.
Two bits of jargon, quickly decoded:
- Training data: the text, audio, images or code a model learns from to spot patterns and generate outputs.
- Scraping: automated collection of content from websites or platforms, often at scale.
Implications for the UK: fair dealing, text-and-data mining, and risk
This ruling doesn’t bind UK courts, but it will be read closely here. The UK framework differs from the US: we have fair dealing (specific, limited exceptions) rather than broad fair use. There is a UK exception for text and data mining (TDM), but it’s largely for non-commercial research. A proposed expansion to cover commercial use was shelved.
In practical terms: if you’re commercially training models in or for the UK market on copyrighted content, you should assume that licences may be required unless a clear exception applies. That includes music, books, journalism, images, and code repositories. Database rights can also bite, separate from copyright.
“You can’t just scrape our work to build a commercial product without paying for it.”
What UK teams should do now
- Audit your data sources: document where your training, fine-tuning, and evaluation data comes from. Avoid “misc web” without provenance.
- Prefer licensed or rights-cleared datasets: check terms carefully, including any opt-outs or usage restrictions.
- Use retrieval-augmented generation (RAG): RAG keeps copyrighted content outside the model weights. It fetches documents at query time and cites them, which is often lower risk than training. RAG = retrieve, then generate.
- Limit training scope: where possible, fine-tune on your own first-party data or material you’ve explicitly licensed.
- Vendor due diligence: ask model/API providers about their training sources, licensing, opt-out handling, and indemnities.
- Region-aware deployments: laws differ by country. Consider geofencing certain features or models by jurisdiction.
- Set up takedown and opt-out processes: respect rights-holder requests quickly and transparently.
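To make the RAG point above concrete, here is a minimal sketch of the pattern: retrieve relevant documents at query time, then generate an answer that cites them, rather than baking third-party content into model weights. The document store, the keyword-overlap scoring, and the `generate()` stub are illustrative assumptions for this sketch, not any vendor's actual API.

```python
import re

def retrieve(query, docs, k=2):
    """Rank documents by naive keyword overlap with the query (toy scoring)."""
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    scored = [(len(terms & set(re.findall(r"[a-z0-9]+", text.lower()))), doc_id)
              for doc_id, text in docs.items()]
    scored.sort(reverse=True)
    return [doc_id for score, doc_id in scored[:k] if score > 0]

def generate(query, docs, sources):
    """Stand-in for a model call: answer from retrieved context, with citations."""
    context = "\n".join(f"[{s}] {docs[s]}" for s in sources)
    return f"Answer to '{query}' based on:\n{context}\nSources: {sources}"

# First-party documents you already hold the rights to.
docs = {
    "policy-01": "Our refund policy allows returns within 30 days.",
    "policy-02": "Shipping to the UK takes three to five working days.",
}

sources = retrieve("What is the refund policy?", docs)
print(generate("What is the refund policy?", docs, sources))
```

In production the scoring would typically be an embedding search and `generate()` a model API call, but the licensing-relevant property is the same: the copyrighted or proprietary content stays in your document store, fetched and cited per query, not trained into the model.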
Costs and operational impact
If licensing becomes the norm for training corpora, costs will rise and contracts will multiply. Expect:
- Higher model costs or new “licensed data” product tiers from vendors.
- More deals with collective management organisations (like GEMA) and publishers, each with their own terms.
- Shift towards smaller, domain-specific models trained on cleared data, plus RAG for proprietary sources.
- More emphasis on usage-based royalties and audit rights in contracts.
For many UK organisations, a practical path is to rely on established providers that can show licensing arrangements, and to use RAG over internal content you already own.
What’s still not disclosed
- Damages: the amount the court ordered OpenAI to pay hasn’t been disclosed.
- Scope: we don’t have detail on which specific works or datasets were at issue.
- Appeal timeline: OpenAI says it’s considering an appeal; timing and grounds are not disclosed.
- Precedent value: while notable, the ruling doesn’t set UK precedent. Future European cases may look to it, but outcomes can diverge by country and facts.
Balanced view: innovation versus creators’ rights
There’s a real tension here. Broad, web-scale training has accelerated AI progress. But creators are understandably pushing back when their work is absorbed into commercial models without consent or compensation.
The likely endgame is more licensing and opt-out infrastructure, clearer provenance standards, and model providers absorbing higher compliance costs. That could slow down some “move fast” experiments, but it also builds a more durable ecosystem where rights are respected and litigation risk is manageable.
If you’re building with ChatGPT today
Most UK businesses aren’t training frontier models. You’re integrating AI into workflows, connecting it to data, and shipping features. Focus on safe patterns (RAG, first-party data, clear licences) and vendor assurances. If you’re connecting ChatGPT to spreadsheets and operations, here’s a practical walkthrough:
How to connect ChatGPT and Google Sheets with a Custom GPT
Bottom line for UK readers
This German case is a signal: European courts may take a dim view of training on licensed works without permission. In the UK, fair dealing is narrow, and the proposed commercial TDM exception was shelved. If you’re training or fine-tuning models for commercial use, get serious about licences, provenance, and RAG.
This isn’t legal advice. If your product depends on large-scale training, speak to counsel early and budget for licensing.