Is AI Getting Worse? Model Collapse, Synthetic Data, and How to Maintain Quality in 2025

Explore concerns about AI quality decline due to model collapse and synthetic data, and learn strategies to maintain performance in 2025.

Written By

Joshua

AI is deteriorating in real time: what the Reddit debate gets right (and wrong)

A widely shared Reddit thread argues that AI is getting worse in real time, pointing to “model collapse”, synthetic training data, and rising hallucinations. It links to serious sources – a Nature paper on recursive training, Epoch AI’s analysis of data limits, and commentary from Gary Marcus and Ed Zitron – alongside industry notes from OpenAI and Gartner.

So, is AI actually deteriorating? The short answer: quality is more fragile than many expected, but decline isn’t inevitable. The bigger story is how we train, evaluate, and deploy these systems in 2025.

Read the original Reddit post for the full list of sources.

What “model collapse” means – and why it matters

Model collapse is a failure mode where models trained on outputs from other models (rather than human-created data) lose diversity and factual fidelity over time. In LLMs, that shows up as blander phrasing, overconfident errors, and missing edge cases. The risk compounds if synthetic content floods the web and gets scraped into the next training run.

“AI models collapse when trained on recursively generated data.”

That’s the title and core finding of the Nature paper by Shumailov et al. (July 2024), which models how errors and biases amplify when training repeatedly on model-generated distributions. It’s not a reason to panic, but it is a warning: data provenance now directly affects model quality.
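To make the feedback loop concrete, here is a toy sketch rather than the paper's actual experiments: a one-dimensional Gaussian stands in for a "model", and each generation is fitted only to samples drawn from the previous generation. The function name, sample size, and generation count are illustrative assumptions.

```python
import random
import statistics


def simulate_recursive_fitting(generations=200, sample_size=10, seed=0):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Returns the fitted standard deviation after each generation.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0: stands in for the "human" data
    history = [std]
    for _ in range(generations):
        # Draw a small training set from the current model...
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # ...then refit the model on its own outputs only.
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        history.append(std)
    return history


history = simulate_recursive_fitting()
print(f"std at generation 0: {history[0]:.3f}")
print(f"std at generation {len(history) - 1}: {history[-1]:.3g}")
```

With small samples, each refit tends to underestimate the spread slightly, and those losses compound: the fitted standard deviation shrinks towards zero, which is the "loss of diversity" in miniature. A real LLM pipeline is far more complex, so treat this as intuition only.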

What the cited evidence actually shows

1) Training on synthetic data can damage quality if you’re careless

The Shumailov et al. paper (Nature) offers a theoretical and empirical basis for collapse when models consume their own outputs recursively. Key takeaway: feedback loops matter. Without curation, deduplication, and guardrails, quality drifts.

2) High-quality human data is finite

Epoch AI’s “Will We Run Out of Data?” argues that we’ll hit limits on public, high-quality human-generated text for large-scale training. This doesn’t mean “no data”, but it does mean the easy wins from web-scale scraping are tapering off.

3) Hallucinations remain a live issue in 2025

The Reddit post references OpenAI’s April 2025 o3/o4-mini system card and a “PersonQA” hallucination benchmark. Details are not disclosed in the thread, but the thrust is clear: even with frontier models, grounded reasoning about people and facts is brittle without retrieval or tools.

4) Synthetic data isn’t always bad – but it’s domain-sensitive

Self-play and synthetic curricula have been massively successful in formal domains (AlphaZero for chess/Go; AlphaGeometry for Olympiad-level geometry). These tasks have clear rules and objective outcomes. Natural language is messier: subjective, open-ended, and prone to compounding small errors – which is where collapse risk bites.

5) Industry signals and sentiment

Gartner forecast that synthetic data would account for a majority share of AI training data by 2024 (a bold projection; methodology not disclosed here). The Duke University student survey (January 2025) suggests growing mainstream use and trust, but also concern about reliability. Commentators like Gary Marcus and Ed Zitron argue that quality and business models are under strain.

Is AI getting worse – or are we seeing the limits of scaling?

Both can be true. Some users report perceived regressions in popular chatbots. That can stem from:

  • Training shifts to reduce harmful outputs (safety tuning) that throttle creativity or specificity.
  • Context-window and caching tricks that change behaviour across turns.
  • Cost-driven deployment choices (smaller models, aggressive decoding settings) that trade accuracy for latency or price.
  • Evaluation drift: we’re asking harder, more specialised questions than we did in 2023.

None of this proves universal decline, but it does show why sustained quality requires better data hygiene, retrieval, and evaluation – not just bigger models.

Synthetic data at scale: risks, uses, and how to do it safely

Synthetic data is model-generated text, code, images, or labels used for training or fine-tuning. It’s attractive when human data is scarce, sensitive, or expensive. The risks arise when synthetic data outweighs the human-created core of a corpus and introduces undetected biases or factual drift.

Where it works well:

  • Structured or formal domains with verifiable outcomes (compilers, maths proofs, games, some code tasks).
  • Data augmentation for rare classes, safety red-teaming, or style control.

Where it’s riskier:

  • Open-ended knowledge tasks without retrieval.
  • Long-horizon reasoning where small factual errors snowball.

Maintaining AI quality in 2025: practical steps for teams

1) Guard your data supply chain

  • Track provenance: label human vs synthetic, source licences, and collection dates.
  • Enforce synthetic-to-real ratios; oversample trusted human data.
  • Deduplicate aggressively to avoid echo-chambers of model paraphrases.
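As a rough sketch of the three bullets above, the snippet below tags each record with its provenance, drops near-verbatim duplicates, and caps the synthetic share. The `Record` fields, the `build_training_set` helper, and the default ratio are all hypothetical; exact-hash matching will not catch true paraphrases, which would need fuzzy matching.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    text: str
    origin: str     # "human" or "synthetic"
    licence: str
    collected: str  # ISO date string


def fingerprint(text):
    """Hash of lower-cased text with punctuation and extra whitespace removed."""
    normalised = re.sub(r"[^a-z0-9 ]", "", text.lower())
    normalised = " ".join(normalised.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def build_training_set(records, max_synthetic_ratio=0.3):
    """Deduplicate, then cap synthetic records at a share of the final set.

    Assumes 0 <= max_synthetic_ratio < 1.
    """
    seen = set()
    human, synthetic = [], []
    # Visit human records first so that, among duplicates, the human copy wins.
    for rec in sorted(records, key=lambda r: r.origin != "human"):
        key = fingerprint(rec.text)
        if key in seen:
            continue
        seen.add(key)
        (human if rec.origin == "human" else synthetic).append(rec)
    # Largest s such that s / (len(human) + s) <= max_synthetic_ratio.
    cap = int(len(human) * max_synthetic_ratio / (1 - max_synthetic_ratio))
    return human + synthetic[:cap]


records = [
    Record("Model collapse is a risk.", "human", "CC-BY", "2025-01-10"),
    Record("model collapse is a risk", "synthetic", "internal", "2025-02-01"),
    Record("Retrieval grounds answers.", "human", "CC-BY", "2025-01-11"),
    Record("Synthetic sample A.", "synthetic", "internal", "2025-02-02"),
    Record("Synthetic sample B.", "synthetic", "internal", "2025-02-03"),
    Record("Synthetic sample C.", "synthetic", "internal", "2025-02-04"),
]
training_set = build_training_set(records, max_synthetic_ratio=0.5)
print([(r.origin, r.text) for r in training_set])
```

Here the synthetic near-copy of the first human record is dropped, and only two of the three remaining synthetic samples are kept so that synthetic data makes up no more than half of the set.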

2) Use RAG for facts, not just bigger models

RAG (retrieval-augmented generation) fetches trusted documents at query time and grounds responses in them. It reduces hallucinations, supports citations, and is easier to audit for regulated use cases.
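The retrieval step can be sketched in a few lines. This is a toy version under stated assumptions: a word-overlap ranker stands in for a real search index or embedding store, the policy documents are invented, and the final call to a language model is left out entirely.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents, top_k=2):
    """Rank documents by how many query tokens they share (toy lexical retriever)."""
    query_tokens = Counter(tokenize(query))
    scored = []
    for doc_id, text in documents.items():
        overlap = sum((query_tokens & Counter(tokenize(text))).values())
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def build_grounded_prompt(query, documents, top_k=2):
    """Assemble a prompt that asks the model to answer only from cited sources."""
    doc_ids = retrieve(query, documents, top_k)
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in doc_ids)
    return (
        "Answer using only the sources below and cite their ids. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


# Invented example documents; a real deployment would use a trusted knowledge base.
documents = {
    "policy-01": "Annual leave requests must be submitted ten working days in advance.",
    "policy-02": "Expenses are reimbursed within thirty days of submission.",
    "policy-03": "Laptops are refreshed every four years.",
}
prompt = build_grounded_prompt("When are expenses reimbursed?", documents, top_k=1)
print(prompt)
```

Because the source ids travel with the prompt, an auditor can later check which documents an answer was supposed to rest on.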

3) Build gold test sets and measure often

  • Create domain-specific evaluation suites with exact-match, citation, and reasoning checks.
  • Track regressions across model or prompt updates; freeze decoding settings for comparability.
  • Score hallucinations explicitly (e.g., entity-level factuality) and reward abstention when uncertain.
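A minimal harness for the checks above might look like the following. It is a sketch, not a standard benchmark: the metric names, the abstention phrase, and the penalty are assumptions, and counting every wrong answer as a "hallucination" is a crude proxy for entity-level factuality.

```python
from dataclasses import dataclass

ABSTAIN = "I don't know"


@dataclass(frozen=True)
class GoldItem:
    question: str
    answer: str


def normalise(text):
    return " ".join(text.lower().strip().rstrip(".").split())


def score_run(gold_items, predictions, wrong_penalty=1.0):
    """Score a run: +1 for exact match, 0 for abstaining, -penalty for a wrong answer."""
    correct = abstained = wrong = 0
    for item, predicted in zip(gold_items, predictions):
        if normalise(predicted) == normalise(ABSTAIN):
            abstained += 1
        elif normalise(predicted) == normalise(item.answer):
            correct += 1
        else:
            wrong += 1
    total = len(gold_items)
    return {
        "exact_match": correct / total,
        "abstention_rate": abstained / total,
        "hallucination_rate": wrong / total,
        "penalised_score": (correct - wrong_penalty * wrong) / total,
    }


def regressions(baseline, candidate, tolerance=0.0):
    """Names of metrics where the candidate run is worse than the baseline."""
    higher_is_better = {"exact_match", "penalised_score"}
    worse = []
    for name in ("exact_match", "hallucination_rate", "penalised_score"):
        delta = candidate[name] - baseline[name]
        if name in higher_is_better and delta < -tolerance:
            worse.append(name)
        elif name not in higher_is_better and delta > tolerance:
            worse.append(name)
    return worse


gold = [
    GoldItem("Capital of France?", "Paris"),
    GoldItem("2 + 2?", "4"),
    GoldItem("Chemical symbol for gold?", "Au"),
    GoldItem("Author of Hamlet?", "William Shakespeare"),
]
baseline = score_run(gold, ["Paris", "4", "Au", "I don't know"])
candidate = score_run(gold, ["paris.", "4", "Au", "Christopher Marlowe"])
print(regressions(baseline, candidate))
```

Both runs have the same exact-match rate, but the candidate swaps an abstention for a confident wrong answer, so it is flagged on the hallucination and penalised metrics. That is the behaviour a plain accuracy number would hide.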

4) Prefer tools and orchestration over single-shot prompts

  • Use function calling, calculators, and search rather than forcing the model to “improvise”.
  • Chain-of-thought can help, but verify intermediate steps with tools where possible.
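As an illustration of routing work to a tool instead of letting the model guess, here is a small dispatcher with a single calculator tool. The tool-call dictionary shape and the `TOOLS` registry are assumptions made for this sketch, not any vendor's function-calling format.

```python
import ast
import operator

_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression):
    """Evaluate +, -, *, / arithmetic without calling eval()."""
    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        # Anything else (names, calls, attribute access) is rejected.
        raise ValueError("unsupported expression")
    return evaluate(ast.parse(expression, mode="eval").body)


TOOLS = {"calculator": calculator}


def dispatch(tool_call):
    """Run a tool call of the form {"name": ..., "arguments": {...}}."""
    name = tool_call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**tool_call["arguments"])


# Hypothetical tool call, as a model might emit it instead of improvising the sum.
result = dispatch({"name": "calculator", "arguments": {"expression": "1234 * 5678 + 9"}})
print(result)
```

The same registry pattern extends to search or database lookups; the point is that the arithmetic is computed and checkable rather than generated.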

5) Curate synthetic data – don’t just generate and hope

  • Use multi-model cross-checking or human review for synthetic labels.
  • Watermark or tag internal synthetic data where feasible, so that it can be detected and filtered out if it is later scraped back into a training set.
  • Periodically refresh from human, high-quality sources to avoid drift.
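The cross-checking idea can be sketched as a simple agreement filter: a synthetic label is accepted only when enough independent labellers agree, and everything else goes to a human-review queue. The three labellers below are trivial stand-ins for separate models, and the agreement threshold is an assumption.

```python
from collections import Counter


def cross_check(example, labellers, min_agreement=2):
    """Return the majority label if enough labellers agree, else None."""
    votes = Counter(labeller(example) for labeller in labellers)
    label, count = votes.most_common(1)[0]
    return label if count >= min_agreement else None


def curate(examples, labellers, min_agreement=2):
    """Split synthetic examples into auto-accepted labels and a human-review queue."""
    accepted, needs_review = [], []
    for example in examples:
        label = cross_check(example, labellers, min_agreement)
        if label is None:
            needs_review.append(example)
        else:
            accepted.append((example, label))
    return accepted, needs_review


# Stand-in labellers: in practice these would wrap calls to different models.
def labeller_a(text):
    return "positive" if "good" in text else "negative"


def labeller_b(text):
    return "positive" if "good" in text or "fine" in text else "negative"


def labeller_c(text):
    return "negative" if "bad" in text or "not" in text else "positive"


examples = ["good service", "it was fine", "bad experience"]
accepted, needs_review = curate(
    examples, [labeller_a, labeller_b, labeller_c], min_agreement=3
)
print(accepted)
print(needs_review)
```

Requiring unanimity here sends the ambiguous "it was fine" to review while the two clear cases pass automatically. Agreement between models is not proof of correctness, so spot-checking accepted labels is still worthwhile.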

Implications for UK organisations

Compliance and privacy. UK GDPR and the Data Protection Act 2018 require lawful bases for processing and strong data governance. If your models are trained or fine-tuned on personal data, provenance and minimisation matter. Retrieval logs can also be personal data – treat them accordingly.

Procurement and assurance. Ask vendors about training data sources, synthetic proportions, deduplication, and evaluation methods. For regulated industries, require clear documentation and the ability to reproduce key benchmarks on your data.

Cost and sustainability. Chasing scale without quality controls wastes compute – and water and energy. For a practical explainer on data centre cooling and water use, see my breakdown of AI’s water cycle.

Talent and process. Hire for data engineering and evaluation, not just prompt flair. Treat your knowledge base and test sets as core assets.

Where this conversation is headed

  • More hybrid stacks: smaller, faster models fronted by retrieval and tools, with large models for escalation.
  • Data-centric research: debiasing, selective mixing of synthetic with human data, and robust deduplication.
  • Transparent system cards: clearer reporting on hallucination benchmarks and dataset composition (currently not disclosed in many releases).

Further reading from the Reddit thread

Bottom line

AI isn’t doomed, but naïve scaling is. Quality now hinges on disciplined data pipelines, retrieval, and rigorous evaluation. If your organisation depends on reliable AI in 2025, invest in your data supply chain first – models will come and go, but your corpus and tests are leverage that lasts.

Last Updated

May 24, 2026
