Is AI Getting Worse? Model Collapse, Synthetic Data, and How to Maintain Quality in 2025

Explore concerns about AI quality decline due to model collapse and synthetic data, and learn strategies to maintain performance in 2025.

24 May 2026 · by Joshua Thompson · 6 min read

AI is deteriorating in real time: what the Reddit debate gets right (and wrong)

A widely shared Reddit thread argues that AI is getting worse in real time, pointing to “model collapse”, synthetic training data, and rising hallucinations. It links to serious sources – a Nature paper on recursive training, Epoch AI’s analysis of data limits, and commentary from Gary Marcus and Ed Zitron – alongside industry notes from OpenAI and Gartner.

So, is AI actually deteriorating? The short answer: quality is more fragile than many expected, but decline isn’t inevitable. The bigger story is how we train, evaluate, and deploy these systems in 2025.

Read the original Reddit post for the full list of sources.

What “model collapse” means – and why it matters

Model collapse is a failure mode where models trained on outputs from other models (rather than human-created data) lose diversity and factual fidelity over time. In LLMs, that shows up as blander phrasing, overconfident errors, and missing edge cases. The risk compounds if synthetic content floods the web and gets scraped into the next training run.

“AI models collapse when trained on recursively generated data.”

That’s the title and core finding of the Nature paper by Shumailov et al. (July 2024), which models how errors and biases amplify when training repeatedly on model-generated distributions. It’s not a reason to panic, but it is a warning: data provenance now directly affects model quality.
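To make the feedback loop concrete, here is a toy sketch in Python. It is not the paper's experimental setup: it assumes the "model" is just a one-dimensional Gaussian, refitted each generation on a small sample drawn from the previous generation's fit, with no fresh human data. The function name, sample size, and generation count are illustrative choices.

```python
import random
import statistics


def simulate_recursive_fitting(n_samples=10, generations=200, seed=0):
    """Refit a Gaussian on its own samples; return the fitted std per generation."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 stands in for the original human data
    stds = [sigma]
    for _ in range(generations):
        # Draw a finite training set from the current model...
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # ...and fit the next model to that set alone.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        stds.append(sigma)
    return stds


stds = simulate_recursive_fitting()
print(f"std at generation 0:   {stds[0]:.3f}")
print(f"std at generation 200: {stds[-1]:.3e}")
```

Each refit loses a little of the tails through sampling error, and the losses compound, so the fitted spread tends to shrink towards zero. Real language models are far more complex, but the same loss of diversity is what the collapse argument describes.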

What the cited evidence actually shows

1) Training on synthetic data can damage quality if you’re careless

The Shumailov et al. paper (Nature) offers a theoretical and empirical basis for collapse when models consume their own outputs recursively. Key takeaway: feedback loops matter. Without curation, deduplication, and guardrails, quality drifts.

2) High-quality human data is finite

Epoch AI’s “Will We Run Out of Data?” argues that we’ll hit limits on public, high-quality human-generated text for large-scale training. This doesn’t mean “no data”, but it does mean the easy wins from web-scale scraping are tapering off.

3) Hallucinations remain a live issue in 2025

The Reddit post references OpenAI’s April 2025 o3/o4-mini system card and a “PersonQA” hallucination benchmark. Details are not disclosed in the thread, but the thrust is clear: even with frontier models, grounded reasoning about people and facts is brittle without retrieval or tools.

4) Synthetic data isn’t always bad – but it’s domain-sensitive

Self-play and synthetic curricula have been massively successful in formal domains (AlphaZero for chess/Go; AlphaGeometry for Olympiad-level geometry). These tasks have clear rules and objective outcomes. Natural language is messier: subjective, open-ended, and prone to compounding small errors – which is where collapse risk bites.

5) Industry signals and sentiment

Gartner had forecast that synthetic data could account for a majority share of training corpora by 2024 (a bold projection; methodology not disclosed here). The Duke University student survey (January 2025) suggests growing mainstream use and trust, but also concern about reliability. Commentators like Gary Marcus and Ed Zitron argue that quality and business models are under strain.

Is AI getting worse – or are we seeing the limits of scaling?

Both can be true. Some users report perceived regressions in popular chatbots. That can stem from:

Training shifts to reduce harmful outputs (safety tuning) that throttle creativity or specificity.
Context-window and caching tricks that change behaviour across turns.
Cost-driven deployment choices (smaller models, aggressive decoding settings) that trade accuracy for latency or price.
Evaluation drift: we’re asking harder, more specialised questions than we did in 2023.

None of this proves universal decline, but it does show why sustained quality requires better data hygiene, retrieval, and evaluation – not just bigger models.

Synthetic data at scale: risks, uses, and how to do it safely

Synthetic data is model-generated text, code, images, or labels used for training or fine-tuning. It’s attractive when human data is scarce, sensitive, or expensive. The risks arise when synthetic data overwhelms the human-created core and introduces undetected biases or factual drift.

Where it works well:

Structured or formal domains with verifiable outcomes (compilers, maths proofs, games, some code tasks).
Data augmentation for rare classes, safety red-teaming, or style control.

Where it’s riskier:

Open-ended knowledge tasks without retrieval.
Long-horizon reasoning where small factual errors snowball.

Maintaining AI quality in 2025: practical steps for teams

1) Guard your data supply chain

Track provenance: label human vs synthetic, source licences, and collection dates.
Enforce synthetic-to-real ratios; oversample trusted human data.
Deduplicate aggressively to avoid echo-chambers of model paraphrases.
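The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the Record fields, the function names, and the 30% default cap are hypothetical, and the deduplication only catches exact matches after case and whitespace normalisation (real pipelines usually add near-duplicate detection to catch paraphrases).

```python
import hashlib
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    text: str
    origin: str     # "human" or "synthetic"
    licence: str
    collected: str  # ISO date, e.g. "2025-01-31"


def fingerprint(text):
    """Hash of case- and whitespace-normalised text."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def deduplicate(records):
    """Keep one record per fingerprint, preferring human over synthetic copies."""
    ordered = sorted(records, key=lambda r: r.origin != "human")  # stable sort
    seen, kept = set(), []
    for record in ordered:
        key = fingerprint(record.text)
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


def cap_synthetic(records, max_synthetic_share=0.3, seed=0):
    """Downsample synthetic records to at most the given share of the mix."""
    if not 0 <= max_synthetic_share < 1:
        raise ValueError("max_synthetic_share must be in [0, 1)")
    human = [r for r in records if r.origin == "human"]
    synthetic = [r for r in records if r.origin == "synthetic"]
    # share = s / (h + s) <= p  implies  s <= p * h / (1 - p)
    limit = int(max_synthetic_share * len(human) / (1 - max_synthetic_share))
    if len(synthetic) > limit:
        synthetic = random.Random(seed).sample(synthetic, limit)
    return human + synthetic
```

Running deduplicate before cap_synthetic means the ratio is computed on unique records, so repeated model paraphrases of the same sentence cannot quietly inflate the synthetic share.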

2) Use RAG for facts, not just bigger models

RAG (retrieval-augmented generation) fetches trusted documents at query time and grounds responses in them. It reduces hallucinations, supports citations, and is easier to audit for regulated use cases.
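As a rough sketch of the pattern, the Python below retrieves the best-matching documents and builds a prompt that asks for cited, source-only answers. It assumes a simple word-overlap ranking as a stand-in for a real vector or keyword index, and the function names, document ids, and prompt wording are all hypothetical. The call to an actual model is left out.

```python
import re


def tokenise(text):
    """Lowercased set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, top_k=2):
    """Rank documents by word overlap with the query (stand-in for a real index)."""
    query_terms = tokenise(query)
    scored = []
    for doc_id, text in documents.items():
        overlap = len(query_terms & tokenise(text))
        if overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by id so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def build_grounded_prompt(query, documents, top_k=2):
    """Assemble a prompt that restricts the model to the retrieved sources."""
    doc_ids = retrieve(query, documents, top_k)
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in doc_ids)
    return (
        "Answer using only the sources below and cite their ids. "
        "If the sources do not contain the answer, say you do not know.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


docs = {
    "policy-1": "Annual leave is 25 days per year.",
    "policy-2": "Expenses must be filed within 30 days.",
    "policy-3": "The office closes at 6pm.",
}
print(build_grounded_prompt("How many days of annual leave do I get?", docs))
```

Because the retrieved ids travel with the prompt, each answer can be logged alongside the documents it was grounded in, which is what makes the approach easier to audit.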

3) Build gold test sets and measure often

Create domain-specific evaluation suites with exact-match, citation, and reasoning checks.
Track regressions across model or prompt updates; freeze decoding settings for comparability.
Score hallucinations explicitly (e.g., entity-level factuality) and reward abstention when uncertain.
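One way to sketch these checks in Python is shown below. It assumes exact-match scoring as a crude proxy for factuality (entity-level scoring would need more machinery), a fixed abstention phrase, and a model represented as any callable from question to answer. The names Case, evaluate, and regressions, and the +1/0/-1 scoring scheme, are illustrative rather than a standard.

```python
from dataclasses import dataclass


@dataclass
class Case:
    question: str
    gold: str


ABSTAIN = "I don't know"  # assumed abstention phrase


def score_answer(answer, gold):
    """+1 for an exact match, 0 for abstaining, -1 for a confident wrong answer."""
    cleaned = answer.strip().lower()
    if cleaned == ABSTAIN.lower():
        return 0
    return 1 if cleaned == gold.strip().lower() else -1


def evaluate(model, cases):
    """Run a callable model over a frozen gold set and summarise the scores."""
    scores = [score_answer(model(case.question), case.gold) for case in cases]
    n = len(scores)
    return {
        "accuracy": scores.count(1) / n,
        "hallucination_rate": scores.count(-1) / n,  # wrong, non-abstaining answers
        "abstention_rate": scores.count(0) / n,
        "net_score": sum(scores) / n,
    }


def regressions(baseline, candidate, tolerance=0.0):
    """Names of metrics where the candidate is worse than the baseline."""
    worse = []
    if candidate["accuracy"] < baseline["accuracy"] - tolerance:
        worse.append("accuracy")
    if candidate["hallucination_rate"] > baseline["hallucination_rate"] + tolerance:
        worse.append("hallucination_rate")
    return worse
```

Scoring a wrong answer below an abstention means a model that guesses more can gain accuracy and still be flagged, which is the behaviour you want to catch before an update ships.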

4) Prefer tools and orchestration over single-shot prompts

Use function calling, calculators, and search rather than forcing the model to “improvise”.
Chain-of-thought can help, but verify intermediate steps with tools where possible.
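A small Python sketch of the idea follows: arithmetic is routed to a calculator tool instead of being left to the model, and a number the model states mid-reasoning is checked against the tool. The tool-call shape ({"tool": ..., "input": ...}) and the names run_tool_call and verify_step are hypothetical and do not correspond to any particular vendor's function-calling API.

```python
import ast
import operator

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def calculator(expression):
    """Evaluate plain arithmetic by walking the syntax tree, without eval()."""
    def walk(node):
        if isinstance(node, ast.Constant):
            value = node.value
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                return value
        elif isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        elif isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


TOOLS = {"calculator": calculator}


def run_tool_call(call):
    """Dispatch a tool request of the assumed form {"tool": ..., "input": ...}."""
    tool = TOOLS.get(call["tool"])
    if tool is None:
        raise KeyError(f"unknown tool: {call['tool']}")
    return tool(call["input"])


def verify_step(claimed, expression, tolerance=1e-9):
    """Check a number the model stated against the tool's own result."""
    return abs(calculator(expression) - claimed) <= tolerance


print(run_tool_call({"tool": "calculator", "input": "17 * 23 + 4"}))
print(verify_step(385, "17 * 23 + 4"))  # a plausible-looking but wrong intermediate value
```

The calculator accepts only numbers and basic operators, so anything else in a model-produced expression is rejected rather than executed.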

5) Curate synthetic data – don’t just generate and hope

Use multi-model cross-checking or human review for synthetic labels.
Watermark internal synthetic data where feasible to prevent re-ingestion in the wild.
Periodically refresh from human, high-quality sources to avoid drift.
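The cross-checking step can be sketched as an agreement filter, shown below in Python. It assumes each labeller is a callable standing in for an independent model, and that examples without full agreement go to a human review queue. The function names, the agreement threshold, and the "origin" tag are illustrative. The tag is plain metadata for internal tracking, not a watermark embedded in the text.

```python
from collections import Counter


def cross_check(example, labellers):
    """Return the majority label and the fraction of labellers that agree with it."""
    votes = Counter(labeller(example) for labeller in labellers)
    label, count = votes.most_common(1)[0]
    return label, count / len(labellers)


def triage(examples, labellers, min_agreement=1.0):
    """Accept labels with enough agreement; queue the rest for human review."""
    accepted, needs_review = [], []
    for example in examples:
        label, agreement = cross_check(example, labellers)
        if agreement >= min_agreement:
            # Tag provenance so the record can be filtered out of later scrapes.
            accepted.append({"text": example, "label": label, "origin": "synthetic"})
        else:
            needs_review.append(example)
    return accepted, needs_review


# Three toy labellers standing in for separate models.
def labeller_a(text):
    return "positive" if "great" in text else "negative"


def labeller_b(text):
    return "negative" if "not" in text else "positive"


def labeller_c(text):
    return "positive" if text.startswith("great") else "negative"


accepted, needs_review = triage(
    ["great product", "not great"], [labeller_a, labeller_b, labeller_c]
)
print(accepted)
print(needs_review)
```

Agreement between models is not proof of correctness, since models trained on similar data can share the same mistakes, so sampling some accepted records for human review is still worthwhile.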

Implications for UK organisations

Compliance and privacy. UK GDPR and the Data Protection Act 2018 require lawful bases for processing and strong data governance. If your models are trained or fine-tuned on personal data, provenance and minimisation matter. Retrieval logs can also be personal data – treat them accordingly.

Procurement and assurance. Ask vendors about training data sources, synthetic proportions, deduplication, and evaluation methods. For regulated industries, require clear documentation and the ability to reproduce key benchmarks on your data.

Cost and sustainability. Chasing scale without quality controls wastes compute – and water and energy. For a practical explainer on data centre cooling and water use, see my breakdown of AI’s water cycle.

Talent and process. Hire for data engineering and evaluation, not just prompt flair. Treat your knowledge base and test sets as core assets.

Where this conversation is headed

More hybrid stacks: smaller, faster models fronted by retrieval and tools, with large models for escalation.
Data-centric research: debiasing, selective mixing of synthetic with human data, and robust deduplication.
Transparent system cards: clearer reporting on hallucination benchmarks and dataset composition (currently not disclosed in many releases).

Bottom line

AI isn’t doomed, but naïve scaling is. Quality now hinges on disciplined data pipelines, retrieval, and rigorous evaluation. If your organisation depends on reliable AI in 2025, invest in your data supply chain first – models will come and go, but your corpus and tests are leverage that lasts.


Last updated: 5 July 2026
