AI is deteriorating in real time: what the Reddit debate gets right (and wrong)
A widely shared Reddit thread argues that AI is getting worse in real time, pointing to “model collapse”, synthetic training data, and rising hallucinations. It links to serious sources – a Nature paper on recursive training, Epoch AI’s analysis of data limits, and commentary from Gary Marcus and Ed Zitron – alongside industry notes from OpenAI and Gartner.
So, is AI actually deteriorating? The short answer: quality is more fragile than many expected, but decline isn’t inevitable. The bigger story is how we train, evaluate, and deploy these systems in 2025.
Read the original Reddit post for the full list of sources.
What “model collapse” means – and why it matters
Model collapse is a failure mode where models trained on outputs from other models (rather than human-created data) lose diversity and factual fidelity over time. In LLMs, that shows up as blander phrasing, overconfident errors, and missing edge cases. The risk compounds if synthetic content floods the web and gets scraped into the next training run.
“AI models collapse when trained on recursively generated data.”
That’s the title and core finding of the Nature paper by Shumailov et al. (July 2024), which models how errors and biases amplify when training repeatedly on model-generated distributions. It’s not a reason to panic, but it is a warning: data provenance now directly affects model quality.
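A toy illustration of that feedback loop, not the paper's experimental setup: fit a Gaussian to a small sample, draw the next generation's "training data" from that fit, and repeat. The function name, sample size, and generation count are illustrative assumptions.

```python
import random
import statistics


def simulate_recursive_fitting(generations=1000, sample_size=10, seed=0):
    """Refit a Gaussian on its own samples and record the fitted std each round."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 stands in for the "human" distribution
    fitted_stds = [sigma]
    for _ in range(generations):
        # Each generation sees only samples drawn from the previous fit.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.fmean(samples)
        sigma = statistics.stdev(samples)
        fitted_stds.append(sigma)
    return fitted_stds


if __name__ == "__main__":
    stds = simulate_recursive_fitting()
    for generation in (0, 10, 100, 1000):
        print(f"generation {generation:>4}: fitted std = {stds[generation]:.3g}")
```

With small samples, estimation noise compounds and the fitted spread tends to shrink over generations, so the tails of the original distribution are the first thing to go. Real LLM training is far more complex; the sketch only shows why the loop itself is the hazard.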
What the cited evidence actually shows
1) Training on synthetic data can damage quality if you’re careless
The Shumailov et al. paper (Nature) offers a theoretical and empirical basis for collapse when models consume their own outputs recursively. Key takeaway: feedback loops matter. Without curation, deduplication, and guardrails, quality drifts.
2) High-quality human data is finite
Epoch AI’s “Will We Run Out of Data?” argues that we’ll hit limits on public, high-quality human-generated text for large-scale training. This doesn’t mean “no data”, but it does mean the easy wins from web-scale scraping are tapering off.
3) Hallucinations remain a live issue in 2025
The Reddit post references OpenAI’s April 2025 o3/o4-mini system card and a “PersonQA” hallucination benchmark. The thread does not quote the underlying figures, but the thrust is clear: even with frontier models, grounded reasoning about people and facts is brittle without retrieval or tools.
4) Synthetic data isn’t always bad – but it’s domain-sensitive
Self-play and synthetic curricula have been massively successful in formal domains (AlphaZero for chess/Go; AlphaGeometry for Olympiad-level geometry). These tasks have clear rules and objective outcomes. Natural language is messier: subjective, open-ended, and prone to compounding small errors – which is where collapse risk bites.
5) Industry signals and sentiment
Gartner forecast that synthetic data would account for a majority share of training corpora by 2024 (a bold projection; methodology not disclosed here). A Duke University student survey (January 2025) suggests growing mainstream use and trust, but also concern about reliability. Commentators like Gary Marcus and Ed Zitron argue that quality and business models are under strain.
Is AI getting worse – or are we seeing the limits of scaling?
Both can be true. Some users report perceived regressions in popular chatbots. That can stem from:
- Safety tuning that reduces harmful outputs but can also throttle creativity or specificity.
- Context-window and caching tricks that change behaviour across turns.
- Cost-driven deployment choices (smaller models, aggressive decoding settings) that trade accuracy for latency or price.
- Evaluation drift: we’re asking harder, more specialised questions than we did in 2023.
None of this proves universal decline, but it does show why sustained quality requires better data hygiene, retrieval, and evaluation – not just bigger models.
Synthetic data at scale: risks, uses, and how to do it safely
Synthetic data is model-generated text, code, images, or labels used for training or fine-tuning. It’s attractive when human data is scarce, sensitive, or expensive. The risks arise when synthetic data overwhelms the human-created core and introduces undetected biases or factual drift.
Where it works well:
- Structured or formal domains with verifiable outcomes (compilers, maths proofs, games, some code tasks).
- Data augmentation for rare classes, safety red-teaming, or style control.
Where it’s riskier:
- Open-ended knowledge tasks without retrieval.
- Long-horizon reasoning where small factual errors snowball.
Maintaining AI quality in 2025: practical steps for teams
1) Guard your data supply chain
- Track provenance: label human vs synthetic, source licences, and collection dates.
- Enforce synthetic-to-real ratios; oversample trusted human data.
- Deduplicate aggressively to avoid echo-chambers of model paraphrases.
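The three checks above can be sketched as a small pre-training filter. This is a minimal sketch under stated assumptions: the `Record` fields, the function names, and the default ratio are hypothetical, and the deduplication only catches exact matches after normalisation.

```python
import hashlib
import random
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    text: str
    is_synthetic: bool  # provenance label: model-generated vs human-created
    source: str         # hypothetical dataset or crawl name
    licence: str
    collected: str      # ISO date string, e.g. "2025-01-31"


def normalise(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def deduplicate(records):
    """Keep the first record for each normalised text."""
    seen, kept = set(), []
    for record in records:
        digest = hashlib.sha256(normalise(record.text).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(record)
    return kept


def cap_synthetic(records, max_synthetic_per_human=0.25, seed=0):
    """Downsample synthetic records to at most the given synthetic-to-human ratio."""
    if max_synthetic_per_human < 0:
        raise ValueError("max_synthetic_per_human must be non-negative")
    human = [r for r in records if not r.is_synthetic]
    synthetic = [r for r in records if r.is_synthetic]
    limit = int(max_synthetic_per_human * len(human))
    if len(synthetic) > limit:
        synthetic = random.Random(seed).sample(synthetic, limit)
    return human + synthetic
```

Running `cap_synthetic(deduplicate(records))` keeps every human record and trims only the synthetic share. Catching model paraphrases rather than exact repeats would need a near-duplicate method such as MinHash or embedding similarity, which this sketch does not attempt.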
2) Use RAG for facts, not just bigger models
RAG (retrieval-augmented generation) fetches trusted documents at query time and grounds responses in them. It reduces hallucinations, supports citations, and is easier to audit for regulated use cases.
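A minimal sketch of the retrieve-then-ground step, assuming an in-memory dictionary of trusted documents. The word-overlap scorer is a stand-in for a real retriever (BM25 or embeddings), the function names are hypothetical, and no model is called: the output is only the grounded prompt.

```python
import re
from collections import Counter


def tokenise(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents, top_k=2):
    """Rank documents by how often they contain the query's terms (toy scorer)."""
    query_terms = set(tokenise(query))
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(tokenise(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def build_grounded_prompt(query, documents, top_k=2):
    """Assemble a prompt that restricts the answer to retrieved, citable sources."""
    doc_ids = retrieve(query, documents, top_k)
    if not doc_ids:
        return None  # nothing relevant found: abstain rather than improvise
    sources = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in doc_ids)
    return (
        "Answer using only the sources below and cite their ids. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )
```

Returning the document ids alongside the prompt text is what makes the response auditable: a reviewer can check each cited id against the source it came from.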
3) Build gold test sets and measure often
- Create domain-specific evaluation suites with exact-match, citation, and reasoning checks.
- Track regressions across model or prompt updates; freeze decoding settings for comparability.
- Score hallucinations explicitly (e.g., entity-level factuality) and reward abstention when uncertain.
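One way to sketch such a suite is an exact-match grader that treats abstention as neutral and wrong answers as costly, plus a regression check between two runs. The abstention phrase, penalty, and function names are assumptions for illustration, not a standard benchmark protocol.

```python
ABSTAIN = "i don't know"


def normalise_answer(text):
    """Lowercase, trim, drop a trailing full stop, and collapse whitespace."""
    return " ".join(text.lower().strip().rstrip(".").split())


def grade(prediction, gold_answer):
    """Return 'correct', 'abstained', or 'wrong' for a single answer."""
    answer = normalise_answer(prediction)
    if answer == ABSTAIN:
        return "abstained"
    return "correct" if answer == normalise_answer(gold_answer) else "wrong"


def score_run(predictions, gold, wrong_penalty=1.0):
    """Summarise a run: correct = +1, abstained = 0, wrong = -wrong_penalty."""
    if not gold:
        raise ValueError("gold set must not be empty")
    # A missing prediction is treated as an abstention.
    grades = {qid: grade(predictions.get(qid, ABSTAIN), answer) for qid, answer in gold.items()}
    total = len(gold)
    correct = sum(g == "correct" for g in grades.values())
    wrong = sum(g == "wrong" for g in grades.values())
    return {
        "accuracy": correct / total,
        "wrong_rate": wrong / total,
        "abstention_rate": (total - correct - wrong) / total,
        "score": (correct - wrong_penalty * wrong) / total,
        "grades": grades,
    }


def find_regressions(baseline, candidate):
    """Question ids that the baseline run got right and the candidate run did not."""
    return sorted(
        qid for qid, g in baseline["grades"].items()
        if g == "correct" and candidate["grades"][qid] != "correct"
    )
```

Comparing two `score_run` results with `find_regressions` after each model or prompt update surfaces the specific questions that broke, which is more actionable than a single aggregate number. Exact match is deliberately strict; citation and reasoning checks would need their own graders.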
4) Prefer tools and orchestration over single-shot prompts
- Use function calling, calculators, and search rather than forcing the model to “improvise”.
- Chain-of-thought can help, but verify intermediate steps with tools where possible.
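As a sketch of routing arithmetic to a tool instead of letting the model guess, the dispatcher below executes a calculator call. The call format `{"name": ..., "arguments": {...}}` is a hypothetical shape chosen for illustration, not any particular vendor's function-calling API.

```python
import ast
import operator

_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def calculate(expression):
    """Evaluate basic arithmetic without eval(); anything else raises ValueError."""
    def evaluate(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](evaluate(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")

    return evaluate(ast.parse(expression, mode="eval").body)


# Registry of available tools; only a calculator in this sketch.
TOOLS = {"calculator": lambda arguments: calculate(arguments["expression"])}


def run_tool_call(call):
    """Execute one tool call and return a structured result or error."""
    tool = TOOLS.get(call.get("name"))
    if tool is None:
        return {"ok": False, "error": f"unknown tool: {call.get('name')!r}"}
    try:
        return {"ok": True, "result": tool(call.get("arguments", {}))}
    except (KeyError, ValueError, SyntaxError, ZeroDivisionError) as exc:
        return {"ok": False, "error": str(exc)}
```

Returning a structured error rather than raising lets the orchestration layer pass the failure back to the model, which can then retry or abstain instead of improvising a number.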
5) Curate synthetic data – don’t just generate and hope
- Use multi-model cross-checking or human review for synthetic labels.
- Watermark internal synthetic data where feasible, so it can be detected and filtered out if it is later scraped back into a training set.
- Periodically refresh from human, high-quality sources to avoid drift.
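The cross-checking step can be sketched as a simple agreement gate: a synthetic label is accepted only when enough independent labelling models agree, and everything else goes to a human review queue. The input shape, threshold, and function names are assumptions for illustration.

```python
from collections import Counter


def cross_check(labels_by_model, min_agreement=0.6):
    """Accept the majority label only if enough independent labellers agree."""
    if not labels_by_model:
        raise ValueError("need at least one label")
    votes = Counter(labels_by_model.values())
    label, count = votes.most_common(1)[0]
    agreement = count / len(labels_by_model)
    if agreement >= min_agreement:
        return {"label": label, "agreement": agreement, "needs_review": False}
    return {"label": None, "agreement": agreement, "needs_review": True}


def curate(examples, min_agreement=0.6):
    """Split synthetic examples into accepted labels and a human review queue."""
    accepted, review_queue = [], []
    for example in examples:
        verdict = cross_check(example["labels_by_model"], min_agreement)
        if verdict["needs_review"]:
            review_queue.append(example["text"])
        else:
            accepted.append({
                "text": example["text"],
                "label": verdict["label"],
                "is_synthetic": True,  # keep the provenance flag on accepted items
            })
    return accepted, review_queue
```

Agreement between models is not proof of correctness, since models trained on similar data can share the same mistake, so periodically sampling accepted items for human review is a sensible complement.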
Implications for UK organisations
Compliance and privacy. UK GDPR and the Data Protection Act 2018 require lawful bases for processing and strong data governance. If your models are trained or fine-tuned on personal data, provenance and minimisation matter. Retrieval logs can also be personal data – treat them accordingly.
Procurement and assurance. Ask vendors about training data sources, synthetic proportions, deduplication, and evaluation methods. For regulated industries, require clear documentation and the ability to reproduce key benchmarks on your data.
Cost and sustainability. Chasing scale without quality controls wastes compute – and water and energy. For a practical explainer on data centre cooling and water use, see my breakdown of AI’s water cycle.
Talent and process. Hire for data engineering and evaluation, not just prompt flair. Treat your knowledge base and test sets as core assets.
Where this conversation is headed
- More hybrid stacks: smaller, faster models fronted by retrieval and tools, with large models for escalation.
- Data-centric research: debiasing, selective mixing of synthetic with human data, and robust deduplication.
- Transparent system cards: clearer reporting on hallucination benchmarks and dataset composition (currently not disclosed in many releases).
Further reading from the Reddit thread
- Shumailov et al. – AI Models Collapse When Trained on Recursively Generated Data (Nature, 2024)
- Epoch AI – Will We Run Out of Data? (ICML 2024)
- Gary Marcus – How an AI feedback loop threatens to break ChatGPT
- Ed Zitron – The Truth About the AI Bubble & The Software Decline
Bottom line
AI isn’t doomed, but naïve scaling is. Quality now hinges on disciplined data pipelines, retrieval, and rigorous evaluation. If your organisation depends on reliable AI in 2025, invest in your data supply chain first – models will come and go, but your corpus and tests are leverage that lasts.