Deloitte caught using hallucinating AI in Australia – what UK public sector buyers should learn
A Reddit post this week flagged a cautionary tale from Australia: Deloitte reportedly used generative AI to help produce a A$440,000 government report that contained errors, and will provide a partial refund after admitting its use of AI. The discussion is in the linked Reddit thread; the underlying news story is from the Guardian.
As the Guardian reports: "Deloitte will provide a partial refund to the federal government over a $440,000 report that contained several errors, after admitting it used generative artificial intelligence to help produce it."
Details beyond that are not disclosed in the Reddit post, but the headlines are enough to matter. If a Big Four firm can let AI-driven mistakes slip into a high-value public report, the rest of us need to be honest about our controls.
What “hallucination” means in generative AI
Hallucination is when a model confidently generates content that is factually incorrect or fabricated. It happens because large language models predict plausible next words rather than verifying facts against a trusted source.
Mitigations include retrieval-augmented generation (RAG – a pattern that injects documents into the model’s context window at query time), citation requirements, constrained outputs, and rigorous human review. None of these are silver bullets. They reduce risk, they don’t eliminate it.
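To make the RAG pattern concrete, here is a minimal sketch: retrieve relevant documents first, then inject them into the prompt so the model answers from supplied evidence rather than from memory. The corpus, the keyword-overlap retriever, and the prompt template are illustrative assumptions, not anything from the Deloitte report; production systems use embedding-based retrieval.

```python
# Toy RAG sketch: retrieve documents by keyword overlap, then build a
# prompt that injects them as cited context. Corpus content is invented
# for illustration.

CORPUS = {
    "S1": "The contract value was A$440,000 and covered a compliance review.",
    "S2": "The final report was delivered to the federal government.",
}

def retrieve(query: str, corpus: dict, top_k: int = 2) -> list:
    """Score documents by simple keyword overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    scored = [
        (len(q_words & set(text.lower().split())), doc_id, text)
        for doc_id, text in corpus.items()
    ]
    scored.sort(reverse=True)
    return [(doc_id, text) for score, doc_id, text in scored[:top_k] if score > 0]

def build_prompt(query: str, corpus: dict) -> str:
    """Assemble a prompt that injects retrieved sources and demands citations."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query, corpus))
    return (
        "Answer ONLY from the sources below. Cite the source ID for every claim.\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )

print(build_prompt("What was the contract value?", CORPUS))
```

The key point is that the model's context window, not its training data, becomes the source of truth, which is what makes the citation requirement checkable afterwards.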
Why this incident matters to a UK audience
Whether you’re in a government department, a local authority, or a regulated industry, the lesson is the same: if AI is in the production chain, quality assurance must be too. UK buyers and suppliers are operating under public scrutiny and data protection obligations. Using AI can be entirely appropriate – but silent use, weak verification, and poor documentation are a recipe for reputational damage and contractual disputes.
In the UK context, consider:
- Data protection and confidentiality – avoid pasting sensitive or personal data into unmanaged tools, and document where model inputs go and how they are retained.
- Procurement transparency – declare AI use, methods, and controls in proposals and deliverables.
- Value for money – if suppliers use AI to accelerate work, ensure savings do not come at the expense of accuracy and accountability.
Buyer checklist: how to contract for AI-assisted deliverables
If you commission reports, analysis, or content that could involve AI, bake expectations into the contract and the statement of work. Practical clauses:
- Disclosure: suppliers must declare where and how generative AI is used in the work, including model families, hosting (on-prem, private cloud, vendor API), and any third-party tools.
- Provenance and citations: require verifiable sources for factual claims, with a clear list of references and links. For RAG systems, require surfaced citations.
- Human verification: mandate named reviewers with domain expertise and a documented QA process (spot checks, fact-check logs, sign-off records).
- Data handling: prohibit feeding confidential or personal data to public models without explicit approval and safeguards; require audit logs and data retention limits.
- Acceptance criteria: define factual accuracy thresholds, a correction window, and a refund/holdback mechanism if errors exceed an agreed tolerance.
- Traceability: ask for a short “methods appendix” covering datasets, prompts or templates, model versions, and date of last regeneration.
- Security: require that any model access complies with your organisation’s infosec policies and that API keys are centrally managed.
Supplier checklist: how to deliver AI-assisted work without tripping up
If you’re using generative AI in client work, treat QA as non-negotiable. A few operational patterns that work:
- Design for verification: build RAG with authoritative sources and include citation snippets in outputs. Reject drafts that contain uncited claims.
- Constrain generation: use structured prompts, schema-constrained outputs (JSON or table formats), and narrow the model’s task to summarising supplied sources rather than free invention.
- Automated checks: run regex or rules-based validators on outputs (e.g., “no claims without a reference ID”; “dates must match source ranges”).
- Human-in-the-loop: assign final fact-check to domain experts, not just the person who wrote the prompt. Use checklists and sampling plans.
- Document the method: keep a lightweight log of model versions, prompts/templates, and data sources. If challenged, you can show your working.
- Be transparent: tell the client where AI helped and where humans decided. Clients hate surprises more than they hate AI.
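The "automated checks" pattern above can be sketched as a rules-based validator that flags any sentence lacking a reference ID. The `[Sn]` citation convention is an assumption for the example, not a standard; swap in whatever marker format your templates produce.

```python
import re

# Rules-based output validator: flag sentences with no [Sn] reference
# marker so a human reviewer (or CI gate) can block them before delivery.

REF_PATTERN = re.compile(r"\[S\d+\]")

def uncited_sentences(text: str) -> list:
    """Return sentences that contain no [Sn] reference marker."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if not REF_PATTERN.search(s)]

draft = (
    "The contract was worth A$440,000 [S1]. "
    "The report was delivered to the federal government [S2]. "
    "Deloitte is the largest consultancy in the world."
)

for sentence in uncited_sentences(draft):
    print("UNCITED:", sentence)
```

Here only the third sentence is flagged: it asserts a fact with no reference ID, which is exactly the class of claim a fact-check log should capture before sign-off.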
Quality controls you can put in place today
For public sector teams
- Request a “sources-first” approach for research-heavy deliverables – vendors should show the source list, and how each source was used, before writing final prose.
- Pilot on lower-risk content: start with internal summaries and meeting notes rather than external-facing reports, while you refine QA.
- Mandate a simple QA pack with each deliverable: a source log, a list of non-obvious claims with references, and a sign-off sheet.
For developers and analysts
- Wire LLM outputs into a sheet or database and add programmatic checks before content is published. If you need a practical starting point, see my guide on connecting ChatGPT to Google Sheets, then layer validation rules on top.
- Use retrieval with a narrow, curated corpus and block the model from answering outside it. If no evidence is found, instruct it to say “not found”.
- Separate drafting from decision-making. AI can accelerate synthesis; humans must own the conclusions.
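The second point in the list above – block the model from answering outside a curated corpus – can be gated programmatically. This sketch refuses with "not found" when no document clears an evidence threshold; the corpus, stopword list, and overlap threshold are assumptions chosen for the example.

```python
# Evidence-gated answering: only respond when a curated corpus actually
# supports the question; otherwise return "not found" rather than invent.

CORPUS = [
    "Deloitte will provide a partial refund over the $440,000 report.",
    "The report was produced with the help of generative AI.",
]

STOPWORDS = {"the", "a", "of", "is", "was", "will", "what", "who", "to"}
MIN_OVERLAP = 2  # require at least two shared content words before answering

def content_words(text: str) -> set:
    """Lowercase, strip punctuation, and drop stopwords."""
    return {w.strip(".,?").lower() for w in text.split()} - STOPWORDS

def find_evidence(question: str, corpus: list):
    """Return the best-matching document, or None if nothing clears the bar."""
    q_words = content_words(question)
    best_doc, best_score = None, 0
    for doc in corpus:
        score = len(q_words & content_words(doc))
        if score > best_score:
            best_doc, best_score = doc, score
    return best_doc if best_score >= MIN_OVERLAP else None

def answer(question: str, corpus: list) -> str:
    evidence = find_evidence(question, corpus)
    if evidence is None:
        return "not found"  # refuse rather than hallucinate
    return f"Per the corpus: {evidence}"

print(answer("What refund will Deloitte provide?", CORPUS))
print(answer("Who is the CEO of Deloitte?", CORPUS))
```

The first question is supported by the corpus and gets an answer; the second is not, and the system says "not found" instead of guessing – the behaviour the instruction in the prompt alone cannot guarantee.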
Costs, productivity, and risk: a balanced view
Generative AI can speed up research and drafting, reduce routine workload, and help create first-pass analyses. Handled well, that is good for taxpayers and shareholders. But it also shifts risk: the binding constraint is no longer the effort spent but the quality assured. Without clear controls, AI simply compresses the time to error.
The sensible middle ground is transparent use with strong verification. Buyers should demand clarity and traceability. Suppliers should invest in process, not just prompts.
Key takeaways for UK readers
- Hallucinations are a known failure mode – plan for them. If you can’t verify it, don’t ship it.
- Contract for accuracy, provenance, and transparency when AI is involved. Put money behind acceptance criteria.
- Use technical and procedural safeguards: RAG with citations, constrained generation, automated validators, and human sign-off.
- Start small, measure error rates, and scale only when QA proves reliable.
The Australian story is a reminder: AI is not the problem. Poor governance is. Get the basics right and you can safely capture the benefits without ending up in the headlines.
Sources
- Reddit discussion: Deloitte caught using hallucinating AI in Australia
- News link from the Reddit post: The Guardian report