The biggest lie about AI: it won’t do your job for you — it makes you an editor
The most upvoted AI take this week is painfully accurate: instead of replacing us, AI has turned many of us into full-time editors. The Redditor’s analogy is spot on — you’re now managing an incredibly fast, highly enthusiastic, slightly drunk intern.
> “I spend less time creating and more time playing Where’s Waldo with hallucinations.”
That shift is real across knowledge work, especially in data-heavy roles. Models are fast and broadly capable, but also overconfident. Left unchecked, they fabricate facts (“hallucinations”), mangle edge cases, and breeze past instructions. The result: you’re validating output rather than doing the work yourself.
The original thread is here: The biggest lie we were told about AI is that it would do our jobs for us. The BI article referenced in the post isn’t linked in the thread.
Why this matters to UK teams: trust, governance and cost
For UK organisations, this “editor-in-chief” reality isn’t just workflow nuance — it’s compliance and risk management. Under UK GDPR and the Data Protection Act 2018, you need lawful bases, data minimisation, and clear accountability when using personal data in AI systems. The ICO’s guidance on AI and data protection is a good starting point.
You’ll also need a Data Protection Impact Assessment (DPIA) for higher-risk scenarios, procurement checks on data residency and retention, and a documented human-in-the-loop process for material decisions. In the public sector, consider FOI exposure and explainability. None of this bans AI — it just means your “editing” step is a governance control, not an optional extra.
From drunk intern to dependable assistant: how to build reliable AI workflows
You can cut the editing burden dramatically with a few engineering patterns. These are model-agnostic and play nicely whether you’re using OpenAI, Anthropic, or open-source models.
1) Risk-tier your workflow
- Low-risk (draft emails, brainstorming): allow “auto” output with lightweight spot checks.
- Medium-risk (internal reports, BI summaries): require structured outputs and automated validation, then human review.
- High-risk (external claims, financial advice, HR/legal): enforce evidence, run tests, and mandate sign-off.
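The tiering above can be wired directly into your pipeline so the controls are enforced in code, not left to memory. A minimal sketch, assuming hypothetical task labels — the names and tiers are illustrative, not a standard:

```python
# Map task types to risk tiers, then tiers to the controls they require.
# Task labels and tier assignments here are illustrative assumptions;
# replace them with your own taxonomy.
RISK_TIERS = {
    "draft_email": "low",
    "brainstorm": "low",
    "internal_report": "medium",
    "bi_summary": "medium",
    "external_claim": "high",
    "hr_decision": "high",
}

CONTROLS = {
    "low": {"structured_output": False, "auto_validate": False, "human_signoff": False},
    "medium": {"structured_output": True, "auto_validate": True, "human_signoff": False},
    "high": {"structured_output": True, "auto_validate": True, "human_signoff": True},
}

def controls_for(task_type: str) -> dict:
    """Look up the controls a task's output must pass before it ships."""
    # Unknown task types default to the strictest tier — fail safe.
    tier = RISK_TIERS.get(task_type, "high")
    return CONTROLS[tier]
```

Defaulting unknown task types to high-risk means new use cases get full review until someone consciously downgrades them.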
2) Ground answers with your data (RAG with citations)
Retrieval-augmented generation (RAG) feeds the model relevant documents or records at query time, constraining it to what’s true in your corpus. Always ask for citations — specific document names, URLs, or cell references — so reviewers can spot-check quickly.
- Define “allowed sources” and include short, quoted passages with each claim.
- Use conservative retrieval settings to avoid stuffing irrelevant context into the prompt. A “context window” is the maximum text the model can consider at once; longer isn’t always better.
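The prompt-assembly half of RAG can be sketched in a few lines. This assumes your retriever has already returned scoped snippets with IDs; embedding search and ranking are out of scope here:

```python
# Hedged sketch: build a grounded prompt from retrieved snippets and
# require the model to cite source IDs. `snippets` is assumed to come
# from your retriever as [{"id": "doc-12", "text": "..."}].
def build_grounded_prompt(question: str, snippets: list[dict]) -> str:
    sources = "\n".join(f"[{s['id']}] {s['text']}" for s in snippets)
    return (
        "Answer using ONLY the sources below. "
        "Cite the source id in square brackets after every claim. "
        "If the sources do not contain the answer, say you cannot answer.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```

The explicit “say you cannot answer” escape hatch matters: without it, models tend to fill gaps with plausible inventions rather than admit the corpus is silent.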
3) Constrain outputs and validate automatically
Free-form text is hard to verify at scale. Request structured outputs (for example JSON) using the model’s “function calling” or “structured output” features where supported. Then validate:
- Schema checks: required fields present, types correct, allowed values enforced.
- Rule checks: totals add up, dates are in range, IDs exist in your database.
- Diff checks: compare against ground truth where available, with strict tolerances.
See vendor docs: OpenAI structured outputs and Anthropic tool use.
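The three check types above fit in one small validator. A sketch using only the standard library — the field names (`total`, `line_items`, `date`) are illustrative assumptions, not a fixed schema:

```python
# Minimal validator covering schema checks and rule checks.
# Payload shape is an assumed example: {"total": float,
# "line_items": [{"id": str, "amount": float}], "date": "YYYY-MM-DD"}.
import datetime

def validate_report(payload: dict, known_ids: set[str]) -> list[str]:
    errors = []
    # Schema checks: required fields present with the right types.
    for field, typ in (("total", (int, float)), ("line_items", list), ("date", str)):
        if not isinstance(payload.get(field), typ):
            errors.append(f"schema: {field} missing or wrong type")
            return errors  # rule checks are meaningless on a broken shape
    # Rule check: line items must sum to the stated total.
    if abs(sum(i["amount"] for i in payload["line_items"]) - payload["total"]) > 0.01:
        errors.append("rule: line items do not sum to total")
    # Rule check: date parses and is not in the future.
    try:
        if datetime.date.fromisoformat(payload["date"]) > datetime.date.today():
            errors.append("rule: date is in the future")
    except ValueError:
        errors.append("rule: date is not ISO format")
    # Rule check: every referenced ID exists in your database.
    for item in payload["line_items"]:
        if item.get("id") not in known_ids:
            errors.append(f"rule: unknown id {item.get('id')}")
    return errors
```

An empty error list means the output can proceed to human review; anything else gets bounced back or flagged, which is far cheaper than a reviewer discovering the mismatch manually.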
4) Ask for evidence, not “chain-of-thought”
Don’t ask the model to reveal its full reasoning. Instead ask it to return evidence: the passages used, the formula applied, and the IDs/rows referenced. Short, checkable artefacts speed up human review and lower hallucination risk.
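In practice this is a contract: an instruction that names the evidence fields, plus a gate that refuses to pass anything un-checkable to a reviewer. The key names below are an assumed contract of mine, not a vendor feature:

```python
# Sketch: request checkable artefacts rather than free-form reasoning.
# The JSON keys are an assumed contract between prompt and checker.
EVIDENCE_INSTRUCTION = (
    "Return JSON with keys: answer, passages (verbatim quotes used), "
    "formula (the exact calculation applied, if any), and row_ids "
    "(records referenced). Do not include step-by-step reasoning."
)

def is_checkable(response: dict) -> bool:
    """A reviewer can only verify an answer that carries its evidence.
    `formula` is treated as optional since textual answers have none."""
    return all(response.get(k) for k in ("answer", "passages", "row_ids"))
```

Responses failing `is_checkable` go back to the model (or to a stricter retry prompt) instead of consuming reviewer time.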
5) Put evals and sampling in place
Build a small “golden set” of prompts and known-good outputs. On each change (model, prompt, retrieval, or data), run evals and track pass/fail. In production, sample a percentage of outputs for manual QA and keep feedback loops tight.
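A golden-set harness doesn’t need a framework to be useful. A deliberately tiny sketch — real evals usually score with fuzzier matching than the exact-match used here:

```python
# Tiny eval harness: run any prompt -> output callable over a golden
# set of (prompt, expected) pairs and report the pass rate.
def run_evals(model_fn, golden_set: list[tuple[str, str]]) -> float:
    """Exact-match scoring keeps the sketch simple; swap in fuzzy or
    rubric-based scoring for free-text tasks."""
    passed = sum(
        1 for prompt, expected in golden_set
        if model_fn(prompt).strip() == expected.strip()
    )
    return passed / len(golden_set)
```

Run it on every change to model, prompt, retrieval, or data, and refuse to deploy when the pass rate drops below your baseline.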
6) Control variability and cost
- Reduce randomness for deterministic tasks (lower temperature; fix seeds where supported).
- Cache prompts/responses and reuse embeddings to save tokens.
- Use smaller, cheaper models for extraction and routing; reserve larger models for complex reasoning.
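The caching point is easy to get subtly wrong: the cache key must include the generation settings, or a temperature change will silently serve stale answers. A minimal sketch, assuming `model_fn` is any callable that takes a prompt plus settings:

```python
# Sketch of prompt/response caching to cut repeat token spend.
# Hashing the prompt together with its settings means a changed
# temperature (or model name) correctly misses the cache.
import hashlib
import json

_cache: dict[str, str] = {}

def cached_call(model_fn, prompt: str, **settings) -> str:
    key = hashlib.sha256(
        json.dumps({"prompt": prompt, **settings}, sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = model_fn(prompt, **settings)  # only call on a miss
    return _cache[key]
```

In production you’d back this with Redis or a database rather than a dict, and set a TTL so cached answers don’t outlive the data they were grounded in.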
Hallucination hotspots and practical fixes
- Out-of-scope knowledge: switch to RAG and restrict sources.
- Ambiguous instructions: specify audience, format, constraints, and acceptance criteria.
- Long, noisy context: keep prompts lean; chunk documents; highlight must-use facts.
- Numbers and BI summaries: require explicit formulas and reference cells or query IDs.
- Code or SQL generation: run tests/linters; sandbox execution; compare results against sample datasets.
BI example: from Where’s Waldo to evidence-backed notes
The Reddit post calls out Business Intelligence. Here’s a lightweight pattern I use with Google Sheets or a warehouse:
- Retrieve only the relevant tables, named ranges, and data dictionary into the prompt.
- Ask for answers plus: the exact formula or SQL used, input ranges/tables, and row counts.
- Constrain output to JSON: {"claim": "…", "evidence": ["Sheet!B2:B50"], "formula": "…", "assumptions": ["…"]}.
- Validate: compute the same formula server-side and compare results; flag drift.
- Require hyperlinks to sources (sheets, dashboards) so reviewers can click-check.
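The server-side comparison step can be sketched like this. I’ve added a `claimed_value` field to the JSON contract and a hypothetical `fetch_range` helper (standing in for your Sheets API or warehouse query) — both are assumptions for illustration:

```python
# Sketch of the "compute server-side and compare" step. `fetch_range`
# is a hypothetical helper returning the numeric values behind a range
# reference like "Sheet!B2:B50"; wire it to your sheet or warehouse.
def check_claim(response: dict, fetch_range, tolerance: float = 0.01) -> bool:
    """Assumed response shape:
    {"claim": "...", "claimed_value": 1200,
     "evidence": ["Sheet!B2:B50"], "formula": "SUM"}"""
    values = [v for rng in response["evidence"] for v in fetch_range(rng)]
    if response["formula"] == "SUM":
        recomputed = sum(values)
    elif response["formula"] == "AVERAGE":
        recomputed = sum(values) / len(values)
    else:
        return False  # unknown formula: route straight to human review
    # Flag drift beyond tolerance rather than trusting the model's number.
    return abs(recomputed - response["claimed_value"]) <= tolerance
```

Failing claims don’t get published; they get queued for a human, with the evidence ranges already attached so the reviewer can click straight through.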
If you’re automating with Sheets and ChatGPT, this walkthrough will help you wire things up and keep outputs structured: How to connect ChatGPT and Google Sheets with a Custom GPT.
Workflow roles and responsibilities
| Stage | Responsibility | Quality Controls |
|---|---|---|
| Prompt and retrieval design | Engineer/Analyst | Source scoping, test prompts, small golden set |
| Model execution | Platform | Low temperature, structured output, timeouts |
| Automated validation | App/Backend | Schema checks, business rules, diff tests |
| Human review | Domain expert | Evidence spot-checks, sign-off trail |
| Governance | Data/Legal | DPIA, vendor due diligence, retention policies |
Privacy, security and availability in the UK
- Prefer vendors with UK/EU data processing options and clear retention controls. Check official security pages and DPAs.
- Avoid pasting personal or sensitive data into unmanaged tools. Use approved integrations with audit logs and access controls.
- Document human oversight for consequential decisions. The ICO guidance covers explainability and accountability.
Ethics and workforce reality
Editing is work. It shifts skills from drafting to specification, validation, and judgement. That can be empowering — more leverage with less grunt work — but it also risks rubber-stamping plausible nonsense if teams don’t slow down for checks. Fairness, bias and representativeness still matter: if your sources are skewed, your outputs will be too.
Key takeaways
- AI won’t do your job — it will change your job. Treat yourself as editor-in-chief, not spell-checker-in-chief.
- Reduce hallucinations with grounded sources, structured outputs, and automated validation.
- Right-size human review by risk. Save “auto” for low-stakes tasks.
- For UK teams, your editing step doubles as a governance control. Build it into your process and your DPIAs.
Further reading
- Reddit discussion: The biggest lie we were told about AI is that it would do our jobs for us
- ICO – Guidance on AI and data protection: ico.org.uk/for-organisations/ai
- OpenAI – Structured outputs: platform.openai.com/docs/guides/structured-outputs
- Anthropic – Tool use: docs.anthropic.com/en/docs/build-with-claude/tool-use