Wharton researchers on why “just review the AI output” doesn’t work
A new Reddit thread highlights a January 2026 study from Wharton that quantifies something many teams are noticing: when AI proposes an answer, most of us accept it – even when it’s wrong. The post summarises “Thinking—Fast, Slow, and Artificial” by Steven D. Shaw and Gideon Nave (shared via SSRN) and argues that AI isn’t just a tool; it behaves like a third thinking system that sits outside our heads. You can read the discussion here: Reddit: Wharton researchers just proved why “just review the AI output” doesn’t work.
In short: when we rely on AI, we often stop verifying. And we don’t notice we’ve stopped.
AI as “System 3” and the risk of cognitive surrender
Kahneman famously split human thinking into System 1 (fast, intuitive) and System 2 (slow, analytical). The authors suggest AI now acts as a “System 3” – an external cognitive partner that can shortcut both. With repeated use, many people “surrender” their own checking.
“Cognitive surrender” is when you stop verifying the AI’s output without realising it.
This isn’t like using a calculator (offloading). With surrender, the AI’s answer is re-encoded as your judgement. You feel like you thought it through yourself.
Headline numbers from the experiment
The Reddit post reports the following figures from the study (1,372 participants; 9,593 trials):
| Condition | Behaviour or outcome | Reported figure |
|---|---|---|
| When AI was correct | Participants followed the AI | 92.7% |
| When AI was incorrect | Participants still followed the AI | 79.8% |
| No AI (baseline) | Participant accuracy | 45.8% |
| With correct AI | Participant accuracy | 71.0% |
| With incorrect AI | Participant accuracy | 31.5% |
| Confidence shift with AI | Confidence rose whether AI was right or wrong | +11.7 percentage points |
Incentives and real-time feedback helped, but didn’t eliminate the effect. The most resistant group combined high fluid intelligence (pattern-spotting ability) and a strong “need for cognition” (enjoyment of effortful thought).
Note: these are the figures as reported in the Reddit post; task details were not disclosed there.
Why “human-in-the-loop review” keeps failing in practice
The review paradox meets time pressure
The Reddit post names a “Review Paradox”: if a model writes the code, draft, or analysis, where does the reviewer get the practice needed to spot its non-obvious mistakes? Add normal workplace time pressure and the result is predictable:
- Authority bias – we treat AI outputs as authoritative, especially when fluent.
- Anchoring – the first answer we see frames our judgement.
- Cognitive load – reviewing is harder than producing; people default to acceptance.
- Misplaced confidence – as reported, confidence rose even when the AI was wrong.
That’s why some engineering teams have quietly shifted to reviewing specifications and test criteria rather than model-generated code. They’re moving human judgement earlier and making acceptance contingent on pre-agreed checks, not a final skim-read.
Safer human-AI workflows than “AI generates, human reviews”
Design patterns that reduce cognitive surrender
- Separate generation from verification. Use a different prompt, model, or person to verify against explicit criteria. Do not let the generator “mark its own homework” (a minimal code sketch of this separation follows the list).
- Spec-first, test-first. Humans define requirements, acceptance tests, and constraints upfront; AI proposes outputs that must pass those tests. Review the spec, not the prose.
- Structured evidence, not vibes. Require citations, data ranges, and checklists. Prefer verifiable claims over persuasive narratives.
- Dual-modelling for disagreement. Have a second model produce an independent alternative or run a targeted critique prompt. Escalate when models disagree.
- Gold checks in the flow. Seed known-answer tasks into queues to measure reviewer vigilance and false accept rates continuously.
- Confidence and abstain. Force the system to surface calibrated confidence and an “I’m not sure – escalate” path. Tie automation to confidence thresholds.
- Skill maintenance. Randomly route a proportion of work through manual paths so teams keep the muscles needed to audit the AI.
- Four-eyes for high-stakes. In regulated or safety-critical tasks, require a second human or domain specialist to sign off, not just a rubber-stamp review.
- Latency as a safety feature. For risky changes, add a cooling-off period rather than instant deployment of AI-generated outputs.
- Metrics that matter. Track override rate, dissent rate, correction rate, time-to-detect errors, and downstream incident costs – not just “tasks closed”.
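To make the separation, confidence-threshold, and abstain patterns concrete, here is a minimal Python sketch: one call generates, an independent verifier scores the draft against explicit criteria, and automation is tied to a confidence threshold with an escalate path. The hooks (`generate`, `verify`) and the threshold values are illustrative assumptions, not any particular library’s API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ReviewDecision:
    accepted: bool
    route: str           # "auto-accept", "human-review", or "escalate"
    reasons: list[str]   # failed criteria or verifier notes, if any

# Hypothetical hooks: swap in your own model calls and checks.
Generator = Callable[[str], str]                           # task -> draft output
Verifier = Callable[[str, str], tuple[float, list[str]]]   # (task, draft) -> (confidence, failed criteria)

def review_pipeline(task: str,
                    generate: Generator,
                    verify: Verifier,
                    auto_accept_threshold: float = 0.9,
                    escalate_threshold: float = 0.5) -> ReviewDecision:
    """Generation and verification are deliberately separate calls:
    the generator never scores its own output."""
    draft = generate(task)
    confidence, failed = verify(task, draft)

    if failed:
        # Any failed pre-agreed criterion forces human attention.
        return ReviewDecision(False, "human-review", failed)
    if confidence >= auto_accept_threshold:
        return ReviewDecision(True, "auto-accept", [])
    if confidence >= escalate_threshold:
        return ReviewDecision(False, "human-review", ["low verifier confidence"])
    # Very low confidence: abstain and escalate rather than guess.
    return ReviewDecision(False, "escalate", ["verifier abstained"])
```

The design point is structural: acceptance depends on pre-agreed criteria and a calibrated confidence score, never on how fluent the draft happens to read.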
UK organisations: compliance and data protection
For UK teams, this isn’t only a quality problem – it’s a governance issue. If reviewers are primed to accept AI outputs, you need stronger process controls to satisfy oversight, whether that’s internal audit or regulators.
- Data protection by design. Limit personal data exposure to models, prefer retrieval-augmented checks on in-house docs, and keep audit logs (a small redaction-and-logging sketch follows this list). See the ICO’s guidance on AI and data protection: ICO – Artificial intelligence (AI).
- Sector specifics. In finance, a wrong-but-confident AI decision is a conduct risk; in healthcare and legal, it’s professional liability. Use slow lanes and dual sign-off where harm is high.
- Procurement and vendor risk. Ask vendors how they measure false accepts, disagreement rates, and escalation behaviour – not just average accuracy.
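To illustrate “data protection by design” in code, here is a minimal sketch, assuming you wrap every model call so the prompt is redacted before it leaves your systems and an audit record is appended for later review. The regular expressions and the `model_call` hook are placeholders; a real deployment would use a dedicated PII-detection service and your existing logging stack.

```python
import json
import re
import time
from typing import Callable

# Illustrative patterns only; real PII detection needs a dedicated tool.
REDACTION_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "uk_phone": re.compile(r"(?:\+44\s?|0)\d{9,10}"),
}

def redact(text: str) -> str:
    """Replace likely personal data with placeholders before it leaves your systems."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[REDACTED-{label}]", text)
    return text

def audited_call(prompt: str,
                 model_call: Callable[[str], str],
                 log_path: str = "ai_audit.log") -> str:
    """Redact the prompt, call the model, and append an audit record."""
    safe_prompt = redact(prompt)
    output = model_call(safe_prompt)
    record = {"ts": time.time(), "prompt": safe_prompt, "output": output}
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return output
```

The audit log is what lets internal audit, or a regulator, see after the fact what was sent to the model and what came back.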
Limitations and open questions
- Task domain was not disclosed in the Reddit summary. Generalisability may vary across domains and expertise levels.
- Lab effects vs field reality. How do these surrender rates change with training, organisational norms, or stronger countermeasures?
- Model differences. Results may shift with model class, prompting, or tools that surface uncertainty and sources by default.
Practical next steps for teams
- Map where you currently rely on “just review it” and rework the riskiest flows first.
- Introduce independent verification steps and gold checks this sprint; measure false accepts (a tiny measurement sketch follows this list).
- Move human attention upstream: agree specs, tests, and constraints before generation.
- Publish a simple policy on when AI may autonomously act, when it must seek a second opinion, and when it must escalate to a human.
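On measuring false accepts: if each review record notes whether the item was a seeded known-error gold check and whether the reviewer accepted it, the headline metric is one line of arithmetic. A minimal sketch, where the record fields are assumptions about your own tracking rather than any standard schema:

```python
from dataclasses import dataclass

@dataclass
class ReviewRecord:
    is_gold_error: bool   # item was seeded with a known mistake
    accepted: bool        # reviewer accepted the output as-is

def false_accept_rate(records: list[ReviewRecord]) -> float:
    """Share of known-error gold checks that reviewers accepted anyway."""
    gold = [r for r in records if r.is_gold_error]
    if not gold:
        return 0.0
    return sum(r.accepted for r in gold) / len(gold)

# Example: 2 of 4 seeded errors slipped through the review queue.
records = [
    ReviewRecord(True, True), ReviewRecord(True, False),
    ReviewRecord(True, True), ReviewRecord(True, False),
    ReviewRecord(False, True),
]
print(false_accept_rate(records))  # 0.5
```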
If you’re wiring models into operational tools, the same logic applies: when connecting AI to spreadsheets, add guardrails and approval steps rather than live-editing production sheets. My guide here shows how to integrate sensibly: How to connect ChatGPT and Google Sheets (Custom GPT).
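As one hedged illustration of that guardrail idea: queue AI-proposed edits and apply only the ones a human has explicitly approved, rather than letting the model write straight into the production sheet. The `apply_to_sheet` callable below stands in for whatever Sheets client you actually use; it is not a real API.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ProposedEdit:
    cell: str        # e.g. "B7"
    old_value: str
    new_value: str
    approved: bool = False

@dataclass
class EditQueue:
    pending: list[ProposedEdit] = field(default_factory=list)

    def propose(self, edit: ProposedEdit) -> None:
        """AI output lands here first; nothing is written to the sheet yet."""
        self.pending.append(edit)

    def approve_and_apply(self, apply_to_sheet: Callable[[str, str], None]) -> int:
        """Write only the edits a human has marked approved."""
        applied = 0
        for edit in self.pending:
            if edit.approved:
                apply_to_sheet(edit.cell, edit.new_value)  # placeholder writer
                applied += 1
        self.pending = [e for e in self.pending if not e.approved]
        return applied
```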
Bottom line
The Wharton study, as relayed on Reddit, puts numbers to a widely felt reality: human review is not the fail-safe many roadmaps assume. Under time pressure, our brains tend to accept fluent AI outputs and feel unjustifiably confident about them. The fix isn’t lecturing people to “be careful”; it’s redesigning workflows so that verification is independent, evidence-based, measured, and – where needed – slow.
If you’ve experimented with spec-first reviews or dual-model checks, I’d love to hear what’s working in your team. And if you’ve read the paper in full, share any details on task design that add texture to these findings.