Trust but Verify: A Reliable Workflow for Using AI Deep Research Without Getting Burned

A guide to implementing a trust-but-verify workflow for AI deep research to prevent errors and enhance reliability.

Written By

Joshua
Reading time
» 6 minute read 🤓

“Deep research” that feels like a genius intern – and a pathological liar

The Reddit thread sums up the mood perfectly. Many of us tried Perplexity Pro and the new GPT research features and felt the speed-up immediately. Then the cracks appeared: confident reports, neat citations, and one non-existent regulation that could have sunk a meeting.

“A genius intern who is also a pathological liar.”

If you do market analysis, regulatory work, or any decision-critical research, this is the tightrope: the tools are brilliant at scaffolding answers, but their citations and specifics can be wrong in ways that cost you credibility.

Here’s what the post means for UK professionals – and a workflow to get the benefits without the burns.

Why hallucinated citations matter for UK professionals

Hallucinations are fabricated facts presented as true. The Redditor describes a fabricated EU clause that would have “solved all my problems”. That’s not a minor miss – it’s the kind of error that can mislead clients, wreck timelines, and create compliance exposure.

  • Regulatory risk: If you reference a non-existent rule, you can steer a project off course or breach obligations.
  • Data protection: Using AI tools with sensitive queries or documents requires care under UK GDPR. See the ICO’s AI guidance.
  • Client trust: Overconfident nonsense looks authoritative until it doesn’t.
  • Productivity paradox: If you must verify every claim, speed gains evaporate unless you build a verification layer.

What causes “deep research” hallucinations and broken citations

Large language models (LLMs) generate text by predicting likely continuations. They’re excellent at structure and style, but factuality is not guaranteed. Even with browsing and citations, they can:

  • Over-accommodate your prompt and invent specifics to please you.
  • Misattribute or conflate sources when extracting from multiple pages.
  • Link to real sites with wrong sections, dates, or invented clauses.

Vendors are improving retrieval, browsing, and citation quality, but no current system can be blindly trusted for decision-grade facts. The Redditor’s experience with Perplexity Pro and GPT fits what many of us see: great scaffolding, unreliable specifics.

A trust-but-verify workflow for AI deep research

Use AI for speed and structure. Use a human and a simple verification stack to ensure accuracy. This workflow keeps the benefits while reducing risk:

  1. Define the exact research question – what jurisdiction, date range, and scope are in play? Tell the model “EU energy regulations, post-2021 amendments only” or “UK FCA rules, consumer credit”.
  2. Ask the model for a plan first – relevant bodies, specific documents to find, keywords, and likely pitfalls. Don’t accept conclusions yet.
  3. Force citation discipline – prompt for verbatim quotes with URLs, section numbers, and publication dates. Ask for “direct quote + link + why this quote answers the question”.
  4. Verify links independently – open each citation. Check that the quote appears on the page and that the section numbers match. If it’s regulatory, prefer primary sources such as EUR-Lex, legislation.gov.uk, or the regulator’s own publications.
  5. Triangulate – for critical claims, require two independent primary sources or one primary plus the official regulator’s guidance. No triangulation, no inclusion.
  6. Use advanced search operators – e.g. site:ec.europa.eu or site:fca.org.uk combined with an exact phrase in quotation marks. See Google’s advanced search operators.
  7. Track evidence – record each claim, source URL, quote, and verification status in a spreadsheet so your team can audit. If you want to pull AI outputs into a sheet, see my guide on connecting ChatGPT to Google Sheets.
  8. Summarise last – only once the underlying quotes and links are verified should you ask the model to summarise and format for stakeholders.
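If you’d rather script steps 4, 5 and 7 than do them by hand, here’s a minimal Python sketch. It assumes you’ve already saved the text of each cited page yourself (it doesn’t fetch anything). It checks that each quote appears verbatim, counts a claim as triangulated when verified quotes come from at least two distinct primary-source domains, and writes the log as CSV you can paste into a sheet. The `Evidence` record, the `PRIMARY_DOMAINS` allow-list and the example URLs are my own placeholders. Treating distinct domains as “independent sources” is a crude proxy, not a rule from any regulator.

```python
import csv
import io
import re
from dataclasses import dataclass
from urllib.parse import urlparse

# Placeholder allow-list of primary-source domains; adjust to your jurisdiction.
PRIMARY_DOMAINS = {"eur-lex.europa.eu", "ec.europa.eu", "legislation.gov.uk", "fca.org.uk"}


@dataclass
class Evidence:
    claim: str
    url: str
    quote: str       # the verbatim quote the model gave you
    page_text: str   # text of the page as YOU opened and saved it
    status: str = "unverified"


def normalise(text: str) -> str:
    """Collapse whitespace and case so line breaks don't hide a genuine match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def domain(url: str) -> str:
    """Host name without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def verify(ev: Evidence) -> Evidence:
    """Step 4: a non-empty quote must appear verbatim in the page text."""
    found = bool(ev.quote.strip()) and normalise(ev.quote) in normalise(ev.page_text)
    ev.status = "verified" if found else "quote not found"
    return ev


def triangulated(claim: str, log: list[Evidence]) -> bool:
    """Step 5: verified quotes from at least two distinct primary-source domains."""
    domains = {
        domain(e.url)
        for e in log
        if e.claim == claim and e.status == "verified" and domain(e.url) in PRIMARY_DOMAINS
    }
    return len(domains) >= 2


def to_csv(log: list[Evidence]) -> str:
    """Step 7: one auditable row per piece of evidence."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["claim", "url", "quote", "status", "triangulated"])
    for e in log:
        writer.writerow([e.claim, e.url, e.quote, e.status, triangulated(e.claim, log)])
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder claims and URLs, for illustration only.
    log = [
        verify(Evidence("Rule X applies", "https://www.legislation.gov.uk/placeholder",
                        "Rule X applies to lenders", "... Rule X applies to\nlenders ...")),
        verify(Evidence("Rule X applies", "https://www.fca.org.uk/placeholder",
                        "Rule X applies to lenders", "Guidance: rule X applies to lenders.")),
        verify(Evidence("Clause 7 exempts SMEs", "https://eur-lex.europa.eu/placeholder",
                        "Clause 7 exempts SMEs", "Clause 6 sets out reporting duties.")),
    ]
    print(to_csv(log))
```

The third row is the one to watch: a real site, a plausible clause, and no matching text on the page. That is the same shape of failure as the invented EU clause.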

“It literally hallucinated a specific clause.”

Red flags that a “deep research” answer is unsafe

  • Perfect formatting with highly specific citations but no direct quotes.
  • URLs that resolve but don’t contain the quoted text.
  • Section numbers that look plausible but don’t exist in the document.
  • Claims that rely on secondary sources when a primary source should exist.
  • Time-sensitive facts with no publication or last-updated dates.
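Most of these red flags are mechanical enough to screen with a script before a human looks closer. This is a rough Python sketch that rests on several assumptions. The model’s citation has been parsed into a dict with `url`, `quote`, `section` and `date` keys (my own format, not any tool’s output). You’ve saved the page text yourself. And `PRIMARY_DOMAINS` is a placeholder list. “Perfect formatting” can’t be checked this way. The section check is a plain substring test, so “Article 1” would also match inside “Article 12”. An empty result means nothing obvious is wrong, not that the citation is correct.

```python
import re
from urllib.parse import urlparse

# Placeholder allow-list of primary-source domains; extend for your field.
PRIMARY_DOMAINS = ("eur-lex.europa.eu", "legislation.gov.uk", "fca.org.uk")


def squash(text: str) -> str:
    """Lower-case and collapse whitespace for forgiving substring checks."""
    return re.sub(r"\s+", " ", text).strip().lower()


def red_flags(citation: dict, page_text: str) -> list[str]:
    """Return the red flags raised by one AI-supplied citation (empty list = none found)."""
    flags = []
    page = squash(page_text)

    quote = squash(citation.get("quote") or "")
    if not quote:
        flags.append("specific citation but no direct quote")
    elif quote not in page:
        flags.append("quoted text not found on the page")

    # Crude substring test: a plausible-looking section that isn't in the document.
    section = squash(citation.get("section") or "")
    if section and section not in page:
        flags.append(f"section {citation['section']!r} not found in the document")

    # Exact host or a subdomain of an allow-listed primary source.
    host = urlparse(citation.get("url") or "").netloc.lower()
    if not any(host == d or host.endswith("." + d) for d in PRIMARY_DOMAINS):
        flags.append("not a primary source")

    if not citation.get("date"):
        flags.append("no publication or last-updated date")

    return flags


if __name__ == "__main__":
    # A placeholder citation that trips several flags at once.
    suspect = {
        "url": "https://www.example.com/blog/eu-energy-rules",
        "quote": "",
        "section": "Article 99",
        "date": None,
    }
    for flag in red_flags(suspect, "Article 1 ... Article 2 ..."):
        print("-", flag)
```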

Tools and stack ideas that reduce risk (they don’t eliminate it)

  • Ask for quotes, not just citations – a word-for-word quote plus a working link is quick to check against the page, so a fabrication surfaces fast instead of hiding behind a plausible reference.
  • Cross-model checks – if Perplexity says X, ask a different model to find the same clause with a quote and link. Disagreement is a prompt to dig deeper.
  • Local retrieval (RAG) – Retrieval-augmented generation (RAG) means having the model answer only from a curated set of documents you provide. It doesn’t solve everything, but it narrows the model’s freedom to invent. See the RAG paper by Lewis et al. on arXiv.
  • Use official sources first – start from EUR-Lex, legislation.gov.uk, or regulator docs, then ask the model to interpret. Don’t go the other way around.
  • Browser and citations – tools with browsing and citations can help, but they still need human verification. See the vendor documentation for how browsing works, such as the OpenAI Assistants API docs and Perplexity’s own pages on Perplexity Pro features.
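Here’s what “answer only from a curated set” can look like at its smallest, as a Python sketch. It’s a simplification I’ve swapped in for illustration. Lewis et al. use a learned dense retriever, whereas this ranks passages by shared words. It also only builds the prompt and doesn’t call any model or vendor API. The corpus, passage IDs and stopword list are hypothetical placeholders for text you’ve copied from official sources yourself.

```python
import re
from collections import Counter

# Hypothetical curated corpus: passages you copied from official sources yourself.
CORPUS = {
    "doc-a#1": "Placeholder passage on consumer credit affordability checks.",
    "doc-a#2": "Placeholder passage on complaint handling timelines.",
    "doc-b#1": "Placeholder passage on energy efficiency reporting duties.",
}

# Small illustrative stopword list so common words don't drive the ranking.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "what"}


def tokens(text: str) -> Counter:
    """Bag of lower-case words, minus stopwords."""
    return Counter(w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS)


def retrieve(question: str, corpus: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Return up to k passages sharing the most words with the question."""
    q = tokens(question)
    scored = []
    for pid, text in corpus.items():
        overlap = sum((q & tokens(text)).values())
        if overlap:  # passages with nothing in common are never offered to the model
            scored.append((overlap, pid, text))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(pid, text) for _, pid, text in scored[:k]]


def build_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Restrict the model to the retrieved passages and demand quotes with IDs."""
    context = "\n".join(f"[{pid}] {text}" for pid, text in passages)
    return (
        "Answer ONLY from the passages below. For every claim, give a verbatim quote "
        "and the passage ID in square brackets. If the passages don't answer the "
        "question, say so instead of guessing.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    question = "What are the affordability checks for consumer credit?"
    print(build_prompt(question, retrieve(question, CORPUS)))
```

Because every answer has to carry a passage ID and a verbatim quote, you can check it against your own corpus rather than the open web.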

When to trust AI research vs when not to

  • Good fit: brainstorming angles, drafting outlines, mapping the landscape, identifying stakeholders, generating checklists and interview questions, summarising verified sources.
  • High caution: quoting regulations, specifying clauses, stating compliance thresholds, or making claims that directly inform legal, financial, or safety decisions.

In high-stakes cases, treat AI like a fast junior researcher: helpful for first passes and synthesis, never the final authority.

Why this matters now

Generative research tools are here to stay, and the productivity uplift is real. The Redditor’s “genius intern” metaphor is apt: delegate structure and speed, but keep tight controls on facts. For UK teams, that means designing processes that respect data protection, cite primary sources, and make verification part of the cadence rather than an afterthought.

The good news is you don’t need exotic infrastructure. A disciplined prompt, official sources, a spreadsheet of evidence, and a quick cross-model check get you most of the way there. That’s how you keep the magic – without getting burned.

Read the original discussion: Deep Research feels like having a genius intern who is also a pathological liar.

Last Updated

February 1, 2026
