Five AI models, one answer: why 42 keeps turning up
On Reddit, /u/ishaqhaj asked five different AI models – ChatGPT, Claude, Grok, Qwen and DeepSeek – to “pick a number between 1 and 100”. Every model replied with the same number: 42.
“Every. Single. One. answered 42.”
It feels like a cosmic coincidence, but it’s really about how large language models (LLMs) learn and generate text. The headline: LLMs are not random number generators. They predict likely text based on patterns in their training and alignment data – and “42” is a culturally loaded, high-probability answer to that prompt.
Why LLMs gravitate to 42
Cultural priors from shared training data
Most frontier models are trained on overlapping portions of the public internet, technical forums, and Q&A data. In that corpus, “42” is a strong meme from The Hitchhiker’s Guide to the Galaxy – “the answer to life, the universe and everything.” That makes “42” unusually salient when the model sees prompts like “pick a number”.
With so much shared pretraining data across vendors, those cultural priors line up. Different models end up with similar token probabilities for the same prompt, so you see convergence on the same output.
Instruction-tuning and safe defaults
Modern LLMs are instruction-tuned and reinforced for helpful, harmless, human-like behaviour. When asked to “pick a number”, models often interpret it as an invitation to be brief and playful. “42” is a compact, recognisable nod to tech culture that feels witty and safe – a high-reward default under alignment.
Decoding parameters and determinism
Most chat products run with conservative decoding: low temperature and nucleus sampling (top-p). Low temperature reduces randomness and pushes the model to pick the single most probable completion. If the interface uses temperature = 0, decoding is greedy and effectively deterministic – the same prompt yields the same output almost every time.
In that regime, a culturally dominant answer like “42” wins again and again across models. See temperature in the OpenAI API docs and Anthropic’s messages API.
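To see why low temperature locks in a dominant answer, here is a toy sketch of temperature scaling – not any vendor's actual decoder, and the logits below are invented for illustration:

```python
import math

def sample_distribution(logits, temperature=1.0):
    """Convert raw token logits into probabilities at a given temperature.
    Lower temperature sharpens the distribution; as temperature -> 0 the
    most likely token dominates entirely (greedy decoding)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the tokens "42", "7" and "17" after "pick a number":
logits = [4.0, 2.5, 2.0]

print(sample_distribution(logits, temperature=1.0))   # "42" favoured
print(sample_distribution(logits, temperature=0.2))   # "42" overwhelming
```

Even a modest head start in the logits becomes a near-certainty once the temperature drops, which is exactly the regime most chat interfaces run in.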
Randomness in LLMs is not uniform randomness
LLMs sample from a probability distribution over tokens. Parameters like temperature and top-p adjust the shape of that distribution – they do not create uniform randomness over concepts like “integers from 1 to 100”. Even at higher temperature, the model still favours numbers it has seen more often in similar contexts: 7, 13, 17, 23, 37, 42, 69, 73, 99, and so on.
So if you want a uniformly random integer, asking an LLM to “pick a number” is the wrong tool. You will get a biased sample of culturally popular numbers, not a fair draw.
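A rough simulation makes the bias visible. The weights below are made up to stand in for an LLM's peaked preferences – they are not measured from any model:

```python
import random
from collections import Counter

# Invented weights: a handful of "meme" numbers carry most of the mass,
# mimicking the peaked distribution an LLM actually samples from.
weights = {n: 1.0 for n in range(1, 101)}
for meme in (7, 13, 17, 23, 37, 42, 69, 73, 99):
    weights[meme] = 25.0
weights[42] = 120.0                       # "42" dominates

numbers = list(weights)
draws = random.Random(0).choices(          # seeded for reproducibility
    numbers, weights=[weights[n] for n in numbers], k=10_000
)
counts = Counter(draws)

print(counts.most_common(3))  # 42 appears far more often than 1-in-100
```

A fair draw would give each number roughly 100 hits out of 10,000; here "42" takes a large multiple of that, which is the shape of bias you should expect from "pick a number".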
Try it yourself: ways to break the 42 habit
- Dial up temperature: If your interface allows it, increase temperature or top-p. You’ll get more variety, but still not uniform randomness.
- Be explicit about uniform sampling: “Return a uniformly random integer between 1 and 100 (inclusive).” This reduces meme answers, but cannot guarantee fairness in pure text-only generation.
- Ask for multiple draws: “Give me 50 numbers sampled independently and explain how you sampled them.” Then inspect the spread – you will likely see clustering around culturally popular integers.
- Constrain the space: Try “pick an even number between 1 and 100” or “pick a prime between 1 and 100”. You’ll often see new defaults emerge (37 and 73 are common prime picks).
- Use tools for true randomness: If your stack supports function calling or plugins, have the model call a proper RNG (e.g., Python’s secrets module or an external service) and return the result.
- Testing with APIs: Developers can use logit bias to downweight “42” and observe the next-preferred defaults. This reveals how peaked the distribution really is.
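The logit-bias idea in the last bullet can be sketched locally without calling a real API. The token logits here are invented for illustration; a real API applies its bias to token IDs rather than strings:

```python
import math

def softmax(logits):
    """Turn a dict of logits into a dict of probabilities."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

# Hypothetical logits over number tokens for "pick a number between 1 and 100":
logits = {"42": 5.0, "7": 3.5, "37": 3.2, "17": 2.8}
before = softmax(logits)

# Emulate a strong negative logit bias on "42", then renormalise.
biased = dict(logits)
biased["42"] -= 100.0
after = softmax(biased)

print(max(before, key=before.get))  # "42"
print(max(after, key=after.get))    # the next-preferred default, here "7"
```

Suppressing the top token simply promotes the runner-up, which is why this trick is a useful probe of how peaked the distribution really is.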
When genuine randomness matters in the UK
Plenty of UK use cases require auditable randomness: prize draws, clinical trial allocation, A/B test bucketing, or any process covered by governance frameworks. Using an LLM for these introduces bias and compliance risk.
- Use cryptographic or hardware-backed RNGs for regulated or auditable scenarios.
- Log seeds and methods to support audits and reproducibility.
- Document in your DPIA (Data Protection Impact Assessment) how assignment and fairness are ensured when people are affected by automated decisions.
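A minimal sketch of an auditable draw along these lines, assuming a JSON-lines audit log is acceptable; the field names are illustrative, not a compliance standard:

```python
import json
import secrets
from datetime import datetime, timezone

def auditable_draw(low: int, high: int, purpose: str) -> int:
    """Draw a uniform integer with a cryptographic RNG and emit an audit record.
    secrets draws from the OS CSPRNG, so there is no seed to log – record
    the method and result instead."""
    value = low + secrets.randbelow(high - low + 1)  # uniform in [low, high]
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "method": "secrets.randbelow (OS CSPRNG)",
        "range": [low, high],
        "purpose": purpose,
        "value": value,
    }
    print(json.dumps(record))             # in production, append to an audit log
    return value

n = auditable_draw(1, 100, "prize draw winner index")
```

The point of the record is reproducible review: an auditor can see what was drawn, when, by what method, and for what purpose, even though the draw itself is not replayable.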
For broader guidance on AI, privacy and fairness under UK GDPR, see the ICO’s AI guidance.
Practical patterns for developers: pairing LLMs with real RNGs
If you need both natural language and real randomness, use the LLM as an orchestrator, not the source of random values. A few practical patterns:
- Tool calling to a server-side endpoint that uses a cryptographic RNG and returns the number.
- Have the model generate code (e.g., Python using secrets.randbelow) that you execute in a sandbox, returning only the result to the user.
- For spreadsheets and workflows, wire the LLM to external functions or sheets formulas that provide randomness, then narrate the outcome. If you’re automating with Sheets, this guide on connecting ChatGPT and Google Sheets is a good starting point for tool-enabled flows.
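The first pattern above – the LLM as orchestrator, the RNG as a tool – can be sketched as a tiny dispatcher. The tool-call shape here is a simplified stand-in for a real function-calling payload:

```python
import secrets

def random_integer(low: int, high: int) -> int:
    """Server-side tool backed by the OS cryptographic RNG."""
    return low + secrets.randbelow(high - low + 1)

# Registry of tools the model is allowed to invoke.
TOOLS = {"random_integer": random_integer}

def handle_tool_call(call: dict):
    """Dispatch a (hypothetical) model tool call of the form
    {"name": ..., "arguments": {...}} to a real implementation."""
    return TOOLS[call["name"]](**call["arguments"])

# What a model's function-calling output might look like:
call = {"name": "random_integer", "arguments": {"low": 1, "high": 100}}
result = handle_tool_call(call)
print(result)  # a genuinely uniform draw the model can then narrate
```

The model never generates the number itself; it only requests the tool and describes the result, so the draw stays fair regardless of the model's cultural priors.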
Reliable options for randomness
| Method | Source | Typical use |
|---|---|---|
| Python secrets | OS cryptographic RNG | Security-sensitive draws, tokens, fair sampling |
| /dev/urandom (Linux) | Kernel entropy pool | Server-side randomness for backend services |
| random.org | Atmospheric noise | Public draws and verifiable randomness (check terms) |
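The first two rows of the table can be exercised directly from Python – secrets for a fair bounded draw, and os.urandom (which reads /dev/urandom on Linux) for raw entropy:

```python
import os
import secrets

# secrets draws from the OS CSPRNG: uniform integer in [1, 100], no modulo bias.
fair = 1 + secrets.randbelow(100)

# os.urandom returns raw bytes from the kernel entropy pool.
raw = os.urandom(4)                       # four bytes of entropy
as_int = int.from_bytes(raw, "big")       # interpret as an unsigned integer

print(fair, raw.hex(), as_int)
```

Note that naively reducing raw bytes with `% 100` would introduce modulo bias; `secrets.randbelow` handles the rejection sampling for you.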
Why this small meme points to a bigger issue
The 42 phenomenon is a useful reminder that LLMs inherit our cultural defaults. That’s often helpful – they understand references and idioms – but it also leads to sameness. Cross-model convergence can feel like “AI monoculture”: different systems giving the same safe, familiar answers.
For UK organisations deploying AI, this has two takeaways. First, don’t rely on LLMs for tasks requiring fairness, randomness or statistical guarantees. Second, expect culturally loaded outputs and plan for them – through careful prompts, tool use, and evaluation that measures diversity and bias in generated content.
Bottom line
Multiple AIs choosing 42 is not a glitch; it’s how probabilistic text generators behave when trained and aligned on the same cultural data. If you need a fair draw, use a real RNG. If you need engaging language, use the LLM – but nudge it away from memes with explicit instructions, higher temperature, or proper tool support.