Andrej Karpathy says he’s never felt more behind as a programmer. Why that matters for you
A recent Reddit post distilled some striking takeaways from Andrej Karpathy’s latest talk. The headline: even Karpathy says he’s never felt more behind. That’s not performative humility – it’s a signal that the ground has shifted under professional software development.
The post highlights a pivotal moment, a new framing for developer practice, and a pragmatic warning about AI’s limits. Here’s what it means for developers and tech teams in the UK.
Read the original Reddit thread.
December 2025: the step change for agentic workflows
The post claims that December 2025 marked a true turning point – not a slow drift upwards, but a step change where agentic workflows suddenly became reliable. In plain English: it became viable to have AI systems that plan, call tools or APIs, iteratively try steps, and self-correct with minimal hand-holding.
If true, that timing matters. Teams that review their AI tooling once a year can miss a step change in capability entirely, and only notice when a competitor ships a feature they assumed was still research-grade.
Vibe coding vs agentic engineering (and why it’s not just semantics)
Karpathy separates two mindsets:
- Vibe coding – using an LLM in the editor or chat to generate chunks of code or text by “vibe”: natural-language prompts, quick fixes, ad hoc assistance. It raises the floor for everyone.
- Agentic engineering – deliberately designing systems where models act as agents: plan, use tools, maintain short-term memory, verify outcomes, and recover from errors. It helps professionals go faster without dropping the quality bar.
“Vibe coding raises the floor. Agentic engineering is how professionals go faster without dropping the quality bar.”
Vibe coding is great for prototypes and productivity boosts. But the shift to agentic engineering is about reliability, observability, and repeatability – the things organisations need to run production systems without playing compliance roulette.
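To make the distinction concrete, here is a minimal sketch of the plan, act, verify, retry loop that agentic engineering is built around. It is an illustration only, not Karpathy's code or any framework's API: `fake_planner` stands in for a model call, and `TOOLS`, `run_agent` and the trace format are names invented for this example.

```python
from typing import Callable

# Hypothetical tool registry: plain functions the agent is allowed to call.
TOOLS: dict[str, Callable[[str], str]] = {
    "upper": lambda text: text.upper(),
    "reverse": lambda text: text[::-1],
}


def fake_planner(goal: str, attempt: int) -> list[str]:
    """Stand-in for a model call: returns the tool names to run, in order.

    The first attempt deliberately returns a wrong plan so the retry path runs.
    """
    return ["reverse"] if attempt == 0 else ["upper"]


def run_agent(
    goal: str,
    text: str,
    verify: Callable[[str], bool],
    max_attempts: int = 3,
) -> tuple[str, list[dict]]:
    """Plan, run the tools, verify the outcome, and retry on failure."""
    trace = []  # one record per attempt, so every run can be inspected later
    for attempt in range(max_attempts):
        plan = fake_planner(goal, attempt)
        result = text
        for step in plan:
            result = TOOLS[step](result)
        ok = verify(result)
        trace.append({"attempt": attempt, "plan": plan, "result": result, "ok": ok})
        if ok:
            return result, trace
    raise RuntimeError(f"no verified result after {max_attempts} attempts")


result, trace = run_agent("shout the text", "hello", verify=str.isupper)
print(result)      # HELLO
print(len(trace))  # 2: the first plan failed verification, the second passed
```

The point is the shape rather than the toy tools: the verifier decides when the agent is done, and the trace records what happened on the way.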
MenuGen: when a one-prompt solution erases an app
The post describes Karpathy’s app (MenuGen) that showed photos of menu items. Someone then solved the same problem with a single prompt to a multimodal model (text + images). His conclusion: his app “shouldn’t exist.”
The lesson is about product risk. If your differentiation is gluing basic vision and text together, a general-purpose model can now subsume you. This isn’t a condemnation of building – it’s a reminder to anchor your roadmap in unique data, distribution, user experience, or hard constraints (compliance, latency, offline, cost ceilings).
Jagged intelligence: extraordinary at X, clueless at Y
AI is powerful but lumpy. The post calls out “jagged intelligence” with a memorable example: the same model that can refactor a 100k-line codebase might tell you to walk 50 metres to the car wash to wash your car – missing the obvious need to drive.
AI can refactor your codebase, then forget you can’t carry a car 50 metres to a car wash.
This is why agentic engineering emphasises tool use and verification. Don’t count on global common sense; instead, encode checks, use domain tools, and add constraints that keep agents on rails. Think: explicit planning, schema validation, test harnesses, sandboxed execution, and small, cheap-to-run evaluation datasets.
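As a rough sketch of what “encode checks” can look like, the example below validates a model’s JSON output against a small schema and feeds the rejection reason back into the next attempt. Everything here is assumed for illustration: the ticket schema, `validate_ticket`, `checked_call`, and the `stub_model` that stands in for a real model call.

```python
import json

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"title": str, "priority": int}


def validate_ticket(raw: str) -> dict:
    """Parse model output and reject anything that breaks the schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"field {field!r} missing or not {expected_type.__name__}")
    if not 1 <= data["priority"] <= 5:
        raise ValueError("priority must be between 1 and 5")
    return data


def checked_call(model, prompt: str, max_attempts: int = 3) -> dict:
    """Call the model, validate the output, and pass the error back on retry."""
    feedback = ""
    for _ in range(max_attempts):
        raw = model(prompt + feedback)
        try:
            return validate_ticket(raw)
        except ValueError as err:
            feedback = f"\nPrevious output was rejected: {err}. Return valid JSON only."
    raise RuntimeError("model output failed validation on every attempt")


# Stub model for the demo: a bad answer first, then a valid one.
_replies = iter([
    '{"title": "Fix login", "priority": "high"}',
    '{"title": "Fix login", "priority": 2}',
])


def stub_model(prompt: str) -> str:
    return next(_replies)


ticket = checked_call(stub_model, "Summarise this bug as a ticket.")
print(ticket)  # {'title': 'Fix login', 'priority': 2}
```

The validator knows nothing about the model, so the same check can sit in front of any provider or prompt version.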
“Outsource thinking, not understanding”
The post quotes a line that lands hard:
“You can outsource your thinking, but you can’t outsource your understanding.”
Use models to do exploration, draft designs, or trial reasoning paths. But keep hold of understanding: the why behind a solution, the failure modes, and the data assumptions. That’s the part you’ll be accountable for with users, regulators, and your future self at 2am.
Practical playbook: moving from vibe coding to agentic engineering
Here’s a lightweight path you can start this week:
- Pick one contained workflow where failure is cheap: e.g., summarising a support thread, drafting a PR description, creating test cases.
- Instrument the loop: add planning steps, retries with different tools, and basic guardrails (schemas, regex checks, or simple assertions).
- Create a tiny eval set (20-50 cases) with ground truth or acceptance criteria. Run it nightly and track regressions.
- Separate tools from prompts: wrap APIs and business logic as callable tools; keep prompts small and versioned.
- Log everything: prompts, tool calls, outputs, and errors. Observability turns “vibes” into engineering.
- Keep humans in the loop where impact is high. Design UIs for quick review and correction.
- Control costs: prefer smaller models where possible, stream outputs, and cap token budgets per task.
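The tiny eval set from the list above needs very little machinery. The sketch below is one possible shape, with invented names (`Case`, `run_evals`) and a toy summariser standing in for the system under test; in practice you would swap in your own workflow and 20-50 real cases.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    name: str
    prompt: str
    accept: Callable[[str], bool]  # acceptance criterion for the output


def run_evals(system: Callable[[str], str], cases: list[Case]) -> dict:
    """Run every case and report which ones failed."""
    failures = []
    for case in cases:
        try:
            passed = case.accept(system(case.prompt))
        except Exception:
            passed = False  # a crash counts as a failure, not a skipped case
        if not passed:
            failures.append(case.name)
    return {
        "total": len(cases),
        "passed": len(cases) - len(failures),
        "failures": failures,
    }


def toy_system(prompt: str) -> str:
    """Toy stand-in for the real workflow: keeps only the first sentence."""
    return prompt.strip().split(".")[0] + "."


CASES = [
    Case("keeps first sentence", "Login fails. Users see a 500.",
         lambda out: out == "Login fails."),
    Case("ends with full stop", "Payment timed out",
         lambda out: out.endswith(".")),
    Case("mentions the error code", "Login fails. Users see a 500.",
         lambda out: "500" in out),
]

report = run_evals(toy_system, CASES)
print(report)
# {'total': 3, 'passed': 2, 'failures': ['mentions the error code']}
```

Run something like this on a schedule and compare the `failures` list between runs; a case that used to pass and now fails is a regression worth looking at before users find it.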
If you’re experimenting with business ops, a simple bridge like connecting a GPT to a spreadsheet can be a safe, auditable start. I’ve documented a practical approach here: How to connect ChatGPT and Google Sheets (Custom GPT).
Implications for UK developers, data teams, and leadership
- Data protection and privacy – Treat prompts and tool calls as personal data if they can identify individuals. Minimise data, mask sensitive fields, and review vendor data use. UK GDPR still applies.
- Contracts and data residency – Put a Data Processing Agreement in place. Ask about retention, human review, model training on your data, and regional processing options.
- Security and logging – Agentic systems call tools. Secure credentials via a vault, scope tokens tightly, and audit who/what called which API.
- Cost predictability – Agent loops can balloon token usage. Add budget guards per request and fall back to smaller models or cached results.
- Skills and hiring – The bar shifts from “prompt wizard” to “agentic engineer”: evaluation, observability, tool design, and failure handling.
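For the cost point above, a per-request budget guard can be a few lines. This is a sketch under loud assumptions: costs are counted in abstract units, the large model is assumed to cost four units per token and the small one a single unit, and `estimate_tokens` is a crude word count rather than a real tokenizer. `RequestBudget` and `answer` are hypothetical names.

```python
class BudgetExceeded(Exception):
    pass


class RequestBudget:
    """Tracks spend in abstract cost units for one request or agent run."""

    def __init__(self, limit_units: int):
        self.limit = limit_units
        self.used = 0

    def charge(self, units: int) -> None:
        if self.used + units > self.limit:
            raise BudgetExceeded(f"would use {self.used + units} of {self.limit} units")
        self.used += units


def estimate_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def answer(prompt, budget, large_model, small_model, large_unit_cost=4, small_unit_cost=1):
    """Prefer the large model; fall back to the small one when budget is tight."""
    tokens = estimate_tokens(prompt)
    try:
        budget.charge(tokens * large_unit_cost)
        return "large", large_model(prompt)
    except BudgetExceeded:
        budget.charge(tokens * small_unit_cost)  # raises if even this is too much
        return "small", small_model(prompt)


budget = RequestBudget(limit_units=30)
prompt = "Summarise the support thread below"  # 5 words under the crude estimate

first = answer(prompt, budget, lambda p: "long answer", lambda p: "short answer")
second = answer(prompt, budget, lambda p: "long answer", lambda p: "short answer")
print(first[0], second[0], budget.used)  # large small 25
```

Charging before the call, not after, is the design choice that matters: the loop stops or downgrades before the spend happens, and a final `BudgetExceeded` gives the caller a clear signal to return a cached result or hand over to a human.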
When to choose vibe coding, and when to engineer an agent
- Use vibe coding for one-off tasks, prototypes, and local productivity wins.
- Use agentic engineering when you need repeatability, SLAs, compliance, or multi-step workflows touching real systems or customers.
Not everything needs an agent. But when something does, design for failure from day one.
Open questions and what to watch
- Multimodal competence – If a single prompt can replace an app, expect product lines to blur. Focus on data, UX, and hard constraints as moats.
- Tool use and planning – Better planners and richer tool ecosystems will make agents both more capable and harder to govern. Invest early in evaluation and audits.
- Team workflows – The best returns may come from rethinking processes end-to-end, not just sprinkling prompts into existing steps.
Final thought
If December 2025 really was a step change, the gap won’t close with more “prompting tips”. It will close with engineering: data discipline, workflows built around callable tools, evaluation, and clear ownership. By all means outsource some thinking to the machine. But keep your understanding close – that’s the part you can defend in code review, in a post-mortem, and in front of your customers.