Self-improving AI agents: a 4-hour Python-to-TypeScript translation with zero supervision
A Redditor claims they ran a coding agent in a self-learning loop for four hours, translating ~14,000 lines of Python to TypeScript with zero build errors and all tests passing, and without any human intervention. The approach: let the agent run, reflect on what helped, extract “skills”, and restart the next run with those skills injected. Each iteration, the author reports, got smarter and cleaner.
It’s a striking result. If it stands up, it suggests practical self-improvement can emerge from execution-time feedback loops without fine-tuning or human feedback. Here’s what it means, what’s missing, and why it matters for UK teams.
Read the original Reddit thread.
What happened: 14k lines translated in ~4 hours, tests all green
The author describes a loop where an agent completes work, reflects, extracts “skills”, and restarts with those skills injected. Early iterations showed repeated mistakes and backtracking; later ones were cleaner with smarter decisions. No fine-tuning, no human feedback, no hand-holding.
Started it, walked away, and came back to working code.
Key run metrics (as reported)
| Metric | Reported value |
| --- | --- |
| Task | Translate Python to TypeScript |
| Duration | ~4 hours |
| Commits | 119 |
| Lines translated | ~14,000 |
| Build errors | Zero |
| Test status | All tests passing |
| Human intervention | None |
| Model(s), context window, hardware, cost | Not disclosed |
How the self-learning loop likely worked
The author’s description aligns with a common “reflect-improve-restart” agent pattern:
- Run – the agent writes code, runs builds/tests, and commits changes.
- Reflect – after execution, the agent analyses what worked and what didn’t.
- Extract skills – it distils reusable patterns or rules (“skills”) from the run.
- Restart with skills – the next run injects those skills into prompts or tools.
Definitions for clarity:
- Agent – an autonomous system that plans, executes, and evaluates steps across tools (e.g., repo, CLI, tests).
- Fine-tuning – training a model on labelled examples to shape behaviour. The author did not use fine-tuning.
- Reflection – the agent’s own post-run analysis to improve its next attempt, without changing model weights.
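The four-step loop above can be sketched in a few lines of Python. This is a minimal illustration of the pattern, not the author’s actual implementation: `run_agent` and `extract_skills` are hypothetical stand-ins for whatever agent framework and reflection prompt you use, replaced here by toy functions so the data flow is visible.

```python
from typing import Callable


def self_learning_loop(
    task: str,
    run_agent: Callable[[str, list[str]], str],
    extract_skills: Callable[[str], list[str]],
    iterations: int = 3,
) -> list[str]:
    """Run the agent repeatedly, carrying extracted 'skills' into each restart."""
    skills: list[str] = []          # reusable rules accumulated across runs
    for _ in range(iterations):
        # Run: the agent works on the task with prior skills injected
        transcript = run_agent(task, skills)
        # Reflect + extract: distil lessons from the execution trace
        for skill in extract_skills(transcript):
            if skill not in skills:  # dedupe before the next restart
                skills.append(skill)
    return skills                    # Restart: the next run starts from these


# Toy stand-ins purely to show the shape of the loop:
def toy_agent(task: str, skills: list[str]) -> str:
    return f"{task} with {len(skills)} prior skills"


def toy_extract(transcript: str) -> list[str]:
    return [f"lesson: {transcript}"]


accumulated = self_learning_loop("translate module", toy_agent, toy_extract)
print(accumulated)  # three distinct lessons, one per iteration
```

The key design point is that learning lives in the `skills` list (prompt-injected context), not in model weights, which is why no fine-tuning is needed.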
Why this matters for UK developers and organisations
If reproducible, this is a strong data point that iterative autonomy can handle real codebases when paired with a reliable test suite. For UK teams facing migrations (Python-to-TypeScript, Python 3.x upgrades, framework rewrites), an agent that genuinely improves across runs could be a force multiplier.
Business implications:
- Legacy modernisation – safer, faster rewrites when tests are strong.
- Productivity – offload boilerplate translation or API surface refactors.
- Cost and throughput – potentially lower contractor spend for well-scoped, test-backed tasks (costs were not disclosed here).
- Compliance and data protection – unsupervised agents touching private repos raise GDPR and security concerns; keep data residency and logging in scope.
On the practical side, connecting agents to the right tools is half the battle. If you’re exploring light-touch automation in your workflows, this guide on connecting ChatGPT to Google Sheets shows how to wire models into real business data without overcomplicating things.
Caveats: what’s powerful here, and what’s missing
Strong signals:
- Evidence that execution-time reflection can materially reduce repeated mistakes.
- Translation plus passing tests suggests alignment to functional requirements, not just surface-level code conversion.
Important gaps (as reported):
- Model, context window, hardware, and token cost – not disclosed.
- Repo size/complexity and test coverage – not disclosed.
- Definition of “zero errors” – build errors were zero; runtime or integration behaviours beyond tests were not discussed.
Risks to keep in mind:
- Generalisation – success on one repo doesn’t guarantee broader reliability.
- Test overfitting – agents can learn to “game” weak tests; quality and coverage matter.
- Security – unsupervised commits can introduce subtle bugs or dependency risks.
- Compliance – ensure secrets, personal data, and access scopes are controlled, with audit logs for due diligence.
Practical guidance: trying a self-learning loop safely
If you want to experiment, aim for tight feedback loops and robust safety rails:
- Start with a well-tested, non-critical repo; enforce green tests as a hard gate.
- Sandbox execution (containers, read-only mirrors); restrict secrets and network access.
- Require signed commits and mandatory code review before merge to main.
- Automate static analysis, type checks, linting, and dependency scanning.
- Log every tool call and decision; keep artefacts of reflections/“skills” for auditability.
- Track cost and latency; set hard budgets and stop conditions.
- Measure success beyond “build passes” (runtime checks, integration tests, performance budgets).
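The “green tests as a hard gate” and audit-logging rails above can be combined into one small wrapper around the agent’s commit step. The sketch below is illustrative, assuming you inject your own test runner and commit function (in practice these would shell out to something like `pytest` and `git`); the file name `agent_audit.jsonl` is an arbitrary choice.

```python
import datetime
import json
from typing import Callable


def gated_commit(
    message: str,
    run_tests: Callable[[], bool],
    commit: Callable[[str], None],
    audit_path: str = "agent_audit.jsonl",
) -> bool:
    """Commit only when the test suite is green; log every attempt either way."""
    passed = run_tests()
    entry = {
        "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "message": message,
        "tests_passed": passed,
    }
    # Append-only audit trail of the agent's commit decisions
    with open(audit_path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    if not passed:
        return False  # hard gate: never commit on red
    commit(message)
    return True


# Illustrative wiring with no-op stand-ins for the real runner and git:
committed = gated_commit(
    "agent: translate utils.py",
    run_tests=lambda: True,   # pretend the suite is green
    commit=lambda msg: None,  # stand-in for `git commit`
)
print(committed)  # True only because the fake suite passed
```

Because the gate sits between the agent and the repository, a weak or gamed test suite is the single point of failure, which is why the coverage and static-analysis rails above matter just as much.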
What this tells us about “self-improving AI”
This isn’t model-level self-learning; weights weren’t updated. But it is meaningful: an agent can bootstrap better behaviour from its own execution traces and roll those forward. That kind of procedural meta-learning is extremely practical for code tasks where tests provide clear feedback.
Are we underestimating how close self-improving AI actually is?
Possibly. For narrow, test-rich domains, iterative autonomy seems further along than many assume. For open-ended tasks without robust evaluation, we’re not there yet.
Bottom line
This is an intriguing, concrete report: a self-learning agent loop handling a sizeable code translation in hours, with 119 commits and a clean build. The missing details stop us from declaring a breakthrough, but the pattern itself is valuable and actionable for UK teams: pair agents with strong tests, iterate, reflect, and ratchet up scope carefully.
If you’re considering this in production, treat it like any other engineering capability – instrument it, govern it, and prove it against your quality bar before it touches main.
Source: Reddit – I let an AI agent run in a self-learning loop…