Recursive self-improvement in AI, and why we should care

In October 2017, DeepMind published a paper that should have unsettled more people than it did. Their system, AlphaGo Zero, had learned the ancient game of Go from scratch. No human games as input. No opening theory. No strategic guidance. Just the rules.

After three days of playing itself, it beat the version that had defeated Lee Sedol the previous year, winning by one hundred games to nil.

Three days. To master a game humans had been studying for four thousand years.

That moment is worth pausing on, because it is the cleanest demonstration we have of what people mean when they talk about recursive self-improvement in artificial intelligence. The system did not just learn. It taught itself, and each iteration of teaching produced a stronger student to teach the next round.

The idea has a longer history than the hype suggests. In 1965, the British mathematician I.J. Good wrote a short paper called Speculations Concerning the First Ultraintelligent Machine. His argument was simple. If you build a machine that is better than humans at intellectual tasks, then by definition it will be better at the task of designing intelligent machines. The second-generation machine will be better still, and so on. Good called the result an intelligence explosion. He was writing in 1965, sixteen years before the IBM PC.

What does it actually look like in 2026?

Three concrete examples are worth knowing.

The first is self-play. AlphaGo Zero, and its successor systems like MuZero, learn by competing against themselves. Every game produces training data for the next iteration. There is no ceiling set by human expertise, because no human is in the loop after the rules are programmed in. This is the same idea that has, over the past two years, allowed DeepMind’s mathematics systems to reach silver and then gold-medal standard at the International Mathematical Olympiad, a competition for the best teenage mathematicians in the world. In 2024, AlphaProof and AlphaGeometry 2 solved four of the six IMO problems, scoring at silver-medal level. One year later, Google’s Gemini with Deep Think achieved gold. Twelve months. One step up.
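For readers who want to see the shape of a self-play loop, here is a minimal sketch. It is a toy under stated assumptions, not AlphaGo Zero's method, which uses neural networks and tree search. The game is a simple subtraction game chosen for illustration (take one to three stones, whoever takes the last stone wins), and every name in it (train, next_generation, mover_wins) is hypothetical. Each generation tests its moves by playing the previous generation against itself, and keeps the moves that win.

```python
# Toy self-play loop on a subtraction game (an illustrative assumption, not
# AlphaGo Zero's algorithm): players alternately take 1-3 stones from a pile,
# and whoever takes the last stone wins.

MAX_PILE = 12  # largest pile size the policy covers
MAX_TAKE = 3   # most stones a player may take in one move


def legal_moves(pile):
    """Moves available from a pile: take 1..3 stones, never more than remain."""
    return range(1, min(MAX_TAKE, pile) + 1)


def mover_wins(pile, first_move, policy):
    """Play first_move, then let `policy` play both sides to the end.

    Returns True if the player who made first_move takes the last stone.
    """
    pile -= first_move
    mover_moved_last = True
    while pile > 0:
        pile -= policy[pile]
        mover_moved_last = not mover_moved_last
    return mover_moved_last


def next_generation(policy):
    """Build a new policy by testing every move against the current policy."""
    new_policy = {}
    for pile in range(1, MAX_PILE + 1):
        winning = [m for m in legal_moves(pile) if mover_wins(pile, m, policy)]
        # Keep the old move when no move beats the current policy.
        new_policy[pile] = winning[0] if winning else policy[pile]
    return new_policy


def train(generations):
    """Start from a naive policy (always take one stone) and improve it."""
    policy = {pile: 1 for pile in range(1, MAX_PILE + 1)}
    for _ in range(generations):
        policy = next_generation(policy)
    return policy


if __name__ == "__main__":
    print(train(MAX_PILE))
```

No human strategy goes in, only the rules. After enough generations the policy settles on the known best play for this game, which is to leave the opponent a multiple of four stones whenever that is possible.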

The second is AI-assisted AI research. The most under-discussed shift of the last two years is that the people building frontier AI systems are now using those systems to design the next ones. Better architectures, better training schedules, better evaluation methods. Anthropic, OpenAI and Google DeepMind have all publicly acknowledged this. The loop is not yet fully closed, but it is tightening.

The third is fully automated research. In August 2024, a Tokyo-based lab called Sakana AI published a system they called The AI Scientist. It generates research ideas, writes the code to test them, runs experiments, produces figures, drafts the paper, and even reviews its own work. The first version cost roughly fifteen dollars per paper. The second version, published in Nature this year, produced the first fully AI-generated research paper to pass human peer review. The complete cycle from question to publication, with no human writer in the loop.

None of these systems is a general intelligence. None of them can run your business or replace your judgement. But each one demonstrates the same underlying mechanism: a feedback loop where the system improves the process that improves the system.

Why does this matter for someone running a mortgage firm, advising clients, or sitting on a leadership team?

For two reasons that have nothing to do with science fiction.

The first is competitive. The pace at which AI capabilities are improving is not constant. It is accelerating, and the acceleration has a specific cause. When a tool helps to build a better version of itself, you stop being able to plan around it the way you plan around other technologies. The conversations I have with brokers who think they have six or seven years to adapt are conversations I would not be having if they had read I.J. Good's 1965 paper.

The second is governance. If a system improves itself, the system running in production is no longer the system you signed off. Most regulatory frameworks, including the work the FCA is doing on AI in financial services, assume that you can audit a model, approve it, and then check whether it still behaves the way it did when you approved it. That assumption holds only weakly for static models. It breaks entirely for systems that adapt while in use. Anyone deploying AI in a regulated environment should be asking their vendors very direct questions about how the system changes after go-live, and what controls exist around those changes.
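One of the simplest controls to ask a vendor about can be sketched in a few lines. The idea is to record how the model answers a fixed set of reference cases on the day it is approved, then replay the same cases later and flag any answer that has changed. Everything in this sketch is an assumption made for illustration: the function names, the reference cases, and the two stand-in affordability rules are hypothetical, and a real model would be far more complex than a loan-to-income multiple.

```python
import hashlib
import json


def snapshot(model, reference_cases):
    """Record the model's answer to each reference case at approval time."""
    return [model(case) for case in reference_cases]


def fingerprint(answers):
    """Short, tamper-evident summary of a snapshot for the audit log."""
    payload = json.dumps(answers, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def drifted_cases(approved_answers, model, reference_cases):
    """Return the reference cases the model now answers differently."""
    return [
        case
        for case, old_answer in zip(reference_cases, approved_answers)
        if model(case) != old_answer
    ]


# Hypothetical stand-ins for a model before and after it adapts in use.
def approved_model(case):
    return case["loan"] <= 4.5 * case["income"]


def adapted_model(case):
    return case["loan"] <= 5.0 * case["income"]


REFERENCE_CASES = [
    {"income": 40000, "loan": 170000},
    {"income": 40000, "loan": 190000},
    {"income": 30000, "loan": 160000},
]

if __name__ == "__main__":
    approved = snapshot(approved_model, REFERENCE_CASES)
    print(fingerprint(approved))
    print(drifted_cases(approved, adapted_model, REFERENCE_CASES))
```

A check like this only covers the cases you thought to include, so it is a floor for change control rather than a full audit. It does turn "has the system changed since go-live?" into a question with a concrete answer.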

There is a separate, harder question lurking underneath all of this. If the recursive loop closes more aggressively than the optimists expect, the systems that emerge will not be tools in any meaningful sense. They will be something new. The Asilomar AI Principles, drawn up in 2017 with the involvement of people who actually build these systems, address this in their twenty-second principle, which calls for strict safety and control measures around any AI system designed to recursively self-improve. The principles were signed by Demis Hassabis, Yoshua Bengio, Stuart Russell and Stephen Hawking, among others. It is not a fringe view.

Most of the technology that runs the global economy today did not exist when I started in financial services. Most of what will run it in 2035 does not exist today. The difference is that, for the first time, the system that does not yet exist is being designed in part by the system that does.

Worth a few minutes of thought.

Sources: AlphaGo Zero, DeepMind / Nature (2017). I.J. Good, Speculations Concerning the First Ultraintelligent Machine (1965). Sakana AI, The AI Scientist (2024) and The AI Scientist-v2 (Nature, 2026). DeepMind, AlphaProof and AlphaGeometry 2 at IMO (2024). The Asilomar AI Principles are hosted at futureoflife.org/open-letter/ai-principles.
