When ChatGPT landed in November 2022 and a hundred million people signed up in two months, most of us had never heard of Ilya Sutskever. A year later, the same name was on the front page of the Financial Times because he had voted to fire Sam Altman at OpenAI. The board reversed itself within a week. Sutskever quietly resigned a few months later and started his own company. By April 2025 it was reportedly valued at thirty-two billion dollars.

You can read all of that and still miss what is interesting about him.

What’s interesting is that for the past fifteen years, almost every time the field of artificial intelligence has shifted, this same quiet Russian-Israeli-Canadian researcher has been in the room. Often standing right next to the breakthrough.

He was born in 1986 in what was then the Soviet city of Gorky, now Nizhny Novgorod. When he was five, his family emigrated to Israel, and he grew up in Jerusalem. At sixteen he moved with his family again, this time to Canada. He arrived at the University of Toronto a few months later as a third-year undergraduate, eventually finding his way into the lab of a British computer scientist called Geoffrey Hinton.

Hinton had spent thirty years arguing that computers could be taught to see, hear and read by mimicking the layered structure of the brain. Most of his peers thought he was wasting his time. In 2012, Hinton, his student Alex Krizhevsky, and Sutskever entered an image-recognition contest built around a vast dataset called ImageNet. Their model, AlexNet, did not just win. It beat the runner-up’s error rate by more than ten percentage points, a margin so large that the entire field of computer vision changed direction within a year. That moment is now widely treated as the start of the modern AI era.

Sutskever was twenty-six.

Two years later, having joined Google, he co-authored a paper called Sequence to Sequence Learning with Neural Networks. The idea was simple to state and difficult to do. You took one sequence of words, in English, say, and trained a neural network to produce another sequence in French. No grammar rules, no dictionaries. Two networks, one to read and one to write, learning the relationship between them by being shown millions of example pairs.
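
For readers who like to see the shape of an idea in code, here is a minimal sketch of that two-network setup in PyTorch. Everything about it, the names, the sizes, the choice of LSTM layers, is illustrative rather than a reconstruction of the 2014 paper’s exact design.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 10_000   # toy shared vocabulary for both languages
EMBED_DIM = 256
HIDDEN_DIM = 512

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        # One network reads the source sentence...
        self.encoder = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        # ...and a second writes the target sentence, starting from
        # whatever state the encoder left behind.
        self.decoder = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.out = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, src_tokens, tgt_tokens):
        # Squeeze the whole "English" sentence into one hidden state.
        _, state = self.encoder(self.embed(src_tokens))
        # Unroll the "French" sentence, conditioned on that state.
        dec_out, _ = self.decoder(self.embed(tgt_tokens), state)
        # At each position, predict a probability over the vocabulary.
        return self.out(dec_out)

model = Seq2Seq()
src = torch.randint(0, VOCAB_SIZE, (1, 7))   # "English" token ids
tgt = torch.randint(0, VOCAB_SIZE, (1, 9))   # "French" token ids
logits = model(src, tgt)                      # shape: (1, 9, VOCAB_SIZE)
```

Train something like this on millions of sentence pairs and the rules of translation emerge from the weights, rather than from anything a linguist wrote down.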

It worked. Google Translate quietly shifted from rule-based translation to neural translation over the next few years and got dramatically better. More importantly for our story, Sutskever had now done the thing twice. He had been in the room when computers learned to see, and he had been in the room when computers learned to translate. Both times, the breakthrough came from giving up on telling the machine what to do, and instead showing it enough examples that it figured out the rules on its own.

In December 2015, Elon Musk and Sam Altman announced they were funding a new AI research lab called OpenAI. The lab’s founding chief scientist was Ilya Sutskever. He left a senior position at Google to take the job.

For the next two years, the lab did interesting work but nothing the broader world noticed. Then, in the summer of 2017, eight researchers at Google published a paper whose title sounded more like a Beatles lyric than a piece of research.

Attention Is All You Need.

The paper introduced something they called the transformer. To understand why it mattered, you have to know what came before. Earlier neural networks, recurrent networks in the jargon, read text the way you and I do: one word at a time, left to right, holding the meaning of the previous words in a sort of working memory. The trouble is that working memory fades. By the time the network reaches the end of a long paragraph, it has half forgotten the start.

The transformer threw that approach out. Instead of reading sequentially, it looked at all the words at once and learned to “attend” to whichever earlier words were relevant to the word it was currently processing. Reading the sentence “the bank refused her loan because it was struggling,” the model learns to associate “it” with “the bank,” not with “her loan.” It does this not by following grammar rules but by being shown enough text to have noticed the pattern itself.

That single shift, paying attention to everything at once rather than reading in sequence, did two enormous things. It made models far better at handling context across long passages of text. And, almost by accident, it made them much easier to train on huge amounts of data, because the calculations could now be done in parallel rather than one word after another. Suddenly you could throw vastly more computing power at the problem and watch the model get better in a fairly predictable way.
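
Again, for the code-minded, the core move fits in a few lines. This toy PyTorch sketch leaves out everything real transformers add, learned projections, multiple heads, positional information, and keeps only the attend-to-everything-at-once step:

```python
import torch
import torch.nn.functional as F

seq_len, dim = 8, 64                 # 8 words, 64 numbers per word
x = torch.randn(seq_len, dim)        # one vector per word in the sentence

# Compare every word with every other word in a single matrix
# multiplication. There is no left-to-right loop, so the whole
# sentence is processed in parallel.
scores = x @ x.T / dim ** 0.5        # (8, 8): relevance of word j to word i
weights = F.softmax(scores, dim=-1)  # each row sums to 1

# Each word's new representation is a weighted blend of every word
# it chose to "attend" to, near or far.
attended = weights @ x               # still (8, 64)
```

The matrix form is the whole point: one bulk computation replaces a fading word-by-word memory, which is what let researchers throw ever more hardware at ever more text.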

The eight authors of that paper, none of whom were called Ilya Sutskever, had handed the field a new engine. The question was who would build the car.

Sutskever did. Or more accurately, he led the team at OpenAI that did. They took the transformer architecture, which Google had treated as a useful internal tool, and kept making it bigger. GPT-1 in 2018. GPT-2 in 2019, which OpenAI at first released only in stages, worried that the text it produced was too convincing. GPT-3 in 2020. Then, in November 2022, almost as an afterthought, they wrapped a chat interface around a fine-tuned descendant of GPT-3 and called it ChatGPT. GPT-4 followed in 2023.

A hundred million users in two months. The fastest consumer technology adoption in history.

The point of this story, for people who run businesses rather than train neural networks, is the lineage. ChatGPT did not arrive out of nowhere. Behind it sits a chain of human decisions stretching back forty years. Hinton stubbornly working on neural networks when the field had given up on them. Sutskever showing up at his lab as an immigrant teenager. The eight Google researchers writing a paper that their employer never quite knew what to do with. And Sutskever again, at a different company, willing to bet that if you took that paper and threw enough money and computing power at it, something remarkable would happen.

Every part of this depended on individual people choosing to back ideas that the consensus thought were finished, eccentric or trivial.

It’s also worth looking at what kind of person Sutskever is. He doesn’t give many interviews. He has none of the flamboyance of Musk or the polish of Altman. He is, by all accounts, a serious man who thinks about safety, alignment and the long-term implications of what he is helping to build. When he disagreed strongly enough with the direction OpenAI was taking, he tried to remove the chief executive, lost, and quietly walked out to start a company, Safe Superintelligence Inc., whose first stated product is not a tool but a research outcome.

You can disagree with his judgement on any of that. What is harder to disagree with is that he keeps being right about where the field is going.

For anyone in financial services trying to make sense of AI, the practical takeaway sits underneath the biographical detail. Breakthroughs tend to come from a small number of people, working on something the mainstream considers unfashionable, for years, until the world catches up. The people who actually move the field forward almost always sound curious rather than certain.

That’s worth remembering the next time someone tells you with great confidence what AI will and will not be able to do in five years’ time.

