We evolved for a linear world. If you walk for an hour, you cover a certain distance. Walk for two hours and you cover twice that distance. This intuition served us well on the savanna. But it fails catastrophically when confronted with AI and the underlying exponential trends at its heart.
Since I started working on AI in 2010, the amount of compute going into training frontier AI models has grown by a staggering one trillion times: from about 10¹⁴ flops (floating-point operations, the basic unit of computation) for early systems to 10²⁶ flops for today's largest models. It is a genuine explosion. Everything else in AI follows from this fact.
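The trillion-fold figure follows directly from the two endpoints above; the arithmetic is just exponent subtraction:

```python
# Growth in frontier training compute, per the figures above.
early_exp, today_exp = 14, 26   # log10 of flops: ~10^14 then, ~10^26 now

growth = 10 ** (today_exp - early_exp)
print(f"{growth:,}x")  # 1,000,000,000,000x, i.e. one trillion times
```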
Skeptics keep predicting walls, and they keep being proven wrong in the face of this epic, generational compute ramp. Often they point out that Moore's Law is slowing down. They also cite a lack of data, or energy constraints.
But when you look at the combined forces driving this revolution, the trend looks remarkably predictable. To understand why, it's important to look at the complex and fast-moving reality beneath the headlines.
Think of AI training as a room full of people working on calculators. For years, adding computational power meant adding more people with calculators to the room. Most of the time those workers sat idle, tapping their fingers on their desks, waiting for the numbers for their next calculation to arrive. Every idle interval was wasted potential. Today's revolution goes beyond more and better calculators (although it does provide them); it's about making sure that all those calculators never stop, and that they work together.
Three advances are now making this possible. First, the basic calculators got faster. Nvidia's chips have increased raw performance eightfold in just six years, from 312 teraflops in 2020 to around 2,500 teraflops today. Our own Maia 200 chip, launched this January, offers 30% better performance per dollar than any other hardware in our fleet. Second, the numbers arrive faster thanks to a technology called HBM, or high-bandwidth memory, which stacks memory chips vertically like miniature skyscrapers. The latest generation, HBM3, triples the bandwidth of its predecessor, delivering data to processors fast enough to keep them busy all the time. Third, the room of people with calculators became an office, then an entire campus, then a city. Technologies like NVLink and InfiniBand connect hundreds of thousands of GPUs into warehouse-sized supercomputers that act as single cognitive entities. A few years ago this was impossible.
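A toy roofline model makes the bandwidth point concrete: a processor's achieved throughput is capped either by its peak compute or by how fast memory can feed it numbers. The figures below are illustrative placeholders, not specifications of any particular chip:

```python
def attainable_tflops(peak_tflops, bandwidth_tb_s, arithmetic_intensity):
    """Roofline model sketch: achieved throughput is the lesser of the
    chip's peak compute and what memory bandwidth can sustain.
    arithmetic_intensity = flops performed per byte loaded."""
    memory_bound = bandwidth_tb_s * arithmetic_intensity  # TB/s * flop/byte = Tflop/s
    return min(peak_tflops, memory_bound)

# A fast chip starved by slow memory achieves a fraction of its peak...
print(attainable_tflops(peak_tflops=2500, bandwidth_tb_s=2, arithmetic_intensity=100))  # 200
# ...while HBM-class bandwidth keeps the same chip fully busy.
print(attainable_tflops(peak_tflops=2500, bandwidth_tb_s=6, arithmetic_intensity=500))  # 2500
```

The same logic scales up: interconnects like NVLink and InfiniBand play the role of "memory bandwidth" between chips, keeping whole clusters fed rather than idle.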
All these advances combine to deliver dramatically higher throughput. Whereas in 2020 it took 167 minutes to train a language model on eight GPUs, it now takes less than four minutes on equivalent modern hardware. To put this in perspective: Moore's Law would predict only a 5x improvement over this period. We saw 50x. And AlexNet, the image-recognition model that kicked off the modern deep-learning boom in 2012, was trained on just two GPUs; today's largest clusters exceed 100,000 GPUs, each individually far more powerful than its predecessors.
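The comparison can be made explicit. Assuming the 2020-to-today window is roughly five years and a 24-month Moore's Law doubling time, the result lands close to the rounded 5x and 50x figures above:

```python
years = 5.0                       # 2020 to today, roughly
moore = 2 ** (years / 2)          # one doubling every 24 months
observed = 167 / 4                # minutes: the 8-GPU training run then vs. now

print(f"Moore's Law alone: ~{moore:.0f}x")    # ~6x
print(f"Observed speedup: >{observed:.0f}x")  # >42x (a lower bound, since today's run is *under* 4 minutes)
```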
Then there is the software revolution. Research from Epoch AI suggests that the compute required to reach a given performance level halves approximately every eight months, much faster than Moore's Law's traditional doubling time of 18 to 24 months. Serving costs for some recent models have dropped by a factor of up to 900 in a single year. AI is becoming exponentially cheaper to deploy.
The figures for the near future are equally staggering. Leading labs are adding capacity at a rate of about 4x per year. Since 2020, the amount of compute used to train frontier models has increased about 5x per year. Global AI-related compute is predicted to reach 100 million H100 equivalents by 2027, a tenfold increase in three years. Put it all together and we're looking at another 1,000x increase in effective compute by the end of 2028. It is conceivable that by 2030 we will be bringing an additional 200 gigawatts of computing capacity online each year, equivalent to the peak power consumption of the UK, France, Germany and Italy combined.
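These growth rates compound. A rough sketch, taking the figures in this article at face value (hardware capacity growing ~4x per year, and algorithmic efficiency doubling every eight months per the Epoch AI research), shows how another 1,000x by the end of 2028 is plausible:

```python
hardware_per_year = 4.0             # capacity added ~4x per year
algo_per_year = 2 ** (12 / 8)       # efficiency doubles every 8 months -> ~2.8x/yr
effective_per_year = hardware_per_year * algo_per_year

years = 3                           # roughly now through end of 2028
total = effective_per_year ** years
print(f"~{effective_per_year:.1f}x effective compute per year")
print(f"~{total:,.0f}x over {years} years")  # comfortably past 1,000x
```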
What does all this get us? I believe it will drive the transition from chatbots to near-human-level agents: semi-autonomous systems capable of writing code for days at a stretch, running projects that last weeks or months, making calls, negotiating contracts, managing logistics. Forget basic assistants answering questions; think teams of AI workers that deliberate, collaborate and execute. We're only at the foot of this transition right now, and the implications go far beyond tech. Every industry built on knowledge work will be transformed.
The obvious obstacle here is energy. A single refrigerator-sized AI rack draws 120 kilowatts, the equivalent of 100 homes. But this appetite collides with another exponential: the cost of solar energy has fallen by a factor of about 100 in 50 years, and battery prices have declined 97% in three decades. There is a visible path to clean scaling.
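The rack-to-homes comparison implies an average household draw that is easy to back out, and combining it with the 200-gigawatt-per-year figure above gives a sense of the build-out's physical scale. This is back-of-envelope arithmetic on the article's own numbers, not a deployment plan:

```python
rack_kw = 120          # one AI rack, per the figure above
homes_equiv = 100      # homes the article equates it to
kw_per_home = rack_kw / homes_equiv
print(f"{kw_per_home} kW average draw per home")  # 1.2 kW

gw_per_year = 200      # capacity added per year by 2030, per the article
racks_per_year = gw_per_year * 1e6 / rack_kw      # convert GW to kW
print(f"~{racks_per_year:,.0f} such racks brought online per year")
```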
The capital is being deployed. The engineering is being delivered. $100 billion clusters, 10-gigawatt power draws, warehouse-scale supercomputers … these are no longer science fiction. Ground is now being broken for these projects in the US and around the world. As a result, we are moving toward true knowledge abundance. At Microsoft AI, this is the world our Superintelligence Lab is planning and building for.
Skeptics accustomed to a linear world will continue to predict diminishing returns. They will continue to be surprised. The compute explosion is the technological story of our time, full stop. And this is just the beginning.
Mustafa Suleyman is the CEO of Microsoft AI.