
For more than a decade, NVIDIA’s GPUs have underpinned nearly every major advance in modern AI. Now this position is being challenged.
Frontier models like Google's Gemini 3 and Anthropic's Claude Opus 4.5 were trained not on NVIDIA hardware, but on Google's latest tensor processing unit: the Ironwood-based TPU v7. This signals that a viable alternative to the GPU-centric AI stack has already arrived, with real implications for frontier-scale training economics and architecture.
Much of NVIDIA's dominance rests on its CUDA software stack, often called the "CUDA moat": once a team has built its pipelines on CUDA, switching to another platform is prohibitively expensive. That lock-in, combined with NVIDIA's first-mover advantage, has helped the company sustain gross margins of roughly 75%.
Unlike GPUs, TPUs were designed from day one as purpose-built silicon for machine learning. With each generation, Google has pushed further into large-scale AI acceleration, and now, as the hardware behind some of the most capable AI models ever trained, the TPU v7 signals a broader strategy to challenge NVIDIA's dominance.
Both GPUs and TPUs accelerate machine learning, but they reflect different design philosophies: GPUs are general-purpose parallel processors, while TPUs are purpose-built systems optimized almost exclusively for large-scale matrix multiplication. With TPU v7, Google takes this specialization even further by tightly integrating high-speed interconnects directly into the chip, allowing TPU pods to scale like a single supercomputer and reducing the cost and latency penalties that typically come with GPU-based clusters.
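To make that distinction concrete, here is a minimal JAX sketch (illustrative only; the shapes and names are our own, not from any Google material). The same jit-compiled matrix multiplication runs unchanged on CPU, GPU, or TPU, with the XLA compiler handling the hardware-specific lowering:

```python
# Minimal illustrative sketch: the core operation TPUs are built around is
# compiled matrix multiplication. Shapes here are arbitrary examples.
import jax
import jax.numpy as jnp

@jax.jit
def matmul(a, b):
    return jnp.dot(a, b)  # lowered by XLA to whatever accelerator is attached

key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
a = jax.random.normal(key_a, (1024, 1024))
b = jax.random.normal(key_b, (1024, 1024))

print(jax.devices())        # lists TpuDevice entries on a TPU VM
print(matmul(a, b).shape)   # (1024, 1024)
```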
TPUs are "designed as a complete 'system' rather than just a chip," Val Bercovici, chief AI officer at WEKA, told VentureBeat.
Google’s business pivots from internal to industry
Historically, Google restricted TPU access entirely to cloud rentals on the Google Cloud Platform. In recent months, Google has begun offering the hardware directly to external customers, effectively decoupling the chip from the cloud service. Customers can now choose between treating compute as an operating expense (renting it through the cloud) or as a capital expense (buying the hardware outright), removing a major friction point for large AI labs that prefer to own their hardware and bypass the "cloud rent" premium on top of base hardware costs.
Central to Google's shift in strategy is a landmark deal with Anthropic, under which the creator of Claude Opus 4.5 will gain access to up to 1 million TPU v7 chips, more than a gigawatt of compute capacity. About 400,000 chips are being sold directly to Anthropic by Broadcom, Google's longtime physical design partner. The remaining 600,000 chips are being leased through traditional Google Cloud contracts. Anthropic's commitment adds billions of dollars to Google's bottom line and locks a key OpenAI competitor into Google's ecosystem.
The end of the "CUDA moat"?
For years, NVIDIA's GPUs have been the clear market leader in AI infrastructure. Beyond its powerful hardware, NVIDIA's CUDA ecosystem includes an extensive library of optimized kernels and frameworks. Coupled with deep developer mindshare and a huge installed base, this gradually sealed enterprises inside the "CUDA moat": a structural constraint that made abandoning GPU-based infrastructure impractically expensive.
One of the key blockers to wider TPU adoption has been ecosystem friction. In the past, TPUs worked best with JAX, Google's own numerical computing library designed for AI/ML research. Mainstream AI development, however, relies primarily on PyTorch, an open-source ML framework that is heavily optimized for CUDA.
Google is now directly addressing this gap. TPU v7 supports native PyTorch integration, including eager execution, full support for PyTorch's distributed APIs, torch.compile, and custom TPU kernels. The goal is to make PyTorch run as smoothly on TPUs as it does on NVIDIA GPUs.
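As a rough sketch of what that integration looks like in code, assuming the open-source torch_xla package (the model and tensor shapes below are hypothetical placeholders, not details from the article):

```python
# Hedged sketch using the open-source torch_xla package; the model and
# shapes are arbitrary placeholders, not from the article.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()                        # resolves to a TPU device when one is attached
model = torch.nn.Linear(1024, 1024).to(device)

# torch.compile via the XLA backend, matching the toolchain named above
compiled = torch.compile(model, backend="openxla")

x = torch.randn(8, 1024, device=device)
y = compiled(x)
xm.mark_step()                                  # flush pending lazy ops to the device
print(y.shape)                                  # torch.Size([8, 1024])
```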
Google is also contributing heavily to vLLM and SGLang, two popular open-source inference frameworks. By optimizing these widely used tools for TPU, Google is ensuring that developers can switch hardware without rewriting their entire codebase.
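In practice, that means the same high-level serving code should run regardless of the accelerator underneath. A hedged sketch using vLLM's Python API (the model name is a placeholder; backend selection is handled by the vLLM installation, not by this script):

```python
# Hedged vLLM sketch: the same script targets a GPU or TPU backend depending
# on how vLLM was installed. The model name below is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarize the TPU vs. GPU trade-off in one sentence."], params)
print(outputs[0].outputs[0].text)
```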
Advantages and disadvantages of GPUs vs. TPUs
For enterprises comparing TPUs and GPUs for large-scale ML workloads, the trade-offs center primarily on cost, performance, and scalability. SemiAnalysis recently published a deep dive weighing the pros and cons of both technologies, measuring cost-effectiveness as well as technical efficiency.
Thanks to its specialized architecture and superior energy efficiency, the TPU v7 offers significantly better throughput per dollar for large-scale training and high-volume inference. This lets enterprises reduce operational costs related to power, cooling, and data center resources. SemiAnalysis estimates that, for Google's internal systems, the total cost of ownership (TCO) of an Ironwood-based server is about 44% lower than that of an equivalent NVIDIA GB200 Blackwell server. Even after factoring in the profit margins of both Google and Broadcom, external customers like Anthropic are seeing roughly 30% lower costs compared to NVIDIA. "TPUs make sense for large-scale AI projects when cost is paramount. With TPUs, hyperscalers and AI labs can achieve a 30-50% TCO reduction, which can translate into billions in savings," Bercovici said.
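As a back-of-the-envelope restatement of those figures (normalizing the GB200 server's TCO to 100; the arithmetic below simply applies the percentages cited above):

```python
# Back-of-the-envelope restatement of the TCO percentages cited above.
gb200_tco = 100.0                            # normalize NVIDIA GB200 server TCO to 100
ironwood_internal = gb200_tco * (1 - 0.44)   # ~44% lower for Google's internal systems
ironwood_external = gb200_tco * (1 - 0.30)   # ~30% lower for external customers

print(f"Ironwood TCO (Google internal): {ironwood_internal:.0f}")  # 56
print(f"Ironwood TCO (external buyer):  {ironwood_external:.0f}")  # 70
```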
This economic pressure is already reshaping the market. The mere existence of a viable alternative reportedly allowed OpenAI to negotiate a discount of roughly 30% on its own NVIDIA hardware. OpenAI is one of the largest buyers of NVIDIA GPUs, yet earlier this year the company added Google TPUs via Google Cloud to support its growing compute needs. Meta is also reportedly in advanced talks to bring Google TPUs into its data centers.
At this stage, Ironwood may look like an ideal fit for enterprise architectures, but there are real trade-offs. While TPUs excel at specific deep learning workloads, they are far less flexible than GPUs, which can run a wide variety of algorithms, including non-AI tasks. If a new AI technique is invented tomorrow, a GPU can run it immediately. This makes GPUs more suitable for organizations that run a wider range of computational workloads than standard deep learning.
Migrating from a GPU-centric environment can also be expensive and time-consuming, especially for teams with existing CUDA-based pipelines, custom GPU kernels, or heavily leveraged frameworks not yet optimized for TPUs.
Bercovici suggests that companies should "choose GPUs when they need to move quickly and time to market matters. GPUs use standard infrastructure and the largest developer ecosystem, handle dynamic and complex workloads for which TPUs are not optimized, and can be deployed in existing on-premises, standards-based data centers without the need to rebuild custom power and networking."
Additionally, GPU dominance means more engineering talent is available; TPUs demand rarer skills. "Leveraging the power of TPUs requires engineering depth within an organization, which means being able to recruit and retain rare engineering talent who can write custom kernels and optimize compilers," Bercovici said.
In practice, the benefits of Ironwood will be most pronounced for enterprises with large, tensor-heavy workloads. Organizations requiring greater hardware flexibility, hybrid cloud strategies, or HPC-style versatility may find GPUs a better fit. In many cases, a hybrid approach combining the two can offer the best balance of cost and flexibility.
The future of AI architecture
The race for AI hardware dominance is heating up, but it's too early to predict a winner, or whether there will even be one. NVIDIA and Google are both fast-moving, innovative companies, and with players like Amazon joining the fray, future high-performance AI systems may well be hybrids, integrating both TPUs and GPUs.
"Google Cloud is meeting demand for the acceleration of both our custom TPUs and Nvidia GPUs,” a Google spokesperson told VentureBeat. As a result, we are significantly expanding our NVIDIA GPU offerings to meet customer demand. The reality is that the majority of our Google Cloud users use both GPUs and TPUs. With our wide selection of the latest NVIDIA GPUs and seven generations of custom TPUs, we offer customers the flexibility of choice to optimize their specific needs."