Should NVIDIA be Scared of Ironwood?
SemiAnalysis's recent report, "Google TPUv7: The 900lb Gorilla In the Room," caught our attention for its comparison of Google's Ironwood (TPU v7) against NVIDIA's Blackwell platform. In it, SemiAnalysis argues that Ironwood closes much of the effective-throughput gap with NVIDIA's Blackwell-class GPUs by delivering higher sustained utilization ("MFU") rather than inflated peak FLOPs, and claims that, for large and steady workloads, Ironwood can deliver comparable effective performance at a lower cost per useful FLOP for buyers able to operate at hyperscale.
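The "cost per useful FLOP" framing is easy to make concrete. A rough sketch of the arithmetic (all prices, peak figures, and MFU values below are hypothetical placeholders, not vendor numbers) shows how a chip with a lower peak rating can still win once sustained utilization is factored in:

```python
# Illustrative "cost per useful FLOP" comparison under the report's framing.
# Every number below is a hypothetical placeholder, not a vendor figure.

def effective_flops(peak_flops: float, mfu: float) -> float:
    """Sustained useful compute: peak FLOP/s scaled by model FLOPs utilization."""
    return peak_flops * mfu

def cost_per_effective_flop(hourly_price: float, peak_flops: float, mfu: float) -> float:
    """Dollars per useful FLOP: hourly price divided by FLOPs delivered per hour."""
    return hourly_price / (effective_flops(peak_flops, mfu) * 3600)

# Hypothetical chip A: higher peak, lower sustained utilization.
chip_a = cost_per_effective_flop(hourly_price=10.0, peak_flops=2.0e15, mfu=0.35)
# Hypothetical chip B: lower peak, higher sustained utilization, cheaper.
chip_b = cost_per_effective_flop(hourly_price=8.0, peak_flops=1.5e15, mfu=0.55)

print(chip_b < chip_a)  # True: the lower-peak chip wins on cost per useful FLOP
```

The point is not the specific numbers but the shape of the argument: peak FLOPs only matter after being multiplied by the utilization you can actually sustain.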
The report might be controversial to some, but what is clear is that Ironwood is no longer a niche internal asset; with general availability imminent, Google seems ready to push TPUs beyond internal cloud workloads and position them as a true alternative for major enterprise AI deployments. Multiple outlets note increased interest from large AI firms and cloud customers - for example, Anthropic's sizable purchase of TPUs in late October. There is even speculation that companies beyond cloud-native AI labs, e.g., high-frequency trading firms and large enterprises, may adopt TPUs. But is Google truly positioned to "unseat" NVIDIA with its TPUs?
Ironwood TL;DR
The Ironwood TPU architecture targets inference-first workloads: each chip brings 192 GB of HBM3e and up to 7.4 TB/s memory bandwidth, and pods can scale to 9,216 chips, delivering 42.5 exaflops of aggregate compute and 1.77 PB of shared HBM memory. For organizations running large language models, retrieval pipelines, embedding services, or constant high-throughput inference, this design is compelling. It is engineered for predictable token throughput, minimized data movement, and high memory parallelism, exactly the kind of workload where GPUs may suffer inefficiencies due to mixed workloads, underutilized tensor pipelines, or memory bottlenecks.
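The pod-level figures are internally consistent with the per-chip specs, assuming simple linear scaling (my assumption, not a published derivation). A quick sanity check:

```python
# Sanity check of the quoted Ironwood pod figures, assuming linear scaling
# from per-chip specs (an assumption for illustration, not a vendor spec).

CHIPS_PER_POD = 9216
HBM_PER_CHIP_GB = 192

# 9,216 chips x 192 GB each, converted GB -> PB (decimal units).
pod_hbm_pb = CHIPS_PER_POD * HBM_PER_CHIP_GB / 1e6
print(round(pod_hbm_pb, 2))  # 1.77 -> matches the 1.77 PB shared-HBM figure

# Implied per-chip compute from the 42.5 EF aggregate number.
POD_EXAFLOPS = 42.5
per_chip_pflops = POD_EXAFLOPS * 1e3 / CHIPS_PER_POD  # EF -> PF
print(round(per_chip_pflops, 2))  # 4.61 -> roughly 4.6 PF per chip
```

In other words, the headline exaflop number is just the per-chip figure multiplied out to pod scale; the architectural story is in the shared memory pool and bandwidth, not in any per-chip compute surprise.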
With that said, significant doubts remain about how broadly that value proposition will generalize beyond hyperscalers. Google has historically constrained TPU access to internal or massive external customers; smaller enterprises often struggled with usability, provisioning latency, lack of fixed IPs, weak observability tooling, and general lock-in concerns when using legacy TPUs - if they could gain access to the chips at all. Even now, for non-hyperscale buyers or mixed workloads that include both training and inference, the flexibility, generality, and maturity of NVIDIA’s GPU ecosystem remain strong counterarguments.
Security
Does this have anything to do with Confident Security? Yes. Google frames Ironwood not merely as a raw accelerator but as a piece of a broader secure-compute stack: their newly publicized private-compute infrastructure promises hardware-isolated enclaves, memory isolation, encrypted data-in-use, and attested execution. Sound familiar? That may be because OpenPCC does this (and more). In theory, this security posture elevates TPU infrastructure from a raw performance story to a potential basis for confidential AI workloads under regulatory or privacy constraints - no other ASIC on the market offers confidential computing.
Even so, I'll wait to sing Ironwood's praises. The new enclave-based TPU compute environment lacks the public scrutiny, maturity, and third-party audits that long-standing TEE (trusted execution environment) deployments have accumulated - the kind enterprises demand before trusting a platform with sensitive data at scale. For now, Google's guarantees rest on vendor documentation and design promises, not decades of battle-tested deployment.
By contrast, NVIDIA's confidential-computing capabilities are already field-proven. Numerous studies document how GPU TEEs can protect ML workloads with modest performance overhead on modern systems. Overhead does increase under less ideal conditions (e.g., frequent model swapping, mixed precision, or distributed parallel training), but those costs are known, measurable, and acceptable for security-sensitive workloads. Because this path is mature and well understood, it aligns better with conservative security requirements, regulatory compliance, and the principle of least surprise across heterogeneous customer environments. This trust in NVIDIA's maturity is best exemplified by emergent ASICs adopting Blackwell-era standards: Amazon's Trainium 3, for example, is CUDA compatible and, in future versions, will be NVLink compatible.
Given this context, I would say Ironwood and its TPU-based private compute stack represent a major architectural advance that may reshape the AI-infrastructure market - for ASICs. They introduce genuine competition to NVIDIA and could put downward pressure on GPU cost and margins, but for enterprises that require flexibility, mixed workloads, vendor-agnostic deployment, and provable confidential computing guarantees today, NVIDIA remains the safer foundation.
Given our use of OpenPCC and our emphasis on vendor-agnostic confidential computing, the current landscape suggests continuing to anchor on NVIDIA GPUs - unless Google wants to send some TPUs our way to test. If Google's private-compute stack gains broad adoption, passes independent audits, and becomes available for general deployment beyond hyperscalers, TPUs might evolve into a credible parallel foundation. Until then, for regulated or high-sensitivity workloads requiring stability, auditability, and flexibility, GPUs remain the practical default.