Google Cloud Next 2026: TPU 8t / TPU 8i, Gemini Enterprise Agent Platform, and the “agentic enterprise”
At Cloud Next ‘26, Google split its eighth-generation TPUs into training (8t) and inference (8i) silicon, launched Gemini Enterprise Agent Platform atop Vertex, and published striking usage stats: 3× training pod compute vs Ironwood, 80% better inference performance per dollar, 1,152-chip inference pods, 75% AI-generated new code at Google, and 16B+ customer tokens per minute. Primary sources: Google and Google Cloud official posts.
Google Cloud Next ‘26 (Las Vegas, April 2026) packaged one narrative: custom TPUs, Gemini (and partner models in places), and Gemini Enterprise as the end-to-end “agentic enterprise” layer. The same story echoed across X and trade press; the notes below are tied to primary Google and Google Cloud posts—not Grok- or second-hand summaries alone.
Per Google’s TPU 8 post, 8t and 8i are purpose-split for the agent era: training needs huge scale-up; inference needs memory bandwidth, low latency, and efficiency when many small steps chain together.
TPU 8t (training):
~3× compute per pod vs the prior generation (Google names Ironwood in the same post).
Up to 9,600 chips and 2 petabytes of shared HBM in a superpod; 121 exaFLOPS FP4; 2× interchip bandwidth; 10× faster storage to the fabric (TPUDirect); Virgo and JAX / Pathways for large jobs. Pichai’s shorter post also references scaling to on the order of one million 8t chips in one logical cluster for frontier training.
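Google targets >97% “goodput” (the share of time spent on productive training) via reliability, availability, and serviceability (RAS) features, rerouting, and optical circuit switching (OCS).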
TPU 8i (inference):
Pichai: 1,152 TPUs in one 8i pod; 3× more on-chip SRAM than the prior generation; aimed at latency-sensitive and agent workloads.
The long TPU post adds 288 GB HBM and 384 MB on-chip SRAM per chip, doubled Axion hosts, Boardfly topology, a Collectives Acceleration Engine, and about 80% better performance per dollar vs the prior inference generation—Google’s own efficiency claim, not a cross-vendor GPU benchmark.
The new Gemini Enterprise (April 22, 2026) frames Gemini Enterprise as end-to-end: models, product surfaces, governance, and deployment. The Agent Platform evolves Vertex into a build/tune/govern stack with MCP support, Model Armor, Agent Identity, paths into the Gemini Enterprise app, and a governed agent gallery for employees.
Named partner agents include Oracle, Salesforce, ServiceNow, Workday, Adobe, Accenture, and others. Salesforce and Google’s joint PR (Cloud Next ‘26) covers Agentforce Sales in Gemini Enterprise and cross-platform Slack / Workspace work. Workspace Intelligence and Agentic Data Cloud are summarized in the Next ‘26 news post.
75% new internal code, 16B+ tokens per minute, customer scale
From Pichai’s blog: 75% of new code at Google is AI-generated and engineer-approved (up from 50% last fall); a separate item highlights agentic workflows and a 6× faster complex migration vs a year prior; and first-party Cloud models process more than 16 billion tokens per minute from direct customer API use, up from 10 billion the prior quarter.
Next ‘26 and the Cloud welcome add adoption scale: e.g. nearly 75% of Google Cloud customers using AI products; 330 customers each with >1 trillion tokens in 12 months; 35 above 10 trillion. Business press also tied the event to Alphabet share moves; this post is not financial advice.
Technical Deep Dive: TPU 8t vs. TPU 8i Architecture
Google's decision to split TPU 8 into training (8t) and inference (8i) reflects a fundamental shift in AI infrastructure philosophy: the optimal silicon for pre-training is not the same as for production serving.
TPU 8t: Scaling to Million-Chip Clusters
Training frontier models demands:
Massive parallelism: Distributing weight updates across thousands of accelerators
High bandwidth: Synchronizing gradients and activations without bottlenecking
Fault tolerance: Recovering from hardware failures without restarting multi-week jobs
Memory capacity: Storing enormous activation checkpoints for gradient calculation
TPU 8t's spec sheet addresses these:
Compute density: ~3x compute per pod vs. Ironwood (Google's seventh-generation TPU); a 9,600-chip superpod delivers about 121 exaFLOPS at FP4 precision. For context:
GPT-4 training (estimated): ~25,000 exaFLOPS-days
Gemini Ultra training (estimated): ~50,000-100,000 exaFLOPS-days
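At the quoted 121 exaFLOPS peak, 25,000 exaFLOPS-days works out to about 207 days, or roughly 213 days at Google's claimed 97% goodput, so a single TPU 8t superpod could theoretically train a GPT-4-scale model in ~200 days. This is a best-case figure because it assumes the job sustains peak FP4 throughput.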
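The ~200-day figure can be reproduced with a few lines of arithmetic. The sketch below is a back-of-envelope estimate only: it takes the article's 25,000 exaFLOPS-days estimate and the quoted 121 exaFLOPS FP4 peak at face value, treats goodput as the only efficiency loss, and ignores utilization and numeric precision. The function name `training_days` is made up for illustration.

```python
def training_days(work_exaflops_days: float,
                  peak_exaflops: float,
                  goodput: float = 1.0) -> float:
    """Days needed to deliver `work_exaflops_days` of compute.

    Assumes the pod sustains its peak rate whenever it is productive,
    which is an optimistic simplification.
    """
    if peak_exaflops <= 0 or not 0 < goodput <= 1:
        raise ValueError("peak_exaflops must be > 0 and goodput in (0, 1]")
    return work_exaflops_days / (peak_exaflops * goodput)


# Article's rough GPT-4-scale estimate and the quoted FP4 superpod peak.
ideal = training_days(25_000, 121)            # no lost time
realistic = training_days(25_000, 121, 0.97)  # 97% goodput

print(f"{ideal:.0f} days at 100% goodput, {realistic:.0f} days at 97%")
# prints: 207 days at 100% goodput, 213 days at 97%
```

Swapping in a different compute estimate or a sustained (rather than peak) throughput changes the answer proportionally, which is why the result is best read as an order-of-magnitude guide.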
Memory hierarchy:
2 PB of shared HBM across 9,600 chips = ~208 GB per chip (about 2.6× the 80 GB on an H100, and pooled for flexibility)
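The per-chip figure is a plain division, sketched here so the units are explicit. It assumes decimal units (1 PB = 10^15 bytes), which the source does not specify, and uses the 80 GB H100 only as a familiar reference point.

```python
PB = 10**15  # decimal petabyte (assumption: the post does not say PB vs PiB)
GB = 10**9   # decimal gigabyte


def hbm_per_chip_gb(pool_bytes: int, chips: int) -> float:
    """Average HBM per chip, in GB, if the pool is split evenly."""
    return pool_bytes / chips / GB


per_chip = hbm_per_chip_gb(2 * PB, 9_600)
ratio_vs_80gb = per_chip / 80  # relative to an 80 GB accelerator

print(f"{per_chip:.1f} GB per chip, {ratio_vs_80gb:.1f}x an 80 GB part")
# prints: 208.3 GB per chip, 2.6x an 80 GB part
```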
IT helpdesk agent: Diagnose user issues, suggest knowledge base articles, escalate to human techs
HR onboarding agent: Guide new hires through paperwork, schedule trainings, provision accounts
Adobe Creative Cloud:
Design agent: Generate marketing assets (banners, social posts) from brand guidelines
Video editing agent: Auto-cut highlights from long recordings, add captions, apply brand overlays
MCP Support: Standardizing Agent-Tool Communication
Google's endorsement of Model Context Protocol (MCP) is significant—it signals a shift toward open standards rather than proprietary tool-calling formats.
What MCP enables:
Tool portability: Write a tool once (e.g., "query_database"), use it with any MCP-compatible LLM (Gemini, GPT-4, Claude)
Ecosystem interoperability: Third-party developers can publish MCP tools to a registry, and any agent can discover/use them
Reduced lock-in: If you switch from Gemini to another provider, your agent's tools still work
Example MCP tool for Gemini Enterprise:
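# Illustrative pseudocode: `mcp`, `database`, and `gemini_enterprise` are placeholder objects
# Define an MCP tool
@mcp.tool
def fetch_customer_history(customer_id: str) -> dict:
    """Retrieve purchase history and support tickets for a customer."""
    # Query the internal database with a bound parameter rather than string
    # formatting, so a crafted customer_id cannot inject SQL
    # (the "?" placeholder syntax depends on the database driver)
    return database.query("SELECT * FROM customers WHERE id = ?", (customer_id,))

# Register with Gemini Enterprise
gemini_enterprise.agents.register_tool(fetch_customer_history)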
Now, any agent in the organization can invoke fetch_customer_history via natural language:
"Pull up the purchase history for customer #12345"
The agent translates this to an MCP tool call, retrieves the data, and continues the conversation.
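To make "translates this to an MCP tool call" concrete, here is a minimal sketch of the message that crosses the wire and a toy server-side dispatcher. It follows the JSON-RPC 2.0 `tools/call` shape used by MCP as I understand it; the in-memory SQLite table, its schema, the `TOOLS` registry, and `handle_tools_call` are all hypothetical stand-ins, and session setup, transport, and how Gemini Enterprise actually routes calls are out of scope.

```python
import json
import sqlite3

# In-memory stand-in for the "internal database" (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO customers VALUES ('12345', 'Ada Example')")


def fetch_customer_history(customer_id: str) -> dict:
    """Look up one customer with a parameterized query (no string formatting)."""
    row = db.execute(
        "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return {"id": row[0], "name": row[1]} if row else {}


# Toy registry mapping tool names to Python callables.
TOOLS = {"fetch_customer_history": fetch_customer_history}


def handle_tools_call(request: dict) -> dict:
    """Dispatch a JSON-RPC 2.0 `tools/call` request to a registered function."""
    params = request["params"]
    tool = TOOLS.get(params["name"])
    if tool is None:
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32602, "message": "Unknown tool"}}
    result = tool(**params.get("arguments", {}))
    return {
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {"content": [{"type": "text", "text": json.dumps(result)}],
                   "isError": False},
    }


# What an agent might emit for "Pull up the purchase history for customer #12345".
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "fetch_customer_history",
               "arguments": {"customer_id": "12345"}},
}
response = handle_tools_call(request)
print(json.dumps(response, indent=2))
```

Because the request names the tool and passes structured arguments, the same message could in principle be produced by any MCP-compatible model, which is the portability argument made above.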
The 75% AI-Generated Code Claim: What It Really Means
Sundar Pichai's statement that 75% of new code at Google is AI-generated and engineer-approved (that is, reviewed and accepted by a human) sparked widespread discussion. Let's unpack it:
What "AI-Generated" Likely Includes
Autocomplete snippets: Engineers typing in an IDE (likely using an internal Gemini-powered tool similar to GitHub Copilot) see inline suggestions that they accept via Tab.
Boilerplate generation: Common patterns (e.g., REST endpoint stubs, test templates, config files) are generated from natural-language descriptions.
Code translation: Migrating legacy systems (e.g., Java → Kotlin, Python 2 → Python 3) with AI assistance.
Refactoring: Automated rewrites to adopt new APIs or coding standards.
What "Engineer-Reviewed" Means
This is critical: 75% AI-generated does NOT mean 75% autonomous. In a workflow like this, every change is presumably:
Reviewed by a human engineer (via code review tools like Gerrit)
Tested by CI/CD pipelines (unit tests, integration tests, fuzz tests)
Monitored post-deployment (performance metrics, error rates, user feedback)
The "reviewed" qualifier suggests Google is not shipping untested AI code to production—engineers are the final gatekeepers.
Productivity Gains vs. Quality Risks
Productivity: If engineers previously spent 30% of their time writing boilerplate and AI now handles that, the same output takes 70% of the time, a potential productivity boost of about 1.4x (1 / 0.7), with the freed time going to architecture and optimization.
Quality risks:
Subtle bugs: AI may introduce off-by-one errors, race conditions, or security vulnerabilities that pass tests but fail in edge cases
Technical debt: AI-generated code may be "correct but ugly," leading to harder-to-maintain codebases over time
Skill atrophy: Junior engineers relying too heavily on AI may not develop deep coding intuition
Google's engineer-approval requirement suggests strong internal review and evals are in place, but the broader industry should remain cautious about treating 75% as a target without similar rigor.
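The 1.4x productivity estimate above is an Amdahl-style calculation, and it is sensitive to how much of the "saved" time goes back into reviewing AI output. The sketch below makes that explicit; the 30% boilerplate share comes from the text, while the `residual` parameter and its 0.5 value are illustrative assumptions, not measured data.

```python
def speedup(fraction_automated: float, residual: float = 0.0) -> float:
    """Amdahl-style speedup when `fraction_automated` of work time shrinks
    to `residual` times its former cost (0.0 = free, 1.0 = no help)."""
    if not 0 <= fraction_automated <= 1 or not 0 <= residual <= 1:
        raise ValueError("both arguments must be in [0, 1]")
    return 1 / ((1 - fraction_automated) + fraction_automated * residual)


best_case = speedup(0.30)         # boilerplate time drops to zero
with_review = speedup(0.30, 0.5)  # half of that time is still spent reviewing

print(f"best case {best_case:.2f}x, with review overhead {with_review:.2f}x")
# prints: best case 1.43x, with review overhead 1.18x
```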
The 16B Tokens/Minute Metric: Scale and Implications
Google processes >16 billion tokens per minute via customer API calls—a staggering number. Let's contextualize it:
Token Volume Breakdown
Assuming an average request length of 1,000 tokens input + 200 tokens output = 1,200 tokens total:
16B tokens/min ÷ 1,200 tokens/request = ~13.3 million requests/minute, or about 222,000 requests/second
Under that assumption, Google's AI APIs are operating at roughly Google Search-scale throughput.
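The request-rate estimate depends entirely on the assumed tokens per request, so it helps to see how quickly it moves. This sketch reuses the 16 billion tokens/minute figure and the 1,200-token request from the text; the 10,000-token alternative is a made-up example of a long-context or agentic call.

```python
TOKENS_PER_MINUTE = 16e9  # figure quoted from Pichai's post


def request_rates(tokens_per_request: float) -> tuple[float, float]:
    """Return (requests per minute, requests per second) for a given request size."""
    per_minute = TOKENS_PER_MINUTE / tokens_per_request
    return per_minute, per_minute / 60


per_min, per_sec = request_rates(1_000 + 200)  # 1,000 input + 200 output tokens
long_min, long_sec = request_rates(10_000)     # hypothetical long-context request

print(f"{per_min / 1e6:.1f}M requests/min, {per_sec:,.0f} requests/s")
# prints: 13.3M requests/min, 222,222 requests/s
print(f"{long_min / 1e6:.1f}M requests/min, {long_sec:,.0f} requests/s")
# prints: 1.6M requests/min, 26,667 requests/s
```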
Infrastructure Implications
To serve 220K requests/second:
TPU fleet size: Assuming 1 TPU 8i pod (1,152 chips) serves ~1,000 requests/second, Google needs ~220 pods = ~253,000 TPU 8i chips for Gemini API alone.
Cost: At $1-2/chip/hour amortized, that's $250K-500K/hour in hardware costs, or $2.2B-4.4B/year—before software, power, cooling, networking.
Power consumption: TPUs consume ~300W each under load, so 253K × 300W = 76 MW continuous power draw—equivalent to a small city.
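The fleet, cost, and power numbers above chain together from a handful of guesses. The sketch below parameterizes them so each assumption can be changed independently; every default (1,000 requests/second per pod, $1-2 per chip-hour, 300 W per chip) is the article's own assumption rather than a published Google figure, and `fleet_estimate` is a hypothetical helper.

```python
import math


def fleet_estimate(requests_per_second: float,
                   pod_rps: float = 1_000,
                   chips_per_pod: int = 1_152,
                   usd_per_chip_hour: tuple[float, float] = (1.0, 2.0),
                   watts_per_chip: float = 300) -> dict:
    """Back-of-envelope pods, chips, cost range, and power for a target load."""
    pods = math.ceil(requests_per_second / pod_rps)
    chips = pods * chips_per_pod
    hourly = tuple(chips * rate for rate in usd_per_chip_hour)
    yearly = tuple(h * 24 * 365 for h in hourly)
    return {
        "pods": pods,
        "chips": chips,
        "usd_per_hour": hourly,
        "usd_per_year": yearly,
        "megawatts": chips * watts_per_chip / 1e6,
    }


est = fleet_estimate(220_000)
low, high = est["usd_per_year"]
print(f"{est['pods']} pods, {est['chips']:,} chips")
# prints: 220 pods, 253,440 chips
print(f"${low / 1e9:.1f}B-${high / 1e9:.1f}B per year, {est['megawatts']:.0f} MW")
# prints: $2.2B-$4.4B per year, 76 MW
```

Halving the assumed per-pod throughput doubles every downstream number, so the outputs are only as good as that one guess.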
Revenue Implications
If Google charges an average of $0.50 per 1M tokens (blended input/output pricing):
16B tokens/min × 60 min × 24 hr × 365 days = 8.4 quadrillion tokens/year
8.4Q tokens × $0.50/1M = $4.2B annual revenue from Gemini API
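The revenue arithmetic can be checked the same way. The token rate is the quoted figure and the $0.50 per million tokens is the article's assumed blended price; the $0.25 and $1.00 alternatives are illustrative only, included to show that the estimate scales linearly with price.

```python
MINUTES_PER_YEAR = 60 * 24 * 365


def annual_tokens(tokens_per_minute: float) -> float:
    """Tokens processed in a year at a constant per-minute rate."""
    return tokens_per_minute * MINUTES_PER_YEAR


def annual_revenue_usd(tokens_per_minute: float, usd_per_million: float) -> float:
    """Revenue if every token is billed at one blended price."""
    return annual_tokens(tokens_per_minute) / 1e6 * usd_per_million


tokens = annual_tokens(16e9)
revenue = annual_revenue_usd(16e9, 0.50)
print(f"{tokens / 1e15:.1f} quadrillion tokens/year, ${revenue / 1e9:.1f}B/year")
# prints: 8.4 quadrillion tokens/year, $4.2B/year

for price in (0.25, 1.00):  # hypothetical blended prices
    print(f"${price:.2f}/1M -> ${annual_revenue_usd(16e9, price) / 1e9:.1f}B/year")
# prints: $0.25/1M -> $2.1B/year
# prints: $1.00/1M -> $8.4B/year
```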
Compare to:
Google Cloud revenue (2024): ~$43B/year
Gemini API as % of Cloud: ~10% (if $4.2B is accurate)
This positions AI APIs as a top-tier revenue driver for Google Cloud, justifying massive TPU capex.
explainx.ai: Multicloud, Skills, and Verification
Google is packaging silicon, models, governance, and SaaS partners in one story—compelling for GCP-centric shops, but remember: platform lock-in is real. Portable patterns matter.
Portable agent primitives still matter: Agent skills, MCP, and explainx.ai/skills help connectors and workflows survive model and host changes. Don't architect exclusively around Gemini Enterprise unless you're prepared to refactor if you switch providers.
The 75% figure is a process metric at one company—pair any keynote stat with hallucination literacy and your own evals. Google has elite infrastructure, rigorous review processes, and internal tools unavailable to most teams. Your mileage will vary.
Trust boundaries, registries, and verification first: Courses teach the same primitives regardless of cloud provider. The fundamentals (prompt engineering, tool design, eval harnesses, security) transfer across platforms.
Consider TCO beyond API pricing: Gemini Enterprise's per-seat licensing (pricing undisclosed) may be more expensive than OpenAI/Anthropic for small teams but cheaper at enterprise scale. Model costs, infrastructure overhead, and engineering productivity all factor into true cost-of-ownership.
SKUs, dates, and claims evolve. Re-verify on Cloud Next and product pages before plans or procurement. This article reflects announcements as of April 2026 and is not investment advice.