
Google Cloud Next 2026: TPU 8t / TPU 8i, Gemini Enterprise Agent Platform, and the “agentic enterprise”

At Cloud Next ’26, Google split its eighth-generation TPUs into training (8t) and inference (8i) silicon, launched Gemini Enterprise Agent Platform atop Vertex, and published striking usage stats: 3× training pod compute vs Ironwood, 80% better inference performance per dollar, 1,152-chip inference pods, 75% AI-generated new code at Google, and 16B+ customer tokens per minute. Primary sources: Google and Google Cloud official posts.

By Yash Thakker · 13 min read
Tags: Google Cloud, Gemini, TPU, AI infrastructure, Enterprise AI, Google Cloud Next

Google Cloud Next ’26 (Las Vegas, April 2026) packaged one narrative: custom TPUs, Gemini (and partner models in places), and Gemini Enterprise as the end-to-end “agentic enterprise” layer. The same story echoed across X and trade press; the notes below are tied to primary Google and Google Cloud posts rather than to Grok or other second-hand summaries alone.

Read first (official, in order):


TPU 8t (training) and TPU 8i (inference)

Per Google’s TPU 8 post, 8t and 8i are purpose-split for the agent era: training needs huge scale-up; inference needs memory bandwidth, low latency, and efficiency when many small steps chain together.

TPU 8t (training):

  • ~3× compute per pod vs the prior generation (Google names Ironwood in the same post).
  • Up to 9,600 chips and 2 petabytes of shared HBM in a superpod; 121 exaFLOPS FP4; 2× interchip bandwidth; 10× faster storage to the fabric (TPUDirect); Virgo and JAX / Pathways for large jobs. Pichai’s shorter post also references scaling to on the order of one million 8t chips in one logical cluster for frontier training.
  • Google targets >97% “goodput” (productive training time) via RAS, rerouting, and OCS.

TPU 8i (inference):

  • Pichai: 1,152 TPUs in one 8i pod; 3× more on-chip SRAM than the prior generation; aimed at latency-sensitive and agent workloads.
  • The long TPU post adds 288 GB HBM and 384 MB on-chip SRAM per chip, doubled Axion hosts, Boardfly topology, a Collectives Acceleration Engine, and about 80% better performance per dollar vs the prior inference generation—Google’s own efficiency claim, not a cross-vendor GPU benchmark.

GA: public posts say later in 2026; request TPU information for commercial follow-up.


Gemini Enterprise Agent Platform and ecosystem

Google’s post “The new Gemini Enterprise” (April 22, 2026) frames Gemini Enterprise as end-to-end: models, product surfaces, governance, and deployment. The Agent Platform evolves Vertex into a build/tune/govern stack with MCP support, Model Armor, Agent Identity, paths into the Gemini Enterprise app, and a governed agent gallery for employees.

Named partner agents include Oracle, Salesforce, ServiceNow, Workday, Adobe, Accenture, and others. Salesforce and Google’s joint press release (Cloud Next ’26) covers Agentforce Sales in Gemini Enterprise and cross-platform Slack / Workspace work. Workspace Intelligence and Agentic Data Cloud are summarized in the Next ’26 news post.


75% new internal code, 16B+ tokens per minute, customer scale

From Pichai’s blog: 75% of new code at Google is AI-generated and engineer-approved (up from 50% last fall); a separate item highlights agentic workflows and a faster complex migration vs a year prior; and first-party Cloud models process more than 16 billion tokens per minute from direct customer API use, up from 10 billion the prior quarter.

The Next ’26 news post and the Cloud welcome post add adoption scale: nearly 75% of Google Cloud customers use AI products; 330 customers each processed more than 1 trillion tokens in 12 months; and 35 processed more than 10 trillion. Business press also tied the event to Alphabet share moves; this post is not financial advice.


Technical Deep Dive: TPU 8t vs. TPU 8i Architecture

Google's decision to split TPU 8 into training (8t) and inference (8i) reflects a fundamental shift in AI infrastructure philosophy: the optimal silicon for pre-training is not the same as for production serving.

TPU 8t: Scaling to Million-Chip Clusters

Training frontier models demands:

  • Massive parallelism: Distributing weight updates across thousands of accelerators
  • High bandwidth: Synchronizing gradients and activations without bottlenecking
  • Fault tolerance: Recovering from hardware failures without restarting multi-week jobs
  • Memory capacity: Storing enormous activation checkpoints for gradient calculation

TPU 8t's spec sheet addresses these:

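Compute density: ~3x compute per pod vs. Ironwood (Google's seventh-generation TPU) means a 9,600-chip superpod delivers 121 exaFLOPS at FP4 precision. For context, widely cited third-party estimates (not official figures) put frontier training runs at:

  • GPT-4 training: ~2 × 10^25 FLOP, or roughly 240 exaFLOPS-days
  • Gemini Ultra training: ~5 × 10^25 FLOP, or roughly 580 exaFLOPS-days

On raw operation count, a single TPU 8t superpod running at peak with Google's claimed >97% goodput would cover a GPT-4-scale budget in about two days. Real runs would take considerably longer: peak FP4 throughput is not the sustained throughput achieved at the higher precisions used for most training, and goodput measures productive time, not how close each chip runs to peak.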

Memory hierarchy:

  • 2 PB of shared HBM across 9,600 chips = ~208 GB per chip (about 2.6x the H100's 80 GB, and pooled across the pod for flexibility)
  • HBM bandwidth: Likely 3-4 TB/s per chip (vs. H100's 3.35 TB/s)
  • Inter-chip bandwidth: 2x vs. the prior generation, suggesting ~100-200 GB/s per chip-to-chip link (an estimate, not a published figure)
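
The per-chip figures can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 PB = 10^15 bytes) and an even split across chips; the pod-level inputs are the numbers Google quotes, and the variable names are ours.

```python
# Back-of-envelope per-chip figures from the pod-level numbers quoted above.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 GB = 10**9 bytes).

POD_CHIPS = 9_600              # chips per TPU 8t superpod
POD_HBM_BYTES = 2 * 10**15     # 2 PB of shared HBM
POD_FP4_FLOPS = 121 * 10**18   # 121 exaFLOPS at FP4


def per_chip(total: float, chips: int = POD_CHIPS) -> float:
    """Divide a pod-level total evenly across chips."""
    return total / chips


hbm_gb_per_chip = per_chip(POD_HBM_BYTES) / 10**9
fp4_pflops_per_chip = per_chip(POD_FP4_FLOPS) / 10**15

print(f"HBM per chip: {hbm_gb_per_chip:.0f} GB")
print(f"FP4 compute per chip: {fp4_pflops_per_chip:.1f} PFLOPS")
```

The same division applied to compute gives roughly 12.6 petaFLOPS of peak FP4 per chip, which is a derived figure rather than one Google states directly.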

Virgo networking: Google's proprietary optical circuit switch (OCS) fabric, replacing traditional electrical switches. Benefits:

  • Reconfigurable topology: Adapt interconnect patterns to match workload communication patterns
  • Lower latency: Optical switching reduces hop count for distant chips
  • Higher bisection bandwidth: More total network capacity for large-scale all-reduce operations

TPUDirect storage: 10x faster I/O to persistent storage allows:

  • Frequent checkpointing without slowing training
  • Multi-modal training where video/image data is streamed from disk rather than pre-loaded to HBM
  • Elastic scaling: Add/remove chips mid-job by reloading from checkpoint

TPU 8i: Optimizing for Agentic Inference

Production inference has different priorities:

  • Low latency: Users expect sub-second response times, even for long contexts
  • High throughput: Serving millions of concurrent requests efficiently
  • Cost efficiency: Inference is a continuous expense; training is a one-time cost
  • Dynamic batching: Combining requests of varying sizes without padding waste

TPU 8i's design choices:

On-chip SRAM: 384 MB per chip (3x vs. the prior generation) reduces reliance on HBM for:

  • KV cache storage: Keep recent context in fast SRAM rather than slower HBM
  • Activation caching: Intermediate layer outputs stay on-chip during multi-turn conversations
  • Speculative decoding: Cache multiple candidate tokens for faster exploration

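For a 7B-parameter model, KV cache at FP8 costs roughly 64-256 KB per token, depending on whether the model uses grouped-query attention (assuming 32 layers and 128-dimensional heads). At that rate, 384 MB of SRAM can hold:

  • ~1,500 tokens of KV cache with full multi-head attention (32 KV heads)
  • ~6,000 tokens with grouped-query attention (8 KV heads)

SRAM therefore acts as a fast tier for the hottest slice of context, while long conversation histories live in the 288 GB of HBM, which has room for on the order of a million tokens or more at the same per-token cost. That combination is what suits TPU 8i to long-context agents that accumulate large conversation histories.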
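
A back-of-envelope sizing shows how far 384 MB of SRAM stretches for KV cache. The sketch below assumes generic 7B-class shapes (32 layers, 128-dimensional heads, FP8 values); these shapes are illustrative and not tied to any Gemini model, and the SRAM size is treated as 384 MiB.

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_value: int) -> int:
    """KV cache per token: one key and one value vector per layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


SRAM_BYTES = 384 * 1024**2  # 384 MB on-chip SRAM, treated as MiB here

# Assumed 7B-class shapes at FP8 (1 byte per value)
full_mha = kv_bytes_per_token(layers=32, kv_heads=32, head_dim=128, bytes_per_value=1)
gqa_8 = kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128, bytes_per_value=1)

tokens_mha = SRAM_BYTES // full_mha
tokens_gqa = SRAM_BYTES // gqa_8

print(f"Full attention: {full_mha // 1024} KiB/token, {tokens_mha} tokens fit")
print(f"GQA (8 KV heads): {gqa_8 // 1024} KiB/token, {tokens_gqa} tokens fit")
```

Under these assumptions SRAM holds a few thousand tokens at most, so it is best read as a cache for the hottest context, with HBM holding the bulk of long histories.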

Boardfly topology: Unlike TPU 8t's flexible Virgo, Boardfly is a fixed high-radix network optimized for:

  • All-to-all communication: Agents invoking tools need fast scatter/gather across shards
  • Low diameter: Minimizing hops between any two chips reduces tail latency

Collectives Acceleration Engine (CAE): Hardware offload for common distributed primitives:

  • AllReduce: Aggregate predictions from ensemble models
  • AllGather: Collect tool outputs from distributed execution
  • Scatter/Broadcast: Distribute prompts to multi-model pipelines

CAE reduces CPU overhead and frees up TPU cores for pure inference.
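
For readers new to collectives, the semantics are easy to show in plain Python. This is a single-process illustration of what each primitive computes, with lists standing in for per-chip buffers; it says nothing about how CAE implements them in hardware.

```python
from typing import List

Shard = List[float]


def all_reduce(shards: List[Shard]) -> List[Shard]:
    """Every participant ends up with the element-wise sum of all inputs."""
    total = [sum(values) for values in zip(*shards)]
    return [list(total) for _ in shards]


def all_gather(shards: List[Shard]) -> List[Shard]:
    """Every participant ends up with the concatenation of all inputs."""
    gathered = [x for shard in shards for x in shard]
    return [list(gathered) for _ in shards]


def broadcast(value: Shard, participants: int) -> List[Shard]:
    """One participant's buffer is copied to everyone."""
    return [list(value) for _ in range(participants)]


# Three simulated chips, each holding a partial result
partials = [[1.0, 2.0], [10.0, 20.0], [100.0, 200.0]]
reduced = all_reduce(partials)
gathered = all_gather(partials)
copies = broadcast([7.0], participants=3)

print(reduced[0])
print(gathered[0])
```

On real hardware the cost of these operations is dominated by network hops and synchronization, which is why offloading them from the main cores can matter for latency.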

Performance per dollar: Google's claim of about 80% better performance per dollar vs. the prior inference generation likely reflects:

  • Higher utilization: Better batching and SRAM caching mean fewer idle cycles
  • Lower power: Inference-optimized silicon consumes less energy per token
  • Longer lifespan: Purpose-built chips amortize capital costs over more requests

Gemini Enterprise Agent Platform: What's Actually New

Google Cloud has offered Vertex AI for years—so what makes the Gemini Enterprise Agent Platform different?

From Model API to Full Agent Stack

Old Vertex (2023-2025): Primarily a model hosting and fine-tuning platform. Developers got:

  • API access to PaLM, Gemini, and third-party models
  • Fine-tuning tools for domain adaptation
  • Batch inference for large-scale predictions

New Gemini Enterprise (2026): An opinionated agent development framework with:

  • Agent Identity: Managed authentication for agents acting on behalf of users
  • Agent Gateway: Centralized routing, rate limiting, and policy enforcement
  • Model Armor: Runtime safety guardrails (content filtering, PII redaction, jailbreak detection)
  • Simulation tooling: Synthetic test environments for validating agent behavior before production
  • Partner gallery: Pre-built agents from Oracle, Salesforce, ServiceNow, etc.

Model Armor: What It Actually Does

Model Armor is Google's answer to the "how do we safely deploy agents?" question. Key features:

Input validation:

  • Prompt injection detection: Flag requests that attempt to override system instructions
  • PII scanning: Redact SSNs, credit cards, phone numbers before sending to the LLM
  • Content filtering: Block NSFW, hate speech, or self-harm content

Output validation:

  • Hallucination detection: Cross-check LLM outputs against grounding sources (e.g., knowledge graphs, internal docs)
  • Toxicity scoring: Prevent agents from generating offensive responses
  • Citation enforcement: Require agents to cite sources for factual claims

Runtime monitoring:

  • Anomaly detection: Flag unusual request patterns (e.g., sudden spike in tool calls)
  • Cost guardrails: Prevent runaway token spend from poorly designed prompts
  • Compliance logging: Audit trails for GDPR, HIPAA, SOC 2 requirements

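Example workflow (illustrative pseudocode: gemini_enterprise is a hypothetical client, not a published SDK):

# Hypothetical client object; method and parameter names are assumptions.
response = gemini_enterprise.agents.create_completion(
    agent_id="customer-support-agent",
    user_input="Help me cancel my subscription",
    armor_config={
        # Screen the request before it reaches the model
        "input_filters": ["pii_redaction", "prompt_injection"],
        # Screen the model's answer before it reaches the user
        "output_filters": ["hallucination_check", "toxicity"],
        # Stop the request if estimated spend exceeds this cap
        "cost_limit_usd": 0.50,
    },
)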
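
To make the input-filter idea concrete, here is a minimal regex-based PII redactor. It is a sketch of the general technique only: the patterns and the `redact_pii` name are ours, they cover US-style formats, and this is not how Model Armor is implemented.

```python
import re

# Illustrative patterns only; production systems use far more robust detectors.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact_pii(text: str) -> str:
    """Replace each detected PII span with a bracketed label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact_pii("My SSN is 123-45-6789, call me at 555-123-4567"))
print(redact_pii("Card 4111 1111 1111 1111 is on file"))
```

Running a filter like this before the prompt leaves your trust boundary means the model never sees the raw identifiers, whichever guardrail product sits downstream.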

Agent Identity: Solving the Delegation Problem

Traditional model APIs authenticate the calling application (typically with an API key), not the end user: the LLM doesn't know which human it's serving. This breaks workflows like:

  • "Email my manager the quarterly report" — which manager? Which email account?
  • "Book a meeting with the sales team" — which calendar? Which sales team members?

Agent Identity binds agents to user contexts:

  1. OAuth delegation: User grants the agent permission to act on their behalf (via Google Workspace, Gmail, Calendar, Drive)
  2. Scoped credentials: Agent receives temporary tokens with limited permissions (e.g., "read calendar, write emails")
  3. Audit trails: All agent actions are logged with user attribution for compliance

Security model:

  • Agents cannot escalate privileges beyond what the user delegated
  • Tokens expire after a configurable period (default: 1 hour)
  • Users can revoke agent access at any time via Google Account settings
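
The security model above can be sketched as a small data structure. Everything here (`DelegatedToken`, `issue_token`, `is_allowed`, the scope strings) is hypothetical and only illustrates scoped, expiring delegation; it is not the Agent Identity API.

```python
import time
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class DelegatedToken:
    user: str
    scopes: FrozenSet[str]
    expires_at: float


def issue_token(user: str, user_scopes: Iterable[str], requested: Iterable[str],
                ttl_seconds: int = 3600, now: Optional[float] = None) -> DelegatedToken:
    """Grant only scopes the user actually holds, with a fixed lifetime."""
    now = time.time() if now is None else now
    granted = frozenset(requested) & frozenset(user_scopes)  # no escalation
    return DelegatedToken(user, granted, now + ttl_seconds)


def is_allowed(token: DelegatedToken, scope: str, now: Optional[float] = None) -> bool:
    """An action is allowed only if the token is unexpired and holds the scope."""
    now = time.time() if now is None else now
    return now < token.expires_at and scope in token.scopes


token = issue_token(
    user="alice",
    user_scopes={"calendar.read", "mail.send"},
    requested={"calendar.read", "drive.write"},  # drive.write is not held by alice
    now=0,
)
print(sorted(token.scopes))
print(is_allowed(token, "calendar.read", now=10))
```

Intersecting requested scopes with the user's own is what enforces "no privilege escalation"; revocation would be an extra lookup against a server-side deny list, omitted here.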

Partner Agent Gallery: Integration Ecosystem

Google announced 30+ partner agents at Cloud Next, including:

Salesforce Agentforce:

  • Sales agent: Auto-qualify leads from Gmail/Meet transcripts, update CRM records, draft follow-up emails
  • Service agent: Resolve support tickets by pulling from knowledge base and past case history
  • Marketing agent: Generate campaign briefs based on analytics data and market trends

Oracle Cloud Agents:

  • Finance agent: Reconcile invoices, flag anomalies in spend patterns, generate budget reports
  • Supply chain agent: Monitor shipment delays, suggest alternate suppliers, reorder inventory

ServiceNow Workflows:

  • IT helpdesk agent: Diagnose user issues, suggest knowledge base articles, escalate to human techs
  • HR onboarding agent: Guide new hires through paperwork, schedule trainings, provision accounts

Adobe Creative Cloud:

  • Design agent: Generate marketing assets (banners, social posts) from brand guidelines
  • Video editing agent: Auto-cut highlights from long recordings, add captions, apply brand overlays

MCP Support: Standardizing Agent-Tool Communication

Google's endorsement of Model Context Protocol (MCP) is significant—it signals a shift toward open standards rather than proprietary tool-calling formats.

What MCP enables:

  • Tool portability: Write a tool once (e.g., "query_database"), use it with any MCP-compatible LLM (Gemini, GPT-4, Claude)
  • Ecosystem interoperability: Third-party developers can publish MCP tools to a registry, and any agent can discover/use them
  • Reduced lock-in: If you switch from Gemini to another provider, your agent's tools still work

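Example MCP tool for Gemini Enterprise (a sketch: FastMCP comes from the open-source MCP Python SDK, while the database file and the register_tool call are placeholders):

import sqlite3
from mcp.server.fastmcp import FastMCP  # MCP Python SDK

mcp = FastMCP("customer-tools")

# Define an MCP tool
@mcp.tool()
def fetch_customer_history(customer_id: str) -> list[dict]:
    """Retrieve purchase history and support tickets for a customer."""
    conn = sqlite3.connect("customers.db")  # placeholder internal database
    conn.row_factory = sqlite3.Row
    # Parameterized query: never interpolate agent-supplied input into SQL
    rows = conn.execute("SELECT * FROM customers WHERE id = ?", (customer_id,)).fetchall()
    conn.close()
    return [dict(row) for row in rows]

# Register with Gemini Enterprise (hypothetical call; check the platform docs)
gemini_enterprise.agents.register_tool(fetch_customer_history)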

Now, any agent in the organization can invoke fetch_customer_history via natural language:

"Pull up the purchase history for customer #12345"

The agent translates this to an MCP tool call, retrieves the data, and continues the conversation.
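
On the wire, that tool call is a JSON-RPC 2.0 request using MCP's `tools/call` method. The sketch below builds such a message by hand to show its shape; the helper name `build_tool_call` is ours, and in practice an MCP client library would construct and send this for you.

```python
import json


def build_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


request = build_tool_call(1, "fetch_customer_history", {"customer_id": "12345"})
print(request)
```

Because the message format is the same for every MCP server, the agent host does not need tool-specific code to invoke a newly registered tool.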

The 75% AI-Generated Code Claim: What It Really Means

Sundar Pichai's statement that 75% of new code at Google is AI-generated and engineer-approved sparked widespread discussion. Let's unpack it:

What "AI-Generated" Likely Includes

Autocomplete snippets: Engineers typing in an IDE (likely using an internal Gemini-powered tool similar to GitHub Copilot) see inline suggestions that they accept via Tab.

Boilerplate generation: Common patterns (e.g., REST endpoint stubs, test templates, config files) are generated from natural-language descriptions.

Code translation: Migrating legacy systems (e.g., Java → Kotlin, Python 2 → Python 3) with AI assistance.

Refactoring: Automated rewrites to adopt new APIs or coding standards.

What "Engineer-Approved" Means

This is critical: 75% AI-generated does NOT mean 75% autonomous. Pichai's wording implies a human accepts each change, and a standard engineering workflow would typically also mean the code is:

  • Reviewed by a human engineer (via code review tools such as Google's internal Critique, or Gerrit on its open-source projects)
  • Tested by CI/CD pipelines (unit tests, integration tests, fuzz tests)
  • Monitored post-deployment (performance metrics, error rates, user feedback)

The "approved" qualifier suggests Google is not shipping untested AI code to production; engineers remain the final gatekeepers.

Productivity Gains vs. Quality Risks

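Productivity: If engineers previously spent 30% of their time writing boilerplate, and AI now handles that, they can focus on architecture and optimization. In the best case, where that 30% drops to near zero, the same output takes 70% of the time: a potential 1.4x productivity boost (1 / 0.7 ≈ 1.43).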
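
The productivity estimate follows an Amdahl's-law style calculation, sketched below. The 30% share is the article's hypothetical, and the finite speedup in the second call is an extra assumption to show how quickly the gain shrinks when AI only accelerates boilerplate rather than eliminating it.

```python
def overall_speedup(fraction_automated: float,
                    speedup_on_that_part: float = float("inf")) -> float:
    """Overall speedup when only part of the work gets faster."""
    remaining = (1 - fraction_automated) + fraction_automated / speedup_on_that_part
    return 1 / remaining


best_case = overall_speedup(0.30)          # boilerplate time drops to zero
half_as_long = overall_speedup(0.30, 2.0)  # boilerplate merely takes half the time

print(f"Best case: {best_case:.2f}x")
print(f"Boilerplate 2x faster: {half_as_long:.2f}x")
```

The takeaway is that the ceiling is set by the share of time spent on automatable work, not by how much of the final code was machine-written.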

Quality risks:

  • Subtle bugs: AI may introduce off-by-one errors, race conditions, or security vulnerabilities that pass tests but fail in edge cases
  • Technical debt: AI-generated code may be "correct but ugly," leading to harder-to-maintain codebases over time
  • Skill atrophy: Junior engineers relying too heavily on AI may not develop deep coding intuition

Google's mature review and testing infrastructure suggests strong internal checks are in place, but the broader industry should remain cautious about treating 75% as a target without similar rigor.

The 16B Tokens/Minute Metric: Scale and Implications

Google processes >16 billion tokens per minute via customer API calls—a staggering number. Let's contextualize it:

Token Volume Breakdown

Assuming an average request length of 1,000 tokens input + 200 tokens output = 1,200 tokens total:

  • 16B tokens/min ÷ 1,200 tokens/request = ~13.3 million requests/minute
  • = ~220,000 requests/second globally
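
The conversion from tokens to requests is sensitive to the assumed request size, which a short calculation makes explicit. The 16B tokens/minute figure is from Pichai's post; the tokens-per-request values are assumptions.

```python
def requests_per_second(tokens_per_minute: float, tokens_per_request: float) -> float:
    """Convert aggregate token throughput into an implied request rate."""
    return tokens_per_minute / tokens_per_request / 60


TOKENS_PER_MINUTE = 16e9  # quoted figure

chat_sized = requests_per_second(TOKENS_PER_MINUTE, 1_000 + 200)  # assumed chat request
agent_sized = requests_per_second(TOKENS_PER_MINUTE, 12_000)      # assumed long-context request

print(f"1,200-token requests: {chat_sized:,.0f} per second")
print(f"12,000-token requests: {agent_sized:,.0f} per second")
```

If typical requests are ten times larger, as long-context agent calls often are, the implied request rate falls by the same factor.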

For comparison:

  • ChatGPT reportedly handles ~10M requests/day (as of 2023) = ~115 requests/second
  • Google Search handles ~100,000 queries/second

Under these assumed request sizes, Google's AI APIs handle a request rate on the same order as Google Search queries.

Infrastructure Implications

To serve 220K requests/second:

  • TPU fleet size: Assuming 1 TPU 8i pod (1,152 chips) serves ~1,000 requests/second, Google needs ~220 pods = ~253,000 TPU 8i chips for Gemini API alone.

  • Cost: At $1-2/chip/hour amortized, that's $250K-500K/hour in hardware costs, or $2.2B-4.4B/year—before software, power, cooling, networking.

  • Power consumption: TPUs consume ~300W each under load, so 253K × 300W = 76 MW continuous power draw—equivalent to a small city.
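
The fleet estimate chains several guesses, so it helps to keep them as explicit parameters. In this sketch only the 1,152 chips per pod comes from Google; the per-pod throughput, hourly cost, and wattage defaults are the assumptions used in the bullets above.

```python
import math


def fleet_estimate(requests_per_second: float,
                   pod_rps: float = 1_000,          # assumed requests/s per pod
                   chips_per_pod: int = 1_152,      # from Google's post
                   usd_per_chip_hour: float = 1.0,  # assumed amortized cost
                   watts_per_chip: float = 300) -> dict:
    """Size a serving fleet and derive rough cost and power figures."""
    pods = math.ceil(requests_per_second / pod_rps)
    chips = pods * chips_per_pod
    usd_per_hour = chips * usd_per_chip_hour
    return {
        "pods": pods,
        "chips": chips,
        "usd_per_hour": usd_per_hour,
        "usd_per_year": usd_per_hour * 24 * 365,
        "megawatts": chips * watts_per_chip / 1e6,
    }


low = fleet_estimate(220_000, usd_per_chip_hour=1.0)
high = fleet_estimate(220_000, usd_per_chip_hour=2.0)

print(low["pods"], low["chips"], round(low["megawatts"]))
print(f"${low['usd_per_year'] / 1e9:.1f}B to ${high['usd_per_year'] / 1e9:.1f}B per year")
```

Changing any single default, especially per-pod throughput, moves every downstream number proportionally, so treat the outputs as order-of-magnitude only.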

Revenue Implications

If Google charges an average of $0.50 per 1M tokens (blended input/output pricing):

  • 16B tokens/min × 60 min × 24 hr × 365 days = 8.4 quadrillion tokens/year
  • 8.4Q tokens × $0.50/1M = $4.2B annual revenue from Gemini API
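
The revenue arithmetic can be reproduced in a few lines. The token rate is the quoted figure; the $0.50 blended price per million tokens is the assumption stated above, and the calculation also assumes the rate holds flat all year.

```python
MINUTES_PER_YEAR = 60 * 24 * 365


def annual_revenue_usd(tokens_per_minute: float, usd_per_million_tokens: float) -> float:
    """Annualized revenue if the token rate and blended price stay constant."""
    tokens_per_year = tokens_per_minute * MINUTES_PER_YEAR
    return tokens_per_year / 1e6 * usd_per_million_tokens


tokens_per_year = 16e9 * MINUTES_PER_YEAR
revenue = annual_revenue_usd(16e9, 0.50)

print(f"Tokens per year: {tokens_per_year:.2e}")
print(f"Implied revenue: ${revenue / 1e9:.1f}B")
```

Because revenue scales linearly with the blended price, a guess that is off by 2x in either direction shifts the result between roughly $2B and $8B.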

Compare to:

  • Google Cloud revenue (2024): ~$43B/year
  • Gemini API as % of Cloud: ~10% of that base (if $4.2B is accurate)

This positions AI APIs as a top-tier revenue driver for Google Cloud, justifying massive TPU capex.

ExplainX: Multicloud, Skills, and Verification

  1. Google is packaging silicon, models, governance, and SaaS partners in one story—compelling for GCP-centric shops, but remember: platform lock-in is real. Portable patterns matter.

  2. Portable agent primitives still matter: Agent skills, MCP, and explainx.ai/skills help connectors and workflows survive model and host changes. Don't architect exclusively around Gemini Enterprise unless you're prepared to refactor if you switch providers.

  3. The 75% figure is a process metric at one company—pair any keynote stat with hallucination literacy and your own evals. Google has elite infrastructure, rigorous review processes, and internal tools unavailable to most teams. Your mileage will vary.

  4. Trust boundaries, registries, and verification first: Courses teach the same primitives regardless of cloud provider. The fundamentals (prompt engineering, tool design, eval harnesses, security) transfer across platforms.

  5. Consider TCO beyond API pricing: Gemini Enterprise's per-seat licensing (pricing undisclosed) may be more expensive than OpenAI/Anthropic for small teams but cheaper at enterprise scale. Model costs, infrastructure overhead, and engineering productivity all factor into true cost-of-ownership.



SKUs, dates, and claims evolve. Re-verify on Cloud Next and product pages before plans or procurement. This article reflects announcements as of April 2026 and is not investment advice.
