Google Cloud Next '26 (Las Vegas, April 2026) packaged one narrative: custom TPUs, Gemini (plus partner models in places), and Gemini Enterprise as the end-to-end "agentic enterprise" layer. The same story echoed across X and trade press; the notes below are tied to primary Google and Google Cloud posts rather than to Grok output or second-hand summaries alone.
Read first (official, in order):
- Sundar Pichai — Google Cloud Next 2026 — TPU 8t/8i, 75% AI-generated new code (engineer-approved), 16B+ tokens/minute, Wiz, CapEx.
- Our eighth generation TPUs: two chips for the agentic era — Amin Vahdat; full 8t/8i technical story.
- The new Gemini Enterprise: one platform for agent development — Agent Platform, app, partners.
- Google Cloud Next ‘26 — news and updates and Welcome to Google Cloud Next26 — customer + token scale.
TPU 8t (training) and TPU 8i (inference)
Per Google’s TPU 8 post, 8t and 8i are purpose-split for the agent era: training needs huge scale-up; inference needs memory bandwidth, low latency, and efficiency when many small steps chain together.
TPU 8t (training):
- ~3× compute per pod vs the prior generation (Google names Ironwood in the same post).
- Up to 9,600 chips and 2 petabytes of shared HBM in a superpod; 121 exaFLOPS FP4; 2× interchip bandwidth; 10× faster storage to the fabric (TPUDirect); Virgo and JAX / Pathways for large jobs. Pichai’s shorter post also references scaling to on the order of one million 8t chips in one logical cluster for frontier training.
- Google targets >97% “goodput” (productive training time) via RAS, rerouting, and OCS.
TPU 8i (inference):
- Pichai: 1,152 TPUs in one 8i pod; 3× more on-chip SRAM than the prior generation; aimed at latency-sensitive and agent workloads.
- The long TPU post adds 288 GB HBM and 384 MB on-chip SRAM per chip, doubled Axion hosts, Boardfly topology, a Collectives Acceleration Engine, and about 80% better performance per dollar vs the prior inference generation—Google’s own efficiency claim, not a cross-vendor GPU benchmark.
General availability: public posts say later in 2026; use Google's "request TPU information" link for commercial follow-up.
Gemini Enterprise Agent Platform and ecosystem
The new Gemini Enterprise (April 22, 2026) frames Gemini Enterprise as end-to-end: models, product surfaces, governance, and deployment. The Agent Platform evolves Vertex into a build/tune/govern stack with MCP support, Model Armor, Agent Identity, paths into the Gemini Enterprise app, and a governed agent gallery for employees.
Named partner agents include Oracle, Salesforce, ServiceNow, Workday, Adobe, Accenture, and others. A joint Salesforce and Google press release (Cloud Next '26) covers Agentforce Sales in Gemini Enterprise and cross-platform Slack / Workspace work. Workspace Intelligence and Agentic Data Cloud are summarized in the Next '26 news post.
75% new internal code, 16B+ tokens per minute, customer scale
From Pichai’s blog: 75% of new code at Google is AI-generated and engineer-approved (up from 50% last fall); a separate item highlights agentic workflows and a 6× faster complex migration vs a year prior; and first-party Cloud models process more than 16 billion tokens per minute from direct customer API use, up from 10 billion the prior quarter.
Next ‘26 and the Cloud welcome add adoption scale: e.g. nearly 75% of Google Cloud customers using AI products; 330 customers each with >1 trillion tokens in 12 months; 35 above 10 trillion. Business press also tied the event to Alphabet share moves; this post is not financial advice.
Technical Deep Dive: TPU 8t vs. TPU 8i Architecture
Google's decision to split TPU 8 into training (8t) and inference (8i) reflects a fundamental shift in AI infrastructure philosophy: the optimal silicon for pre-training is not the same as for production serving.
TPU 8t: Scaling to Million-Chip Clusters
Training frontier models demands:
- Massive parallelism: Distributing weight updates across thousands of accelerators
- High bandwidth: Synchronizing gradients and activations without bottlenecking
- Fault tolerance: Recovering from hardware failures without restarting multi-week jobs
- Memory capacity: Storing enormous activation checkpoints for gradient calculation
TPU 8t's spec sheet addresses these:
Compute density: ~3x compute per pod vs. Ironwood (the prior, seventh-generation TPU); a 9,600-chip superpod is rated at 121 exaFLOPS in FP4 precision. For context:
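- GPT-4 training (third-party estimate): ~2×10^25 FLOP, or roughly 230 exaFLOP-days
- Gemini Ultra training (third-party estimate): ~5×10^25 FLOP, or roughly 580 exaFLOP-days
On paper, a single TPU 8t superpod at peak FP4 throughput and Google's claimed >97% goodput would cover a GPT-4-scale compute budget in about two days. Real training runs at higher precision and well below peak utilization, so actual wall-clock time would be several times longer.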
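The wall-clock arithmetic behind estimates like this can be sketched in a few lines. The compute budget below (2e25 FLOP) is an assumption for illustration, not a figure from Google; the peak rate and goodput are the numbers quoted in this article, and utilization is a hypothetical knob for how far sustained throughput falls below peak.

```python
SECONDS_PER_DAY = 86_400


def exaflop_days(total_flop: float) -> float:
    """Convert a total FLOP budget into exaFLOP-days (1 exaFLOP/s held for one day)."""
    return total_flop / (1e18 * SECONDS_PER_DAY)


def training_days(total_flop: float, peak_exaflops: float,
                  goodput: float = 0.97, utilization: float = 1.0) -> float:
    """Wall-clock days at a given peak rate, goodput, and fraction of peak sustained."""
    sustained = peak_exaflops * goodput * utilization  # effective exaFLOP/s
    return exaflop_days(total_flop) / sustained


budget = 2e25  # assumed training budget in FLOP (illustrative only)
print(f"{exaflop_days(budget):.0f} exaFLOP-days")
print(f"{training_days(budget, 121):.1f} days at peak FP4")
print(f"{training_days(budget, 121, utilization=0.1):.1f} days at 10% of peak")
```

The utilization parameter matters more than the goodput figure: a headline peak FP4 number says little about sustained throughput at the precision a real training run uses.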
Memory hierarchy:
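- 2 PB of shared HBM across 9,600 chips = ~208 GB per chip (about 2.6x the H100's 80 GB, and pooled for flexibility)
- HBM bandwidth: Not stated in the posts summarized here; Ironwood was announced at over 7 TB/s per chip, so 8t is unlikely to be lower (for reference, H100: 3.35 TB/s)
- Inter-chip bandwidth: 2x vs. the prior generation (Ironwood); per-link figures are not given in the posts summarized here, so any per-link number such as ~100-200 GB/s is a guess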
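The per-chip HBM figure is simple division over the pooled capacity. A minimal check, assuming decimal units (1 PB = 1,000,000 GB) and an even split across chips:

```python
def hbm_per_chip_gb(pool_petabytes: float, chips: int) -> float:
    """Average HBM per chip in decimal gigabytes for a pooled capacity."""
    return pool_petabytes * 1e6 / chips  # 1 PB = 1e6 GB in decimal units


per_chip = hbm_per_chip_gb(2.0, 9_600)
print(f"{per_chip:.0f} GB per chip")           # 208 GB per chip
print(f"{per_chip / 80:.1f}x an 80 GB H100")   # 2.6x an 80 GB H100
```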
Virgo networking: Google's proprietary optical circuit switch (OCS) fabric, replacing traditional electrical switches. Benefits:
- Reconfigurable topology: Adapt interconnect patterns to match workload communication patterns
- Lower latency: Optical switching reduces hop count for distant chips
- Higher bisection bandwidth: More total network capacity for large-scale all-reduce operations
TPUDirect storage: 10x faster I/O to persistent storage allows:
- Frequent checkpointing without slowing training
- Multi-modal training where video/image data is streamed from disk rather than pre-loaded to HBM
- Elastic scaling: Add/remove chips mid-job by reloading from checkpoint
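Why faster checkpoint writes matter can be illustrated with Young's classic first-order approximation for the optimal checkpoint interval. This is a textbook model, not Google's method; the failure rate and checkpoint write times below are assumptions chosen only to show the effect of a 10x faster write.

```python
import math


def optimal_interval_s(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's approximation: interval that minimizes time lost to checkpoints and failures."""
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)


def overhead_fraction(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Approximate share of time spent writing checkpoints plus redoing lost work."""
    interval = optimal_interval_s(checkpoint_cost_s, mtbf_s)
    return checkpoint_cost_s / interval + interval / (2 * mtbf_s)


MTBF_S = 24 * 3600  # assumed: one job-interrupting failure per day

for cost_s in (600, 60):  # assumed checkpoint write times, 10x apart
    interval_min = optimal_interval_s(cost_s, MTBF_S) / 60
    lost = overhead_fraction(cost_s, MTBF_S)
    print(f"write={cost_s:>3}s  checkpoint every {interval_min:.0f} min  overhead={lost:.1%}")
```

Under this model a 10x faster write cuts the overhead by a factor of about 3.2 (the square root of 10), which is one way a storage improvement can feed into a goodput target. The model ignores restart time and correlated failures.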
TPU 8i: Optimizing for Agentic Inference
Production inference has different priorities:
- Low latency: Users expect sub-second response times, even for long contexts
- High throughput: Serving millions of concurrent requests efficiently
- Cost efficiency: Inference is a continuous expense; training is a one-time cost
- Dynamic batching: Combining requests of varying sizes without padding waste
TPU 8i's design choices:
On-chip SRAM: 384 MB per chip (3x vs. the prior generation) reduces reliance on HBM for:
- KV cache storage: Keep recent context in fast SRAM rather than slower HBM
- Activation caching: Intermediate layer outputs stay on-chip during multi-turn conversations
- Speculative decoding: Cache multiple candidate tokens for faster exploration
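For a 7B parameter model (assuming a Llama-style layout: 32 layers, hidden size 4,096, full multi-head attention), the FP8 KV cache costs about 256 KB per token, so 384 MB of SRAM can hold:
- Roughly 1,500 tokens of KV cache per chip (roughly 6,000 with 8 KV heads under grouped-query attention)
- The hot working set of a request rather than a full long context; the 288 GB of HBM per chip is what holds on the order of 1 million tokens
This makes the larger SRAM useful for keeping the most recently used context close to compute, while the long conversation histories that agents accumulate still live in HBM.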
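KV cache capacity can be sanity-checked from model dimensions. The sketch below assumes a Llama-style 7B layout (32 layers, 32 attention heads of dimension 128) and a grouped-query variant with 8 KV heads; both are illustrative assumptions, not details of any Gemini model, and capacities use decimal units.

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_value: int = 1) -> int:
    """Bytes of KV cache per token: one key and one value vector per layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def tokens_that_fit(capacity_bytes: int, per_token_bytes: int) -> int:
    """Whole tokens of KV cache that fit in a memory of the given size."""
    return capacity_bytes // per_token_bytes


SRAM_BYTES = 384 * 10**6  # 384 MB on-chip SRAM per TPU 8i chip
HBM_BYTES = 288 * 10**9   # 288 GB HBM per TPU 8i chip

mha = kv_bytes_per_token(layers=32, kv_heads=32, head_dim=128)  # assumed 7B layout, FP8
gqa = kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128)   # assumed GQA variant, FP8

print(f"MHA: {mha:,} B/token -> {tokens_that_fit(SRAM_BYTES, mha):,} tokens in SRAM")
print(f"GQA: {gqa:,} B/token -> {tokens_that_fit(SRAM_BYTES, gqa):,} tokens in SRAM")
print(f"MHA in HBM: {tokens_that_fit(HBM_BYTES, mha):,} tokens")
```

On these assumptions SRAM holds thousands of tokens, not millions, and this ignores the model weights and activations that compete for the same memory.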
Boardfly topology: Unlike TPU 8t's flexible Virgo, Boardfly is a fixed high-radix network optimized for:
- All-to-all communication: Agents invoking tools need fast scatter/gather across shards
- Low diameter: Minimizing hops between any two chips reduces tail latency
Collectives Acceleration Engine (CAE): Hardware offload for common distributed primitives:
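- AllReduce: Sum partial results across chips, for example when one layer's weights are sharded with tensor parallelism
- AllGather: Assemble sharded activations or KV-cache blocks into a full tensor on every chip
- Scatter/Broadcast: Distribute inputs or updated state from one chip to many
Offloading these primitives to dedicated hardware should reduce host overhead and free TPU cores for the model's own math.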
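As one illustration of where a collective shows up in serving, consider a layer whose weight matrix is split by rows across two chips. Each chip computes a partial product, and an AllReduce sums the partials so every chip holds the full result. The sketch simulates this in plain Python with made-up numbers; it says nothing about how the CAE is actually implemented.

```python
def matvec(x, w):
    """Multiply row vector x (length n) by matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def all_reduce_sum(partials):
    """Simulated AllReduce: every shard ends up with the element-wise sum."""
    total = [sum(values) for values in zip(*partials)]
    return [list(total) for _ in partials]


x = [1.0, 2.0, 3.0, 4.0]
w = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0]]

# Row-parallel split across two simulated chips: each holds half of x and half of w's rows.
shards = [(x[:2], w[:2]), (x[2:], w[2:])]
partials = [matvec(xs, ws) for xs, ws in shards]
reduced = all_reduce_sum(partials)

print(partials)    # per-chip partial sums
print(reduced[0])  # matches the unsharded matvec(x, w)
```

Because this exchange happens for every sharded layer on every generated token, its latency sits directly on the critical path, which is the case for hardware offload.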
Performance per dollar: Google's claim of about 80% better performance per dollar vs. the prior inference generation likely reflects:
- Higher utilization: Better batching and SRAM caching mean fewer idle cycles
- Lower power: Inference-optimized silicon consumes less energy per token
- Longer lifespan: Purpose-built chips amortize capital costs over more requests
Gemini Enterprise Agent Platform: What's Actually New
Google Cloud has offered Vertex AI for years—so what makes the Gemini Enterprise Agent Platform different?
From Model API to Full Agent Stack
Old Vertex (2023-2025): Primarily a model hosting and fine-tuning platform. Developers got:
- API access to PaLM, Gemini, and third-party models
- Fine-tuning tools for domain adaptation
- Batch inference for large-scale predictions
New Gemini Enterprise (2026): An opinionated agent development framework with:
- Agent Identity: Managed authentication for agents acting on behalf of users
- Agent Gateway: Centralized routing, rate limiting, and policy enforcement
- Model Armor: Runtime safety guardrails (content filtering, PII redaction, jailbreak detection)
- Simulation tooling: Synthetic test environments for validating agent behavior before production
- Partner gallery: Pre-built agents from Oracle, Salesforce, ServiceNow, etc.
Model Armor: What It Actually Does
Model Armor is Google's answer to the "how do we safely deploy agents?" question. Key features:
Input validation:
- Prompt injection detection: Flag requests that attempt to override system instructions
- PII scanning: Redact SSNs, credit cards, phone numbers before sending to the LLM
- Content filtering: Block NSFW, hate speech, or self-harm content
Output validation:
- Hallucination detection: Cross-check LLM outputs against grounding sources (e.g., knowledge graphs, internal docs)
- Toxicity scoring: Prevent agents from generating offensive responses
- Citation enforcement: Require agents to cite sources for factual claims
Runtime monitoring:
- Anomaly detection: Flag unusual request patterns (e.g., sudden spike in tool calls)
- Cost guardrails: Prevent runaway token spend from poorly designed prompts
- Compliance logging: Audit trails for GDPR, HIPAA, SOC 2 requirements
Example workflow (illustrative pseudocode; gemini_enterprise and the armor_config keys are hypothetical names, not a published SDK):
response = gemini_enterprise.agents.create_completion(
    agent_id="customer-support-agent",
    user_input="Help me cancel my subscription",
    armor_config={
        "input_filters": ["pii_redaction", "prompt_injection"],
        "output_filters": ["hallucination_check", "toxicity"],
        "cost_limit_usd": 0.50,
    },
)
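To make the input-validation idea concrete, here is a minimal standalone sketch of regex-based PII redaction applied before text reaches a model. It is not Model Armor's implementation; the patterns are deliberately simple assumptions (US-style formats only) and would miss many real-world cases.

```python
import re

# Deliberately simple patterns for illustration; production PII detection needs far more care.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact_pii(text: str) -> str:
    """Replace each matched span with a placeholder such as [SSN]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


message = "SSN 123-45-6789, card 4111 1111 1111 1111, call 555-123-4567"
print(redact_pii(message))  # SSN [SSN], card [CARD], call [PHONE]
```

Running redaction as a separate step in front of the model call keeps the filter testable on its own, independent of which model or platform sits behind it.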
Agent Identity: Solving the Delegation Problem
Traditional model APIs authenticate the calling application (typically with an API key) but carry no end-user identity: the LLM doesn't know which human user it's serving. This breaks workflows like:
- "Email my manager the quarterly report" — which manager? Which email account?
- "Book a meeting with the sales team" — which calendar? Which sales team members?
Agent Identity binds agents to user contexts:
- OAuth delegation: User grants the agent permission to act on their behalf (via Google Workspace, Gmail, Calendar, Drive)
- Scoped credentials: Agent receives temporary tokens with limited permissions (e.g., "read calendar, write emails")
- Audit trails: All agent actions are logged with user attribution for compliance
Security model:
- Agents cannot escalate privileges beyond what the user delegated
- Tokens expire after a configurable period (default: 1 hour)
- Users can revoke agent access at any time via Google Account settings
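The security model above boils down to two checks on every action: was this scope delegated, and is the token still valid? A minimal sketch follows; the DelegatedToken class, the scope strings, and issue_token are hypothetical names for illustration, not Agent Identity's actual API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DelegatedToken:
    """Hypothetical stand-in for a short-lived, scoped credential issued to an agent."""
    user: str
    scopes: frozenset
    expires_at: datetime

    def allows(self, scope: str, now: datetime) -> bool:
        """Permit an action only if the scope was delegated and the token is unexpired."""
        return scope in self.scopes and now < self.expires_at


def issue_token(user: str, scopes: set, now: datetime,
                ttl: timedelta = timedelta(hours=1)) -> DelegatedToken:
    """Issue a token that carries only the delegated scopes and a fixed lifetime."""
    return DelegatedToken(user, frozenset(scopes), now + ttl)


now = datetime(2026, 4, 22, 9, 0, tzinfo=timezone.utc)
token = issue_token("alice@example.com", {"calendar.read", "mail.send"}, now)

print(token.allows("calendar.read", now + timedelta(minutes=30)))  # True
print(token.allows("drive.write", now + timedelta(minutes=30)))    # False: never delegated
print(token.allows("calendar.read", now + timedelta(hours=2)))     # False: expired
```

Freezing the dataclass means an agent holding the token cannot widen its own scopes, which mirrors the no-escalation rule.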
Partner Agent Gallery: Integration Ecosystem
Google announced 30+ partner agents at Cloud Next, including:
Salesforce Agentforce:
- Sales agent: Auto-qualify leads from Gmail/Meet transcripts, update CRM records, draft follow-up emails
- Service agent: Resolve support tickets by pulling from knowledge base and past case history
- Marketing agent: Generate campaign briefs based on analytics data and market trends
Oracle Cloud Agents:
- Finance agent: Reconcile invoices, flag anomalies in spend patterns, generate budget reports
- Supply chain agent: Monitor shipment delays, suggest alternate suppliers, reorder inventory
ServiceNow Workflows:
- IT helpdesk agent: Diagnose user issues, suggest knowledge base articles, escalate to human techs
- HR onboarding agent: Guide new hires through paperwork, schedule trainings, provision accounts
Adobe Creative Cloud:
- Design agent: Generate marketing assets (banners, social posts) from brand guidelines
- Video editing agent: Auto-cut highlights from long recordings, add captions, apply brand overlays
MCP Support: Standardizing Agent-Tool Communication
Google's endorsement of Model Context Protocol (MCP) is significant—it signals a shift toward open standards rather than proprietary tool-calling formats.
What MCP enables:
- Tool portability: Write a tool once (e.g., "query_database"), use it with any MCP-compatible LLM (Gemini, GPT-4, Claude)
- Ecosystem interoperability: Third-party developers can publish MCP tools to a registry, and any agent can discover/use them
- Reduced lock-in: If you switch from Gemini to another provider, your agent's tools still work
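Example MCP tool for Gemini Enterprise (a sketch; FastMCP comes from the MCP Python SDK, while database and gemini_enterprise are hypothetical placeholders):
# Illustrative sketch: `database` and `gemini_enterprise` are hypothetical placeholders.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("customer-tools")

# Define an MCP tool
@mcp.tool()
def fetch_customer_history(customer_id: str) -> dict:
    """Retrieve purchase history and support tickets for a customer."""
    # Parameterized query: never interpolate user input into SQL (placeholder syntax varies by driver)
    return database.query("SELECT * FROM customers WHERE id = ?", (customer_id,))

# Register with Gemini Enterprise (hypothetical call)
gemini_enterprise.agents.register_tool(fetch_customer_history)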
Now, any agent in the organization can invoke fetch_customer_history via natural language:
"Pull up the purchase history for customer #12345"
The agent translates this to an MCP tool call, retrieves the data, and continues the conversation.
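On the wire, an MCP tool invocation is a JSON-RPC 2.0 request with the method tools/call. The sketch below builds such a message for the example tool; the request id and argument values are illustrative, and how Gemini Enterprise transports the message is not covered here.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


wire = build_tool_call(1, "fetch_customer_history", {"customer_id": "12345"})
print(wire)
```

Because the message names the tool and its arguments without reference to any model, the same server can answer calls from any MCP-compatible client.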
The 75% AI-Generated Code Claim: What It Really Means
Sundar Pichai's statement that 75% of new code at Google is AI-generated and engineer-approved sparked widespread discussion. Let's unpack it:
What "AI-Generated" Likely Includes
Autocomplete snippets: Engineers typing in an IDE (likely using an internal Gemini-powered tool similar to GitHub Copilot) see inline suggestions that they accept via Tab.
Boilerplate generation: Common patterns (e.g., REST endpoint stubs, test templates, config files) are generated from natural-language descriptions.
Code translation: Migrating legacy systems (e.g., Java → Kotlin, Python 2 → Python 3) with AI assistance.
Refactoring: Automated rewrites to adopt new APIs or coding standards.
What "Engineer-Approved" Means
This is critical: 75% AI-generated does NOT mean 75% autonomous. In a workflow like Google's, every change is presumably:
- Reviewed by a human engineer (via code review tools such as Google's internal Critique, or Gerrit on its open-source projects)
- Tested by CI/CD pipelines (unit tests, integration tests, fuzz tests)
- Monitored post-deployment (performance metrics, error rates, user feedback)
The "approved" qualifier suggests Google is not shipping unreviewed AI code to production; engineers remain the final gatekeepers.
Productivity Gains vs. Quality Risks
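Productivity: If engineers previously spent 30% of their time writing boilerplate and AI now handles all of it, the same output takes 70% of the time, leaving more room for architecture and optimization. That is a ceiling of roughly 1/0.7 ≈ 1.4x, reached only if review overhead does not grow.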
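The 1.4x figure is an Amdahl's-law style bound. A small sketch, where the 30% share comes from the paragraph above and the 2x case is an added assumption to show a less optimistic scenario:

```python
def amdahl_speedup(automated_fraction: float, task_speedup: float = float("inf")) -> float:
    """Overall speedup when only automated_fraction of the work gets faster."""
    remaining = 1.0 - automated_fraction
    return 1.0 / (remaining + automated_fraction / task_speedup)


print(f"{amdahl_speedup(0.3):.2f}x if boilerplate becomes free")       # 1.43x
print(f"{amdahl_speedup(0.3, 2.0):.2f}x if boilerplate is only 2x faster")  # 1.18x
```

The bound depends on the share of time spent on the automated work, not on the share of lines generated, which is why a 75% code-volume figure does not translate into a 4x productivity gain.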
Quality risks:
- Subtle bugs: AI may introduce off-by-one errors, race conditions, or security vulnerabilities that pass tests but fail in edge cases
- Technical debt: AI-generated code may be "correct but ugly," leading to harder-to-maintain codebases over time
- Skill atrophy: Junior engineers relying too heavily on AI may not develop deep coding intuition
Google pairs the figure with human approval and has mature review and test infrastructure, but the 75% number itself measures volume, not quality. The broader industry should remain cautious about treating 75% as a target without similar rigor.
The 16B Tokens/Minute Metric: Scale and Implications
Google processes >16 billion tokens per minute via customer API calls—a staggering number. Let's contextualize it:
Token Volume Breakdown
Assuming an average request length of 1,000 tokens input + 200 tokens output = 1,200 tokens total:
- 16B tokens/min ÷ 1,200 tokens/request = ~13.3 million requests/minute
- = ~220,000 requests/second globally
For comparison:
- ChatGPT reportedly handles ~10M requests/day (as of 2023) = ~115 requests/second
- Google Search handles ~100,000 queries/second
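Under these assumptions, Google's AI APIs handle roughly twice the request rate commonly cited for Google Search, although an LLM request and a search query are not equivalent units of work.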
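The request-rate estimate is sensitive to the assumed request size. A short sketch of the conversion, where the 16B tokens/minute figure is from Pichai's post and both request sizes are assumptions:

```python
TOKENS_PER_MINUTE = 16e9          # from Pichai's post
TOKENS_PER_REQUEST = 1_000 + 200  # assumed average input + output tokens


def requests_per_second(tokens_per_minute: float, tokens_per_request: float) -> float:
    """Convert a token rate into a request rate for a given average request size."""
    return tokens_per_minute / tokens_per_request / 60


rps = requests_per_second(TOKENS_PER_MINUTE, TOKENS_PER_REQUEST)
print(f"{rps:,.0f} requests/second at 1.2K tokens/request")

# Sensitivity: agentic requests with 10x more tokens cut the request rate 10x.
long_rps = requests_per_second(TOKENS_PER_MINUTE, 12_000)
print(f"{long_rps:,.0f} requests/second at 12K tokens/request")
```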
Infrastructure Implications
To serve 220K requests/second:
- TPU fleet size: Assuming 1 TPU 8i pod (1,152 chips) serves ~1,000 requests/second, Google would need ~220 pods = ~253,000 TPU 8i chips for Gemini API traffic alone. The 8i is not yet generally available, so this sizes a hypothetical all-8i fleet.
- Cost: At an assumed $1-2/chip/hour amortized, that's roughly $250K-500K/hour in hardware costs, or $2.2B-4.4B/year, before software, power, cooling, and networking.
- Power consumption: Assuming ~300W per chip under load, 253K × 300W = 76 MW of continuous power draw, equivalent to a small city.
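The fleet estimate chains several assumptions, so it helps to keep them as explicit parameters. In this sketch every default (requests per pod, cost per chip-hour, watts per chip) is an assumption from the text above, not a published figure; only the 1,152 chips per pod comes from Google.

```python
import math


def fleet_estimate(rps: float, rps_per_pod: float = 1_000, chips_per_pod: int = 1_152,
                   usd_per_chip_hour: float = 1.0, watts_per_chip: float = 300) -> dict:
    """Return pods, chips, yearly hardware cost (USD), and power draw (MW)."""
    pods = math.ceil(rps / rps_per_pod)
    chips = pods * chips_per_pod
    return {
        "pods": pods,
        "chips": chips,
        "usd_per_year": chips * usd_per_chip_hour * 24 * 365,
        "megawatts": chips * watts_per_chip / 1e6,
    }


low = fleet_estimate(220_000, usd_per_chip_hour=1.0)
high = fleet_estimate(220_000, usd_per_chip_hour=2.0)

print(f"{low['pods']} pods, {low['chips']:,} chips, {low['megawatts']:.0f} MW")
print(f"${low['usd_per_year'] / 1e9:.1f}B to ${high['usd_per_year'] / 1e9:.1f}B per year")
```

Changing rps_per_pod by a factor of two moves every downstream number by the same factor, so the headline totals are order-of-magnitude guides at best.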
Revenue Implications
If Google charges an average of $0.50 per 1M tokens (blended input/output pricing):
- 16B tokens/min × 60 min × 24 hr × 365 days = 8.4 quadrillion tokens/year
- 8.4Q tokens × $0.50/1M = $4.2B annual revenue from Gemini API
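Compare to:
- Google Cloud revenue: ~$43B in 2024, and higher in 2025
- Gemini API as % of Cloud: under 10% (if $4.2B is accurate)
This would make AI APIs a significant revenue line for Google Cloud, although at these assumed prices the revenue falls in the same range as the $2.2B-4.4B/year hardware estimate above, so the numbers alone do not settle whether the TPU capex pays off.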
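The revenue figure scales linearly with the assumed blended price, which is the least certain input. A quick sensitivity sketch, where all three prices are assumptions and only the token rate comes from Pichai's post:

```python
MINUTES_PER_YEAR = 60 * 24 * 365


def annual_revenue_usd(tokens_per_minute: float, usd_per_million_tokens: float) -> float:
    """Annualize a token rate at a blended price per million tokens."""
    tokens_per_year = tokens_per_minute * MINUTES_PER_YEAR
    return tokens_per_year / 1e6 * usd_per_million_tokens


for price in (0.25, 0.50, 1.00):  # assumed blended prices, USD per 1M tokens
    revenue = annual_revenue_usd(16e9, price)
    print(f"${price:.2f}/1M tokens -> ${revenue / 1e9:.1f}B per year")
```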
ExplainX: Multicloud, Skills, and Verification
- Google is packaging silicon, models, governance, and SaaS partners in one story. That is compelling for GCP-centric shops, but remember: platform lock-in is real. Portable patterns matter.
- Portable agent primitives still matter: Agent skills, MCP, and explainx.ai/skills help connectors and workflows survive model and host changes. Don't architect exclusively around Gemini Enterprise unless you're prepared to refactor if you switch providers.
- The 75% figure is a process metric at one company; pair any keynote stat with hallucination literacy and your own evals. Google has elite infrastructure, rigorous review processes, and internal tools unavailable to most teams. Your mileage will vary.
- Trust boundaries, registries, and verification first: Courses teach the same primitives regardless of cloud provider. The fundamentals (prompt engineering, tool design, eval harnesses, security) transfer across platforms.
- Consider TCO beyond API pricing: Gemini Enterprise's per-seat licensing (pricing undisclosed) may be more expensive than OpenAI/Anthropic for small teams but cheaper at enterprise scale. Model costs, infrastructure overhead, and engineering productivity all factor into true cost-of-ownership.
SKUs, dates, and claims evolve. Re-verify on Cloud Next and product pages before plans or procurement. This article reflects announcements as of April 2026 and is not investment advice.