cloudflare-workers-ai

jezweb/claude-skills · updated Apr 8, 2026

$ npx skills add https://github.com/jezweb/claude-skills --skill cloudflare-workers-ai
summary

Run LLMs, embeddings, and image generation on Cloudflare's GPU network with 14 new 2025 models, streaming support, and 7 documented error preventions.

  • Supports 40+ models across text generation (Llama 4, Gemma 3, Mistral 3.1, GPT-OSS), embeddings (BGE 2x faster, EmbeddingGemma), image generation (Flux, Leonardo), vision, and audio (Deepgram, Whisper v3)
  • Handles critical 2025 breaking changes: context window validation switched from characters to tokens, BGE pooling parameter no longer b
skill.md

Cloudflare Workers AI

Status: Production Ready ✅
Last Updated: 2026-01-21
Dependencies: cloudflare-worker-base (for Worker setup)
Latest Versions: [email protected], @cloudflare/[email protected], [email protected]

Recent Updates (2025):

  • April 2025 - Performance: Llama 3.3 70B 2-4x faster (speculative decoding, prefix caching), BGE embeddings 2x faster
  • April 2025 - Breaking Changes: max_tokens now correctly defaults to 256 (was not respected), BGE pooling parameter (cls NOT backwards compatible with mean)
  • 2025 - New Models (14): Mistral 3.1 24B (vision+tools), Gemma 3 12B (128K context), EmbeddingGemma 300M, Llama 4 Scout, GPT-OSS 120B/20B, Qwen models (QwQ 32B, Coder 32B), Leonardo image gen, Deepgram Aura 2, Whisper v3 Turbo, IBM Granite, Nova 3
  • 2025 - Platform: Context windows API change (tokens not chars), unit-based pricing with per-model granularity, workers-ai-provider v3.0.2 (AI SDK v5), LoRA rank up to 32 (was 8), 100 adapters per account
  • October 2025: Model deprecations (use Llama 4, GPT-OSS instead)
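
One way to guard against the two April 2025 breaking changes listed above is to set `max_tokens` and `pooling` explicitly on every call instead of relying on defaults. The sketch below is illustrative only: `AiLike`, `chatInputs`, `embeddingInputs`, and `embedAll` are hypothetical names, `AiLike` is a stand-in for the real binding type, and the BGE model ID is an example to check against the model catalog.

```typescript
// Minimal stand-in for the Workers AI binding so this sketch type-checks on its own.
interface AiLike {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

type Pooling = 'mean' | 'cls';

// Build text-generation inputs with an explicit max_tokens
// rather than relying on the default of 256.
function chatInputs(prompt: string, maxTokens: number): Record<string, unknown> {
  return {
    messages: [{ role: 'user', content: prompt }],
    max_tokens: maxTokens,
  };
}

// Build embedding inputs with the pooling mode pinned, since vectors
// produced with 'cls' are not comparable to vectors produced with 'mean'.
function embeddingInputs(texts: string[], pooling: Pooling): Record<string, unknown> {
  return { text: texts, pooling };
}

// Example call site (model ID is an assumption; check the catalog).
async function embedAll(ai: AiLike, texts: string[], pooling: Pooling): Promise<unknown> {
  return ai.run('@cf/baai/bge-base-en-v1.5', embeddingInputs(texts, pooling));
}
```

Whichever pooling mode is chosen, it probably makes sense to use the same one for both indexing and querying, and to re-embed stored vectors when switching.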

Quick Start (5 Minutes)

// 1. Add the AI binding to wrangler.jsonc
{ "ai": { "binding": "AI" } }

// 2. Run a model with streaming (recommended)
// `Ai` is the binding type provided by @cloudflare/workers-types
interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      messages: [{ role: 'user', content: 'Tell me a story' }],
      stream: true, // Always stream for text generation!
    });

    // Pass the stream straight through as server-sent events
    return new Response(stream, {
      headers: { 'content-type': 'text/event-stream' },
    });
  },
};

Why streaming? Prevents buffering in memory, faster time-to-first-token, avoids Worker timeout issues.
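
On the receiving side, the streamed response has to be split back into tokens. The sketch below assumes each server-sent event carries a JSON payload with a `response` string and that the stream ends with a `[DONE]` marker; that shape is an assumption to verify against the model you call. `parseSse` is a hypothetical helper name.

```typescript
// Split buffered SSE text into complete events plus a trailing partial event.
// Assumes payloads look like: data: {"response":"..."} and end with data: [DONE]
function parseSse(buffer: string): { tokens: string[]; done: boolean; rest: string } {
  const tokens: string[] = [];
  let done = false;

  // Events are separated by a blank line; the last piece may be incomplete.
  const events = buffer.split('\n\n');
  const rest = events.pop() ?? '';

  for (const event of events) {
    for (const line of event.split('\n')) {
      if (!line.startsWith('data:')) continue;
      const payload = line.slice(5).trim();
      if (payload === '[DONE]') {
        done = true;
        continue;
      }
      try {
        const parsed = JSON.parse(payload) as { response?: string };
        if (typeof parsed.response === 'string') tokens.push(parsed.response);
      } catch {
        // Skip payloads that are not valid JSON.
      }
    }
  }
  return { tokens, done, rest };
}
```

A caller would append each decoded network chunk to `rest`, call `parseSse` again, and render the returned `tokens` as they arrive.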


Known Issues Prevention

This skill prevents 7 documented issues:

Issue #1: Context Window Validation Changed to Tokens (February 2025)

Error: "Exceeded character limit" despite model supporting larger context
Source: Cloudflare Changelog
Why It Happens: Before February 2025, Workers AI validated prompts using a hard 6144 character limit, even for models with larger token-based context windows (e.g., Mistral with 32K tokens). After the update, validation switched to token-based counting.
Prevention: Calculate tokens (not characters) when checking context window limits.

import { encode } from 'gpt-tokenizer'; // or model-specific tokenizer

const tokens = encode(prompt);
const contextWindow = 32768; // Model's max tokens (check docs)
const maxResponseTokens = 2048;

if (tokens.length + maxResponseTokens > contextWindow) {
  throw new Error(`Prompt exceeds context window: ${tokens.length} tokens`);
}

const response = await env.AI.run('@cf/mistral/mistral-7b-instruct-v0.2', {
  messages: [{ role: 'user', content: prompt }],
  max_tokens: maxResponseTokens,
});
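
When a conversation is too long, throwing an error is not the only option: older messages can be dropped until the prompt fits. The sketch below is one possible approach, not part of the skill itself. `trimToBudget` and `estimateTokens` are hypothetical names, and the 4-characters-per-token estimate is a rough assumption that should be replaced with a real tokenizer for the target model.

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep the most recent messages that fit into the context window
// once room for the response has been reserved.
function trimToBudget(
  messages: ChatMessage[],
  contextWindow: number,
  maxResponseTokens: number,
  countTokens: (text: string) => number = estimateTokens,
): ChatMessage[] {
  const budget = contextWindow - maxResponseTokens;
  const kept: ChatMessage[] = [];
  let used = 0;

  // Walk from newest to oldest and stop at the first message that does not fit.
  for (const message of [...messages].reverse()) {
    const cost = countTokens(message.content);
    if (used + cost > budget) break;
    kept.unshift(message);
    used += cost;
  }
  return kept;
}
```

This version treats all messages alike; a real implementation would probably want to keep the system message regardless of age.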

Issue #2: Neuron Consumption Discrepancies in Dashboard

Error: Dashboard neuron usage significantly exceeds expected token-based calculations
Source: Cloudflare Community Discussion
Why It Happens: Users report the dashboard showing neuron consumption in the hundreds of millions for token usage in the thousands, particularly with AutoRAG features and certain models. The discrepancy between expected neuron consumption (based on pricing docs) and actual dashboard metrics is not fully documented.
Prevention: Monitor neuron usage via AI Gateway logs and correlate it with requests. File a support ticket if consumption significantly exceeds expectations.

// Use AI Gateway for detailed request logging
const response = await env.AI.run(
  '@cf/meta/llama-3.1-8b-instruct',
  { messages: [{ role: 'user', content: query }] },
  { gateway: { id: 'my-gateway' } }
);

// Monitor dashboard at: https://dash.cloudflare.com → AI → Workers AI
// Compare neuron usage with token counts
// File support ticket with details if discrepancy persists
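
To have something concrete to compare against the dashboard, token counts can be recorded per request and totalled per model. The sketch below only does the bookkeeping. `UsageRecord` and `totalsByModel` are hypothetical names, and where the token counts come from (a usage field on the response, if the model returns one, or your own tokenizer) is left as an assumption.

```typescript
// One record per request; how the counts are obtained is up to the caller.
interface UsageRecord {
  model: string;
  promptTokens: number;
  completionTokens: number;
}

interface ModelTotals {
  requests: number;
  tokens: number;
}

// Sum request counts and tokens per model for comparison with dashboard figures.
function totalsByModel(records: UsageRecord[]): Map<string, ModelTotals> {
  const totals = new Map<string, ModelTotals>();
  for (const record of records) {
    const entry = totals.get(record.model) ?? { requests: 0, tokens: 0 };
    entry.requests += 1;
    entry.tokens += record.promptTokens + record.completionTokens;
    totals.set(record.model, entry);
  }
  return totals;
}
```

Per-model totals like these give a support ticket specific numbers to reference.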

Issue #3: AI Binding Requires Remote or Latest Tooling in Local Dev

Error: "MiniflareCoreError: wrapped binding module can't be resolved (internal modules only)"
Source: GitHub Issue #6796
Why It Happens: When using Workers AI bindings with Miniflare in local development (particularly with custom Vite plugins), the AI binding requires external workers that aren't properly exposed by older unstable_getMiniflareWorkerOptions. The error occurs when Miniflare can't resolve the internal AI worker module.
Prevention: Use remote bindings for AI in local dev, or update to the latest @cloudflare/vite-plugin.

// wrangler.jsonc - Option 1: Use remote AI binding in local dev
{
  "ai": { "binding": "AI" },
  "dev": {
    "remote": true // Use production AI binding locally
  }
}

# Option 2: Update to latest tooling
npm install -D @cloudflare/vite-plugin@latest

# Option 3: Use wrangler dev instead of custom Miniflare
npx wrangler dev

Issue #4: Flux Image Generation NSFW Filter False Positives

Error: "AiError: Input prompt contains NSFW content (code 3030)" for innocent prompts
Source: Cloudflare Community Discussion
Why It Happens: Flux image generation models (@cf/black-forest-labs/flux-1-schnell) sometimes trigger false positive NSFW content errors even with innocent single-word prompts like "hamburger". The NSFW filter can be overly sensitive without context.
Prevention: Add descriptive context around potential trigger words instead of using single-word prompts.

// ❌ May trigger error 3030
const terse = await env.AI.run('@cf/black-forest-labs/flux-1-schnell', {
  prompt: 'hamburger', // Single word triggers filter
});

// ✅ Add context to avoid false positives
const descriptive = await env.AI.run('@cf/black-forest-labs/flux-1-schnell', {
  prompt: 'A photo of a delicious large hamburger on a plate with lettuce and tomato',
  num_steps: 4,
});
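
If prompts come from end users, the same idea can be applied automatically by wrapping very short prompts in descriptive context before sending them. This is a minimal sketch: `addPromptContext`, the word threshold, and the wrapper wording are all hypothetical, and there is no guarantee that any particular phrasing passes the filter.

```typescript
// Wrap very short prompts in descriptive context; leave longer prompts alone.
// The threshold and the template text are illustrative assumptions.
function addPromptContext(prompt: string, minWords = 4): string {
  const trimmed = prompt.trim();
  const wordCount = trimmed.split(/\s+/).filter(Boolean).length;
  if (wordCount >= minWords) return trimmed;
  return `A detailed photo of ${trimmed}, well lit, on a plain background`;
}
```

For example, `addPromptContext('hamburger')` yields a full sentence, while a prompt that is already descriptive is returned unchanged.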

Issue #5: Image Generation Error 1000 - Missing num_steps Parameter

Error: "Error: unexpected type 'int32' with value 'undefined' (code 1000)"
Source: Cloudflare Community Discussion
Why It Happens: Image generation API calls return error code 1000 when the num_steps parameter is not provided, even though the documentation suggests it's optional. The parameter is actually required for most Flux models.
Prevention: Always include num_steps for image generation models (typically 4 for Flux Schnell).

// ✅ Always include num_steps for image generation
const image = await env.AI.run('@cf/black-forest-labs/flux-1-schnell', {
  prompt: 'A beautiful sunset over mountains',
  num_steps: 4, // Required - typically 4 for Flux Schnell
});

// Note: FLUX.2 [klein] 4B has fixed steps=4 (cannot be adjusted)
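
To avoid forgetting the parameter at individual call sites, the default can be applied in one place. The helper below is a sketch under that assumption; `ImageInputs` and `withNumSteps` are hypothetical names and cover only the two fields discussed here.

```typescript
interface ImageInputs {
  prompt: string;
  num_steps?: number;
}

// Fill in num_steps when the caller omitted it, so the request never
// reaches the API with the field undefined.
function withNumSteps(inputs: ImageInputs, defaultSteps = 4): Required<ImageInputs> {
  return { ...inputs, num_steps: inputs.num_steps ?? defaultSteps };
}
```

A call site would then pass `withNumSteps({ prompt })` as the inputs object to `env.AI.run`.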

Issue #6: Zod v4 Incompatibility with Structured Output Tools

Error: Syntax errors and failed transpilation when using Stagehand with Zod v4
Source: GitHub Issue #10798
Why It Happens: Stagehand (browser automation) and some structured output examples in Workers AI fail with Zod v4 (now default). The underlying zod-to-json-schema library doesn't yet support Zod v4, causing transpilation failures.
Prevention: Pin Zod to v3 until zod-to-json-schema supports v4.

# Install Zod v3 specifically
npm install zod@3

# Or pin to v3 in package.json (JSON does not allow inline comments)
{
  "dependencies": {
    "zod": "~3.23.8"
  }
}

Issue #7: AI Gateway Cache Headers for Per-Request Control

Not an error, but an important feature: AI Gateway supports per-request cache control via HTTP headers for custom TTL, cache bypass, and custom cache keys beyond dashboard defaults.
Source: AI Gateway Caching Documentation
Use When: You need different caching behavior for different requests (e.g., 1 hour for expensive queries, skip cache for real-time data).
Implementation: See AI Gateway Integration section below for header usage.
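
As a rough illustration of the per-request control described above, the helper below builds cache headers for a request sent through an AI Gateway endpoint. The header names (`cf-aig-cache-ttl`, `cf-aig-skip-cache`, `cf-aig-cache-key`) reflect my understanding of the AI Gateway caching docs and should be verified there. `CacheOptions` and `gatewayCacheHeaders` are hypothetical names.

```typescript
interface CacheOptions {
  ttlSeconds?: number; // how long the gateway may serve a cached response
  skipCache?: boolean; // bypass the cache for this request
  cacheKey?: string;   // custom key to group requests under one cache entry
}

// Translate per-request cache options into AI Gateway headers.
// Header names are assumptions to check against the caching documentation.
function gatewayCacheHeaders(options: CacheOptions): Record<string, string> {
  const headers: Record<string, string> = {};
  if (options.skipCache) {
    // Skipping the cache makes TTL and key irrelevant for this request.
    headers['cf-aig-skip-cache'] = 'true';
    return headers;
  }
  if (options.ttlSeconds !== undefined) {
    headers['cf-aig-cache-ttl'] = String(options.ttlSeconds);
  }
  if (options.cacheKey !== undefined) {
    headers['cf-aig-cache-key'] = options.cacheKey;
  }
  return headers;
}
```

For the two cases mentioned above, an expensive query might use `gatewayCacheHeaders({ ttlSeconds: 3600 })` and a real-time lookup `gatewayCacheHeaders({ skipCache: true })`.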


API Reference

env.AI.run(
  model: string,
  inputs: ModelInputs,
  options?: { gateway?: { id: string; skipCache?: boolean } }
): Prom
how to use cloudflare-workers-ai

How to use cloudflare-workers-ai on Cursor

AI-first code editor with Composer

1. Prerequisites

Before installing skills in Cursor, ensure your development environment meets these requirements:

  • Cursor installed and configured on your development machine
  • Node.js version 16.0+ with npm package manager (verify with node --version)
  • Active project directory or workspace where you want to add cloudflare-workers-ai

2. Execute installation command

Execute the skills CLI command in your project's root directory to begin installation:

$ npx skills add https://github.com/jezweb/claude-skills --skill cloudflare-workers-ai

The skills CLI fetches cloudflare-workers-ai from GitHub repository jezweb/claude-skills and configures it for Cursor.

3. Select Cursor when prompted

The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:

◆ Which agents do you want to install to?
│ ── Universal (.agents/skills) ── always included ────
│ • Amp
│ • Antigravity
│ • Cline
│ • Codex
│ ● Cursor (selected)
│ • Windsurf

4. Verify installation

Confirm successful installation by checking the skill directory location:

.cursor/skills/cloudflare-workers-ai

Reload or restart Cursor to activate cloudflare-workers-ai. Access the skill through slash commands (e.g., /cloudflare-workers-ai) or your agent's skill management interface.

Security & Verification Notice

We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.

Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.

Use Cases

Task Automation & Efficiency

Automate repetitive workflows and reduce manual effort

Example

Generate reports, summarize documents, draft communications

Save 3-5 hours per week on routine tasks

Knowledge Enhancement

Learn new skills, understand complex topics, get expert guidance

Example

Explain concepts, provide examples, suggest learning resources

Accelerate learning and skill development by 2x

Quality Improvement

Enhance output quality through reviews, suggestions, and refinements

Example

Review drafts, suggest improvements, catch errors

Improve work quality by 30-40% with less effort

Implementation Guide

Prerequisites

  • Claude Desktop or compatible AI client with skill support
  • Clear understanding of task or problem to solve
  • Willingness to iterate and refine outputs

Time Estimate

15-45 minutes depending on use case complexity

Installation Steps

  1. Install the skill using the provided installation command
  2. Test with a simple use case relevant to your work
  3. Evaluate output quality and relevance
  4. Iterate on prompts to improve results
  5. Integrate into your regular workflow if valuable

Common Pitfalls

  • Expecting perfect results without iteration
  • Not providing enough context in prompts
  • Using skill for tasks outside its intended scope
  • Accepting outputs without review and validation

Best Practices

✓ Do

  • Start with clear, specific prompts
  • Provide relevant context and constraints
  • Review and refine all outputs before using
  • Iterate to improve output quality
  • Document successful prompt patterns

✗ Don't

  • Don't use without understanding skill limitations
  • Don't skip validation of outputs
  • Don't share sensitive information in prompts
  • Don't expect skill to replace human judgment

💡 Pro Tips

  • Be specific about desired format and style
  • Ask for multiple options to choose from
  • Request explanations to understand reasoning
  • Combine AI efficiency with human expertise

When to Use This

✓ Use When

Use when skill capabilities match your task, clear ROI on time saved, and you can validate outputs. Best for repetitive tasks, learning, and quality improvement.

✗ Avoid When

Avoid when task requires deep expertise you can't validate, involves sensitive decisions, or when learning process is more valuable than speed of completion.

Learning Path

  1. Familiarize yourself with skill capabilities and limitations
  2. Start with low-risk, non-critical tasks
  3. Progress to more complex and valuable use cases
  4. Build expertise through regular use and experimentation

Ratings

4.7 (58 reviews)
  • Noor Kim· Dec 8, 2024

    Keeps context tight: cloudflare-workers-ai is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Noor Huang· Dec 8, 2024

    We added cloudflare-workers-ai from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Li Jain· Dec 8, 2024

    cloudflare-workers-ai reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Dhruvi Jain· Dec 4, 2024

    cloudflare-workers-ai has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Lucas Abebe· Dec 4, 2024

    Registry listing for cloudflare-workers-ai matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Naina Dixit· Nov 27, 2024

    Registry listing for cloudflare-workers-ai matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Noor Rahman· Nov 27, 2024

    cloudflare-workers-ai fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Nikhil Bhatia· Nov 27, 2024

    cloudflare-workers-ai has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Oshnikdeep· Nov 23, 2024

    cloudflare-workers-ai reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Neel Perez· Nov 23, 2024

    Keeps context tight: cloudflare-workers-ai is the kind of skill you can hand to a new teammate without a long onboarding doc.
