cloudflare-workers-ai
jezweb/claude-skills · updated Apr 8, 2026
Run LLMs, embeddings, and image generation on Cloudflare's GPU network with 14 new 2025 models, streaming support, and 7 documented error preventions.
- Supports 40+ models across text generation (Llama 4, Gemma 3, Mistral 3.1, GPT-OSS), embeddings (BGE 2x faster, EmbeddingGemma), image generation (Flux, Leonardo), vision, and audio (Deepgram, Whisper v3)
- Handles critical 2025 breaking changes: context window validation switched from characters to tokens, BGE pooling parameter no longer b
Cloudflare Workers AI
Status: Production Ready ✅
Last Updated: 2026-01-21
Dependencies: cloudflare-worker-base (for Worker setup)
Latest Versions: [email protected], @cloudflare/[email protected], [email protected]
Recent Updates (2025):
- April 2025 - Performance: Llama 3.3 70B 2-4x faster (speculative decoding, prefix caching), BGE embeddings 2x faster
- April 2025 - Breaking Changes: max_tokens now correctly defaults to 256 (was not respected), BGE pooling parameter (cls NOT backwards compatible with mean)
- 2025 - New Models (14): Mistral 3.1 24B (vision+tools), Gemma 3 12B (128K context), EmbeddingGemma 300M, Llama 4 Scout, GPT-OSS 120B/20B, Qwen models (QwQ 32B, Coder 32B), Leonardo image gen, Deepgram Aura 2, Whisper v3 Turbo, IBM Granite, Nova 3
- 2025 - Platform: Context windows API change (tokens not chars), unit-based pricing with per-model granularity, workers-ai-provider v3.0.2 (AI SDK v5), LoRA rank up to 32 (was 8), 100 adapters per account
- October 2025: Model deprecations (use Llama 4, GPT-OSS instead)
Quick Start (5 Minutes)
// 1. Add AI binding to wrangler.jsonc
{ "ai": { "binding": "AI" } }
// 2. Run model with streaming (recommended)
// Env describes the bindings declared in wrangler.jsonc;
// the Ai type is provided by @cloudflare/workers-types.
interface Env {
  AI: Ai;
}
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      messages: [{ role: 'user', content: 'Tell me a story' }],
      stream: true, // Always stream for text generation!
    });
    return new Response(stream, {
      headers: { 'content-type': 'text/event-stream' },
    });
  },
};
Why streaming? Prevents buffering in memory, faster time-to-first-token, avoids Worker timeout issues.
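On the receiving side, the streamed body arrives as server-sent events. The following is a minimal sketch of pulling the generated text out of that stream, assuming each event is a `data:` line carrying JSON with a `response` field and that the stream ends with `data: [DONE]`. The helper name `extractTokens` is hypothetical, and the event shape should be checked against the current Workers AI docs for the model in use.

```typescript
// Hypothetical helper: extract generated text from a chunk of SSE output.
// Assumes lines shaped like `data: {"response":"..."}` and a final `data: [DONE]`.
function extractTokens(sseText: string): { text: string; done: boolean } {
  let text = '';
  let done = false;
  for (const line of sseText.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('data:')) continue; // skip blank lines and comments
    const payload = trimmed.slice('data:'.length).trim();
    if (payload === '[DONE]') {
      done = true;
      break;
    }
    try {
      const parsed = JSON.parse(payload) as { response?: unknown };
      if (typeof parsed.response === 'string') text += parsed.response;
    } catch {
      // Incomplete JSON is ignored here; a real client would buffer
      // until the rest of the line arrives.
    }
  }
  return { text, done };
}
```

A real client would feed this from a `ReadableStream` reader and keep any trailing partial line for the next chunk.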
Known Issues Prevention
This skill prevents 7 documented issues:
Issue #1: Context Window Validation Changed to Tokens (February 2025)
Error: "Exceeded character limit" despite model supporting larger context
Source: Cloudflare Changelog
Why It Happens: Before February 2025, Workers AI validated prompts using a hard 6144 character limit, even for models with larger token-based context windows (e.g., Mistral with 32K tokens). After the update, validation switched to token-based counting.
Prevention: Calculate tokens (not characters) when checking context window limits.
import { encode } from 'gpt-tokenizer'; // or model-specific tokenizer
const tokens = encode(prompt);
const contextWindow = 32768; // Model's max tokens (check docs)
const maxResponseTokens = 2048;
if (tokens.length + maxResponseTokens > contextWindow) {
  throw new Error(`Prompt exceeds context window: ${tokens.length} tokens`);
}
const response = await env.AI.run('@cf/mistral/mistral-7b-instruct-v0.2', {
  messages: [{ role: 'user', content: prompt }],
  max_tokens: maxResponseTokens,
});
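The same check can be wrapped in a reusable function that accepts any token counter. This is a sketch only: `checkBudget`, `TokenCounter`, and `approxCounter` are hypothetical names, and the fallback of roughly four characters per token is a coarse heuristic, not any model's real tokenizer.

```typescript
type TokenCounter = (text: string) => number;

// Rough fallback: assumes ~4 characters per token. This is a heuristic only;
// pass a real tokenizer-backed counter when accuracy matters.
const approxCounter: TokenCounter = (text) => Math.ceil(text.length / 4);

interface BudgetCheck {
  promptTokens: number;
  remaining: number; // tokens left after the prompt (negative if over the window)
  fits: boolean; // true when the response budget still fits
}

function checkBudget(
  prompt: string,
  contextWindow: number,
  maxResponseTokens: number,
  count: TokenCounter = approxCounter,
): BudgetCheck {
  const promptTokens = count(prompt);
  const remaining = contextWindow - promptTokens;
  return { promptTokens, remaining, fits: remaining >= maxResponseTokens };
}
```

Passing `(text) => encode(text).length` as the counter reproduces the gpt-tokenizer check above, while the default keeps the function usable without any dependency.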
Issue #2: Neuron Consumption Discrepancies in Dashboard
Error: Dashboard neuron usage significantly exceeds expected token-based calculations
Source: Cloudflare Community Discussion
Why It Happens: Users report the dashboard showing hundred-million-level neuron consumption for K-level token usage, particularly with AutoRAG features and certain models. The discrepancy between expected neuron consumption (based on pricing docs) and actual dashboard metrics is not fully documented.
Prevention: Monitor neuron usage via AI Gateway logs and correlate with requests. File a support ticket if consumption significantly exceeds expectations.
// Use AI Gateway for detailed request logging
const response = await env.AI.run(
'@cf/meta/llama-3.1-8b-instruct',
{ messages: [{ role: 'user', content: query }] },
{ gateway: { id: 'my-gateway' } }
);
// Monitor dashboard at: https://dash.cloudflare.com → AI → Workers AI
// Compare neuron usage with token counts
// File support ticket with details if discrepancy persists
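To make that comparison concrete, token counts can be tallied per model and set beside the dashboard figure. The following is a minimal sketch, assuming each call has been recorded with its prompt and completion token counts. The `UsageRecord` shape and `tallyByModel` are hypothetical; where the counts come from (a usage field on the response, AI Gateway logs, or a local tokenizer) depends on the model and should be verified.

```typescript
// Hypothetical record of one inference call.
interface UsageRecord {
  model: string;
  promptTokens: number;
  completionTokens: number;
}

interface ModelTotals {
  requests: number;
  promptTokens: number;
  completionTokens: number;
}

// Sum token counts per model so they can be compared with dashboard neurons.
function tallyByModel(records: UsageRecord[]): Map<string, ModelTotals> {
  const totals = new Map<string, ModelTotals>();
  for (const r of records) {
    const t = totals.get(r.model) ?? { requests: 0, promptTokens: 0, completionTokens: 0 };
    t.requests += 1;
    t.promptTokens += r.promptTokens;
    t.completionTokens += r.completionTokens;
    totals.set(r.model, t);
  }
  return totals;
}
```

Per-model totals like these give a support ticket something specific to cite when the dashboard number looks out of proportion.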
Issue #3: AI Binding Requires Remote or Latest Tooling in Local Dev
Error: "MiniflareCoreError: wrapped binding module can't be resolved (internal modules only)"
Source: GitHub Issue #6796
Why It Happens: When using Workers AI bindings with Miniflare in local development (particularly with custom Vite plugins), the AI binding requires external workers that aren't properly exposed by older unstable_getMiniflareWorkerOptions. The error occurs when Miniflare can't resolve the internal AI worker module.
Prevention: Use remote bindings for AI in local dev, or update to latest @cloudflare/vite-plugin.
// wrangler.jsonc - Option 1: Use remote AI binding in local dev
{
"ai": { "binding": "AI" },
"dev": {
"remote": true // Use production AI binding locally
}
}
# Option 2: Update to latest tooling
npm install -D @cloudflare/vite-plugin@latest
# Option 3: Use wrangler dev instead of custom Miniflare
npx wrangler dev
Issue #4: Flux Image Generation NSFW Filter False Positives
Error: "AiError: Input prompt contains NSFW content (code 3030)" for innocent prompts
Source: Cloudflare Community Discussion
Why It Happens: Flux image generation models (@cf/black-forest-labs/flux-1-schnell) sometimes trigger false positive NSFW content errors even with innocent single-word prompts like "hamburger". The NSFW filter can be overly sensitive without context.
Prevention: Add descriptive context around potential trigger words instead of using single-word prompts.
// ❌ May trigger error 3030
const risky = await env.AI.run('@cf/black-forest-labs/flux-1-schnell', {
  prompt: 'hamburger', // Single word triggers filter
});
// ✅ Add context to avoid false positives
const safe = await env.AI.run('@cf/black-forest-labs/flux-1-schnell', {
  prompt: 'A photo of a delicious large hamburger on a plate with lettuce and tomato',
  num_steps: 4,
});
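Where prompts come from user input, the added context can be applied automatically before the call. This is a sketch of one possible approach: `addPromptContext` is a hypothetical helper, and both the three-word threshold and the appended wording are illustrative choices rather than documented rules of the filter.

```typescript
// Hypothetical helper: give very short prompts some descriptive context.
// The word threshold and the appended phrase are illustrative assumptions.
function addPromptContext(prompt: string, minWords = 3): string {
  const trimmed = prompt.trim();
  const words = trimmed.split(/\s+/).filter(Boolean);
  if (words.length === 0) throw new Error('Prompt is empty');
  if (words.length >= minWords) return trimmed; // already descriptive enough
  return `${trimmed}, photographed in natural light on a plain background`;
}
```

This does not guarantee the filter will pass a prompt; it only applies the mitigation described above consistently.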
Issue #5: Image Generation Error 1000 - Missing num_steps Parameter
Error: "Error: unexpected type 'int32' with value 'undefined' (code 1000)"
Source: Cloudflare Community Discussion
Why It Happens: Image generation API calls return error code 1000 when the num_steps parameter is not provided, even though documentation suggests it's optional. The parameter is actually required for most Flux models.
Prevention: Always pass num_steps explicitly for image generation models (typically 4 for Flux Schnell).
// ✅ Always include num_steps for image generation
const image = await env.AI.run('@cf/black-forest-labs/flux-1-schnell', {
prompt: 'A beautiful sunset over mountains',
num_steps: 4, // Required - typically 4 for Flux Schnell
});
// Note: FLUX.2 [klein] 4B has fixed steps=4 (cannot be adjusted)
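One way to make sure the parameter is never omitted is to build the inputs through a small helper. This is a sketch under stated assumptions: `withNumSteps` and `ImageInputs` are hypothetical names, and the default of 4 simply follows the guidance above for Flux Schnell.

```typescript
interface ImageInputs {
  prompt: string;
  num_steps?: number;
}

// Hypothetical helper: guarantee num_steps is present and a positive integer.
function withNumSteps(inputs: ImageInputs, defaultSteps = 4): Required<ImageInputs> {
  const steps = inputs.num_steps ?? defaultSteps;
  if (!Number.isInteger(steps) || steps < 1) {
    throw new Error(`num_steps must be a positive integer, got ${steps}`);
  }
  return { prompt: inputs.prompt, num_steps: steps };
}
```

The returned object can be passed directly as the second argument to `env.AI.run`.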
Issue #6: Zod v4 Incompatibility with Structured Output Tools
Error: Syntax errors and failed transpilation when using Stagehand with Zod v4
Source: GitHub Issue #10798
Why It Happens: Stagehand (browser automation) and some structured output examples in Workers AI fail with Zod v4 (now default). The underlying zod-to-json-schema library doesn't yet support Zod v4, causing transpilation failures.
Prevention: Pin Zod to v3 until zod-to-json-schema supports v4.
# Install Zod v3 specifically
npm install zod@3
# Or pin to v3 in package.json (JSON does not allow comments)
{
  "dependencies": {
    "zod": "~3.23.8"
  }
}
Issue #7: AI Gateway Cache Headers for Per-Request Control
Not an error, but an important feature: AI Gateway supports per-request cache control via HTTP headers for custom TTL, cache bypass, and custom cache keys beyond dashboard defaults.
Source: AI Gateway Caching Documentation
Use When: You need different caching behavior for different requests (e.g., 1 hour for expensive queries, skip cache for real-time data).
Implementation: See AI Gateway Integration section below for header usage.
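As a rough illustration of per-request control, the headers can be assembled by a small function. This sketch assumes the header names `cf-aig-cache-ttl`, `cf-aig-skip-cache`, and `cf-aig-cache-key` from the AI Gateway caching documentation; confirm them against the current docs before relying on them. `buildCacheHeaders` and `CacheOptions` are hypothetical names.

```typescript
interface CacheOptions {
  ttlSeconds?: number; // how long the gateway may cache the response
  skipCache?: boolean; // bypass the cache for this request
  cacheKey?: string; // custom key to group equivalent requests
}

// Hypothetical helper: translate options into AI Gateway cache headers.
function buildCacheHeaders(opts: CacheOptions): Record<string, string> {
  const headers: Record<string, string> = {};
  if (opts.skipCache) {
    headers['cf-aig-skip-cache'] = 'true';
    return headers; // a bypass makes TTL and key irrelevant
  }
  if (opts.ttlSeconds !== undefined) {
    if (!Number.isInteger(opts.ttlSeconds) || opts.ttlSeconds < 0) {
      throw new Error('ttlSeconds must be a non-negative integer');
    }
    headers['cf-aig-cache-ttl'] = String(opts.ttlSeconds);
  }
  if (opts.cacheKey) headers['cf-aig-cache-key'] = opts.cacheKey;
  return headers;
}
```

For the two cases mentioned above, `buildCacheHeaders({ ttlSeconds: 3600 })` would cover the expensive query and `buildCacheHeaders({ skipCache: true })` the real-time one.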
API Reference
env.AI.run(
  model: string,
  inputs: ModelInputs,
  options?: { gateway?: { id: string; skipCache?: boolean } }
): Prom
How to use cloudflare-workers-ai on Cursor
AI-first code editor with Composer
Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- Cursor installed and configured on your development machine
- Node.js version 16.0+ with npm package manager (verify with node --version)
- Active project directory or workspace where you want to add cloudflare-workers-ai
Execute installation command
Execute the skills CLI command in your project's root directory to begin installation:
The skills CLI fetches cloudflare-workers-ai from GitHub repository jezweb/claude-skills and configures it for Cursor.
Select Cursor when prompted
The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:
Verify installation
Confirm successful installation by checking the skill directory location:
Reload or restart Cursor to activate cloudflare-workers-ai. Access the skill through slash commands (e.g., /cloudflare-workers-ai) or your agent's skill management interface.
Security & Verification Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.
Use Cases
Task Automation & Efficiency
Automate repetitive workflows and reduce manual effort
Example
Generate reports, summarize documents, draft communications
Save 3-5 hours per week on routine tasks
Knowledge Enhancement
Learn new skills, understand complex topics, get expert guidance
Example
Explain concepts, provide examples, suggest learning resources
Accelerate learning and skill development by 2x
Quality Improvement
Enhance output quality through reviews, suggestions, and refinements
Example
Review drafts, suggest improvements, catch errors
Improve work quality by 30-40% with less effort
Implementation Guide
Prerequisites
- Claude Desktop or compatible AI client with skill support
- Clear understanding of task or problem to solve
- Willingness to iterate and refine outputs
Time Estimate
15-45 minutes depending on use case complexity
Installation Steps
1. Install skill using provided installation command
2. Test with simple use case relevant to your work
3. Evaluate output quality and relevance
4. Iterate on prompts to improve results
5. Integrate into regular workflow if valuable
Common Pitfalls
- ⚠Expecting perfect results without iteration
- ⚠Not providing enough context in prompts
- ⚠Using skill for tasks outside its intended scope
- ⚠Accepting outputs without review and validation
Best Practices
✓ Do
- +Start with clear, specific prompts
- +Provide relevant context and constraints
- +Review and refine all outputs before using
- +Iterate to improve output quality
- +Document successful prompt patterns
✗ Don't
- −Don't use without understanding skill limitations
- −Don't skip validation of outputs
- −Don't share sensitive information in prompts
- −Don't expect skill to replace human judgment
💡 Pro Tips
- ★Be specific about desired format and style
- ★Ask for multiple options to choose from
- ★Request explanations to understand reasoning
- ★Combine AI efficiency with human expertise
When to Use This
✓ Use When
Use when skill capabilities match your task, clear ROI on time saved, and you can validate outputs. Best for repetitive tasks, learning, and quality improvement.
✗ Avoid When
Avoid when task requires deep expertise you can't validate, involves sensitive decisions, or when learning process is more valuable than speed of completion.
Learning Path
1. Familiarize yourself with skill capabilities and limitations
2. Start with low-risk, non-critical tasks
3. Progress to more complex and valuable use cases
4. Build expertise through regular use and experimentation
Ratings
4.7 ★★★★★ (58 reviews)
- ★★★★★ Noor Kim · Dec 8, 2024
Keeps context tight: cloudflare-workers-ai is the kind of skill you can hand to a new teammate without a long onboarding doc.
- ★★★★★Noor Huang· Dec 8, 2024
We added cloudflare-workers-ai from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.
- ★★★★★Li Jain· Dec 8, 2024
cloudflare-workers-ai reduced setup friction for our internal harness; good balance of opinion and flexibility.
- ★★★★★Dhruvi Jain· Dec 4, 2024
cloudflare-workers-ai has been reliable in day-to-day use. Documentation quality is above average for community skills.
- ★★★★★Lucas Abebe· Dec 4, 2024
Registry listing for cloudflare-workers-ai matched our evaluation — installs cleanly and behaves as described in the markdown.
- ★★★★★Naina Dixit· Nov 27, 2024
Registry listing for cloudflare-workers-ai matched our evaluation — installs cleanly and behaves as described in the markdown.
- ★★★★★Noor Rahman· Nov 27, 2024
cloudflare-workers-ai fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.
- ★★★★★Nikhil Bhatia· Nov 27, 2024
cloudflare-workers-ai has been reliable in day-to-day use. Documentation quality is above average for community skills.
- ★★★★★Oshnikdeep· Nov 23, 2024
cloudflare-workers-ai reduced setup friction for our internal harness; good balance of opinion and flexibility.
- ★★★★★Neel Perez· Nov 23, 2024
Keeps context tight: cloudflare-workers-ai is the kind of skill you can hand to a new teammate without a long onboarding doc.