
Cloudflare Workers AI Edge Functions Skill

Deploy AI models and serverless functions to Cloudflare's global edge network with sub-5ms cold starts and 40% edge computing market share. Access 50+ open-source AI models (Llama-2, Whisper, Stable Diffusion) with pay-per-use pricing.

by JSONbored · added 2025-10-16
Harness: Claude Code, Codex, Windsurf, Gemini, Cursor, CLI
Level: advanced · Type: general · Verified: draft
Review first: review before installing.

Open the source and read safety notes before installing.

Prerequisites

  • Cloudflare account with Workers AI enabled (available on Free and Paid plans)
  • Wrangler CLI 3.0+, authenticated with wrangler login for deployment access
  • Node.js 18+
  • @cloudflare/workers-types

Schema details

Install type: package
Reading time: 6 min
Difficulty score: 100
Troubleshooting: Yes
Breaking changes: No

Package metadata

Package verified: Yes
SHA-256: 8cf522b452d3699ef4bc63ebfb8e326609b053e7c7234a44aef1b2b2adeee6d8

Skill and platform metadata

Skill type: general
Skill level: advanced
Verification: draft
Verified at: 2025-10-16
Retrieval sources: https://developers.cloudflare.com/workers-ai/
Tested platforms: Claude, Codex, OpenClaw, Cursor, Windsurf, Gemini

Platform     Support  Install path
claude-code  Native   .claude/skills/<skill-name>/SKILL.md
codex        Native   .agents/skills/<skill-name>/SKILL.md
windsurf     Native   .windsurf/skills/<skill-name>/SKILL.md
gemini       Native   .gemini/skills/<skill-name>/SKILL.md or .agents/skills/<skill-name>/SKILL.md
cursor       Adapter  .cursor/rules/<skill-name>.mdc
cli          Manual   AGENTS.md or tool-specific context file

Full copyable content
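// With @cloudflare/workers-types installed, the stricter `Ai` type can replace `any`.
export interface Env {
  AI: any;
}

interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== 'POST') {
      return new Response('Method not allowed', { status: 405, headers: { allow: 'POST' } });
    }

    // A malformed body would otherwise surface as an unhandled exception.
    let messages: ChatMessage[];
    try {
      ({ messages } = await request.json<{ messages: ChatMessage[] }>());
    } catch {
      return new Response('Invalid JSON body', { status: 400 });
    }
    if (!Array.isArray(messages)) {
      return new Response('"messages" must be an array', { status: 400 });
    }

    // stream: true asks the binding for a stream of server-sent events.
    const response = await env.AI.run('@cf/meta/llama-2-7b-chat-int8', {
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        ...messages,
      ],
      stream: true,
    });

    return new Response(response, {
      headers: {
        'content-type': 'text/event-stream',
        'cache-control': 'no-cache',
      },
    });
  },
};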
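
A client can consume the streamed response roughly as follows. This is a sketch, not part of the skill itself: it assumes the stream carries server-sent events of the form data: {"response":"..."} and ends with data: [DONE], and the helper names extractTokens and collectStream are hypothetical. Check the model's documented output format before relying on it.

```typescript
// Hypothetical helper: pulls text fragments out of complete SSE lines.
// Assumes each event line looks like `data: {"response":"..."}` and that
// the stream ends with `data: [DONE]`.
export function extractTokens(sseText: string): string[] {
  const tokens: string[] = [];
  for (const line of sseText.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue; // skip blank separator lines
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") break; // end-of-stream marker
    try {
      const parsed = JSON.parse(payload) as { response?: unknown };
      if (typeof parsed.response === "string") tokens.push(parsed.response);
    } catch {
      // Skip lines that are not valid JSON.
    }
  }
  return tokens;
}

// Reads a response body to the end and returns the concatenated text.
// Only complete lines are parsed; a partial line stays in the buffer
// until the next chunk arrives.
export async function collectStream(body: ReadableStream<Uint8Array>): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let text = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lastNewline = buffer.lastIndexOf("\n");
    if (lastNewline === -1) continue;
    text += extractTokens(buffer.slice(0, lastNewline)).join("");
    buffer = buffer.slice(lastNewline + 1);
  }
  text += extractTokens(buffer).join("");
  return text;
}
```

In a browser or Node 18+ client, collectStream would be called with the body of the fetch response returned by the Worker.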

About this resource

Workers AI includes 10,000 free Neurons per day, integrates with D1, R2, and KV storage, and deploys to 275+ cities worldwide.

Content

Cloudflare Workers AI Edge Functions Skill

What This Skill Enables

Claude can build and deploy AI-powered serverless functions on Cloudflare's global edge network, spanning 275+ cities with sub-5ms cold start times (10-80x faster than AWS Lambda@Edge). With 40% edge computing market share and 4,000% year-over-year growth in AI inference requests, Cloudflare Workers AI brings machine learning models directly to users worldwide with minimal latency.

Compatibility

Native

  • Claude Code / Claude: native skill usage via SKILL.md.
  • Codex/OpenAI workflows: compatible with Agent Skills-style SKILL.md content as reusable workflow instructions.
  • Gemini CLI: native skill usage via .gemini/skills/<skill-name>/SKILL.md or .agents/skills/<skill-name>/SKILL.md where supported.

Manual Adaptation

  • Cursor: use the generated .cursor/rules/*.mdc adapter for project rules.
  • OpenClaw and similar agents: use the same skill content as a reusable prompt/workflow file when native skill import is unavailable.

Prerequisites

Required:

  • Claude Pro subscription or Claude Code CLI
  • Cloudflare account (free tier available)
  • Wrangler CLI installed (npm install -g wrangler)
  • Basic understanding of JavaScript/TypeScript

What Claude handles automatically:

  • Writing Workers code with TypeScript types
  • Configuring wrangler.toml for deployments
  • Implementing AI model bindings (Llama-2, Whisper, Stable Diffusion)
  • Setting up D1 database and R2 storage integrations
  • Managing environment variables and secrets
  • Deploying to Cloudflare's edge network
  • Optimizing for V8 isolate performance

How to Use This Skill

Deploy a Basic Edge Function

Prompt: "Create a Cloudflare Worker that responds to HTTP requests with JSON data and deploys to the edge."

Claude will:

  1. Generate a Worker with a fetch handler
  2. Create the wrangler.toml configuration
  3. Set up TypeScript types for Request/Response
  4. Add error handling and CORS headers
  5. Deploy with wrangler deploy (wrangler publish is the deprecated pre-3.0 name)
  6. Provide the deployed Worker URL
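
The kind of Worker these steps produce might look like the sketch below. It is illustrative only: the wildcard CORS origin, the GET-only routing, and the response shape are assumptions chosen for the example, not output the skill guarantees.

```typescript
// Headers assumed for a public API; tighten the allowed origin for real use.
const CORS_HEADERS: Record<string, string> = {
  "access-control-allow-origin": "*",
  "access-control-allow-methods": "GET, OPTIONS",
  "access-control-allow-headers": "content-type",
};

// Small helper so every response carries JSON and CORS headers.
function json(body: unknown, status = 200): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "content-type": "application/json", ...CORS_HEADERS },
  });
}

const worker = {
  async fetch(request: Request): Promise<Response> {
    try {
      if (request.method === "OPTIONS") {
        // CORS preflight: headers only, no body.
        return new Response(null, { status: 204, headers: CORS_HEADERS });
      }
      if (request.method !== "GET") {
        return json({ error: "Method not allowed" }, 405);
      }
      const url = new URL(request.url);
      return json({ ok: true, path: url.pathname });
    } catch (err) {
      // Turn unexpected failures into a JSON 500 instead of an opaque error page.
      return json({ error: err instanceof Error ? err.message : "Unknown error" }, 500);
    }
  },
};

export default worker;
```

Running wrangler dev exercises the handler locally before it is deployed.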

AI Model Integration (Llama-2 Chat)

Prompt: "Build a Cloudflare Worker that uses Llama-2 to generate chat responses. Accept POST requests with user messages and stream the AI responses back."

Claude will:

  1. Configure AI binding in wrangler.toml
  2. Implement streaming response with ReadableStream
  3. Add proper prompt formatting for Llama-2
  4. Set up rate limiting to control costs
  5. Include request validation and error handling
  6. Deploy with Workers AI binding enabled
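
Request validation and prompt assembly (steps 3 and 5) could be sketched as below. The limits MAX_MESSAGES and MAX_CHARS and the function names are hypothetical, and rejecting client-supplied system messages is a design choice made for this example so that callers cannot override the system prompt.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

type Validation =
  | { ok: true; messages: ChatMessage[] }
  | { ok: false; error: string };

const MAX_MESSAGES = 20; // hypothetical limit to bound prompt size and cost
const MAX_CHARS = 4000; // hypothetical per-message limit

// Checks an already-parsed JSON body and returns either clean messages or an error.
export function validateMessages(body: unknown): Validation {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "Body must be a JSON object" };
  }
  const messages = (body as { messages?: unknown }).messages;
  if (!Array.isArray(messages) || messages.length === 0) {
    return { ok: false, error: '"messages" must be a non-empty array' };
  }
  if (messages.length > MAX_MESSAGES) {
    return { ok: false, error: `At most ${MAX_MESSAGES} messages are allowed` };
  }
  const clean: ChatMessage[] = [];
  for (const item of messages) {
    if (typeof item !== "object" || item === null) {
      return { ok: false, error: "Each message must be an object" };
    }
    const { role, content } = item as { role?: unknown; content?: unknown };
    if (role !== "user" && role !== "assistant") {
      return { ok: false, error: 'role must be "user" or "assistant"' };
    }
    if (typeof content !== "string" || content.length === 0 || content.length > MAX_CHARS) {
      return { ok: false, error: `content must be 1 to ${MAX_CHARS} characters` };
    }
    clean.push({ role: role as ChatMessage["role"], content });
  }
  return { ok: true, messages: clean };
}

// Builds the input object passed to env.AI.run for a streamed chat completion.
export function buildChatInput(messages: ChatMessage[]) {
  return {
    messages: [{ role: "system", content: "You are a helpful assistant." }, ...messages],
    stream: true,
  };
}
```

A handler would return a 400 response with the error text when ok is false, and otherwise pass buildChatInput(result.messages) to the AI binding.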

Image Generation with Stable Diffusion

Prompt: "Create an edge function that generates images using Stable Diffusion XL. Accept a text prompt via API and return the generated image URL stored in R2."

Claude will:

  1. Set up Workers AI binding for Stable Diffusion
  2. Configure R2 bucket for image storage
  3. Implement image generation with proper parameters
  4. Upload generated images to R2 with public URLs
  5. Add caching headers for CDN optimization
  6. Include usage analytics with D1 database
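
Steps 1 to 4 might come together as in the sketch below. The binding names AI and IMAGES, the PUBLIC_BASE_URL variable, and the key layout are assumptions for illustration. The structural AiLike and BucketLike types stand in for the Ai and R2Bucket types from @cloudflare/workers-types so the example stays self-contained, and the model is assumed to return the PNG bytes as a stream or byte array.

```typescript
// Minimal stand-ins for the Workers AI and R2 bindings (assumed shapes).
interface AiLike {
  run(model: string, inputs: { prompt: string }): Promise<ReadableStream<Uint8Array> | Uint8Array>;
}
interface BucketLike {
  put(
    key: string,
    value: ArrayBuffer,
    options?: { httpMetadata?: { contentType?: string } },
  ): Promise<unknown>;
}
interface Env {
  AI: AiLike;
  IMAGES: BucketLike; // hypothetical R2 binding name
  PUBLIC_BASE_URL: string; // hypothetical: public domain attached to the bucket
}

// Generates an image for the prompt, stores it in R2, and returns its public URL.
export async function generateAndStore(
  prompt: string,
  env: Env,
  key: string = `generated/${crypto.randomUUID()}.png`,
): Promise<string> {
  const image = await env.AI.run("@cf/stabilityai/stable-diffusion-xl-base-1.0", { prompt });
  // Buffer the image so the bucket receives a body of known length.
  const bytes = await new Response(image).arrayBuffer();
  await env.IMAGES.put(key, bytes, { httpMetadata: { contentType: "image/png" } });
  return `${env.PUBLIC_BASE_URL}/${key}`;
}
```

The returned URL only resolves if the bucket is actually exposed at PUBLIC_BASE_URL; otherwise the Worker would need a route that reads the object back from R2.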

Real-Time Translation API

Prompt: "Build a translation API using Cloudflare Workers AI that detects the source language and translates to the target language. Support 50+ languages with edge caching."

Claude will:

  1. Use Workers AI translation models
  2. Implement language detection
  3. Set up KV namespace for translation caching
  4. Add rate limiting per IP address
  5. Configure CDN cache for common translations
  6. Include usage metrics and error logging
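
The translation and KV caching steps could be sketched as follows. The TRANSLATIONS binding name, the one-day TTL, and the tr: key prefix are assumptions; the structural types stand in for the real binding types. Language detection is left out because it depends on which model is chosen, so the caller supplies the source language here.

```typescript
// Minimal stand-ins for the Workers AI and KV bindings (assumed shapes).
interface AiLike {
  run(
    model: string,
    inputs: { text: string; source_lang: string; target_lang: string },
  ): Promise<{ translated_text?: string }>;
}
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}
interface Env {
  AI: AiLike;
  TRANSLATIONS: KvLike; // hypothetical KV namespace binding
}

// Hashes the inputs so long texts still fit within KV key length limits.
async function cacheKey(text: string, source: string, target: string): Promise<string> {
  const data = new TextEncoder().encode(`${source}\n${target}\n${text}`);
  const digest = await crypto.subtle.digest("SHA-256", data);
  const hex = Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, "0")).join("");
  return `tr:${hex}`;
}

// Returns a cached translation when present, otherwise calls the model and caches the result.
export async function translate(
  text: string,
  source: string,
  target: string,
  env: Env,
): Promise<{ text: string; cached: boolean }> {
  const key = await cacheKey(text, source, target);
  const hit = await env.TRANSLATIONS.get(key);
  if (hit !== null) return { text: hit, cached: true };

  const result = await env.AI.run("@cf/meta/m2m100-1.2b", {
    text,
    source_lang: source,
    target_lang: target,
  });
  if (typeof result.translated_text !== "string") {
    throw new Error("Model returned no translation");
  }
  // Cache for one day (assumed TTL); KV requires a TTL of at least 60 seconds.
  await env.TRANSLATIONS.put(key, result.translated_text, { expirationTtl: 86400 });
  return { text: result.translated_text, cached: false };
}
```

Because identical requests map to the same key, repeated translations of common phrases are served from KV without another inference call.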

Tips for Best Results

  1. Leverage V8 Isolates: Workers use V8 isolates that start in <5ms and use 1/10th the memory of Node.js. Design stateless functions that take advantage of this architecture.

  2. Use Durable Objects for State: For stateful operations (WebSockets, real-time collaboration), request Durable Objects implementation instead of external databases.

  3. Model Selection: Choose appropriate AI models based on latency requirements. Smaller models like Llama-2-7B offer faster inference than larger variants.

  4. Edge Caching: Implement Cache API or KV storage for frequently accessed data to reduce AI inference costs.

  5. Cost Optimization: Workers AI bills by usage, measured in Neurons, with 10,000 free Neurons per day. Use caching, rate limiting, and request batching to optimize costs.

  6. Geographic Routing: Workers automatically route to the nearest data center. For AI models, consider pinning specific regions for data residency compliance.
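
Tip 2 can be illustrated with a very small Durable Object. This is a sketch under assumptions: the class name VisitCounter and the visits key are hypothetical, and StateLike is a structural stand-in for the DurableObjectState type from @cloudflare/workers-types. A real deployment also needs a Durable Object binding and migration in wrangler.toml.

```typescript
// Structural stand-in for the part of DurableObjectState this sketch uses.
interface StorageLike {
  get(key: string): Promise<unknown>;
  put(key: string, value: unknown): Promise<void>;
}
interface StateLike {
  storage: StorageLike;
}

// Hypothetical Durable Object that keeps a per-object visit counter.
// Each object instance has its own storage, so no external database is needed.
export class VisitCounter {
  state: StateLike;

  constructor(state: StateLike) {
    this.state = state;
  }

  async fetch(_request: Request): Promise<Response> {
    const stored = await this.state.storage.get("visits");
    const current = typeof stored === "number" ? stored : 0;
    const next = current + 1;
    await this.state.storage.put("visits", next);
    return new Response(JSON.stringify({ visits: next }), {
      headers: { "content-type": "application/json" },
    });
  }
}
```

The same pattern extends to WebSocket rooms or collaborative sessions, where the object holds the shared state for one room.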

Common Workflows

Full-Stack AI Application

"Create a complete AI-powered application on Cloudflare:
1. Workers AI for text generation (Llama-2)
2. D1 database for storing conversations
3. R2 for file uploads and generated content
4. KV for session management and caching
5. Pages for frontend deployment
6. Queue for background job processing
Include TypeScript types and deployment scripts."

Content Moderation API

"Build an edge API that:
1. Accepts text content via POST request
2. Uses Workers AI to detect harmful content
3. Classifies content as safe/unsafe with confidence scores
4. Logs results to D1 database
5. Returns moderation decision in <100ms
6. Handles 10,000 requests per minute"

Smart Image CDN

"Create a Cloudflare Worker that:
1. Intercepts image requests
2. Analyzes image with Workers AI (OCR, object detection)
3. Automatically optimizes images for device/bandwidth
4. Stores optimized versions in R2
5. Serves from edge cache on subsequent requests
6. Includes usage analytics and cost tracking"

Real-Time Sentiment Analysis

"Build a WebSocket-based sentiment analysis service:
1. Accept streaming text via WebSocket
2. Process chunks with Workers AI sentiment model
3. Return real-time sentiment scores
4. Store aggregate results in D1
5. Support 1000 concurrent connections
6. Deploy across all Cloudflare edge locations"
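
The scoring part of this workflow (steps 2 and 3) might look like the sketch below; the WebSocket plumbing is omitted. It assumes a text-classification model that returns POSITIVE and NEGATIVE labels with scores, here @cf/huggingface/distilbert-sst-2-int8, and the names scoreChunk and RunningSentiment are hypothetical.

```typescript
interface Classification {
  label: string;
  score: number;
}
// Minimal stand-in for the Workers AI binding (assumed shape).
interface AiLike {
  run(model: string, inputs: { text: string }): Promise<Classification[]>;
}

// Scores one chunk of text on a scale from -1 (negative) to 1 (positive).
export async function scoreChunk(text: string, ai: AiLike): Promise<number> {
  const results = await ai.run("@cf/huggingface/distilbert-sst-2-int8", { text });
  const scoreFor = (label: string): number =>
    results.find((r) => r.label.toUpperCase() === label)?.score ?? 0;
  return scoreFor("POSITIVE") - scoreFor("NEGATIVE");
}

// Keeps a running average so each new chunk yields an updated overall score.
export class RunningSentiment {
  count = 0;
  total = 0;

  add(score: number): number {
    this.count += 1;
    this.total += score;
    return this.total / this.count;
  }
}
```

Each incoming message would be passed to scoreChunk, and the value returned by add sent back to the client as the current aggregate.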

Troubleshooting

Issue: Worker exceeds CPU time limits Solution: The Free plan allows 10 ms of CPU time per request; paid plans default to 30 s. Time spent waiting on network calls, including AI inference, does not count as CPU time. Optimize by using streaming responses and reducing synchronous processing, or move to a paid plan for longer execution.

Issue: AI model inference too slow Solution: Use smaller model variants (e.g., Llama-2-7B instead of 13B), queue non-urgent requests with Cloudflare Queues, or cache common responses in KV storage.

Issue: CORS errors when calling from frontend Solution: Add proper CORS headers in Worker response. Ask Claude to include OPTIONS method handler and appropriate Access-Control-* headers.

Issue: Workers AI billing concerns Solution: Implement rate limiting with Durable Objects or KV, cache responses aggressively, use smaller models for simpler tasks, and set up billing alerts in Cloudflare dashboard.
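
A KV-backed limiter of the kind mentioned above could be sketched like this. It is a fixed-window counter with assumed defaults (60 requests per 60 seconds) and a hypothetical rl: key prefix. Because KV is eventually consistent, the count is approximate under concurrency; a Durable Object is the better fit when exact limits matter.

```typescript
// Minimal stand-in for a KV namespace binding (assumed shape).
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// Returns true if the client is still under its limit for the current window.
export async function allowRequest(
  kv: KvLike,
  clientId: string,
  limit = 60,
  windowSeconds = 60,
  now: number = Date.now(),
): Promise<boolean> {
  // All requests in the same window share one counter key.
  const windowIndex = Math.floor(now / (windowSeconds * 1000));
  const key = `rl:${clientId}:${windowIndex}`;

  const used = Number((await kv.get(key)) ?? "0");
  if (used >= limit) return false;

  // Not atomic: concurrent requests can read the same value and undercount.
  // KV requires a TTL of at least 60 seconds, hence the Math.max.
  await kv.put(key, String(used + 1), {
    expirationTtl: Math.max(60, windowSeconds * 2),
  });
  return true;
}
```

A Worker would call allowRequest before env.AI.run and return a 429 response when it yields false.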

Issue: Cannot access environment variables Solution: Ensure secrets are set with wrangler secret put and bindings are properly configured in wrangler.toml. Access via env.SECRET_NAME in Worker code.
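
As an illustration of reading a secret from env, the sketch below checks a bearer token. The secret name API_TOKEN is hypothetical and would be created with wrangler secret put API_TOKEN; the plain string comparison is kept for brevity and is not constant-time.

```typescript
interface Env {
  API_TOKEN: string; // hypothetical secret, set with `wrangler secret put API_TOKEN`
}

// Returns true when the request carries "Authorization: Bearer <API_TOKEN>".
export function isAuthorized(request: Request, env: Env): boolean {
  const header = request.headers.get("authorization") ?? "";
  const [scheme, token] = header.split(" ");
  return scheme === "Bearer" && token !== undefined && token.length > 0 && token === env.API_TOKEN;
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Secrets and bindings arrive on the env argument, not on process.env.
    if (!isAuthorized(request, env)) {
      return new Response("Unauthorized", { status: 401 });
    }
    return new Response("ok");
  },
};

export default worker;
```

If env.API_TOKEN is undefined at runtime, the secret was most likely set for a different Worker name or environment than the one deployed.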

Issue: Cold start latency for complex Workers Solution: Minimize dependencies (Workers bundle size should be <1MB), use dynamic imports for optional features, and consider splitting into multiple Workers for different routes.

Features

  • Sub-5ms cold starts with V8 isolates
  • 50+ open-source AI models in catalog, including Llama-2, Whisper, and Stable Diffusion
  • Deploy to 275+ cities globally
  • Pay-per-use pricing with 10,000 free Neurons/day
  • Integrated with D1, R2, KV, Queues, Durable Objects
  • Real-time streaming responses with Server-Sent Events (SSE) support for AI model outputs, enabling progressive response delivery and improved user experience for long-running AI operations

Use Cases

  • Edge AI inference with minimal latency
  • Serverless APIs with global distribution
  • Content moderation APIs with real-time classification and analysis
  • Multi-language translation services with edge caching
  • AI-powered image generation and processing pipelines

#cloudflare #edge-computing #ai #serverless #workers
