What is Workers AI
Cloudflare Workers AI is a service for running AI models on Cloudflare's global edge network directly from Workers. Inference runs on GPUs in Cloudflare data centers close to the user. This keeps latency low because requests are not routed to a distant, centralized cloud region.
Supported Models
Text Generation (LLM)
| Model | Features |
|---|---|
| Llama 3 8B | General purpose, high performance |
| Mistral 7B | Fast, efficient |
| Gemma 7B | Developed by Google, lightweight |
| Phi-2 | Developed by Microsoft, compact |
Image & Vision
| Model | Use Case |
|---|---|
| Stable Diffusion XL | Image generation |
| LLaVA | Image understanding |
| CLIP | Image classification |
Audio
| Model | Use Case |
|---|---|
| Whisper | Speech recognition |
| TTS | Text-to-speech |
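The tables list display names, but `env.AI.run()` takes a catalog ID. The constants below collect the IDs used in the examples in this article. The other models in the tables have their own IDs in the Workers AI model catalog. `MODELS` and `isModelId` are hypothetical names for this sketch, and the regex only checks the `@namespace/publisher/model` shape, not whether the model exists.

```typescript
// Model IDs taken from the code examples in this article.
const MODELS = {
  llama3_8b: "@cf/meta/llama-3-8b-instruct",
  sdxl: "@cf/stabilityai/stable-diffusion-xl-base-1.0",
  llava: "@cf/llava-hf/llava-1.5-7b-hf",
  whisper: "@cf/openai/whisper",
  bgeBaseEn: "@cf/baai/bge-base-en-v1.5",
} as const;

type ModelKey = keyof typeof MODELS;

// Hypothetical sanity check: catches IDs missing the "@cf/" style prefix
// or a path segment (e.g. "llama-3-8b-instruct" on its own).
function isModelId(id: string): boolean {
  return /^@[a-z]+\/[^/]+\/[^/]+$/.test(id);
}
```

Keeping IDs in one object means a typo shows up as a compile error (`MODELS.llama3_8c`) instead of a failed request at runtime.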
Basic Usage
Text Generation
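```typescript
// Binding types such as `Ai` come from @cloudflare/workers-types
// (or are generated by `npx wrangler types`).
interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Non-streaming call: resolves to an object holding the generated text.
    const response = await env.AI.run('@cf/meta/llama-3-8b-instruct', {
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: 'Tell me 3 benefits of TypeScript' }
      ]
    });
    return Response.json(response);
  }
};
```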
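To unit-test the model call outside the Workers runtime, you can make the code depend on a minimal interface instead of the full `Ai` binding type. The sketch below does this. `AiRunner` and `askAssistant` are hypothetical names. The code assumes the non-streaming output is an object with a `response` string field.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical minimal view of the AI binding: only the run() method.
interface AiRunner {
  run(model: string, inputs: { messages: ChatMessage[] }): Promise<unknown>;
}

// Sends a system + user prompt and returns the generated text.
// Assumes the non-streaming output has the shape { response: string }.
async function askAssistant(ai: AiRunner, question: string): Promise<string> {
  const output = await ai.run("@cf/meta/llama-3-8b-instruct", {
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: question },
    ],
  });
  const text = (output as { response?: unknown } | null)?.response;
  if (typeof text !== "string") {
    throw new Error("Unexpected model output: missing 'response' string");
  }
  return text;
}
```

In a Worker you would call `askAssistant(env.AI, question)`. Depending on your `@cloudflare/workers-types` version, a cast may be needed because the real `run()` signature is generic. In tests, a stub object with an async `run()` is enough.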
Streaming Response
```typescript
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // With stream: true, run() resolves to a ReadableStream of server-sent events.
    const stream = await env.AI.run('@cf/meta/llama-3-8b-instruct', {
      messages: [
        { role: 'user', content: 'Explain the future of AI' }
      ],
      stream: true
    });
    // Pass the stream straight through to the client.
    return new Response(stream, {
      headers: { 'content-type': 'text/event-stream' }
    });
  }
};
```
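A client that reads this response has to reassemble the tokens from the event stream. The sketch below does this. It assumes each event is a line of the form `data: {"response":"..."}` and that the stream ends with `data: [DONE]`. `parseSseLine` and `collectStreamText` are hypothetical helpers, not part of the Workers AI API.

```typescript
type SseResult = { done: boolean; token: string };

// Parses one SSE line. Lines that are not "data:" lines (blank separators,
// comments) produce no token.
function parseSseLine(rawLine: string): SseResult {
  const line = rawLine.trim();
  if (!line.startsWith("data:")) return { done: false, token: "" };
  const payload = line.slice("data:".length).trim();
  if (payload === "[DONE]") return { done: true, token: "" };
  const event = JSON.parse(payload) as { response?: string };
  return { done: false, token: event.response ?? "" };
}

// Reads the whole stream and concatenates the generated tokens.
async function collectStreamText(stream: ReadableStream<Uint8Array>): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let text = "";
  for (;;) {
    const { done, value } = await reader.read();
    // Decode incrementally; flush the decoder once the stream has ended.
    buffer += done ? decoder.decode() : decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    // An event can be split across chunks, so keep the last partial line
    // for the next read unless the stream is finished.
    buffer = done ? "" : (lines.pop() ?? "");
    for (const line of lines) {
      const result = parseSseLine(line);
      if (result.done) {
        await reader.cancel();
        return text;
      }
      text += result.token;
    }
    if (done) return text;
  }
}
```

In a browser or another Worker, you would pass `(await fetch(url)).body` to `collectStreamText`. To render tokens as they arrive, append `result.token` to the page inside the loop instead of collecting it.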
Image Generation
```typescript
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // SDXL returns the generated image as binary PNG data.
    const response = await env.AI.run('@cf/stabilityai/stable-diffusion-xl-base-1.0', {
      prompt: 'A futuristic city with flying cars, cyberpunk style',
      num_steps: 20 // diffusion steps: more steps trade speed for detail
    });
    return new Response(response, {
      headers: { 'content-type': 'image/png' }
    });
  }
};
```
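The example hard-codes the prompt. A Worker that takes the prompt from the caller should validate it before spending GPU time. The sketch below reads `?prompt=` and `?steps=` from the request URL. Both parameter names and the `imageInputsFromUrl` helper are hypothetical. The cap of 20 steps is an assumption based on the value used above, so check the model's documented limit.

```typescript
interface ImageInputs { prompt: string; num_steps: number }

const DEFAULT_PROMPT = "A futuristic city with flying cars, cyberpunk style";
// Assumption: 20 is the upper limit for num_steps on this model.
const MAX_STEPS = 20;

// Hypothetical helper: builds model inputs from query parameters,
// falling back to defaults and clamping steps to the range 1..MAX_STEPS.
function imageInputsFromUrl(rawUrl: string): ImageInputs {
  const params = new URL(rawUrl).searchParams;
  const prompt = params.get("prompt")?.trim() || DEFAULT_PROMPT;
  const requested = Number.parseInt(params.get("steps") ?? "", 10);
  const num_steps = Number.isNaN(requested)
    ? MAX_STEPS
    : Math.min(Math.max(requested, 1), MAX_STEPS);
  return { prompt, num_steps };
}
```

Inside `fetch`, the call then becomes `env.AI.run('@cf/stabilityai/stable-diffusion-xl-base-1.0', imageInputsFromUrl(request.url))`.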
Image Analysis
```typescript
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Read the uploaded image from the request body.
    const imageData = await request.arrayBuffer();
    const response = await env.AI.run('@cf/llava-hf/llava-1.5-7b-hf', {
      // The model takes the image as a plain array of byte values (0-255).
      image: [...new Uint8Array(imageData)],
      prompt: 'What is in this image?',
      max_tokens: 512
    });
    return Response.json(response);
  }
};
```
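The expression `[...new Uint8Array(imageData)]` copies the upload into an ordinary `number[]`, one element per byte. The sketch below wraps that conversion with basic checks, so empty or oversized uploads are rejected before they reach the model. The helper name and the 5 MiB limit are hypothetical choices for this sketch, not documented Workers AI limits.

```typescript
// Hypothetical upload limit for this sketch; not a documented Workers AI limit.
const MAX_UPLOAD_BYTES = 5 * 1024 * 1024;

// Converts an uploaded body into the number[] form used in the example,
// rejecting empty or oversized uploads.
function toByteArray(body: ArrayBufferLike, maxBytes: number = MAX_UPLOAD_BYTES): number[] {
  if (body.byteLength === 0) throw new Error("Empty upload");
  if (body.byteLength > maxBytes) throw new Error(`Upload exceeds ${maxBytes} bytes`);
  // Array.from gives the same result as [...new Uint8Array(body)].
  return Array.from(new Uint8Array(body));
}
```

The same helper works for the audio upload in the speech-recognition example. In a Worker, catch the error and return a `400` response instead of letting the request fail.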
Speech Recognition
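```typescript
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Read the uploaded audio file and pass its bytes to Whisper.
    const audioData = await request.arrayBuffer();
    const response = await env.AI.run('@cf/openai/whisper', {
      audio: [...new Uint8Array(audioData)]
    });
    return Response.json({
      text: response.text,
      // The output also includes per-word timestamps (words) and WebVTT captions (vtt).
      word_count: response.word_count
    });
  }
};
```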
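In addition to `text`, Whisper can return word-level timing. This is useful for captions or for linking into a recording. The sketch below assumes the output includes an optional `words` array of `{ word, start, end }` objects with times in seconds. Check the model's output schema before relying on it. `formatWordTimestamps` is a hypothetical helper.

```typescript
// Assumed shape of one entry in Whisper's optional `words` output.
interface WhisperWord { word: string; start: number; end: number }

// Formats word-level timestamps as "[start-end] word" lines,
// with times in seconds rounded to two decimals.
function formatWordTimestamps(words: WhisperWord[] | undefined): string {
  if (!words || words.length === 0) return "";
  return words
    .map((w) => `[${w.start.toFixed(2)}-${w.end.toFixed(2)}] ${w.word.trim()}`)
    .join("\n");
}
```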
Integration with Vectorize
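```typescript
// Env needs the AI binding plus a Vectorize index bound as VECTORIZE.
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const question = await request.text();
    // 1. Embed the question; `data` holds one vector per input text.
    const embedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
      text: question
    });
    // 2. Find the 3 nearest stored vectors. query() returns an object with a
    //    `matches` array, and metadata is only included when requested
    //    ('all' for Vectorize V2 indexes; legacy V1 indexes take `true`).
    const { matches } = await env.VECTORIZE.query(embedding.data[0], {
      topK: 3,
      returnMetadata: 'all'
    });
    // Assumes each vector was stored with its source text in metadata.text.
    const context = matches.map(m => String(m.metadata?.text ?? '')).join('\n');
    // 3. Generate an answer grounded in the retrieved passages.
    const response = await env.AI.run('@cf/meta/llama-3-8b-instruct', {
      messages: [
        { role: 'system', content: `Answer using the following context:\n${context}` },
        { role: 'user', content: question }
      ]
    });
    return Response.json(response);
  }
};
```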
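The RAG query assumes the index already contains vectors whose metadata has a `text` field. The sketch below covers the indexing side. It embeds a batch of documents with the same model and stores each vector with its source text as metadata. It depends on minimal, hypothetical interfaces (`EmbeddingRunner`, `VectorUpserter`) so that it can run against stubs. In a Worker, you would pass `env.AI` and `env.VECTORIZE`, whose `upsert()` method inserts or replaces vectors by ID. The index must be created with the embedding model's dimension, which is 768 for bge-base-en-v1.5.

```typescript
// Hypothetical minimal views of the two bindings used for indexing.
interface EmbeddingRunner {
  run(model: string, inputs: { text: string[] }): Promise<{ data: number[][] }>;
}
interface StoredVector { id: string; values: number[]; metadata: { text: string } }
interface VectorUpserter {
  upsert(vectors: StoredVector[]): Promise<unknown>;
}

// Embeds each document and stores it with its original text as metadata,
// which is the field the query example reads back as m.metadata.text.
async function indexDocuments(
  ai: EmbeddingRunner,
  index: VectorUpserter,
  docs: { id: string; text: string }[],
): Promise<number> {
  const { data } = await ai.run("@cf/baai/bge-base-en-v1.5", {
    text: docs.map((d) => d.text),
  });
  if (data.length !== docs.length) {
    throw new Error("Embedding count does not match document count");
  }
  await index.upsert(
    docs.map((d, i) => ({ id: d.id, values: data[i], metadata: { text: d.text } })),
  );
  return docs.length;
}
```

Embedding the whole batch in one `run()` call keeps the number of model invocations low. Very large corpora would still need to be split into several batches.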
Pricing
Pay-as-you-go:
- Text generation: $0.011 per 1,000 neurons (a neuron is Cloudflare's unit for measuring GPU compute)
- Image generation: $0.01 per image
- Speech recognition: $0.01 per minute
Free tier:
- 10,000 neurons per day at no charge
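Applying the neuron figures above to a day's usage is a subtraction and a rate multiplication. The sketch assumes that the daily free allowance is deducted first and that the remainder is billed at the per-neuron rate. `dailyCostUsd` is a hypothetical helper for estimates, not a billing API.

```typescript
// Figures from the pricing list above.
const USD_PER_1000_NEURONS = 0.011;
const FREE_NEURONS_PER_DAY = 10000;

// Estimated pay-as-you-go charge for one day's neuron usage, assuming the
// free allowance is deducted before the per-neuron rate applies.
function dailyCostUsd(neuronsUsed: number): number {
  const billable = Math.max(0, neuronsUsed - FREE_NEURONS_PER_DAY);
  return (billable / 1000) * USD_PER_1000_NEURONS;
}
```

For example, 50,000 neurons in a day leave 40,000 billable neurons: 40 × $0.011 = $0.44.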
Deployment
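Add the AI binding to `wrangler.toml`. The binding name is what exposes the model runner to your code as `env.AI`:

```toml
[ai]
binding = "AI"
```

Then deploy the Worker:

```shell
npx wrangler deploy
```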
Use Cases
✓ Chatbots
✓ Content generation
✓ Image processing pipelines
✓ Audio transcription
✓ Retrieval Augmented Generation (RAG)
✓ Content moderation
Summary
Cloudflare Workers AI runs AI inference on Cloudflare's global network, close to users. Text, image, audio, and embedding models are all called through the same `env.AI.run()` API. Adding a model to an existing Worker only requires an `[ai]` binding in `wrangler.toml`.