Workers AI Provider¶
Module: bernstein.core.routing.cloudflare_ai Class: WorkersAIProvider
Cloudflare Workers AI provides free-tier LLM models that Bernstein can use for task decomposition, planning, manager decisions, and structured output generation. This lets you run the orchestrator's internal LLM calls at zero cost.
Available models¶
All models listed below are free on Workers AI:
| Model | Context | Speed | Best for |
|---|---|---|---|
@cf/meta/llama-3.1-70b-instruct | 131,072 | Medium | Planning, decomposition (default) |
@cf/meta/llama-3.1-8b-instruct | 131,072 | Fast | Simple classification, routing |
@cf/mistral/mistral-7b-instruct-v0.2 | 32,768 | Fast | Quick completions |
@cf/google/gemma-7b-it | 8,192 | Fast | Short prompts, simple tasks |
@cf/qwen/qwen1.5-14b-chat | 32,768 | Medium | Multilingual tasks |
Zero-cost planning
Use Workers AI as your internal_llm_provider in bernstein.yaml to eliminate LLM costs for orchestrator-internal calls (task decomposition, priority assignment, plan optimization). Agent execution still uses your configured CLI adapter.
Configuration¶
WorkersAIConfig dataclass fields:
| Field | Type | Default | Description |
|---|---|---|---|
account_id | str | (required) | Cloudflare account ID |
api_token | str | (required) | API token with Workers AI: Run permission |
model | str | "@cf/meta/llama-3.1-70b-instruct" | Model identifier |
max_tokens | int | 4096 | Maximum output tokens |
temperature | float | 0.3 | Sampling temperature |
timeout_seconds | int | 60 | HTTP request timeout |
Usage¶
Text completion¶
from bernstein.core.routing.cloudflare_ai import WorkersAIConfig, WorkersAIProvider
provider = WorkersAIProvider(WorkersAIConfig(
account_id="abc123",
api_token="cf_token_...",
))
response = await provider.complete(
"Decompose this task into 3 subtasks: Add authentication to the API",
system="You are a senior engineering manager planning work for a team.",
)
print(response.text)
print(response.model) # "@cf/meta/llama-3.1-70b-instruct"
print(response.input_tokens) # token count from API
print(response.output_tokens)
print(response.is_free) # True for free-tier models
Structured JSON output¶
schema = {
"type": "object",
"properties": {
"subtasks": {
"type": "array",
"items": {
"type": "object",
"properties": {
"title": {"type": "string"},
"role": {"type": "string"},
"priority": {"type": "integer"},
},
},
},
},
}
result = await provider.structured(
"Decompose: Add OAuth2 login to the web app",
schema=schema,
system="Return a task decomposition as JSON.",
)
# result is a parsed dict matching the schema
print(result["subtasks"])
JSON parsing
The structured() method automatically strips markdown code fences from model output before parsing. If the model returns invalid JSON, a json.JSONDecodeError is raised.
Cost estimation¶
cost = provider.estimate_cost(input_tokens=1000, output_tokens=500)
print(f"${cost:.6f}") # $0.000000 for free models
# List all available models with metadata
models = WorkersAIProvider.available_models()
for name, info in models.items():
print(f"{name}: free={info['free']}, context={info['context']}")
Response type¶
WorkersAIResponse fields:
| Field | Type | Description |
|---|---|---|
text | str | Generated text |
model | str | Model identifier used |
input_tokens | int | Input token count (from API usage) |
output_tokens | int | Output token count |
is_free | bool | Whether this model is on the free tier |
Integration with bernstein.yaml¶
To use Workers AI as the internal scheduler LLM:
# bernstein.yaml
internal_llm_provider: cloudflare_ai
internal_llm_model: "@cf/meta/llama-3.1-70b-instruct"
This routes all orchestrator-internal LLM calls (task decomposition, priority assignment) through Workers AI while agents still use your configured CLI adapter (Claude, Codex, Gemini, etc.).
Cost comparison¶
| Provider | Model | Input cost/1M tokens | Output cost/1M tokens | Planning cost for 50-task run |
|---|---|---|---|---|
| Workers AI | Llama 3.1 70B | $0.00 | $0.00 | $0.00 |
| Workers AI | Llama 3.1 8B | $0.00 | $0.00 | $0.00 |
| Anthropic | Claude Haiku | ~$0.25 | ~$1.25 | ~$0.50 |
| Anthropic | Claude Sonnet | ~$3.00 | ~$15.00 | ~$6.00 |
| OpenAI | GPT-4o-mini | ~$0.15 | ~$0.60 | ~$0.30 |
Hybrid approach
Use Workers AI for planning/decomposition (free) and Claude/Codex/Gemini for actual code generation (paid but high quality). This eliminates orchestrator overhead costs entirely.