Workers AI Provider¶

Module: bernstein.core.routing.cloudflare_ai Class: WorkersAIProvider

Cloudflare Workers AI provides free-tier LLM models that Bernstein can use for task decomposition, planning, manager decisions, and structured output generation. This lets you run the orchestrator's internal LLM calls at zero cost.

Available models¶

All models listed below are free on Workers AI:

Model	Context	Speed	Best for
`@cf/meta/llama-3.1-70b-instruct`	131,072	Medium	Planning, decomposition (default)
`@cf/meta/llama-3.1-8b-instruct`	131,072	Fast	Simple classification, routing
`@cf/mistral/mistral-7b-instruct-v0.2`	32,768	Fast	Quick completions
`@cf/google/gemma-7b-it`	8,192	Fast	Short prompts, simple tasks
`@cf/qwen/qwen1.5-14b-chat`	32,768	Medium	Multilingual tasks

Zero-cost planning

Use Workers AI as your internal_llm_provider in bernstein.yaml to eliminate LLM costs for orchestrator-internal calls (task decomposition, priority assignment, plan optimization). Agent execution still uses your configured CLI adapter.

Configuration¶

WorkersAIConfig dataclass fields:

Field	Type	Default	Description
`account_id`	`str`	(required)	Cloudflare account ID
`api_token`	`str`	(required)	API token with Workers AI: Run permission
`model`	`str`	`"@cf/meta/llama-3.1-70b-instruct"`	Model identifier
`max_tokens`	`int`	`4096`	Maximum output tokens
`temperature`	`float`	`0.3`	Sampling temperature
`timeout_seconds`	`int`	`60`	HTTP request timeout

Usage¶

Text completion¶

from bernstein.core.routing.cloudflare_ai import WorkersAIConfig, WorkersAIProvider

provider = WorkersAIProvider(WorkersAIConfig(
    account_id="abc123",
    api_token="cf_token_...",
))

response = await provider.complete(
    "Decompose this task into 3 subtasks: Add authentication to the API",
    system="You are a senior engineering manager planning work for a team.",
)

print(response.text)
print(response.model)          # "@cf/meta/llama-3.1-70b-instruct"
print(response.input_tokens)   # token count from API
print(response.output_tokens)
print(response.is_free)        # True for free-tier models

Structured JSON output¶

schema = {
    "type": "object",
    "properties": {
        "subtasks": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "role": {"type": "string"},
                    "priority": {"type": "integer"},
                },
            },
        },
    },
}

result = await provider.structured(
    "Decompose: Add OAuth2 login to the web app",
    schema=schema,
    system="Return a task decomposition as JSON.",
)
# result is a parsed dict matching the schema
print(result["subtasks"])

JSON parsing

The structured() method automatically strips markdown code fences from model output before parsing. If the model returns invalid JSON, a json.JSONDecodeError is raised.

Cost estimation¶

cost = provider.estimate_cost(input_tokens=1000, output_tokens=500)
print(f"${cost:.6f}")  # $0.000000 for free models

# List all available models with metadata
models = WorkersAIProvider.available_models()
for name, info in models.items():
    print(f"{name}: free={info['free']}, context={info['context']}")

Response type¶

WorkersAIResponse fields:

Field	Type	Description
`text`	`str`	Generated text
`model`	`str`	Model identifier used
`input_tokens`	`int`	Input token count (from API usage)
`output_tokens`	`int`	Output token count
`is_free`	`bool`	Whether this model is on the free tier

Integration with bernstein.yaml¶

To use Workers AI as the internal scheduler LLM:

# bernstein.yaml
internal_llm_provider: cloudflare_ai
internal_llm_model: "@cf/meta/llama-3.1-70b-instruct"

This routes all orchestrator-internal LLM calls (task decomposition, priority assignment) through Workers AI while agents still use your configured CLI adapter (Claude, Codex, Gemini, etc.).

Cost comparison¶

Provider	Model	Input cost/1M tokens	Output cost/1M tokens	Planning cost for 50-task run
Workers AI	Llama 3.1 70B	$0.00	$0.00	$0.00
Workers AI	Llama 3.1 8B	$0.00	$0.00	$0.00
Anthropic	Claude Haiku	~$0.25	~$1.25	~$0.50
Anthropic	Claude Sonnet	~$3.00	~$15.00	~$6.00
OpenAI	GPT-4o-mini	~$0.15	~$0.60	~$0.30

Hybrid approach

Use Workers AI for planning/decomposition (free) and Claude/Codex/Gemini for actual code generation (paid but high quality). This eliminates orchestrator overhead costs entirely.