Custom Models
Add custom providers and models (Ollama, vLLM, LM Studio, proxies) via ~/.pi/agent/models.json.
Table of Contents
Section titled “Table of Contents”- Minimal Example
- Full Example
- Supported APIs
- Provider Configuration
- Model Configuration
- Overriding Built-in Providers
- Per-model Overrides
- Anthropic Messages Compatibility
- OpenAI Compatibility
Minimal Example
Section titled “Minimal Example”For local models (Ollama, LM Studio, vLLM), only id is required per model:
{ "providers": { "ollama": { "baseUrl": "http://localhost:11434/v1", "api": "openai-completions", "apiKey": "ollama", "models": [ { "id": "llama3.1:8b" }, { "id": "qwen2.5-coder:7b" } ] } }}The apiKey is required but Ollama ignores it, so any value works.
Some OpenAI-compatible servers do not understand the developer role used for reasoning-capable models. For those providers, set compat.supportsDeveloperRole to false so pi sends the system prompt as a system message instead. If the server also does not support reasoning_effort, set compat.supportsReasoningEffort to false too.
You can set compat at the provider level to apply to all models, or at the model level to override a specific model. This commonly applies to Ollama, vLLM, SGLang, and similar OpenAI-compatible servers.
{ "providers": { "ollama": { "baseUrl": "http://localhost:11434/v1", "api": "openai-completions", "apiKey": "ollama", "compat": { "supportsDeveloperRole": false, "supportsReasoningEffort": false }, "models": [ { "id": "gpt-oss:20b", "reasoning": true } ] } }}Full Example
Section titled “Full Example”Override defaults when you need specific values:
{ "providers": { "ollama": { "baseUrl": "http://localhost:11434/v1", "api": "openai-completions", "apiKey": "ollama", "models": [ { "id": "llama3.1:8b", "name": "Llama 3.1 8B (Local)", "reasoning": false, "input": ["text"], "contextWindow": 128000, "maxTokens": 32000, "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 } } ] } }}The file reloads each time you open /model. Edit during session; no restart needed.
Google AI Studio Example
Section titled “Google AI Studio Example”Use google-generative-ai with a baseUrl to add models from Google AI Studio, including custom Gemma 4 entries:
{ "providers": { "my-google": { "baseUrl": "https://generativelanguage.googleapis.com/v1beta", "api": "google-generative-ai", "apiKey": "$GEMINI_API_KEY", "models": [ { "id": "gemma-4-31b-it", "name": "Gemma 4 31B", "input": ["text", "image"], "contextWindow": 262144, "reasoning": true } ] } }}The baseUrl is required when adding custom models to the google-generative-ai API type.
Supported APIs
Section titled “Supported APIs”| API | Description |
|---|---|
openai-completions | OpenAI Chat Completions (most compatible) |
openai-responses | OpenAI Responses API |
anthropic-messages | Anthropic Messages API |
google-generative-ai | Google Generative AI |
Set api at provider level (default for all models) or model level (override per model).
Provider Configuration
Section titled “Provider Configuration”| Field | Description |
|---|---|
baseUrl | API endpoint URL |
api | API type (see above) |
apiKey | API key (see value resolution below) |
headers | Custom headers (see value resolution below) |
authHeader | Set true to add Authorization: Bearer <apiKey> automatically |
models | Array of model configurations |
modelOverrides | Per-model overrides for built-in models on this provider |
Value Resolution
Section titled “Value Resolution”The apiKey and headers fields support command execution, environment interpolation, and literals:
- Shell command:
"!command"at the start executes the whole value as a command and uses stdout"apiKey": "!security find-generic-password -ws 'anthropic'""apiKey": "!op read 'op://vault/item/credential'" - Environment interpolation:
"$ENV_VAR"or"${ENV_VAR}"uses the value of the named variable. Interpolation works inside larger literals."apiKey": "$MY_API_KEY""apiKey": "${KEY_PREFIX}_${KEY_SUFFIX}"$FOO_BARis the variableFOO_BAR; use${FOO}_BARwhenBARis literal text. Missing environment variables make the value unresolved. - Escapes:
"$$"emits a literal"$";"$!"emits a literal"!"without triggering command execution."apiKey": "$$literal-dollar-prefix""apiKey": "$!literal-bang-prefix" - Literal value: Used directly. Plain uppercase strings such as
MY_API_KEYare literals; use$MY_API_KEYfor environment variables."apiKey": "sk-..."
For models.json, shell commands are resolved at request time. pi intentionally does not apply built-in TTL, stale reuse, or recovery logic for arbitrary commands. Different commands need different caching and failure strategies, and pi cannot infer the right one.
If your command is slow, expensive, rate-limited, or should keep using a previous value on transient failures, wrap it in your own script or command that implements the caching or TTL behavior you want.
/model availability checks use configured auth presence and do not execute shell commands.
Custom Headers
Section titled “Custom Headers”{ "providers": { "custom-proxy": { "baseUrl": "https://proxy.example.com/v1", "apiKey": "$MY_API_KEY", "api": "anthropic-messages", "headers": { "x-portkey-api-key": "$PORTKEY_API_KEY", "x-secret": "!op read 'op://vault/item/secret'" }, "models": [...] } }}Model Configuration
Section titled “Model Configuration”| Field | Required | Default | Description |
|---|---|---|---|
id | Yes | — | Model identifier (passed to the API) |
name | No | id | Human-readable model label. Used for matching (--model patterns) and shown as secondary model detail text. |
api | No | provider’s api | Override provider’s API for this model |
reasoning | No | false | Supports extended thinking |
thinkingLevelMap | No | omitted | Maps pi thinking levels to provider values and marks unsupported levels (see below) |
input | No | ["text"] | Input types: ["text"] or ["text", "image"] |
contextWindow | No | 128000 | Context window size in tokens |
maxTokens | No | 16384 | Maximum output tokens |
cost | No | all zeros | {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0} (per million tokens) |
compat | No | provider compat | Provider compatibility overrides. Merged with provider-level compat when both are set. |
Current behavior:
/model,--list-models, and the interactive footer display entries by modelid.- The configured
nameis used for model matching and secondary model detail text. It does not replace the footer/status-bar model id.
Thinking Level Map
Section titled “Thinking Level Map”Use thinkingLevelMap on a model to describe model-specific thinking controls. Keys are pi thinking levels: off, minimal, low, medium, high, xhigh.
Values are tristate:
| Value | Meaning |
|---|---|
| omitted | Level is supported and uses the provider’s default mapping |
| string | Level is supported and this value is sent to the provider |
null | Level is unsupported and hidden/skipped/clamped away |
Example for a model that only supports off, high, and max reasoning:
{ "id": "deepseek-v4-pro", "reasoning": true, "thinkingLevelMap": { "minimal": null, "low": null, "medium": null, "high": "high", "xhigh": "max" }}Example for a model where thinking cannot be disabled:
{ "id": "always-thinking-model", "reasoning": true, "thinkingLevelMap": { "off": null }}Migration: older configs that used compat.reasoningEffortMap should move that mapping to model-level thinkingLevelMap. Use null for levels that should not appear in the UI.
Overriding Built-in Providers
Section titled “Overriding Built-in Providers”Route a built-in provider through a proxy without redefining models:
{ "providers": { "anthropic": { "baseUrl": "https://my-proxy.example.com/v1" } }}All built-in Anthropic models remain available. Existing OAuth or API key auth continues to work.
To merge custom models into a built-in provider, include the models array:
{ "providers": { "anthropic": { "baseUrl": "https://my-proxy.example.com/v1", "apiKey": "$ANTHROPIC_API_KEY", "api": "anthropic-messages", "models": [...] } }}Merge semantics:
- Built-in models are kept.
- Custom models are upserted by
idwithin the provider. - If a custom model
idmatches a built-in modelid, the custom model replaces that built-in model. - If a custom model
idis new, it is added alongside built-in models.
Per-model Overrides
Section titled “Per-model Overrides”Use modelOverrides to customize specific built-in models without replacing the provider’s full model list.
{ "providers": { "openrouter": { "modelOverrides": { "anthropic/claude-sonnet-4": { "name": "Claude Sonnet 4 (Bedrock Route)", "compat": { "openRouterRouting": { "only": ["amazon-bedrock"] } } } } } }}modelOverrides supports these fields per model: name, reasoning, input, cost (partial), contextWindow, maxTokens, headers, compat.
Behavior notes:
modelOverridesare applied to built-in provider models.- Unknown model IDs are ignored.
- You can combine provider-level
baseUrl/headerswithmodelOverrides. - Overriding
namechanges model matching and secondary detail text only; the footer and primary model lists continue to show the modelid. - If
modelsis also defined for a provider, custom models are merged after built-in overrides. A custom model with the sameidreplaces the overridden built-in model entry.
Anthropic Messages Compatibility
Section titled “Anthropic Messages Compatibility”For providers or proxies using api: "anthropic-messages", use compat to control Anthropic-specific request compatibility.
By default pi sends per-tool eager_input_streaming: true. If a proxy or Anthropic-compatible backend rejects that field, set supportsEagerToolInputStreaming to false. Pi will omit tools[].eager_input_streaming and send the legacy fine-grained-tool-streaming-2025-05-14 beta header for tool-enabled requests instead.
Some Anthropic models require adaptive thinking (thinking.type: "adaptive" plus output_config.effort) instead of the legacy budget-based thinking payload. Built-in models set this automatically. For custom providers or aliases that route to those models, set forceAdaptiveThinking to true.
Some Anthropic-compatible providers emit thinking blocks with empty signatures and still expect them on replay. Set allowEmptySignature to true only for those providers; real Anthropic rejects empty thinking signatures.
{ "providers": { "anthropic-proxy": { "baseUrl": "https://proxy.example.com", "api": "anthropic-messages", "apiKey": "$ANTHROPIC_PROXY_KEY", "compat": { "supportsEagerToolInputStreaming": false, "supportsLongCacheRetention": true, "forceAdaptiveThinking": true, "allowEmptySignature": true }, "models": [ { "id": "claude-opus-4-7", "reasoning": true, "input": ["text", "image"] } ] } }}| Field | Description |
|---|---|
supportsEagerToolInputStreaming | Whether the provider accepts per-tool eager_input_streaming. Default: true. Set to false to omit that field and use the legacy fine-grained tool streaming beta header on tool-enabled requests. |
supportsLongCacheRetention | Whether the provider accepts Anthropic long cache retention (cache_control.ttl: "1h") when cache retention is long. Default: true. |
sendSessionAffinityHeaders | Whether to send x-session-affinity from the session id when caching is enabled. Default: auto-detected for known providers. |
supportsCacheControlOnTools | Whether the provider accepts Anthropic-style cache_control markers on tool definitions. Default: true. |
forceAdaptiveThinking | Whether to send adaptive thinking (thinking.type: "adaptive" plus output_config.effort) for this model. Built-in adaptive models set this automatically. Default: false. |
allowEmptySignature | Whether to replay empty thinking signatures as signature: "" instead of converting thinking to text. Default: false. |
OpenAI Compatibility
Section titled “OpenAI Compatibility”For providers with partial OpenAI compatibility, use the compat field.
- Provider-level
compatapplies defaults to all models under that provider. - Model-level
compatoverrides provider-level values for that model.
{ "providers": { "local-llm": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "compat": { "supportsUsageInStreaming": false, "maxTokensField": "max_tokens" }, "models": [...] } }}| Field | Description |
|---|---|
supportsStore | Provider supports store field |
supportsDeveloperRole | Use developer vs system role |
supportsReasoningEffort | Support for reasoning_effort parameter |
supportsUsageInStreaming | Supports stream_options: { include_usage: true } (default: true) |
maxTokensField | Use max_completion_tokens or max_tokens |
requiresToolResultName | Include name on tool result messages |
requiresAssistantAfterToolResult | Insert an assistant message before a user message after tool results |
requiresThinkingAsText | Convert thinking blocks to plain text |
requiresReasoningContentOnAssistantMessages | Include empty reasoning_content on all replayed assistant messages when reasoning is enabled |
thinkingFormat | Use reasoning_effort, openrouter, deepseek, together, zai, qwen, or qwen-chat-template thinking parameters |
cacheControlFormat | Use Anthropic-style cache_control markers on the system prompt, last tool definition, and last user/assistant text content. Currently only anthropic is supported. |
supportsStrictMode | Include the strict field in tool definitions |
supportsLongCacheRetention | Whether the provider accepts long cache retention when cache retention is long: prompt_cache_retention: "24h" for OpenAI prompt caching, or cache_control.ttl: "1h" when cacheControlFormat is anthropic. Default: true. |
openRouterRouting | OpenRouter provider routing preferences. This object is sent as-is in the provider field of the OpenRouter API request. |
vercelGatewayRouting | Vercel AI Gateway routing config for provider selection (only, order) |
openrouter uses reasoning: { effort }. together uses reasoning: { enabled } and also reasoning_effort when supportsReasoningEffort is enabled. qwen uses top-level enable_thinking. Use qwen-chat-template for local Qwen-compatible servers that require chat_template_kwargs.enable_thinking.
cacheControlFormat: "anthropic" is for OpenAI-compatible providers that expose Anthropic-style prompt caching through cache_control markers on text content and tool definitions.
Example:
{ "providers": { "openrouter": { "baseUrl": "https://openrouter.ai/api/v1", "apiKey": "$OPENROUTER_API_KEY", "api": "openai-completions", "models": [ { "id": "openrouter/anthropic/claude-3.5-sonnet", "name": "OpenRouter Claude 3.5 Sonnet", "compat": { "openRouterRouting": { "allow_fallbacks": true, "require_parameters": false, "data_collection": "deny", "zdr": true, "enforce_distillable_text": false, "order": ["anthropic", "amazon-bedrock", "google-vertex"], "only": ["anthropic", "amazon-bedrock"], "ignore": ["gmicloud", "friendli"], "quantizations": ["fp16", "bf16"], "sort": { "by": "price", "partition": "model" }, "max_price": { "prompt": 10, "completion": 20 }, "preferred_min_throughput": { "p50": 100, "p90": 50 }, "preferred_max_latency": { "p50": 1, "p90": 3, "p99": 5 } } } } ] } }}Vercel AI Gateway example:
{ "providers": { "vercel-ai-gateway": { "baseUrl": "https://ai-gateway.vercel.sh/v1", "apiKey": "$AI_GATEWAY_API_KEY", "api": "openai-completions", "models": [ { "id": "moonshotai/kimi-k2.5", "name": "Kimi K2.5 (Fireworks via Vercel)", "reasoning": true, "input": ["text", "image"], "cost": { "input": 0.6, "output": 3, "cacheRead": 0, "cacheWrite": 0 }, "contextWindow": 262144, "maxTokens": 262144, "compat": { "vercelGatewayRouting": { "only": ["fireworks", "novita"], "order": ["fireworks", "novita"] } } } ] } }}