Claude Versatile is a multi-model AI orchestration framework that lets Claude Code act as the primary controller, delegating sub-tasks to external AI models (OpenAI GPT, Grok, future Gemini) through the Model Context Protocol (MCP). I built this system to solve a real problem in AI-assisted development: no single model excels at everything. By keeping Claude in control while routing specialized work to the best-fit model, the framework combines the strengths of multiple AI providers without sacrificing the unified development experience.
*Codex Delegation Demo*
## Two-Layer Architecture
The system splits into two layers, each targeting a different complexity level. Layer 1 handles lightweight, single-shot API calls (code review, web search, generation). Layer 2 runs an autonomous Agent with its own reasoning loop for complex multi-step analysis that would exceed a single API call’s capacity.
```mermaid
flowchart TD
    subgraph Claude["Claude Code (Orchestrator)"]
        CC[Claude Code CLI]
        SK[Skills Layer]
    end
    subgraph L1["Layer 1: Direct API Calls"]
        MCP1[codex MCP Server]
        MCP2[grok MCP Server]
        MCP3[future providers...]
    end
    subgraph L2["Layer 2: Agent Delegation"]
        AMCP[agent MCP Server]
        subgraph Worker["Agent Worker Process"]
            P[Planner]
            CM[Context Manager]
            RT[Read-Only Tools]
        end
    end
    subgraph Models["External AI Models"]
        GPT[OpenAI GPT-5.4]
        GRK[Grok-4]
        GMN[Gemini...]
    end
    CC --> SK
    SK --> MCP1
    SK --> MCP2
    SK --> MCP3
    CC --> AMCP
    AMCP --> Worker
    MCP1 --> GPT
    MCP2 --> GRK
    MCP3 --> GMN
    Worker --> GPT
    Worker --> GRK
    P --> CM
    CM --> RT
```
All external models are strictly read-only. They cannot modify files, run shell commands, or access git. Every code suggestion returns as plain text for Claude to review and apply. This preserves Claude Code’s rewind mechanism for full rollback capability.
## Declarative Provider Framework
Adding a new AI model provider to the system requires roughly 25 lines of code. I designed a defineProvider() lifecycle framework that handles configuration loading, environment variable injection, client creation, and error mapping automatically. Developers only implement the onRegisterTools hook to define their MCP tools.
```javascript
// A complete MCP Server for any OpenAI-compatible API
defineProvider({
  type: "openai",
  name: "claude-versatile-codex",
  version: "0.3.0",
  configFile: "codex.agent.json",
  onRegisterTools(server, ctx) {
    server.tool("codex_chat", schema, async (params) => {
      const result = await ctx.complete({
        model: params.model,
        messages: [{ role: "user", content: params.prompt }],
      });
      return { content: [{ type: "text", text: result.content }] };
    });
  },
});
```
The framework supports two provider types: "openai" for OpenAI-compatible APIs (automatic config and client handling) and "native" for custom SDK integrations (user implements onCreateClient). The lifecycle flows through four stages, each with sensible defaults that can be selectively overridden:
```mermaid
flowchart LR
    A["onLoadConfig"] --> B["onCreateClient"]
    B --> C["onRegisterTools"]
    C --> D["onServerReady"]
```
For OpenAI-compatible providers, the onRegisterTools hook receives a context object with ctx.complete(), a convenience method that encapsulates the full pipeline of message building, completion execution, usage formatting, and error mapping in a single call. Native SDK providers get full control over client creation and tool registration while the framework still handles config loading and server startup.
### Request Flow
When Claude delegates a task, the request flows through a well-defined pipeline. The MCP Server lazily initializes its API client on the first tool call (so the server can start without a valid API key), normalizes the response into a CompletionResult format, and maps any provider-specific errors to user-friendly MCP responses.
```mermaid
sequenceDiagram
    participant U as User
    participant C as Claude Code
    participant M as MCP Server
    participant P as Provider API
    U->>C: "Use codex to review this function"
    C->>C: Select tool: codex_chat
    C->>M: MCP tool call (prompt, model, params)
    M->>M: Load config from .versatile/
    M->>M: Initialize client (lazy)
    M->>P: chat.completions.create()
    P-->>M: CompletionResult (content, usage)
    M-->>C: MCP response (text + usage footer)
    C->>C: Review result, decide next action
    C-->>U: Present findings
```
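The lazy client initialization described above can be sketched roughly as follows. The `makeLazyClient` helper and the `Client` shape are illustrative, not the framework's actual names; the point is that the server registers tools immediately and only pays for (and validates) client construction on the first tool call.

```typescript
// Hypothetical sketch: defer API client construction to first use so the
// MCP Server can start and register tools without a valid API key.
type Client = { complete(prompt: string): Promise<string> };

function makeLazyClient(create: () => Client): () => Client {
  let client: Client | undefined;
  return () => {
    if (!client) client = create(); // first tool call pays the cost
    return client;
  };
}
```

A missing or invalid key then surfaces as a tool-call error rather than a server startup failure.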
## Data-Driven Model Routing
Adding support for a new AI provider requires exactly one line in the route table. The MODEL_ROUTES configuration maps model name prefixes to their API credentials (config file paths and environment variable names). The Agent’s collectEnv() function traverses this table to dynamically load all provider configurations, and the Worker’s createProviderFromEnv() uses resolveModelRoute() to create the correct CompletionProvider at runtime.
```mermaid
graph LR
    M[model name] --> R{MODEL_ROUTES}
    R -->|"grok-*"| G[grok.agent.json]
    R -->|"default"| O[codex.agent.json]
    G --> P[CompletionProvider]
    O --> P
    P --> Planner
```
This means users can switch between GPT-5.4, Grok-4, or any future model by simply passing a model parameter. No code changes, no server restarts. The Planner depends on the CompletionProvider interface rather than any specific SDK, so future non-OpenAI adapters (Gemini, Claude native) can plug in without changing Planner code.
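A minimal sketch of what such a route table and lookup could look like. The entry shape and the env-var names (`XAI_API_KEY`, `OPENAI_API_KEY`) are assumptions for illustration; the real `MODEL_ROUTES` may carry additional fields.

```typescript
// Hypothetical route table: model-name prefixes map to API credentials.
type ModelRoute = { prefix: string; configFile: string; envKey: string };

const MODEL_ROUTES: ModelRoute[] = [
  // Adding a provider is one new entry here.
  { prefix: "grok-", configFile: "grok.agent.json", envKey: "XAI_API_KEY" },
];

const DEFAULT_ROUTE: ModelRoute = {
  prefix: "",
  configFile: "codex.agent.json",
  envKey: "OPENAI_API_KEY",
};

// First matching prefix wins; unknown models fall back to the default.
function resolveModelRoute(model: string): ModelRoute {
  return MODEL_ROUTES.find((r) => model.startsWith(r.prefix)) ?? DEFAULT_ROUTE;
}
```

Because resolution happens per request, switching providers is just a different `model` argument at call time.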
## Autonomous Agent with Adaptive Control
The Agent runs as an independent child process with its own LLM-driven ReAct loop (Think, Act, Observe, Decide). It reads files, searches patterns, builds understanding incrementally, and returns structured analysis. Process isolation means a crashed Agent never takes down the MCP Server.
### Agent ReAct Loop
The MCP Server forks a Worker process via IPC. The Worker drives the Planner, which calls the external LLM through OpenAI function calling (the tools parameter plus tool_calls responses) rather than asking models to hand-write XML or JSON. This matters for reasoning models (gpt-5.4, o1, o3), whose content field is often null while tool_calls is reliably populated.
```mermaid
sequenceDiagram
    participant C as Claude Code
    participant M as MCP Server
    participant W as Worker Process
    participant LLM as External LLM
    C->>M: agent_execute(goal, model, autoMode)
    M->>W: fork() + IPC start
    opt autoMode enabled
        W->>LLM: Prompt with goal + plan tool
        LLM-->>W: tool_calls: plan(estimated_steps)
        W->>W: Set effectiveMax = ceil(estimated * 1.5)
    end
    loop ReAct Loop
        W->>LLM: Goal + context + tool definitions
        LLM-->>W: tool_calls: read_file / search_pattern / ...
        W->>W: Execute read-only tools
        W->>W: Update context (sliding window + summary)
        W->>W: Check: iterations, tokens, repetition
        W->>M: IPC status update
        alt LLM calls done tool
            W->>M: IPC complete(result)
        end
    end
    M-->>C: Formatted result (summary, files, tokens)
```
The Agent’s built-in tool set is strictly read-only: read_file, list_dir, search_pattern, plan (autoMode only), and done. All paths are validated with resolveSafePath() to prevent directory traversal. The Context Manager uses a sliding window with automatic summarization (triggered at 80% of max context, compresses to 50%) to prevent token explosion on long-running tasks.
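In spirit, the path validation can be sketched like this (the real `resolveSafePath()` signature and error handling may differ): resolve the requested path against the workspace root and reject anything that escapes it.

```typescript
import * as path from "node:path";

// Sketch of traversal-safe path resolution: the resolved path must be
// the workspace root itself or strictly inside it.
function resolveSafePath(root: string, requested: string): string {
  const rootAbs = path.resolve(root);
  const resolved = path.resolve(rootAbs, requested);
  if (resolved !== rootAbs && !resolved.startsWith(rootAbs + path.sep)) {
    throw new Error(`Path escapes workspace: ${requested}`);
  }
  return resolved;
}
```

The `startsWith(rootAbs + path.sep)` check closes the classic hole where `/ws-evil` would pass a naive `startsWith("/ws")` test.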
I implemented a two-layer adaptive iteration control system (autoMode) that automatically manages how long the Agent runs:
### L1 Complexity Estimation
The Agent calls a plan tool on its first iteration, outputting an estimated_steps count. The Planner dynamically sets effectiveMaxIterations = ceil(estimated * 1.5). This prevents simple tasks from burning through unnecessary iterations while giving complex tasks room to breathe.
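The rule itself is small enough to state directly (function name here is illustrative):

```typescript
// L1 complexity estimation: the plan tool's estimated_steps sets the
// iteration ceiling with 50% headroom, rounded up.
function effectiveMaxIterations(estimatedSteps: number): number {
  return Math.ceil(estimatedSteps * 1.5);
}
```

So an estimate of 4 steps yields a ceiling of 6 iterations, while an estimate of 20 allows 30.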
### L2 Runtime Guards
L2 guards operate continuously during execution:
- Repetition Detection tracks the last 5 tool calls in a sliding window. Two consecutive identical calls trigger a redirect message. Two redirects force termination.
- Token Budget caps cumulative token consumption (default 100k). The Agent stops gracefully when the budget is exhausted.
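A simplified sketch of the repetition guard, assuming tool calls are keyed by name plus serialized arguments. The class name and return values are illustrative; the real implementation may use the wider 5-call window for subtler patterns than back-to-back duplicates.

```typescript
// L2 repetition guard sketch: keep the last 5 tool-call keys; two
// consecutive identical calls trigger a redirect, a second redirect
// forces termination.
class RepetitionGuard {
  private window: string[] = [];
  private redirects = 0;

  record(toolName: string, args: unknown): "ok" | "redirect" | "terminate" {
    const key = `${toolName}:${JSON.stringify(args)}`;
    this.window.push(key);
    if (this.window.length > 5) this.window.shift(); // sliding window of 5
    const n = this.window.length;
    if (n >= 2 && this.window[n - 1] === this.window[n - 2]) {
      this.redirects += 1;
      return this.redirects >= 2 ? "terminate" : "redirect";
    }
    return "ok";
  }
}
```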
```mermaid
stateDiagram-v2
    [*] --> Planning: agent_execute(goal)
    Planning --> Executing: plan tool sets effectiveMax
    Executing --> Thinking: ReAct Loop
    Thinking --> Acting: Select tool
    Acting --> Observing: Execute tool
    Observing --> Thinking: Continue reasoning
    Thinking --> Done: done tool called
    Thinking --> Terminated: Max iterations reached
    Observing --> Terminated: Token budget exceeded
    Acting --> Redirected: Repetition detected
    Redirected --> Thinking: Inject redirect hint
    Redirected --> Terminated: 2nd redirect
    Terminated --> [*]
    Done --> [*]
```
When autoMode is disabled, the system falls back to a fixed maxIterations count, and the plan tool is hidden from the LLM.
## Skill Encapsulation and Configuration
### Skill Layer
Skills are optional behavior orchestration layers that sit on top of MCP tools. While the tools themselves are self-describing and work without Skills, the Skill layer adds intelligent context assembly and result presentation.
The codex-task Skill automatically collects relevant code context (file contents, project structure, dependency graphs), assembles a structured prompt with appropriate token budgets, and delegates to codex_chat. The grok-search Skill analyzes search intent (factual, news, technical, comparative, exploratory), optimizes the query, selects a matching system prompt, and delegates to grok_search.
Skills are defined as YAML/Markdown files in .claude/skills/, making them version-controllable and shareable. The separation between MCP (tool capability) and Skill (behavior orchestration) keeps each layer focused: MCP Servers can be installed independently via npm, while Skills provide the optional intelligence layer.
### Unified Configuration and Timeout Strategy
All configuration lives in a .versatile/ directory (gitignored), with one JSON file per provider. Values are resolved through a three-level priority chain:
```mermaid
flowchart LR
    A[".versatile/*.json"] -->|not found| B["process.env"]
    B -->|not found| C["Hardcoded default"]
```
Missing files are auto-generated from templates on first run. Placeholder API keys (YOUR_API_KEY_HERE) are detected and treated as missing, falling back to environment variables with a warning. This means the server process can launch and register tools even before the user has configured their API key.
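The resolution chain, including the placeholder check, reduces to a small function. This is a sketch under assumed names; `fileValue` stands in for the parsed `.versatile/*.json` entry and `envValue` for the corresponding environment variable.

```typescript
// Three-level priority chain: config file -> environment -> default.
// The template placeholder counts as missing.
function resolveValue(
  fileValue: string | undefined,
  envValue: string | undefined,
  fallback: string,
): string {
  if (fileValue && fileValue !== "YOUR_API_KEY_HERE") return fileValue;
  if (envValue) return envValue;
  return fallback;
}
```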
The timeout system operates at two independent levels. The per-call timeout (singleCallTimeout, default 2 minutes) bounds each individual LLM API request and leans on the OpenAI SDK's built-in retry with exponential backoff (0.5s x 2^n, capped at 8s, with up to 25% jitter, covering 408/429/500+ transient errors). The task-level maxTimeMs (default 5 minutes) caps total Agent execution time. On every cycle the Agent checks all termination conditions (iteration count, token budget, repetition, elapsed time, abort signal, done tool); the first condition to trigger wins.
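For concreteness, the retry schedule above (0.5s x 2^n, capped at 8s, minus up to 25% jitter) works out as sketched below. The function is illustrative, with jitter injected as a parameter for determinism; the SDK draws it randomly.

```typescript
// Backoff schedule sketch: 500ms * 2^attempt, capped at 8000ms, then
// reduced by up to 25% jitter (jitter01 in [0, 1]).
function retryDelayMs(attempt: number, jitter01: number): number {
  const base = Math.min(500 * 2 ** attempt, 8000);
  return base * (1 - 0.25 * jitter01);
}
```

So attempts 0 through 4 wait roughly 0.5s, 1s, 2s, 4s, 8s before jitter, and every later attempt stays at the 8s cap.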
## Design Philosophy
Claude Stays in Control — External models are tools, not peers. Claude maintains full context awareness, reviews all suggestions, and executes all code modifications. This ensures the rewind mechanism works and prevents unauthorized changes from external models.
Extensibility Through Data, Not Code — The provider framework, model routing table, and Skill definitions are all data-driven. Adding a new model, a new provider, or a new behavior pattern requires configuration changes, not architectural rewrites.
Isolation as a Feature — The Agent runs in a child process. MCP Servers self-read their configuration. Skills are optional overlays. Each component can fail independently without cascading. This isolation also enables future enhancements (sandboxed code execution, parallel agents) without restructuring the core.
Read-Only by Default — External models cannot write files, run commands, or touch git. This is not a limitation but a deliberate safety boundary. All modifications flow through Claude, creating a single auditable point of control with full rollback capability.
Progressive Complexity — Layer 1 handles 80% of use cases with simple API calls. Layer 2 activates only when the task genuinely requires multi-step reasoning. Skills add intelligence only when the base tools are insufficient. The system avoids unnecessary complexity at every level.