Claude Versatile is a multi-model AI orchestration framework that lets Claude Code act as the primary controller, delegating sub-tasks to external AI models (OpenAI GPT and Grok, with Gemini planned) through the Model Context Protocol (MCP). I built this system to solve a real problem in AI-assisted development: no single model excels at everything. By keeping Claude in control while routing specialized work to the best-fit model, the framework combines the strengths of multiple AI providers without sacrificing the unified development experience.
Codex Delegation Demo
Two-Layer Architecture
The system splits into two layers, each targeting a different complexity level. Layer 1 handles lightweight, single-shot API calls (code review, web search, generation). Layer 2 runs an autonomous Agent with its own reasoning loop for complex multi-step analysis that would exceed a single API call’s capacity.
flowchart TD
subgraph Claude["Claude Code (Orchestrator)"]
CC[Claude Code CLI]
SK[Skills Layer]
end
subgraph L1["Layer 1: Direct API Calls"]
MCP1[codex MCP Server]
MCP2[grok MCP Server]
MCP3[future providers...]
end
subgraph L2["Layer 2: Agent Delegation"]
AMCP[agent MCP Server]
subgraph Worker["Agent Worker Process"]
P[Planner]
CM[Context Manager]
subgraph TR["ToolRegistry (Plugin)"]
FS[filesystem/]
CR[core/]
GK[grok/]
CX[codex/]
end
end
end
subgraph Models["External AI Models"]
GPT[OpenAI GPT-5.4]
GRK[Grok-4]
GMN[Gemini...]
end
CC --> SK
SK --> MCP1
SK --> MCP2
SK --> MCP3
CC --> AMCP
AMCP --> Worker
MCP1 --> GPT
MCP2 --> GRK
MCP3 --> GMN
P --> CM
CM --> TR
GK --> GRK
Worker --> GPT
Worker --> GRK
All external models are strictly read-only. They cannot modify files, run shell commands, or access git. Every code suggestion comes back as plain text for Claude to review and apply. Because Claude makes every change itself, Claude Code’s rewind mechanism can still roll back any of them.
Declarative Provider Framework
Adding a new AI model provider to the system requires roughly 25 lines of code. I designed a defineProvider() lifecycle framework that automatically handles configuration loading, environment variable injection, client creation, and error mapping. For OpenAI-compatible APIs, developers implement only the onRegisterTools hook to define their MCP tools.
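// A complete MCP Server for any OpenAI-compatible API
// (defineProvider comes from the project's own provider library)
import { z } from "zod";

// Tool parameters as a Zod shape, the form the MCP SDK's server.tool() accepts
// (illustrative; the project's actual schema may differ)
const schema = { model: z.string(), prompt: z.string() };

defineProvider({
  type: "openai",
  name: "claude-versatile-codex",
  version: "0.3.0",
  configFile: "codex.agent.json",
  // The only hook an OpenAI-compatible provider must write
  onRegisterTools(server, ctx) {
    server.tool("codex_chat", schema, async (params) => {
      // ctx.complete() builds messages, runs the completion, and maps errors
      const result = await ctx.complete({
        model: params.model,
        messages: [{ role: "user", content: params.prompt }],
      });
      return { content: [{ type: "text", text: result.content }] };
    });
  },
});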
The framework supports two provider types: "openai" for OpenAI-compatible APIs (automatic config and client handling) and "native" for custom SDK integrations (the provider implements onCreateClient). The lifecycle flows through four stages, each with sensible defaults that can be selectively overridden:
flowchart LR
A["onLoadConfig"] --> B["onCreateClient"]
B --> C["onRegisterTools"]
C --> D["onServerReady"]
For OpenAI-compatible providers, the onRegisterTools hook receives a context object with ctx.complete(), a convenience method that encapsulates the full pipeline of message building, completion execution, usage formatting, and error mapping in a single call. Native SDK providers get full control over client creation and tool registration while the framework still handles config loading and server startup.
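The four-stage lifecycle can be pictured as a small runner. It calls each hook in order and falls back to a default whenever a hook is not supplied. This is a minimal sketch under stated assumptions, not the framework's code. `runLifecycle`, `ProviderHooks`, `ProviderContext`, `getClient`, and the `ToolServer` stand-in are all hypothetical. The memoized client getter is one possible way to get the lazy initialization described under Request Flow.

```typescript
// Hypothetical shapes for illustration; the real defineProvider() types may differ.
type Config = Record<string, string>;

// Minimal stand-in for an MCP server that only records registered tool names.
class ToolServer {
  readonly tools: string[] = [];
  tool(name: string): void {
    this.tools.push(name);
  }
}

interface ProviderContext<Client> {
  config: Config;
  getClient: () => Client; // created on first use, not at startup
}

interface ProviderHooks<Client> {
  onLoadConfig?: () => Config;
  onCreateClient?: (config: Config) => Client;
  onRegisterTools: (server: ToolServer, ctx: ProviderContext<Client>) => void;
  onServerReady?: (server: ToolServer) => void;
}

interface LifecycleDefaults<Client> {
  loadConfig: () => Config;
  createClient: (config: Config) => Client;
}

function runLifecycle<Client>(
  hooks: ProviderHooks<Client>,
  defaults: LifecycleDefaults<Client>,
): ToolServer {
  const server = new ToolServer();
  // Stage 1: load configuration (default unless the provider overrides it).
  const config = (hooks.onLoadConfig ?? defaults.loadConfig)();
  // Stage 2: wrap client creation in a memoized getter so startup needs no API key.
  const createClient = hooks.onCreateClient ?? defaults.createClient;
  let client: Client | undefined;
  const getClient = (): Client => {
    if (client === undefined) client = createClient(config);
    return client;
  };
  // Stage 3: the provider registers its tools (the one required hook).
  hooks.onRegisterTools(server, { config, getClient });
  // Stage 4: optional hook once everything is wired up.
  hooks.onServerReady?.(server);
  return server;
}
```

In this sketch, a "native" provider would differ only in supplying `onCreateClient`. An "openai" provider leaves it out and gets the default.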
Request Flow
When Claude delegates a task, the request flows through a well-defined pipeline. The MCP Server lazily initializes its API client on the first tool call (so the server can start without a valid API key), normalizes the response into a CompletionResult format, and maps any provider-specific errors to user-friendly MCP responses.
sequenceDiagram
participant U as User
participant C as Claude Code
participant M as MCP Server
participant P as Provider API
U->>C: "Use codex to review this function"
C->>C: Select tool: codex_chat
C->>M: MCP tool call (prompt, model, params)
M->>M: Load config from .versatile/
M->>M: Initialize client (lazy)
M->>P: chat.completions.create()
P-->>M: Completion response (content, usage)
M->>M: Normalize to CompletionResult
M-->>C: MCP response (text + usage footer)
C->>C: Review result, decide next action
C-->>U: Present findings
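The normalization and error-mapping steps might be sketched as follows, assuming an OpenAI-style chat completion as input. The content, toolCalls, and usage fields follow the CompletionResult description later in this write-up. `normalize`, `mapError`, and the status-to-message table are hypothetical. `isError` is the flag MCP tool results use to report failures.

```typescript
// Subset of an OpenAI-style chat completion response.
interface RawCompletion {
  choices: {
    message: {
      content: string | null;
      tool_calls?: { function: { name: string; arguments: string } }[];
    };
  }[];
  usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
}

interface CompletionResult {
  content: string;
  toolCalls: { name: string; args: unknown }[];
  usage: { input: number; output: number; total: number };
}

function normalize(raw: RawCompletion): CompletionResult {
  const message = raw.choices[0]?.message;
  return {
    // Reasoning models may return null content; normalize it to "".
    content: message?.content ?? "",
    toolCalls: (message?.tool_calls ?? []).map((c) => ({
      name: c.function.name,
      args: JSON.parse(c.function.arguments), // arguments arrive as a JSON string
    })),
    usage: {
      input: raw.usage?.prompt_tokens ?? 0,
      output: raw.usage?.completion_tokens ?? 0,
      total: raw.usage?.total_tokens ?? 0,
    },
  };
}

// Hypothetical mapping from HTTP status codes to user-facing MCP text.
function mapError(err: unknown): { isError: boolean; content: { type: "text"; text: string }[] } {
  const status =
    typeof err === "object" && err !== null && "status" in err
      ? (err as { status?: number }).status
      : undefined;
  const reasons: Record<number, string> = {
    401: "Authentication failed: check the API key in .versatile/",
    429: "Rate limited by the provider; try again shortly",
  };
  const known = status !== undefined ? reasons[status] : undefined;
  const text = known ?? `Provider request failed${status !== undefined ? ` (HTTP ${status})` : ""}`;
  return { isError: true, content: [{ type: "text", text }] };
}
```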
Data-Driven Model Routing
Routing the Agent to a new model family requires only a single entry in the route table. The MODEL_ROUTES configuration maps model-name prefixes to their API credentials (config file paths and environment variable names) and capability flags (such as supportsFunctionCalling). The Agent’s collectEnv() function traverses this table to load every provider’s configuration, and the Worker’s createProviderFromEnv() calls resolveModelRoute() to create the correct CompletionProvider at runtime.
graph LR
M[model name] --> R{MODEL_ROUTES}
R -->|"grok-*"| G["grok.agent.json (FC: false)"]
R -->|"default"| O["codex.agent.json (FC: true)"]
G --> P[CompletionProvider]
O --> P
P --> Planner
As a result, users can switch between GPT-5.4, Grok-4, or any future model simply by passing a different model parameter, with no code changes and no server restarts. The Planner depends on the CompletionProvider interface rather than on any specific SDK, and it automatically adapts its tool calling strategy to the route’s supportsFunctionCalling flag. Future non-OpenAI adapters (Gemini, Claude native) can plug in without changing Planner code.
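A route table and lookup in this spirit might look like the sketch below. The `grok-*` prefix, the two config files, the function-calling flags, and the two API key names come from this write-up. The `ModelRoute` field names, the `null` prefix for the default route, and the first-match rule are assumptions. This `collectEnv` only gathers keys, while the real one also loads configuration.

```typescript
interface ModelRoute {
  prefix: string | null; // null marks the default route
  configFile: string; // per-provider file in .versatile/
  apiKeyEnv: string; // environment variable holding the API key
  supportsFunctionCalling: boolean;
}

// One row per model family; supporting a new family means adding a row.
const MODEL_ROUTES: readonly ModelRoute[] = [
  { prefix: "grok-", configFile: "grok.agent.json", apiKeyEnv: "GROK_API_KEY", supportsFunctionCalling: false },
  { prefix: null, configFile: "codex.agent.json", apiKeyEnv: "OPENAI_API_KEY", supportsFunctionCalling: true },
];

function resolveModelRoute(model: string): ModelRoute {
  // First matching prefix wins; otherwise use the default route.
  const match = MODEL_ROUTES.find((r) => r.prefix !== null && model.startsWith(r.prefix));
  const route = match ?? MODEL_ROUTES.find((r) => r.prefix === null);
  if (!route) throw new Error(`No route for model "${model}"`);
  return route;
}

// collectEnv()-style traversal: gather every route's API key that is set.
function collectEnv(env: Record<string, string | undefined>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const route of MODEL_ROUTES) {
    const value = env[route.apiKeyEnv];
    if (value) out[route.apiKeyEnv] = value;
  }
  return out;
}
```

Because the Planner would read `supportsFunctionCalling` from the resolved route, choosing between function calling and XML mode stays a data lookup rather than a branch on model names.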
Autonomous Agent with Adaptive Control
The Agent runs as an independent child process with its own LLM-driven ReAct loop (Think, Act, Observe, Decide). It reads files, searches for patterns, builds understanding incrementally, and returns structured analysis. Process isolation means a crashed Agent never takes down the MCP Server.
Agent ReAct Loop
The MCP Server forks a Worker process and communicates with it over IPC. The Worker drives the Planner, which communicates with the external LLM in one of two modes: OpenAI function calling (tools parameter plus tool_calls response) for models that support it, or a prompt-based XML format (<thought> + <action>) for models that don’t. Function calling is preferred for reasoning models (gpt-5.4, o1, o3), whose content field is often null while tool_calls is still reliably populated. The mode is selected automatically from the supportsFunctionCalling flag in MODEL_ROUTES.
sequenceDiagram
participant C as Claude Code
participant M as MCP Server
participant W as Worker Process
participant LLM as External LLM
C->>M: agent_execute(goal, model, autoMode)
M->>W: fork() + IPC start
opt autoMode enabled
W->>LLM: Prompt with goal + plan tool
LLM-->>W: tool_calls: plan(estimated_steps)
W->>W: Set effectiveMax = ceil(estimated * 1.5)
end
loop ReAct Loop
W->>LLM: Goal + context + tool definitions
LLM-->>W: tool_calls: read_file / search_pattern / ...
W->>W: Execute read-only tools
W->>W: Update context (sliding window + summary)
W->>W: Check: iterations, tokens, repetition
W->>M: IPC status update
alt LLM calls done tool
W->>M: IPC complete(result)
end
end
M-->>C: Formatted result (summary, files, tokens)
The Agent’s built-in tool set is strictly read-only: read_file, list_dir, search_pattern, web_search (Grok-powered, conditional on API key), plan (autoMode only), and done. All file paths are validated with resolveSafePath() to prevent directory traversal. The Context Manager uses a sliding window with automatic summarization (triggered at 80% of max context, compresses to 50%) to prevent token explosion on long-running tasks.
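The body of resolveSafePath() is not shown here. A common Node.js approach resolves the requested path against the workspace root and rejects anything that lands outside it. The sketch below assumes that approach and a single workspace root; the `root` parameter is hypothetical.

```typescript
import * as path from "node:path";

// Resolve an LLM-supplied path against the workspace root and reject anything
// that escapes it (e.g. "../../etc/passwd" or an unrelated absolute path).
function resolveSafePath(root: string, requested: string): string {
  const base = path.resolve(root);
  const target = path.resolve(base, requested);
  const rel = path.relative(base, target);
  // Outside the root if the relative path climbs upward, or is absolute
  // (which happens on Windows when the target is on another drive).
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`Path escapes workspace: ${requested}`);
  }
  return target;
}
```

A purely lexical check like this does not follow symlinks. Closing that gap would also require resolving the real path (for example with `fs.realpathSync`) before comparing.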
I implemented a two-layer adaptive iteration control system (autoMode) that automatically manages how long the Agent runs:
L1 Complexity Estimation
The Agent calls a plan tool on its first iteration, outputting an estimated_steps count. The Planner dynamically sets effectiveMaxIterations = ceil(estimated * 1.5). This prevents simple tasks from burning through unnecessary iterations while giving complex tasks room to breathe.
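The iteration arithmetic is small enough to show directly. `resolveMaxIterations` is a hypothetical helper. Using the fixed maxIterations before the plan tool has reported an estimate is an assumption. The autoMode-off branch reflects the fixed-count fallback described at the end of this section.

```typescript
// Hypothetical helper: how many ReAct iterations the Agent may run.
function resolveMaxIterations(opts: {
  autoMode: boolean;
  maxIterations: number; // fixed budget when autoMode is off
  estimatedSteps?: number; // estimated_steps reported by the plan tool
}): number {
  if (!opts.autoMode || opts.estimatedSteps === undefined) {
    return opts.maxIterations;
  }
  // 50% headroom over the model's own estimate, rounded up.
  return Math.ceil(opts.estimatedSteps * 1.5);
}
```

For example, an estimate of 5 steps yields ceil(7.5) = 8 iterations.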
L2 Runtime Guards
L2 guards operate continuously during execution:
- Repetition Detection tracks the last 5 tool calls in a sliding window. Two consecutive identical calls trigger a redirect message. Two redirects force termination.
- Token Budget caps cumulative token consumption (default 100k). The Agent stops gracefully when the budget is exhausted.
stateDiagram-v2
[*] --> Planning: agent_execute(goal)
Planning --> Executing: plan tool sets effectiveMax
Executing --> Thinking: ReAct Loop
Thinking --> Acting: Select tool
Acting --> Observing: Execute tool
Observing --> Thinking: Continue reasoning
Thinking --> Done: done tool called
Thinking --> Terminated: Max iterations reached
Observing --> Terminated: Token budget exceeded
Acting --> Redirected: Repetition detected
Redirected --> Thinking: Inject redirect hint
Redirected --> Terminated: 2nd redirect
Terminated --> [*]
Done --> [*]
When autoMode is disabled, the system falls back to a fixed maxIterations count, and the plan tool is hidden from the LLM.
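The two L2 guards can be sketched as small stateful classes. The window size of 5, the two-strike redirect rule, and the 100k default come from the guard list above. The class names, the verdict strings, and comparing calls by tool name plus serialized arguments are assumptions.

```typescript
type GuardVerdict = "continue" | "redirect" | "terminate";

// Repetition detection: keeps the last `windowSize` calls; repeating the
// previous call exactly earns a redirect, and the second redirect ends the run.
class RepetitionGuard {
  private readonly window: string[] = [];
  private readonly windowSize: number;
  private redirects = 0;

  constructor(windowSize = 5) {
    this.windowSize = windowSize;
  }

  record(name: string, args: unknown): GuardVerdict {
    // Calls are compared by name plus serialized args (key order matters here).
    const key = `${name}:${JSON.stringify(args)}`;
    const previous = this.window[this.window.length - 1];
    this.window.push(key);
    if (this.window.length > this.windowSize) this.window.shift();
    if (previous !== key) return "continue";
    this.redirects += 1;
    return this.redirects >= 2 ? "terminate" : "redirect";
  }

  // Recent calls, e.g. to list in the redirect hint sent back to the LLM.
  recent(): readonly string[] {
    return [...this.window];
  }
}

// Token budget: cumulative usage across all LLM calls, default 100k.
class TokenBudget {
  private used = 0;
  private readonly limit: number;

  constructor(limit = 100_000) {
    this.limit = limit;
  }

  // Add one call's usage; returns false once the budget is exhausted.
  consume(tokens: number): boolean {
    this.used += tokens;
    return this.used < this.limit;
  }
}
```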
Plugin-Based Tool Architecture
In v0.3.0-alpha.5, I refactored the Agent’s monolithic tool system into a plugin-based architecture. The original tools.ts (270 lines of tightly coupled tool definitions) was split into categorized directories (core/, filesystem/, grok/, codex/), each registering tools through a central ToolRegistry class. This makes the tool system open for extension without modifying existing code.
flowchart TD
subgraph Registry["ToolRegistry"]
R[register / getEnabled / has]
end
subgraph Core["core/"]
P[plan]
D[done]
end
subgraph FS["filesystem/"]
RF[read_file]
LD[list_dir]
SP[search_pattern]
end
subgraph Grok["grok/"]
WS[web_search]
end
subgraph Codex["codex/"]
FT[future tools...]
end
Core --> Registry
FS --> Registry
Grok --> Registry
Codex --> Registry
Registry --> Worker
The createBuiltinRegistry(env) factory function conditionally registers provider-specific tools based on the API keys available in the environment. If GROK_API_KEY is present, the Agent gains a web_search tool powered by Grok’s built-in search capability. The codex/ directory is reserved for Codex-specific tools, which will be registered when OPENAI_API_KEY is present. Core and filesystem tools are always registered.
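A minimal sketch of the registry and factory follows, assuming a tool is just a name, a description, and an async handler. The register, getEnabled, and has method names come from the diagram above. The GROK_API_KEY and OPENAI_API_KEY checks come from the text. The `AgentTool` shape, the `stub` tool bodies, and the duplicate-name check are placeholders of my own. getEnabled filters by a name list such as agent.json's enabledTools.

```typescript
interface AgentTool {
  name: string;
  description: string;
  run: (args: Record<string, unknown>) => Promise<string>;
}

class ToolRegistry {
  private readonly tools = new Map<string, AgentTool>();

  register(tool: AgentTool): void {
    if (this.tools.has(tool.name)) throw new Error(`Duplicate tool: ${tool.name}`);
    this.tools.set(tool.name, tool);
  }

  has(name: string): boolean {
    return this.tools.has(name);
  }

  // Registered tools whose names appear in `names`; unknown names are ignored.
  getEnabled(names: readonly string[]): AgentTool[] {
    return names.flatMap((n) => {
      const tool = this.tools.get(n);
      return tool ? [tool] : [];
    });
  }
}

// Placeholder factory so the sketch runs; real tools read files, search, etc.
const stub = (name: string): AgentTool => ({
  name,
  description: `${name} (stub)`,
  run: async () => `${name} is not implemented in this sketch`,
});

function createBuiltinRegistry(env: Record<string, string | undefined>): ToolRegistry {
  const registry = new ToolRegistry();
  // core/ and filesystem/ tools are always available.
  for (const name of ["plan", "done", "read_file", "list_dir", "search_pattern"]) {
    registry.register(stub(name));
  }
  // grok/: web search only when a Grok key is configured.
  if (env.GROK_API_KEY) registry.register(stub("web_search"));
  // codex/: reserved for future Codex-specific tools (none yet).
  if (env.OPENAI_API_KEY) {
    // register codex/ tools here
  }
  return registry;
}
```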
Metadata-Driven Planner
Each tool now carries an AgentToolMetadata object that drives Planner behavior declaratively:
interface AgentToolMetadata {
  category?: "core" | "filesystem" | "external" | "custom";
  tracksFileRead?: boolean;       // Planner tracks the 'path' arg in filesRead
  skipRepetitionCheck?: boolean;  // Exempt from L2 repetition detection
  systemPromptHint?: string;      // Injected into the system prompt when the tool is available
}
Previously, the Planner contained hardcoded tool name checks (if (toolName === "read_file") ...). Now it reads metadata to decide behavior: whether to track file reads, whether to skip repetition detection for certain tools, and what hints to inject into the system prompt. This means third-party tools can declare their own Planner behavior without modifying Planner code.
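The sketch below shows how a Planner might consult this metadata instead of checking tool names. The interface is copied from above. The `MetaTool` shape and the three helpers (`buildSystemPrompt`, `recordFileRead`, `shouldCheckRepetition`) are hypothetical.

```typescript
interface AgentToolMetadata {
  category?: "core" | "filesystem" | "external" | "custom";
  tracksFileRead?: boolean;
  skipRepetitionCheck?: boolean;
  systemPromptHint?: string;
}

interface MetaTool {
  name: string;
  metadata?: AgentToolMetadata;
}

// Append each available tool's hint to the base system prompt.
function buildSystemPrompt(base: string, tools: readonly MetaTool[]): string {
  const hints = tools.flatMap((t) => (t.metadata?.systemPromptHint ? [t.metadata.systemPromptHint] : []));
  return [base, ...hints].join("\n");
}

// Record the 'path' argument for tools that declare tracksFileRead.
function recordFileRead(tool: MetaTool, args: Record<string, unknown>, filesRead: Set<string>): void {
  if (tool.metadata?.tracksFileRead && typeof args.path === "string") {
    filesRead.add(args.path);
  }
}

// Tools can opt out of the L2 repetition guard.
function shouldCheckRepetition(tool: MetaTool): boolean {
  return !tool.metadata?.skipRepetitionCheck;
}
```

A third-party tool changes Planner behavior here just by setting fields on its metadata; none of the three helpers mentions a tool by name.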
Config-Driven Tool Enablement
The enabledTools field in agent.json controls which tools the Agent exposes to the LLM. This replaces the previous hardcoded tool list and allows users to customize the Agent’s capabilities per-project:
{
  "enabledTools": ["read_file", "list_dir", "search_pattern", "done", "plan", "web_search"]
}
Omitting a tool from this list hides it from the LLM entirely. The ToolRegistry.getEnabled(names) method filters the registered tools to only those specified, so the Planner never sees tools the user has disabled.
Multi-Model Tool Calling Adaptation
Not all models support OpenAI’s function calling protocol. Grok-4 via the realseek proxy, for example, returns plain text instead of structured tool_calls. I implemented a dual-mode system that automatically adapts to each model’s capabilities.
flowchart TD
M[Model Request] --> Check{supportsFunctionCalling?}
Check -->|true| FC[Function Calling Mode]
Check -->|false| XML[XML Prompt Mode]
FC --> Tools["tools param + tool_calls response"]
XML --> Prompt["System prompt describes tools"]
XML --> Format["LLM outputs <thought> + <action> XML"]
XML --> Parse["parseLegacyContent extracts tool calls"]
Tools --> Norm[Normalized CompletionResult]
Parse --> Norm
The MODEL_ROUTES table now includes a supportsFunctionCalling flag per model prefix. When the Planner detects a model without function calling support, it:
- Omits the tools parameter from the API request
- Injects XML format instructions into the system prompt, describing the available tools and the expected <thought> + <action> response format
- Parses the LLM’s plain-text response with parseLegacyContent to extract tool calls
- Relies on the ContextManager to convert tool_call/tool messages into plain assistant/user messages, since these models don’t understand the function calling message format
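The text does not specify what goes inside the <action> tag. Suppose it holds a JSON object such as {"tool": "read_file", "args": {"path": "src/index.ts"}}. A parseLegacyContent-style parser could then look like the sketch below. The action body format, the `LegacyParse` return shape, and skipping malformed actions are all assumptions.

```typescript
interface ParsedToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface LegacyParse {
  thought: string;
  toolCalls: ParsedToolCall[];
}

// Extract the first <thought> and every <action> block from plain-text output.
function parseLegacyContent(text: string): LegacyParse {
  const thought = /<thought>([\s\S]*?)<\/thought>/.exec(text)?.[1]?.trim() ?? "";
  const toolCalls: ParsedToolCall[] = [];
  for (const match of text.matchAll(/<action>([\s\S]*?)<\/action>/g)) {
    try {
      // Assumed action body: {"tool": "<name>", "args": {...}}
      const body = JSON.parse(match[1]) as { tool?: unknown; args?: unknown };
      if (typeof body.tool === "string") {
        const args =
          typeof body.args === "object" && body.args !== null
            ? (body.args as Record<string, unknown>)
            : {};
        toolCalls.push({ name: body.tool, args });
      }
    } catch {
      // Malformed JSON: skip this action instead of crashing the loop.
    }
  }
  return { thought, toolCalls };
}
```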
This adaptation is transparent to the rest of the system. The Planner always receives a normalized CompletionResult with content, toolCalls, and usage fields regardless of which mode was used. Future non-OpenAI adapters (Gemini’s functionDeclarations, Claude’s tool_use) can be implemented in lib/adapters/ without changing Planner code.
Skill Encapsulation and Configuration
Skill Layer
Skills are optional behavior orchestration layers that sit on top of MCP tools. While the tools themselves are self-describing and work without Skills, the Skill layer adds intelligent context assembly and result presentation.
The codex-task Skill automatically collects relevant code context (file contents, project structure, dependency graphs), assembles a structured prompt with appropriate token budgets, and delegates to codex_chat. The grok-search Skill analyzes search intent (factual, news, technical, comparative, exploratory), optimizes the query, selects a matching system prompt, and delegates to grok_search.
Skills are defined as YAML/Markdown files in .claude/skills/, making them version-controllable and shareable. The separation between MCP (tool capability) and Skill (behavior orchestration) keeps each layer focused: MCP Servers can be installed independently via npm, while Skills provide the optional intelligence layer.
Unified Configuration and Timeout Strategy
All configuration lives in a .versatile/ directory (gitignored), with one JSON file per provider. Values are resolved through a three-level priority chain:
flowchart LR
A[".versatile/*.json"] -->|not found| B["process.env"]
B -->|not found| C["Hardcoded default"]
Missing files are auto-generated from templates on first run. Placeholder API keys (YOUR_API_KEY_HERE) are detected and treated as missing, falling back to environment variables with a warning. This means the server process can launch and register tools even before the user has configured their API key.
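The priority chain and the placeholder rule fit in a single lookup. `resolveSetting`, its parameters, and the warning text are hypothetical. Only the three-level order and the YOUR_API_KEY_HERE placeholder come from the text. Treating empty strings as missing is an additional assumption.

```typescript
const PLACEHOLDER = "YOUR_API_KEY_HERE";

// Resolve one setting: .versatile/*.json value, then process.env, then a default.
// Empty strings are treated as missing (an assumption of this sketch).
function resolveSetting(
  fileValue: string | undefined,
  envValue: string | undefined,
  fallback: string | undefined,
  warn: (msg: string) => void = console.warn,
): string | undefined {
  if (fileValue === PLACEHOLDER) {
    // A placeholder counts as missing, with a warning.
    warn("Placeholder API key in .versatile/ config; falling back to the environment");
  } else if (fileValue) {
    return fileValue;
  }
  if (envValue) return envValue;
  return fallback;
}
```

With no real key anywhere, the result is `undefined`. Consistent with the lazy client initialization described earlier, the server can still register its tools. The missing key would surface only on the first call that actually needs the client.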
The timeout system operates at two independent levels. The per-call timeout (singleCallTimeout, default 2 minutes) bounds each individual LLM API request, and the OpenAI SDK’s built-in retry covers transient 408, 429, and 500+ errors with exponential backoff (0.5s x 2^n, capped at 8s, with 25% jitter). The task-level maxTimeMs (default 5 minutes) caps total Agent execution time. On every cycle, the Agent checks all termination conditions (iteration count, token budget, repetition, elapsed time, abort signal, and the done tool), and whichever triggers first ends the run.
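Two parts of the timeout design can be made concrete. First, `retryDelayMs` reproduces the backoff schedule quoted above to illustrate the numbers; it does not replace the SDK's own retry logic. Second, `checkTermination` evaluates the listed stop conditions once per cycle. The `AgentState` shape, the `StopReason` names, and the fixed check order are assumptions.

```typescript
// Delay before retry attempt n (0-based), in milliseconds.
function retryDelayMs(attempt: number, random: () => number = Math.random): number {
  const base = Math.min(0.5 * 2 ** attempt, 8); // seconds, capped at 8 s
  const jitter = 1 - random() * 0.25; // shave off up to 25%
  return base * jitter * 1000;
}

type StopReason = "max_iterations" | "token_budget" | "repetition" | "timeout" | "aborted" | "done";

interface AgentState {
  iteration: number;
  maxIterations: number;
  tokensUsed: number;
  tokenBudget: number;
  repetitionTerminated: boolean;
  startedAt: number; // ms timestamp
  maxTimeMs: number; // task-level cap (5 minutes by default in the text)
  signal: AbortSignal;
  doneCalled: boolean;
}

// Checked once per ReAct cycle; returns the first condition that holds, or null.
function checkTermination(s: AgentState, now = Date.now()): StopReason | null {
  if (s.iteration >= s.maxIterations) return "max_iterations";
  if (s.tokensUsed >= s.tokenBudget) return "token_budget";
  if (s.repetitionTerminated) return "repetition";
  if (now - s.startedAt >= s.maxTimeMs) return "timeout";
  if (s.signal.aborted) return "aborted";
  if (s.doneCalled) return "done";
  return null;
}
```

With zero jitter, retries would wait 0.5 s, 1 s, 2 s, 4 s, and then 8 s for every later attempt.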
Design Philosophy
Claude Stays in Control — External models are tools, not peers. Claude maintains full context awareness, reviews all suggestions, and executes all code modifications. This ensures the rewind mechanism works and prevents unauthorized changes from external models.
Extensibility Through Data, Not Code — The provider framework, model routing table, tool metadata, and Skill definitions are all data-driven. Adding a new model, a new provider, a new Agent tool, or a new behavior pattern requires configuration changes, not architectural rewrites. The plugin-based tool system extends this principle: third-party tools declare their Planner behavior through metadata rather than requiring Planner modifications.
Isolation as a Feature — The Agent runs in a child process. MCP Servers self-read their configuration. Skills are optional overlays. Each component can fail independently without cascading. This isolation also enables future enhancements (sandboxed code execution, parallel agents) without restructuring the core.
Read-Only by Default — External models cannot write files, run commands, or touch git. This is not a limitation but a deliberate safety boundary. All modifications flow through Claude, creating a single auditable point of control with full rollback capability.
Progressive Complexity — Layer 1 handles 80% of use cases with simple API calls. Layer 2 activates only when the task genuinely requires multi-step reasoning. Skills add intelligence only when the base tools are insufficient. The dual-mode tool calling system adapts transparently to model capabilities without exposing complexity to the user. The system avoids unnecessary complexity at every level.