Contributing

swarm-mcp is an open-source project and welcomes contributions. This page covers development setup, project architecture, and how to add new features.

Development Setup

Prerequisites

Python 3.12+
uv — Python package manager
Docker — for running integration tests
Claude Code CLI — with OAuth configured (claude login)

Clone and install

git clone https://github.com/ahodge/swarm-mcp
cd swarm-mcp

# Install in editable mode with dev dependencies
uv sync --dev

Build the agent image

# Copy the claude and uv binaries into the project root
cp "$(which claude)" ./claude
cp "$(which uv)" ./uv

# Build the Docker image
docker build -t swarm-agent .

Run the server locally

uv run swarm-mcp

Or via MCP in Claude Code, pointing at the local checkout:

{
  "mcpServers": {
    "swarm": {
      "command": "uv",
      "args": ["--directory", "/path/to/swarm-mcp", "run", "swarm-mcp"]
    }
  }
}

Running Tests

# Run all tests
uv run pytest

# Run with verbose output
uv run pytest -v

# Run a specific test file
uv run pytest tests/test_monads.py

# Run tests matching a name pattern
uv run pytest -k "test_encrypt"

Test directory

The tests/ directory at the project root is where all tests live. Add new test files named test_<module>.py following pytest conventions.

Architecture Overview

The server is split into focused modules. Understanding which layer to touch for a given change is the key to navigating the codebase quickly.

src/swarm_mcp/
├── server.py      # MCP tool definitions (the public API)
├── agent.py       # Docker execution and output parsing
├── sandbox.py     # SandboxSpec dataclass and resolution
├── monads.py      # Ref enrichment: provenance, cost, encryption, classification
├── types.py       # Natural language type system (load, resolve, validate)
├── registry.py    # Search paths for sandboxes, types, pipelines
└── docker.py      # Docker command construction and image management

`server.py` — tool definitions

Every MCP tool (run, par, map, reduce, map_reduce, chain, pipeline, unwrap, inspect, encrypt, decrypt, classify, guard, filter, race, retry, validate, wrap, wrap_project, save_sandbox_spec, list_sandbox_specs, list_type_registry, get_type_definition) is registered here using @mcp.tool() decorators.

Tool handlers parse arguments, call lower-level modules, and format the return value. They do not contain business logic directly — they delegate to agent.py, monads.py, and types.py.

The global concurrency semaphore (_semaphore) and resource pool dict (_resource_pools) live at module level in server.py.

`agent.py` — execution engine

run_agent(prompt, spec, run_id, agent_id) is the core function. It:

Creates the output directory at /tmp/swarm-mcp/{run_id}/{agent_id}/
Injects type context into the prompt (via types.build_type_context)
Writes prompt.txt
Calls _setup_agent_home() to generate a minimal HOME with claude config, MCP config, CLAUDE.md, and the PostToolUse hook
Builds the Docker command (via docker.get_docker_run_cmd)
Launches the container with stdin from prompt.txt and stdout to stream.jsonl
Waits for completion (or kills on timeout)
Parses stream.jsonl via _parse_stream_output()
Writes result.json and returns an AgentResult

`sandbox.py` — sandbox specs

SandboxSpec is a dataclass capturing everything needed to configure an agent container. resolve_sandbox() accepts a name (looked up via registry), a raw JSON string, or None (defaults), and applies any inline overrides on top.

`monads.py` — ref enrichment

Contains all the "monad" stamps applied to refs:

stamp_provenance() — SHA-256 content hash and parent ref chain
stamp_cost() — budget tracking fields
stamp_deadline() — deadline tracking
stamp_validated() — type validation verdict
stamp_retry() — retry attempt tracking
stamp_classification() — sensitivity level and MCP allowlist
stamp_encrypted() / encrypt_text() / decrypt_text() — Fernet encryption

enrich_ref() is the single call that applies all relevant stamps in one shot.

`types.py` — type system

Loads .md type files from registry search paths. resolve_type() walks [type-name] references recursively (up to depth 3). build_validation_prompt() generates a structured prompt for a validator agent. build_type_context() injects input_type / output_type descriptions into agent prompts.

`registry.py` — search paths

Manages three search path lists (pipelines, sandboxes, types). Priority order: explicitly registered paths → SWARM_PROJECT_DIR env var → ~/.claude/. wrap_file() copies host files into the ref namespace. wrap_project() registers a project directory's subdirectories.

`docker.py` — container management

get_docker_run_cmd() assembles the full docker run command from a SandboxSpec. Handles GPU flags, memory/CPU limits, network mode, volume mounts, MCP project mounts, the PostToolUse hook mount, and all Claude CLI flags (--model, --allowedTools, --output-format stream-json, etc.). ensure_image() auto-builds the swarm-agent image if it's missing.

How to Add a New Combinator Tool

Here is the full checklist for adding a new MCP tool — for example, a hypothetical tournament() combinator.

1. Define the tool in `server.py`

@mcp.tool()
def tournament(
    tasks: str,
    rounds: int = 2,
    model: str = "sonnet",
    max_concurrency: int = 4,
) -> str:
    """Run tasks in elimination rounds, keeping the best result per round.

    Args:
        tasks: JSON array of prompt strings.
        rounds: Number of elimination rounds.
        model: Claude model for all agents.
        max_concurrency: Max agents running simultaneously.

    Returns:
        JSON with the winning ref and per-round results.
    """
    task_list = json.loads(tasks)
    run_id = uuid.uuid4().hex[:12]

    # ... implementation ...

    return json.dumps({"winner": winning_ref, "rounds": round_results}, indent=2)

Tools must: - Accept all arguments as strings or primitives (the MCP protocol is text-based) - Parse JSON arguments using json.loads() - Return a JSON string - Have a docstring — it becomes the tool description shown to Claude

2. Acquire resources before spawning agents

Use the global semaphore and resource pools (see existing combinators for the pattern):

# Inside your tool, before launching containers:
if not _semaphore.acquire(timeout=RESOURCE_QUEUE_TIMEOUT):
    raise TimeoutError("Could not acquire execution slot")
try:
    result = run_agent(prompt, spec, run_id, agent_id)
finally:
    _semaphore.release()

3. Use `ThreadPoolExecutor` for concurrent execution

from concurrent.futures import ThreadPoolExecutor, as_completed

with ThreadPoolExecutor(max_workers=max_concurrency) as executor:
    futures = {executor.submit(run_one, task): task for task in task_list}
    results = []
    for future in as_completed(futures):
        results.append(future.result())

4. Enrich refs with monadic context

After each agent run, call enrich_ref() to attach provenance, cost tracking, and any other monad layers:

ref = result.to_ref_dict(run_id)
enrich_ref(
    ref,
    run_id,
    text=result.text,
    parent_refs=parent_ref_ids,
    budget_limit=budget,
    spent_so_far=total_cost,
)

5. Write tests

Add a test file at tests/test_tournament.py. Test the core logic in isolation (mock run_agent so tests don't require Docker):

from unittest.mock import patch, MagicMock
from swarm_mcp.agent import AgentResult

def make_result(text="output", exit_code=0):
    return AgentResult(
        agent_id="agent-0",
        text=text,
        exit_code=exit_code,
        duration_seconds=1.0,
        cost_usd=0.01,
        model="sonnet",
        output_dir="/tmp/test",
    )

@patch("swarm_mcp.server.run_agent", return_value=make_result())
def test_tournament_returns_winner(mock_run):
    from swarm_mcp.server import tournament
    result = tournament(tasks='["task A", "task B"]', rounds=1)
    data = json.loads(result)
    assert "winner" in data

6. Update the nav (if it's a top-level page)

If your addition warrants a new documentation page, add it to mkdocs.yml:

nav:
  - Examples:
      - Tournament: examples/tournament.md

Code Style

Python 3.12+. Use type annotations on all public functions.
Use dataclasses for structured data (see SandboxSpec, AgentResult).
Keep server.py thin — business logic belongs in the module it's about.
Log with logging.getLogger(__name__). Use logger.info for normal operations, logger.warning for recoverable errors, logger.exception for unexpected failures.
Format with ruff format and lint with ruff check (not yet enforced in CI, but preferred).

Submitting Changes

Fork the repository on GitHub.
Create a branch: git checkout -b feature/my-combinator.
Make your changes and add tests.
Run uv run pytest to verify nothing is broken.
Open a pull request with a description of what the change does and why.

For large changes, open an issue first to discuss the design before writing code.