Contributing
swarm-mcp is an open-source project and welcomes contributions. This page covers development setup, project architecture, and how to add new features.
Development Setup
Prerequisites
- Python 3.12+
- uv — Python package manager
- Docker — for running integration tests
- Claude Code CLI, with OAuth configured (claude login)
Clone and install
git clone https://github.com/ahodge/swarm-mcp
cd swarm-mcp
# Install in editable mode with dev dependencies
uv sync --dev
Build the agent image
# Copy the claude and uv binaries into the project root
cp "$(which claude)" ./claude
cp "$(which uv)" ./uv
# Build the Docker image
docker build -t swarm-agent .
Run the server locally
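From the project root, start the server with the same command the MCP config below invokes (uv run swarm-mcp). Or register it via MCP in Claude Code, pointing at the local checkout: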
{
  "mcpServers": {
    "swarm": {
      "command": "uv",
      "args": ["--directory", "/path/to/swarm-mcp", "run", "swarm-mcp"]
    }
  }
}
Running Tests
# Run all tests
uv run pytest
# Run with verbose output
uv run pytest -v
# Run a specific test file
uv run pytest tests/test_monads.py
# Run tests matching a name pattern
uv run pytest -k "test_encrypt"
Test directory
The tests/ directory at the project root is where all tests live. Add
new test files named test_<module>.py following pytest conventions.
Architecture Overview
The server is split into focused modules. Understanding which layer to touch for a given change is the key to navigating the codebase quickly.
src/swarm_mcp/
├── server.py # MCP tool definitions (the public API)
├── agent.py # Docker execution and output parsing
├── sandbox.py # SandboxSpec dataclass and resolution
├── monads.py # Ref enrichment: provenance, cost, encryption, classification
├── types.py # Natural language type system (load, resolve, validate)
├── registry.py # Search paths for sandboxes, types, pipelines
└── docker.py # Docker command construction and image management
server.py — tool definitions
Every MCP tool (run, par, map, reduce, map_reduce, chain,
pipeline, unwrap, inspect, encrypt, decrypt, classify, guard,
filter, race, retry, validate, wrap, wrap_project, save_sandbox_spec,
list_sandbox_specs, list_type_registry, get_type_definition) is
registered here using @mcp.tool() decorators.
Tool handlers parse arguments, call lower-level modules, and format the return
value. They do not contain business logic directly — they delegate to agent.py,
monads.py, and types.py.
The global concurrency semaphore (_semaphore) and resource pool dict
(_resource_pools) live at module level in server.py.
agent.py — execution engine
run_agent(prompt, spec, run_id, agent_id) is the core function. It:
- Creates the output directory at /tmp/swarm-mcp/{run_id}/{agent_id}/
- Injects type context into the prompt (via types.build_type_context)
- Writes prompt.txt
- Calls _setup_agent_home() to generate a minimal HOME with claude config, MCP config, CLAUDE.md, and the PostToolUse hook
- Builds the Docker command (via docker.get_docker_run_cmd)
- Launches the container with stdin from prompt.txt and stdout to stream.jsonl
- Waits for completion (or kills on timeout)
- Parses stream.jsonl via _parse_stream_output()
- Writes result.json and returns an AgentResult
sandbox.py — sandbox specs
SandboxSpec is a dataclass capturing everything needed to configure an agent
container. resolve_sandbox() accepts a name (looked up via registry), a raw
JSON string, or None (defaults), and applies any inline overrides on top.
monads.py — ref enrichment
Contains all the "monad" stamps applied to refs:
- stamp_provenance(): SHA-256 content hash and parent ref chain
- stamp_cost(): budget tracking fields
- stamp_deadline(): deadline tracking
- stamp_validated(): type validation verdict
- stamp_retry(): retry attempt tracking
- stamp_classification(): sensitivity level and MCP allowlist
- stamp_encrypted() / encrypt_text() / decrypt_text(): Fernet encryption
enrich_ref() is the single call that applies all relevant stamps in one shot.
types.py — type system
Loads .md type files from registry search paths. resolve_type() walks
[type-name] references recursively (up to depth 3). build_validation_prompt()
generates a structured prompt for a validator agent. build_type_context()
injects input_type / output_type descriptions into agent prompts.
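The depth-limited walk over [type-name] references might look like the sketch below. It assumes type definitions are already loaded into a dict and that references are lowercase, hyphenated names in square brackets; the regex, resolve_type_demo, and the exact meaning of "depth 3" are assumptions for illustration.

```python
import re

# Assumed reference syntax: [lowercase-hyphenated-name]
TYPE_REF = re.compile(r"\[([a-z0-9][a-z0-9-]*)\]")
MAX_DEPTH = 3


def resolve_type_demo(name: str, definitions: dict[str, str], depth: int = 0) -> str:
    """Expand [type-name] references inside a definition, up to MAX_DEPTH levels."""
    body = definitions[name]
    if depth >= MAX_DEPTH:
        return body  # depth cap reached: leave remaining references unexpanded

    def expand(match: re.Match) -> str:
        ref = match.group(1)
        if ref not in definitions:
            return match.group(0)  # unknown name: keep the literal text
        return resolve_type_demo(ref, definitions, depth + 1)

    return TYPE_REF.sub(expand, body)


demo_types = {
    "report": "A report with a [summary].",
    "summary": "short text citing [source]",
    "source": "a URL",
}
resolved = resolve_type_demo("report", demo_types)
```

The depth cap also guarantees termination when two type files reference each other.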
registry.py — search paths
Manages three search path lists (pipelines, sandboxes, types). Priority
order: explicitly registered paths → SWARM_PROJECT_DIR env var → ~/.claude/.
wrap_file() copies host files into the ref namespace. wrap_project()
registers a project directory's subdirectories.
docker.py — container management
get_docker_run_cmd() assembles the full docker run command from a
SandboxSpec. Handles GPU flags, memory/CPU limits, network mode, volume
mounts, MCP project mounts, the PostToolUse hook mount, and all Claude CLI
flags (--model, --allowedTools, --output-format stream-json, etc.).
ensure_image() auto-builds the swarm-agent image if it's missing.
How to Add a New Combinator Tool
Here is the full checklist for adding a new MCP tool — for example, a
hypothetical tournament() combinator.
1. Define the tool in server.py
@mcp.tool()
def tournament(
    tasks: str,
    rounds: int = 2,
    model: str = "sonnet",
    max_concurrency: int = 4,
) -> str:
    """Run tasks in elimination rounds, keeping the best result per round.

    Args:
        tasks: JSON array of prompt strings.
        rounds: Number of elimination rounds.
        model: Claude model for all agents.
        max_concurrency: Max agents running simultaneously.

    Returns:
        JSON with the winning ref and per-round results.
    """
    task_list = json.loads(tasks)
    run_id = uuid.uuid4().hex[:12]
    # ... implementation ...
    return json.dumps({"winner": winning_ref, "rounds": round_results}, indent=2)
Tools must:
- Accept all arguments as strings or primitives (the MCP protocol is text-based)
- Parse JSON arguments using json.loads()
- Return a JSON string
- Have a docstring — it becomes the tool description shown to Claude
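The string-in, string-out contract from the list above can be exercised without the MCP decorator. In this sketch, parse_task_list and demo_tool are hypothetical helpers, and returning an {"error": ...} object on bad input is an assumption, not a documented project convention.

```python
import json


def parse_task_list(tasks: str) -> list[str]:
    """Parse a JSON array of prompt strings, rejecting any other shape."""
    parsed = json.loads(tasks)
    if not isinstance(parsed, list) or not all(isinstance(t, str) for t in parsed):
        raise ValueError("tasks must be a JSON array of strings")
    return parsed


def demo_tool(tasks: str) -> str:
    """Count the tasks in a JSON array and return the count as a JSON string."""
    try:
        task_list = parse_task_list(tasks)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        return json.dumps({"error": str(exc)})
    return json.dumps({"count": len(task_list)}, indent=2)
```

Validating the parsed shape up front gives the calling model a readable error instead of a traceback from deeper in the tool.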
2. Acquire resources before spawning agents
Use the global semaphore and resource pools (see existing combinators for the pattern):
# Inside your tool, before launching containers:
if not _semaphore.acquire(timeout=RESOURCE_QUEUE_TIMEOUT):
    raise TimeoutError("Could not acquire execution slot")
try:
    result = run_agent(prompt, spec, run_id, agent_id)
finally:
    _semaphore.release()
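The acquire, try, finally pattern can be wrapped in a context manager so each call site cannot forget the release. This is one possible packaging, not something server.py is known to provide; execution_slot is a hypothetical name and the semaphore here is a local stand-in for the module-level _semaphore.

```python
import threading
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def execution_slot(semaphore: threading.Semaphore, timeout: float) -> Iterator[None]:
    """Hold one slot of the semaphore for the duration of the with-block."""
    if not semaphore.acquire(timeout=timeout):
        raise TimeoutError("Could not acquire execution slot")
    try:
        yield
    finally:
        semaphore.release()  # runs even if the body raises


demo_semaphore = threading.Semaphore(1)
with execution_slot(demo_semaphore, timeout=1.0):
    pass  # an agent would run here while holding the slot
```

The release sits in a finally clause, so a failing agent run still returns its slot to the pool.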
3. Use ThreadPoolExecutor for concurrent execution
from concurrent.futures import ThreadPoolExecutor, as_completed

with ThreadPoolExecutor(max_workers=max_concurrency) as executor:
    futures = {executor.submit(run_one, task): task for task in task_list}
    results = []
    for future in as_completed(futures):
        results.append(future.result())
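Two things about the snippet above are worth noting: as_completed yields futures in completion order, not submission order, and future.result() re-raises any exception from the worker. The sketch below is a self-contained variant that keeps results in task order and records failures per task. run_all and the {"ok": ...} result shape are assumptions for illustration.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(
    tasks: list[str], run_one: Callable[[str], str], max_concurrency: int = 4
) -> list[dict]:
    """Run run_one over tasks concurrently; results keep the input order."""
    results: list[dict] = [{} for _ in tasks]
    with ThreadPoolExecutor(max_workers=max_concurrency) as executor:
        # Map each future back to the index of the task it was submitted for.
        futures = {executor.submit(run_one, task): i for i, task in enumerate(tasks)}
        for future in as_completed(futures):
            index = futures[future]
            try:
                results[index] = {"ok": True, "value": future.result()}
            except Exception as exc:  # one failed task should not sink the batch
                results[index] = {"ok": False, "error": str(exc)}
    return results


def shout(task: str) -> str:
    """Toy worker standing in for an agent run."""
    if not task:
        raise ValueError("empty task")
    return task.upper()


demo_results = run_all(["a", "", "c"], shout, max_concurrency=2)
```

Whether a combinator should tolerate partial failure or abort on the first error is a design choice. This variant tolerates it and leaves the decision to the caller.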
4. Enrich refs with monadic context
After each agent run, call enrich_ref() to attach provenance, cost tracking,
and any other monad layers:
ref = result.to_ref_dict(run_id)
enrich_ref(
    ref,
    run_id,
    text=result.text,
    parent_refs=parent_ref_ids,
    budget_limit=budget,
    spent_so_far=total_cost,
)
5. Write tests
Add a test file at tests/test_tournament.py. Test the core logic in
isolation (mock run_agent so tests don't require Docker):
import json
from unittest.mock import patch

from swarm_mcp.agent import AgentResult


def make_result(text="output", exit_code=0):
    # Build a canned AgentResult so no container is started.
    return AgentResult(
        agent_id="agent-0",
        text=text,
        exit_code=exit_code,
        duration_seconds=1.0,
        cost_usd=0.01,
        model="sonnet",
        output_dir="/tmp/test",
    )


@patch("swarm_mcp.server.run_agent", return_value=make_result())
def test_tournament_returns_winner(mock_run):
    from swarm_mcp.server import tournament

    result = tournament(tasks='["task A", "task B"]', rounds=1)
    data = json.loads(result)
    assert "winner" in data
6. Update the nav (if it's a top-level page)
If your addition warrants a new documentation page, add it to the nav section of mkdocs.yml.
Code Style
- Python 3.12+. Use type annotations on all public functions.
- Use dataclasses for structured data (see SandboxSpec, AgentResult).
- Keep server.py thin: business logic belongs in the module it's about.
- Log with logging.getLogger(__name__). Use logger.info for normal operations, logger.warning for recoverable errors, logger.exception for unexpected failures.
- Format with ruff format and lint with ruff check (not yet enforced in CI, but preferred).
Submitting Changes
- Fork the repository on GitHub.
- Create a branch: git checkout -b feature/my-combinator.
- Make your changes and add tests.
- Run uv run pytest to verify nothing is broken.
- Open a pull request with a description of what the change does and why.
For large changes, open an issue first to discuss the design before writing code.
See also
- Concepts: Refs (understand the ref and monad system before modifying monads.py)
- Concepts: Combinators (the full combinator reference)
- Observability (how to debug agent runs during development)