Tugboat

One decision, four axes.

Every turn goes through the same deterministic function: route(turn, policy) → decision. The decision covers the four questions every agent has to answer.

Memory

Which slices to load, with a token budget. Identity is always in.

Subagent

Whether to delegate, to which scoped spec, and how to merge its side-effects.

Skill

Which tools are in scope, which are blocked. External-action gates live here.

Model

Which engine and model id to call. Local by default, cloud on long context.
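One way to picture the four axes is as a single immutable record. The sketch below is illustrative only: the class and field names are assumptions borrowed from the policy keys on this page, not Tugboat's actual types.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class MemoryDecision:
    slices: Tuple[str, ...] = ("identity",)  # identity is always loaded
    budget_tokens: int = 2000


@dataclass(frozen=True)
class SubagentDecision:
    delegate: bool = False
    spec_name: Optional[str] = None       # which scoped spec to hand off to
    merge_strategy: Optional[str] = None  # how side-effects come back


@dataclass(frozen=True)
class SkillDecision:
    allowed: FrozenSet[str] = frozenset()
    blocked: FrozenSet[str] = frozenset()
    confirm_external: bool = True  # hypothetical flag for the external-action gate


@dataclass(frozen=True)
class ModelDecision:
    engine: str = "ollama"
    model: str = "qwen2.5:14b-instruct"


@dataclass(frozen=True)
class Decision:
    """Hypothetical shape of what route(turn, policy) returns."""
    memory: MemoryDecision = field(default_factory=MemoryDecision)
    subagent: SubagentDecision = field(default_factory=SubagentDecision)
    skill: SkillDecision = field(default_factory=SkillDecision)
    model: ModelDecision = field(default_factory=ModelDecision)


default_decision = Decision()
```

Freezing the record keeps a decision comparable and hashable, which is convenient when two runs need to be checked for equality.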

The policy file IS the router.

Plain markdown. No YAML, no DSL, no code. Rules fire in file order. Later rules win. Two invariants the policy cannot override: external actions always confirm, and identity memory is always loaded.
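The evaluation order described here (file order, later rules win, invariants applied last) can be sketched in a few lines. This is a minimal illustration, not Tugboat's implementation: rules are modelled as (predicate, overrides) pairs, and the key `skill.confirm_external` is a hypothetical name for the external-action gate.

```python
from typing import Any, Callable, Dict, List, Tuple

Turn = Dict[str, Any]
Rule = Tuple[Callable[[Turn], bool], Dict[str, Any]]


def apply_rules(turn: Turn, defaults: Dict[str, Any], rules: List[Rule]) -> Dict[str, Any]:
    decision = dict(defaults)
    for when, then in rules:       # rules fire in file order
        if when(turn):
            decision.update(then)  # a later matching rule overwrites earlier values

    # Invariants are re-applied after every rule has run, so no rule can undo them.
    slices = list(decision.get("memory.slices", []))
    if "identity" not in slices:
        slices.insert(0, "identity")
    decision["memory.slices"] = slices
    decision["skill.confirm_external"] = True  # hypothetical key
    return decision


defaults = {"model.engine": "ollama", "memory.slices": ["identity", "recent_daily"]}
rules = [
    (lambda t: t["task_class"] == "drafting", {"model.engine": "ollama"}),
    # This rule tries to drop identity memory; the invariant puts it back.
    (lambda t: t["context_tokens"] > 8000,
     {"model.engine": "cloud", "memory.slices": ["recent_daily"]}),
]
decision = apply_rules({"task_class": "drafting", "context_tokens": 9000}, defaults, rules)
```

Both rules match this turn; because the long-context rule comes later, the engine ends up as `cloud`.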

ben_policy.md

# Defaults
- model.engine: ollama
- model.model: qwen2.5:14b-instruct
- memory.budget_tokens: 2000
- memory.slices: [identity, recent_daily]

## Rule: local for drafting
when: task_class == drafting
then:
  - model.engine = ollama
  - model.model = qwen2.5:14b-instruct

## Rule: cloud fallback for long context
when: context_tokens > 8000
then:
  - model.engine = cloud
  - model.model = claude-opus-4-6
  - memory.budget_tokens = 6000

## Rule: delegate research
when: task_class == research
then:
  - subagent.delegate = true
  - subagent.spec_name = researcher
  - subagent.merge_strategy = merge_memory
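Because the format is this regular, a file like the one above can be read with a handful of string operations. The following is a rough sketch of such a reader, assuming only the constructs visible in this example (`# Defaults`, `## Rule:`, `when:`, `then:` and `- key` lines); `parse_policy` and `_coerce` are hypothetical names, and the real `load_policy` may work differently.

```python
def _coerce(value: str):
    """Turn a raw value into a list, int, bool, or plain string."""
    value = value.strip()
    if value.startswith("[") and value.endswith("]"):
        return [item.strip() for item in value[1:-1].split(",") if item.strip()]
    if value.isdigit():
        return int(value)
    if value in ("true", "false"):
        return value == "true"
    return value


def parse_policy(text: str) -> dict:
    policy = {"defaults": {}, "rules": []}
    section = None  # "defaults" or "rule"
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## Rule:"):
            name = line[len("## Rule:"):].strip()
            policy["rules"].append({"name": name, "when": None, "then": {}})
            section = "rule"
        elif line.startswith("# Defaults"):
            section = "defaults"
        elif section == "defaults" and line.startswith("- "):
            # Split on the first colon only, so "qwen2.5:14b-instruct" survives.
            key, _, value = line[2:].partition(":")
            policy["defaults"][key.strip()] = _coerce(value)
        elif section == "rule" and line.startswith("when:"):
            policy["rules"][-1]["when"] = line[len("when:"):].strip()
        elif section == "rule" and line.startswith("- "):
            key, _, value = line[2:].partition("=")
            policy["rules"][-1]["then"][key.strip()] = _coerce(value)
    return policy


POLICY_TEXT = """\
# Defaults
- model.model: qwen2.5:14b-instruct
- memory.budget_tokens: 2000
- memory.slices: [identity, recent_daily]

## Rule: cloud fallback for long context
when: context_tokens > 8000
then:
  - model.engine = cloud

## Rule: delegate research
when: task_class == research
then:
  - subagent.delegate = true
"""

parsed = parse_policy(POLICY_TEXT)
```

The `when:` expression is kept as a string here; evaluating it against a turn is a separate step this sketch leaves out.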

basic_usage.py

from tugboat import Tugboat, Turn
from tugboat.channels import (
    InMemoryMemoryChannel,
    DictSkillChannel,
    MultiEngineModelChannel,
)
from tugboat.engines import MockEngine
from tugboat.policy  import load_policy

policy = load_policy("examples/ben_policy.md")
tug    = Tugboat(
    policy=policy,
    memory=InMemoryMemoryChannel(...),
    skills=DictSkillChannel({"formatter": fmt}),  # fmt is defined elsewhere (not shown)
    models=MultiEngineModelChannel({
        "ollama": MockEngine(),
        "cloud":  MockEngine(),
    }),
)

# Plugin mode: decision only, you run it
decision = tug.route(Turn(
    text="draft a tweet",
    task_class="drafting",
))

# Driver mode: route + execute end-to-end
result = tug.execute(Turn(
    text="research the harness thesis",
    task_class="research",
))

Where Tugboat sits.

Three layers sit above the model. Tugboat is the middle one: the layer most production agents have but rarely name. Making it explicit is what lets you audit, diff, and eval it.

Knowledge Layer

what the agent knows

identity · memory · skills · world-state · RAG · graphs

↓

Routing Layer · Tugboat

what it decides to do this turn

memory · subagent · skill · model

↓

Harness / Execution Layer

how it actually runs

prompt assembly · tool calls · ReAct loop · retries

↓

Model

the LLM itself

fixed across evals; the point is you change everything above it

Why Tugboat and not an existing framework.

Tiny.

~1,200 lines of stdlib Python. Zero dependencies. You can read the whole thing in an hour.

Deterministic.

route() is pure. Same (turn, policy) → same decision. That's what makes evals actually measure something.
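Purity is easy to check mechanically: serialize the decision canonically and compare fingerprints across runs. The sketch below uses a stand-in `route` function and a made-up `long_context_threshold` key, not Tugboat's own router, to show the idea.

```python
import hashlib
import json


def route(turn: dict, policy: dict) -> dict:
    """Stand-in pure router: no clock, no randomness, no I/O."""
    decision = dict(policy["defaults"])
    if turn.get("context_tokens", 0) > policy["long_context_threshold"]:
        decision["model.engine"] = "cloud"
    return decision


def fingerprint(decision: dict) -> str:
    """Hash a decision in a key-order-independent way."""
    canonical = json.dumps(decision, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


policy = {"defaults": {"model.engine": "ollama"}, "long_context_threshold": 8000}
turn = {"text": "draft a tweet", "task_class": "drafting", "context_tokens": 120}

first = fingerprint(route(turn, policy))
second = fingerprint(route(turn, policy))
```

If the two fingerprints ever differ for the same (turn, policy), something impure has leaked into routing, and eval results stop being comparable.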

Plugin-shaped.

Drop it into any harness as a decision function and ignore execute() entirely. It wraps; it does not replace.
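In plugin mode the harness stays in charge and only asks the router what to do. Here is a minimal sketch of that wrapping pattern, with a stand-in harness and a toy routing function; every name in it is hypothetical, and a real integration would pass in `tug.route` and read whatever fields its decision exposes.

```python
from typing import Callable


def harness_step(prompt: str, engine: str, model: str) -> str:
    """Stand-in for an existing harness; a real one would call an LLM here."""
    return f"[{engine}/{model}] {prompt}"


def with_router(route: Callable[[dict], dict]) -> Callable[[dict], str]:
    """Wrap the harness so each turn consults the router first, then runs as before."""
    def step(turn: dict) -> str:
        decision = route(turn)
        return harness_step(turn["text"], decision["engine"], decision["model"])
    return step


def toy_route(turn: dict) -> dict:
    """Toy decision function mirroring the long-context rule in the example policy."""
    if turn.get("context_tokens", 0) > 8000:
        return {"engine": "cloud", "model": "claude-opus-4-6"}
    return {"engine": "ollama", "model": "qwen2.5:14b-instruct"}


step = with_router(toy_route)
output = step({"text": "draft a tweet", "context_tokens": 50})
```

The harness code itself is unchanged; only the choice of engine and model moves out of it and into the decision function.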

Policy as data.

The router is a markdown file, not a code path. Diff it. Version it. Hand it to a non-engineer.

Subagent-first.

The one thing every harness post-mortem cites as the lever is subagent orchestration. Tugboat makes it the point.

OODA built in.

Log every turn. Mine patterns offline. Emit proposed policy edits as markdown diffs for a human to accept.
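The loop can be sketched with the standard library alone: append one JSON line per turn, count patterns offline, and emit a unified diff against the policy text. The heuristic here (propose a cloud rule once a task class has landed on the cloud engine a few times) and all function names are assumptions for illustration, not Tugboat's actual miner.

```python
import difflib
import io
import json
from collections import Counter


def log_turn(stream, turn: dict, decision: dict) -> None:
    """Append one turn and its decision as a JSON line."""
    stream.write(json.dumps({"turn": turn, "decision": decision}) + "\n")


def propose_edit(log_text: str, policy_text: str, min_count: int = 3) -> str:
    """Return a unified diff proposing a new rule, or "" if nothing stands out."""
    counts = Counter()
    for line in log_text.splitlines():
        entry = json.loads(line)
        if entry["decision"]["model.engine"] == "cloud":
            counts[entry["turn"]["task_class"]] += 1
    if not counts:
        return ""
    task_class, seen = counts.most_common(1)[0]
    if seen < min_count:
        return ""
    new_rule = (
        f"\n## Rule: cloud for {task_class}\n"
        f"when: task_class == {task_class}\n"
        "then:\n"
        "  - model.engine = cloud\n"
    )
    diff = difflib.unified_diff(
        policy_text.splitlines(keepends=True),
        (policy_text + new_rule).splitlines(keepends=True),
        fromfile="policy.md",
        tofile="policy.proposed.md",
    )
    return "".join(diff)


log = io.StringIO()  # in-memory stand-in for a log file
for _ in range(3):
    log_turn(log, {"task_class": "summarize"}, {"model.engine": "cloud"})
log_turn(log, {"task_class": "drafting"}, {"model.engine": "ollama"})

policy_text = "# Defaults\n- model.engine: ollama\n"
diff_text = propose_edit(log.getvalue(), policy_text)
print(diff_text)
```

Nothing is applied automatically: the output is only a diff, which a human reviews and accepts or discards.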

Install in one line.

No pip install. No virtualenv. Python 3.10+ and a terminal.

terminal

$ git clone https://github.com/atxgreene/tugboat
$ cd tugboat
$ python -m unittest discover tests    # 34 tests, < 30ms
$ python examples/basic_usage.py        # end-to-end demo

