Bring your own agent

The Celestial platform expects you to bring your own onboarding agent. ss exposes a deliberately small set of deterministic, JSON-emitting primitives; the agent’s job is to read them, dispatch the fixes ss already knows how to apply, and escalate only the things that genuinely need human judgment.

A complete reference agent is under 100 lines of either Python or TypeScript. Below: the contract, a worked example, and the runbook.

The contract

ss promises three things to your agent:

Every diagnostic emits a stable shape. ss workspace doctor --json returns { workspace, env, checks_run, findings: DiagnosticFinding[] } where each finding carries id, severity, message, an optional suggestedFix (free text for humans), and — most importantly — an optional autoFixCommand (a shell string the agent can dispatch verbatim).
Every state primitive is queryable. ss services list --json, ss vault list-secrets --json, ss artifact build --dry-run --json all emit machine-readable output. Your agent never has to scrape human text.
Every mutation is idempotent. Re-running an autoFixCommand is a no-op when the desired state already matches. Your agent can be simple-minded about retry: it doesn’t need to track what it already did.
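Because mutations are idempotent, an agent can loop on diagnose-and-apply without tracking what it already did. The sketch below illustrates that idea only. `converge`, `get_findings`, `run_fix`, and `max_passes` are hypothetical names introduced here, not part of ss; the two callables stand in for however your agent invokes ss workspace doctor --json and dispatches a command.

```python
from typing import Callable

def converge(get_findings: Callable[[], list[dict]],
             run_fix: Callable[[str], int],
             max_passes: int = 3) -> list[dict]:
    """Diagnose, apply every autoFixCommand, and re-diagnose until no
    autofixable findings remain or max_passes is reached.

    Returns the findings from the final diagnosis."""
    findings = get_findings()
    for _ in range(max_passes):
        fixable = [f for f in findings if f.get("autoFixCommand")]
        if not fixable:
            break
        for f in fixable:
            # Safe to repeat: re-running a fix is a no-op once state matches.
            run_fix(f["autoFixCommand"])
        findings = get_findings()
    return findings
```

The pass limit is a guard added for this sketch: it keeps a fix that never takes effect from looping forever, and whatever is left after the last pass is escalated.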

In return, the agent honors two rules:

Dispatch every autoFixCommand verbatim. Don’t paraphrase. ss wrote it; it knows what’s safe.
Escalate findings without autoFixCommand to the human. Surface the suggestedFix text and the finding’s location. Don’t guess.

Reference: doctor finding shape

{
  "id":         "vault-required.promotable:WORKOS_API_KEY",
  "severity":   "warn",
  "message":    "WORKOS_API_KEY missing from app-secrets but workos.api_key exists as a provider-key.",
  "location":   { "path": "packages/dashboard-server/dashboard-server.ssmod.yaml", "line": 47 },
  "suggestedFix":   "Promote the provider-key into the workspace+env app-secret namespace.",
  "autoFixCommand": "ss vault promote workos.api_key WORKOS_API_KEY --workspace=celestial --env=prod",
  "context":    { "providerKey": "workos.api_key", "appSecretName": "WORKOS_API_KEY" }
}
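An agent that prefers typed access over raw dictionaries could map the shape above onto a small record. This is a minimal sketch: `Finding` and `parse_finding` are hypothetical names, and treating location, suggestedFix, and autoFixCommand as optional is an assumption based on the example and the contract above, not a published schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Finding:
    id: str
    severity: str
    message: str
    suggested_fix: Optional[str] = None
    auto_fix_command: Optional[str] = None
    path: Optional[str] = None
    line: Optional[int] = None

def parse_finding(raw: dict) -> Finding:
    """Build a Finding from one element of the doctor report's findings list."""
    loc = raw.get("location") or {}   # assumed optional
    return Finding(
        id=raw["id"],
        severity=raw["severity"],
        message=raw["message"],
        suggested_fix=raw.get("suggestedFix"),
        auto_fix_command=raw.get("autoFixCommand"),
        path=loc.get("path"),
        line=loc.get("line"),
    )
```

The required fields are read with plain indexing, so a report that omits id, severity, or message fails loudly with a KeyError instead of producing a half-empty record.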

severity is one of error (blocks deploy), warn (will bite later), or info (FYI). The agent’s policy (a failed autoFixCommand is treated as if none were present):

severity	autoFixCommand present?	agent action
any	yes	run it, log result
`error`	no	abort + escalate
`warn`	no	warn user + continue
`info`	no	log + continue
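The policy table can be written as one pure function, which keeps the decision separate from the side effects and makes it easy to unit-test. This is a sketch: `agent_action` and its return strings are hypothetical, and mapping an unrecognized severity to "abort" is a conservative assumption of this example, not documented ss behavior.

```python
def agent_action(finding: dict) -> str:
    """Return the action the policy table prescribes for one finding."""
    if finding.get("autoFixCommand"):
        return "run"            # any severity: run it, log result
    by_severity = {
        "error": "abort",       # abort + escalate
        "warn": "warn",         # warn user + continue
        "info": "log",          # log + continue
    }
    # Assumption: treat an unknown severity like an error.
    return by_severity.get(finding["severity"], "abort")
```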

Python reference agent

#!/usr/bin/env python3
"""
ss-onboard.py — runs ss workspace doctor, applies autofixes, escalates
the rest. Suitable for
`python ss-onboard.py --workspace-file=foo.ssws.yaml --service=dashboard-server.web`.

Exit codes:
  0  every check clean OR every error-level finding had an autoFixCommand
     that succeeded.
  1  one or more error-level findings remained unfixed, or the deploy failed.
  2  ss itself failed (couldn't compose, vault unreachable, etc).
"""
import argparse, json, shlex, subprocess, sys

def run(cmd):
    """Run a shell command string; always returns (rc, stdout, stderr)."""
    r = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return r.returncode, r.stdout, r.stderr

def doctor(workspace_file):
    cmd = f"ss workspace doctor --json --workspace-file={shlex.quote(workspace_file)}"
    rc, out, err = run(cmd)   # doctor exits non-zero when it reports errors
    try:
        return json.loads(out)
    except json.JSONDecodeError:
        # No parseable report: ss itself failed (couldn't compose, vault unreachable, etc).
        print(f"ss workspace doctor failed (rc={rc}):\n{err}", file=sys.stderr)
        sys.exit(2)

def apply_fixes(findings):
    fixed, escalated = [], []
    for f in findings:
        cmd = f.get("autoFixCommand")
        if cmd:
            rc, _, err = run(cmd)   # dispatched verbatim, as the contract requires
            (fixed if rc == 0 else escalated).append((f, err))
        else:
            escalated.append((f, None))
    return fixed, escalated

def deploy(workspace_file, service):
    # Quote user-supplied values so paths with spaces survive the shell.
    rc, out, err = run(
        f"ss artifact deploy {shlex.quote(service)} --env=prod "
        f"--workspace-file={shlex.quote(workspace_file)}"
    )
    if rc != 0:
        print(f"deploy failed:\n{err}", file=sys.stderr)
        sys.exit(1)
    return out

def main():
    p = argparse.ArgumentParser()
    p.add_argument("--workspace-file", required=True)
    p.add_argument("--service",        required=True, help="e.g. dashboard-server.web")
    args = p.parse_args()

    # 1. Diagnose
    print(f"→ ss workspace doctor --workspace-file={args.workspace_file}")
    report = doctor(args.workspace_file)
    findings = report.get("findings", [])
    print(f"  {report['checks_run']} checks, {len(findings)} findings")

    # 2. Apply
    fixed, escalated = apply_fixes(findings)
    for f, _ in fixed:
        print(f"  ✓ fixed: {f['id']}")
    for f, err in escalated:
        sev = f["severity"]
        msg = f.get("suggestedFix") or f["message"]
        sym = {"error": "✗", "warn": "⚠", "info": "ℹ"}.get(sev, "?")
        loc = f.get("location") or {}
        where = f" ({loc['path']}:{loc.get('line', '?')})" if loc.get("path") else ""
        print(f"  {sym} [{sev}] {f['id']}{where}: {msg}", file=sys.stderr)
        if err:
            print(f"    (autofix attempted, failed: {err.strip()[:200]})", file=sys.stderr)

    # 3. Hard-block on unfixed errors
    unfixed_errors = [f for f, _ in escalated if f["severity"] == "error"]
    if unfixed_errors:
        print(f"\n{len(unfixed_errors)} error(s) need human attention.", file=sys.stderr)
        sys.exit(1)

    # 4. Deploy
    print(f"\n→ ss artifact deploy {args.service}")
    deploy(args.workspace_file, args.service)
    print("✓ deployed")

    # 5. Verify in mesh registry
    rc, out, _ = run(
        f"ss services list --json --workspace-file={shlex.quote(args.workspace_file)}"
    )
    if rc != 0:
        print("  ⚠ ss services list failed; registry not verified", file=sys.stderr)
    instances = json.loads(out).get("instances", []) if rc == 0 else []
    print(f"\n{len(instances)} services registered:")
    for i in instances:
        print(f"  {i['serviceId']:30} {i.get('url', '(no url)')} (v {i['version'][:8]})")

if __name__ == "__main__":
    main()

TypeScript reference agent

#!/usr/bin/env node
import { execFileSync } from "node:child_process";

interface Finding {
  id:              string;
  severity:        "error" | "warn" | "info";
  message:         string;
  suggestedFix?:   string;
  autoFixCommand?: string;
}
interface DoctorReport {
  workspace:    string;
  env:          string;
  checks_run:   number;
  findings:     Finding[];
}

function ss(args: string[]): { rc: number; stdout: string; stderr: string } {
  try {
    const stdout = execFileSync("ss", args, { encoding: "utf-8", stdio: ["ignore", "pipe", "pipe"] });
    return { rc: 0, stdout, stderr: "" };
  } catch (e) {
    const err = e as { status?: number; stdout?: string; stderr?: string };
    return { rc: err.status ?? 1, stdout: err.stdout ?? "", stderr: err.stderr ?? "" };
  }
}

const [wsFile, service] = [process.argv[2], process.argv[3]];
if (!wsFile || !service) {
  console.error("usage: ss-onboard.ts <workspace-file> <service>");
  process.exit(2);
}

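// 1. Doctor (exits non-zero when it reports errors, so parse stdout regardless of rc)
const { stdout: docJson, stderr: docErr } = ss(["workspace", "doctor", "--json", `--workspace-file=${wsFile}`]);
let report: DoctorReport;
try {
  report = JSON.parse(docJson) as DoctorReport;
} catch {
  // No parseable report: ss itself failed.
  console.error(`ss workspace doctor produced no JSON report:\n${docErr}`);
  process.exit(2);
}
console.log(`→ ${report.checks_run} checks, ${report.findings.length} findings`);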

// 2. Apply autofixes; escalate the rest
const escalated: Finding[] = [];
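for (const f of report.findings) {
  if (!f.autoFixCommand) { escalated.push(f); continue; }
  try {
    // Dispatch the command string verbatim through a shell. Splitting it on
    // whitespace would break quoted arguments and violate the contract.
    execFileSync(f.autoFixCommand, { shell: true, encoding: "utf-8", stdio: ["ignore", "pipe", "pipe"] });
    console.log(`  ✓ fixed: ${f.id}`);
  } catch (e) {
    const stderr = (e as { stderr?: string }).stderr ?? "";
    console.error(`  ✗ autofix failed for ${f.id}: ${stderr.slice(0, 200)}`);
    escalated.push(f);
  }
}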

// 3. Surface every escalated finding; hard-block on unfixed errors
for (const f of escalated) {
  const sym = f.severity === "error" ? "✗" : f.severity === "warn" ? "⚠" : "ℹ";
  console.error(`  ${sym} [${f.severity}] ${f.id}: ${f.suggestedFix ?? f.message}`);
}
if (escalated.some(f => f.severity === "error")) process.exit(1);

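// 4. Deploy
console.log(`→ ss artifact deploy ${service}`);
const deployResult = ss(["artifact", "deploy", service, "--env=prod", `--workspace-file=${wsFile}`]);
if (deployResult.rc !== 0) {
  console.error(`deploy failed:\n${deployResult.stderr}`);
  process.exit(1);
}
console.log("✓ deployed");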

// 5. Confirm via mesh
const { rc: svcRc, stdout: svcJson } = ss(["services", "list", "--json", `--workspace-file=${wsFile}`]);
if (svcRc !== 0) console.error("  ⚠ ss services list failed; registry not verified");
// Only parse stdout when the command succeeded; a failed call may emit no JSON.
const services = (svcRc === 0 ? JSON.parse(svcJson).instances ?? [] : []) as Array<{ serviceId: string; url?: string; version: string }>;
console.log(`\n${services.length} services registered:`);
for (const s of services) console.log(`  ${s.serviceId.padEnd(30)} ${s.url ?? "(no url)"} (v ${s.version.slice(0, 8)})`);

The runbook

For a brand-new app the customer wants ss to provision:

Write your ssmod.yaml declaring target: { type: web-frontend, ... } (see the web-frontend job docs for the schema).
Add it to your *.ssws.yaml workspace under modules:.
Set provider credentials in vault (ss vault set-key <provider> ...).
Run your agent: python ss-onboard.py --workspace-file=... --service=<service-id>. The agent:
- runs ss workspace doctor and applies every autoFixCommand (vault promotion, etc.)
- escalates anything still broken
- runs ss artifact deploy if everything is green
- shows the resulting mesh registry

End-to-end, with no hand-edits.

What the agent can’t do (yet)

Three categories of work still need human or LLM judgment because they involve real product decisions, not mechanical fixes:

Class	Example	Why
Authoring the ssmod	“Should this be `runtime: node` or `static`?”	Depends on whether the app needs server-side rendering, SSE, websocket, etc. The agent doesn’t read your code.
Choosing third-party providers	“Should I use Supabase or Neon for Postgres?”	Cost / region / feature trade-offs. `ss catalog` surfaces options; the customer picks.
External-system config	“Configure WorkOS AuthKit branding to match active theme.”	Some vendors (WorkOS as of May 2026) have no Management API for the surface we want to drive. ss writes a `manual_action_required` row to `celestial_service_state`; the agent reads it and tells the user what to paste.

Everything else — package layout, transitive vendoring, DNS, Caddy, TLS, systemd, vault promotion, mesh registration — is mechanical and ss owns it.

Smoke test (run before shipping)

To verify the deterministic surface on your own monorepo:

# 1. Doctor your workspace
ss workspace doctor --workspace-file=<your.ssws.yaml> --json | python3 -m json.tool

# 2. Auto-apply every fix the doctor knows
ss workspace doctor --workspace-file=<your.ssws.yaml> --fix

# 3. Verify the registry reflects every deployed service
ss services list --workspace-file=<your.ssws.yaml> --json

If you can run those three in a fresh checkout and end up with a deployed app that registers in the mesh, your customer’s agent has everything it needs. No hand-edits required.