
Bring your own agent

The Celestial platform expects you to bring your own onboarding agent. ss exposes a deliberately small set of deterministic, JSON-emitting primitives; the agent’s job is to read them, dispatch the fixes ss already knows how to apply, and escalate only the things that genuinely need human judgment.

A complete reference agent is under 100 lines of either Python or TypeScript. Below: the contract, a worked example, and the runbook.

ss promises three things to your agent:

  1. Every diagnostic emits a stable shape. ss workspace doctor --json returns { workspace, env, checks_run, findings: DiagnosticFinding[] } where each finding carries id, severity, message, an optional suggestedFix (free text for humans), and — most importantly — an optional autoFixCommand (a shell string the agent can dispatch verbatim).
  2. Every state primitive is queryable. ss services list --json, ss vault list-secrets --json, ss artifact build --dry-run --json all emit machine-readable output. Your agent never has to scrape human text.
  3. Every mutation is idempotent. Re-running an autoFixCommand is a no-op when the desired state already matches. Your agent can be simple-minded about retry: it doesn’t need to track what it already did.
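Because every mutation is idempotent, a retry loop can stay stateless. The following is a minimal sketch, not part of ss: `converge`, `run_doctor`, and `run_fix` are hypothetical names for callables the agent supplies (for example, thin subprocess wrappers), and the `max_passes` cap is an assumption rather than an ss setting.

```python
from typing import Callable


def converge(run_doctor: Callable[[], dict],
             run_fix: Callable[[str], int],
             max_passes: int = 3) -> list[dict]:
    """Re-run doctor and dispatch autofixes until none remain.

    Returns the findings of the last doctor run. No bookkeeping is kept
    between passes: re-dispatching a fix that already took effect is
    assumed to be a no-op, as the contract promises.
    """
    for _ in range(max_passes):
        findings = run_doctor().get("findings", [])
        fixable = [f for f in findings if f.get("autoFixCommand")]
        if not fixable:
            return findings
        for f in fixable:
            run_fix(f["autoFixCommand"])  # dispatched verbatim
    # Pass budget spent: report whatever the doctor still sees.
    return run_doctor().get("findings", [])
```

The cap only guards against an autofix that keeps failing; whatever comes back after the last pass is what the agent escalates.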

In return, the agent honors two rules:

  1. Dispatch every autoFixCommand verbatim. Don’t paraphrase. ss wrote it; it knows what’s safe.
  2. Escalate findings without autoFixCommand to the human. Surface the suggestedFix text + location. Don’t guess.
{
  "id": "vault-required.promotable:WORKOS_API_KEY",
  "severity": "warn",
  "message": "WORKOS_API_KEY missing from app-secrets but workos.api_key exists as a provider-key.",
  "location": { "path": "packages/dashboard-server/dashboard-server.ssmod.yaml", "line": 47 },
  "suggestedFix": "Promote the provider-key into the workspace+env app-secret namespace.",
  "autoFixCommand": "ss vault promote workos.api_key WORKOS_API_KEY --workspace=celestial --env=prod",
  "context": { "providerKey": "workos.api_key", "appSecretName": "WORKOS_API_KEY" }
}
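A finding like the one above can be loaded into a small typed record so the rest of the agent never touches raw dicts. This is a minimal sketch: the `Finding` dataclass, `parse_finding`, and `parse_report` are hypothetical names, and treating `location` and `context` as optional is an assumption drawn from this single example, not a documented guarantee.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

SEVERITIES = ("error", "warn", "info")


@dataclass(frozen=True)
class Finding:
    id: str
    severity: str
    message: str
    suggested_fix: Optional[str] = None     # free text for humans
    auto_fix_command: Optional[str] = None  # shell string to dispatch verbatim
    location: Optional[dict] = None         # assumed optional: {"path": ..., "line": ...}
    context: dict = field(default_factory=dict)


def parse_finding(raw: dict) -> Finding:
    """Validate the required keys and copy the optional ones if present."""
    missing = [k for k in ("id", "severity", "message") if k not in raw]
    if missing:
        raise ValueError(f"finding is missing required keys: {missing}")
    if raw["severity"] not in SEVERITIES:
        raise ValueError(f"unknown severity: {raw['severity']!r}")
    return Finding(
        id=raw["id"],
        severity=raw["severity"],
        message=raw["message"],
        suggested_fix=raw.get("suggestedFix"),
        auto_fix_command=raw.get("autoFixCommand"),
        location=raw.get("location"),
        context=raw.get("context", {}),
    )


def parse_report(text: str) -> list[Finding]:
    """Parse the JSON text printed by `ss workspace doctor --json`."""
    return [parse_finding(f) for f in json.loads(text).get("findings", [])]
```

Failing fast on an unknown severity keeps a future schema change from silently falling through the agent's policy.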

severity ranges over error (blocks deploy), warn (will bite later), info (FYI). The agent’s policy:

| severity | autoFixCommand present? | agent action |
|---|---|---|
| any | yes | run it, log result |
| error | no | abort + escalate |
| warn | no | warn user + continue |
| info | no | log + continue |
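The policy above fits in one small pure function. This is a sketch only: `agent_action` and its return labels (`run`, `abort`, `warn`, `log`) are hypothetical names chosen for illustration, not identifiers that ss defines.

```python
def agent_action(finding: dict) -> str:
    """Map a doctor finding to the agent's next step.

    Returns one of: "run" (dispatch autoFixCommand), "abort" (escalate and
    stop), "warn" (tell the user, continue), "log" (record, continue).
    """
    if finding.get("autoFixCommand"):
        return "run"  # any severity: run it, log the result
    by_severity = {"error": "abort", "warn": "warn", "info": "log"}
    return by_severity[finding["severity"]]
```

Keeping the decision separate from the side effects makes the policy easy to unit-test without invoking ss.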

#!/usr/bin/env python3
"""
ss-onboard.py: runs ss workspace doctor, applies autofixes, escalates
the rest. Suitable for
`python ss-onboard.py --workspace-file=foo.ssws.yaml --service=dashboard-server.web`.

Exit codes:
  0  every check clean OR every error-level finding had an autoFixCommand
     that succeeded.
  1  one or more error-level findings remained unfixed, or the deploy failed.
  2  ss itself failed (couldn't compose, vault unreachable, etc).
"""
import argparse
import json
import shlex
import subprocess
import sys


def run(cmd, check=False):
    """Run a shell command; always return (rc, stdout, stderr)."""
    r = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    if check and r.returncode != 0:
        sys.exit(2)
    return r.returncode, r.stdout, r.stderr


def doctor(workspace_file):
    cmd = f"ss workspace doctor --json --workspace-file={shlex.quote(workspace_file)}"
    rc, out, err = run(cmd, check=False)  # doctor exits non-zero on errors
    try:
        return json.loads(out)
    except json.JSONDecodeError:
        # No parseable report means ss itself failed, not that findings exist.
        print(f"ss workspace doctor failed (rc={rc}):\n{err}", file=sys.stderr)
        sys.exit(2)


def apply_fixes(findings):
    fixed, escalated = [], []
    for f in findings:
        cmd = f.get("autoFixCommand")
        if cmd:
            rc, _, err = run(cmd)  # dispatched verbatim, never rewritten
            (fixed if rc == 0 else escalated).append((f, err))
        else:
            escalated.append((f, None))
    return fixed, escalated


def deploy(workspace_file, service):
    rc, out, err = run(
        f"ss artifact deploy {shlex.quote(service)} --env=prod "
        f"--workspace-file={shlex.quote(workspace_file)}"
    )
    if rc != 0:
        print(f"deploy failed:\n{err}", file=sys.stderr)
        sys.exit(1)
    return out


def main():
    p = argparse.ArgumentParser()
    p.add_argument("--workspace-file", required=True)
    p.add_argument("--service", required=True, help="e.g. dashboard-server.web")
    args = p.parse_args()

    # 1. Diagnose
    print(f"→ ss workspace doctor --workspace-file={args.workspace_file}")
    report = doctor(args.workspace_file)
    findings = report.get("findings", [])
    print(f"  {report.get('checks_run', '?')} checks, {len(findings)} findings")

    # 2. Apply
    fixed, escalated = apply_fixes(findings)
    for f, _ in fixed:
        print(f"  ✓ fixed: {f['id']}")
    for f, err in escalated:
        sev = f["severity"]
        msg = f.get("suggestedFix") or f["message"]
        sym = {"error": "✗", "warn": "⚠", "info": "ℹ"}[sev]
        print(f"  {sym} [{sev}] {f['id']}: {msg}", file=sys.stderr)
        if err:
            print(f"    (autofix attempted, failed: {err.strip()[:200]})", file=sys.stderr)

    # 3. Hard-block on unfixed errors
    unfixed_errors = [f for f, _ in escalated if f["severity"] == "error"]
    if unfixed_errors:
        print(f"\n{len(unfixed_errors)} error(s) need human attention.", file=sys.stderr)
        sys.exit(1)

    # 4. Deploy
    print(f"\n→ ss artifact deploy {args.service}")
    deploy(args.workspace_file, args.service)
    print("✓ deployed")

    # 5. Verify in mesh registry
    rc, out, _ = run(
        f"ss services list --json --workspace-file={shlex.quote(args.workspace_file)}"
    )
    instances = json.loads(out).get("instances", []) if rc == 0 else []
    print(f"\n{len(instances)} services registered:")
    for i in instances:
        print(f"  {i['serviceId']:30} {i.get('url', '(no url)')} (v {i['version'][:8]})")


if __name__ == "__main__":
    main()

#!/usr/bin/env node
import { execFileSync, execSync } from "node:child_process";

interface Finding {
  id: string;
  severity: "error" | "warn" | "info";
  message: string;
  suggestedFix?: string;
  autoFixCommand?: string;
}

interface DoctorReport {
  workspace: string;
  env: string;
  checks_run: number;
  findings: Finding[];
}

interface Result { rc: number; stdout: string; stderr: string }

// Run fn, turning a non-zero exit into a Result instead of an exception.
function capture(fn: () => string): Result {
  try {
    return { rc: 0, stdout: fn(), stderr: "" };
  } catch (e) {
    const err = e as { status?: number; stdout?: string; stderr?: string };
    return { rc: err.status ?? 1, stdout: err.stdout ?? "", stderr: err.stderr ?? "" };
  }
}

// ss subcommands the agent composes itself: argv form, no shell involved.
const ss = (args: string[]): Result =>
  capture(() => execFileSync("ss", args, { encoding: "utf-8", stdio: ["ignore", "pipe", "pipe"] }));

// autoFixCommand strings: dispatched verbatim through the shell, per the contract.
// (Splitting on whitespace would break any quoted argument.)
const sh = (cmd: string): Result =>
  capture(() => execSync(cmd, { encoding: "utf-8", stdio: ["ignore", "pipe", "pipe"] }));

const [wsFile, service] = [process.argv[2], process.argv[3]];
if (!wsFile || !service) {
  console.error("usage: ss-onboard.ts <workspace-file> <service>");
  process.exit(2);
}

// 1. Doctor (exits non-zero when it reports errors, so judge it by its JSON)
const doc = ss(["workspace", "doctor", "--json", `--workspace-file=${wsFile}`]);
let report: DoctorReport;
try {
  report = JSON.parse(doc.stdout) as DoctorReport;
} catch {
  console.error(`ss workspace doctor failed (rc=${doc.rc}):\n${doc.stderr}`);
  process.exit(2);
}
console.log(`${report.checks_run} checks, ${report.findings.length} findings`);

// 2. Apply autofixes; escalate the rest
const escalated: Finding[] = [];
for (const f of report.findings) {
  if (!f.autoFixCommand) { escalated.push(f); continue; }
  const { rc, stderr } = sh(f.autoFixCommand);
  if (rc === 0) console.log(`  ✓ fixed: ${f.id}`);
  else { console.error(`  ✗ autofix failed for ${f.id}: ${stderr.slice(0, 200)}`); escalated.push(f); }
}
for (const f of escalated) console.error(`  [${f.severity}] ${f.id}: ${f.suggestedFix ?? f.message}`);

// 3. Hard-block on unfixed errors
const errs = escalated.filter(f => f.severity === "error");
if (errs.length > 0) {
  console.error(`\n${errs.length} error(s) need human attention.`);
  process.exit(1);
}

// 4. Deploy
console.log(`→ ss artifact deploy ${service}`);
const dep = ss(["artifact", "deploy", service, "--env=prod", `--workspace-file=${wsFile}`]);
if (dep.rc !== 0) {
  console.error(`deploy failed:\n${dep.stderr}`);
  process.exit(1);
}
console.log("✓ deployed");

// 5. Confirm via mesh
const svc = ss(["services", "list", "--json", `--workspace-file=${wsFile}`]);
const services = (svc.rc === 0 ? JSON.parse(svc.stdout).instances ?? [] : []) as
  Array<{ serviceId: string; url?: string; version: string }>;
console.log(`\n${services.length} services registered:`);
for (const s of services) console.log(`  ${s.serviceId.padEnd(30)} ${s.url ?? "(no url)"} (v ${s.version.slice(0, 8)})`);

For a brand-new app the customer wants ss to provision:

  1. Write your ssmod.yaml declaring target: { type: web-frontend, ... } (see the web-frontend job docs for the schema).
  2. Add it to your *.ssws.yaml workspace under modules:.
  3. Set provider credentials in vault (ss vault set-key <provider> ...).
  4. Run your agent: python ss-onboard.py --workspace-file=... --service=<service-id>. The agent:
    • runs ss workspace doctor and applies every autoFixCommand (vault promotion, etc.)
    • escalates anything still broken
    • runs ss artifact deploy if everything is green
    • shows the resulting mesh registry

End-to-end, with no hand-edits.

Three categories of work still need human or LLM judgment because they involve real product decisions, not mechanical fixes:

| Class | Example | Why |
|---|---|---|
| Authoring the ssmod | “Should this be runtime: node or static?” | Depends on whether the app needs server-side rendering, SSE, websocket, etc. The agent doesn’t read your code. |
| Choosing third-party providers | “Should I use Supabase or Neon for Postgres?” | Cost / region / feature trade-offs. ss catalog surfaces options; the customer picks. |
| External-system config | “Configure WorkOS AuthKit branding to match active theme.” | Some vendors (WorkOS as of May 2026) have no Management API for the surface we want to drive. ss writes a manual_action_required row to celestial_service_state; the agent reads it and tells the user what to paste. |

Everything else — package layout, transitive vendoring, DNS, Caddy, TLS, systemd, vault promotion, mesh registration — is mechanical and ss owns it.

To verify the deterministic surface on your own monorepo:

# 1. Doctor your workspace
ss workspace doctor --workspace-file=<your.ssws.yaml> --json | python3 -m json.tool
# 2. Auto-apply every fix the doctor knows
ss workspace doctor --workspace-file=<your.ssws.yaml> --fix
# 3. Verify the registry reflects every deployed service
ss services list --workspace-file=<your.ssws.yaml> --json

If you can run those three commands in a fresh checkout and end up with a deployed app that registers in the mesh, your customer’s agent has everything it needs. No hand-edits required.
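The registry check can also be automated. This is a minimal sketch, assuming the `ss services list --json` output has the `instances` / `serviceId` shape that the reference agents read; `missing_services` is a hypothetical helper name.

```python
import json


def missing_services(services_json: str, expected: set[str]) -> set[str]:
    """Return the expected service IDs that the registry listing lacks.

    `services_json` is the raw stdout of `ss services list --json`; its
    {"instances": [{"serviceId": ...}]} layout is an assumption taken from
    the reference agents, not from a published schema.
    """
    instances = json.loads(services_json).get("instances", [])
    registered = {i["serviceId"] for i in instances}
    return set(expected) - registered
```

An agent could capture the command's stdout, pass it in with the service IDs it just deployed, and treat a non-empty result as a finding to escalate.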

  • Service registry / mesh awareness: the ss services CLI
  • ss workspace doctor source — packages/starsystem-cli/src/doctor/
  • ss vault promote — for the canonical autofix command the doctor emits
  • Theming across surfaces — for the WorkOS-style “no API, show banner” pattern