
Self-Hosting (Production / Cloud Mode)

This guide covers running SignalPilot yourself in cloud mode — a multi-tenant deployment with Clerk authentication and sandboxed notebook pods on Kubernetes.

If you want the local single-node Docker stack instead, see Install. If you want a fully managed gateway, see SignalPilot Cloud.

1. Overview

A production deployment has two parts:

  • Gateway — a single container (FastAPI + MCP server) that owns connections, governance, audit logs, auth, and the Kubernetes orchestrator. It runs behind a TLS reverse proxy.
  • Notebook pods — short-lived, per-session pods scheduled by the gateway onto a Kubernetes cluster. Each runs user/agent-supplied code, so each is wrapped in a kernel-level sandbox (gVisor runsc) and isolated by per-org namespace plus NetworkPolicy.

Cloud mode vs local mode is selected by SP_DEPLOYMENT_MODE:

| | local | cloud |
|---|---|---|
| Auth | local API key | Clerk JWT, multi-tenant |
| Notebooks | one direct container (SP_NOTEBOOK_DIRECT_URL) | sandboxed pods on Kubernetes |
| Hardening | relaxed | assert_cloud_hardening_intact() enforced at boot |
| Image refs | any tag | digest-pinned (@sha256:…) |

In cloud mode the gateway runs assert_cloud_hardening_intact() during startup and refuses to boot if any security kill-switch is disabled. The rules are reproduced verbatim in section 3.

2. Prerequisites

  • Docker and a container registry you control (your-registry) for the gateway and notebook images.
  • A PostgreSQL database for gateway persistence (connections, audit logs, etc.).
  • A Kubernetes cluster for notebook pods, with:
    • a NetworkPolicy-enforcing CNI (Calico, Cilium, AWS VPC CNI policy add-on, or GKE Dataplane V2). Without one, cross-org isolation policies have no effect.
    • a sandbox RuntimeClass installed on every node (gVisor runsc, or Kata).
    • IMDSv2 hop-limit=1 on node launch templates.
    • a podPidsLimit (e.g. 4096) on every node (fork-bomb containment).
    • the gateway RBAC and admission policies applied.
  • A TLS reverse proxy (Caddy, nginx, or an ingress) terminating HTTPS in front of the gateway.

The Kubernetes setup is the single source of truth in deploy/k8s/README.md — follow it for CNI, RBAC, RuntimeClass, IMDS, PID-limit, and admission-policy details. This page only summarizes the gateway-facing configuration.

3. Required environment variables

In cloud mode the gateway validates the following at startup. Any kill-switch violation raises a RuntimeError with the message "Cloud mode hardening violations: […]. Refusing to boot." The error lists variable names only, never values.

| Variable | Required value | Why |
|---|---|---|
| SP_DEPLOYMENT_MODE | cloud | Enables cloud-mode paths and all checks below. |
| SP_NOTEBOOK_RUNTIME_CLASS | non-empty, e.g. gvisor | Sandbox RuntimeClass for notebook pods. Empty string is forbidden in cloud mode. |
| SP_NOTEBOOK_IMAGE | your-registry/notebook@sha256:<64-hex> | Notebook pod image. Must be a digest reference in cloud mode; floating tags (:latest) are rejected. See below. |
| SP_ALLOWED_ORIGINS | comma-separated, all https:// (or http://localhost / http://127.0.0.1) | CORS allow-list. Must be set; no wildcards (*); no plain non-loopback http://. |
| SP_NOTEBOOK_NETWORK_POLICY | true (default) | Setting false logs a loud cloud warning but does not block boot; this is needed where gVisor and the AWS VPC CNI NetworkPolicy agent don't compose. IMDS credential theft stays blocked via node IMDS hop-limit + the always-on block-imds-egress policy. |
| SP_NOTEBOOK_DIRECT_URL | must be empty/unset | Any non-empty value is forbidden in cloud mode (direct-notebook mode is local-only). |
| SP_DISABLE_SANDBOX | must not be true/1/yes | Disabling the sandbox is forbidden. |
| CLERK_JWT_AUDIENCE | optional (leave unset) | See section 4. Optional request-time aud check; the gateway boots without it. |
| SP_ENCRYPTION_KEY | Fernet key (base64) | Encrypts stored connection credentials. Generate once; rotating it invalidates stored secrets. |
| DATABASE_URL | postgresql+asyncpg://user:pass@host:5432/db | Gateway persistence. |
| SP_PUBLIC_GATEWAY_URL | https://gateway.example.com | Gateway URL injected into pods. The local default http://gateway:3300 is rejected in cloud mode. |
| SP_NOTEBOOK_UPSTREAM_MODE | pod_ip | Required for the Kubernetes orchestrator. |

See deploy/k8s/README.md section (d) for the full set of orchestrator-only variables (SP_PUBLIC_GATEWAY_PORT, SP_GATEWAY_NAMESPACE, SP_GATEWAY_POD_SELECTOR, SP_GATEWAY_SERVICE_ACCOUNT, SP_NOTEBOOK_NAMESPACE_PREFIX, SP_NOTEBOOK_EGRESS_CIDR, SP_SESSION_JWT_TTL_SECONDS).
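
Before starting the gateway, it can help to check an environment against the kill-switch rules above. The following is a minimal sketch under stated assumptions, not the gateway's actual assert_cloud_hardening_intact() implementation: the function names preflight_violations and origin_ok are hypothetical, case-insensitive matching of true/1/yes is an assumption, and an unset SP_PUBLIC_GATEWAY_URL is assumed to fall back to the local default. The image digest rule is omitted here.

```python
from urllib.parse import urlsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}
TRUTHY = {"true", "1", "yes"}  # assumption: compared case-insensitively
LOCAL_DEFAULT_GATEWAY_URL = "http://gateway:3300"


def origin_ok(origin: str) -> bool:
    """Allow https:// anywhere, plain http:// only on loopback, never a wildcard."""
    if "*" in origin:
        return False
    parts = urlsplit(origin)
    if parts.scheme == "https":
        return bool(parts.hostname)
    return parts.scheme == "http" and parts.hostname in LOOPBACK_HOSTS


def preflight_violations(env: dict) -> list:
    """Return the names (never the values) of variables that break a rule."""
    bad = []
    if env.get("SP_DEPLOYMENT_MODE") != "cloud":
        return bad  # the rules below apply to cloud mode only
    if not env.get("SP_NOTEBOOK_RUNTIME_CLASS", "").strip():
        bad.append("SP_NOTEBOOK_RUNTIME_CLASS")
    if env.get("SP_NOTEBOOK_DIRECT_URL", "").strip():
        bad.append("SP_NOTEBOOK_DIRECT_URL")
    if env.get("SP_DISABLE_SANDBOX", "").strip().lower() in TRUTHY:
        bad.append("SP_DISABLE_SANDBOX")
    origins = [o.strip() for o in env.get("SP_ALLOWED_ORIGINS", "").split(",") if o.strip()]
    if not origins or not all(origin_ok(o) for o in origins):
        bad.append("SP_ALLOWED_ORIGINS")
    if env.get("SP_PUBLIC_GATEWAY_URL", LOCAL_DEFAULT_GATEWAY_URL) == LOCAL_DEFAULT_GATEWAY_URL:
        bad.append("SP_PUBLIC_GATEWAY_URL")
    return bad
```

Calling preflight_violations(dict(os.environ)) in a deploy script (after importing os) would report variable names only, which mirrors the property of the gateway's own error message.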

Image digest requirement

In cloud mode SP_NOTEBOOK_IMAGE must match @sha256:<64 lowercase hex>. This guarantees pod image immutability: a floating tag can be re-pushed to point at different content without your knowledge, while a digest always names the same image. Resolve the digest after pushing:

resolve image digest
$ crane digest your-registry/notebook:VERSION
# or
$ docker buildx imagetools inspect your-registry/notebook:VERSION

Then set the pinned reference:

pinned notebook image
SP_NOTEBOOK_IMAGE=your-registry/notebook@sha256:<64-hex>
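
The digest rule can be checked in a deploy script before the value ever reaches the gateway. This is a small sketch, assuming the reference has the form name@sha256: followed by exactly 64 lowercase hex characters; the helper name is_digest_pinned is hypothetical and the gateway's own pattern may differ in detail.

```python
import re

# Assumption: a repository name with no "@" or whitespace, then the digest.
DIGEST_REF = re.compile(r"[^@\s]+@sha256:[0-9a-f]{64}")


def is_digest_pinned(image_ref: str) -> bool:
    """True only for references pinned to a sha256 digest."""
    return DIGEST_REF.fullmatch(image_ref) is not None
```

A floating reference such as your-registry/notebook:latest fails this check, as does a digest with uppercase hex or the wrong length.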

Generate the Fernet encryption key

generate fernet key
$ python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

Store the output as SP_ENCRYPTION_KEY. Treat it as a long-lived secret — losing it makes stored connection credentials undecryptable.
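
A Fernet key is 32 random bytes encoded as URL-safe base64, so a truncated or mangled value can be caught before deployment. The sketch below uses only the standard library and checks the shape of the key, not whether it decrypts existing data; the helper name looks_like_fernet_key is hypothetical. Constructing Fernet(key) from the cryptography package performs an equivalent check and raises ValueError on a malformed key.

```python
import base64
import binascii


def looks_like_fernet_key(value: str) -> bool:
    """True if value decodes from URL-safe base64 to exactly 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(value.encode("ascii"))
    except (binascii.Error, ValueError):
        # Bad padding, bad length, or non-ASCII input.
        return False
    return len(raw) == 32
```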

4. Clerk authentication

Cloud mode authenticates users with Clerk JWTs. From the Clerk dashboard (your application → API Keys), collect:

| Variable | Source |
|---|---|
| CLERK_PUBLISHABLE_KEY | Clerk dashboard → API Keys (publishable key). |
| CLERK_SECRET_KEY | Clerk dashboard → API Keys (secret key). |
| CLERK_JWT_AUDIENCE | Optional; only set it if your tokens carry an aud (see note). |
| SP_EXPECTED_AZP | Your app origin(s), e.g. https://app.yourdomain.com. Recommended. |

What the gateway enforces (current code):

  • CLERK_JWT_AUDIENCE is optional. Clerk's default session tokens do not carry an aud claim, so the gateway boots without this set and skips audience verification. Leave it unset unless you have configured a Clerk JWT template (or session-token claims) that emits an aud — only then set it, and gateway/auth/user.py will validate aud against it at request time.
  • SP_EXPECTED_AZP is the authorized-party check and the recommended binding for Clerk session tokens (which carry azp, not aud). When set, the gateway rejects any token whose azp claim is not in this comma-separated allow-list. Set it to your front-end origin(s) (e.g. https://app.yourdomain.com) to bind tokens to your app.
  • SP_JWT_LEEWAY (default 30) is the allowed clock-skew in seconds for exp/iat.
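
The three checks above can be illustrated on an already-decoded claims dictionary. This is a sketch of the described behavior only, not the code in gateway/auth/user.py: it performs no signature verification (which a real deployment must do first), and the names parse_allow_list and check_claims are hypothetical.

```python
import time


def parse_allow_list(raw: str) -> set:
    """Split a comma-separated value such as SP_EXPECTED_AZP into a set."""
    return {item.strip() for item in raw.split(",") if item.strip()}


def check_claims(claims, expected_azp, audience=None, leeway=30, now=None):
    """Return the names of claims that fail; an empty list means accepted.

    claims       -- decoded JWT payload (signature assumed verified elsewhere)
    expected_azp -- allow-list of origins; an empty set skips the azp check
    audience     -- CLERK_JWT_AUDIENCE, or None to skip the aud check
    leeway       -- allowed clock skew in seconds for exp/iat
    """
    now = time.time() if now is None else now
    problems = []
    if expected_azp and claims.get("azp") not in expected_azp:
        problems.append("azp")
    if audience is not None:
        aud = claims.get("aud")
        aud_values = [aud] if isinstance(aud, str) else list(aud or [])
        if audience not in aud_values:
            problems.append("aud")
    if "exp" in claims and now > claims["exp"] + leeway:
        problems.append("exp")  # expired beyond the allowed skew
    if "iat" in claims and claims["iat"] > now + leeway:
        problems.append("iat")  # issued in the future beyond the allowed skew
    return problems
```

With audience left as None, a default Clerk session token (azp but no aud) passes as long as its azp is in the allow-list, which matches the recommendation to leave CLERK_JWT_AUDIENCE unset.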

5. Deploy the gateway

Build and push the gateway image, then run it with the cloud-mode environment. The gateway listens on port 3300; terminate TLS in front of it.

build and push the gateway image
$ docker build -f Dockerfile.gateway -t your-registry/gateway:VERSION signalpilot/gateway/
$ docker push your-registry/gateway:VERSION

run the gateway (cloud mode)
$ docker run -d --name signalpilot-gateway -p 127.0.0.1:3300:3300 \
  -e SP_DEPLOYMENT_MODE=cloud \
  -e DATABASE_URL='postgresql+asyncpg://user:pass@db.example.com:5432/signalpilot' \
  -e SP_ENCRYPTION_KEY='<fernet-key>' \
  -e SP_ALLOWED_ORIGINS='https://app.yourdomain.com' \
  -e SP_PUBLIC_GATEWAY_URL='https://gateway.example.com' \
  -e SP_NOTEBOOK_UPSTREAM_MODE=pod_ip \
  -e SP_NOTEBOOK_RUNTIME_CLASS=gvisor \
  -e SP_NOTEBOOK_IMAGE='your-registry/notebook@sha256:<64-hex>' \
  -e SP_NOTEBOOK_NETWORK_POLICY=true \
  -e CLERK_PUBLISHABLE_KEY='<pk>' \
  -e CLERK_SECRET_KEY='<sk>' \
  -e SP_EXPECTED_AZP='https://app.yourdomain.com' \
  your-registry/gateway:VERSION

Bind the container port to loopback (127.0.0.1:3300) and put HTTPS in front of it. Example Caddy config:

caddy reverse proxy (tls)
gateway.example.com {
    reverse_proxy 127.0.0.1:3300
}

SP_ALLOWED_ORIGINS and SP_PUBLIC_GATEWAY_URL must use the public HTTPS origin, not the internal container address. The gateway must run as the signalpilot-gateway ServiceAccount (see RBAC in section 6) so it can manage org namespaces and notebook pods.

6. Deploy notebook orchestration

The notebook side runs on Kubernetes. Follow deploy/k8s/README.md in order — the steps below are the gateway-relevant summary.

  1. Install a NetworkPolicy-enforcing CNI (section a). Verify enforcement with the cross-namespace nc test in section (f).
  2. Apply the gateway RBAC so it can create per-org namespaces and pods:
apply gateway rbac
$ kubectl create namespace signalpilot --dry-run=client -o yaml | kubectl apply -f -
$ kubectl apply -f deploy/k8s/gateway-rbac.yaml
  3. Install the sandbox RuntimeClass on every node and apply the resource. gVisor is the recommended runtime; the gateway will not start in cloud mode without SP_NOTEBOOK_RUNTIME_CLASS set, and the require-gvisor admission policy rejects any notebook pod that is not sandboxed:
gvisor runtimeclass
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
  4. Build, push, and digest-pin the notebook image, then set SP_NOTEBOOK_IMAGE to the @sha256:… reference (see section 3). Floating tags are rejected at boot.
  5. Apply the admission policies (defense-in-depth). Use the ValidatingAdmissionPolicy variants on vanilla k8s 1.30+, or the Kyverno variants:
apply admission policies
$ kubectl apply -f deploy/k8s/admission/require-gvisor-validatingadmissionpolicy.yaml
$ kubectl apply -f deploy/k8s/admission/restrict-pod-exec-validatingadmissionpolicy.yaml
  6. Enforce node-level hardening: IMDSv2 HttpTokens=required with HttpPutResponseHopLimit=1, and podPidsLimit on every node. See deploy/k8s/README.md sections (e) and (h).

Workspaces are git-backed; pod-local state is ephemeral, so no PersistentVolume or shared filesystem is required.

7. Secrets management

  • Never bake secrets into images. SP_ENCRYPTION_KEY, CLERK_SECRET_KEY, DATABASE_URL, and stored connection credentials must be injected at runtime.
  • Use a secret store — Kubernetes Secrets (with encryption at rest / a KMS provider), AWS SSM Parameter Store, HashiCorp Vault, or your platform's equivalent — and mount them as environment variables.
  • The gateway never logs secret values; hardening violations report variable names only. Keep that property: do not echo env into logs.
  • Rotating SP_ENCRYPTION_KEY invalidates all stored connection credentials. Plan rotation as a re-entry of connections, not a hot swap.
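
The "names only, never values" property is easy to preserve in any wrapper or entrypoint script you write around the gateway. The following is a minimal sketch, assuming secrets arrive as environment variables; the names REQUIRED_SECRETS, MissingSecrets, load_secrets, and redacted are hypothetical and not part of SignalPilot.

```python
REQUIRED_SECRETS = ("SP_ENCRYPTION_KEY", "CLERK_SECRET_KEY", "DATABASE_URL")


class MissingSecrets(RuntimeError):
    """Raised with variable names only, so the message is safe to log."""


def load_secrets(env) -> dict:
    """Read required secrets from env; report missing names, never values."""
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        raise MissingSecrets("missing required secrets: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SECRETS}


def redacted(secrets: dict) -> dict:
    """A copy that is safe to print: every value is replaced by a marker."""
    return {name: "<redacted>" for name in secrets}
```

Logging redacted(load_secrets(os.environ)) (after importing os) confirms which secrets were found without echoing any of them.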

8. Verification

  1. Health check — confirm the gateway is up behind TLS:
health check
$ curl -sf https://gateway.example.com/health
  2. Create a connection: authenticate with a Clerk-issued token and add a database (see Connect a Database for per-dialect payloads):
create a connection
$ curl -X POST https://gateway.example.com/api/connections \
  -H "Authorization: Bearer <clerk-jwt>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "prod-postgres",
    "db_type": "postgresql",
    "host": "db.example.com",
    "port": 5432,
    "database": "analytics",
    "username": "readonly",
    "password": "..."
  }'
  3. Run a query: point an MCP client at the gateway and call query_database, or verify reachability of all connections with connection_health:
connect an mcp client
$ claude mcp add --transport http signalpilot https://gateway.example.com/mcp \
--header "Authorization: Bearer <clerk-jwt>"

If the gateway fails to start, check the logs for Cloud mode hardening violations: […] — the listed variable names map directly to the rules in section 3.
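
For scripted verification, the connection-creation request from step 2 can also be built in Python with the standard library. This sketch only constructs the request, using the same /api/connections path, headers, and payload fields as the curl example; the helper name build_connection_request is hypothetical, and the response format is not assumed.

```python
import json
import urllib.request


def build_connection_request(gateway_url: str, token: str, connection: dict):
    """Build (but do not send) the POST request that creates a connection."""
    return urllib.request.Request(
        url=gateway_url.rstrip("/") + "/api/connections",
        data=json.dumps(connection).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_connection_request(
    "https://gateway.example.com",
    "<clerk-jwt>",
    {
        "name": "prod-postgres",
        "db_type": "postgresql",
        "host": "db.example.com",
        "port": 5432,
        "database": "analytics",
        "username": "readonly",
        "password": "...",
    },
)
# To send it: urllib.request.urlopen(request, timeout=10)
```

Keeping construction separate from sending makes it possible to inspect the URL and headers in a dry run before any credentials leave the machine.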