Architecture & components
Part of the self-contained SRE guide
This chapter restates the identity conventions (namespace-per-project, environments-as-labels, the
HTTPRoute name, the starform.io/* label set) inline so the guide stands alone. Owned by PRD
§20.1 / §20.2 / §24.1 — the PRD wins on conflict. The Reference collects every
inlined contract with its provenance.
In plain words
Two kinds of footprint. Customer clusters run the apps plus seven lightweight components that collect and route signal. The control-plane region (one at MVP) holds Starbase + the dashboard; a per-region telemetry tier — ClickHouse + VictoriaMetrics on DO VM droplets, plus the Vector aggregator — holds the stores. Collectors push out; nothing reaches into customer clusters to pull. Platform self-monitoring runs in Grafana Cloud (external), via a dedicated Grafana Alloy agent in each cluster.
Identity & naming. Before the pipelines make sense, three conventions decide how a metric or log line knows which tenant it belongs to: one namespace per project, environments as labels, and identity encoded in the route name and the resource labels. The sections below take them one at a time. Until Starbase + Shuttle ship, you apply the labels and route names by hand (see chapter 9).
flowchart LR
classDef third fill:transparent,stroke:#808080,color:#808080;
classDef ok fill:#0EAB5D1f,stroke:#0EAB5D,color:#0EAB5D;
classDef warn fill:#ED8C2B1f,stroke:#ED8C2B,color:#ED8C2B;
subgraph NS["namespace: proj-<project_slug> — one per project, holds every environment"]
direction LR
A["service: web<br/>env=production<br/>is_protected"]:::ok
B["service: web<br/>env=staging"]:::third
C["service: web<br/>env=preview-pr-42<br/>is_ephemeral"]:::warn
end
style NS stroke:#5B5EE8,stroke-width:1px,stroke-dasharray:6 4
Figure: one namespace holds every environment; only the env= label differs.
A project gets exactly one Kubernetes namespace, proj-<project_slug>. Its environments
(production, staging, a preview like preview-pr-42) are not separate namespaces; they are
label values on the resources inside it. Names are customer-chosen, validated as an RFC 1123 label
(≤30 chars) with no fixed dev/staging/prod enum, and previews are flagged structurally with
is_ephemeral rather than matched by name. The consequence drives everything below: the namespace
alone can't tell production from staging, so identity has to be encoded in the route name (2b) and
the labels (2c).
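The naming rule above can be sketched as a small validator. This is a minimal illustration, not Starbase's or Shuttle's actual code: the function names `is_valid_environment_name` and `namespace_for` are hypothetical, and the pattern is the standard RFC 1123 label form (lowercase alphanumerics and hyphens, starting and ending with an alphanumeric) combined with the 30-character cap stated here.

```python
import re

# RFC 1123 label: lowercase alphanumerics and '-', must start and end alphanumeric.
_RFC1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")

# Length cap stated in this chapter (stricter than the generic 63-char label limit).
MAX_ENV_NAME_LEN = 30


def is_valid_environment_name(name: str) -> bool:
    """Return True if `name` is an RFC 1123 label of at most 30 characters.

    Hypothetical helper: there is deliberately no dev/staging/prod enum check,
    because environment names are customer-chosen.
    """
    return len(name) <= MAX_ENV_NAME_LEN and _RFC1123_LABEL.fullmatch(name) is not None


def namespace_for(project_slug: str) -> str:
    """One namespace per project; environments never appear in the namespace name."""
    return f"proj-{project_slug}"
```

Note that nothing here inspects the name to decide whether an environment is a preview; per the text above, that is carried by the is_ephemeral flag, not by a naming pattern.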
flowchart LR
classDef proj fill:#3434DC22,stroke:#3434DC,color:#5B5EE8;
classDef svc fill:#ED8C2B1f,stroke:#ED8C2B,color:#ED8C2B;
classDef env fill:#0EAB5D1f,stroke:#0EAB5D,color:#0EAB5D;
P["project_uuid → 32 hex<br/><br/>chars[0:32]<br/>= project_id"]:::proj
S["service_uuid → 32 hex<br/><br/>chars[32:64]<br/>= service_id"]:::svc
E["-environment<br/><br/>after the hyphen<br/>= environment"]:::env
P --- S --- E
Figure: <project32><service32>-<environment>, parsed positionally with no lookup or join.
Envoy sees a request by its route, not its tenant, so Shuttle encodes the full identity into the
HTTPRoute name: <project_uuid><service_uuid>-<environment>, each UUID hyphen-stripped to 32 hex
characters. The fixed width lets the metrics pipeline parse it positionally — chars[0:32] =
project, [32:64] = service, the segment after the hyphen = environment — with no lookup or join.
Hyphens inside the environment name are fine, since the first 64 characters are fixed-width hex. This
is the only customer identity on per-route Envoy metrics (§20.2), and it is coupled to Envoy
Gateway's cluster-name format — re-verify the parse on every EG upgrade.
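The positional parse described above can be sketched in a few lines. This is an illustration under stated assumptions, not the metrics pipeline's real implementation: `build_route_name`, `parse_route_name`, and `RouteIdentity` are hypothetical names, and the sketch starts from the bare HTTPRoute name. Extracting that name from Envoy Gateway's cluster-name string is version-specific and is not shown.

```python
import string
import uuid
from typing import NamedTuple

_HEX = set(string.hexdigits)


class RouteIdentity(NamedTuple):
    project_id: str
    service_id: str
    environment: str


def _is_hex32(s: str) -> bool:
    return len(s) == 32 and all(c in _HEX for c in s)


def build_route_name(project: uuid.UUID, service: uuid.UUID, environment: str) -> str:
    # UUID.hex is the 32-character form with hyphens stripped.
    return f"{project.hex}{service.hex}-{environment}"


def parse_route_name(name: str) -> RouteIdentity:
    # Fixed-width slices: [0:32] project, [32:64] service, then '-' and the environment.
    project_id, service_id, rest = name[:32], name[32:64], name[64:]
    if not (_is_hex32(project_id) and _is_hex32(service_id)
            and rest.startswith("-") and len(rest) > 1):
        raise ValueError(f"not a <project32><service32>-<environment> name: {name!r}")
    # Everything after the first separator is the environment, hyphens included.
    return RouteIdentity(project_id, service_id, rest[1:])
```

Because the split is by position rather than by searching for a hyphen, an environment such as preview-pr-42 round-trips intact. Raising on a malformed name, rather than returning a partial identity, is a design choice of this sketch; it makes a changed name format fail loudly instead of misattributing metrics.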
flowchart LR
classDef built fill:#3434DC22,stroke:#3434DC,color:#5B5EE8;
classDef third fill:transparent,stroke:#808080,color:#808080;
L["starform.io/* · stamped on every resource<br/><br/>managed-by=shuttle · Informer filter / GC<br/>project-id · environment · service-id (the tenant key)<br/>workspace-id · billing boundary (not in the key)<br/>service-name · cluster-id · tier · var-group-id"]:::built
R["read by<br/><br/>Fluent Bit → log enrichment<br/>vmagent + KSM → metric attribution<br/>NetworkPolicy → environment isolation<br/>Stardeck → dashboard display<br/>billing cron → workspace aggregation"]:::third
L --> R
Everywhere other than the Envoy route name, identity travels as starform.io/* labels that Shuttle
stamps on every resource it creates. The tenant key is project-id + environment + service-id;
workspace-id rides along as the billing boundary, not part of the key. managed-by=shuttle is the
primary Informer filter and garbage-collection key. Each consumer reads only what it needs: Fluent
Bit enriches logs, vmagent and kube-state-metrics attribute metrics, NetworkPolicies enforce
environment isolation, Stardeck renders the dashboard, and the billing cron aggregates by workspace.
On a resource, that's a plain metadata.labels block:
# Shuttle stamps these on every resource it creates: Deployment, Service, ConfigMap, Secret…
metadata:
  labels:
    starform.io/managed-by: shuttle
    starform.io/workspace-id: ws-acme
    starform.io/project-id: 1f2e3d4c5b6a79880011223344556677
    starform.io/environment: production
    starform.io/service-id: 0a9b8c7d6e5f40312233445566778899
    starform.io/service-name: web
    starform.io/cluster-id: nyc1-prod-01
    starform.io/tier: mininova
    # starform.io/var-group-id: vg-… (on Secrets only)
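A consumer that reads these labels might derive the tenant key along the following lines. This is a hedged sketch: `tenant_key_from_labels`, `is_shuttle_managed`, and `TenantKey` are hypothetical names introduced for illustration, and the only facts taken from this chapter are the label names and the composition of the key (project-id + environment + service-id, with workspace-id excluded).

```python
from typing import Mapping, NamedTuple

LABEL_PREFIX = "starform.io/"


class TenantKey(NamedTuple):
    project_id: str
    environment: str
    service_id: str


def is_shuttle_managed(labels: Mapping[str, str]) -> bool:
    # managed-by=shuttle is the primary filter; other resources are ignored.
    return labels.get(LABEL_PREFIX + "managed-by") == "shuttle"


def tenant_key_from_labels(labels: Mapping[str, str]) -> TenantKey:
    """Build the tenant key from a resource's metadata.labels mapping.

    workspace-id is intentionally not read here: it is the billing
    boundary, not part of the key.
    """
    try:
        return TenantKey(
            project_id=labels[LABEL_PREFIX + "project-id"],
            environment=labels[LABEL_PREFIX + "environment"],
            service_id=labels[LABEL_PREFIX + "service-id"],
        )
    except KeyError as missing:
        raise ValueError(f"resource is missing label {missing}") from None
```

Two resources that differ only in workspace-id, service-name, or tier therefore map to the same key in this sketch, which matches the rule that those labels ride along without identifying the tenant.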
Shuttle's role: ships no telemetry
Shuttle ships no telemetry, and contains zero shipping code. Its entire observability
responsibility is the identity above: stamp the starform.io/* label set
(§24) on every resource, and encode the HTTPRoute name
(§20.2). Everything else — collection,
enrichment, batching, and transport — is done autonomously by Fluent Bit (logs), the Vector
aggregator, vmagent (customer metrics), and Grafana Alloy (platform metrics). "Autonomous"
describes Shuttle's responsibility, not the difficulty of the transport layer (which earlier PRD
versions understated — see Transport).
How to build it
Provision the seven customer-cluster components at bootstrap via Helm (order in chapter 8). The control plane is a normal deployment of Starbase + Stardeck in one region; the telemetry stores are per-region DO VM droplets (ClickHouse + VictoriaMetrics, separate droplets) provisioned out-of-band via Terraform/cloud-init — not in Kubernetes. A single customer VictoriaMetrics holds the metrics; platform self-monitoring lives in Grafana Cloud (external).
Gotchas & what lives elsewhere
- metrics-server is not bundled on DOKS. Install it explicitly, or HPA and kubectl top silently do nothing.
- The cluster runs no telemetry stores. ClickHouse, VictoriaMetrics, and the Vector aggregator are per-region VM droplets (chapter 8), not in Kubernetes, so a cluster can be rebuilt without touching telemetry data.
PRD reference & inlined contracts
Owned by §2 (topology), §4.1–4.2 (component inventory), §20.1 (namespace =
proj-<project_slug>), §20.2 (HTTPRoute name + parse),
§24.1 (labels & tenant key). The namespace rule,
route-name parse, and label set are restated above so this guide stands alone — if they ever
diverge, the PRD pages win. Canonical map: Canonical Sources.