Architecture & components

Part of the self-contained SRE guide

This chapter restates the identity conventions (namespace-per-project, environments-as-labels, the HTTPRoute name, the starform.io/* label set) inline so the guide stands alone. The conventions are owned by PRD §20.1 / §20.2 / §24.1, and the PRD wins on conflict. The Reference collects every inlined contract with its provenance.

In plain words

Two kinds of footprint. Customer clusters run the apps plus seven lightweight components that collect and route signal. The control-plane region (one at MVP) holds Starbase + the dashboard; a per-region telemetry tier — ClickHouse + VictoriaMetrics on DO VM droplets, plus the Vector aggregator — holds the stores. Collectors push out; nothing reaches into customer clusters to pull. Platform self-monitoring runs in Grafana Cloud (external), via a dedicated Grafana Alloy agent in each cluster.

Identity & naming. The pipelines only make sense once you know the three conventions that decide which tenant a metric or log line belongs to: one namespace per project, environments as labels, and identity encoded in the route name and the resource labels. Diagrams 2a, 2b, and 2c take them one at a time. Until Starbase + Shuttle ship, you apply the labels and route names by hand (see chapter 9).

flowchart LR
  classDef third fill:transparent,stroke:#808080,color:#808080;
  classDef ok fill:#0EAB5D1f,stroke:#0EAB5D,color:#0EAB5D;
  classDef warn fill:#ED8C2B1f,stroke:#ED8C2B,color:#ED8C2B;

  subgraph NS["namespace: proj-<project_slug> — one per project, holds every environment"]
    direction LR
    A["service: web<br/>env=production<br/>is_protected"]:::ok
    B["service: web<br/>env=staging"]:::third
    C["service: web<br/>env=preview-pr-42<br/>is_ephemeral"]:::warn
  end

  style NS stroke:#5B5EE8,stroke-width:1px,stroke-dasharray:6 4
Diagram 2a — Namespaces & environments. One namespace per project; the environments inside it are label values, not separate namespaces. Same service, one namespace — only the env= label differs.

A project gets exactly one Kubernetes namespace, proj-<project_slug>. Its environments (production, staging, a preview like preview-pr-42) are not separate namespaces; they are label values on the resources inside it. Environment names are customer-chosen and validated as an RFC 1123 label (≤30 chars); there is no fixed dev/staging/prod enum, and previews are flagged structurally with is_ephemeral rather than matched by name. The consequence drives everything below: the namespace alone can't tell production from staging, so identity has to be encoded in the route name (2b) and the labels (2c).
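The naming rules above can be sketched in a few lines of Python. This is an illustration only, not Shuttle's or Starbase's actual validation code: the function names are hypothetical, and the sketch assumes the standard RFC 1123 label grammar (lowercase alphanumerics and hyphens, alphanumeric at both ends) with the 30-character cap stated above. The rules for project_slug itself are not restated in this chapter, so the slug is passed through unchecked.

```python
import re

# RFC 1123 label: lowercase alphanumerics and '-', alphanumeric at both ends.
_RFC1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")

# Length cap for environment names, as stated in this chapter.
MAX_ENV_NAME_LEN = 30


def is_valid_environment_name(name: str) -> bool:
    """Return True if `name` is an RFC 1123 label of at most 30 characters."""
    return (
        len(name) <= MAX_ENV_NAME_LEN
        and _RFC1123_LABEL.fullmatch(name) is not None
    )


def namespace_for(project_slug: str) -> str:
    """Build the single per-project namespace name (hypothetical helper).

    The slug is not validated here; its rules live in the PRD.
    """
    return f"proj-{project_slug}"
```

For example, is_valid_environment_name("preview-pr-42") is true, while "Production" (uppercase) and "-staging" (leading hyphen) are rejected. Note that nothing here inspects the name to decide whether an environment is a preview; per the text, that is carried by is_ephemeral.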

flowchart LR
  classDef proj fill:#3434DC22,stroke:#3434DC,color:#5B5EE8;
  classDef svc fill:#ED8C2B1f,stroke:#ED8C2B,color:#ED8C2B;
  classDef env fill:#0EAB5D1f,stroke:#0EAB5D,color:#0EAB5D;

  P["project_uuid → 32 hex<br/><br/>chars[0:32]<br/>= project_id"]:::proj
  S["service_uuid → 32 hex<br/><br/>chars[32:64]<br/>= service_id"]:::svc
  E["-environment<br/><br/>after the hyphen<br/>= environment"]:::env

  P --- S --- E
Diagram 2b — The HTTPRoute name. Why it exists: it is the only place a customer's identity rides on an Envoy L7 metric. The name is <project32><service32>-<environment>, parsed positionally with no lookup or join.

Envoy sees a request by its route, not its tenant, so Shuttle encodes the full identity into the HTTPRoute name: <project_uuid><service_uuid>-<environment>, each UUID hyphen-stripped to 32 hex characters. The fixed width lets the metrics pipeline parse it positionally (chars[0:32] = project, chars[32:64] = service, the segment after the hyphen = environment) with no lookup or join. Hyphens inside the environment name are fine, since the first 64 characters are fixed-width hex and the separator is always the character at index 64. This is the only customer identity on per-route Envoy metrics (§20.2), and it is coupled to Envoy Gateway's cluster-name format, so re-verify the parse on every EG upgrade.
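The positional parse can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's real implementation: the names RouteIdentity, build_route_name, and parse_route_name are hypothetical, and the sketch assumes the hex is lowercase (which is what Python's uuid.UUID.hex produces). It operates on the bare route name only; extracting that name from Envoy Gateway's cluster-name format is out of scope here.

```python
import re
import uuid
from typing import NamedTuple


class RouteIdentity(NamedTuple):
    project_id: str   # 32 hex chars, hyphen-stripped project UUID
    service_id: str   # 32 hex chars, hyphen-stripped service UUID
    environment: str  # everything after the separator at index 64


# Assumption: IDs are lowercase hex.
_HEX32 = re.compile(r"[0-9a-f]{32}")


def build_route_name(project: uuid.UUID, service: uuid.UUID, environment: str) -> str:
    """Encode identity as <project32><service32>-<environment>."""
    return f"{project.hex}{service.hex}-{environment}"


def parse_route_name(name: str) -> RouteIdentity:
    """Parse a route name positionally, with no lookup or join."""
    project_id, service_id = name[0:32], name[32:64]
    if not (_HEX32.fullmatch(project_id) and _HEX32.fullmatch(service_id)):
        raise ValueError(f"first 64 chars are not two 32-char hex IDs: {name!r}")
    # The separator is always at index 64, so hyphens later in the
    # environment name cannot confuse the parse.
    if name[64:65] != "-" or len(name) < 66:
        raise ValueError(f"missing '-<environment>' suffix: {name!r}")
    return RouteIdentity(project_id, service_id, name[65:])
```

Because the split is by index rather than by searching for a hyphen, an environment such as preview-pr-42 round-trips intact. Failing loudly on a malformed name is a deliberate choice in this sketch: a silent mis-parse would attribute metrics to the wrong tenant.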

flowchart LR
  classDef built fill:#3434DC22,stroke:#3434DC,color:#5B5EE8;
  classDef third fill:transparent,stroke:#808080,color:#808080;

  L["starform.io/* · stamped on every resource<br/><br/>managed-by=shuttle · Informer filter / GC<br/>project-id · environment · service-id  (the tenant key)<br/>workspace-id · billing boundary (not in the key)<br/>service-name · cluster-id · tier · var-group-id"]:::built
  R["read by<br/><br/>Fluent Bit → log enrichment<br/>vmagent + KSM → metric attribution<br/>NetworkPolicy → environment isolation<br/>Stardeck → dashboard display<br/>billing cron → workspace aggregation"]:::third

  L --> R
Diagram 2c — The labels, and who reads them. The same identity, carried as labels everywhere that isn't an Envoy metric.

Everywhere other than the Envoy route name, identity travels as starform.io/* labels that Shuttle stamps on every resource it creates. The tenant key is project-id + environment + service-id; workspace-id rides along as the billing boundary, not part of the key. managed-by=shuttle is the primary Informer filter and garbage-collection key. Each consumer reads only what it needs: Fluent Bit enriches logs, vmagent and kube-state-metrics attribute metrics, NetworkPolicies enforce environment isolation, Stardeck renders the dashboard, and the billing cron aggregates by workspace. On a resource, that's a plain metadata.labels block:

metadata.labels · any resource
# Shuttle stamps these on every resource it creates — Deployment, Service, ConfigMap, Secret…
metadata:
  labels:
    starform.io/managed-by:   shuttle
    starform.io/workspace-id: ws-acme
    starform.io/project-id:   1f2e3d4c5b6a79880011223344556677
    starform.io/environment:  production
    starform.io/service-id:   0a9b8c7d6e5f40312233445566778899
    starform.io/service-name: web
    starform.io/cluster-id:   nyc1-prod-01
    starform.io/tier:         mininova
    # starform.io/var-group-id: vg-…  (on Secrets only)
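As a sketch of how a consumer might read these labels, the snippet below derives the tenant key (project-id + environment + service-id) from a resource's metadata.labels mapping and checks the managed-by filter. The helper names are hypothetical and this is not code from Fluent Bit, vmagent, or Shuttle; it only illustrates the contract described above, using the label values from the example block.

```python
from typing import Mapping, NamedTuple

LABEL_PREFIX = "starform.io/"

# The three labels that form the tenant key. workspace-id is deliberately
# excluded: it is the billing boundary, not part of the key.
_TENANT_KEY_LABELS = ("project-id", "environment", "service-id")


class TenantKey(NamedTuple):
    project_id: str
    environment: str
    service_id: str


def is_shuttle_managed(labels: Mapping[str, str]) -> bool:
    """True if the resource carries starform.io/managed-by=shuttle."""
    return labels.get(LABEL_PREFIX + "managed-by") == "shuttle"


def tenant_key(labels: Mapping[str, str]) -> TenantKey:
    """Build the tenant key from labels, raising KeyError if any part is absent."""
    missing = [k for k in _TENANT_KEY_LABELS if LABEL_PREFIX + k not in labels]
    if missing:
        raise KeyError(f"missing tenant-key labels: {missing}")
    return TenantKey(*(labels[LABEL_PREFIX + k] for k in _TENANT_KEY_LABELS))
```

Two resources in different workspaces would still be distinguished by project-id, so leaving workspace-id out of the key loses nothing for attribution; it matters only when the billing cron aggregates.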

Shuttle's role — ships no telemetry

Shuttle ships no telemetry, and contains zero shipping code. Its entire observability responsibility is the identity above: stamp the starform.io/* label set (§24) on every resource, and encode the HTTPRoute name (§20.2). Everything else — collection, enrichment, batching, and transport — is done autonomously by Fluent Bit (logs), the Vector aggregator, vmagent (customer metrics), and Grafana Alloy (platform metrics). "Autonomous" describes Shuttle's responsibility, not the difficulty of the transport layer (which earlier PRD versions understated — see Transport).

How to build it

Provision the seven customer-cluster components at bootstrap via Helm (order in chapter 8). The control plane is a normal deployment of Starbase + Stardeck in one region; the telemetry stores are per-region DO VM droplets (ClickHouse + VictoriaMetrics, separate droplets) provisioned out-of-band via Terraform/cloud-init — not in Kubernetes. A single customer VictoriaMetrics holds the metrics; platform self-monitoring lives in Grafana Cloud (external).

Gotchas & what lives elsewhere

  • metrics-server is not bundled on DOKS. Install it explicitly; without it the Metrics API is unavailable, so resource-based HPAs never scale and kubectl top fails.
  • The cluster runs no telemetry stores. ClickHouse, VictoriaMetrics, and the Vector aggregator are per-region VM droplets (chapter 8), not in Kubernetes — a cluster can be rebuilt without touching telemetry data.

PRD reference & inlined contracts

Owned by §2 (topology), §4.1–4.2 (component inventory), §20.1 (namespace = proj-<project_slug>), §20.2 (HTTPRoute name + parse), §24.1 (labels & tenant key). The namespace rule, route-name parse, and label set are restated above so this guide stands alone — if they ever diverge, the PRD pages win. Canonical map: Canonical Sources.