Bootstrap & install order
Part of the self-contained SRE guide
This chapter restates the in-cluster install order and the dependency that DOKS does not bundle (metrics-server + kube-state-metrics) inline so the guide stands alone. Owned by PRD §26.3 / FR-068; the PRD wins on conflict. The Reference collects every inlined contract with its provenance.
In plain words
Order matters: the things metrics depend on must exist before the collectors, and the collectors before the agent that labels everything.
flowchart LR
classDef built fill:#3434DC22,stroke:#3434DC,color:#5B5EE8;
classDef third fill:transparent,stroke:#808080,color:#808080;
classDef ok fill:#0EAB5D1f,stroke:#0EAB5D,color:#0EAB5D;
classDef warn fill:#ED8C2B1f,stroke:#ED8C2B,color:#ED8C2B;
S1["1 · DOKS<br/>create cluster"]:::third
S2["2 · Envoy GW<br/>via Helm"]:::third
S3["3 · metrics-server<br/>+ kube-state-metrics"]:::warn
S4["4 · Fluent Bit<br/>+ vmagent + Alloy"]:::third
S5["5 · Shuttle"]:::built
S6["6 · register"]:::ok
S1 --> S2 --> S3 --> S4 --> S5 --> S6
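The chain in the flowchart can also be written as "before after" dependency pairs and ordered mechanically. The following is a minimal sketch using the standard `tsort` utility; the step names are shorthand for the flowchart nodes and the function name `install_order` is hypothetical, not something defined by the PRD.

```shell
#!/bin/sh
# Sketch only: derive the install order from "before after" dependency pairs.
# Step names are shorthand for the flowchart nodes, not real resource names.
set -eu

install_order() {
  # Each line reads "X Y": X must be installed before Y.
  tsort <<'EOF'
doks envoy-gateway
envoy-gateway metrics-deps
metrics-deps collectors
collectors shuttle
shuttle register
EOF
}
```

Calling `install_order` prints one step per line, from the cluster itself down to registration. Because every step has exactly one predecessor, only one ordering is valid.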
How to build it
Two independent installs.
Per cluster, Starbase Worker runs the in-cluster Helm sequence in order. The relabel configs, recording rules, and store endpoints are baked into the Helm values, so Shuttle itself carries no telemetry config; it is identical per cluster, parameterized only by cluster ID, region, and token. The sequence:
helm install envoy-gateway oci://docker.io/envoyproxy/gateway-helm -n envoy-gateway-system --create-namespace
helm install metrics-server metrics-server/metrics-server
helm install kube-state-metrics prometheus-community/kube-state-metrics -f ksm.values.yaml # metricLabelsAllowlist → exposes starform.io/* on kube_pod_labels (ch.2)
helm install fluent-bit fluent/fluent-bit -f fluent-bit.values.yaml
helm install vmagent vm/victoria-metrics-agent -f vmagent.values.yaml
helm install k8s-monitoring grafana/k8s-monitoring -f alloy.values.yaml # Grafana Alloy → Grafana Cloud (platform)
kubectl apply -f shuttle/ # then register cluster active
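`helm install` returns once the release is submitted, not once its workloads are serving, so a scripted run may want an explicit gate between the dependencies and the collectors. The sketch below runs the sequence strictly in order and stops at the first failure. It is not how Starbase Worker is implemented. The `DRY_RUN` switch, the function names, the `kubectl wait` gate with its 120s timeout, and the `--create-namespace` flag (added on the assumption that the namespace does not exist yet) are all illustrative.

```shell
#!/bin/sh
# Sketch only: run the in-cluster installs strictly in order and stop at the first failure.
# DRY_RUN, the function names, and the 120s timeout are illustrative assumptions.
set -eu

# One command per line, in install order, mirroring the sequence above.
bootstrap_steps() {
  cat <<'EOF'
helm install envoy-gateway oci://docker.io/envoyproxy/gateway-helm -n envoy-gateway-system --create-namespace
helm install metrics-server metrics-server/metrics-server
helm install kube-state-metrics prometheus-community/kube-state-metrics -f ksm.values.yaml
kubectl wait --for=condition=Available apiservice/v1beta1.metrics.k8s.io --timeout=120s
helm install fluent-bit fluent/fluent-bit -f fluent-bit.values.yaml
helm install vmagent vm/victoria-metrics-agent -f vmagent.values.yaml
helm install k8s-monitoring grafana/k8s-monitoring -f alloy.values.yaml
kubectl apply -f shuttle/
EOF
}

# Print each step, then run it unless DRY_RUN=1; a failing step aborts the rest.
run_bootstrap() {
  bootstrap_steps | {
    n=0
    while IFS= read -r step; do
      n=$((n + 1))
      echo "[$n] $step"
      if [ "${DRY_RUN:-0}" != "1" ]; then
        # Unquoted on purpose: each step is a plain, space-separated command line.
        $step </dev/null || { echo "step $n failed; aborting" >&2; exit 1; }
      fi
    done
  }
}
```

Running `DRY_RUN=1 run_bootstrap` first prints the numbered steps without touching the cluster, which makes the order easy to review before a real run.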
Per region (out of band), the ClickHouse and VictoriaMetrics droplets and the Vector aggregator are provisioned once via Terraform + cloud-init. They are independent of the cluster lifecycle, so a cluster can be rebuilt without touching telemetry data. Backups are DO volume snapshots:
# Abridged: required arguments such as name and image are omitted.
resource "digitalocean_droplet" "clickhouse" {
  size      = "so-4vcpu-32gb"
  user_data = file("cloud-init/clickhouse.yml")
}
resource "digitalocean_droplet" "victoriametrics" {
  size      = "c-4"
  user_data = file("cloud-init/vmstorage.yml")
}
resource "digitalocean_droplet" "vector_agg" {
  size      = "c-2"
  user_data = file("cloud-init/vector.yml")
}
# all in digitalocean_vpc.telemetry ; backups via DO volume snapshots
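The guide does not say how the volume snapshots are taken. One possible shape, sketched here under assumptions, is a small wrapper around doctl's `compute volume snapshot` subcommand that names each snapshot after the store and the day. The volume IDs, the naming scheme, the `DRY_RUN` switch, and both function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: snapshot one telemetry volume under a dated name.
# Volume IDs and the naming scheme are hypothetical; DRY_RUN is illustrative.
set -eu

# Build a snapshot name such as "clickhouse-20240131" from a label and a date stamp.
snapshot_name() {
  printf '%s-%s\n' "$1" "$2"
}

# Snapshot one volume; with DRY_RUN=1 only print the doctl command.
snapshot_volume() {
  label=$1
  volume_id=$2
  day=$3
  name=$(snapshot_name "$label" "$day")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "doctl compute volume snapshot $volume_id --snapshot-name $name"
  else
    doctl compute volume snapshot "$volume_id" --snapshot-name "$name"
  fi
}
```

A scheduled job could call `snapshot_volume clickhouse "$VOLUME_ID" "$(date -u +%Y%m%d)"` once per store, with `VOLUME_ID` supplied from wherever the region's volume IDs are recorded.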
PRD reference & inlined contracts
Owned by §26.3 (bootstrap sequence), FR-068 (metrics-server), §39.3 #26 (full bootstrap workflow). The install order and the DOKS-not-bundled dependency are restated above so this guide stands alone — if they ever diverge, the PRD wins. Canonical map: Canonical Sources.