Skip to content

Bootstrap & install order

Part of the self-contained SRE guide

This chapter restates the in-cluster install order and the not-bundled-on-DOKS dependency (metrics-server + kube-state-metrics) inline so the guide stands alone. Owned by PRD §26.3 / FR-068 — the PRD wins on conflict. The Reference collects every inlined contract with its provenance.

In plain words

Order matters: the things metrics depend on must exist before the collectors, and the collectors before the agent that labels everything.

flowchart LR
  classDef built fill:#3434DC22,stroke:#3434DC,color:#5B5EE8;
  classDef third fill:transparent,stroke:#808080,color:#808080;
  classDef ok fill:#0EAB5D1f,stroke:#0EAB5D,color:#0EAB5D;
  classDef warn fill:#ED8C2B1f,stroke:#ED8C2B,color:#ED8C2B;

  S1["1 · DOKS<br/>create cluster"]:::third
  S2["2 · Envoy GW<br/>via Helm"]:::third
  S3["3 · metrics-server<br/>+ kube-state-metrics"]:::warn
  S4["4 · Fluent Bit<br/>+ vmagent + Alloy"]:::third
  S5["5 · Shuttle"]:::built
  S6["6 · register"]:::ok

  S1 --> S2 --> S3 --> S4 --> S5 --> S6
Diagram 5 — Bootstrap. Step 3 is the one teams miss: metrics-server and kube-state-metrics aren't bundled on DOKS, and the metrics pipeline (HPA + the cAdvisor join) depends on both.

How to build it

Two independent installs.

Per cluster, Starbase Worker runs the in-cluster Helm sequence in order — the relabel configs, recording rules, and store endpoints are baked into the Helm values, so Shuttle itself carries no telemetry config; it's identical per cluster, parameterized only by cluster ID, region, and token:

Helm · in-cluster install order
helm install envoy-gateway      oci://docker.io/envoyproxy/gateway-helm -n envoy-gateway-system
helm install metrics-server     metrics-server/metrics-server
helm install kube-state-metrics prometheus-community/kube-state-metrics -f ksm.values.yaml   # metricLabelsAllowlist → exposes starform.io/* on kube_pod_labels (ch.2)
helm install fluent-bit         fluent/fluent-bit          -f fluent-bit.values.yaml
helm install vmagent            vm/victoria-metrics-agent  -f vmagent.values.yaml
helm install k8s-monitoring     grafana/k8s-monitoring     -f alloy.values.yaml   # Grafana Alloy → Grafana Cloud (platform)
kubectl apply -f shuttle/       # then register cluster active

Per region (out of band), the ClickHouse + VictoriaMetrics droplets and the Vector aggregator are provisioned once via Terraform + cloud-init — independent of the cluster lifecycle, so a cluster can be rebuilt without touching telemetry data; backups are DO volume snapshots:

Terraform · regional store droplets
resource "digitalocean_droplet" "clickhouse"      { size = "so-4vcpu-32gb"  user_data = file("cloud-init/clickhouse.yml") }
resource "digitalocean_droplet" "victoriametrics" { size = "c-4"            user_data = file("cloud-init/vmstorage.yml")  }
resource "digitalocean_droplet" "vector_agg"      { size = "c-2"            user_data = file("cloud-init/vector.yml")     }
# all in digitalocean_vpc.telemetry ; backups via DO volume snapshots

PRD reference & inlined contracts

Owned by §26.3 (bootstrap sequence), FR-068 (metrics-server), §39.3 #26 (full bootstrap workflow). The install order and the DOKS-not-bundled dependency are restated above so this guide stands alone — if they ever diverge, the PRD wins. Canonical map: Canonical Sources.