Logs pipeline

Part of the self-contained SRE guide

This chapter restates the starform.io/* label set (the enrichment keys) and the per-cluster auth model inline so the guide stands alone. It is owned by PRD §35.1, §35.4, and §24.1, and the PRD wins on conflict. The Reference collects every inlined contract with its provenance.

In plain words

Logs use a completely different mechanism from metrics: no scraping, no discovery. Kubernetes already writes every container's output to a file on the node. An agent on each node tails those files, stamps each line with the pod's identity, and forwards everything to one regional collector, the only thing allowed to write to ClickHouse.

flowchart LR
  classDef built fill:#3434DC22,stroke:#3434DC,color:#5B5EE8;
  classDef third fill:transparent,stroke:#808080,color:#808080;
  classDef store stroke-dasharray:4 3,stroke:#808080,color:#808080;

  POD["pod stdout<br/>/stderr"]:::third
  FILE["node file<br/>/var/log/containers"]:::third
  FB["Fluent Bit agent<br/>tail + enrich"]:::built
  VEC["Vector aggregator<br/>sole writer"]:::third
  CH[("ClickHouse<br/>partition by project_id")]:::store

  POD --> FILE --> FB --> VEC --> CH

  linkStyle 0,1,2,3 stroke:#ED8C2B,stroke-width:1.5px;

Diagram 4 — Logs. A new pod is a new file the agent picks up automatically; nothing queries the pod. Fluent Bit stamps project · environment · service from the pod labels. The aggregator is the single fan-in point and the only client ClickHouse trusts. Amber = the logs pipe.
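How does a tailed file turn into a pod's identity? The kubelet names each entry in /var/log/containers as <pod>_<namespace>_<container>-<container-id>.log, and Fluent Bit's kubernetes filter recovers the pod and namespace from that name (via the tag) before asking the Kubernetes API for the pod's labels. A minimal Python sketch of the name-parsing half, assuming that naming convention and a 64-character container ID; the regex is a simplified stand-in for Fluent Bit's built-in one, and the API lookup is not shown.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern for kubelet's /var/log/containers/<pod>_<namespace>_<container>-<id>.log names.
# Pod and namespace names cannot contain "_", so "_" is a safe separator for them.
LOG_NAME = re.compile(
    r"^(?P<pod>[a-z0-9][-a-z0-9.]*)_"
    r"(?P<namespace>[^_]+)_"
    r"(?P<container>.+)-"           # the ID is anchored at the end, so hyphenated names stay whole
    r"(?P<container_id>[a-z0-9]{64})\.log$"
)


class PodRef(NamedTuple):
    pod: str
    namespace: str
    container: str
    container_id: str


def pod_ref(path: str) -> Optional[PodRef]:
    """Return the pod identity encoded in a container log file name, or None if it doesn't match."""
    name = path.rsplit("/", 1)[-1]
    m = LOG_NAME.match(name)
    if m is None:
        return None
    return PodRef(m["pod"], m["namespace"], m["container"], m["container_id"])
```

Because a new pod only adds a new matching file, nothing has to be registered or discovered for its lines to be picked up.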

How to build it

Three pieces in a line — a store, a regional aggregator, and a thin per-node agent. In order:

1 · Stand up the store (per region). Provision the ClickHouse droplet and create logs.customer_logs — partitioned by project_id so retention can vary per tenant (chapter 7) — plus an ingest user (for Vector) and a read-only user (for control-plane reads, chapter 4):

ClickHouse SQL · schema

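CREATE DATABASE IF NOT EXISTS logs;

CREATE TABLE logs.customer_logs
(
  project_id   String,
  environment  LowCardinality(String),
  service_id   String,
  ts           DateTime64(3),
  level        LowCardinality(String),
  message      String
)
ENGINE = MergeTree
PARTITION BY (project_id, toDate(ts))           -- scoped by project; per-tenant retention in ch.7
-- each INSERT writes at least one part per partition it touches; by default ClickHouse rejects
-- an INSERT spanning more than max_partitions_per_insert_block (100) partitions
ORDER BY (project_id, environment, service_id, ts);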

ClickHouse SQL · users (run once)

-- ingest user — Vector writes (INSERT only)
CREATE USER vector IDENTIFIED BY '<ingest-secret>';
GRANT INSERT ON logs.customer_logs TO vector;
-- read-only user — control-plane reads over peering (SELECT only) = ch.4's "read-only ClickHouse user"
CREATE USER reader IDENTIFIED BY '<read-secret>';
GRANT SELECT ON logs.customer_logs TO reader;
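Partitioning by (project_id, toDate(ts)) has a cost at write time: one INSERT writes at least one new part per distinct (project, day) pair in its rows, and ClickHouse by default rejects an INSERT block that spans more than max_partitions_per_insert_block partitions (100). A Python sketch of that check over a batch, assuming rows carry project_id and a timezone-aware ts and that the server time zone is UTC; the limit is a server setting and can be tuned.

```python
from datetime import datetime, timezone
from typing import Any, Iterable, Mapping, Set, Tuple

# ClickHouse server setting max_partitions_per_insert_block; 100 is its shipped default.
MAX_PARTITIONS_PER_INSERT = 100


def partition_key(project_id: str, ts: datetime) -> Tuple[str, str]:
    """Mirror PARTITION BY (project_id, toDate(ts)); assumes the server time zone is UTC."""
    return project_id, ts.astimezone(timezone.utc).date().isoformat()


def partitions_touched(rows: Iterable[Mapping[str, Any]]) -> Set[Tuple[str, str]]:
    """Distinct partitions one INSERT of these rows would write to."""
    return {partition_key(r["project_id"], r["ts"]) for r in rows}


def check_batch(rows: Iterable[Mapping[str, Any]]) -> int:
    """Return the partition count, i.e. the minimum number of new parts this INSERT creates."""
    n = len(partitions_touched(rows))
    if n > MAX_PARTITIONS_PER_INSERT:
        raise ValueError(
            f"batch spans {n} partitions; ClickHouse rejects more than "
            f"{MAX_PARTITIONS_PER_INSERT} by default"
        )
    return n
```

With the aggregator batching every tenant in a region together, a region with more than about 100 projects active in one batch window can hit this limit, so it is worth watching alongside the per-tenant retention design in chapter 7.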

2 · Stand up the Vector aggregator (per region), the only writer ClickHouse trusts. A fluent source receives the agents' forwarded records, a remap transform normalizes identity, and the native clickhouse sink batches rows into large inserts, which is what avoids ClickHouse's "too many parts"; a 5 GiB disk buffer on the sink absorbs backpressure:

Vector · vector.toml (regional VM)

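# receive Fluent Bit, normalize identity, batch into ClickHouse (native sink)
[sources.fb]
type    = "fluent"
address = "0.0.0.0:24224"

[sources.fb.tls]                       # agents connect with `tls On`, so the listener must speak TLS
enabled  = true
crt_file = "/etc/vector/tls/server.crt" # placeholder path
key_file = "/etc/vector/tls/server.key" # placeholder path

[transforms.norm]
type   = "remap"
inputs = ["fb"]
source = '''
  .project_id  = .kubernetes.labels."starform.io/project-id"
  .environment = .kubernetes.labels."starform.io/environment"
  .service_id  = .kubernetes.labels."starform.io/service-id"
  .ts          = .timestamp                   # event time set by the fluent source
  .message     = .log                         # the line read by Fluent Bit's tail input
  .level       = string(.level) ?? "unknown"  # `level` exists only if a JSON log was merged
'''

[sinks.ch]
type     = "clickhouse"
inputs   = ["norm"]
endpoint = "http://clickhouse.<region>.internal:8123"
database = "logs"
table    = "customer_logs"
skip_unknown_fields   = true
date_time_best_effort = true           # lets ClickHouse parse Vector's RFC 3339 timestamps into DateTime64

[sinks.ch.auth]                        # the INSERT-only `vector` user from step 1
strategy = "basic"
user     = "vector"
password = "${CH_INGEST_SECRET}"       # env var name is an assumption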

[sinks.ch.batch]                      # batching is what avoids "too many parts"
max_events   = 100000
timeout_secs = 10

[sinks.ch.buffer]
type      = "disk"
max_size  = 5368709120                # 5 GiB on-disk backpressure buffer
when_full = "block"
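Whatever the remap does, its output has to line up with the six columns of logs.customer_logs. A Python sketch of that record-to-row mapping, assuming the record shape Fluent Bit produces with the configuration in step 3 (the raw line in log, pod labels under kubernetes.labels) and that Vector's fluent source carries the event time in timestamp; the "unknown" fallback for level is an illustrative choice, not a PRD contract.

```python
from typing import Any, Dict

LABEL_PREFIX = "starform.io/"


def to_row(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map one forwarded log record onto the columns of logs.customer_logs."""
    labels = record.get("kubernetes", {}).get("labels", {})
    level = record.get("level")
    return {
        "project_id": labels.get(LABEL_PREFIX + "project-id", ""),
        "environment": labels.get(LABEL_PREFIX + "environment", ""),
        "service_id": labels.get(LABEL_PREFIX + "service-id", ""),
        "ts": record["timestamp"],                 # event time from the fluent source
        # a top-level "level" key exists only when a JSON log line was merged (Merge_Log On)
        "level": level if isinstance(level, str) else "unknown",
        "message": record.get("log", ""),          # the raw line read by the tail input
    }
```

A pod missing a starform.io/* label yields an empty string rather than an error, so rows with an empty project_id are the signal that a workload was deployed without the label set.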

3 · Deploy Fluent Bit as a DaemonSet — deliberately thin (~64Mi/node), it tails the container log files, enriches each line with the starform.io/* pod labels, and forwards to the regional aggregator (it never writes ClickHouse itself):

Fluent Bit · DaemonSet config

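# tail container logs, enrich with pod labels, forward to the regional Vector aggregator
# (classic mode only treats whole lines starting with # as comments, so none trail a value)
[SERVICE]
    # root for filesystem buffering; a hostPath mount so chunks survive pod restarts (path is an assumption)
    storage.path      /var/fluent-bit/state/
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    multiline.parser  cri
    Tag               kube.*
    Skip_Long_Lines   On
    # file offsets, so a restarted agent resumes instead of re-reading or skipping
    DB                /var/fluent-bit/state/tail.db
    # buffer chunks on disk: an aggregator outage queues here and replays
    storage.type      filesystem
[FILTER]
    Name                kubernetes
    Match               kube.*
    Merge_Log           On
    # brings starform.io/* pod labels
    Labels              On
    K8S-Logging.Parser  On
[OUTPUT]
    # Fluent forward protocol → Vector `fluent` source
    Name        forward
    Match       *
    Host        vector-agg.<region>.internal
    Port        24224
    tls         On
    # per-cluster auth, even over VPC
    Shared_Key  ${CLUSTER_BEARER}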
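multiline.parser cri is needed because container runtimes write each stdout/stderr chunk to the node file as <time> <stream> <tag> <content>, where a tag of P marks a partial chunk and F the final one, so one long application line can span several file lines. A minimal Python sketch of that reassembly under the CRI log format, keeping stdout and stderr apart; it illustrates the idea and is not Fluent Bit's implementation.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def cri_lines(raw: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (time, stream, message) for each complete line in CRI-format log file lines.

    Chunks tagged P (partial) are held per stream until a chunk tagged F (full) ends the line.
    """
    pending: Dict[str, List[str]] = {}   # stream -> chunks of the line being assembled
    started: Dict[str, str] = {}         # stream -> timestamp of that line's first chunk
    for entry in raw:
        parts = entry.rstrip("\n").split(" ", 3)
        if len(parts) < 3:
            continue                     # not a CRI record; skip it
        ts, stream, tags = parts[0], parts[1], parts[2]
        content = parts[3] if len(parts) == 4 else ""
        started.setdefault(stream, ts)
        pending.setdefault(stream, []).append(content)
        if tags.split(":")[0] == "F":    # first tag is P or F; others may follow after ":"
            yield started.pop(stream), stream, "".join(pending.pop(stream))
```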

4 · Wire per-cluster auth. The Fluent Bit Shared_Key matches the Vector fluent source token — even though the hop is intra-VPC. See chapter 4 · Auth.
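Why a shared key still helps inside a VPC: under the Fluent forward protocol (v1 specification), the key never crosses the wire. The receiver's HELO carries a random nonce; the sender replies in PING with its hostname, a fresh salt, and the hex SHA-512 of salt + hostname + nonce + key; the receiver recomputes that digest from its own copy of the key. A Python sketch of the digest and check, assuming the field order of the forward protocol specification; hostnames and keys are hypothetical, and which side enforces the check in this deployment is defined by chapter 4 · Auth.

```python
import hashlib
import hmac
import os


def random_bytes(n: int = 16) -> bytes:
    """A fresh nonce or salt; the handshake only needs these to be unpredictable."""
    return os.urandom(n)


def shared_key_digest(salt: bytes, hostname: str, nonce: bytes, shared_key: str) -> str:
    """Hex SHA-512 of shared_key_salt + self_hostname + nonce + shared_key."""
    h = hashlib.sha512()
    for part in (salt, hostname.encode("utf-8"), nonce, shared_key.encode("utf-8")):
        h.update(part)
    return h.hexdigest()


def receiver_accepts(ping_digest: str, salt: bytes, sender_hostname: str,
                     nonce: bytes, shared_key: str) -> bool:
    """Recompute the digest with the receiver's copy of the key; compare in constant time."""
    expected = shared_key_digest(salt, sender_hostname, nonce, shared_key)
    return hmac.compare_digest(expected, ping_digest)
```

Because the receiver issues a new nonce per connection, a captured PING digest is useless on the next connection, and a node holding a different cluster's key fails the check.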

5 · Add build logs. Starbase Worker streams build logs into the same ClickHouse, so build and runtime logs share one store. (§16.6.)

6 · Verify. A pod's stdout line lands in logs.customer_logs, stamped with project · environment · service, within seconds (the sink's batch timeout is 10 s); live tail streams; restarting the aggregator makes Fluent Bit disk-buffer and then replay. (FR-049)
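One way to run the first check by hand is ClickHouse's HTTP interface on port 8123, authenticated as the read-only reader user, with the project passed as a typed query parameter instead of being spliced into the SQL. A Python sketch that only builds the request, assuming the clickhouse.<region>.internal naming from the Vector config and the X-ClickHouse-User / X-ClickHouse-Key credential headers; the region value and the query are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request

# {project_id:String} is a ClickHouse query parameter, filled from param_project_id.
QUERY = (
    "SELECT ts, environment, service_id, level, message "
    "FROM logs.customer_logs "
    "WHERE project_id = {project_id:String} AND ts > now() - INTERVAL 5 MINUTE "
    "ORDER BY ts DESC LIMIT 20 FORMAT JSONEachRow"
)


def latest_lines_request(region: str, project_id: str, password: str) -> Request:
    """Build (not send) a GET for one project's recent rows, authenticated as `reader`."""
    params = urlencode({"query": QUERY, "param_project_id": project_id})
    url = f"http://clickhouse.{region}.internal:8123/?{params}"
    headers = {"X-ClickHouse-User": "reader", "X-ClickHouse-Key": password}
    return Request(url, headers=headers, method="GET")
```

urllib.request.urlopen(req) sends it; the body is one JSON object per row, and an empty body well past the batch timeout after emitting a test line means the line never arrived.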

Gotchas & what lives elsewhere

Don't let agents write to ClickHouse directly. Frequent small inserts create too many parts and degrade ClickHouse. The aggregator exists to batch.
Agents disk-buffer for backpressure; if the aggregator or ClickHouse is briefly down, lines queue locally and replay.
Per-tenant log retention lives in ClickHouse (partition TTL), not Vector — see chapter 7.
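The batching gotcha, in rough numbers: a batching sink flushes when a batch fills or its timeout fires, so at a steady rate it issues about max(rate / max_events, 1 / timeout_secs) inserts per second, and every insert creates at least one part per partition it touches. A Python sketch comparing that with agents inserting directly, using the step 2 values (100000 events, 10 s) and ignoring the sink's byte limit, which can trigger a flush earlier; node count, event rate, the 1 s agent flush, and partitions per insert are illustrative assumptions.

```python
def batched_inserts_per_sec(events_per_sec: float, max_events: int = 100_000,
                            timeout_secs: float = 10.0) -> float:
    """Flushes happen on a full batch or on timeout, whichever comes first."""
    return max(events_per_sec / max_events, 1.0 / timeout_secs)


def direct_inserts_per_sec(nodes: int, flush_secs: float = 1.0) -> float:
    """Every node agent inserting its own small chunk once per flush interval."""
    return nodes / flush_secs


def parts_per_sec(inserts_per_sec: float, partitions_per_insert: int) -> float:
    """Each INSERT writes at least one new part per partition it touches."""
    return inserts_per_sec * partitions_per_insert


if __name__ == "__main__":
    # Illustrative region: 200 nodes, 20,000 lines/s in total,
    # ~5 projects per node chunk vs ~60 projects per aggregated batch.
    via_agg = parts_per_sec(batched_inserts_per_sec(20_000), 60)
    direct = parts_per_sec(direct_inserts_per_sec(200), 5)
    print(f"aggregator: {via_agg:.0f} parts/s, direct: {direct:.0f} parts/s")
```

Even though an aggregated batch touches more partitions than any single node's chunk, cutting hundreds of inserts per second down to a fraction of one is what keeps the part count manageable.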

PRD reference & inlined contracts

Owned by §35.1 (log flow), §35.4 (transport), §16.6 (build-log streaming), §24.1 (label set, the enrichment keys); FR-049. The label set is restated above so this guide stands alone — if it ever diverges, the PRD page wins. Canonical map: Canonical Sources.