Helm Values Reference

This document describes the Helm values generated by the monitoring CDK8S constructs.

Overview

The monitoring stack uses three Helm charts:

  1. kube-prometheus-stack - Prometheus, Grafana, Alertmanager

  2. loki - Log aggregation (SimpleScalable mode)

  3. alloy - Log collection agent (DaemonSet)

All values are generated from TypeScript constructs and injected into HelmChart resources.
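
The construct code itself lives in the CDK8S project; as a rough illustration of the injection pattern only, the sketch below uses cdk8s's ApiObject and Yaml helpers to wrap the K3s helm.cattle.io/v1 HelmChart CRD consumed by the K3s Helm controller. The class, interface, and config field names are hypothetical, not the actual construct API.

import { Construct } from 'constructs';
import { ApiObject, Yaml } from 'cdk8s';

// Hypothetical shape of the loaded config.yaml (illustrative field names).
interface MonitoringConfig {
  versions: { prometheus: string };
  resources: { prometheus: { requests: { cpu: string; memory: string } } };
}

export class ExampleHelmChartConstruct extends Construct {
  constructor(scope: Construct, id: string, config: MonitoringConfig) {
    super(scope, id);

    // Values assembled from config.yaml plus environment overrides.
    const values = {
      prometheus: {
        prometheusSpec: {
          resources: { requests: config.resources.prometheus.requests },
        },
      },
    };

    // The chart version and serialized values are injected into the
    // HelmChart resource picked up by the K3s Helm controller.
    new ApiObject(this, 'kube-prometheus-stack', {
      apiVersion: 'helm.cattle.io/v1',
      kind: 'HelmChart',
      metadata: { name: 'kube-prometheus-stack', namespace: 'monitoring' },
      spec: {
        repo: 'https://prometheus-community.github.io/helm-charts',
        chart: 'kube-prometheus-stack',
        version: config.versions.prometheus,
        valuesContent: Yaml.stringify(values),
      },
    });
  }
}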

kube-prometheus-stack Helm Values

Chart: prometheus-community/kube-prometheus-stack

Version: Configurable in config.yaml (versions.prometheus)

Generated by: PrometheusHelmChartConstruct

Prometheus Configuration

prometheus:
  prometheusSpec:
    # Replica configuration
    replicas: 2

    # Retention and storage
    retention: 3d
    retentionSize: 2500MB
    walCompression: true

    # Resource limits
    resources:
      requests:
        cpu: 100m
        memory: 1500Mi
      limits:
        cpu: 1000m
        memory: 3000Mi

    # Storage class and size
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 3Gi

    # Thanos sidecar integration
    thanos:
      image: quay.io/thanos/thanos:v0.36.1
      version: v0.36.1
      objectStorageConfig:
        existingSecret:
          name: thanos-objstore-config
          key: objstore.yml
      resources:
        requests:
          cpu: 25m
          memory: 128Mi
        limits:
          cpu: 100m
          memory: 256Mi

    # Service monitors
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false

    # External labels
    externalLabels:
      cluster: kup6s
      prometheus_replica: '$(POD_NAME)'

    # Pod configuration
    priorityClassName: high-priority

    # Node affinity (spread across zones)
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/name
                    operator: In
                    values: [prometheus]
              topologyKey: topology.kubernetes.io/zone

Grafana Configuration

grafana:
  enabled: true

  # Admin credentials
  adminPassword: ${GRAFANA_ADMIN_PASSWORD}  # From secret

  # Resource limits
  resources:
    requests:
      cpu: 50m
      memory: 512Mi
    limits:
      cpu: 500m
      memory: 1024Mi

  # Persistence
  persistence:
    enabled: true
    storageClassName: longhorn
    size: 5Gi

  # Ingress
  ingress:
    enabled: true
    ingressClassName: traefik
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-cluster-issuer
      traefik.ingress.kubernetes.io/router.tls: "true"
    hosts:
      - grafana.ops.kup6s.net
    tls:
      - secretName: grafana-tls
        hosts:
          - grafana.ops.kup6s.net

  # Datasources (managed by sidecar)
  sidecar:
    datasources:
      enabled: true
      defaultDatasourceEnabled: false
    dashboards:
      enabled: true

  # Additional datasources
  additionalDataSources:
    - name: Thanos
      type: prometheus
      url: http://thanos-query.monitoring.svc.cluster.local:9090
      access: proxy
      isDefault: true
      jsonData:
        timeInterval: 30s

    - name: Loki
      type: loki
      url: http://loki-gateway.monitoring.svc.cluster.local
      access: proxy
      jsonData:
        maxLines: 1000

  # Priority class
  priorityClassName: high-priority

Alertmanager Configuration

alertmanager:
  alertmanagerSpec:
    # Replica configuration
    replicas: 2

    # Resource limits
    resources:
      requests:
        cpu: 25m
        memory: 100Mi
      limits:
        cpu: 100m
        memory: 256Mi

    # Storage
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: hcloud-volumes
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi

    # Priority class
    priorityClassName: high-priority

  # Configuration
  config:
    global:
      smtp_from: ${SMTP_FROM}
      smtp_smarthost: ${SMTP_HOST}:${SMTP_PORT}
      smtp_auth_username: ${SMTP_USERNAME}
      smtp_auth_password: ${SMTP_PASSWORD}
      smtp_require_tls: true

    route:
      receiver: email-default
      group_by: ['alertname', 'cluster', 'namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      routes:
        - match:
            severity: critical
          receiver: email-critical

    receivers:
      - name: email-default
        email_configs:
          - to: ${ALERT_EMAIL_TO}
            send_resolved: true

      - name: email-critical
        email_configs:
          - to: ${ALERT_EMAIL_TO}
            send_resolved: true
            headers:
              Subject: '[CRITICAL] {{ .GroupLabels.alertname }}'

Component-Specific Overrides

Prometheus Operator:

prometheusOperator:
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 200m
      memory: 256Mi

kube-state-metrics:

kube-state-metrics:
  resources:
    requests:
      cpu: 10m
      memory: 64Mi
    limits:
      cpu: 100m
      memory: 128Mi

Node Exporter (DaemonSet):

prometheus-node-exporter:
  resources:
    requests:
      cpu: 10m
      memory: 32Mi
    limits:
      cpu: 200m
      memory: 64Mi

  # Tolerations for all nodes
  tolerations:
    - effect: NoSchedule
      operator: Exists

Loki Helm Values

Chart: grafana/loki

Version: Configurable in config.yaml (versions.loki)

Generated by: LokiHelmChartConstruct

Deployment Mode

deploymentMode: SimpleScalable

loki:
  # Common configuration
  commonConfig:
    replication_factor: 2

  # Storage backend
  storage:
    type: s3
    bucketNames:
      chunks: logs-loki-kup6s
      ruler: logs-loki-kup6s
      admin: logs-loki-kup6s

    s3:
      endpoint: https://fsn1.your-objectstorage.com
      region: fsn1
      accessKeyId:
        existingSecret: loki-s3-config
        key: access_key_id
      secretAccessKey:
        existingSecret: loki-s3-config
        key: secret_access_key
      s3ForcePathStyle: true
      insecure: false

  # Schema configuration
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h

  # Retention
  limits_config:
    retention_period: 744h  # 31 days
    max_query_length: 721h  # Cap query range at ~30 days (30d + 1h)

  # Compactor (retention enforcement)
  compactor:
    retention_enabled: true
    retention_delete_delay: 2h
    retention_delete_worker_count: 150

Write Path Configuration

write:
  replicas: 2

  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi

  persistence:
    enabled: true
    storageClass: longhorn
    size: 5Gi

  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/component: write
            topologyKey: topology.kubernetes.io/zone

Read Path Configuration

read:
  replicas: 2

  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi

  persistence:
    enabled: true
    storageClass: longhorn
    size: 5Gi

  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/component: read
            topologyKey: topology.kubernetes.io/zone

Backend Configuration

backend:
  replicas: 2

  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi

  persistence:
    enabled: true
    storageClass: longhorn
    size: 5Gi

  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/component: backend
            topologyKey: topology.kubernetes.io/zone

Gateway Configuration

gateway:
  replicas: 1

  resources:
    requests:
      cpu: 50m
      memory: 128Mi
    limits:
      cpu: 200m
      memory: 256Mi

  ingress:
    enabled: false  # Access via Grafana only

Monitoring

monitoring:
  serviceMonitor:
    enabled: true
    interval: 30s

  selfMonitoring:
    enabled: false
    grafanaAgent:
      installOperator: false

Alloy Helm Values

Chart: grafana/alloy

Version: Configurable in config.yaml (versions.alloy)

Generated by: AlloyConstruct

Controller Configuration

controller:
  type: daemonset

  # Update strategy
  updateStrategy:
    type: RollingUpdate

Alloy Configuration

alloy:
  # Configuration mode
  configMap:
    create: true
    content: |-
      logging {
        level  = "info"
        format = "logfmt"
      }

      discovery.kubernetes "pods" {
        role = "pod"
        selectors {
          role  = "pod"
          field = "spec.nodeName=$${HOSTNAME}"
        }
      }

      loki.source.kubernetes "pods" {
        targets    = discovery.kubernetes.pods.targets
        forward_to = [loki.relabel.logs.receiver]
      }

      loki.relabel "logs" {
        forward_to = [loki.process.logs.receiver]

        rule {
          source_labels = ["__meta_kubernetes_namespace"]
          target_label  = "namespace"
        }

        rule {
          source_labels = ["__meta_kubernetes_pod_name"]
          target_label  = "pod"
        }

        rule {
          source_labels = ["__meta_kubernetes_pod_container_name"]
          target_label  = "container"
        }

        rule {
          source_labels = ["__meta_kubernetes_pod_label_app"]
          target_label  = "app"
        }

        rule {
          source_labels = ["__meta_kubernetes_pod_label_app_kubernetes_io_component"]
          target_label  = "component"
        }
      }

      loki.process "logs" {
        forward_to = [loki.write.default.receiver]

        stage.json {
          expressions = {
            timestamp    = "timestamp",
            level        = "level",
            event        = "event",
            component    = "component",
            connector_id = "connector_id",
            user_id      = "user_id",
            request_id   = "request_id",
            trace_id     = "trace_id",
            kafka        = "kafka",
          }
        }

        stage.labels {
          values = {
            level = "level",
          }
        }

        // Keep high-cardinality identifiers as structured metadata instead of
        // labels to avoid inflating Loki stream cardinality.
        stage.structured_metadata {
          values = {
            connector_id = "connector_id",
            user_id      = "user_id",
            request_id   = "request_id",
            trace_id     = "trace_id",
          }
        }

        stage.timestamp {
          source = "timestamp"
          format = "RFC3339"
        }

        // Emit the extracted JSON "event" field as the log line.
        stage.output {
          source = "event"
        }
      }

      loki.write "default" {
        endpoint {
          url = "http://loki-gateway.monitoring.svc.cluster.local/loki/api/v1/push"
        }
        external_labels = {
          cluster = "kup6s",
        }
      }

  # Environment variables
  extraEnv:
    - name: HOSTNAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName

Resources

resources:
  requests:
    cpu: 50m
    memory: 128Mi
  limits:
    cpu: 200m
    memory: 256Mi

RBAC

rbac:
  create: true

serviceAccount:
  create: true

Priority

priorityClassName: high-priority

Environment Variable Substitution

All Helm values support environment variable substitution via the CDK8S configuration loader:

Example:

# In config.yaml
smtp:
  host: mail.example.com
  from: monitoring@kup6s.net

# Environment override
export SMTP_HOST=smtp.mailgun.org
export SMTP_FROM=alerts@kup6s.com

# Generated values use env vars if present
smtp_smarthost: smtp.mailgun.org:587
smtp_from: alerts@kup6s.com
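
How the loader resolves these values is internal to the CDK8S project; the minimal sketch below illustrates the env-or-default pattern described above. The helper names (envOrDefault, requireEnv) are hypothetical, not the actual loader API.

// Minimal sketch of env-var substitution with config.yaml fallbacks.
function envOrDefault(name: string, fallback: string): string {
  const value = process.env[name];
  return value !== undefined && value !== '' ? value : fallback;
}

function requireEnv(name: string): string {
  const value = process.env[name];
  if (value === undefined || value === '') {
    throw new Error(`Required environment variable ${name} is not set`);
  }
  return value;
}

// Alertmanager SMTP settings: config.yaml defaults plus environment overrides.
const configDefaults = { host: 'mail.example.com', from: 'monitoring@kup6s.net' };

const smtpGlobal = {
  smtp_smarthost: `${envOrDefault('SMTP_HOST', configDefaults.host)}:${envOrDefault('SMTP_PORT', '587')}`,
  smtp_from: envOrDefault('SMTP_FROM', configDefaults.from),
  smtp_auth_password: requireEnv('SMTP_PASSWORD'), // no default; must be provided
};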

Variables with defaults:

  • GRAFANA_ADMIN_PASSWORD - Defaults to admin (should be overridden)

  • SMTP_HOST - Defaults to config.yaml value

  • SMTP_PORT - Defaults to 587

  • SMTP_FROM - Defaults to config.yaml value

  • SMTP_USERNAME - Defaults to config.yaml value

  • SMTP_PASSWORD - No default (must be provided)

  • ALERT_EMAIL_TO - Defaults to config.yaml value

Overriding Values

Method 1: Update config.yaml

Edit dp-infra/monitoring/config.yaml:

resources:
  prometheus:
    requests:
      cpu: 200m  # Changed from 100m
      memory: 2Gi  # Changed from 1500Mi

Then regenerate:

cd dp-infra/monitoring
npm run build

Method 2: Environment Variables

export PROMETHEUS_CPU_REQUEST=200m
export PROMETHEUS_MEMORY_REQUEST=2Gi
npm run build

Method 3: Direct Construct Modification

Edit charts/constructs/prometheus.ts and modify the generated values template.

Important: Changes to TypeScript require npm run compile before npm run synth.
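
As a sketch of what a Method 3 edit might look like (the actual layout of charts/constructs/prometheus.ts may differ; the values shown are illustrative):

// Hypothetical excerpt of the values template in charts/constructs/prometheus.ts.
// Hard-coded values here bypass config.yaml for the affected fields.
const prometheusValues = {
  prometheus: {
    prometheusSpec: {
      retention: '7d',          // e.g. raised from the default 3d
      retentionSize: '5000MB',
      walCompression: true,
      resources: {
        requests: { cpu: '200m', memory: '2Gi' },
        limits: { cpu: '1000m', memory: '3000Mi' },
      },
    },
  },
};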

Validation

Check Generated Values

View HelmChart resource:

kubectl get helmchart kube-prometheus-stack -n monitoring -o yaml

Extract values from manifest:

yq 'select(.kind == "HelmChart" and .metadata.name == "kube-prometheus-stack") | .spec.valuesContent' \
  dp-infra/monitoring/manifests/monitoring.k8s.yaml

Verify Applied Values

Check deployed Helm release:

# For K3S Helm controller
kubectl get helmchart -n monitoring -o yaml

# Check actual pod resources
kubectl get pods -n monitoring -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources}{"\n"}{end}'

Troubleshooting

Values Not Applied

Symptom: Changes to config.yaml don’t appear in cluster

Solution:

  1. Verify manifest regenerated: git diff manifests/monitoring.k8s.yaml

  2. Check ArgoCD sync status: kubectl get application monitoring -n argocd

  3. Force sync: argocd app sync monitoring

Invalid Helm Values

Symptom: HelmChart stuck in error state

Solution:

# Check HelmChart status
kubectl describe helmchart kube-prometheus-stack -n monitoring

# View Helm controller logs
kubectl logs -n kube-system -l app=helm-controller

Resource Limits Too Low

Symptom: Pods OOMKilled or CPU throttled

Solution:

  1. Check actual usage: kubectl top pods -n monitoring

  2. Update config.yaml resources

  3. Regenerate and redeploy

  4. See Resource Optimization

See Also