# Helm Values Reference
This document provides reference documentation for the Helm values generated by the monitoring CDK8S constructs.
## Overview
The monitoring stack uses three Helm charts:

- `kube-prometheus-stack` - Prometheus, Grafana, Alertmanager
- `loki` - Log aggregation (SimpleScalable mode)
- `alloy` - Log collection agent (DaemonSet)

All values are generated from TypeScript constructs and injected into HelmChart resources.
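
The constructs are not reproduced here, but the injection mechanism is simple: values are assembled as plain TypeScript objects, serialized to YAML, and written into `spec.valuesContent` of a k3s `helm.cattle.io/v1` HelmChart resource. The following minimal sketch shows that pattern with cdk8s; the class name, the inlined values, and the use of `js-yaml` are illustrative, not the actual construct code.

```typescript
// Minimal sketch (not the actual construct source): emitting a k3s HelmChart
// resource whose valuesContent carries Helm values built in TypeScript.
import { App, Chart, ApiObject } from 'cdk8s';
import { Construct } from 'constructs';
import * as yaml from 'js-yaml'; // assumed serialization helper

class PrometheusHelmChartSketch extends Chart {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    // Values object, normally derived from config.yaml.
    const values = {
      prometheus: {
        prometheusSpec: { replicas: 2, retention: '3d' },
      },
    };

    new ApiObject(this, 'kube-prometheus-stack', {
      apiVersion: 'helm.cattle.io/v1',
      kind: 'HelmChart',
      metadata: { name: 'kube-prometheus-stack', namespace: 'monitoring' },
      spec: {
        repo: 'https://prometheus-community.github.io/helm-charts',
        chart: 'kube-prometheus-stack',
        version: '<versions.prometheus from config.yaml>', // placeholder
        targetNamespace: 'monitoring',
        valuesContent: yaml.dump(values), // serialized Helm values
      },
    });
  }
}

const app = new App();
new PrometheusHelmChartSketch(app, 'monitoring');
app.synth();
```
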
## kube-prometheus-stack Helm Values

- Chart: `prometheus-community/kube-prometheus-stack`
- Version: configurable in `config.yaml` (`versions.prometheus`)
- Generated by: `PrometheusHelmChartConstruct`

### Prometheus Configuration

```yaml
prometheus:
  prometheusSpec:
    # Replica configuration
    replicas: 2

    # Retention and storage
    retention: 3d
    retentionSize: 2500MB
    walCompression: true

    # Resource limits
    resources:
      requests:
        cpu: 100m
        memory: 1500Mi
      limits:
        cpu: 1000m
        memory: 3000Mi

    # Storage class and size
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 3Gi

    # Thanos sidecar integration
    thanos:
      image: quay.io/thanos/thanos:v0.36.1
      version: v0.36.1
      objectStorageConfig:
        existingSecret:
          name: thanos-objstore-config
          key: objstore.yml
      resources:
        requests:
          cpu: 25m
          memory: 128Mi
        limits:
          cpu: 100m
          memory: 256Mi

    # Service monitors
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false

    # External labels
    externalLabels:
      cluster: kup6s
      prometheus_replica: '$(POD_NAME)'

    # Pod configuration
    priorityClassName: high-priority

    # Pod anti-affinity (spread across zones)
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/name
                    operator: In
                    values: [prometheus]
              topologyKey: topology.kubernetes.io/zone
```

### Grafana Configuration

```yaml
grafana:
  enabled: true

  # Admin credentials
  adminPassword: ${GRAFANA_ADMIN_PASSWORD}  # From secret

  # Resource limits
  resources:
    requests:
      cpu: 50m
      memory: 512Mi
    limits:
      cpu: 500m
      memory: 1024Mi

  # Persistence
  persistence:
    enabled: true
    storageClassName: longhorn
    size: 5Gi

  # Ingress
  ingress:
    enabled: true
    ingressClassName: traefik
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-cluster-issuer
      traefik.ingress.kubernetes.io/router.tls: "true"
    hosts:
      - grafana.ops.kup6s.net
    tls:
      - secretName: grafana-tls
        hosts:
          - grafana.ops.kup6s.net

  # Datasources (managed by sidecar)
  sidecar:
    datasources:
      enabled: true
      defaultDatasourceEnabled: false
    dashboards:
      enabled: true

  # Additional datasources
  additionalDataSources:
    - name: Thanos
      type: prometheus
      url: http://thanos-query.monitoring.svc.cluster.local:9090
      access: proxy
      isDefault: true
      jsonData:
        timeInterval: 30s
    - name: Loki
      type: loki
      url: http://loki-gateway.monitoring.svc.cluster.local
      access: proxy
      jsonData:
        maxLines: 1000

  # Priority class
  priorityClassName: high-priority
```

### Alertmanager Configuration

```yaml
alertmanager:
  alertmanagerSpec:
    # Replica configuration
    replicas: 2

    # Resource limits
    resources:
      requests:
        cpu: 25m
        memory: 100Mi
      limits:
        cpu: 100m
        memory: 256Mi

    # Storage
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: hcloud-volumes
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi

    # Priority class
    priorityClassName: high-priority

  # Configuration
  config:
    global:
      smtp_from: ${SMTP_FROM}
      smtp_smarthost: ${SMTP_HOST}:${SMTP_PORT}
      smtp_auth_username: ${SMTP_USERNAME}
      smtp_auth_password: ${SMTP_PASSWORD}
      smtp_require_tls: true
    route:
      receiver: email-default
      group_by: ['alertname', 'cluster', 'namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      routes:
        - match:
            severity: critical
          receiver: email-critical
    receivers:
      - name: email-default
        email_configs:
          - to: ${ALERT_EMAIL_TO}
            send_resolved: true
      - name: email-critical
        email_configs:
          - to: ${ALERT_EMAIL_TO}
            send_resolved: true
            headers:
              Subject: '[CRITICAL] {{ .GroupLabels.alertname }}'
```

### Component-Specific Overrides

Prometheus Operator:

```yaml
prometheusOperator:
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 200m
      memory: 256Mi
```

kube-state-metrics:

```yaml
kube-state-metrics:
  resources:
    requests:
      cpu: 10m
      memory: 64Mi
    limits:
      cpu: 100m
      memory: 128Mi
```

Node Exporter (DaemonSet):

```yaml
prometheus-node-exporter:
  resources:
    requests:
      cpu: 10m
      memory: 32Mi
    limits:
      cpu: 200m
      memory: 64Mi

  # Tolerations for all nodes
  tolerations:
    - effect: NoSchedule
      operator: Exists
```

## Loki Helm Values

- Chart: `grafana/loki`
- Version: configurable in `config.yaml` (`versions.loki`)
- Generated by: `LokiHelmChartConstruct`

### Deployment Mode

```yaml
deploymentMode: SimpleScalable

loki:
  # Common configuration
  commonConfig:
    replication_factor: 2

  # Storage backend
  storage:
    type: s3
    bucketNames:
      chunks: logs-loki-kup6s
      ruler: logs-loki-kup6s
      admin: logs-loki-kup6s
    s3:
      endpoint: https://fsn1.your-objectstorage.com
      region: fsn1
      accessKeyId:
        existingSecret: loki-s3-config
        key: access_key_id
      secretAccessKey:
        existingSecret: loki-s3-config
        key: secret_access_key
      s3ForcePathStyle: true
      insecure: false

  # Schema configuration
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h

  # Retention
  limits_config:
    retention_period: 744h  # 31 days
    max_query_length: 721h  # Prevent queries >30 days

  # Compactor (retention enforcement)
  compactor:
    retention_enabled: true
    retention_delete_delay: 2h
    retention_delete_worker_count: 150
```

### Write Path Configuration

```yaml
write:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi
  persistence:
    enabled: true
    storageClass: longhorn
    size: 5Gi
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/component: write
            topologyKey: topology.kubernetes.io/zone
```

### Read Path Configuration

```yaml
read:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi
  persistence:
    enabled: true
    storageClass: longhorn
    size: 5Gi
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/component: read
            topologyKey: topology.kubernetes.io/zone
```

### Backend Configuration

```yaml
backend:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi
  persistence:
    enabled: true
    storageClass: longhorn
    size: 5Gi
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/component: backend
            topologyKey: topology.kubernetes.io/zone
```

### Gateway Configuration

```yaml
gateway:
  replicas: 1
  resources:
    requests:
      cpu: 50m
      memory: 128Mi
    limits:
      cpu: 200m
      memory: 256Mi
  ingress:
    enabled: false  # Access via Grafana only
```

### Monitoring

```yaml
monitoring:
  serviceMonitor:
    enabled: true
    interval: 30s
  selfMonitoring:
    enabled: false
    grafanaAgent:
      installOperator: false
```

## Alloy Helm Values

- Chart: `grafana/alloy`
- Version: configurable in `config.yaml` (`versions.alloy`)
- Generated by: `AlloyConstruct`

### Controller Configuration

```yaml
controller:
  type: daemonset

  # Update strategy
  updateStrategy:
    type: RollingUpdate
```

### Alloy Configuration

```yaml
alloy:
  # Configuration mode
  configMap:
    create: true
    content: |-
      logging {
        level = "info"
        format = "logfmt"
      }

      discovery.kubernetes "pods" {
        role = "pod"
        selectors {
          role = "pod"
          field = "spec.nodeName=$${HOSTNAME}"
        }
      }

      loki.source.kubernetes "pods" {
        targets = discovery.kubernetes.pods.targets
        forward_to = [loki.relabel.logs.receiver]
      }

      loki.relabel "logs" {
        forward_to = [loki.process.logs.receiver]

        rule {
          source_labels = ["__meta_kubernetes_namespace"]
          target_label = "namespace"
        }
        rule {
          source_labels = ["__meta_kubernetes_pod_name"]
          target_label = "pod"
        }
        rule {
          source_labels = ["__meta_kubernetes_pod_container_name"]
          target_label = "container"
        }
        rule {
          source_labels = ["__meta_kubernetes_pod_label_app"]
          target_label = "app"
        }
        rule {
          source_labels = ["__meta_kubernetes_pod_label_app_kubernetes_io_component"]
          target_label = "component"
        }
      }

      loki.process "logs" {
        forward_to = [loki.write.default.receiver]

        stage.json {
          expressions = {
            timestamp = "timestamp",
            level = "level",
            event = "event",
            component = "component",
            connector_id = "connector_id",
            user_id = "user_id",
            request_id = "request_id",
            trace_id = "trace_id",
            kafka = "kafka",
          }
        }
        stage.labels {
          values = {
            level = "level",
          }
        }
        stage.structured_metadata {
          values = {
            connector_id = "connector_id",
            user_id = "user_id",
            request_id = "request_id",
            trace_id = "trace_id",
          }
        }
        stage.timestamp {
          source = "timestamp"
          format = "RFC3339"
        }
        stage.output {
          source = "event"
        }
      }

      loki.write "default" {
        endpoint {
          url = "http://loki-gateway.monitoring.svc.cluster.local/loki/api/v1/push"
        }
        external_labels = {
          cluster = "kup6s",
        }
      }

  # Environment variables
  extraEnv:
    - name: HOSTNAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
```

### Resources

```yaml
resources:
  requests:
    cpu: 50m
    memory: 128Mi
  limits:
    cpu: 200m
    memory: 256Mi
```

### RBAC

```yaml
rbac:
  create: true

serviceAccount:
  create: true
```

### Priority

```yaml
priorityClassName: high-priority
```

## Environment Variable Substitution

All Helm values support environment variable substitution via the CDK8S configuration loader:
Example:

```yaml
# In config.yaml
smtp:
  host: mail.example.com
  from: monitoring@kup6s.net
```

```bash
# Environment override
export SMTP_HOST=smtp.mailgun.org
export SMTP_FROM=alerts@kup6s.com
```

```yaml
# Generated values use env vars if present
smtp_smarthost: smtp.mailgun.org:587
smtp_from: alerts@kup6s.com
```

Variables with defaults:

- `GRAFANA_ADMIN_PASSWORD` - defaults to `admin` (should be overridden)
- `SMTP_HOST` - defaults to the `config.yaml` value
- `SMTP_PORT` - defaults to `587`
- `SMTP_FROM` - defaults to the `config.yaml` value
- `SMTP_USERNAME` - defaults to the `config.yaml` value
- `SMTP_PASSWORD` - no default (must be provided)
- `ALERT_EMAIL_TO` - defaults to the `config.yaml` value
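
The loader's implementation is not shown in this document; a minimal sketch of the substitution pattern, with illustrative function and variable names, might look like the following.

```typescript
// Sketch of env-var substitution with config.yaml defaults.
// envOrDefault and the configYaml shape are illustrative, not the loader's real API.

// Values parsed from config.yaml elsewhere (shape shown for illustration only).
const configYaml = {
  smtp: {
    host: 'mail.example.com',
    port: '587',
    from: 'monitoring@kup6s.net',
    username: 'monitoring',
  },
};

// Return the environment variable if set, otherwise the fallback;
// throw when there is neither (e.g. SMTP_PASSWORD has no default).
function envOrDefault(name: string, fallback?: string): string {
  const value = process.env[name] ?? fallback;
  if (value === undefined) {
    throw new Error(`Required environment variable ${name} is not set`);
  }
  return value;
}

// Alertmanager SMTP globals built from env vars with config.yaml defaults.
const smtpGlobals = {
  smtp_from: envOrDefault('SMTP_FROM', configYaml.smtp.from),
  smtp_smarthost: `${envOrDefault('SMTP_HOST', configYaml.smtp.host)}:${envOrDefault('SMTP_PORT', configYaml.smtp.port)}`,
  smtp_auth_username: envOrDefault('SMTP_USERNAME', configYaml.smtp.username),
  smtp_auth_password: envOrDefault('SMTP_PASSWORD'), // no default: must be provided
};
```
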
## Overriding Values

### Method 1: Update config.yaml

Edit `dp-infra/monitoring/config.yaml`:

```yaml
resources:
  prometheus:
    requests:
      cpu: 200m    # Changed from 100m
      memory: 2Gi  # Changed from 1500Mi
```
Then regenerate:

```bash
cd dp-infra/monitoring
npm run build
```

### Method 2: Environment Variables

```bash
export PROMETHEUS_CPU_REQUEST=200m
export PROMETHEUS_MEMORY_REQUEST=2Gi
npm run build
```

### Method 3: Direct Construct Modification

Edit `charts/constructs/prometheus.ts` and modify the generated values template.

Important: changes to TypeScript require `npm run compile` before `npm run synth`.
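
As a rough illustration of what that template looks like (the real file will differ in structure and names), the values are ordinary TypeScript objects, so an override is a normal code edit:

```typescript
// Illustrative excerpt only; charts/constructs/prometheus.ts will differ.
export const prometheusValues = {
  prometheus: {
    prometheusSpec: {
      replicas: 2,
      retention: '3d',
      retentionSize: '2500MB',
      resources: {
        requests: { cpu: '100m', memory: '1500Mi' }, // edit to change requests
        limits: { cpu: '1000m', memory: '3000Mi' },
      },
    },
  },
};
```
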
## Validation

### Check Generated Values

View HelmChart resource:

```bash
kubectl get helmchart kube-prometheus-stack -n monitoring -o yaml
```

Extract values from manifest:

```bash
# mikefarah yq v4 syntax
yq 'select(.kind == "HelmChart" and .metadata.name == "kube-prometheus-stack") | .spec.valuesContent' \
  dp-infra/monitoring/manifests/monitoring.k8s.yaml
```

### Verify Applied Values

Check deployed Helm release:

```bash
# For K3S Helm controller
kubectl get helmchart -n monitoring -o yaml

# Check actual pod resources
kubectl get pods -n monitoring -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources}{"\n"}{end}'
```

## Troubleshooting

### Values Not Applied

Symptom: changes to `config.yaml` do not appear in the cluster.

Solution:

1. Verify manifest regenerated:

   ```bash
   git diff manifests/monitoring.k8s.yaml
   ```

2. Check ArgoCD sync status:

   ```bash
   kubectl get application monitoring -n argocd
   ```

3. Force sync:

   ```bash
   argocd app sync monitoring
   ```

### Invalid Helm Values

Symptom: HelmChart stuck in error state
Solution:

```bash
# Check HelmChart status
kubectl describe helmchart kube-prometheus-stack -n monitoring

# View Helm controller logs
kubectl logs -n kube-system -l app=helm-controller
```

### Resource Limits Too Low

Symptom: Pods OOMKilled or CPU throttled
Solution:

1. Check actual usage:

   ```bash
   kubectl top pods -n monitoring
   ```

2. Update `config.yaml` resources.
3. Regenerate and redeploy.

## See Also

- Configuration Reference - `config.yaml` schema
- Resource Optimization - Sizing methodology
- Constructs API - TypeScript construct details