Monitoring Stack Architecture Overview¶
Introduction¶
The kup6s monitoring stack provides comprehensive observability for the Kubernetes cluster, combining metrics, logs, and alerting into a unified system. This document explains the overall architecture, component relationships, and data flow patterns.
System Architecture¶
┌─────────────────────────────────────────────────────────────────┐
│                           Grafana UI                            │
│                  (Visualization & Dashboards)                   │
└─────────────┬───────────────────────────────┬───────────────────┘
              │                               │
              │ Metrics Query                 │ Logs Query
              ▼                               ▼
      ┌───────────────┐               ┌───────────────┐
      │ Thanos Query  │               │ Loki Gateway  │
      │ (Federation)  │               │ (HTTP Proxy)  │
      └───────┬───────┘               └───────┬───────┘
              │                               │
        ┌─────┴─────────┐                  ┌──┴───────────┐
        │               │                  │              │
        ▼               ▼                  ▼              ▼
  ┌────────────┐   ┌──────────┐      ┌──────────┐   ┌──────────┐
  │ Prometheus │   │  Thanos  │      │   Loki   │   │   Loki   │
  │  Sidecars  │   │  Store   │      │  Write   │   │   Read   │
  │   (gRPC)   │   │   (S3)   │      │          │   │          │
  └─────┬──────┘   └──────────┘      └─────┬────┘   └─────┬────┘
        │                                  │              │
        ▼                                  ▼              ▼
   ┌──────────┐                     ┌───────────────────────────┐
   │Prometheus│                     │       Loki Backend        │
   │  (TSDB)  │                     │     (Index + Chunks)      │
   └────┬─────┘                     └─────────────┬─────────────┘
        │                                         │
        │ Scrape                                  │ Push
        ▼                                         ▼
┌──────────────────┐                     ┌──────────────────┐
│     Service      │                     │      Alloy       │
│     Monitors     │                     │   (DaemonSet)    │
│    (Targets)     │                     │  Log Collector   │
└──────────────────┘                     └──────────────────┘
Core Components¶
Metrics Pipeline¶
Prometheus (2 replicas, StatefulSet)
Role: Primary metrics collection and short-term storage
Collection Method: Pull-based (scraping targets every 30s)
Storage: 3Gi Longhorn PVC per replica (3-day retention)
High Availability: 2 replicas with identical configuration
Integration: Thanos sidecar uploads 2-hour blocks to S3
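Scrape targets are declared as ServiceMonitor objects that the Prometheus Operator turns into scrape configuration. A minimal CDK8S-style sketch (the target name and selector labels are illustrative, not taken from the real configuration):

```typescript
import { App, Chart, ApiObject } from 'cdk8s';

const app = new App();
const chart = new Chart(app, 'servicemonitor-example');

// Hypothetical ServiceMonitor: asks the Prometheus Operator to scrape
// every Service labelled app.kubernetes.io/name=node-exporter on its
// "metrics" port, at the stack-wide 30s interval described above.
new ApiObject(chart, 'node-exporter-servicemonitor', {
  apiVersion: 'monitoring.coreos.com/v1',
  kind: 'ServiceMonitor',
  metadata: { name: 'node-exporter', namespace: 'monitoring' },
  spec: {
    selector: { matchLabels: { 'app.kubernetes.io/name': 'node-exporter' } },
    endpoints: [{ port: 'metrics', interval: '30s' }],
  },
});

app.synth();
```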
Thanos Architecture
Sidecar: Runs alongside Prometheus, uploads blocks to S3, provides gRPC StoreAPI
Query (2 replicas): Unified query interface, federates queries across sidecars and stores
Store (2 replicas): Queries historical data from S3, caches indexes (10Gi PVC each)
Compactor (1 replica): Downsamples data, applies retention policies (20Gi PVC)
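The sidecar, Store, and Compactor all reach the bucket through a shared Thanos objstore configuration, typically mounted from a Secret. A hedged sketch of such a Secret (the endpoint and credential values are placeholders; only the bucket name comes from this document):

```typescript
import { App, Chart, ApiObject } from 'cdk8s';

const app = new App();
const chart = new Chart(app, 'thanos-objstore-example');

// Thanos objstore.yml shared by sidecar, Store, and Compactor.
// Real credentials are injected by External Secrets in the actual setup;
// the endpoint below is a placeholder.
const objstoreConfig = [
  'type: S3',
  'config:',
  '  bucket: metrics-thanos-kup6s',
  '  endpoint: objectstorage.example.com',
  '  access_key: PLACEHOLDER',
  '  secret_key: PLACEHOLDER',
].join('\n');

new ApiObject(chart, 'thanos-objstore-secret', {
  apiVersion: 'v1',
  kind: 'Secret',
  metadata: { name: 'thanos-objstore', namespace: 'monitoring' },
  stringData: { 'objstore.yml': objstoreConfig },
});

app.synth();
```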
Why Thanos?
Long-term retention in cheap S3 object storage (730 days configured here)
Global query view (queries both real-time Prometheus and historical S3)
Automatic downsampling (5m and 1h resolutions for long-term data)
Cost optimization (compress and downsample old metrics)
Logs Pipeline¶
Loki (SimpleScalable mode)
Write Path (2 replicas): Receives logs from Alloy, writes to S3
Read Path (2 replicas): Serves log queries, reads from S3
Backend (2 replicas): Handles both index and chunk storage
Storage: S3 for chunks and indexes, Longhorn PVCs for WAL/cache
Retention: 744h (31 days) in S3
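In Helm terms, the mode and storage choices above reduce to a handful of chart values. A hedged sketch of what such a values object could look like (key names follow the upstream grafana/loki chart; apart from the replica counts, bucket name, and 744h retention stated above, the values are assumptions):

```typescript
// Hypothetical values for the grafana/loki Helm chart, mirroring the
// description above: SimpleScalable mode, 2+2+2 replicas, S3 backend,
// 31-day retention. The endpoint is a placeholder.
export const lokiValues = {
  deploymentMode: 'SimpleScalable',
  write: { replicas: 2 },
  read: { replicas: 2 },
  backend: { replicas: 2 },
  loki: {
    storage: {
      type: 's3',
      bucketNames: { chunks: 'logs-loki-kup6s', ruler: 'logs-loki-kup6s' },
      s3: { endpoint: 'objectstorage.example.com', region: 'fsn1' },
    },
    limits_config: { retention_period: '744h' },
  },
};
```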
Alloy (DaemonSet on all nodes)
Role: Log collection agent (Grafana Agent successor)
Collection Method: Kubernetes API (no privileged access needed)
Processing: JSON parsing, structured metadata extraction
Filtering: Per-node log collection via spec.nodeName selector
Labeling: Adds cluster, namespace, pod, container labels
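Per-node collection works by injecting each pod's node name via the downward API, so every Alloy instance only asks the Kubernetes API for pods scheduled on its own node. An illustrative fragment of the DaemonSet container (image tag and names are placeholders):

```typescript
// Fragment of the Alloy DaemonSet pod template (illustrative): the
// downward API injects the node name, and the Alloy configuration selects
// only pods whose spec.nodeName matches, so each agent collects just its
// own node's logs.
export const alloyContainer = {
  name: 'alloy',
  image: 'grafana/alloy:latest', // tag is pinned in the real manifests
  env: [
    {
      name: 'NODE_NAME',
      valueFrom: { fieldRef: { fieldPath: 'spec.nodeName' } },
    },
  ],
};
```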
Why SimpleScalable?
Balanced complexity vs scalability
Separate read/write paths (independent scaling)
S3-native (no need for object storage gateways)
Suitable for clusters up to 100 nodes
Visualization & Alerting¶
Grafana (1 replica, Deployment)
Datasources: Thanos Query (metrics), Loki Gateway (logs)
Dashboards: 25 pre-configured dashboards (K8s resources, networking, storage)
Storage: 5Gi Longhorn PVC for dashboards and settings
Authentication: Admin credentials stored in Kubernetes Secret
Alertmanager (2 replicas, StatefulSet)
Role: Alert routing and notification
Clustering: 2-peer gossip cluster for high availability
Notifications: Email via SMTP
Storage: 10Gi Hetzner Volumes PVC per replica
Storage Architecture¶
Longhorn PVCs (Primary persistent storage)
Prometheus: 3Gi × 2 replicas (2-replica Longhorn volumes)
Thanos Store: 10Gi × 2 replicas (index caching)
Thanos Compactor: 20Gi (compaction workspace)
Grafana: 5Gi (dashboards and config)
Loki components: Multiple PVCs for WAL and cache
S3 Buckets (Long-term object storage)
metrics-thanos-kup6s: Prometheus metrics (730-day retention)
logs-loki-kup6s: Loki log chunks and indexes (90-day retention)
Region: fsn1 (Falkenstein, same as cluster)
Lifecycle: Automated expiration via S3 lifecycle policies
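A hedged sketch of how such a bucket might be declared through Crossplane (the apiVersion, kind, and forProvider fields depend on the provider actually in use; only the bucket name, region, and sync wave come from this document):

```typescript
import { App, Chart, ApiObject } from 'cdk8s';

const app = new App();
const chart = new Chart(app, 'bucket-example');

// Hypothetical Crossplane-managed bucket for Thanos metrics; the matching
// lifecycle policy (730-day expiration) would be declared alongside it.
new ApiObject(chart, 'metrics-bucket', {
  apiVersion: 's3.aws.upbound.io/v1beta1', // assumed provider API group
  kind: 'Bucket',
  metadata: {
    name: 'metrics-thanos-kup6s',
    annotations: { 'argocd.argoproj.io/sync-wave': '1' }, // buckets sync in wave 1
  },
  spec: {
    forProvider: { region: 'fsn1' },
    providerConfigRef: { name: 'default' }, // hypothetical ProviderConfig name
  },
});

app.synth();
```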
Data Flow Patterns¶
Metrics Flow¶
Collection: Prometheus scrapes targets (service monitors) every 30s
Storage: Metrics stored in local TSDB (3-day retention)
Upload: Thanos sidecar uploads 2-hour blocks to S3 every 2 hours
Compaction: Thanos Compactor downsamples blocks (5m, 1h resolutions)
Query: Thanos Query federates real-time (sidecars) + historical (store) data
Visualization: Grafana queries Thanos Query endpoint
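The compaction step above maps to a small set of Thanos Compactor flags; a sketch with illustrative retention values (only the 5m/1h resolutions are stated in this document):

```typescript
// Illustrative container args for the Thanos Compactor. The retention
// values per resolution are assumptions, not taken from the real config.
export const compactorArgs = [
  'compact',
  '--wait',                                          // run as a long-lived pod
  '--data-dir=/data',                                // 20Gi PVC compaction workspace
  '--objstore.config-file=/etc/thanos/objstore.yml', // shared S3 bucket config
  '--retention.resolution-raw=90d',                  // raw samples
  '--retention.resolution-5m=365d',                  // 5-minute downsampled blocks
  '--retention.resolution-1h=730d',                  // 1-hour downsampled blocks
];
```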
Logs Flow¶
Collection: Alloy reads pod logs via Kubernetes API
Processing: JSON parsing, metadata extraction, labeling
Ingestion: Logs pushed to Loki Write (via Gateway)
Storage: Write path stores chunks and indexes in S3
Query: Read path serves log queries from S3
Visualization: Grafana queries Loki Gateway endpoint
Alert Flow¶
Evaluation: Prometheus evaluates alert rules every 30s
Firing: Alerts sent to Alertmanager when conditions met
Routing: Alertmanager routes by severity/namespace
Notification: Emails sent via SMTP
Silencing: Manual silences configured in Alertmanager UI
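The routing step above corresponds to an Alertmanager routing tree roughly like the following sketch (receiver names and addresses are placeholders):

```typescript
// Hedged sketch of an Alertmanager config: group alerts by name and
// namespace, and route critical ones to a dedicated email receiver.
// SMTP globals (smarthost, from) are omitted here.
export const alertmanagerConfig = {
  route: {
    receiver: 'email-default',
    group_by: ['alertname', 'namespace'],
    routes: [
      { matchers: ['severity="critical"'], receiver: 'email-critical' },
    ],
  },
  receivers: [
    { name: 'email-default', email_configs: [{ to: 'ops@example.com' }] },
    { name: 'email-critical', email_configs: [{ to: 'oncall@example.com' }] },
  ],
};
```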
Resource Allocation¶
CPU Allocation (Total: ~2.4 cores requests)¶
Prometheus: 100m × 2 = 200m
Thanos Query: 200m × 2 = 400m
Thanos Store: 200m × 2 = 400m
Thanos Compactor: 500m = 500m
Loki Write: 100m × 2 = 200m
Loki Read: 100m × 2 = 200m
Loki Backend: 100m × 2 = 200m
Grafana: 50m = 50m
Alloy: 50m × 4 nodes = 200m
Alertmanager: 25m × 2 = 50m
Memory Allocation (Total: ~11 GB requests)¶
Prometheus: 1500Mi × 2 = 3000Mi (~3GB)
Thanos Query: 512Mi × 2 = 1024Mi (~1GB)
Thanos Store: 1Gi × 2 = 2048Mi (~2GB)
Thanos Compactor: 2Gi = 2048Mi (~2GB)
Loki Write: 256Mi × 2 = 512Mi
Loki Read: 256Mi × 2 = 512Mi
Loki Backend: 256Mi × 2 = 512Mi
Grafana: 512Mi = 512Mi
Alloy: 128Mi × 4 nodes = 512Mi
Alertmanager: 100Mi × 2 = 200Mi
Storage Allocation (Total: ~100 GB PVCs + S3)¶
Longhorn PVCs: ~60Gi total
Prometheus: 6Gi (3Gi × 2)
Thanos: 30Gi (10Gi × 2 store + 20Gi compactor)
Grafana: 5Gi
Loki: ~20Gi (multiple components)
S3 Storage: Unlimited (pay-per-GB)
Metrics: ~50GB (compressed, downsampled)
Logs: ~20GB (31-day retention)
High Availability Design¶
Component HA Status¶
Fully HA (2+ replicas):
✅ Prometheus (2 replicas, independent scraping)
✅ Thanos Query (2 replicas, stateless)
✅ Thanos Store (2 replicas, shared S3 state)
✅ Loki Write (2 replicas, shared S3 state)
✅ Loki Read (2 replicas, shared S3 state)
✅ Loki Backend (2 replicas, shared S3 state)
✅ Alertmanager (2 replicas, gossip cluster)
Single Replica (acceptable for role):
⚠️ Thanos Compactor (1 replica, background job, restartable)
⚠️ Grafana (1 replica, UI only, stateless config in PVC)
DaemonSet (node-level HA):
✅ Alloy (4 nodes, each collects its node’s logs)
Failure Scenarios¶
Prometheus Pod Failure:
Impact: Metrics gap of up to ~1 minute on the failed replica (a couple of 30s scrape intervals)
Recovery: StatefulSet restarts pod automatically
Data: No loss (replica continues scraping)
Thanos Store Failure:
Impact: Historical queries slower (1 replica serves requests)
Recovery: StatefulSet restarts pod automatically
Data: No loss (S3 is source of truth)
Loki Write Failure:
Impact: Log ingestion continues via remaining replica
Recovery: Deployment restarts pod automatically
Data: Minimal loss (Alloy retries failed pushes)
S3 Outage:
Impact: Historical metrics/logs unavailable
Recovery: Automatic when S3 recovers
Data: No loss (local caches serve recent data)
Security Model¶
Authentication¶
Grafana: Username/password (stored in K8s Secret)
Prometheus: No authentication (cluster-internal only)
Alertmanager: No authentication (cluster-internal only)
Network Security¶
Ingress: Traefik with TLS termination (Let’s Encrypt)
Service Mesh: None (all communication within cluster)
Network Policies: Not implemented (future consideration)
Credential Management¶
S3 Credentials: External Secrets Operator (ESO) replicates from crossplane-system
SMTP Credentials: Stored in ConfigMap (consider moving to Secret)
Grafana Password: Auto-generated, stored in Secret
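A hedged sketch of the ESO replication described above, pulling the Crossplane-created S3 credentials into the monitoring namespace (store, key, and secret names are hypothetical):

```typescript
import { App, Chart, ApiObject } from 'cdk8s';

const app = new App();
const chart = new Chart(app, 'externalsecret-example');

// Hypothetical ExternalSecret: copy the S3 credentials that Crossplane
// created in crossplane-system into a local Secret that Thanos and Loki
// can mount.
new ApiObject(chart, 's3-credentials', {
  apiVersion: 'external-secrets.io/v1beta1',
  kind: 'ExternalSecret',
  metadata: { name: 's3-credentials', namespace: 'monitoring' },
  spec: {
    refreshInterval: '1h',
    secretStoreRef: { kind: 'ClusterSecretStore', name: 'crossplane-system' },
    target: { name: 's3-credentials' },
    dataFrom: [{ extract: { key: 'metrics-thanos-kup6s' } }],
  },
});

app.synth();
```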
RBAC¶
Prometheus: ClusterRole for scraping metrics
Alloy: ClusterRole for reading pod logs
Thanos/Loki: No special permissions (S3 access via credentials)
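A minimal sketch of the kind of ClusterRole Alloy needs to tail logs through the Kubernetes API; the real chart-managed role may grant more:

```typescript
import { App, Chart, ApiObject } from 'cdk8s';

const app = new App();
const chart = new Chart(app, 'alloy-rbac-example');

// Alloy discovers pods and reads the pods/log subresource, nothing more.
new ApiObject(chart, 'alloy-logs-clusterrole', {
  apiVersion: 'rbac.authorization.k8s.io/v1',
  kind: 'ClusterRole',
  metadata: { name: 'alloy-logs' },
  rules: [
    { apiGroups: [''], resources: ['pods'], verbs: ['get', 'list', 'watch'] },
    { apiGroups: [''], resources: ['pods/log'], verbs: ['get'] },
  ],
});

app.synth();
```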
Deployment Architecture (CDK8S)¶
Repository Structure¶
dp-infra/monitoring/
├── charts/
│   ├── constructs/           # 11 TypeScript constructs
│   ├── types.ts              # Shared TypeScript interfaces
│   └── monitoring-chart.ts   # Main chart assembling constructs
├── manifests/
│   └── monitoring.k8s.yaml   # Generated manifests (committed)
├── config.yaml               # Configuration values
└── main.ts                   # Entry point, config loading
Construct Pattern¶
Each component is a TypeScript class (construct) that:
Accepts typed configuration (MonitoringConfig)
Generates Kubernetes resources (ApiObject)
Uses consistent labeling and sync waves
Documents prerequisites and behavior
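A condensed sketch of the pattern (the interface fields, resource, and values are illustrative; the real MonitoringConfig lives in charts/types.ts):

```typescript
import { ApiObject } from 'cdk8s';
import { Construct } from 'constructs';

// Hypothetical slice of the shared configuration interface.
interface MonitoringConfig {
  namespace: string;
  grafana: { storageSize: string };
}

// Illustrative construct: typed config in, labelled ApiObjects out.
export class GrafanaConstruct extends Construct {
  constructor(scope: Construct, id: string, config: MonitoringConfig) {
    super(scope, id);

    new ApiObject(this, 'pvc', {
      apiVersion: 'v1',
      kind: 'PersistentVolumeClaim',
      metadata: {
        name: 'grafana-storage',
        namespace: config.namespace,
        labels: { 'app.kubernetes.io/part-of': 'monitoring' },
      },
      spec: {
        accessModes: ['ReadWriteOnce'],
        storageClassName: 'longhorn',
        resources: { requests: { storage: config.grafana.storageSize } },
      },
    });
  }
}
```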
Sync Waves (ArgoCD Ordering)¶
Wave 0: Namespace, PriorityClass, ProviderConfig
Wave 1: S3 Buckets, ExternalSecrets
Wave 2: HelmCharts (Prometheus, Loki)
Wave 3: Thanos components, Alloy
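ArgoCD reads the wave from the argocd.argoproj.io/sync-wave annotation on each generated resource. A minimal sketch for a wave-0 resource (the namespace name is assumed):

```typescript
import { App, Chart, ApiObject } from 'cdk8s';

const app = new App();
const chart = new Chart(app, 'sync-wave-example');

// Wave 0: the namespace must exist before buckets/secrets (wave 1),
// Helm charts (wave 2), and Thanos/Alloy (wave 3) are synced.
new ApiObject(chart, 'monitoring-namespace', {
  apiVersion: 'v1',
  kind: 'Namespace',
  metadata: {
    name: 'monitoring',
    annotations: { 'argocd.argoproj.io/sync-wave': '0' },
  },
});

app.synth();
```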
Integration Points¶
External Dependencies¶
Crossplane: S3 bucket provisioning
External Secrets Operator: Secret replication
Traefik: Ingress and TLS termination
Longhorn: Persistent volume provisioning
cert-manager: TLS certificate management (via Traefik)
Service Discovery¶
Prometheus: Kubernetes service discovery (role: endpoints, pod, service)
Alertmanager: Gossip protocol for peer discovery
Thanos Query: DNS SRV records for sidecar discovery
API Integrations¶
Kubernetes API: Prometheus scraping, Alloy log collection
S3 API: Thanos/Loki object storage
SMTP: Alertmanager email notifications
Monitoring the Monitoring¶
Self-Monitoring Metrics¶
up{job="kube-prometheus-stack-prometheus"}: Prometheus health
prometheus_tsdb_head_series: Cardinality tracking
thanos_sidecar_shipper_uploads_total: S3 upload success
loki_ingester_chunks_flushed_total: Loki ingestion rate
Health Checks¶
Prometheus: /-/healthy endpoint
Thanos Query: /-/healthy endpoint
Loki: /ready endpoint
Grafana: /api/health endpoint
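These endpoints are wired into Kubernetes probes. An illustrative liveness probe for the Prometheus container (the port is the Prometheus default; timings are assumptions):

```typescript
// Sketch of a liveness probe hitting the /-/healthy endpoint listed
// above; the chart-managed probes and timings may differ.
export const prometheusLivenessProbe = {
  httpGet: { path: '/-/healthy', port: 9090 },
  initialDelaySeconds: 10,
  periodSeconds: 15,
  failureThreshold: 3,
};
```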
Alerting on Monitoring Issues¶
PrometheusDown: Prometheus not scraping
ThanosCompactorFailed: Compaction errors
LokiRequestErrors: Log ingestion failures
AlertmanagerDown: Alert routing broken
Performance Characteristics¶
Query Performance¶
Recent metrics (<3 days): ~100ms (Prometheus TSDB)
Historical metrics (>3 days): ~500ms (Thanos Store S3)
Recent logs (<1 hour): ~200ms (Loki memory cache)
Historical logs (>1 hour): ~1s (S3 reads)
Ingestion Rates¶
Metrics: ~100k samples/sec (8 nodes, ~1000 targets)
Logs: ~50MB/day uncompressed (~5MB/day compressed in S3)
Alerts: ~10 evaluations/sec
Storage Growth¶
Metrics (S3): ~500MB/day (compressed)
Logs (S3): ~5MB/day (compressed)
Longhorn PVCs: Stable after initial fill
Future Enhancements¶
Planned Improvements¶
[ ] Add distributed tracing (Tempo)
[ ] Implement network policies
[ ] Add multi-cluster federation
[ ] Migrate SMTP credentials to Secret
[ ] Add SLO/SLI dashboards
[ ] Implement log sampling for high-volume namespaces
Scalability Considerations¶
Prometheus: Consider sharding for >2000 targets
Loki: Migrate to microservices mode for >100 nodes
Thanos Store: Add replicas if S3 read latency increases
Alloy: Current design scales linearly with node count