Infrastructure Layering

This document explains the layered architecture approach used in the kup6s.com cluster, separating infrastructure bootstrapping from application deployments.

Why Layered Architecture?

Problem: Bootstrapping a Kubernetes cluster is a chicken-and-egg situation:

  • Applications need storage (Longhorn, S3)

  • Applications need networking (Traefik, cert-manager)

  • Applications need GitOps (ArgoCD)

  • But these components ARE applications themselves!

Solution: Two-tier architecture with clear separation of concerns:

  1. Infrastructure Tier: Bootstrap essential platform components (OpenTofu-managed)

  2. Application Tier: Deploy applications assuming platform exists (ArgoCD-managed)

Architecture Layers

┌─────────────────────────────────────────────────────┐
│ Developer Workstation                                │
├─────────────────────────────────────────────────────┤
│                                                      │
│  source .env && tofu apply                          │
│          │                                           │
│          ▼                                           │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│ INFRASTRUCTURE TIER (Bootstrap)                      │
│ Managed via: OpenTofu (kube-hetzner/)               │
├─────────────────────────────────────────────────────┤
│                                                      │
│  • Storage: Longhorn, SMB CSI Driver                │
│  • Networking: Traefik, cert-manager                │
│  • Provisioning: Crossplane                         │
│  • Secrets: External Secrets Operator               │
│  • GitOps: ArgoCD itself                            │
│                                                      │
│  Rationale: Must exist before apps can deploy       │
└─────────────────────────────────────────────────────┘
           ▼ ArgoCD syncs from git repositories
┌─────────────────────────────────────────────────────┐
│ APPLICATION TIER (Platform & Apps)                  │
│ Managed via: ArgoCD (from dp-infra/ repos)          │
├─────────────────────────────────────────────────────┤
│                                                      │
│  Platform Services:                                 │
│  • Database Operators (CloudNativePG)               │
│  • Monitoring Stack (Prometheus, Thanos, Loki)      │
│                                                      │
│  Infrastructure Applications:                       │
│  • GitLab BDA, Mailu, etc.                          │
│                                                      │
│  Application Services:                              │
│  • PostgreSQL databases                             │
│  • Redis, RabbitMQ, etc.                            │
│  • S3 buckets (Crossplane-managed)                  │
│                                                      │
│  Customer Deployments:                              │
│  • Websites, APIs, microservices                    │
│                                                      │
│  Rationale: Assumes platform components exist       │
└─────────────────────────────────────────────────────┘

Infrastructure Tier (Bootstrap)

Purpose

Components that MUST exist before applications can be deployed. These form the platform foundation.

Management

  • Tool: OpenTofu v2.17.4

  • Configuration: kube-hetzner/kube.tf and extra-manifests/

  • Deployment: During initial cluster provisioning

  • Updates: Infrequent, planned, via tofu apply

Components

Storage Operators:

  • Longhorn: Cloud-native distributed block storage

    • Provides PersistentVolumes for applications

    • Enables replication, snapshots, backups

    • Required by: Almost all stateful applications

  • SMB CSI Driver: Hetzner Storage Box integration

    • Provides shared file storage

    • Required by: Applications needing shared volumes

  • Crossplane: Dynamic S3 bucket provisioning

    • Provisions S3 buckets on demand

    • Required by: Applications needing object storage
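
For example, an application in the upper tier consumes the Longhorn storage listed above simply by requesting its StorageClass in a PersistentVolumeClaim. A minimal sketch (names are illustrative; it assumes the StorageClass is published as longhorn):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data           # hypothetical claim name
  namespace: example-app       # hypothetical namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn   # assumed name of the Longhorn StorageClass
  resources:
    requests:
      storage: 10Gi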

Networking:

  • Traefik: Ingress controller

    • Routes HTTP/HTTPS traffic to services

    • Required by: All public-facing applications

  • cert-manager: TLS certificate automation

    • Provisions Let’s Encrypt certificates

    • Required by: HTTPS endpoints

  • Cilium: CNI (Container Network Interface)

    • Pod-to-pod networking

    • Required by: All cluster communication
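
Together, Traefik and cert-manager let an application expose an HTTPS endpoint with nothing more than an Ingress resource. A hedged sketch (hostname, issuer name, and ingress class are assumptions, not values defined in this document):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-web
  namespace: example-app
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod   # assumed ClusterIssuer name
spec:
  ingressClassName: traefik                            # assumed Traefik ingress class
  rules:
    - host: example.kup6s.com                          # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-web
                port:
                  number: 80
  tls:
    - hosts:
        - example.kup6s.com
      secretName: example-web-tls                      # cert-manager stores the issued certificate here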

Secrets Management:

  • External Secrets Operator (ESO)

    • Syncs secrets from external sources

    • Required by: Applications using centralized secret management
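
In practice, an application declares an ExternalSecret and ESO materializes a regular Kubernetes Secret from the external backend. A minimal sketch (the store name and remote key are assumptions; the actual SecretStore is configured in the infrastructure tier):

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: example-db-credentials
  namespace: example-app
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: cluster-secret-store      # assumed ClusterSecretStore name
    kind: ClusterSecretStore
  target:
    name: example-db-credentials    # Kubernetes Secret created and kept in sync by ESO
  data:
    - secretKey: password
      remoteRef:
        key: example/db-password    # hypothetical key in the external backend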

GitOps Engine:

  • ArgoCD

    • Syncs applications from git to cluster

    • Required by: All application deployments in the next tier

Why These Components?

These components are dependencies of everything else:

  • ❌ Can’t deploy database without storage (Longhorn)

  • ❌ Can’t deploy webapp without ingress (Traefik)

  • ❌ Can’t deploy monitoring without ArgoCD (sync from git)

  • ❌ Can’t deploy apps needing secrets without ESO

Therefore: They must be bootstrapped first, before ArgoCD can manage anything.

Update Process

cd kube-hetzner
source .env  # Load credentials

# Plan changes (review before applying)
tofu plan

# Apply infrastructure changes
# IMPORTANT: For kube.tf changes, use the script:
bash scripts/apply-and-configure-longhorn.sh

# For other changes (extra-manifests, variables):
tofu apply

CRITICAL: Always use apply-and-configure-longhorn.sh for kube.tf changes to ensure proper Longhorn node configuration.

See Apply Infrastructure Changes How-To.

Application Tier (Platform & Apps)

Purpose

Applications and services that assume the platform exists. These can be deployed via GitOps because ArgoCD is already running.

Management

  • Tool: ArgoCD

  • Source Repositories: dp-infra/*, external repos

  • Deployment: Automated via ArgoCD sync from git

  • Updates: Frequent, automated, via git push

Components

Platform Services (infrastructure-level applications):

  1. Database Operators (from dp-infra/cnpg/):

    • CloudNativePG operator

    • Barman-cloud plugin

    • Enables PostgreSQL cluster provisioning

  2. Monitoring Stack (from dp-infra/monitoring/):

    • Prometheus (metrics collection)

    • Thanos (long-term metrics storage)

    • Loki (log aggregation)

    • Grafana (visualization)

    • Alloy (log collector)

    • Alertmanager (alert routing)
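
Once the CloudNativePG operator listed above is synced, application repositories can request PostgreSQL clusters declaratively; the operator handles provisioning, replication, and failover. A minimal sketch (name, size, and instance count are illustrative):

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-db            # hypothetical database cluster
  namespace: example-app
spec:
  instances: 3
  storage:
    size: 20Gi
    storageClass: longhorn    # consumes the Longhorn StorageClass from the infrastructure tier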

Infrastructure Applications (from dp-infra/*/):

  • GitLab BDA (from dp-infra/gitlabbda/)

  • Mailu (from dp-infra/mailu/, planned)

  • Other infrastructure services

Application Services (from external repos):

  • PostgreSQL databases (using CloudNativePG operator)

  • Redis, RabbitMQ, Kafka

  • Custom microservices

  • S3 buckets (using Crossplane from infrastructure tier)
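
S3 buckets are requested the same way: the application creates a Crossplane claim and the Composition installed in the infrastructure tier provisions the bucket. The claim API below is purely hypothetical; the real group, kind, and parameters depend on the XRDs/Compositions installed with Crossplane:

apiVersion: s3.kup6s.com/v1alpha1   # hypothetical claim group/version defined by an XRD
kind: Bucket
metadata:
  name: example-app-assets          # hypothetical bucket claim
  namespace: example-app
spec:
  parameters:
    region: fsn1                    # hypothetical parameter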

Customer Deployments:

  • Websites and web applications

  • APIs and backend services

  • Static site generators

Why Application Tier?

These components depend on infrastructure:

  • ✅ Database needs storage (Longhorn from infrastructure tier)

  • ✅ Monitoring needs ingress (Traefik from infrastructure tier)

  • ✅ Apps need ArgoCD (from infrastructure tier to deploy them)

Therefore: They’re deployed via ArgoCD AFTER infrastructure exists.

Update Process

For dp-infra/ repositories (CDK8S-based):

cd dp-infra/monitoring  # Or any dp-infra subdirectory

# Edit TypeScript constructs or config.yaml
vim config.yaml

# Build manifests
npm run build

# Commit and push (triggers ArgoCD sync)
git add manifests/ config.yaml
git commit -m "Update monitoring configuration"
git push

# ArgoCD automatically syncs (or manually trigger):
argocd app sync monitoring

For external repositories:

  • Push changes to git repository

  • ArgoCD automatically detects and syncs changes (if auto-sync is enabled)

  • Or manually sync via ArgoCD UI/CLI

Source Repositories

Infrastructure Tier Sources

kube-hetzner/ (OpenTofu):

  • Repository: git@git.bluedynamics.eu:kup6s/kube-hetzner.git

  • Contains: kube.tf, extra-manifests/, variable definitions

  • Deployment: source .env && tofu apply

Application Tier Sources

argoapps/ (ArgoCD Application definitions):

  • Repository: git@git.bluedynamics.eu:kup6s/argoapps.git

  • Contains: CDK8S-based ArgoCD Application definitions

  • Purpose: Defines WHAT to deploy (points to dp-infra/ or external repos)

  • Deployment: npm run build && kubectl apply -f dist/
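
The generated dist/ output consists of plain ArgoCD Application manifests. A sketch of what one might look like (the Application name, branch, path, and sync policy are assumptions, not values taken from the argoapps repository):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: monitoring                          # hypothetical Application name
  namespace: argocd
spec:
  project: default                          # assumed ArgoCD project
  source:
    repoURL: git@git.bluedynamics.eu:kup6s/dp/dp-infra.git
    targetRevision: main                    # assumed branch
    path: monitoring/manifests              # assumed path to the committed CDK8S output
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    automated:
      prune: true
      selfHeal: true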

dp-infra/ (Infrastructure application manifests):

  • Repository: git@git.bluedynamics.eu:kup6s/dp/dp-infra.git

  • Contains: CDK8S-based deployments (monitoring, cnpg, gitlabbda, mailu, etc.)

  • Purpose: Defines HOW to deploy infrastructure applications

  • Deployment: ArgoCD syncs from git (manifests committed to repo)

External repositories:

  • Custom application repositories

  • Referenced by ArgoCD Applications in argoapps/

  • Deployment: ArgoCD syncs from git

Update Strategy Summary

| Layer | Tool | Frequency | Trigger | Risk |
|-------|------|-----------|---------|------|
| Infrastructure Tier | OpenTofu | Infrequent (weeks/months) | Manual tofu apply | High - affects all apps |
| Database Operators | ArgoCD (dp-infra/cnpg) | Occasional (months) | Git push + ArgoCD sync | Medium - affects databases |
| Monitoring Stack | ArgoCD (dp-infra/monitoring) | Regular (weeks) | Git push + ArgoCD sync | Low - monitoring-only |
| Infrastructure Apps | ArgoCD (dp-infra/*) | Regular (days/weeks) | Git push + ArgoCD sync | Medium - specific apps |
| Application Services | ArgoCD (external repos) | Frequent (daily) | Git push + ArgoCD sync | Low - app-specific |

Risk Management:

  • High risk (infrastructure): Plan carefully, test in dev cluster, have rollback strategy

  • Medium risk (platform services): Canary deployments, monitor closely

  • Low risk (applications): Continuous deployment, quick rollback if issues

Benefits of Layering

Separation of Concerns

Infrastructure Team:

  • Manages platform components (storage, networking, GitOps)

  • Updates infrequently with high planning

  • Focused on cluster stability

Application Teams:

  • Deploy applications via ArgoCD

  • Update frequently via git push

  • Focused on feature delivery

Risk Isolation

  • Infrastructure changes don’t mix with application changes

  • Failed application deployment doesn’t affect platform

  • Platform stability enables rapid application iteration

Clear Dependencies

  • Infrastructure provides: Storage, networking, GitOps engine

  • Applications consume: Storage, networking, GitOps automation

  • No circular dependencies

Update Independence

  • Platform updates: Planned, controlled, infrequent

  • Application updates: Automated, frequent, low-ceremony

  • Teams don’t block each other

Anti-Patterns to Avoid

Deploying Storage Operators via ArgoCD:

  • Problem: ArgoCD needs storage to work (PVCs for Redis, etc.)

  • Chicken-and-egg: Can’t deploy storage provider using storage

  • Solution: Bootstrap storage via OpenTofu

Deploying ArgoCD via ArgoCD:

  • Problem: ArgoCD can’t manage itself during initial bootstrap

  • Bootstrap paradox: ArgoCD doesn’t exist yet, so it can’t deploy itself

  • Solution: Bootstrap ArgoCD via OpenTofu, THEN let it manage apps

Manual kubectl apply for Applications:

  • Problem: Bypasses GitOps, creates drift between git and cluster

  • Loses audit trail and versioning

  • Solution: Always deploy applications via ArgoCD (git → cluster)

Mixing Infrastructure and Application Changes:

  • Problem: Complex rollback if something fails

  • Unclear which change caused issues

  • Solution: Separate commits, separate deployments

Troubleshooting

Infrastructure Change Broke Applications

Symptom: After tofu apply, applications fail or become degraded

Diagnosis:

  1. Check what changed:

    cd kube-hetzner
    git diff HEAD~1 HEAD
    
  2. Check affected components:

    kubectl get pods -A | grep -v Running
    kubectl get events -A --sort-by='.lastTimestamp' | tail -20
    

Resolution:

  • Rollback OpenTofu: git revert HEAD && tofu apply

  • Or fix forward: Address root cause and re-apply

ArgoCD Can’t Sync Applications

Symptom: ArgoCD Applications stuck in “OutOfSync” or “Degraded”

Diagnosis:

  1. Check ArgoCD Application status:

    kubectl get applications -n argocd
    kubectl describe application <app-name> -n argocd
    
  2. Check sync errors:

    kubectl get application <app-name> -n argocd -o jsonpath='{.status.conditions}'
    

Common Causes:

  • Infrastructure tier missing (e.g., ArgoCD not running)

  • Storage class unavailable (Longhorn not ready)

  • Network policy blocking (Cilium misconfigured)

Resolution: Fix infrastructure tier first, THEN re-sync applications.

New Application Won’t Deploy

Symptom: New ArgoCD Application created, but nothing happens

Diagnosis:

  1. Check that the Application was created:

    kubectl get application <app-name> -n argocd
    
  2. Check Application definition:

    kubectl get application <app-name> -n argocd -o yaml
    

Common Causes:

  • Application definition not applied (kubectl apply -f dist/app.yaml)

  • Repository credentials missing (private repo)

  • Invalid path in Application spec

Resolution: Verify Application manifest, apply with kubectl, check ArgoCD logs.

Further Reading