Infrastructure Layering

This document explains the layered architecture approach used in the kup6s.com cluster, separating infrastructure bootstrapping from application deployments.

Why Layered Architecture?

Problem: Bootstrapping a Kubernetes cluster is a chicken-and-egg situation:

  • Applications need storage (Longhorn, S3)

  • Applications need networking (Traefik, cert-manager)

  • Applications need GitOps (ArgoCD)

  • But these components ARE applications themselves!

Solution: Two-tier architecture with clear separation of concerns:

  1. Infrastructure Tier: Bootstrap essential platform components (OpenTofu-managed)

  2. Application Tier: Deploy applications assuming platform exists (ArgoCD-managed)

Architecture Layers

┌─────────────────────────────────────────────────────┐
│ Developer Workstation                                │
├─────────────────────────────────────────────────────┤
│                                                      │
│  source .env && tofu apply                          │
│          │                                           │
│          ▼                                           │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│ INFRASTRUCTURE TIER (Bootstrap)                      │
│ Managed via: OpenTofu (kube-hetzner/)               │
├─────────────────────────────────────────────────────┤
│                                                      │
│  • Storage: Longhorn, SMB CSI Driver                │
│  • Networking: Traefik, cert-manager                │
│  • Provisioning: Crossplane                         │
│  • Secrets: External Secrets Operator               │
│  • GitOps: ArgoCD itself                            │
│                                                      │
│  Rationale: Must exist before apps can deploy       │
└─────────────────────────────────────────────────────┘
           ▼ ArgoCD syncs from git repositories
┌─────────────────────────────────────────────────────┐
│ APPLICATION TIER (Platform & Apps)                  │
│ Managed via: ArgoCD (from dp-infra/ repos)          │
├─────────────────────────────────────────────────────┤
│                                                      │
│  Platform Services:                                 │
│  • Database Operators (CloudNativePG)               │
│  • Monitoring Stack (Prometheus, Thanos, Loki)      │
│                                                      │
│  Infrastructure Applications:                       │
│  • GitLab BDA, Mailu, etc.                          │
│                                                      │
│  Application Services:                              │
│  • PostgreSQL databases                             │
│  • Redis, RabbitMQ, etc.                            │
│  • S3 buckets (Crossplane-managed)                  │
│                                                      │
│  Customer Deployments:                              │
│  • Websites, APIs, microservices                    │
│                                                      │
│  Rationale: Assumes platform components exist       │
└─────────────────────────────────────────────────────┘

Infrastructure Tier (Bootstrap)

Purpose

Components that MUST exist before applications can be deployed. These form the platform foundation.

Management

  • Tool: OpenTofu v2.17.4

  • Configuration: kube-hetzner/kube.tf and extra-manifests/

  • Deployment: During initial cluster provisioning

  • Updates: Infrequent, planned, via tofu apply

Components

Storage Operators:

  • Longhorn: Cloud-native distributed block storage

    • Provides PersistentVolumes for applications

    • Enables replication, snapshots, backups

    • Required by: Almost all stateful applications

  • SMB CSI Driver: Hetzner Storage Box integration

    • Provides shared file storage

    • Required by: Applications needing shared volumes

  • Crossplane: Dynamic S3 bucket provisioning

    • Provisions S3 buckets on demand

    • Required by: Applications needing object storage
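
For example, an application in the upper tier consumes the Longhorn storage listed above simply by requesting its StorageClass in a PersistentVolumeClaim. A minimal sketch (names are illustrative; it assumes the StorageClass is published as longhorn):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data           # hypothetical claim name
  namespace: example-app       # hypothetical namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn   # assumed name of the Longhorn StorageClass
  resources:
    requests:
      storage: 10Gi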

Networking:

  • Traefik: Ingress controller

    • Routes HTTP/HTTPS traffic to services

    • Required by: All public-facing applications

  • cert-manager: TLS certificate automation

    • Provisions Let’s Encrypt certificates

    • Required by: HTTPS endpoints

  • Cilium: CNI (Container Network Interface)

    • Pod-to-pod networking

    • Required by: All cluster communication
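
Together, Traefik and cert-manager let an application expose an HTTPS endpoint with nothing more than an Ingress resource. A hedged sketch (hostname, issuer name, and ingress class are assumptions, not values defined in this document):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-web
  namespace: example-app
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod   # assumed ClusterIssuer name
spec:
  ingressClassName: traefik                            # assumed Traefik ingress class
  rules:
    - host: example.kup6s.com                          # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-web
                port:
                  number: 80
  tls:
    - hosts:
        - example.kup6s.com
      secretName: example-web-tls                      # cert-manager stores the issued certificate here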

Secrets Management:

  • External Secrets Operator (ESO)

    • Syncs secrets from external sources

    • Required by: Applications using centralized secret management
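
In practice, an application declares an ExternalSecret and ESO materializes a regular Kubernetes Secret from the external backend. A minimal sketch (the store name and remote key are assumptions; the actual SecretStore is configured in the infrastructure tier):

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: example-db-credentials
  namespace: example-app
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: cluster-secret-store      # assumed ClusterSecretStore name
    kind: ClusterSecretStore
  target:
    name: example-db-credentials    # Kubernetes Secret created and kept in sync by ESO
  data:
    - secretKey: password
      remoteRef:
        key: example/db-password    # hypothetical key in the external backend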

GitOps Engine:

  • ArgoCD

    • Syncs applications from git to cluster

    • Required by: All application deployments in the next tier

Why These Components?

These components are dependencies of everything else:

  • ❌ Can’t deploy database without storage (Longhorn)

  • ❌ Can’t deploy webapp without ingress (Traefik)

  • ❌ Can’t deploy monitoring without ArgoCD (sync from git)

  • ❌ Can’t deploy apps needing secrets without ESO

Therefore: They must be bootstrapped first, before ArgoCD can manage anything.

Update Process

cd kube-hetzner
source .env  # Load credentials

# Plan changes (review before applying)
tofu plan

# Apply infrastructure changes
# IMPORTANT: For kube.tf changes, use the script:
bash scripts/apply-and-configure-longhorn.sh

# For other changes (extra-manifests, variables):
tofu apply

CRITICAL: Always use apply-and-configure-longhorn.sh for kube.tf changes to ensure proper Longhorn node configuration.

See Apply Infrastructure Changes How-To.

Application Tier (Platform & Apps)

Purpose

Applications and services that assume the platform exists. These can be deployed via GitOps because ArgoCD is already running.

Management

  • Tool: ArgoCD

  • Source Repositories: dp-infra/*, external repos

  • Deployment: Automated via ArgoCD sync from git

  • Updates: Frequent, automated, via git push

Components

Platform Services (infrastructure-level applications):

  1. Database Operators (from dp-infra/cnpg/):

    • CloudNativePG operator

    • Barman-cloud plugin

    • Enables PostgreSQL cluster provisioning

  2. Monitoring Stack (from dp-infra/monitoring/):

    • Prometheus (metrics collection)

    • Thanos (long-term metrics storage)

    • Loki (log aggregation)

    • Grafana (visualization)

    • Alloy (log collector)

    • Alertmanager (alert routing)
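
Once the CloudNativePG operator listed above is synced, application repositories can request PostgreSQL clusters declaratively; the operator handles provisioning, replication, and failover. A minimal sketch (name, size, and instance count are illustrative):

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-db            # hypothetical database cluster
  namespace: example-app
spec:
  instances: 3
  storage:
    size: 20Gi
    storageClass: longhorn    # consumes the Longhorn StorageClass from the infrastructure tier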

Infrastructure Applications (from dp-infra/*/):

  • GitLab BDA (from dp-infra/gitlabbda/)

  • Mailu (from dp-infra/mailu/, planned)

  • Other infrastructure services

Application Services (from external repos):

  • PostgreSQL databases (using CloudNativePG operator)

  • Redis, RabbitMQ, Kafka

  • Custom microservices

  • S3 buckets (using Crossplane from infrastructure tier)
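
S3 buckets are requested the same way: the application creates a Crossplane claim and the Composition installed in the infrastructure tier provisions the bucket. The claim API below is purely hypothetical; the real group, kind, and parameters depend on the XRDs/Compositions installed with Crossplane:

apiVersion: s3.kup6s.com/v1alpha1   # hypothetical claim group/version defined by an XRD
kind: Bucket
metadata:
  name: example-app-assets          # hypothetical bucket claim
  namespace: example-app
spec:
  parameters:
    region: fsn1                    # hypothetical parameter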

Customer Deployments:

  • Websites and web applications

  • APIs and backend services

  • Static site generators

Why Application Tier?

These components depend on infrastructure:

  • ✅ Database needs storage (Longhorn from infrastructure tier)

  • ✅ Monitoring needs ingress (Traefik from infrastructure tier)

  • ✅ Apps need ArgoCD (from infrastructure tier to deploy them)

Therefore: They’re deployed via ArgoCD AFTER infrastructure exists.

Update Process

For dp-infra/ repositories (CDK8S-based):

cd dp-infra/monitoring  # Or any dp-infra subdirectory

# Edit TypeScript constructs or config.yaml
vim config.yaml

# Build manifests
npm run build

# Commit and push (triggers ArgoCD sync)
git add manifests/ config.yaml
git commit -m "Update monitoring configuration"
git push

# ArgoCD automatically syncs (or manually trigger):
argocd app sync monitoring

For external repositories:

  • Push changes to git repository

  • ArgoCD automatically detects and syncs changes (if auto-sync is enabled)

  • Or manually sync via ArgoCD UI/CLI

Source Repositories

Infrastructure Tier Sources

kube-hetzner/ (OpenTofu):

  • Repository: git@git.bluedynamics.eu:kup6s/kube-hetzner.git

  • Contains: kube.tf, extra-manifests/, variable definitions

  • Deployment: source .env && tofu apply

Application Tier Sources

argoapps/ (ArgoCD Application definitions):

  • Repository: git@git.bluedynamics.eu:kup6s/argoapps.git

  • Contains: CDK8S-based ArgoCD Application definitions

  • Purpose: Defines WHAT to deploy (points to dp-infra/ or external repos)

  • Deployment: npm run build && kubectl apply -f dist/
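
The generated dist/ output consists of plain ArgoCD Application manifests. A sketch of what one might look like (the Application name, branch, path, and sync policy are assumptions, not values taken from the argoapps repository):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: monitoring                          # hypothetical Application name
  namespace: argocd
spec:
  project: default                          # assumed ArgoCD project
  source:
    repoURL: git@git.bluedynamics.eu:kup6s/dp/dp-infra.git
    targetRevision: main                    # assumed branch
    path: monitoring/manifests              # assumed path to the committed CDK8S output
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    automated:
      prune: true
      selfHeal: true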

dp-infra/ (Infrastructure application manifests):

  • Repository: git@git.bluedynamics.eu:kup6s/dp/dp-infra.git

  • Contains: CDK8S-based deployments (monitoring, cnpg, gitlabbda, mailu, etc.)

  • Purpose: Defines HOW to deploy infrastructure applications

  • Deployment: ArgoCD syncs from git (manifests committed to repo)

External repositories:

  • Custom application repositories

  • Referenced by ArgoCD Applications in argoapps/

  • Deployment: ArgoCD syncs from git

Update Strategy Summary

| Layer | Tool | Frequency | Trigger | Risk |
|-------|------|-----------|---------|------|
| Infrastructure Tier | OpenTofu | Infrequent (weeks/months) | Manual tofu apply | High - affects all apps |
| Database Operators | ArgoCD (dp-infra/cnpg) | Occasional (months) | Git push + ArgoCD sync | Medium - affects databases |
| Monitoring Stack | ArgoCD (dp-infra/monitoring) | Regular (weeks) | Git push + ArgoCD sync | Low - monitoring-only |
| Infrastructure Apps | ArgoCD (dp-infra/*) | Regular (days/weeks) | Git push + ArgoCD sync | Medium - specific apps |
| Application Services | ArgoCD (external repos) | Frequent (daily) | Git push + ArgoCD sync | Low - app-specific |

Risk Management:

  • High risk (infrastructure): Plan carefully, test in dev cluster, have rollback strategy

  • Medium risk (platform services): Canary deployments, monitor closely

  • Low risk (applications): Continuous deployment, quick rollback if issues

Benefits of Layering

Separation of Concerns

Infrastructure Team:

  • Manages platform components (storage, networking, GitOps)

  • Updates infrequently with high planning

  • Focused on cluster stability

Application Teams:

  • Deploy applications via ArgoCD

  • Update frequently via git push

  • Focused on feature delivery

Risk Isolation

  • Infrastructure changes don’t mix with application changes

  • Failed application deployment doesn’t affect platform

  • Platform stability enables rapid application iteration

Clear Dependencies

  • Infrastructure provides: Storage, networking, GitOps engine

  • Applications consume: Storage, networking, GitOps automation

  • No circular dependencies

Update Independence

  • Platform updates: Planned, controlled, infrequent

  • Application updates: Automated, frequent, low-ceremony

  • Teams don’t block each other

Anti-Patterns to Avoid

Deploying Storage Operators via ArgoCD:

  • Problem: ArgoCD needs storage to work (PVCs for Redis, etc.)

  • Chicken-and-egg: Can’t deploy storage provider using storage

  • Solution: Bootstrap storage via OpenTofu

Deploying ArgoCD via ArgoCD:

  • Problem: ArgoCD can’t manage itself during initial bootstrap

  • Bootstrap paradox: ArgoCD doesn’t exist yet, so it can’t deploy itself

  • Solution: Bootstrap ArgoCD via OpenTofu, THEN let it manage apps

Manual kubectl apply for Applications:

  • Problem: Bypasses GitOps, creates drift between git and cluster

  • Loses audit trail and versioning

  • Solution: Always deploy applications via ArgoCD (git → cluster)

Mixing Infrastructure and Application Changes:

  • Problem: Complex rollback if something fails

  • Unclear which change caused issues

  • Solution: Separate commits, separate deployments

Troubleshooting

Infrastructure Change Broke Applications

Symptom: After tofu apply, applications fail or become degraded

Diagnosis:

  1. Check what changed:

    cd kube-hetzner
    git diff HEAD~1 HEAD
    
  2. Check affected components:

    kubectl get pods -A | grep -v Running
    kubectl get events -A --sort-by='.lastTimestamp' | tail -20
    

Resolution:

  • Rollback OpenTofu: git revert HEAD && tofu apply

  • Or fix forward: Address root cause and re-apply

ArgoCD Can’t Sync Applications

Symptom: ArgoCD Applications stuck in “OutOfSync” or “Degraded”

Diagnosis:

  1. Check ArgoCD Application status:

    kubectl get applications -n argocd
    kubectl describe application <app-name> -n argocd
    
  2. Check sync errors:

    kubectl get application <app-name> -n argocd -o jsonpath='{.status.conditions}'
    

Common Causes:

  • Infrastructure tier missing (e.g., ArgoCD not running)

  • Storage class unavailable (Longhorn not ready)

  • Network policy blocking (Cilium misconfigured)

Resolution: Fix infrastructure tier first, THEN re-sync applications.

New Application Won’t Deploy

Symptom: New ArgoCD Application created, but nothing happens

Diagnosis:

  1. Check that the Application was created:

    kubectl get application <app-name> -n argocd
    
  2. Check Application definition:

    kubectl get application <app-name> -n argocd -o yaml
    

Common Causes:

  • Application definition not applied (kubectl apply -f dist/app.yaml)

  • Repository credentials missing (private repo)

  • Invalid path in Application spec

Resolution: Verify Application manifest, apply with kubectl, check ArgoCD logs.

Further Reading