Security Model¶
Understanding the security architecture and design decisions for the KUP6S cluster on Hetzner Cloud.
Overview¶
The KUP6S cluster implements a layered security model that balances strong protection with operational simplicity. Security is achieved through:
Network-level controls: Firewall restrictions on critical services
Cryptographic authentication: SSH key-based access only
Credential isolation: Environment variables and secret management
Network encryption: Wireguard mesh networking via Cilium
Minimal attack surface: Restricted Kubernetes API access
Firewall Configuration¶
Kubernetes API Access¶
Restriction Level: Highest
Source allowlist: 95.217.145.243/32 (jumphost IP only)
Protocol: TCP port 6443 (via control plane load balancer)
Access method: SOCKS5 proxy tunnel through jumphost
Rationale: The Kubernetes API has the broadest attack surface in the cluster. Restricting access to a single trusted IP significantly reduces risk. Even if credentials are leaked, attackers cannot access the API from other locations.
Implementation: See How-To: Access kubectl for kubectl access configuration.
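As a sketch of that access pattern (the SSH user, local SOCKS port, and control plane load balancer hostname below are placeholders, not values from the cluster config; the How-To above is authoritative):

```bash
# Open a SOCKS5 tunnel through the jumphost (user and port are assumptions):
ssh -f -N -D 1080 root@95.217.145.243

# Point kubectl at the API server through the tunnel:
kubectl config set-cluster kup6s \
  --server=https://<control-plane-lb>:6443 \
  --proxy-url=socks5://localhost:1080
```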
SSH Access¶
Restriction Level: Open with strong authentication
Source allowlist: 0.0.0.0/0, ::/0 (internet-wide)
Protocol: TCP port 22
Authentication: Key-based only (password authentication disabled)
Rationale: SSH with 4096-bit RSA or Ed25519 key-based authentication is cryptographically secure. Industry best practice is to secure SSH through strong authentication rather than IP restrictions. Benefits include:
Simpler operational model
No NAT gateway complexity
Direct access for troubleshooting
Easier OpenTofu provisioning
Standard practice for secure server management
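To confirm this posture on a node, sshd's effective configuration can be dumped with sshd -T (assuming root SSH access; <node-ip> is a placeholder):

```bash
# Check the effective sshd settings on a node:
ssh root@<node-ip> 'sshd -T | grep -E "^(passwordauthentication|pubkeyauthentication)"'
# Expected:
#   passwordauthentication no
#   pubkeyauthentication yes
```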
Ingress Traffic¶
All HTTP/HTTPS traffic: Routed through Hetzner Cloud Load Balancer (LB11)
Load balancer type: lb11 in nbg1 location
Target: Traefik ingress controller on agent nodes
TLS termination: Handled by Traefik with Let’s Encrypt certificates
Benefits: DDoS protection, automatic failover, health checks
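If the hcloud CLI is authenticated against the project, the load balancer and its targets can be inspected directly (the load balancer name is environment-specific):

```bash
# List load balancers in the project, then inspect the ingress LB:
hcloud load-balancer list
hcloud load-balancer describe <lb-name>   # shows targets, health checks, services
```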
IPv4 Addressing Decision¶
Current Configuration¶
All cluster nodes (3 control plane + 6 agents) retain public IPv4 addresses.
Cost:
Monthly: €4.50 (9 nodes × €0.50/month)
Percentage of total: 4.8% of €93.73 cluster cost
Annual: €54
Investigation Summary¶
In October 2025, we investigated removing public IPv4 addresses from agent nodes to reduce costs, given that:
All ingress traffic flows through the load balancer
Hetzner charges €0.50/month per IPv4 address
Potential savings of €3-4.50/month
Alternatives Analyzed¶
Option 1: Remove IPv4 from agents only¶
Savings: €3/month (€36/year) - only agents lose public IP
Requirements:
Configure external jumphost as NAT gateway
Add disable_ipv4 = true to agent nodepool configs
Configure Hetzner private network routes
SSH proxy configuration for OpenTofu provisioning
Rejected because:
Medium-high operational complexity
Jumphost becomes critical path for agent provisioning
Manual NAT gateway configuration and maintenance
Marginal financial benefit (€36/year)
Option 2: Use kube-hetzner nat_router (all private)¶
Costs:
NAT router server: CAX21 at €6.49/month
NAT router IPv4: €0.50/month
Total: €6.99/month
Savings from removed IPs: €4.50/month
Net change: +€2.49/month INCREASE (€30/year more expensive)
Rejected because:
Costs MORE than current setup
Creates single point of failure for all egress traffic
No high availability for outbound connectivity
Defeats the purpose of cost savings
Option 3: Keep current setup (SELECTED)¶
Cost: €4.50/month
Complexity: Low
Reliability: High
Operational overhead: Minimal
Decision Rationale¶
Public IPv4 addresses are retained for the following reasons:
1. Cost is Negligible¶
€4.50/month represents just 4.8% of the total cluster cost. In the context of a €93.73/month infrastructure, optimizing away €4.50 has minimal financial impact while introducing operational risks.
Cost perspective:
€54/year works out to €4.50/month: roughly the cost of 2-3 coffee shop visits a month
Time spent managing NAT gateway > value of savings
Engineering time is more valuable than €4.50/month
2. Operational Simplicity¶
Current model (simple):
SSH directly to any node
No NAT gateway to manage
Standard network topology
Easy troubleshooting
Alternative model (complex):
SSH requires proxy configuration
NAT gateway becomes critical infrastructure
Additional monitoring and maintenance
More failure points
3. Egress Traffic Requirements¶
Agent nodes require constant internet access for:
Container operations:
Image pulls from Docker Hub, GitHub Container Registry
Base image updates
Multi-architecture manifests
System maintenance:
OS package updates via apt/dnf
K3S version upgrades
Security patches
Storage and logging:
Loki log shipping to S3 (Hetzner Object Storage - public endpoint)
Longhorn backups to Hetzner Storage Box (CIFS)
Crossplane S3 bucket management
Application traffic:
External API calls from workloads
Webhook callbacks
Third-party service integrations
Without public IPv4, ALL of this traffic must route through a NAT gateway, creating a single point of failure.
4. No Single Point of Failure¶
Current architecture:
Each node has independent internet access
Load balancer handles ingress (HA by design)
Agents can pull images and push logs independently
Node failures don’t affect other nodes’ connectivity
NAT gateway architecture:
All egress traffic through one gateway
Gateway failure blocks all agent outbound traffic
Image pulls fail cluster-wide
Logs stop shipping
Updates can’t be applied
5. Security Through Simplicity¶
Current security model:
SSH: Strong cryptographic authentication (industry standard)
Kubernetes API: IP-restricted to jumphost (critical boundary)
Clear security boundaries
Less configuration = less room for error
Complex model risks:
NAT gateway becomes attack target
More iptables rules to audit
Route persistence challenges
Increased misconfiguration surface
Conclusion¶
The decision: Keep public IPv4 addresses on all cluster nodes.
The verdict: €36-54/year in potential savings do not justify:
Increased operational complexity
Single point of failure introduction
Reduced reliability
Time investment in NAT gateway management
This is a classic example of premature optimization. The current architecture is simple, reliable, and maintainable. The cost is negligible in context.
Credential Management¶
Environment Variables¶
All sensitive credentials are managed via environment variables:
# Hetzner Cloud
TF_VAR_hcloud_token
# S3 Object Storage (shared)
TF_VAR_hetzner_s3_access_key
TF_VAR_hetzner_s3_secret_key
# Storage Box
TF_VAR_longhorn_cifs_username
TF_VAR_longhorn_cifs_password
TF_VAR_storagebox_csi_username
TF_VAR_storagebox_csi_password
# SMTP
TF_VAR_smtp_username
TF_VAR_smtp_password
Benefits:
No hardcoded secrets in version control
Easy rotation (update .env file and apply)
Separation between code and credentials
Standard practice for infrastructure as code
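A minimal sketch of that workflow, assuming the credentials live in a .env file using the TF_VAR_* names listed above:

```bash
# Export all TF_VAR_* credentials for this shell session, then plan:
set -a          # auto-export every variable assigned while sourcing
source .env
set +a
tofu plan       # OpenTofu reads TF_VAR_* values from the environment
```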
Application Secrets Management¶
For application-level secrets (database passwords, API keys, etc.), the cluster implements a dedicated namespace pattern with External Secrets Operator (ESO):
Architecture:
Source namespace: application-secrets holds the bootstrap secrets
Target namespaces: Application namespaces (e.g., gitlabbda)
Replication: ESO automatically syncs via ClusterSecretStore
Isolation: Namespace restrictions prevent lateral access
Security properties:
✅ No secrets in Git: Only ExternalSecret declarations (config) committed
✅ Namespace isolation: ClusterSecretStore restricted via conditions.namespaces
✅ Least privilege RBAC: ServiceAccounts have read-only access
✅ Centralized rotation: Update source secret, ESO propagates everywhere
✅ Audit trail: Kubernetes events track secret access
Example: GitLab BDA manages 15+ secrets using this pattern, with ClusterSecretStore restricted to only the gitlabbda namespace. Even though the ClusterSecretStore is cluster-scoped, namespace conditions enforce strict isolation.
See Application Secrets Architecture for complete design rationale and implementation guide.
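A sketch of the pattern follows; resource names such as eso-reader and gitlab-db are illustrative placeholders, not the cluster's actual objects:

```bash
kubectl apply -f - <<'EOF'
# ClusterSecretStore reading from the application-secrets namespace,
# restricted so only the gitlabbda namespace may reference it.
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: application-secrets
spec:
  provider:
    kubernetes:
      remoteNamespace: application-secrets
      auth:
        serviceAccount:
          name: eso-reader              # hypothetical read-only SA
          namespace: application-secrets
      server:
        caProvider:
          type: ConfigMap
          name: kube-root-ca.crt
          key: ca.crt
          namespace: application-secrets
  conditions:
    - namespaces:
        - gitlabbda
---
# ExternalSecret in the application namespace; ESO materializes a Secret
# from the bootstrap secret held in application-secrets.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: gitlab-db                       # hypothetical example secret
  namespace: gitlabbda
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: application-secrets
  target:
    name: gitlab-db
  data:
    - secretKey: password
      remoteRef:
        key: gitlab-db                  # source Secret name
        property: password              # key within the source Secret
EOF
```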
Network Encryption¶
Cilium Wireguard Mode¶
Enabled: Transparent encryption of all pod-to-pod traffic across nodes
Implementation:
Automatic key rotation
Kernel-level encryption (fast)
Zero application changes required
Protection against node-level network sniffing
Configuration in kube.tf:
cilium_values = <<EOT
encryption:
  enabled: true
  type: wireguard
EOT
Benefits:
Protects data in transit between nodes
Encrypts across the Hetzner private network
Minimal performance overhead (Wireguard is fast)
Meets compliance requirements for data in transit
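Encryption status can be confirmed from any Cilium agent (exact output varies by Cilium version):

```bash
# Ask a Cilium agent pod for its encryption status:
kubectl -n kube-system exec ds/cilium -- cilium status | grep -i encryption
# Expected output along the lines of:
#   Encryption: Wireguard [cilium_wg0 (Pubkey: ..., Port: 51871, Peers: 8)]
```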
Secret Management in Kubernetes¶
Current approach: Kubernetes Secrets (base64-encoded, not encrypted at rest unless secrets encryption is enabled), with External Secrets Operator already covering application secrets (see above)
Future considerations:
Broader External Secrets Operator coverage
HashiCorp Vault for dynamic secrets
Sealed Secrets for Git-stored encrypted secrets
Best practices followed:
RBAC restrictions on secret access
Namespace isolation
Minimal secret replication
Regular secret rotation reminders
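As an illustration of the RBAC restriction, a Role can be scoped to a single named Secret; all names below are hypothetical:

```bash
kubectl apply -f - <<'EOF'
# Role granting read-only access to one named Secret in gitlabbda
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-gitlab-db
  namespace: gitlabbda
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["gitlab-db"]   # hypothetical secret name
    verbs: ["get", "watch"]        # no list/create/delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-gitlab-db
  namespace: gitlabbda
subjects:
  - kind: ServiceAccount
    name: gitlab                   # hypothetical workload SA
    namespace: gitlabbda
roleRef:
  kind: Role
  name: read-gitlab-db
  apiGroup: rbac.authorization.k8s.io
EOF
```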
Summary¶
The KUP6S security model prioritizes:
Strong authentication over IP restrictions for SSH
Critical API protection through jumphost-only access
Operational simplicity over theoretical security gains
Reliability through redundancy rather than single points of failure
Pragmatic cost optimization that preserves system reliability
Security is achieved through proven, industry-standard practices rather than complex custom configurations.