Security Model¶

Type: Explanation (Understanding-oriented)

Related Concepts: Application Secrets Architecture | Access kubectl

Understanding the security architecture and design decisions for the KUP6S cluster on Hetzner Cloud.

Overview¶

The KUP6S cluster implements a layered security model that balances strong protection with operational simplicity. Security is achieved through:

Network-level controls: Firewall restrictions on critical services
Cryptographic authentication: SSH key-based access only
Credential isolation: Environment variables and secret management
Network encryption: WireGuard mesh networking via Cilium
Minimal attack surface: Restricted Kubernetes API access

Firewall Configuration¶

Kubernetes API Access¶

Restriction Level: Highest

Source allowlist: 95.217.145.243/32 (jumphost IP only)
Protocol: TCP port 6443 (via control plane load balancer)
Access method: SOCKS5 proxy tunnel through jumphost

Rationale: The Kubernetes API is the highest-value target in the cluster, because anyone who reaches it with valid credentials can control every workload. Restricting access to a single trusted IP means leaked credentials alone are not enough: an attacker would also have to connect from the jumphost.

Implementation: See How-To: Access kubectl for kubectl access configuration.
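
To make the effect of the /32 allowlist concrete, the following is a minimal Python sketch of the kind of source-address check a firewall performs. The names `API_ALLOWLIST` and `is_allowed` are illustrative assumptions and not part of the cluster configuration; the real enforcement happens in the Hetzner Cloud firewall, not in code like this.

```python
import ipaddress

# Allowlist taken from the section above; a /32 matches exactly one IPv4 address.
# API_ALLOWLIST and is_allowed are hypothetical names used only for illustration.
API_ALLOWLIST = [ipaddress.ip_network("95.217.145.243/32")]


def is_allowed(source_ip: str) -> bool:
    """Return True if source_ip falls inside any allowlisted network."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in API_ALLOWLIST)


print(is_allowed("95.217.145.243"))  # the jumphost address
print(is_allowed("95.217.145.244"))  # a neighbouring address
```

Only the jumphost address passes the check; every other source, including the adjacent address, is rejected before credentials are ever evaluated.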

SSH Access¶

Restriction Level: Open with strong authentication

Source allowlist: 0.0.0.0/0, ::/0 (internet-wide)
Protocol: TCP port 22
Authentication: Key-based only (password authentication disabled)

Rationale: Key-based SSH authentication with 4096-bit RSA or Ed25519 keys is not practically vulnerable to brute-force guessing, so the nodes rely on strong authentication rather than source-IP restrictions. Benefits include:

Simpler operational model
No NAT gateway complexity
Direct access for troubleshooting
Easier OpenTofu provisioning
Standard practice for secure server management

Ingress Traffic¶

All HTTP/HTTPS traffic: Routed through Hetzner Cloud Load Balancer (LB11)

Load balancer type: lb11 in nbg1 location
Target: Traefik ingress controller on agent nodes
TLS termination: Handled by Traefik with Let’s Encrypt certificates
Benefits: DDoS protection, automatic failover, health checks

IPv4 Addressing Decision¶

Current Configuration¶

All cluster nodes (3 control plane + 6 agents) retain public IPv4 addresses.

Cost:

Monthly: €4.50 (9 nodes × €0.50/month)
Percentage of total: 4.8% of €93.73 cluster cost
Annual: €54
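
The figures above can be reproduced with a short calculation. This is a sketch using only the numbers stated in this section; the variable names are illustrative.

```python
from decimal import Decimal

NODES = 3 + 6                     # control plane + agent nodes
IPV4_PRICE = Decimal("0.50")      # EUR per address per month
CLUSTER_COST = Decimal("93.73")   # EUR per month, whole cluster

monthly = NODES * IPV4_PRICE
annual = monthly * 12
# Share of the total cluster cost, rounded to one decimal place.
share = (monthly / CLUSTER_COST * 100).quantize(Decimal("0.1"))

print(f"monthly: EUR {monthly}, annual: EUR {annual}, share: {share}%")
```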

Investigation Summary¶

In October 2025, we investigated removing public IPv4 addresses from agent nodes to reduce costs, given that:

All ingress traffic flows through the load balancer
Hetzner charges €0.50/month per IPv4 address
Potential savings of €3-4.50/month

Alternatives Analyzed¶

Option 1: Remove IPv4 from agents only¶

Savings: €3/month (€36/year) - only agents lose public IP

Requirements:

Configure external jumphost as NAT gateway
Add disable_ipv4 = true to agent nodepool configs
Configure Hetzner private network routes
SSH proxy configuration for OpenTofu provisioning

Rejected because:

Medium-high operational complexity
Jumphost becomes critical path for agent provisioning
Manual NAT gateway configuration and maintenance
Marginal financial benefit (€36/year)

Option 2: Use kube-hetzner nat_router (all private)¶

Costs:

NAT router server: CAX21 at €6.49/month
NAT router IPv4: €0.50/month
Total: €6.99/month

Savings from removed IPs: €4.50/month

Net change: +€2.49/month (about €30/year more expensive)

Rejected because:

Costs MORE than current setup
Creates single point of failure for all egress traffic
No high availability for outbound connectivity
Defeats the purpose of cost savings

Option 3: Keep current setup (SELECTED)¶

Cost: €4.50/month
Complexity: Low
Reliability: High
Operational overhead: Minimal
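
The three options can be compared side by side with a small calculation. This is a sketch built from the prices quoted above, working in integer cents to avoid floating-point rounding; the helper `net_change_cents` and the option labels are illustrative names.

```python
AGENTS = 6
NODES = 9
IPV4_CENTS = 50          # per public IPv4 address per month
NAT_SERVER_CENTS = 649   # CAX21 NAT router per month


def net_change_cents(ips_removed: int, added_cents: int = 0) -> int:
    """Monthly cost change versus the current setup (negative means a saving)."""
    return added_cents - ips_removed * IPV4_CENTS


options = {
    "Option 1 (agents private)": net_change_cents(AGENTS),
    # The NAT router needs its own server and its own public IPv4 address.
    "Option 2 (nat_router)": net_change_cents(NODES, NAT_SERVER_CENTS + IPV4_CENTS),
    "Option 3 (keep current)": net_change_cents(0),
}

for name, cents in options.items():
    print(f"{name}: {cents / 100:+.2f} EUR/month, {cents * 12 / 100:+.2f} EUR/year")
```

Option 1 saves €3.00/month, Option 2 costs €2.49/month more than today, and Option 3 is the baseline.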

Decision Rationale¶

Public IPv4 addresses are retained for the following reasons:

1. Cost is Negligible¶

€4.50/month represents just 4.8% of the total cluster cost. In the context of a €93.73/month infrastructure, optimizing away €4.50 has minimal financial impact while introducing operational risks.

Cost perspective:

€54/year is €4.50/month, roughly the price of one or two coffee-shop visits
Time spent managing a NAT gateway would cost more than the savings
Engineering time is more valuable than €4.50/month

2. Operational Simplicity¶

Current model (simple):

SSH directly to any node
No NAT gateway to manage
Standard network topology
Easy troubleshooting

Alternative model (complex):

SSH requires proxy configuration
NAT gateway becomes critical infrastructure
Additional monitoring and maintenance
More failure points

3. Egress Traffic Requirements¶

Agent nodes require constant internet access for:

Container operations:

Image pulls from Docker Hub, GitHub Container Registry
Base image updates
Multi-architecture manifests

System maintenance:

OS package updates via apt/dnf
K3S version upgrades
Security patches

Storage and logging:

Loki log shipping to S3 (Hetzner Object Storage - public endpoint)
Longhorn backups to Hetzner Storage Box (CIFS)
Crossplane S3 bucket management

Application traffic:

External API calls from workloads
Webhook callbacks
Third-party service integrations

Without public IPv4 addresses, all of this traffic would have to route through a NAT gateway, creating a single point of failure.

4. No Single Point of Failure¶

Current architecture:

Each node has independent internet access
Load balancer handles ingress (HA by design)
Agents can pull images and push logs independently
Node failures don’t affect other nodes’ connectivity

NAT gateway architecture:

All egress traffic through one gateway
Gateway failure blocks all agent outbound traffic
Image pulls fail cluster-wide
Logs stop shipping
Updates can’t be applied

5. Security Through Simplicity¶

Current security model:

SSH: Strong cryptographic authentication (industry standard)
Kubernetes API: IP-restricted to jumphost (critical boundary)
Clear security boundaries
Less configuration = less room for error

Complex model risks:

NAT gateway becomes attack target
More iptables rules to audit
Route persistence challenges
Increased misconfiguration surface

Conclusion¶

The decision: Keep public IPv4 addresses on all cluster nodes.

The verdict: €36-54/year in potential savings do not justify:

Increased operational complexity
Single point of failure introduction
Reduced reliability
Time investment in NAT gateway management

Removing the addresses would be a classic example of premature optimization. The current architecture is simple, reliable, and maintainable, and its cost is negligible in context.

Credential Management¶

Environment Variables¶

All sensitive credentials are managed via environment variables:

# Hetzner Cloud
TF_VAR_hcloud_token

# S3 Object Storage (shared)
TF_VAR_hetzner_s3_access_key
TF_VAR_hetzner_s3_secret_key

# Storage Box
TF_VAR_longhorn_cifs_username
TF_VAR_longhorn_cifs_password
TF_VAR_storagebox_csi_username
TF_VAR_storagebox_csi_password

# SMTP
TF_VAR_smtp_username
TF_VAR_smtp_password

Benefits:

No hardcoded secrets in version control
Easy rotation (update the .env file and re-apply)
Separation between code and credentials
Standard practice for infrastructure as code
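
Because every credential arrives through the environment, a missing variable is easiest to catch before provisioning starts. The following is a minimal pre-flight sketch, assuming the variable names listed above; the helper `missing_vars` is hypothetical and not part of the repository.

```python
import os
from collections.abc import Mapping

# Variable names copied from the list above.
REQUIRED_VARS = [
    "TF_VAR_hcloud_token",
    "TF_VAR_hetzner_s3_access_key",
    "TF_VAR_hetzner_s3_secret_key",
    "TF_VAR_longhorn_cifs_username",
    "TF_VAR_longhorn_cifs_password",
    "TF_VAR_storagebox_csi_username",
    "TF_VAR_storagebox_csi_password",
    "TF_VAR_smtp_username",
    "TF_VAR_smtp_password",
]


def missing_vars(env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Report only the names, never the values, so nothing sensitive is printed.
missing = missing_vars(os.environ)
print("Missing credentials:", ", ".join(missing) or "none")
```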

Shared S3 Credentials¶

Hetzner’s S3 Object Storage uses a project-level access model - any credential can access all buckets within a project. Rather than creating separate credentials that provide identical access, we use:

Single credential pair for:

etcd backup buckets
Loki log storage
Crossplane bucket management

This aligns with Hetzner’s actual security model rather than creating false separation.

Application Secrets Management¶

For application-level secrets (database passwords, API keys, etc.), the cluster implements a dedicated namespace pattern with External Secrets Operator (ESO):

Architecture:

Source namespace: application-secrets - holds bootstrap secrets
Target namespaces: Application namespaces (e.g., gitlabbda)
Replication: ESO automatically syncs via ClusterSecretStore
Isolation: Namespace restrictions prevent lateral access

Security properties:

✅ No secrets in Git: Only ExternalSecret declarations (config) committed
✅ Namespace isolation: ClusterSecretStore restricted via conditions.namespaces
✅ Least privilege RBAC: ServiceAccounts have read-only access
✅ Centralized rotation: Update source secret, ESO propagates everywhere
✅ Audit trail: Kubernetes events record ESO sync activity; reads of Secret objects are visible only where API server audit logging is enabled

Example: GitLab BDA manages 15+ secrets using this pattern, with ClusterSecretStore restricted to only the gitlabbda namespace. Even though the ClusterSecretStore is cluster-scoped, namespace conditions enforce strict isolation.
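
The isolation rule in the example can be pictured as a simple membership test. The sketch below is a deliberately simplified model, not External Secrets Operator code: the function `store_usable_from` is hypothetical, it considers only an explicit namespace list, and it assumes that an empty list means the store is unrestricted.

```python
def store_usable_from(namespace: str, allowed_namespaces: list[str]) -> bool:
    """Simplified model of a namespace condition on a ClusterSecretStore.

    Assumption: an empty list means no restriction; otherwise the
    requesting namespace must be listed explicitly.
    """
    if not allowed_namespaces:
        return True
    return namespace in allowed_namespaces


# The GitLab BDA store from the example is restricted to one namespace.
GITLAB_STORE_NAMESPACES = ["gitlabbda"]

print(store_usable_from("gitlabbda", GITLAB_STORE_NAMESPACES))
print(store_usable_from("default", GITLAB_STORE_NAMESPACES))
```

In this model an ExternalSecret in `gitlabbda` may use the store, while one in any other namespace is refused even though the store itself is cluster-scoped.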

See Application Secrets Architecture for complete design rationale and implementation guide.

Network Encryption¶

Cilium WireGuard Mode¶

Enabled: Transparent encryption of all pod-to-pod traffic across nodes

Implementation:

Automatic key rotation
Kernel-level encryption (fast)
Zero application changes required
Protection against node-level network sniffing

Configuration in kube.tf:

cilium_values = <<EOT
encryption:
  enabled: true
  type: wireguard
EOT

Benefits:

Protects data in transit between nodes
Encrypts across the Hetzner private network
Minimal performance overhead (WireGuard is fast)
Helps meet compliance requirements for encrypting data in transit

Secret Management in Kubernetes¶

Current approach: Kubernetes Secrets (base64-encoded)
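
Note that base64 is an encoding, not encryption: anyone who can read a Secret object can recover the plaintext. The short sketch below illustrates this with a made-up value; it is why the RBAC and namespace restrictions listed in this section carry the actual protection.

```python
import base64

# A made-up secret value, encoded the way Kubernetes stores Secret data.
encoded = base64.b64encode(b"s3cr3t-password").decode("ascii")

# Decoding needs no key, only the encoded string itself.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded)
```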

Future considerations:

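External Secrets Operator integration with an external secret backend (ESO is already used for in-cluster replication, as described under Application Secrets Management)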
HashiCorp Vault for dynamic secrets
Sealed Secrets for Git-stored encrypted secrets

Best practices followed:

RBAC restrictions on secret access
Namespace isolation
Minimal secret replication
Regular secret rotation reminders

Summary¶

The KUP6S security model prioritizes:

Strong authentication over IP restrictions for SSH
Critical API protection through jumphost-only access
Operational simplicity over theoretical security gains
Reliability through redundancy rather than single points of failure
Pragmatic cost optimization that preserves system reliability

Security is achieved through proven, industry-standard practices rather than complex custom configurations.