Explanation

Security Model

Understanding the security architecture and design decisions for the KUP6S cluster on Hetzner Cloud.

Overview

The KUP6S cluster implements a layered security model that balances strong protection with operational simplicity. Security is achieved through:

  • Network-level controls: Firewall restrictions on critical services

  • Cryptographic authentication: SSH key-based access only

  • Credential isolation: Environment variables and secret management

  • Network encryption: WireGuard mesh networking via Cilium

  • Minimal attack surface: Restricted Kubernetes API access

Firewall Configuration

Kubernetes API Access

Restriction Level: Highest

  • Source allowlist: 95.217.145.243/32 (jumphost IP only)

  • Protocol: TCP port 6443 (via control plane load balancer)

  • Access method: SOCKS5 proxy tunnel through jumphost

Rationale: The Kubernetes API exposes the broadest attack surface in the cluster. Restricting access to a single trusted IP significantly reduces risk: even if credentials are leaked, an attacker cannot reach the API without also gaining access to the jumphost.

Implementation: See How-To: Access kubectl for kubectl access configuration.
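To make the allowlist semantics concrete, here is a minimal sketch of how a /32 rule matches exactly one source address. It only illustrates the matching logic; the actual enforcement happens in the Hetzner Cloud firewall, and the function name `api_source_allowed` is made up for this example.

```python
import ipaddress

# Allowlist taken from the firewall rule above: the jumphost address only.
API_ALLOWLIST = [ipaddress.ip_network("95.217.145.243/32")]


def api_source_allowed(source_ip: str) -> bool:
    """Return True if source_ip falls inside any allowlisted network."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in API_ALLOWLIST)


print(api_source_allowed("95.217.145.243"))  # the jumphost itself
print(api_source_allowed("95.217.145.244"))  # the neighbouring address
```

A /32 prefix contains a single address, so every other source, including the adjacent IP, is rejected before any credential check takes place.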

SSH Access

Restriction Level: Open with strong authentication

  • Source allowlist: 0.0.0.0/0, ::/0 (internet-wide)

  • Protocol: TCP port 22

  • Authentication: Key-based only (password authentication disabled)

Rationale: SSH with key-based authentication (4096-bit RSA or Ed25519) is cryptographically strong, and password guessing is ruled out because password authentication is disabled. Securing SSH through strong authentication rather than IP restrictions is a widely used practice. Benefits include:

  • Simpler operational model

  • No NAT gateway complexity

  • Direct access for troubleshooting

  • Easier OpenTofu provisioning

  • Standard practice for secure server management
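Because this model depends entirely on password authentication being off, it can be worth checking a node's sshd configuration automatically. The sketch below is one possible check, not the cluster's actual tooling: the sample configuration is hypothetical, the helper names are invented, and `Match` blocks are deliberately ignored. It relies on two documented OpenSSH behaviours: the first value obtained for a keyword wins, and `PasswordAuthentication` defaults to `yes` when absent.

```python
def effective_sshd_options(config_text: str) -> dict[str, str]:
    """Parse sshd_config-style text; the first value seen for a keyword wins."""
    options: dict[str, str] = {}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # keyword without a value: ignore in this sketch
        keyword, value = parts[0].lower(), parts[1].strip()
        if keyword == "match":
            break  # conditional blocks are out of scope here
        options.setdefault(keyword, value)
    return options


def password_login_disabled(config_text: str) -> bool:
    """True only if PasswordAuthentication is explicitly set to 'no'."""
    options = effective_sshd_options(config_text)
    # OpenSSH treats a missing PasswordAuthentication as "yes".
    return options.get("passwordauthentication", "yes").lower() == "no"


# Hypothetical excerpt, not the cluster's real sshd_config.
SAMPLE_CONFIG = """
# hardening
PasswordAuthentication no
PubkeyAuthentication yes
PasswordAuthentication yes
"""

print(password_login_disabled(SAMPLE_CONFIG))  # → True (first value wins)
```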

Ingress Traffic

All HTTP/HTTPS traffic: Routed through Hetzner Cloud Load Balancer (LB11)

  • Load balancer type: lb11 in nbg1 location

  • Target: Traefik ingress controller on agent nodes

  • TLS termination: Handled by Traefik with Let’s Encrypt certificates

  • Benefits: DDoS protection, automatic failover, health checks

IPv4 Addressing Decision

Current Configuration

All cluster nodes (3 control plane + 6 agents) retain public IPv4 addresses.

Cost:

  • Monthly: €4.50 (9 nodes × €0.50/month)

  • Percentage of total: 4.8% of €93.73 cluster cost

  • Annual: €54
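The figures above can be reproduced with a few lines of arithmetic. This sketch uses only the numbers stated in this section; `Decimal` avoids floating-point rounding noise in the euro amounts.

```python
from decimal import Decimal

NODES = 9                             # 3 control plane + 6 agents
IPV4_MONTHLY = Decimal("0.50")        # per address, per month
CLUSTER_MONTHLY = Decimal("93.73")    # total cluster cost per month

ipv4_monthly_total = NODES * IPV4_MONTHLY
ipv4_annual_total = ipv4_monthly_total * 12
share_percent = (ipv4_monthly_total / CLUSTER_MONTHLY * 100).quantize(Decimal("0.1"))

print(ipv4_monthly_total)  # → 4.50
print(ipv4_annual_total)   # → 54.00
print(share_percent)       # → 4.8
```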

Investigation Summary

In October 2025, we investigated removing public IPv4 addresses from agent nodes to reduce costs, given that:

  1. All ingress traffic flows through the load balancer

  2. Hetzner charges €0.50/month per IPv4 address

  3. Potential savings of €3-4.50/month

Alternatives Analyzed

Option 1: Remove IPv4 from agents only

Savings: €3/month (€36/year), because only the 6 agent nodes lose their public IPv4 addresses

Requirements:

  • Configure external jumphost as NAT gateway

  • Add disable_ipv4 = true to agent nodepool configs

  • Configure Hetzner private network routes

  • SSH proxy configuration for OpenTofu provisioning

Rejected because:

  • Medium-high operational complexity

  • Jumphost becomes critical path for agent provisioning

  • Manual NAT gateway configuration and maintenance

  • Marginal financial benefit (€36/year)

Option 2: Use kube-hetzner nat_router (all private)

Costs:

  • NAT router server: CAX21 at €6.49/month

  • NAT router IPv4: €0.50/month

  • Total: €6.99/month

Savings from removed IPs: €4.50/month

Net change: +€2.49/month (about €30/year more expensive than the current setup)

Rejected because:

  • Costs MORE than current setup

  • Creates single point of failure for all egress traffic

  • No high availability for outbound connectivity

  • Defeats the purpose of cost savings

Option 3: Keep current setup (SELECTED)

  • Cost: €4.50/month

  • Complexity: Low

  • Reliability: High

  • Operational overhead: Minimal
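The three options can be compared side by side as a monthly cost change relative to the current setup. The sketch below uses only prices quoted in this section; the option keys and the function name are invented for illustration, and Option 1 is assumed to reuse the existing jumphost at no extra cost.

```python
from decimal import Decimal

IPV4 = Decimal("0.50")                # per address, per month
CONTROL_PLANE, AGENTS = 3, 6
NAT_ROUTER_SERVER = Decimal("6.49")   # CAX21, per month

CURRENT_IPV4_COST = (CONTROL_PLANE + AGENTS) * IPV4  # 4.50


def monthly_delta(option: str) -> Decimal:
    """Monthly cost change versus the current setup (negative means savings)."""
    if option == "agents_private":    # Option 1: only agents lose IPv4
        return -(AGENTS * IPV4)
    if option == "nat_router":        # Option 2: all nodes private, one NAT router
        return (NAT_ROUTER_SERVER + IPV4) - CURRENT_IPV4_COST
    if option == "keep_current":      # Option 3: no change
        return Decimal("0.00")
    raise ValueError(f"unknown option: {option}")


for name in ("agents_private", "nat_router", "keep_current"):
    delta = monthly_delta(name)
    print(f"{name}: {delta:+} EUR/month, {delta * 12:+} EUR/year")
```

Option 2 comes out at +2.49 per month (+29.88 per year), which is why it is described as more expensive than doing nothing.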

Decision Rationale

Public IPv4 addresses are retained for the following reasons:

1. Cost is Negligible

€4.50/month represents just 4.8% of the total cluster cost. In the context of a €93.73/month infrastructure, optimizing away €4.50 has minimal financial impact while introducing operational risks.

Cost perspective:

  • €54/year works out to €4.50/month, roughly the price of a single coffee shop visit

  • Time spent managing NAT gateway > value of savings

  • Engineering time is more valuable than €4.50/month

2. Operational Simplicity

Current model (simple):

  • SSH directly to any node

  • No NAT gateway to manage

  • Standard network topology

  • Easy troubleshooting

Alternative model (complex):

  • SSH requires proxy configuration

  • NAT gateway becomes critical infrastructure

  • Additional monitoring and maintenance

  • More failure points

3. Egress Traffic Requirements

Agent nodes require constant internet access for:

Container operations:

  • Image pulls from Docker Hub, GitHub Container Registry

  • Base image updates

  • Multi-architecture manifests

System maintenance:

  • OS package updates via apt/dnf

  • K3s version upgrades

  • Security patches

Storage and logging:

  • Loki log shipping to S3 (Hetzner Object Storage - public endpoint)

  • Longhorn backups to Hetzner Storage Box (CIFS)

  • Crossplane S3 bucket management

Application traffic:

  • External API calls from workloads

  • Webhook callbacks

  • Third-party service integrations

Without public IPv4 addresses, ALL of this traffic would have to route through a NAT gateway, creating a single point of failure.

4. No Single Point of Failure

Current architecture:

  • Each node has independent internet access

  • Load balancer handles ingress (HA by design)

  • Agents can pull images and push logs independently

  • Node failures don’t affect other nodes’ connectivity

NAT gateway architecture:

  • All egress traffic through one gateway

  • Gateway failure blocks all agent outbound traffic

  • Image pulls fail cluster-wide

  • Logs stop shipping

  • Updates can’t be applied
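The difference between the two architectures can be expressed as a toy failure model. This is a deliberately simplified sketch with hypothetical node names: it assumes a node has egress whenever it is up and, in the NAT case, the single gateway is up as well.

```python
def nodes_with_egress(nodes, failed, nat_gateway=None):
    """Return the nodes that still reach the internet after `failed` goes down.

    nat_gateway=None models the current setup, where every node has its own
    public IPv4 address. Otherwise all nodes egress through the named gateway.
    """
    alive = [node for node in nodes if node not in failed]
    if nat_gateway is None:
        return alive
    return alive if nat_gateway not in failed else []


AGENTS = [f"agent-{i}" for i in range(1, 7)]  # hypothetical names

# Current setup: losing one agent leaves the other five unaffected.
print(len(nodes_with_egress(AGENTS, failed={"agent-1"})))  # → 5

# NAT setup: losing the gateway cuts off every agent at once.
print(len(nodes_with_egress(AGENTS, failed={"nat-gw"}, nat_gateway="nat-gw")))  # → 0
```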

5. Security Through Simplicity

Current security model:

  • SSH: Strong cryptographic authentication (industry standard)

  • Kubernetes API: IP-restricted to jumphost (critical boundary)

  • Clear security boundaries

  • Less configuration = less room for error

Complex model risks:

  • NAT gateway becomes attack target

  • More iptables rules to audit

  • Route persistence challenges

  • Increased misconfiguration surface

Conclusion

The decision: Keep public IPv4 addresses on all cluster nodes.

The verdict: €36-54/year in potential savings do not justify:

  • Increased operational complexity

  • Single point of failure introduction

  • Reduced reliability

  • Time investment in NAT gateway management

Removing the addresses would be a classic example of premature optimization. The current architecture is simple, reliable, and maintainable, and its cost is negligible in context.

Credential Management

Environment Variables

All sensitive credentials are managed via environment variables:

# Hetzner Cloud
TF_VAR_hcloud_token

# S3 Object Storage (shared)
TF_VAR_hetzner_s3_access_key
TF_VAR_hetzner_s3_secret_key

# Storage Box
TF_VAR_longhorn_cifs_username
TF_VAR_longhorn_cifs_password
TF_VAR_storagebox_csi_username
TF_VAR_storagebox_csi_password

# SMTP
TF_VAR_smtp_username
TF_VAR_smtp_password

Benefits:

  • No hardcoded secrets in version control

  • Easy rotation (update the .env file and apply)

  • Separation between code and credentials

  • Standard practice for infrastructure as code
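Since a missing variable would only surface once OpenTofu runs, a small pre-flight check can catch it earlier. The following is a sketch rather than part of the project's tooling; the variable names are the ones listed above, and the function name `missing_credentials` is made up for this example.

```python
import os

# Variable names as listed in this section.
REQUIRED_VARS = [
    "TF_VAR_hcloud_token",
    "TF_VAR_hetzner_s3_access_key",
    "TF_VAR_hetzner_s3_secret_key",
    "TF_VAR_longhorn_cifs_username",
    "TF_VAR_longhorn_cifs_password",
    "TF_VAR_storagebox_csi_username",
    "TF_VAR_storagebox_csi_password",
    "TF_VAR_smtp_username",
    "TF_VAR_smtp_password",
]


def missing_credentials(env=None):
    """Return the required variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    # Only names are printed, never values.
    print("missing:", ", ".join(missing) or "none")
```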

Shared S3 Credentials

Hetzner’s S3 Object Storage uses a project-level access model - any credential can access all buckets within a project. Rather than creating separate credentials that provide identical access, we use:

Single credential pair for:

  • etcd backup buckets

  • Loki log storage

  • Crossplane bucket management

This aligns with Hetzner’s actual security model rather than creating false separation.

Application Secrets Management

For application-level secrets (database passwords, API keys, etc.), the cluster implements a dedicated namespace pattern with External Secrets Operator (ESO):

Architecture:

  • Source namespace: application-secrets - holds bootstrap secrets

  • Target namespaces: Application namespaces (e.g., gitlabbda)

  • Replication: ESO automatically syncs via ClusterSecretStore

  • Isolation: Namespace restrictions prevent lateral access

Security properties:

  • No secrets in Git: Only ExternalSecret declarations (config) committed

  • Namespace isolation: ClusterSecretStore restricted via conditions.namespaces

  • Least privilege RBAC: ServiceAccounts have read-only access

  • Centralized rotation: Update source secret, ESO propagates everywhere

  • Audit trail: Kubernetes events record ESO synchronization activity (tracking reads of the secrets themselves requires API server audit logging)

Example: GitLab BDA manages 15+ secrets using this pattern, with ClusterSecretStore restricted to only the gitlabbda namespace. Even though the ClusterSecretStore is cluster-scoped, namespace conditions enforce strict isolation.

See Application Secrets Architecture for complete design rationale and implementation guide.
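The namespace restriction can be illustrated with a simplified model of how a ClusterSecretStore's conditions gate access. This is a sketch under stated assumptions, not ESO's implementation: the manifest is written as a Python dict, the store name and apiVersion are assumptions, the provider block is omitted, and only explicit `namespaces` lists are modelled (ESO also supports label selectors).

```python
# Hypothetical, trimmed-down ClusterSecretStore expressed as a dict.
cluster_secret_store = {
    "apiVersion": "external-secrets.io/v1beta1",   # assumed API version
    "kind": "ClusterSecretStore",
    "metadata": {"name": "application-secrets"},   # hypothetical name
    "spec": {
        # Only namespaces listed here may reference the store.
        "conditions": [{"namespaces": ["gitlabbda"]}],
        # provider configuration omitted in this sketch
    },
}


def namespace_may_use_store(store: dict, namespace: str) -> bool:
    """Approximate the conditions check for explicit namespace lists."""
    conditions = store.get("spec", {}).get("conditions") or []
    if not conditions:
        return True  # no conditions: the store is usable from every namespace
    return any(namespace in cond.get("namespaces", []) for cond in conditions)


print(namespace_may_use_store(cluster_secret_store, "gitlabbda"))  # → True
print(namespace_may_use_store(cluster_secret_store, "default"))    # → False
```

The second call shows the property described above: the resource is cluster-scoped, yet an ExternalSecret in any other namespace is refused.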

Network Encryption

Cilium WireGuard Mode

Enabled: Transparent encryption of all pod-to-pod traffic across nodes

Implementation:

  • Automatic key rotation

  • Kernel-level encryption (fast)

  • Zero application changes required

  • Protection against node-level network sniffing

Configuration in kube.tf:

cilium_values = <<EOT
encryption:
  enabled: true
  type: wireguard
EOT

Benefits:

  • Protects data in transit between nodes

  • Encrypts across the Hetzner private network

  • Minimal performance overhead (WireGuard is fast)

  • Helps meet compliance requirements for encryption of data in transit

Secret Management in Kubernetes

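Current approach: Kubernetes Secrets (base64-encoded, which is an encoding rather than encryption), with External Secrets Operator replicating application secrets as described under Application Secrets Management

Future considerations:

  • HashiCorp Vault for dynamic secrets

  • Sealed Secrets for Git-stored encrypted secrets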

Best practices followed:

  • RBAC restrictions on secret access

  • Namespace isolation

  • Minimal secret replication

  • Regular secret rotation reminders
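The RBAC and namespace restrictions matter because base64 encoding offers no confidentiality on its own. A short sketch with a made-up secret value shows that anyone who can read the stored form can recover the original:

```python
import base64

# Made-up value for illustration only.
plaintext = b"s3cr3t-password"

# What ends up in the `data` field of a Kubernetes Secret.
stored = base64.b64encode(plaintext).decode("ascii")

# Anyone who can read the Secret can reverse the encoding without a key.
recovered = base64.b64decode(stored)

print(stored)
print(recovered == plaintext)  # → True
```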

Summary

The KUP6S security model prioritizes:

  1. Strong authentication over IP restrictions for SSH

  2. Critical API protection through jumphost-only access

  3. Operational simplicity over theoretical security gains

  4. Reliability through redundancy rather than single points of failure

  5. Pragmatic cost optimization that preserves system reliability

Security is achieved through proven, industry-standard practices rather than complex custom configurations.