Reference

Cluster Capabilities

Target Audience: Developers writing CDK8S charts and deploying applications via ArgoCD

This document describes the capabilities, services, and features available in the kup6s.com Kubernetes cluster for application developers. If you’re deploying applications via ArgoCD or writing CDK8S charts, this is your reference guide.


Cluster Overview

  • Platform: K3S on Hetzner Cloud

  • Architecture: Multi-architecture (ARM64 primary, AMD64 available)

  • High Availability: 3 control plane nodes across 3 data centers

  • Deployment Method: GitOps via ArgoCD

  • Kubernetes Version: v1.31.x (automatically managed)


Compute Resources

Node Pools

Control Plane Nodes (3 nodes - not for workloads):

  • 3x ARM64 nodes (CAX21: 4 vCPU, 8GB RAM)

  • Locations: fsn1, nbg1, hel1

  • Taints: Workloads not scheduled here by default

Worker Nodes (6 nodes - for your applications):

ARM64 Workers (Primary - 4 nodes):

  • 1x ARM64 large (CAX31: 8 vCPU, 16GB RAM, 160GB SSD)

  • 2x ARM64 medium (CAX21: 4 vCPU, 8GB RAM, 80GB SSD)

  • 1x ARM64 database (CAX21: 4 vCPU, 8GB RAM) - Dedicated for PostgreSQL

  • Location: hel1

  • Default scheduling target - workloads schedule here unless specified otherwise

AMD64 Workers (Legacy - 2 nodes):

  • 1x AMD64 medium (CPX31: 4 vCPU, 8GB RAM, 160GB SSD)

  • 1x AMD64 small (CPX21: 3 vCPU, 4GB RAM, 80GB SSD)

  • Location: hel1

  • Tainted - requires explicit nodeSelector to use (see examples below)

Architecture Support

The cluster supports both ARM64 and AMD64 architectures:

  • linux/arm64 (primary, recommended - better performance and cost)

  • linux/amd64 (available for legacy workloads - requires nodeSelector)

Scheduling Behavior:

  • ARM64 nodes: Workloads schedule here by default (untainted)

  • AMD64 nodes: Tainted with kubernetes.io/arch=amd64:NoSchedule - requires explicit targeting

Best Practice:

  • Prefer ARM64: Use multi-arch or ARM64 images when possible (cheaper nodes, better performance)

  • AMD64 fallback: Use for legacy applications that don’t have ARM64 builds yet

  • Multi-platform builds: Build images for both architectures:

    docker buildx build --platform linux/amd64,linux/arm64 -t myapp:latest --push .
    
  • Test both: If using multi-arch images, verify on both architectures


Storage Options

1. Longhorn (Default Persistent Storage)

Use for: Stateful applications, databases, persistent volumes

  • StorageClass: longhorn (default)

  • Access Modes: ReadWriteOnce (RWO), ReadWriteMany (RWX), ReadOnlyMany (ROX)

  • File System: XFS

  • Replication: 3 replicas across nodes (configurable)

  • Backup: Automatic backup to Hetzner Storage Box (CIFS)

  • Snapshots: Supported

  • Capacity: Depends on node local storage (80-160GB per node)

Example PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi

Special Use Cases:

  • For Kafka workloads: Use the longhorn-kafka StorageClass (dedicated to high-throughput workloads); see the sketch below
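
A minimal PVC sketch using the longhorn-kafka class (the claim name and size are placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kafka-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-kafka
  resources:
    requests:
      storage: 50Gi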

2. SMB/CIFS Storage (Hetzner Storage Box)

Use for: Shared file storage, backups, multi-pod read/write

  • StorageClass: hetzner-smb

  • Access Modes: ReadWriteMany (RWX)

  • Capacity: Large (Hetzner Storage Box)

  • Performance: Network-based (slower than Longhorn)

Example PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-uploads
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: hetzner-smb
  resources:
    requests:
      storage: 100Gi

3. S3 Object Storage (via Crossplane)

Use for: Object storage, backups, log storage, static assets

  • Provider: Hetzner Object Storage (S3-compatible)

  • Management: Crossplane-managed buckets

  • Access: Via S3 API (AWS SDK compatible)

How to Request a Bucket: Create a Crossplane Bucket resource (see How-To: Create S3 Bucket)


Networking & Ingress

Ingress Controller: Traefik

Default ingress controller for HTTP/HTTPS traffic

  • Version: v3.4.1 (pinned)

  • Features:

    • Automatic HTTPS via Let’s Encrypt (cert-manager)

    • HTTP to HTTPS redirect (enabled by default)

    • Access logs enabled

    • Proxy protocol support

Creating an Ingress:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - myapp.sites.kup6s.com
      secretName: myapp-tls
  rules:
    - host: myapp.sites.kup6s.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80

TLS/SSL Certificates (cert-manager)

Automatic certificate management via Let’s Encrypt

  • Cluster Issuer: letsencrypt-prod

  • DNS Challenge: Not configured (use HTTP-01 challenge)

  • Renewal: Automatic (30 days before expiry)

Usage: Add annotation to Ingress (see example above)

Domain Structure

Available domain patterns:

  • *.sites.kup6s.com - Customer/project websites

  • *.ops.kup6s.net - Infrastructure tools (ArgoCD, Grafana, etc.)

  • *.nodes.kup6s.com - Node-level DNS (internal only)

Network Policy & Observability

  • CNI: Cilium (eBPF-based) with native routing mode

  • Pod-to-Pod Traffic: High-performance eBPF networking

  • Network Policies: Supported (standard Kubernetes NetworkPolicy, plus CiliumNetworkPolicy for L7 - basic example after this list)

  • Hubble Observability: ✅ Enabled

    • Service dependency mapping (automatic service maps)

    • Flow visibility (L3/L4/L7 traffic inspection)

    • Network troubleshooting (DNS, HTTP, TCP flows)

    • Hubble UI available for graphical network visualization

    • Metrics exported to Prometheus
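
As referenced above, a minimal standard NetworkPolicy sketch that restricts ingress to a pod to traffic from its own namespace (name and labels are placeholders):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}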


Databases

CloudNativePG (PostgreSQL Operator)

Managed PostgreSQL databases via Kubernetes operator

  • Operator: CloudNativePG (CNPG) v1.27.0

  • Backup Plugin: Barman Cloud Plugin v0.7.0 (installed)

  • High Availability: Supported (with replication)

  • Backups: Integrated with S3/Longhorn via Barman Cloud Plugin

  • Monitoring: Prometheus metrics

Creating a PostgreSQL Cluster:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: myapp-db
spec:
  instances: 3
  storage:
    storageClass: longhorn
    size: 20Gi
  postgresql:
    parameters:
      max_connections: "100"

Connection: Use generated secrets for connection strings
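
CNPG typically generates a <cluster-name>-app Secret (for example myapp-db-app) containing the application user's credentials; the exact key names can vary by CNPG version, so verify them before relying on this sketch:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: myorg/myapp:latest   # placeholder image
          env:
            - name: DATABASE_URI
              valueFrom:
                secretKeyRef:
                  name: myapp-db-app   # Secret generated by CNPG for the "myapp-db" cluster
                  key: uri             # key name assumed - verify the actual keys in the Secret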

Note: For backup configuration using the Barman Cloud Plugin, create an ObjectStore resource and reference it in your cluster’s plugins section. The plugin is deployed via 60-B-barman-plugin.yaml.tpl. See CloudNativePG documentation for details.
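
A rough sketch of that shape, assuming the plugin's documented ObjectStore CRD and plugin name; verify the exact apiVersion, plugin name, endpoint, and credential secret against the installed plugin version (v0.7.0) and the cluster's How-To docs - all names below are placeholders:

apiVersion: barmancloud.cnpg.io/v1
kind: ObjectStore
metadata:
  name: myapp-db-backups
spec:
  configuration:
    destinationPath: s3://my-backup-bucket/myapp-db        # placeholder bucket path
    endpointURL: https://fsn1.your-objectstorage.com       # placeholder - verify the Hetzner endpoint
    s3Credentials:
      accessKeyId:
        name: backup-s3-credentials                        # placeholder secret name
        key: ACCESS_KEY_ID
      secretAccessKey:
        name: backup-s3-credentials
        key: SECRET_ACCESS_KEY
---
# Reference the ObjectStore from the Cluster's plugins section
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: myapp-db
spec:
  instances: 3
  plugins:
    - name: barman-cloud.cloudnative-pg.io
      isWALArchiver: true
      parameters:
        barmanObjectName: myapp-db-backups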


Monitoring & Observability

Prometheus + Grafana (kube-prometheus-stack)

Full observability stack pre-installed

Access:

  • Grafana: https://grafana.ops.kup6s.net

  • Prometheus: Internal cluster access only

Metrics Collection:

  • All cluster components monitored by default

  • Your apps: Add Prometheus annotations to expose metrics

ServiceMonitor Example:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 30s
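
The ServiceMonitor above selects a Service by label and scrapes a named port, so the backing Service needs a matching app: my-app label and a port named metrics. Depending on how the Prometheus operator's selectors are configured, the ServiceMonitor itself may also need a specific label (for example the Helm release label); check with the cluster admin. A sketch of a matching Service:

apiVersion: v1
kind: Service
metadata:
  name: my-app
  labels:
    app: my-app          # matched by the ServiceMonitor's selector
spec:
  selector:
    app: my-app
  ports:
    - name: metrics      # port name referenced by the ServiceMonitor
      port: 9090
      targetPort: 9090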

Loki (Log Aggregation)

Centralized logging with S3 backend

  • Storage: Hetzner S3 Object Storage (Crossplane-managed)

  • Access: Via Grafana (Explore → Loki)

  • Retention: Configurable (check with cluster admin)

Log Collection:

  • Container logs automatically collected

  • Query via LogQL in Grafana

Example Query:

{namespace="my-namespace", pod=~"my-app-.*"}

Security Features

Secrets Encryption at Rest

  • ✅ Kubernetes secrets encrypted in etcd (AES-CBC)

  • ✅ Automatic encryption for all Secret resources

  • No action required from developers

Pod Security

  • Pod Security Standards: Baseline enforced

  • Security Contexts: Supported and recommended

Example (note: fsGroup is a pod-level field, while capabilities and readOnlyRootFilesystem are container-level fields):

Pod-level securityContext:

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

Container-level securityContext:

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  readOnlyRootFilesystem: true
  capabilities:
    drop:
      - ALL

Network Encryption

  • ✅ Pod-to-pod traffic secured (Cilium eBPF with native routing)

  • ✅ Ingress traffic encrypted (TLS via cert-manager)

  • ✅ Secrets encrypted at rest (etcd encryption enabled)


GitOps Deployment (ArgoCD)

ArgoCD Access

Dashboard: https://argocd.ops.kup6s.net

Deployment Workflow

  1. Write CDK8S Chart in argoapps/ directory

  2. Register in registry (apps/registry.ts)

  3. Generate manifests: npm run build

  4. Apply ArgoCD Application: kubectl apply -f dist/CHARTNAME.yaml

  5. ArgoCD syncs your application automatically

ArgoCD Application Structure

Example CDK8S Chart:

import { Construct } from 'constructs';
import { Chart } from 'cdk8s';
import { ArgoCdApplication } from '@opencdk8s/cdk8s-argocd-resources';

export class MyAppChart extends Chart {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    new ArgoCdApplication(this, 'myapp', {
      metadata: {
        name: 'myapp',
        namespace: 'argocd',
      },
      spec: {
        project: 'default',
        source: {
          repoUrl: 'https://github.com/your-org/your-repo',
          path: 'k8s/myapp',
          targetRevision: 'main',
        },
        destination: {
          server: 'https://kubernetes.default.svc',
          namespace: 'myapp',
        },
        syncPolicy: {
          automated: {
            prune: true,
            selfHeal: true,
          },
        },
      },
    });
  }
}

Resource Quotas & Limits

No Hard Quotas (Currently)

  • No namespace-level resource quotas configured

  • Best Practice: Always set resource requests/limits in your pods

Recommended:

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

Node Capacity (Total)

Worker Node Resources:

  • ARM64 Workers (excluding database node): 16 vCPU, 32GB RAM (primary capacity)

  • AMD64 Workers: 7 vCPU, 12GB RAM (legacy/transition)

  • Database Node: 4 vCPU, 8GB RAM (dedicated PostgreSQL)

  • Total: 27 vCPU, 52GB RAM

  • Storage: ~560GB local (Longhorn pool across all workers)

Recommended Allocation:

  • 75% of workloads → ARM64 (cheaper, better performance)

  • 25% of workloads → AMD64 (legacy apps during migration)

  • Databases → Dedicated node (isolated from web app contention)

Plan accordingly for your application’s resource needs.


Service Mesh & Advanced Networking

Cilium Advanced Features

The cluster uses Cilium CNI which provides service mesh-like capabilities without a separate service mesh:

  • L7 Network Policies: HTTP/gRPC/Kafka protocol-aware policies (example after this list)

  • Service Mesh Lite: Cilium provides observability and L7 policies without sidecar proxies

  • Hubble Observability: Service dependency maps, flow visualization, network troubleshooting

  • High Performance: eBPF-based networking bypasses iptables for better performance
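
As referenced in the list above, a sketch of an L7-aware CiliumNetworkPolicy that allows only GET requests from a frontend to an API (labels, port, and path are placeholders; verify the syntax against the Cilium version running in the cluster):

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-allow-get
spec:
  endpointSelector:
    matchLabels:
      app: my-api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: my-frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: "/api/.*"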

Accessing Hubble UI:

# Via Cilium CLI (recommended)
cilium hubble ui

# Or via kubectl port-forward
kubectl port-forward -n kube-system service/hubble-ui 12000:80
# Then open http://localhost:12000

Not Available

  • ❌ Full service mesh (Istio/Linkerd) with sidecar proxies

  • ❌ Advanced traffic splitting/canary deployments (use Argo Rollouts instead)

Use Traefik features for:

  • Load balancing

  • Path-based routing

  • Header-based routing

  • Rate limiting (via middleware - sketch after this list)
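
As noted above, rate limiting is configured with a Traefik Middleware. A hedged sketch for Traefik v3 (the middleware name, namespace, and limits are placeholders; the annotation format assumes the middleware is defined via the Kubernetes CRD provider):

apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: my-app-ratelimit
  namespace: my-namespace
spec:
  rateLimit:
    average: 100   # average requests per second
    burst: 50

# Attach it to an Ingress via annotation (format: <namespace>-<name>@kubernetescrd):
#   traefik.ingress.kubernetes.io/router.middlewares: my-namespace-my-app-ratelimit@kubernetescrd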


Backup & Disaster Recovery

Automatic Backups

  • etcd: Daily S3 backups (cluster state)

  • Longhorn: Recurring backups to Storage Box

  • PostgreSQL: Configure per-database (CNPG backup)

Application Backups

Your responsibility:

  • Application data backup strategy

  • Database backup verification

  • Backup testing


Limitations & Considerations

Architecture Constraints

  • ARM64 primary: Most workloads run on ARM64 (cheaper, better performance)

  • AMD64 available: Legacy workloads can run on AMD64 nodes (with nodeSelector)

  • ⚠️ AMD64 nodes are tainted: Workloads won’t schedule there by default - must explicitly target

  • ⚠️ Mixed-arch complexity: Need to manage which workloads run on which architecture

  • 💡 Migration path: Start on AMD64, gradually move to ARM64 for cost optimization

Storage Performance

  • Longhorn: Good for general workloads

  • Longhorn-Kafka: Optimized for high-throughput

  • SMB/CIFS: Slower, best for shared/backup use

Scaling

  • Node scaling: Contact cluster admin

  • HPA (Horizontal Pod Autoscaler): Supported (example after this list)

  • VPA (Vertical Pod Autoscaler): Not configured
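
As mentioned above, HPA is supported. A minimal sketch scaling a Deployment on CPU utilization (names and thresholds are placeholders; the target Deployment must set CPU requests):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70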

External Services

  • External databases: Not directly supported (use port-forward or VPN)

  • Outbound traffic: Unrestricted (no egress filtering)


Quick Reference: Common Tasks

Deploy an Application

Default (ARM64):

  1. Create namespace (if needed)

  2. Create ArgoCD Application (CDK8S or YAML - a raw YAML example follows these steps)

  3. Apply: kubectl apply -f dist/myapp.yaml

  4. Monitor in ArgoCD dashboard

  5. Workload automatically schedules to ARM64 nodes (no special config needed)
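
If you prefer raw YAML over CDK8S, an equivalent Application manifest looks roughly like this (repository URL, path, and namespaces are placeholders):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/your-repo
    path: k8s/myapp
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true
      selfHeal: true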

AMD64-only Application (legacy apps during migration):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: legacy-app
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/arch: amd64
      tolerations:
        - key: kubernetes.io/arch
          operator: Equal
          value: amd64
          effect: NoSchedule
      containers:
        - name: app
          image: myorg/legacy-app:amd64

Request a PersistentVolume

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi

Expose an Application (Ingress)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: traefik
  tls:
    - hosts: [myapp.sites.kup6s.com]
      secretName: myapp-tls
  rules:
    - host: myapp.sites.kup6s.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80

Create a PostgreSQL Database

Recommended: Use dedicated database node for isolation:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: myapp-db
  namespace: databases
spec:
  instances: 3  # HA with replication

  # Schedule to dedicated database node (ARM64)
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: workload
                operator: In
                values:
                  - database

  # Tolerate database node taint
  tolerations:
    - key: workload
      operator: Equal
      value: database
      effect: NoSchedule

  storage:
    size: 10Gi
    storageClass: longhorn

  postgresql:
    parameters:
      max_connections: "100"
      shared_buffers: "256MB"

Simple (shared worker node):

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: simple-db
spec:
  instances: 1
  storage:
    size: 10Gi
    storageClass: longhorn

View Logs (Loki)

  1. Open Grafana: https://grafana.ops.kup6s.net

  2. Go to Explore

  3. Select Loki data source

  4. Query: {namespace="your-namespace"}

Monitor Application Metrics

  1. Add Prometheus annotations to Service

  2. Create ServiceMonitor (optional)

  3. View in Grafana dashboards


Getting Help

Cluster Administration Issues

  • Contact: Cluster admin team

  • Topics: Node issues, cluster upgrades, infrastructure

Application Deployment Issues

  • ArgoCD dashboard for sync status

  • Logs via kubectl logs or Grafana/Loki

  • Metrics via Grafana

CDK8S Development