Reference

Cluster Capabilities

Target Audience: Developers writing CDK8S charts and deploying applications via ArgoCD

This document describes the capabilities, services, and features available in the kup6s.com Kubernetes cluster for application developers. If you’re deploying applications via ArgoCD or writing CDK8S charts, this is your reference guide.


Cluster Overview

  • Platform: K3S on Hetzner Cloud

  • Architecture: Multi-architecture (ARM64 primary, AMD64 available)

  • High Availability: 3 control plane nodes across 3 data centers

  • Deployment Method: GitOps via ArgoCD

  • Kubernetes Version: v1.31.x (automatically managed)


Compute Resources

Node Pools

Control Plane Nodes (3 nodes - not for workloads):

  • 3x ARM64 nodes (CAX21: 4 vCPU, 8GB RAM)

  • Locations: fsn1, nbg1, hel1

  • Taints: Workloads not scheduled here by default

Worker Nodes (6 nodes - for your applications):

ARM64 Workers (Primary - 4 nodes):

  • 1x ARM64 large (CAX31: 8 vCPU, 16GB RAM, 160GB SSD)

  • 2x ARM64 medium (CAX21: 4 vCPU, 8GB RAM, 80GB SSD)

  • 1x ARM64 database (CAX21: 4 vCPU, 8GB RAM) - Dedicated for PostgreSQL

  • Location: hel1

  • Default scheduling target - workloads schedule here unless specified otherwise

AMD64 Workers (Legacy - 2 nodes):

  • 1x AMD64 medium (CPX31: 4 vCPU, 8GB RAM, 160GB SSD)

  • 1x AMD64 small (CPX21: 3 vCPU, 4GB RAM, 80GB SSD)

  • Location: hel1

  • Tainted - requires explicit nodeSelector to use (see examples below)

Architecture Support

The cluster supports both ARM64 and AMD64 architectures:

  • linux/arm64 (primary, recommended - better performance and cost)

  • linux/amd64 (available for legacy workloads - requires nodeSelector)

Scheduling Behavior:

  • ARM64 nodes: Workloads schedule here by default (untainted)

  • AMD64 nodes: Tainted with kubernetes.io/arch=amd64:NoSchedule - requires explicit targeting

Best Practice:

  • Prefer ARM64: Use multi-arch or ARM64 images when possible (cheaper nodes, better performance)

  • AMD64 fallback: Use for legacy applications that don’t have ARM64 builds yet

  • Multi-platform builds: Build images for both architectures:

    docker buildx build --platform linux/amd64,linux/arm64 -t myapp:latest --push .
    
  • Test both: If using multi-arch images, verify on both architectures


Storage Options

1. Longhorn (Default Persistent Storage)

Use for: Stateful applications, databases, persistent volumes

  • StorageClass: longhorn (default)

  • Access Modes: ReadWriteOnce (RWO), ReadWriteMany (RWX), ReadOnlyMany (ROX)

  • File System: XFS

  • Replication: 3 replicas across nodes (configurable)

  • Backup: Automatic backup to Hetzner Storage Box (CIFS)

  • Snapshots: Supported

  • Capacity: Depends on node local storage (80-160GB per node)

Example PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi

Special Use Cases:

  • For Kafka workloads: Use the longhorn-kafka StorageClass (dedicated to high-throughput workloads); see the sketch below
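
A minimal PVC sketch using the longhorn-kafka class (the claim name and size are placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kafka-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-kafka
  resources:
    requests:
      storage: 50Gi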

2. SMB/CIFS Storage (Hetzner Storage Box)

Use for: Shared file storage, backups, multi-pod read/write

  • StorageClass: hetzner-smb

  • Access Modes: ReadWriteMany (RWX)

  • Capacity: Large (Hetzner Storage Box)

  • Performance: Network-based (slower than Longhorn)

Example PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-uploads
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: hetzner-smb
  resources:
    requests:
      storage: 100Gi

3. S3 Object Storage (via Crossplane)

Use for: Object storage, backups, log storage, static assets

  • Provider: Hetzner Object Storage (S3-compatible)

  • Management: Crossplane-managed buckets

  • Access: Via S3 API (AWS SDK compatible)

How to Request a Bucket: Create a Crossplane Bucket resource (see How-To: Create S3 Bucket)


Networking & Ingress

Ingress Controller: Traefik

Default ingress controller for HTTP/HTTPS traffic

  • Version: v3.4.1 (pinned)

  • Features:

    • Automatic HTTPS via Let’s Encrypt (cert-manager)

    • HTTP to HTTPS redirect (enabled by default)

    • Access logs enabled

    • Proxy protocol support

Creating an Ingress:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - myapp.sites.kup6s.com
      secretName: myapp-tls
  rules:
    - host: myapp.sites.kup6s.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80

TLS/SSL Certificates (cert-manager)

Automatic certificate management via Let’s Encrypt

  • Cluster Issuer: letsencrypt-prod

  • DNS Challenge: Not configured (use HTTP-01 challenge)

  • Renewal: Automatic (30 days before expiry)

Usage: Add annotation to Ingress (see example above)

Domain Structure

Available domain patterns:

  • *.sites.kup6s.com - Customer/project websites

  • *.ops.kup6s.net - Infrastructure tools (ArgoCD, Grafana, etc.)

  • *.nodes.kup6s.com - Node-level DNS (internal only)

Network Policy & Observability

  • CNI: Cilium (eBPF-based) with native routing mode

  • Pod-to-Pod Traffic: High-performance eBPF networking

  • Network Policies: Supported (standard Kubernetes NetworkPolicy, plus CiliumNetworkPolicy for L7 - basic example after this list)

  • Hubble Observability: ✅ Enabled

    • Service dependency mapping (automatic service maps)

    • Flow visibility (L3/L4/L7 traffic inspection)

    • Network troubleshooting (DNS, HTTP, TCP flows)

    • Hubble UI available for graphical network visualization

    • Metrics exported to Prometheus
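
As referenced above, a minimal standard NetworkPolicy sketch that restricts ingress to a pod to traffic from its own namespace (name and labels are placeholders):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}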


Databases

CloudNativePG (PostgreSQL Operator)

Managed PostgreSQL databases via Kubernetes operator

  • Operator: CloudNativePG (CNPG) v1.27.0

  • Backup Plugin: Barman Cloud Plugin v0.7.0 (installed)

  • High Availability: Supported (with replication)

  • Backups: Integrated with S3/Longhorn via Barman Cloud Plugin

  • Monitoring: Prometheus metrics

Creating a PostgreSQL Cluster:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: myapp-db
spec:
  instances: 3
  storage:
    storageClass: longhorn
    size: 20Gi
  postgresql:
    parameters:
      max_connections: "100"

Connection: Use generated secrets for connection strings
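
CNPG typically generates a <cluster-name>-app Secret (for example myapp-db-app) containing the application user's credentials; the exact key names can vary by CNPG version, so verify them before relying on this sketch:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: myorg/myapp:latest   # placeholder image
          env:
            - name: DATABASE_URI
              valueFrom:
                secretKeyRef:
                  name: myapp-db-app   # Secret generated by CNPG for the "myapp-db" cluster
                  key: uri             # key name assumed - verify the actual keys in the Secret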

Note: For backup configuration using the Barman Cloud Plugin, create an ObjectStore resource and reference it in your cluster’s plugins section. The plugin is deployed via 60-B-barman-plugin.yaml.tpl. See CloudNativePG documentation for details.
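
A rough sketch of that shape, assuming the plugin's documented ObjectStore CRD and plugin name; verify the exact apiVersion, plugin name, endpoint, and credential secret against the installed plugin version (v0.7.0) and the cluster's How-To docs - all names below are placeholders:

apiVersion: barmancloud.cnpg.io/v1
kind: ObjectStore
metadata:
  name: myapp-db-backups
spec:
  configuration:
    destinationPath: s3://my-backup-bucket/myapp-db        # placeholder bucket path
    endpointURL: https://fsn1.your-objectstorage.com       # placeholder - verify the Hetzner endpoint
    s3Credentials:
      accessKeyId:
        name: backup-s3-credentials                        # placeholder secret name
        key: ACCESS_KEY_ID
      secretAccessKey:
        name: backup-s3-credentials
        key: SECRET_ACCESS_KEY
---
# Reference the ObjectStore from the Cluster's plugins section
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: myapp-db
spec:
  instances: 3
  plugins:
    - name: barman-cloud.cloudnative-pg.io
      isWALArchiver: true
      parameters:
        barmanObjectName: myapp-db-backups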


Monitoring & Observability

Prometheus + Grafana (kube-prometheus-stack)

Full observability stack pre-installed

Access:

  • Grafana: https://grafana.ops.kup6s.net

  • Prometheus: Internal cluster access only

Metrics Collection:

  • All cluster components monitored by default

  • Your apps: Add Prometheus annotations to expose metrics

ServiceMonitor Example:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 30s
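
The ServiceMonitor above selects a Service by label and scrapes a named port, so the backing Service needs a matching app: my-app label and a port named metrics. Depending on how the Prometheus operator's selectors are configured, the ServiceMonitor itself may also need a specific label (for example the Helm release label); check with the cluster admin. A sketch of a matching Service:

apiVersion: v1
kind: Service
metadata:
  name: my-app
  labels:
    app: my-app          # matched by the ServiceMonitor's selector
spec:
  selector:
    app: my-app
  ports:
    - name: metrics      # port name referenced by the ServiceMonitor
      port: 9090
      targetPort: 9090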

Loki (Log Aggregation)

Centralized logging with S3 backend

  • Storage: Hetzner S3 Object Storage (Crossplane-managed)

  • Access: Via Grafana (Explore → Loki)

  • Retention: Configurable (check with cluster admin)

Log Collection:

  • Container logs automatically collected

  • Query via LogQL in Grafana

Example Query:

{namespace="my-namespace", pod=~"my-app-.*"}

Security Features

Secrets Encryption at Rest

  • ✅ Kubernetes secrets encrypted in etcd (AES-CBC)

  • ✅ Automatic encryption for all Secret resources

  • No action required from developers

Pod Security

  • Pod Security Standards: Baseline enforced

  • Security Contexts: Supported and recommended

Example (note: fsGroup is a pod-level field, while capabilities and readOnlyRootFilesystem are container-level fields):

Pod-level securityContext:

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

Container-level securityContext:

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  readOnlyRootFilesystem: true
  capabilities:
    drop:
      - ALL

Network Encryption

  • ✅ Pod-to-pod traffic secured (Cilium eBPF with native routing)

  • ✅ Ingress traffic encrypted (TLS via cert-manager)

  • ✅ Secrets encrypted at rest (etcd encryption enabled)


GitOps Deployment (ArgoCD)

ArgoCD Access

Dashboard: https://argocd.ops.kup6s.net

Deployment Workflow

  1. Write CDK8S Chart in argoapps/ directory

  2. Register in registry (apps/registry.ts)

  3. Generate manifests: npm run build

  4. Apply ArgoCD Application: kubectl apply -f dist/CHARTNAME.yaml

  5. ArgoCD syncs your application automatically

ArgoCD Application Structure

Example CDK8S Chart:

import { Construct } from 'constructs';
import { Chart } from 'cdk8s';
import { ArgoCdApplication } from '@opencdk8s/cdk8s-argocd-resources';

export class MyAppChart extends Chart {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    new ArgoCdApplication(this, 'myapp', {
      metadata: {
        name: 'myapp',
        namespace: 'argocd',
      },
      spec: {
        project: 'default',
        source: {
          repoUrl: 'https://github.com/your-org/your-repo',
          path: 'k8s/myapp',
          targetRevision: 'main',
        },
        destination: {
          server: 'https://kubernetes.default.svc',
          namespace: 'myapp',
        },
        syncPolicy: {
          automated: {
            prune: true,
            selfHeal: true,
          },
        },
      },
    });
  }
}

Resource Quotas & Limits

No Hard Quotas (Currently)

  • No namespace-level resource quotas configured

  • Best Practice: Always set resource requests/limits in your pods

Recommended:

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

Node Capacity (Total)

Worker Node Resources:

  • ARM64 Workers (excluding database node): 16 vCPU, 32GB RAM (primary capacity)

  • AMD64 Workers: 7 vCPU, 12GB RAM (legacy/transition)

  • Database Node: 4 vCPU, 8GB RAM (dedicated PostgreSQL)

  • Total: 27 vCPU, 52GB RAM

  • Storage: ~560GB local (Longhorn pool across all workers)

Recommended Allocation:

  • 75% of workloads → ARM64 (cheaper, better performance)

  • 25% of workloads → AMD64 (legacy apps during migration)

  • Databases → Dedicated node (isolated from web app contention)

Plan accordingly for your application’s resource needs.


Service Mesh & Advanced Networking

Cilium Advanced Features

The cluster uses Cilium CNI which provides service mesh-like capabilities without a separate service mesh:

  • L7 Network Policies: HTTP/gRPC/Kafka protocol-aware policies (example after this list)

  • Service Mesh Lite: Cilium provides observability and L7 policies without sidecar proxies

  • Hubble Observability: Service dependency maps, flow visualization, network troubleshooting

  • High Performance: eBPF-based networking bypasses iptables for better performance
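
As referenced in the list above, a sketch of an L7-aware CiliumNetworkPolicy that allows only GET requests from a frontend to an API (labels, port, and path are placeholders; verify the syntax against the Cilium version running in the cluster):

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-allow-get
spec:
  endpointSelector:
    matchLabels:
      app: my-api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: my-frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: "/api/.*"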

Accessing Hubble UI:

# Via Cilium CLI (recommended)
cilium hubble ui

# Or via kubectl port-forward
kubectl port-forward -n kube-system service/hubble-ui 12000:80
# Then open http://localhost:12000

Not Available

  • ❌ Full service mesh (Istio/Linkerd) with sidecar proxies

  • ❌ Advanced traffic splitting/canary deployments (use Argo Rollouts instead)

Use Traefik features for:

  • Load balancing

  • Path-based routing

  • Header-based routing

  • Rate limiting (via middleware - sketch after this list)
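
As noted above, rate limiting is configured with a Traefik Middleware. A hedged sketch for Traefik v3 (the middleware name, namespace, and limits are placeholders; the annotation format assumes the middleware is defined via the Kubernetes CRD provider):

apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: my-app-ratelimit
  namespace: my-namespace
spec:
  rateLimit:
    average: 100   # average requests per second
    burst: 50

# Attach it to an Ingress via annotation (format: <namespace>-<name>@kubernetescrd):
#   traefik.ingress.kubernetes.io/router.middlewares: my-namespace-my-app-ratelimit@kubernetescrd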


Backup & Disaster Recovery

Automatic Backups

  • etcd: Daily S3 backups (cluster state)

  • Longhorn: Recurring backups to Storage Box

  • PostgreSQL: Configure per-database (CNPG backup)

Application Backups

Your responsibility:

  • Application data backup strategy

  • Database backup verification

  • Backup testing


Limitations & Considerations

Architecture Constraints

  • ARM64 primary: Most workloads run on ARM64 (cheaper, better performance)

  • AMD64 available: Legacy workloads can run on AMD64 nodes (with nodeSelector)

  • ⚠️ AMD64 nodes are tainted: Workloads won’t schedule there by default - must explicitly target

  • ⚠️ Mixed-arch complexity: Need to manage which workloads run on which architecture

  • 💡 Migration path: Start on AMD64, gradually move to ARM64 for cost optimization

Storage Performance

  • Longhorn: Good for general workloads

  • Longhorn-Kafka: Optimized for high-throughput

  • SMB/CIFS: Slower, best for shared/backup use

Scaling

  • Node scaling: Contact cluster admin

  • HPA (Horizontal Pod Autoscaler): Supported (example after this list)

  • VPA (Vertical Pod Autoscaler): Not configured
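
As mentioned above, HPA is supported. A minimal sketch scaling a Deployment on CPU utilization (names and thresholds are placeholders; the target Deployment must set CPU requests):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70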

External Services

  • External databases: Not directly supported (use port-forward or VPN)

  • Outbound traffic: Unrestricted (no egress filtering)


Quick Reference: Common Tasks

Deploy an Application

Default (ARM64):

  1. Create namespace (if needed)

  2. Create ArgoCD Application (CDK8S or YAML - a raw YAML example follows these steps)

  3. Apply: kubectl apply -f dist/myapp.yaml

  4. Monitor in ArgoCD dashboard

  5. Workload automatically schedules to ARM64 nodes (no special config needed)
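
If you prefer raw YAML over CDK8S, an equivalent Application manifest looks roughly like this (repository URL, path, and namespaces are placeholders):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/your-repo
    path: k8s/myapp
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true
      selfHeal: true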

AMD64-only Application (legacy apps during migration):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: legacy-app
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/arch: amd64
      tolerations:
        - key: kubernetes.io/arch
          operator: Equal
          value: amd64
          effect: NoSchedule
      containers:
        - name: app
          image: myorg/legacy-app:amd64

Request a PersistentVolume

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi

Expose an Application (Ingress)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: traefik
  tls:
    - hosts: [myapp.sites.kup6s.com]
      secretName: myapp-tls
  rules:
    - host: myapp.sites.kup6s.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80

Create a PostgreSQL Database

Recommended: Use dedicated database node for isolation:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: myapp-db
  namespace: databases
spec:
  instances: 3  # HA with replication

  # Schedule to dedicated database node (ARM64)
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: workload
                operator: In
                values:
                  - database

  # Tolerate database node taint
  tolerations:
    - key: workload
      operator: Equal
      value: database
      effect: NoSchedule

  storage:
    size: 10Gi
    storageClass: longhorn

  postgresql:
    parameters:
      max_connections: "100"
      shared_buffers: "256MB"

Simple (shared worker node):

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: simple-db
spec:
  instances: 1
  storage:
    size: 10Gi
    storageClass: longhorn

View Logs (Loki)

  1. Open Grafana: https://grafana.ops.kup6s.net

  2. Go to Explore

  3. Select Loki data source

  4. Query: {namespace="your-namespace"}

Monitor Application Metrics

  1. Add Prometheus annotations to Service

  2. Create ServiceMonitor (optional)

  3. View in Grafana dashboards


Getting Help

Cluster Administration Issues

  • Contact: Cluster admin team

  • Topics: Node issues, cluster upgrades, infrastructure

Application Deployment Issues

  • ArgoCD dashboard for sync status

  • Logs via kubectl logs or Grafana/Loki

  • Metrics via Grafana

CDK8S Development