Cluster Capabilities¶
Target Audience: Developers writing CDK8S charts and deploying applications via ArgoCD
This document describes the capabilities, services, and features available in the kup6s.com Kubernetes cluster for application developers. If you’re deploying applications via ArgoCD or writing CDK8S charts, this is your reference guide.
Cluster Overview¶
Platform: K3S on Hetzner Cloud
Architecture: Multi-architecture (ARM64 primary, AMD64 available)
High Availability: 3 control plane nodes across 3 data centers
Deployment Method: GitOps via ArgoCD
Kubernetes Version: v1.31.x (automatically managed)
Compute Resources¶
Node Pools¶
Control Plane Nodes (3 nodes - not for workloads):
3x ARM64 nodes (CAX21: 4 vCPU, 8GB RAM)
Locations: fsn1, nbg1, hel1
Taints: Workloads not scheduled here by default
Worker Nodes (6 nodes - for your applications):
ARM64 Workers (Primary - 4 nodes):
1x ARM64 large (CAX31: 8 vCPU, 16GB RAM, 160GB SSD)
2x ARM64 medium (CAX21: 4 vCPU, 8GB RAM, 80GB SSD)
1x ARM64 database (CAX21: 4 vCPU, 8GB RAM) - Dedicated for PostgreSQL
Location: hel1
Default scheduling target - workloads schedule here unless specified otherwise
AMD64 Workers (Legacy - 2 nodes):
1x AMD64 medium (CPX31: 4 vCPU, 8GB RAM, 160GB SSD)
1x AMD64 small (CPX21: 3 vCPU, 4GB RAM, 80GB SSD)
Location: hel1
Tainted - requires explicit nodeSelector to use (see examples below)
Architecture Support¶
The cluster supports both ARM64 and AMD64 architectures:
✅ linux/arm64 (primary, recommended - better performance and cost)
✅ linux/amd64 (available for legacy workloads - requires nodeSelector)
Scheduling Behavior:
ARM64 nodes: Workloads schedule here by default (untainted)
AMD64 nodes: Tainted with kubernetes.io/arch=amd64:NoSchedule - requires explicit targeting
Best Practice:
Prefer ARM64: Use multi-arch or ARM64 images when possible (cheaper nodes, better performance)
AMD64 fallback: Use for legacy applications that don’t have ARM64 builds yet
Multi-platform builds: Build images for both architectures:
docker buildx build --platform linux/amd64,linux/arm64 -t myapp:latest --push .
Test both: If using multi-arch images, verify on both architectures
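For a multi-arch image that should prefer ARM64 but may fall back to the tainted AMD64 nodes, a minimal scheduling sketch (deployment and image names are hypothetical):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: multi-arch-app                 # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: multi-arch-app
  template:
    metadata:
      labels:
        app: multi-arch-app
    spec:
      affinity:
        nodeAffinity:
          # Prefer ARM64 nodes, but do not require them
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: kubernetes.io/arch
                    operator: In
                    values: [arm64]
      # Allow scheduling onto the tainted AMD64 nodes as a fallback
      tolerations:
        - key: kubernetes.io/arch
          operator: Equal
          value: amd64
          effect: NoSchedule
      containers:
        - name: app
          image: myorg/multi-arch-app:latest   # must be a multi-arch image
Without the toleration, pods will only ever land on ARM64 nodes.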
Storage Options¶
1. Longhorn (Default Persistent Storage)¶
Use for: Stateful applications, databases, persistent volumes
StorageClass: longhorn (default)
Access Modes: ReadWriteOnce (RWO), ReadWriteMany (RWX), ReadOnlyMany (ROX)
File System: XFS
Replication: 3 replicas across nodes (configurable)
Backup: Automatic backup to Hetzner Storage Box (CIFS)
Snapshots: Supported (see the VolumeSnapshot sketch after the example PVC)
Capacity: Depends on node local storage (80-160GB per node)
Example PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
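Volume snapshots use the standard snapshot.storage.k8s.io API. A minimal sketch, assuming a Longhorn-backed VolumeSnapshotClass is installed (the class name below is an assumption - list the actual classes with kubectl get volumesnapshotclass):
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-app-data-snapshot
spec:
  volumeSnapshotClassName: longhorn          # assumed class name - verify on the cluster
  source:
    persistentVolumeClaimName: my-app-data   # the PVC from the example above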
Special Use Cases:
For Kafka workloads: Use the longhorn-kafka StorageClass (dedicated for high-throughput workloads)
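For example, a claim for a Kafka broker volume only differs in the storage class (the claim name and size are illustrative):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kafka-broker-data      # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-kafka
  resources:
    requests:
      storage: 50Gi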
2. SMB/CIFS Storage (Hetzner Storage Box)¶
Use for: Shared file storage, backups, multi-pod read/write
StorageClass: hetzner-smb
Access Modes: ReadWriteMany (RWX)
Capacity: Large (Hetzner Storage Box)
Performance: Network-based (slower than Longhorn)
Example PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-uploads
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: hetzner-smb
  resources:
    requests:
      storage: 100Gi
3. S3 Object Storage (via Crossplane)¶
Use for: Object storage, backups, log storage, static assets
Provider: Hetzner Object Storage (S3-compatible)
Management: Crossplane-managed buckets
Access: Via S3 API (AWS SDK compatible)
How to Request a Bucket:
Create a Crossplane Bucket resource (see How-To: Create S3 Bucket)
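The exact resource shape depends on which Crossplane provider the cluster uses, so the How-To above is authoritative. Purely as a hedged sketch, assuming an S3-style Bucket CRD from the Upbound provider-aws family pointed at the Hetzner endpoint (all names hypothetical):
apiVersion: s3.aws.upbound.io/v1beta1   # assumption - the actual group/version depends on the installed provider
kind: Bucket
metadata:
  name: myapp-assets            # hypothetical bucket name
spec:
  forProvider:
    region: fsn1                # assumption - Hetzner Object Storage location
  providerConfigRef:
    name: hetzner-s3            # hypothetical ProviderConfig name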
Networking & Ingress¶
Ingress Controller: Traefik¶
Default ingress controller for HTTP/HTTPS traffic
Version: v3.4.1 (pinned)
Features:
Automatic HTTPS via Let’s Encrypt (cert-manager)
HTTP to HTTPS redirect (enabled by default)
Access logs enabled
Proxy protocol support
Creating an Ingress:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - myapp.sites.kup6s.com
      secretName: myapp-tls
  rules:
    - host: myapp.sites.kup6s.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
TLS/SSL Certificates (cert-manager)¶
Automatic certificate management via Let’s Encrypt
Cluster Issuer: letsencrypt-prod
DNS Challenge: Not configured (use HTTP-01 challenge)
Renewal: Automatic (30 days before expiry)
Usage: Add annotation to Ingress (see example above)
Domain Structure¶
Available domain patterns:
*.sites.kup6s.com - Customer/project websites
*.ops.kup6s.net - Infrastructure tools (ArgoCD, Grafana, etc.)
*.nodes.kup6s.com - Node-level DNS (internal only)
Network Policy & Observability¶
CNI: Cilium (eBPF-based) with native routing mode
Pod-to-Pod Traffic: High-performance eBPF networking
Network Policies: Supported (standard Kubernetes NetworkPolicy + Cilium NetworkPolicy for L7 - see the example below)
Hubble Observability: ✅ Enabled
Service dependency mapping (automatic service maps)
Flow visibility (L3/L4/L7 traffic inspection)
Network troubleshooting (DNS, HTTP, TCP flows)
Hubble UI available for graphical network visualization
Metrics exported to Prometheus
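As an example, the following standard NetworkPolicy sketch (hypothetical labels) only allows pods labeled role: frontend in the same namespace to reach my-app on port 8080:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-only       # hypothetical policy name
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: frontend
      ports:
        - protocol: TCP
          port: 8080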
Databases¶
CloudNativePG (PostgreSQL Operator)¶
Managed PostgreSQL databases via Kubernetes operator
Operator: CloudNativePG (CNPG) v1.27.0
Backup Plugin: Barman Cloud Plugin v0.7.0 (installed)
High Availability: Supported (with replication)
Backups: Integrated with S3/Longhorn via Barman Cloud Plugin
Monitoring: Prometheus metrics
Creating a PostgreSQL Cluster:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: myapp-db
spec:
  instances: 3
  storage:
    storageClass: longhorn
    size: 20Gi
  postgresql:
    parameters:
      max_connections: "100"
Connection: Use generated secrets for connection strings
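CloudNativePG publishes an application-user Secret (conventionally <cluster-name>-app, so myapp-db-app for the cluster above). A sketch of wiring its connection URI into a Deployment (image name hypothetical; verify the secret name and keys on the cluster):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: myorg/myapp:latest            # hypothetical image
          env:
            # Connection URI from the auto-generated application secret
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: myapp-db-app           # assumed <cluster-name>-app convention
                  key: uri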
Note: For backup configuration using the Barman Cloud Plugin, create an ObjectStore resource and reference it in your cluster’s plugins section. The plugin is deployed via 60-B-barman-plugin.yaml.tpl. See CloudNativePG documentation for details.
Monitoring & Observability¶
Prometheus + Grafana (kube-prometheus-stack)¶
Full observability stack pre-installed
Access:
Grafana: https://grafana.ops.kup6s.net
Prometheus: Internal cluster access only
Metrics Collection:
All cluster components monitored by default
Your apps: Add Prometheus annotations to expose metrics
ServiceMonitor Example:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 30s
Loki (Log Aggregation)¶
Centralized logging with S3 backend
Storage: Hetzner S3 Object Storage (Crossplane-managed)
Access: Via Grafana (Explore → Loki)
Retention: Configurable (check with cluster admin)
Log Collection:
Container logs automatically collected
Query via LogQL in Grafana
Example Query:
{namespace="my-namespace", pod=~"my-app-.*"}
Security Features¶
Secrets Encryption at Rest¶
✅ Kubernetes secrets encrypted in etcd (AES-CBC)
✅ Automatic encryption for all Secret resources
No action required from developers
Pod Security¶
Pod Security Standards: Baseline enforced
Security Contexts: Supported and recommended
Example:
# Pod-level settings
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
containers:
  - name: app
    # Container-level settings
    securityContext:
      readOnlyRootFilesystem: true
      capabilities:
        drop:
          - ALL
Network Encryption¶
✅ Pod-to-pod traffic secured (Cilium eBPF with native routing)
✅ Ingress traffic encrypted (TLS via cert-manager)
✅ Secrets encrypted at rest (etcd encryption enabled)
GitOps Deployment (ArgoCD)¶
ArgoCD Access¶
Dashboard: https://argocd.ops.kup6s.net
Deployment Workflow¶
Write the CDK8S chart in the argoapps/ directory
Register it in the registry (apps/registry.ts)
Generate manifests: npm run build
Apply the ArgoCD Application: kubectl apply -f dist/CHARTNAME.yaml
ArgoCD syncs your application automatically
ArgoCD Application Structure¶
Example CDK8S Chart:
import { Construct } from 'constructs';
import { Chart } from 'cdk8s';
import { ArgoCdApplication } from '@opencdk8s/cdk8s-argocd-resources';

export class MyAppChart extends Chart {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    new ArgoCdApplication(this, 'myapp', {
      metadata: {
        name: 'myapp',
        namespace: 'argocd',
      },
      spec: {
        project: 'default',
        source: {
          repoUrl: 'https://github.com/your-org/your-repo',
          path: 'k8s/myapp',
          targetRevision: 'main',
        },
        destination: {
          server: 'https://kubernetes.default.svc',
          namespace: 'myapp',
        },
        syncPolicy: {
          automated: {
            prune: true,
            selfHeal: true,
          },
        },
      },
    });
  }
}
Resource Quotas & Limits¶
No Hard Quotas (Currently)¶
No namespace-level resource quotas configured
Best Practice: Always set resource requests/limits in your pods
Recommended:
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
Node Capacity (Total)¶
Worker Node Resources:
ARM64 Workers: 20 vCPU, 40GB RAM (primary capacity, includes the dedicated database node)
AMD64 Workers: 7 vCPU, 12GB RAM (legacy/transition)
Database Node: 4 vCPU, 8GB RAM (dedicated PostgreSQL, counted in the ARM64 figure above)
Total: 27 vCPU, 52GB RAM
Storage: ~560GB local (Longhorn pool across all workers)
Recommended Allocation:
75% of workloads → ARM64 (cheaper, better performance)
25% of workloads → AMD64 (legacy apps during migration)
Databases → Dedicated node (isolated from web app contention)
Plan accordingly for your application’s resource needs.
Service Mesh & Advanced Networking¶
Cilium Advanced Features¶
The cluster uses Cilium CNI which provides service mesh-like capabilities without a separate service mesh:
✅ L7 Network Policies: HTTP/gRPC/Kafka protocol-aware policies
✅ Service Mesh Lite: Cilium provides observability and L7 policies without sidecar proxies
✅ Hubble Observability: Service dependency maps, flow visualization, network troubleshooting
✅ High Performance: eBPF-based networking bypasses iptables for better performance
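For instance, an L7-aware policy can restrict which HTTP methods and paths are reachable. A sketch using the cilium.io/v2 CiliumNetworkPolicy API (labels and paths are hypothetical):
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-get-api-only          # hypothetical policy name
spec:
  endpointSelector:
    matchLabels:
      app: my-app
  ingress:
    - fromEndpoints:
        - matchLabels:
            role: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: "/api/.*"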
Accessing Hubble UI:
# Via Cilium CLI (recommended)
cilium hubble ui
# Or via kubectl port-forward
kubectl port-forward -n kube-system service/hubble-ui 12000:80
# Then open http://localhost:12000
Not Available¶
❌ Full service mesh (Istio/Linkerd) with sidecar proxies
❌ Advanced traffic splitting/canary deployments (use ArgoCD Rollouts instead)
Use Traefik features for:
Load balancing
Path-based routing
Header-based routing
Rate limiting (via middleware)
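Rate limiting, for example, is configured through a Traefik Middleware CRD and attached to an Ingress via an annotation. A sketch (names and limits are illustrative):
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: rate-limit                # hypothetical middleware name
  namespace: my-namespace
spec:
  rateLimit:
    average: 50                   # example: average requests per second
    burst: 100
Reference it from the Ingress with the annotation traefik.ingress.kubernetes.io/router.middlewares: my-namespace-rate-limit@kubernetescrd (format: <namespace>-<name>@kubernetescrd).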
Backup & Disaster Recovery¶
Automatic Backups¶
etcd: Daily S3 backups (cluster state)
Longhorn: Recurring backups to Storage Box
PostgreSQL: Configure per-database (CNPG backup)
Application Backups¶
Your responsibility:
Application data backup strategy
Database backup verification
Backup testing
Limitations & Considerations¶
Architecture Constraints¶
✅ ARM64 primary: Most workloads run on ARM64 (cheaper, better performance)
✅ AMD64 available: Legacy workloads can run on AMD64 nodes (with nodeSelector)
⚠️ AMD64 nodes are tainted: Workloads won’t schedule there by default - must explicitly target
⚠️ Mixed-arch complexity: Need to manage which workloads run on which architecture
💡 Migration path: Start on AMD64, gradually move to ARM64 for cost optimization
Storage Performance¶
Longhorn: Good for general workloads
Longhorn-Kafka: Optimized for high-throughput
SMB/CIFS: Slower, best for shared/backup use
Scaling¶
Node scaling: Contact cluster admin
HPA (Horizontal Pod Autoscaler): Supported
VPA (Vertical Pod Autoscaler): Not configured
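A minimal HPA sketch for a hypothetical Deployment, scaling on CPU utilization (this requires resource requests to be set, as recommended above):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70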
External Services¶
External databases: Not directly supported (use port-forward or VPN)
Outbound traffic: Unrestricted (no egress filtering)
Quick Reference: Common Tasks¶
Deploy an Application¶
Default (ARM64):
Create namespace (if needed)
Create ArgoCD Application (CDK8S or YAML)
Apply: kubectl apply -f dist/myapp.yaml
Monitor in ArgoCD dashboard
Workload automatically schedules to ARM64 nodes (no special config needed)
AMD64-only Application (legacy apps during migration):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: legacy-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: legacy-app
  template:
    metadata:
      labels:
        app: legacy-app
    spec:
      nodeSelector:
        kubernetes.io/arch: amd64
      tolerations:
        - key: kubernetes.io/arch
          operator: Equal
          value: amd64
          effect: NoSchedule
      containers:
        - name: app
          image: myorg/legacy-app:amd64
Request a PersistentVolume¶
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
Expose an Application (Ingress)¶
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: traefik
  tls:
    - hosts: [myapp.sites.kup6s.com]
      secretName: myapp-tls
  rules:
    - host: myapp.sites.kup6s.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
Create a PostgreSQL Database¶
Recommended: Use dedicated database node for isolation:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: myapp-db
  namespace: databases
spec:
  instances: 3  # HA with replication
  # Schedule to the dedicated database node (ARM64)
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: workload
                operator: In
                values:
                  - database
    # Tolerate the database node taint
    tolerations:
      - key: workload
        operator: Equal
        value: database
        effect: NoSchedule
  storage:
    size: 10Gi
    storageClass: longhorn
  postgresql:
    parameters:
      max_connections: "100"
      shared_buffers: "256MB"
Simple (shared worker node):
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: simple-db
spec:
  instances: 1
  storage:
    size: 10Gi
    storageClass: longhorn
View Logs (Loki)¶
Open Grafana: https://grafana.ops.kup6s.net
Go to Explore
Select Loki data source
Query:
{namespace="your-namespace"}
Monitor Application Metrics¶
Add Prometheus annotations to Service
Create ServiceMonitor (optional)
View in Grafana dashboards
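With the ServiceMonitor route, scraping is driven by labels and a named port rather than annotations. A sketch of a Service that the ServiceMonitor example from the Monitoring section would select (port numbers are illustrative):
apiVersion: v1
kind: Service
metadata:
  name: my-app
  labels:
    app: my-app            # matched by the ServiceMonitor selector
spec:
  selector:
    app: my-app
  ports:
    - name: metrics        # matched by the ServiceMonitor endpoint port name
      port: 9100           # illustrative - use your app's metrics port
      targetPort: 9100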
Getting Help¶
Cluster Administration Issues¶
Contact: Cluster admin team
Topics: Node issues, cluster upgrades, infrastructure
Application Deployment Issues¶
ArgoCD dashboard for sync status
Logs via kubectl logs or Grafana/Loki
Metrics via Grafana
CDK8S Development¶
See: argoapps/README.md
Examples: argoapps/apps/ directory