Architecture Overview
This document explains the architectural decisions behind the Nextcloud deployment on kup6s.
Design Principles
2. S3 Primary Storage
User files are stored in Hetzner S3 Object Storage, not in Kubernetes volumes:
Why S3?
Scalability: Unlimited storage without managing volumes
Cost: Object storage cheaper than block storage for large files
Durability: Hetzner provides 99.999999999% (11 nines) durability
Backup: Built-in versioning and lifecycle policies
Migration: Easy to migrate between clusters (just move S3 credentials)
Trade-offs:
Slightly higher latency for file access vs. local storage
Dependency on external service (Hetzner S3)
Network egress costs for file transfers
3. CloudNativePG for PostgreSQL
CloudNativePG (CNPG) operator manages PostgreSQL clusters:
Why CNPG?
High Availability: Automatic failover with replica promotion
Automated Backups: Continuous archiving to S3 with Barman Cloud Plugin
Point-in-Time Recovery: Restore to any point within the backup retention window
Connection Pooling: PgBouncer built-in for efficient connection management
Observability: Prometheus metrics and comprehensive status reporting
Configuration:
2 PostgreSQL instances (1 primary + 1 standby)
Synchronous replication for data safety
Automated backups every 6 hours to S3
30-day backup retention
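The configuration listed above maps roughly onto a CloudNativePG `Cluster` plus a `ScheduledBackup`. The following is a minimal sketch under stated assumptions, not the manifest used in this deployment: the cluster name `postgres` is inferred from the `postgres-rw` service name, while the `longhorn` storage class name, the use of `minSyncReplicas`/`maxSyncReplicas` for synchronous replication, and the backup resource name are assumptions. The Barman Cloud Plugin object store definition (where the 30-day retention would be configured) is omitted.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres               # inferred from the postgres-rw service name
spec:
  instances: 2                 # 1 primary + 1 standby
  minSyncReplicas: 1           # assumption: one synchronous standby
  maxSyncReplicas: 1
  storage:
    size: 10Gi
    storageClass: longhorn     # assumed storage class name
---
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: postgres-every-6h      # hypothetical name
spec:
  # CNPG schedules use six fields (seconds first): at minute 0 of every 6th hour
  schedule: "0 0 */6 * * *"
  backupOwnerReference: self
  method: plugin               # assumption: backups go through the Barman Cloud Plugin
  pluginConfiguration:
    name: barman-cloud.cloudnative-pg.io
  cluster:
    name: postgres
```

With only two instances, requiring one synchronous standby means writes wait whenever the standby is unavailable, which is the usual price of choosing data safety over write availability.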
4. Single Replica Design (affenstall)
The nextcloudaffenstall instance runs with replicas: 1 due to storage constraints:
Why 1 Replica?
Nextcloud requires RWX (ReadWriteMany) storage for multiple replicas
Longhorn and Hetzner Cloud Volumes only support RWO (ReadWriteOnce)
SMB CSI (RWX) is available but likely too slow for the Nextcloud config/apps directory
Implications:
No horizontal scaling for Nextcloud pod
Brief downtime during pod restarts (~30 seconds)
Vertical scaling still possible (increase CPU/memory)
Future Options:
Migrate to SMB CSI for config/apps (requires performance testing)
Use separate StatefulSet per pod with local RWO volumes
Implement active-passive setup with manual failover
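The single-replica constraint described in this section could look like the following Deployment fragment. This is a sketch only: the resource, label, and claim names are hypothetical, and the `Recreate` strategy is an assumption (it is a common choice with RWO volumes and would be consistent with the brief downtime noted above, but the source does not state which strategy is used).

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nextcloud                  # hypothetical name
spec:
  replicas: 1                      # RWO volume can be attached to one node at a time
  strategy:
    type: Recreate                 # assumption: stop the old pod before starting the new one
  selector:
    matchLabels:
      app: nextcloud               # hypothetical label
  template:
    metadata:
      labels:
        app: nextcloud
    spec:
      containers:
        - name: nextcloud
          image: nextcloud:31.0.13
          volumeMounts:
            - name: html
              mountPath: /var/www/html
      volumes:
        - name: html
          persistentVolumeClaim:
            claimName: nextcloud-html   # hypothetical RWO claim
```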
5. ArgoCD GitOps Deployment
All Nextcloud instances are deployed via ArgoCD with automated sync:
Sync Waves ensure proper ordering:
Wave 0: Namespace creation
Wave 1: S3 buckets (Crossplane) + credentials (ESO)
Wave 2: PostgreSQL + Redis (data layer)
Wave 3: PgBouncer pooler
Wave 4: Nextcloud + Collabora + Whiteboard
Benefits:
Declarative configuration (Git as source of truth)
Automated sync on Git push
Rollback via Git revert
Audit trail of all changes
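Sync waves are set per resource with the `argocd.argoproj.io/sync-wave` annotation; ArgoCD applies lower waves first and waits for them to become healthy before moving on. The sketch below annotates two of the resources from the wave list. The namespace name, the cluster name `postgres` (inferred from the `postgres-rw` service), and the Pooler settings are assumptions for illustration.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: nextcloudaffenstall            # assumed namespace name
  annotations:
    argocd.argoproj.io/sync-wave: "0"  # wave 0: created first
---
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: postgres-pooler
  annotations:
    argocd.argoproj.io/sync-wave: "3"  # wave 3: after PostgreSQL (wave 2)
spec:
  cluster:
    name: postgres                     # inferred from the postgres-rw service name
  instances: 1                         # assumed pooler size
  type: rw
  pgbouncer:
    poolMode: session                  # CNPG default; actual mode not stated in the source
```

The annotation value must be a quoted string, since Kubernetes annotations only accept string values.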
Component Architecture
Nextcloud Hub
Image: nextcloud:31.0.13 (official Docker image)
Responsibilities:
WebDAV file protocol
Web UI for file management
User authentication and session management
Background jobs (cron)
File metadata in PostgreSQL
File content in S3
Environment Variables:
Database connection via POSTGRES_* env vars
Redis connection via REDIS_* env vars
S3 config via OBJECTSTORE_S3_* env vars (injected from secret)
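As a rough illustration of the environment variables above, a container `env` section might look like the sketch below. The variable names are ones the official Nextcloud image reads; the secret key names (`bucket`, `access-key`, `secret-key`) are hypothetical, and the host values are taken from service and endpoint names mentioned elsewhere in this document.

```yaml
env:
  - name: POSTGRES_HOST
    value: postgres-pooler             # PgBouncer service in front of PostgreSQL
  - name: REDIS_HOST
    value: redis
  - name: OBJECTSTORE_S3_HOST
    value: fsn1.your-objectstorage.com
  - name: OBJECTSTORE_S3_SSL
    value: "true"
  - name: OBJECTSTORE_S3_BUCKET
    valueFrom:
      secretKeyRef:
        name: nextcloud-s3-credentials
        key: bucket                    # hypothetical key name
  - name: OBJECTSTORE_S3_KEY
    valueFrom:
      secretKeyRef:
        name: nextcloud-s3-credentials
        key: access-key                # hypothetical key name
  - name: OBJECTSTORE_S3_SECRET
    valueFrom:
      secretKeyRef:
        name: nextcloud-s3-credentials
        key: secret-key                # hypothetical key name
```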
Collabora Online
Image: collabora/code:25.04.8.2.1
Purpose: Browser-based LibreOffice for document editing
Integration:
Nextcloud calls Collabora via WOPI protocol
Documents opened in browser iframe
Real-time collaborative editing
Supports: .docx, .xlsx, .pptx, .odt, .ods, .odp
Configuration:
Hostname: collabora.{domain}
Nextcloud domain whitelisted in Collabora config
2 replicas for load distribution
Whiteboard
Image: ghcr.io/nextcloud-releases/whiteboard:release
Purpose: Collaborative digital whiteboard
Integration:
Nextcloud app communicates via shared secret (JWT)
Whiteboard runs on the /whiteboard path
Real-time drawing synchronization
PostgreSQL (CloudNativePG)
Configuration:
2 instances (primary + standby)
10Gi PVC per replica (Longhorn)
PgBouncer connection pooling
Automated backups to S3
Database Schema:
Nextcloud uses ~150 tables
File metadata, user data, shares, comments
File content NOT in database (in S3)
Redis
Configuration:
1 replica (StatefulSet)
5Gi PVC (Longhorn)
Used for caching and session storage
No persistence required (cache only)
Purpose:
Transactional file locking (lock coordination between requests)
Session storage
Cache for expensive operations
Network Architecture
Ingress Routes
affenstall.cloud → Traefik → nextcloud:8080
collabora.affenstall.cloud → Traefik → collabora:9980
TLS Termination:
cert-manager provisions Let’s Encrypt certificates
Traefik handles TLS termination
Backend connections are HTTP (in-cluster)
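The first route above could be expressed as a standard Kubernetes Ingress handled by Traefik, with cert-manager issuing the certificate. This is a sketch, not the deployed manifest: the deployment may use Traefik's own IngressRoute resource instead, and the issuer name `letsencrypt-prod`, the ingress class name, and the TLS secret name are assumptions.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nextcloud                                      # hypothetical name
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod   # hypothetical issuer name
spec:
  ingressClassName: traefik                            # assumed class name
  tls:
    - hosts:
        - affenstall.cloud
      secretName: nextcloud-tls                        # hypothetical; filled by cert-manager
  rules:
    - host: affenstall.cloud
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nextcloud
                port:
                  number: 8080                         # plain HTTP inside the cluster
```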
Service Mesh
All communication within the namespace goes through Kubernetes Services:
nextcloud → postgres-pooler:5432 → postgres-rw:5432
nextcloud → redis:6379
nextcloud → collabora:9980 (internal, for WOPI)
Data Flow
File Upload
User Browser → Traefik (TLS) → Nextcloud Pod → Hetzner S3
                                     ↓
                           PostgreSQL (metadata)
File Download
User Browser ← Traefik (TLS) ← Nextcloud Pod ← Hetzner S3
                                     ↑
                        PostgreSQL (check permissions)
Document Editing (Collabora)
User Browser ↔ Traefik (TLS) ↔ Nextcloud Pod ↔ Collabora Pod
↓ ↓
PostgreSQL (metadata) Hetzner S3 (file)
Security Model
Secrets Management
Secrets are stored in Infisical and injected via External Secrets Operator (ESO), except where noted:
nextcloud-s3-credentials: S3 access key/secret
nextcloud-postgres-app: PostgreSQL credentials (created by CNPG)
nextcloud: Admin username/password
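For the ESO-managed secrets above, an `ExternalSecret` resource tells the operator which remote values to sync into a Kubernetes Secret. The sketch below is illustrative only: the store name `infisical`, the refresh interval, the secret key names, and the remote key names are all hypothetical, and the `apiVersion` depends on the installed ESO release (newer releases serve `external-secrets.io/v1`).

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: nextcloud-s3-credentials
spec:
  refreshInterval: 1h                   # assumed re-sync interval
  secretStoreRef:
    kind: ClusterSecretStore
    name: infisical                     # hypothetical store name
  target:
    name: nextcloud-s3-credentials      # Kubernetes Secret that ESO creates
  data:
    - secretKey: access-key             # hypothetical key in the resulting Secret
      remoteRef:
        key: NEXTCLOUD_S3_ACCESS_KEY    # hypothetical key in Infisical
    - secretKey: secret-key
      remoteRef:
        key: NEXTCLOUD_S3_SECRET_KEY
```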
TLS Encryption
External traffic: TLS 1.2+ via Traefik + cert-manager
Internal traffic: Plain HTTP (within Kubernetes network)
S3 traffic: HTTPS to fsn1.your-objectstorage.com
RBAC
Nextcloud pod runs as www-data (UID 33)
PostgreSQL runs as postgres (UID 70)
No privileged containers required
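The non-root settings for the Nextcloud pod can be enforced through a pod and container `securityContext`. A minimal sketch, assuming the group ID matches the user ID and that privilege escalation is disabled explicitly (neither is stated in the source):

```yaml
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 33                  # www-data
    runAsGroup: 33                 # assumption: same GID as UID
    fsGroup: 33                    # assumption: makes mounted volumes group-writable
  containers:
    - name: nextcloud
      image: nextcloud:31.0.13
      securityContext:
        privileged: false
        allowPrivilegeEscalation: false   # assumed hardening setting
```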
Scaling Considerations
Vertical Scaling (Current)
Increase resources in config.yaml (previous values shown in comments):
resources:
  nextcloud:
    requests:
      cpu: 500m        # was 200m
      memory: 1Gi      # was 512Mi
    limits:
      cpu: 2000m       # was 1000m
      memory: 4Gi      # was 2Gi
Horizontal Scaling (Future)
Requires RWX storage solution:
Option A: SMB CSI
Mount Hetzner Storage Box via SMB
Performance testing required
May be too slow for config/apps
Option B: Separate StatefulSets
Each pod gets own RWO volume
Manual load balancing
Complex upgrade procedure
Option C: S3FS FUSE Mount
Mount S3 as filesystem
High latency, not recommended
Reliability concerns
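Monitoring and Observability
Metrics
Nextcloud exposes monitoring data through the serverinfo app at /ocs/v2.php/apps/serverinfo/api/v1/info. The endpoint returns XML or JSON rather than the Prometheus text format, so an exporter is needed to scrape it into Prometheus: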
Active users
Storage usage
Database size
PHP-FPM stats
Share counts
Logs
Structured JSON logs to stdout:
Application logs: /var/www/html/data/nextcloud.log
Apache access logs: stdout (collected by Loki)
Cron job logs: Separate container logs
Health Checks
Liveness probe: /status.php (restarts unhealthy pods)
Readiness probe: /status.php (removes from service rotation)
PostgreSQL: CNPG operator monitors and auto-fails over
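The two HTTP probes could be declared on the Nextcloud container roughly as follows. This is a sketch: the timing values are hypothetical, and the `Host` header is an assumption (Nextcloud rejects requests for hosts outside its trusted domains, so probes commonly send the public hostname).

```yaml
livenessProbe:
  httpGet:
    path: /status.php
    port: 8080
    httpHeaders:
      - name: Host
        value: affenstall.cloud    # assumption: must match a trusted domain
  periodSeconds: 10                # hypothetical timing
  failureThreshold: 3              # restart after 3 consecutive failures
readinessProbe:
  httpGet:
    path: /status.php
    port: 8080
    httpHeaders:
      - name: Host
        value: affenstall.cloud
  periodSeconds: 10                # hypothetical timing
  failureThreshold: 3              # remove from service endpoints after 3 failures
```

Using the same endpoint for both probes is simple, but it means a slow `/status.php` can trigger a restart as well as removal from rotation; a more lenient liveness threshold is one way to separate the two.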