Explanation

Storage Architecture for GitLab BDA

This document explains GitLab BDA’s specific storage tier assignments and S3 bucket strategy. For general storage architecture, tier selection criteria, and Longhorn storage classes, see Storage Architecture and Tiers.

Overview

GitLab BDA uses all three storage tiers to optimize cost, performance, and reliability for a small team (2-5 users):

| Component | Storage Tier | Storage Class | Size | Rationale |
|---|---|---|---|---|
| Gitaly (Git repos) | Hetzner Cloud Volumes | hcloud-volumes | 20Gi | Managed storage, daily S3 backups |
| PostgreSQL (2 instances) | Longhorn | longhorn-redundant-app | 10Gi each | CNPG provides replication |
| Redis | Longhorn | longhorn | 10Gi | Single instance needs storage HA |
| GitLab artifacts/uploads/etc. | Hetzner S3 | N/A | Variable | 8 buckets for different purposes |

Total cluster storage: 40Gi Longhorn + 20Gi Hetzner Volumes = 60Gi

Tier 1: Hetzner Cloud Volumes

Gitaly (Git Repository Storage)

Storage tier: Hetzner Cloud Volumes
Size: 20Gi
Storage class: hcloud-volumes

Why Hetzner Volumes instead of Longhorn?

Decision rationale:

  1. Simplicity - Hetzner handles replication, no cluster overhead

  2. Network-attached - Gitaly pod can reschedule freely between nodes

  3. Backup-based redundancy - Daily GitLab backups to S3 (see below)

  4. Avoid triple redundancy - Hetzner replication + Longhorn replication + S3 backups would be wasteful

  5. Cost-effective - €0.05/GB/month managed storage

Key insight: GitLab’s daily backup job uploads all Git repository data to the gitlab-backups-gitlabbda-kup6s S3 bucket. Combined with Hetzner’s built-in volume redundancy, this provides adequate protection without needing Longhorn replication.

PVC configuration (from GitLab Helm chart):

storageClass: hcloud-volumes
size: 20Gi
accessModes: [ReadWriteOnce]
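
For reference, these settings sit under gitlab.gitaly.persistence in the chart's values. A minimal sketch, assuming the standard GitLab chart layout (only the persistence keys are shown; everything else is omitted):

# Sketch of the relevant Helm values for Gitaly persistence.
gitlab:
  gitaly:
    persistence:
      storageClass: hcloud-volumes   # Tier 1: network-attached Hetzner Cloud Volume
      size: 20Gi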

Tier 2: Longhorn Distributed Storage

PostgreSQL (CloudNativePG Cluster)

Storage tier: Longhorn
Size: 10Gi per instance × 2 instances = 20Gi total (logical)
Storage class: longhorn-redundant-app (1 Longhorn replica)

Why 1 replica when PostgreSQL is critical?

Decision rationale:

  • CNPG provides replication: 2 PostgreSQL instances with streaming replication

  • App-level redundancy: Primary instance replicates to standby instance

  • Avoid double redundancy: Each CNPG instance has its own PVC with 1 Longhorn replica

  • Result: 2 total copies of data (CNPG replication) + S3 backups (Barman)

Formula:

2 CNPG instances × 1 Longhorn replica each = 2 total copies
vs.
2 CNPG instances × 2 Longhorn replicas each = 4 total copies (wasteful!)

PVC configuration (from DatabaseConstruct):

storage: {
  storageClass: 'longhorn-redundant-app',  // 1 replica
  size: '10Gi',  // Per instance
}

Backup strategy:

  1. CNPG Barman Cloud Plugin - WAL archiving + base backups to gitlab-postgresbackups-gitlabbda-kup6s S3 bucket

  2. Longhorn volume backups - Snapshots to Hetzner Storage Box (CIFS)

See Storage Tiers for detailed explanation of this pattern.
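
Putting the pieces together, the CNPG Cluster resource is where both decisions land: instances: 2 provides app-level replication, and the storage stanza pins each instance's PVC to the 1-replica Longhorn class. A minimal sketch assuming CNPG's standard Cluster schema; the cluster name, credentials secret, and endpoint are illustrative, and the in-tree barmanObjectStore stanza is shown only to illustrate the S3 wiring (the actual setup uses the Barman Cloud Plugin, which moves this configuration into its own resource):

# Illustrative CNPG Cluster; names, secrets, and endpoint are assumptions.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: gitlab-postgres                      # hypothetical name
spec:
  instances: 2                               # primary + streaming-replication standby
  storage:
    storageClass: longhorn-redundant-app     # 1 Longhorn replica per instance
    size: 10Gi
  backup:
    barmanObjectStore:                       # shown for illustration of the S3 wiring
      destinationPath: s3://gitlab-postgresbackups-gitlabbda-kup6s/
      endpointURL: https://fsn1.your-objectstorage.com   # assumed Hetzner endpoint
      s3Credentials:
        accessKeyId:
          name: s3-credentials               # hypothetical secret
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-credentials
          key: SECRET_ACCESS_KEY
      wal:
        compression: gzip                    # continuous WAL archiving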

Redis Cache

Storage tier: Longhorn
Size: 10Gi
Storage class: longhorn (2 Longhorn replicas)

Why 2 replicas for Redis?

Decision rationale:

  • Single instance: Redis not clustered for 2-5 users (adequate performance)

  • Storage-level HA: 2 Longhorn replicas provide redundancy

  • Data locality: Best-effort placement (one replica on same node as pod for performance)

  • Tolerable data loss: Redis is cache layer, can be rebuilt if needed

PVC configuration (from RedisConstruct):

storageClass: longhorn  # 2 replicas (default)
size: 10Gi
accessModes: [ReadWriteOnce]
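
The replica counts behind both Tier 2 classes come from Longhorn's StorageClass parameters. A minimal sketch of what the two classes likely look like (parameter values are assumptions based on the replica counts described here, not copied from the cluster):

# Illustrative StorageClass definitions for the two Longhorn classes referenced above.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-redundant-app     # for apps that replicate themselves (CNPG)
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "1"
  dataLocality: "best-effort"
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn                   # default class; storage-level HA (Redis)
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "2"
  dataLocality: "best-effort"      # keeps one replica on the pod's node when possible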

Tier 3: Hetzner S3 Object Storage

Eight S3 Buckets

All buckets are in the fsn1 region (Falkenstein, the same location as the cluster, for low latency):

| Bucket Name | Purpose | Typical Size |
|---|---|---|
| gitlab-artifacts-gitlabbda-kup6s | CI/CD artifacts (build outputs, test results) | Variable |
| gitlab-uploads-gitlabbda-kup6s | User uploads (images, attachments) | 1-5 GB |
| gitlab-lfs-gitlabbda-kup6s | Git LFS objects (large files tracked in git) | 5-20 GB |
| gitlab-pages-gitlabbda-kup6s | GitLab Pages static sites | 1-10 GB |
| gitlab-registry-gitlabbda-kup6s | Harbor OCI container images | 10-50 GB |
| gitlab-backups-gitlabbda-kup6s | GitLab application backups (repos, DB, uploads) | 20-100 GB |
| gitlab-postgresbackups-gitlabbda-kup6s | PostgreSQL CNPG Barman WAL/base backups | 10-50 GB |
| gitlab-cache-gitlabbda-kup6s | GitLab Runner build cache | Variable |

Total estimated: 50-200 GB (grows with usage)

For complete bucket specifications, see S3 Buckets Reference.

Why fsn1 Region for All Buckets?

Decision: All buckets in production region (fsn1) instead of spreading across regions.

Rationale:

  • Latency: Same datacenter as cluster = lowest upload/download latency

  • Bandwidth: No cross-region egress fees (Hetzner internal network)

  • Simplicity: Single Crossplane ProviderConfig endpoint

Trade-off: Backup buckets (gitlab-backups-gitlabbda-kup6s, gitlab-postgresbackups-gitlabbda-kup6s) also live in fsn1, so they share a failure domain with the primary data.

Alternative considered: Move backup buckets to hel1 (Helsinki) for geographic redundancy.

Why not multi-region for backups?

  • Hetzner S3 already replicates within region

  • Current scale (2-5 users) doesn’t justify complexity

  • Future improvement: Move backup buckets to hel1 when implementing DR strategy

Bucket Provisioning

Via Crossplane (GitOps-managed):

// charts/constructs/s3-buckets.ts
new Bucket(this, 'artifacts', {
  metadata: {
    name: 'gitlab-artifacts-gitlabbda-kup6s',
    annotations: { 'argocd.argoproj.io/sync-wave': '1' },
  },
  spec: {
    forProvider: { region: 'fsn1' },
    providerConfigRef: { name: 'hetzner-s3' },
    managementPolicies: ['Observe', 'Create', 'Delete'],  // Skip Update
    deletionPolicy: 'Orphan',  // Safety
  },
});
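
Rendered by cdk8s, the construct above produces a managed-resource manifest along these lines. The apiVersion is an assumption (an S3-compatible Crossplane provider such as Upbound's provider-aws-s3 pointed at Hetzner's endpoint); everything else mirrors the construct:

# Illustrative rendered manifest; apiVersion assumed, fields mirror the construct above.
apiVersion: s3.aws.upbound.io/v1beta1
kind: Bucket
metadata:
  name: gitlab-artifacts-gitlabbda-kup6s
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  forProvider:
    region: fsn1
  providerConfigRef:
    name: hetzner-s3
  managementPolicies: ["Observe", "Create", "Delete"]   # skip Update (tagging unsupported)
  deletionPolicy: Orphan                                # keep the bucket if the CR is deleted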

Why managementPolicies: [Observe, Create, Delete]?

  • Skips Update operations (Hetzner S3 doesn’t support tagging)

  • Without this: buckets show SYNCED=False due to 501 Not Implemented errors

See S3 Bucket Architecture for details.

Storage Allocation Summary

Block Storage

| Component | Hetzner Volumes | Longhorn | Replicas | Total |
|---|---|---|---|---|
| Gitaly | 20Gi | - | Hetzner-managed | 20Gi (billed) |
| PostgreSQL (2×) | - | 20Gi (logical) | 1 per instance | 20Gi (cluster) |
| Redis | - | 10Gi | 2 | 20Gi (cluster) |
| Subtotal | 20Gi | 40Gi (with replicas) | - | 60Gi |

Object Storage (S3)

| Purpose | Buckets | Estimated Size |
|---|---|---|
| Application data | 5 buckets (artifacts, uploads, LFS, pages, cache) | 20-100 GB |
| Container registry | 1 bucket | 10-50 GB |
| Backups | 2 buckets (GitLab, PostgreSQL) | 30-150 GB |
| Subtotal | 8 buckets | 60-300 GB |

Total storage cost estimate (monthly):

  • Hetzner Volumes: 20Gi × €0.05/GB = €1.00

  • Longhorn: Cluster overhead (included in node costs)

  • Hetzner S3: ~100GB × €0.005/GB = €0.50

  • Total: ~€1.50/month for storage

Backup Strategy: Defense in Depth

GitLab BDA has four backup layers:

1. Application-Level Replication

  • PostgreSQL: CNPG streaming replication (2 instances)

  • Redis: Single instance (no clustering needed for 2-5 users)

2. Storage-Level Redundancy

  • Gitaly: Hetzner Cloud Volumes (provider-managed)

  • PostgreSQL: Longhorn 1 replica per CNPG instance (2 total copies)

  • Redis: Longhorn 2 replicas

  • S3: Hetzner multi-datacenter replication

3. Snapshot Backups

  • Longhorn PVCs: Daily snapshots to Hetzner Storage Box (CIFS)

  • PostgreSQL: Continuous WAL archiving to S3 (Barman Cloud Plugin)
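
On the Longhorn side this is typically expressed as a RecurringJob with task: backup, with the cluster-wide backup target pointing at the Storage Box over CIFS. A sketch under those assumptions (job name, schedule, retention, and share path are illustrative):

# Assumed RecurringJob; the backup target itself is a Longhorn setting such as
# cifs://<storage-box-host>/<share> (host and share are placeholders).
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: daily-volume-backup
  namespace: longhorn-system
spec:
  task: backup           # export a snapshot to the configured backup target
  cron: "0 3 * * *"      # assumed: daily at 03:00
  groups:
    - default            # applies to volumes in the default group
  retain: 7              # assumed retention: one week
  concurrency: 2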

4. Application Backups

  • GitLab Toolbox: Daily full backup to gitlab-backups-gitlabbda-kup6s S3 bucket

    • Includes: Git repositories, database dump, uploads, LFS, artifacts
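
In Helm values terms, the daily job is the toolbox backup CronJob wired to the backups bucket. A minimal sketch (the bucket name is taken from this document; the schedule is an assumption, and secret/connection wiring is omitted):

# Sketch of the relevant GitLab chart values for the daily application backup.
global:
  appConfig:
    backups:
      bucket: gitlab-backups-gitlabbda-kup6s
gitlab:
  toolbox:
    backups:
      cron:
        enabled: true
        schedule: "0 2 * * *"   # assumed: daily at 02:00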

Recovery Scenarios

| Scenario | Recovery Method | RTO | RPO |
|---|---|---|---|
| Pod crash | Kubernetes restart, same PVC | < 1 min | 0 |
| Node failure | Pod reschedule, Longhorn/CNPG replica | < 5 min | 0 |
| PVC corruption | Restore from Longhorn backup (CIFS) | < 30 min | 24h (daily) |
| Database corruption | Restore from Barman backup (S3, PITR) | < 1 hour | Minutes (WAL-based) |
| GitLab data loss | Restore from GitLab backup (S3) | < 4 hours | 24h (daily) |
| Cluster destroyed | Rebuild cluster, restore from S3/CIFS | < 1 day | 24h (daily) |

RTO = Recovery Time Objective; RPO = Recovery Point Objective.

Resource Efficiency

Storage Optimization Comparison

Without optimization (naive 2-replica Longhorn for everything):

| Component | Size | Replicas | Total |
|---|---|---|---|
| Gitaly | 20Gi | 2 | 40Gi |
| PostgreSQL (2×) | 20Gi | 2 | 40Gi |
| Redis | 10Gi | 2 | 20Gi |
| Total | - | - | 100Gi |

With optimization (tiered storage + avoid double redundancy):

| Component | Storage | Total |
|---|---|---|
| Gitaly | Hetzner Volumes | 20Gi (billed) |
| PostgreSQL (2×) | Longhorn, 1 replica each | 20Gi (cluster) |
| Redis | Longhorn, 2 replicas | 20Gi (cluster) |
| Total | - | 60Gi (40% savings) |

Key insight: Avoiding double redundancy (CNPG replication + Longhorn replication) saves 40Gi cluster storage.