Reference

S3 Buckets


Overview

GitLab BDA uses 8 S3 buckets for object storage, all provisioned via Crossplane in the fsn1 (Falkenstein) region.

Bucket naming pattern: {purpose}-gitlabbda-kup6s

Why this pattern? Hetzner S3 bucket names must be globally unique across all customers (as with AWS S3); the -gitlabbda-kup6s suffix makes collisions with other customers' buckets very unlikely.

Total approximate storage (2-5 users): 50-270GB


Bucket Catalog

| Bucket Name | Purpose | Est. Size (2-5 users) | Lifecycle | Critical? |
|---|---|---|---|---|
| artifacts-gitlabbda-kup6s | CI/CD artifacts | 5-20GB | 30 days (configurable) | No |
| uploads-gitlabbda-kup6s | User uploads | 1-5GB | Never | Yes |
| lfs-gitlabbda-kup6s | Git LFS objects | 0-50GB | Never | Yes |
| pages-gitlabbda-kup6s | Static sites | 0.1-5GB | Manual | No |
| registry-gitlabbda-kup6s | Container images | 10-100GB | Manual GC | No |
| backups-gitlabbda-kup6s | GitLab backups | 20-50GB | 7d/4w/3m | Yes |
| postgresbackups-gitlabbda-kup6s | CNPG backups | 5-20GB | 30 days | Yes |
| cache-gitlabbda-kup6s | Build cache | 5-20GB | LRU auto-expire | No |


Bucket Naming Convention

Pattern

{purpose}-{deployment}-{cluster}

Components:

  • {purpose} - Descriptive purpose (artifacts, uploads, backups, etc.)

  • {deployment} - Deployment name (gitlabbda)

  • {cluster} - Cluster identifier (kup6s)

Examples

| Bucket | Purpose | Deployment | Cluster | Full Name |
|---|---|---|---|---|
| Artifacts | artifacts | gitlabbda | kup6s | artifacts-gitlabbda-kup6s |
| Backups | backups | gitlabbda | kup6s | backups-gitlabbda-kup6s |

Why This Pattern?

Problem: Generic names fail due to global uniqueness

# These will fail (already taken by other Hetzner customers):
 artifacts
 backups
 gitlab-artifacts
 uploads

Solution: Add deployment + cluster suffix for uniqueness

# These succeed (unique to your cluster):
 artifacts-gitlabbda-kup6s
 backups-gitlabbda-kup6s
 uploads-gitlabbda-kup6s

Benefits:

  • Globally unique - Very unlikely someone else uses gitlabbda-kup6s suffix

  • Self-documenting - Bucket name tells you deployment and cluster

  • Multi-deployment safe - Can deploy multiple GitLab instances (different deployment name)


Crossplane Bucket Specification

All buckets are created via Crossplane Bucket CRs:

apiVersion: s3.aws.upbound.io/v1beta2
kind: Bucket
metadata:
  name: {bucket-name}
  namespace: crossplane-system  # NOT gitlabbda namespace
  labels:
    app.kubernetes.io/managed-by: cdk8s
    app.kubernetes.io/part-of: gitlab
    app.kubernetes.io/component: storage
  annotations:
    argocd.argoproj.io/sync-wave: "1"
    crossplane.io/external-name: {bucket-name}  # Actual bucket name in Hetzner S3
    description: {bucket-purpose}
spec:
  forProvider:
    region: fsn1  # Hetzner region (Falkenstein, Germany)
  providerConfigRef:
    name: hetzner-s3  # Cluster-managed ProviderConfig
  managementPolicies: [Observe, Create, Delete]  # Skip Update (no tagging)
  deletionPolicy: Orphan  # Safety: keep bucket on CR deletion

Critical Configuration Fields

managementPolicies

managementPolicies: [Observe, Create, Delete]

Why Skip Update?

  • Hetzner S3 doesn’t support tagging operations

  • Crossplane Update tries to apply tags → 501 Not Implemented error

  • Skipping Update avoids SYNCED=False errors

With Update (wrong):

NAME                              SYNCED   READY   EXTERNAL-NAME
artifacts-gitlabbda-kup6s         False    True    artifacts-gitlabbda-kup6s

Without Update (correct):

NAME                              SYNCED   READY   EXTERNAL-NAME
artifacts-gitlabbda-kup6s         True     True    artifacts-gitlabbda-kup6s

deletionPolicy

deletionPolicy: Orphan

What it does: Keeps S3 bucket when Bucket CR is deleted

Why Orphan?

  • Safety - Accidental CR deletion doesn’t delete data

  • Migration - Can recreate CR without losing bucket contents

  • Disaster recovery - Bucket survives cluster destruction

Alternative: deletionPolicy: Delete (dangerous - deletes bucket with CR)

forProvider.region

region: fsn1

Available Hetzner S3 regions:

  • fsn1 - Falkenstein, Germany (default)

  • nbg1 - Nuremberg, Germany

  • hel1 - Helsinki, Finland

Why fsn1?

  • Same datacenter region as cluster (low latency)

  • No cross-region egress fees

providerConfigRef

providerConfigRef:
  name: hetzner-s3

ProviderConfig location: crossplane-system namespace (cluster infrastructure)

Contains:

  • Hetzner S3 endpoint (https://fsn1.your-objectstorage.com)

  • S3 credentials (access key, secret key)

  • S3-specific settings (skip_region_validation, s3_use_path_style)

For ProviderConfig details, see Main Cluster Docs: Crossplane S3.


Bucket Details

Artifacts Bucket

Name: artifacts-gitlabbda-kup6s

Purpose: GitLab CI/CD artifacts storage

Contents:

  • Build outputs (compiled binaries, JAR files, Docker images)

  • Test results (JUnit XML, coverage reports)

  • Job logs (stdout/stderr from CI jobs)

  • Pipeline artifacts (downloadable files from GitLab UI)

Size estimates:

  • 2-5 users: 5-20GB (10-50 pipelines/week)

  • 10-20 users: 20-100GB (50-200 pipelines/week)

  • 50+ users: 100-500GB (200+ pipelines/week)

Lifecycle policy:

  • Default: 30 days (configurable in GitLab settings)

  • Recommendation: 7 days for feature branches, 90 days for main/production (see the snippet below)

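Per-job expiry can be set directly in .gitlab-ci.yml; a minimal sketch (job name and paths are illustrative):

# .gitlab-ci.yml (illustrative)
build:
  script:
    - make build
  artifacts:
    paths:
      - dist/
    expire_in: 7 days  # overrides the instance-wide default for this job
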
Access pattern:

  • Write: Frequent (every CI job uploads artifacts)

  • Read: Occasional (downloading artifacts from GitLab UI)

  • Delete: Automatic (GitLab expires old artifacts)

Example artifacts:

/artifacts-gitlabbda-kup6s/
  ├── gitlab/project-123/
  │   ├── 456-build-output.zip
  │   ├── 457-coverage-report.html
  │   └── 458-test-results.xml

GitLab configuration:

# In gitlab-helm.ts (Helm values)
global:
  appConfig:
    artifacts:
      enabled: true
      bucket: artifacts-gitlabbda-kup6s
      connection:
        secret: gitlab-s3-credentials
        key: connection
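
The connection secret referenced here holds a fog-aws style connection document under the connection key. Roughly (a sketch; the actual secret is rendered via ExternalSecrets, and path-style addressing is assumed for Hetzner):

# Contents of the "connection" key in gitlab-s3-credentials (sketch)
provider: AWS
region: fsn1
aws_access_key_id: <access key>
aws_secret_access_key: <secret key>
endpoint: https://fsn1.your-objectstorage.com
path_style: true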

Uploads Bucket

Name: uploads-gitlabbda-kup6s

Purpose: User uploads and attachments

Contents:

  • Issue attachments (screenshots, PDFs, documents)

  • Merge request comments (images, diagrams)

  • Wiki uploads (images, files)

  • User avatars

  • Group/project logos

Size estimates:

  • 2-5 users: 1-5GB (100-500 uploads)

  • 10-20 users: 5-20GB (500-2000 uploads)

  • 50+ users: 20-100GB (2000+ uploads)

Lifecycle policy: Never expires (user data)

Access pattern:

  • Write: Occasional (when users upload files)

  • Read: Frequent (every time issue/MR with attachment is viewed)

  • Delete: Manual (when issue/MR deleted)

Example uploads:

/uploads-gitlabbda-kup6s/
  ├── @hashed/ab/cd/abcdef123.../
  │   ├── screenshot.png
  │   ├── architecture-diagram.pdf
  │   └── user-avatar.jpg

GitLab configuration:

global:
  appConfig:
    uploads:
      enabled: true
      bucket: uploads-gitlabbda-kup6s
      connection:
        secret: gitlab-s3-credentials
        key: connection

LFS Bucket

Name: lfs-gitlabbda-kup6s

Purpose: Git Large File Storage objects

Contents:

  • Large files tracked by git (videos, datasets, machine learning models)

  • Binary files (executables, compiled libraries)

  • Design files (PSD, Sketch, Figma exports)

How Git LFS works:

1. git add large-file.mp4
   → Git stores pointer file in repo (100 bytes)
   → Actual file uploaded to S3 (100 MB)

2. git clone repo
   → Git downloads pointer files
   → LFS downloads actual files from S3

Size estimates:

  • 2-5 users, no LFS: 0GB (most teams don’t use LFS)

  • 2-5 users, with LFS: 10-50GB (ML/data science teams)

  • 50+ users, heavy LFS: 100-500GB (game development, video production)

Lifecycle policy: Never expires (referenced by git commits)

Access pattern:

  • Write: Occasional (git push with LFS files)

  • Read: Frequent (git clone, git pull fetch LFS objects)

  • Delete: Manual (when LFS object no longer referenced)

Example LFS objects:

/lfs-gitlabbda-kup6s/
  ├── ab/cd/abcdef1234567890.../
  │   ├── dataset.csv (100 MB)
  │   ├── model.h5 (500 MB)
  │   └── video.mp4 (1 GB)

GitLab configuration:

global:
  appConfig:
    lfs:
      enabled: true
      bucket: lfs-gitlabbda-kup6s
      connection:
        secret: gitlab-s3-credentials
        key: connection

Pages Bucket

Name: pages-gitlabbda-kup6s

Purpose: GitLab Pages static site hosting

Contents:

  • HTML, CSS, JavaScript files

  • Images, fonts, static assets

  • Generated documentation (Sphinx, Doxygen, JSDoc)

  • Static site generators (Hugo, Jekyll, Gatsby outputs)

How Pages works:

1. CI job builds site → artifacts: [public/]
2. GitLab Pages daemon downloads artifacts
3. Extracts to S3 bucket
4. User visits https://project.pages.example.com
5. Pages server reads HTML from S3, serves to user
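
A minimal Pages job in .gitlab-ci.yml looks roughly like this (the build command is illustrative; any generator that writes to public/ works):

# .gitlab-ci.yml (illustrative)
pages:
  script:
    - hugo --destination public
  artifacts:
    paths:
      - public
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH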

Size estimates:

  • 2-5 users, 5-10 sites: 0.1-5GB

  • 10-20 users, 20-50 sites: 5-20GB

  • 50+ users, 100+ sites: 20-100GB

Lifecycle policy: Manual (delete when project deleted or Pages disabled)

Access pattern:

  • Write: Occasional (CI deployment updates site)

  • Read: Frequent (every page view)

  • Delete: Manual

Example pages:

/pages-gitlabbda-kup6s/
  ├── @hashed/ab/cd/project-123/
  │   ├── index.html
  │   ├── style.css
  │   ├── script.js
  │   └── images/logo.png

GitLab configuration:

global:
  appConfig:
    pages:
      enabled: true
      bucket: pages-gitlabbda-kup6s
      connection:
        secret: gitlab-s3-credentials
        key: connection

Registry Bucket

Name: registry-gitlabbda-kup6s

Purpose: Harbor container registry storage

Contents:

  • OCI image layers (Docker, containerd compatible)

  • Image manifests (layer lists)

  • Image tags (mutable pointers to manifests)

How Harbor uses S3:

1. docker push registry.example.com/project/image:tag
   → Harbor Registry receives layers
   → Stores in S3: /docker/registry/v2/blobs/sha256/{hash}

2. docker pull registry.example.com/project/image:tag
   → Harbor Registry reads the manifest from S3 (Harbor core tracks tag/artifact metadata in PostgreSQL)
   → Streams layers from S3 to client

Size estimates:

  • 2-5 users, 10-50 images: 10-50GB

  • 10-20 users, 100-200 images: 50-200GB

  • 50+ users, 500+ images: 200-1000GB

Lifecycle policy: Manual garbage collection

Garbage collection (via Harbor UI):

# Harbor JobService runs GC job
# Deletes unreferenced blobs (layers not used by any image)
# Frees up space in S3

Access pattern:

  • Write: Frequent (docker push)

  • Read: Very frequent (docker pull from CI, production)

  • Delete: Manual (via Harbor GC)

Example registry structure:

/registry-gitlabbda-kup6s/
  └── docker/
      └── registry/
          └── v2/
              ├── blobs/
              │   └── sha256/
              │       ├── ab/cd/abcdef.../data (image layer)
              │       └── 12/34/123456.../data (image layer)
              └── repositories/
                  └── project/
                      └── image/
                          └── _manifests/
                              └── tags/
                                  └── latest/

Harbor Registry configuration:

# In harbor.ts (environment variables)
env:
  - name: REGISTRY_STORAGE
    value: s3
  - name: REGISTRY_STORAGE_S3_BUCKET
    valueFrom: {secretKeyRef: {name: harbor-s3-credentials, key: bucket}}
  - name: REGISTRY_STORAGE_S3_REGIONENDPOINT
    valueFrom: {secretKeyRef: {name: harbor-s3-credentials, key: endpoint}}
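
These environment variables are Distribution-style overrides of the registry's storage configuration; the equivalent config.yml fragment is approximately (a sketch, not the rendered Harbor config):

# registry config.yml equivalent (sketch)
storage:
  s3:
    bucket: registry-gitlabbda-kup6s
    region: fsn1
    regionendpoint: https://fsn1.your-objectstorage.com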

Backups Bucket

Name: backups-gitlabbda-kup6s

Purpose: GitLab application backups (via Toolbox)

Contents:

  • Database dumps (PostgreSQL pg_dump)

  • Repository archives (tar.gz of all git repos)

  • Uploads backup (copy of uploads bucket)

  • LFS backup (copy of LFS bucket)

  • CI artifacts backup (copy of artifacts bucket)

Backup format:

TIMESTAMP_gitlab_backup.tar
  ├── db.sql.gz (PostgreSQL dump)
  ├── repositories.tar.gz (all git repos)
  ├── uploads.tar.gz (user uploads)
  ├── lfs.tar.gz (Git LFS objects)
  ├── artifacts.tar.gz (CI artifacts)
  └── pages.tar.gz (static sites)

Size estimates:

  • 2-5 users: 20-50GB (1-2 backups)

  • 10-20 users: 50-200GB (3-4 backups)

  • 50+ users: 200-1000GB (7+ backups)

Lifecycle policy: Configurable retention (see recommended settings below)

Recommended retention:

# GitLab backup_keep_time setting
backup_keep_time: 604800  # 7 days

# Or tiered retention (manual cleanup):
# - Keep 7 daily backups (last 7 days)
# - Keep 4 weekly backups (last 4 weeks)
# - Keep 3 monthly backups (last 3 months)
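
The nightly backup itself can be scheduled through the GitLab chart's Toolbox values; a sketch (the schedule is an assumption, check the actual gitlab-helm.ts):

gitlab:
  toolbox:
    backups:
      cron:
        enabled: true
        schedule: "0 2 * * *"  # daily at 02:00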

Access pattern:

  • Write: Daily (automated backup via CronJob)

  • Read: Rare (only during restore)

  • Delete: Automated (old backups expired)

Example backups:

/backups-gitlabbda-kup6s/
  ├── 1730000000_2025_10_27_gitlab_backup.tar (latest)
  ├── 1729913600_2025_10_26_gitlab_backup.tar
  ├── 1729827200_2025_10_25_gitlab_backup.tar
  └── ... (older backups)

GitLab Toolbox configuration:

global:
  appConfig:
    backups:
      bucket: backups-gitlabbda-kup6s
      tmpBucket: backups-gitlabbda-kup6s-tmp

Backup command:

kubectl exec -it deploy/gitlab-toolbox -n gitlabbda -- bash
gitlab-backup create
# Uploads to s3://backups-gitlabbda-kup6s/

PostgresBackups Bucket

Name: postgresbackups-gitlabbda-kup6s

Purpose: PostgreSQL CNPG backups via Barman Cloud Plugin

Contents:

  • Base backups (full database snapshot, daily)

  • WAL archives (write-ahead logs, continuous)

  • Backup metadata (PITR information)

How CNPG backups work:

1. PostgreSQL writes WAL segments
   → CNPG Barman plugin uploads to S3 (continuous)

2. Daily base backup (full snapshot)
   → CNPG creates pg_basebackup
   → Uploads to S3

3. Point-in-time recovery (PITR)
   → Restore base backup
   → Replay WAL segments to specific timestamp

Size estimates:

  • 2-5 users: 5-10GB (1-2 base backups + WAL)

  • 10-20 users: 10-30GB (2-3 base backups + WAL)

  • 50+ users: 30-100GB (3-5 base backups + WAL)

Lifecycle policy: 30 days (configurable in CNPG Cluster)

# In database.ts (CNPG Cluster spec)
spec:
  backup:
    retentionPolicy: 30d  # Keep backups for 30 days
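
Daily base backups are typically driven by a CNPG ScheduledBackup CR; a minimal sketch using the Barman Cloud Plugin method (resource names and schedule are assumptions):

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: gitlab-postgres-daily  # hypothetical name
  namespace: gitlabbda
spec:
  schedule: "0 0 0 * * *"  # CNPG cron includes seconds: daily at 00:00
  cluster:
    name: gitlab-postgres  # hypothetical Cluster name
  method: plugin
  pluginConfiguration:
    name: barman-cloud.cloudnative-pg.io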

Access pattern:

  • Write: Continuous (WAL archiving), daily (base backups)

  • Read: Rare (only during restore/recovery)

  • Delete: Automated (CNPG expires old backups)

Example backups:

/postgresbackups-gitlabbda-kup6s/
  ├── base/
  │   ├── 20251027T000000/ (base backup)
  │   └── 20251026T000000/
  └── wals/
      ├── 000000010000000000000001 (WAL segment)
      ├── 000000010000000000000002
      └── ...

CNPG Barman configuration:

# In database.ts (ObjectStore spec)
apiVersion: barmancloud.cnpg.io/v1
kind: ObjectStore
spec:
  configuration:
    destinationPath: s3://postgresbackups-gitlabbda-kup6s/
    endpointURL: https://fsn1.your-objectstorage.com
    s3Credentials:
      accessKeyId: {name: gitlab-s3-credentials, key: AWS_ACCESS_KEY_ID}  # SecretKeySelector: name + key
      secretAccessKey: {name: gitlab-s3-credentials, key: AWS_SECRET_ACCESS_KEY}

Cache Bucket

Name: cache-gitlabbda-kup6s

Purpose: GitLab Runner build cache

Contents:

  • Dependency caches (npm node_modules, pip packages, Maven .m2)

  • Build caches (incremental compilation, ccache)

  • Docker layer cache (for Docker-in-Docker builds)

How cache works:

# .gitlab-ci.yml
build:
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - node_modules/
      - .npm/

# First run:
# 1. npm install (downloads packages)
# 2. Runner uploads node_modules/ to S3
# 3. Job finishes

# Second run (same branch):
# 1. Runner downloads node_modules/ from S3
# 2. npm install (reuses cache, faster)
# 3. Job finishes

Size estimates:

  • 2-5 users, light CI: 5-10GB

  • 10-20 users, moderate CI: 10-30GB

  • 50+ users, heavy CI: 30-100GB

Lifecycle policy: LRU auto-expire (Least Recently Used)

GitLab Runner cache expiration:

  • Caches unused for 7 days → deleted

  • Configurable via Runner cache settings

Access pattern:

  • Write: Frequent (every CI job uploads cache)

  • Read: Very frequent (every CI job downloads cache)

  • Delete: Automated (LRU eviction)

Example cache:

/cache-gitlabbda-kup6s/
  ├── project-123/
  │   ├── main/
  │   │   └── cache.zip (node_modules)
  │   └── feature-456/
  │       └── cache.zip (node_modules)

GitLab configuration:

# In gitlab-helm.ts (Helm values)
global:
  appConfig:
    packages:  # Packages uses cache bucket
      enabled: true
      bucket: cache-gitlabbda-kup6s
      connection:
        secret: gitlab-s3-credentials
        key: connection
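
The Runner side of the distributed cache is configured in the gitlab-runner chart rather than in GitLab's appConfig; a hedged sketch of the relevant runners.config TOML (server address and shared flag are assumptions):

# gitlab-runner Helm values (sketch)
runners:
  config: |
    [[runners]]
      [runners.cache]
        Type = "s3"
        Shared = true
        [runners.cache.s3]
          ServerAddress = "fsn1.your-objectstorage.com"
          BucketName = "cache-gitlabbda-kup6s"
          Insecure = false
  # cache credentials are typically supplied via runners.cache.secretName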

Bucket Monitoring

Size Monitoring

Check bucket sizes:

# Via kubectl (Crossplane Bucket status)
kubectl get buckets -n crossplane-system

# Via aws CLI (Hetzner S3 compatible)
export AWS_ACCESS_KEY_ID=xxx
export AWS_SECRET_ACCESS_KEY=yyy
aws s3 ls s3://artifacts-gitlabbda-kup6s --endpoint-url=https://fsn1.your-objectstorage.com --recursive --summarize

Grafana dashboard (future):

  • S3 bucket sizes over time

  • Growth rate projections

  • Alerts for buckets >80% of expected size

Cost Monitoring

Hetzner S3 pricing (as of 2025):

  • €0.01/GB/month storage

  • No egress fees (within Hetzner network)

Cost estimation (2-5 users):

Artifacts: 20GB × €0.01 = €0.20/month
Uploads: 5GB × €0.01 = €0.05/month
LFS: 50GB × €0.01 = €0.50/month
Pages: 5GB × €0.01 = €0.05/month
Registry: 100GB × €0.01 = €1.00/month
Backups: 50GB × €0.01 = €0.50/month
PostgresBackups: 20GB × €0.01 = €0.20/month
Cache: 20GB × €0.01 = €0.20/month
---
Total: 270GB × €0.01 = €2.70/month

Scaling cost (50+ users):

Total: ~1TB × €0.01 = €10/month

Troubleshooting

Bucket Not Found

Symptom: GitLab logs show NoSuchBucket error

Diagnosis:

kubectl get bucket artifacts-gitlabbda-kup6s -n crossplane-system

Common causes:

  1. Bucket CR not synced - Check ArgoCD Application status

  2. Crossplane not ready - Check Crossplane operator logs

  3. ProviderConfig invalid - Check hetzner-s3 ProviderConfig

Solution: Ensure Bucket CR is READY=True, SYNCED=True

Access Denied

Symptom: GitLab logs show AccessDenied or 403 errors

Diagnosis:

kubectl get secret gitlab-s3-credentials -n gitlabbda -o yaml

Common causes:

  1. S3 credentials invalid - Check access key/secret key

  2. Bucket policy restrictive - (Hetzner S3 doesn’t support bucket policies)

  3. Secret not synced - Check ExternalSecret status

Solution: Verify S3 credentials in application-secrets namespace

Slow Uploads/Downloads

Symptom: CI jobs slow, timeouts uploading artifacts

Diagnosis:

# Test S3 upload speed from a throwaway pod (creates a 100MB test file first)
kubectl run s3-test -it --rm --restart=Never --image=amazon/aws-cli \
  --env=AWS_ACCESS_KEY_ID=xxx --env=AWS_SECRET_ACCESS_KEY=yyy \
  --command -- sh -c \
  'dd if=/dev/zero of=/tmp/test bs=1M count=100 && \
   aws s3 cp /tmp/test s3://artifacts-gitlabbda-kup6s/speed-test \
     --endpoint-url=https://fsn1.your-objectstorage.com'

Common causes:

  1. Network congestion - Check cluster network bandwidth

  2. S3 endpoint slow - (Rare, Hetzner infrastructure issue)

  3. Large files, no multipart - GitLab uses multipart for >10MB

Solution: Usually transient, retry job


Summary

8 S3 buckets, all in fsn1 region:

  1. artifacts - CI/CD artifacts (30d lifecycle)

  2. uploads - User uploads (never expires)

  3. lfs - Git LFS objects (never expires)

  4. pages - Static sites (manual cleanup)

  5. registry - Container images (manual GC)

  6. backups - GitLab backups (7d retention)

  7. postgresbackups - PostgreSQL backups (30d retention)

  8. cache - Build cache (LRU auto-expire)

Key characteristics:

  • Naming: {purpose}-gitlabbda-kup6s (global uniqueness)

  • Region: fsn1 (same as cluster)

  • Provisioning: Crossplane Bucket CRs

  • Management policies: [Observe, Create, Delete] (skip Update)

  • Deletion policy: Orphan (safety)

Total cost (2-5 users): ~€3/month (270GB × €0.01)

For implementation details: