Reference

S3 Buckets


Overview

GitLab BDA uses 8 S3 buckets for object storage, all provisioned via Crossplane in the fsn1 (Falkenstein) region.

Bucket naming pattern: {purpose}-gitlabbda-kup6s

Why this pattern? Hetzner S3 bucket names must be globally unique across all customers (as with AWS S3); the -gitlabbda-kup6s suffix makes collisions with other customers' buckets very unlikely.

Total approximate storage (2-5 users): 50-270GB


Bucket Catalog

| Bucket Name | Purpose | Est. Size (2-5 users) | Lifecycle | Critical? |
|---|---|---|---|---|
| artifacts-gitlabbda-kup6s | CI/CD artifacts | 5-20GB | 30 days (configurable) | No |
| uploads-gitlabbda-kup6s | User uploads | 1-5GB | Never | Yes |
| lfs-gitlabbda-kup6s | Git LFS objects | 0-50GB | Never | Yes |
| pages-gitlabbda-kup6s | Static sites | 0.1-5GB | Manual | No |
| registry-gitlabbda-kup6s | Container images | 10-100GB | Manual GC | No |
| backups-gitlabbda-kup6s | GitLab backups | 20-50GB | 7d/4w/3m | Yes |
| postgresbackups-gitlabbda-kup6s | CNPG backups | 5-20GB | 30 days | Yes |
| cache-gitlabbda-kup6s | Build cache | 5-20GB | LRU auto-expire | No |


Bucket Naming Convention

Pattern

{purpose}-{deployment}-{cluster}

Components:

  • {purpose} - Descriptive purpose (artifacts, uploads, backups, etc.)

  • {deployment} - Deployment name (gitlabbda)

  • {cluster} - Cluster identifier (kup6s)

Examples

| Bucket | Purpose | Deployment | Cluster | Full Name |
|---|---|---|---|---|
| Artifacts | artifacts | gitlabbda | kup6s | artifacts-gitlabbda-kup6s |
| Backups | backups | gitlabbda | kup6s | backups-gitlabbda-kup6s |

Why This Pattern?

Problem: Generic names fail due to global uniqueness

# These will fail (already taken by other Hetzner customers):
 artifacts
 backups
 gitlab-artifacts
 uploads

Solution: Add deployment + cluster suffix for uniqueness

# These succeed (unique to your cluster):
 artifacts-gitlabbda-kup6s
 backups-gitlabbda-kup6s
 uploads-gitlabbda-kup6s

Benefits:

  • Globally unique - Very unlikely someone else uses gitlabbda-kup6s suffix

  • Self-documenting - Bucket name tells you deployment and cluster

  • Multi-deployment safe - Can deploy multiple GitLab instances (different deployment name)


Crossplane Bucket Specification

All buckets are created via Crossplane Bucket CRs:

apiVersion: s3.aws.upbound.io/v1beta2
kind: Bucket
metadata:
  name: {bucket-name}
  namespace: crossplane-system  # NOT gitlabbda namespace
  labels:
    app.kubernetes.io/managed-by: cdk8s
    app.kubernetes.io/part-of: gitlab
    app.kubernetes.io/component: storage
  annotations:
    argocd.argoproj.io/sync-wave: "1"
    crossplane.io/external-name: {bucket-name}  # Actual bucket name in Hetzner S3
    description: {bucket-purpose}
spec:
  forProvider:
    region: fsn1  # Hetzner region (Falkenstein, Germany)
  providerConfigRef:
    name: hetzner-s3  # Cluster-managed ProviderConfig
  managementPolicies: [Observe, Create, Delete]  # Skip Update (no tagging)
  deletionPolicy: Orphan  # Safety: keep bucket on CR deletion

Critical Configuration Fields

managementPolicies

managementPolicies: [Observe, Create, Delete]

Why Skip Update?

  • Hetzner S3 doesn’t support tagging operations

  • Crossplane Update tries to apply tags → 501 Not Implemented error

  • Skipping Update avoids SYNCED=False errors

With Update (wrong):

NAME                              SYNCED   READY   EXTERNAL-NAME
artifacts-gitlabbda-kup6s         False    True    artifacts-gitlabbda-kup6s

Without Update (correct):

NAME                              SYNCED   READY   EXTERNAL-NAME
artifacts-gitlabbda-kup6s         True     True    artifacts-gitlabbda-kup6s

deletionPolicy

deletionPolicy: Orphan

What it does: Keeps S3 bucket when Bucket CR is deleted

Why Orphan?

  • Safety - Accidental CR deletion doesn’t delete data

  • Migration - Can recreate CR without losing bucket contents

  • Disaster recovery - Bucket survives cluster destruction

Alternative: deletionPolicy: Delete (dangerous - deletes bucket with CR)

forProvider.region

region: fsn1

Available Hetzner S3 regions:

  • fsn1 - Falkenstein, Germany (default)

  • nbg1 - Nuremberg, Germany

  • hel1 - Helsinki, Finland

Why fsn1?

  • Same datacenter region as cluster (low latency)

  • No cross-region egress fees

providerConfigRef

providerConfigRef:
  name: hetzner-s3

ProviderConfig location: crossplane-system namespace (cluster infrastructure)

Contains:

  • Hetzner S3 endpoint (https://fsn1.your-objectstorage.com)

  • S3 credentials (access key, secret key)

  • S3-specific settings (skip_region_validation, s3_use_path_style)

For ProviderConfig details, see Main Cluster Docs: Crossplane S3.


Bucket Details

Artifacts Bucket

Name: artifacts-gitlabbda-kup6s

Purpose: GitLab CI/CD artifacts storage

Contents:

  • Build outputs (compiled binaries, JAR files, Docker images)

  • Test results (JUnit XML, coverage reports)

  • Job logs (stdout/stderr from CI jobs)

  • Pipeline artifacts (downloadable files from GitLab UI)

Size estimates:

  • 2-5 users: 5-20GB (10-50 pipelines/week)

  • 10-20 users: 20-100GB (50-200 pipelines/week)

  • 50+ users: 100-500GB (200+ pipelines/week)

Lifecycle policy:

  • Default: 30 days (configurable in GitLab settings)

  • Recommendation: 7 days for feature branches, 90 days for main/production (see the snippet below)

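Per-job expiry can be set directly in .gitlab-ci.yml; a minimal sketch (job name and paths are illustrative):

# .gitlab-ci.yml (illustrative)
build:
  script:
    - make build
  artifacts:
    paths:
      - dist/
    expire_in: 7 days  # overrides the instance-wide default for this job
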
Access pattern:

  • Write: Frequent (every CI job uploads artifacts)

  • Read: Occasional (downloading artifacts from GitLab UI)

  • Delete: Automatic (GitLab expires old artifacts)

Example artifacts:

/artifacts-gitlabbda-kup6s/
  ├── gitlab/project-123/
  │   ├── 456-build-output.zip
  │   ├── 457-coverage-report.html
  │   └── 458-test-results.xml

GitLab configuration:

# In gitlab-helm.ts (Helm values)
global:
  appConfig:
    artifacts:
      enabled: true
      bucket: artifacts-gitlabbda-kup6s
      connection:
        secret: gitlab-s3-credentials
        key: connection
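
The connection secret referenced here holds a fog-aws style connection document under the connection key. Roughly (a sketch; the actual secret is rendered via ExternalSecrets, and path-style addressing is assumed for Hetzner):

# Contents of the "connection" key in gitlab-s3-credentials (sketch)
provider: AWS
region: fsn1
aws_access_key_id: <access key>
aws_secret_access_key: <secret key>
endpoint: https://fsn1.your-objectstorage.com
path_style: true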

Uploads Bucket

Name: uploads-gitlabbda-kup6s

Purpose: User uploads and attachments

Contents:

  • Issue attachments (screenshots, PDFs, documents)

  • Merge request comments (images, diagrams)

  • Wiki uploads (images, files)

  • User avatars

  • Group/project logos

Size estimates:

  • 2-5 users: 1-5GB (100-500 uploads)

  • 10-20 users: 5-20GB (500-2000 uploads)

  • 50+ users: 20-100GB (2000+ uploads)

Lifecycle policy: Never expires (user data)

Access pattern:

  • Write: Occasional (when users upload files)

  • Read: Frequent (every time issue/MR with attachment is viewed)

  • Delete: Manual (when issue/MR deleted)

Example uploads:

/uploads-gitlabbda-kup6s/
  ├── @hashed/ab/cd/abcdef123.../
  │   ├── screenshot.png
  │   ├── architecture-diagram.pdf
  │   └── user-avatar.jpg

GitLab configuration:

global:
  appConfig:
    uploads:
      enabled: true
      bucket: uploads-gitlabbda-kup6s
      connection:
        secret: gitlab-s3-credentials
        key: connection

LFS Bucket

Name: lfs-gitlabbda-kup6s

Purpose: Git Large File Storage objects

Contents:

  • Large files tracked by git (videos, datasets, machine learning models)

  • Binary files (executables, compiled libraries)

  • Design files (PSD, Sketch, Figma exports)

How Git LFS works:

1. git add large-file.mp4
   → Git stores pointer file in repo (100 bytes)
   → Actual file uploaded to S3 (100 MB)

2. git clone repo
   → Git downloads pointer files
   → LFS downloads actual files from S3

Size estimates:

  • 2-5 users, no LFS: 0GB (most teams don’t use LFS)

  • 2-5 users, with LFS: 10-50GB (ML/data science teams)

  • 50+ users, heavy LFS: 100-500GB (game development, video production)

Lifecycle policy: Never expires (referenced by git commits)

Access pattern:

  • Write: Occasional (git push with LFS files)

  • Read: Frequent (git clone, git pull fetch LFS objects)

  • Delete: Manual (when LFS object no longer referenced)

Example LFS objects:

/lfs-gitlabbda-kup6s/
  ├── ab/cd/abcdef1234567890.../
  │   ├── dataset.csv (100 MB)
  │   ├── model.h5 (500 MB)
  │   └── video.mp4 (1 GB)

GitLab configuration:

global:
  appConfig:
    lfs:
      enabled: true
      bucket: lfs-gitlabbda-kup6s
      connection:
        secret: gitlab-s3-credentials
        key: connection

Pages Bucket

Name: pages-gitlabbda-kup6s

Purpose: GitLab Pages static site hosting

Contents:

  • HTML, CSS, JavaScript files

  • Images, fonts, static assets

  • Generated documentation (Sphinx, Doxygen, JSDoc)

  • Static site generators (Hugo, Jekyll, Gatsby outputs)

How Pages works:

1. CI job builds site → artifacts: [public/]
2. GitLab Pages daemon downloads artifacts
3. Extracts to S3 bucket
4. User visits https://project.pages.example.com
5. Pages server reads HTML from S3, serves to user
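
A minimal Pages job in .gitlab-ci.yml looks roughly like this (the build command is illustrative; any generator that writes to public/ works):

# .gitlab-ci.yml (illustrative)
pages:
  script:
    - hugo --destination public
  artifacts:
    paths:
      - public
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH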

Size estimates:

  • 2-5 users, 5-10 sites: 0.1-5GB

  • 10-20 users, 20-50 sites: 5-20GB

  • 50+ users, 100+ sites: 20-100GB

Lifecycle policy: Manual (delete when project deleted or Pages disabled)

Access pattern:

  • Write: Occasional (CI deployment updates site)

  • Read: Frequent (every page view)

  • Delete: Manual

Example pages:

/pages-gitlabbda-kup6s/
  ├── @hashed/ab/cd/project-123/
  │   ├── index.html
  │   ├── style.css
  │   ├── script.js
  │   └── images/logo.png

GitLab configuration:

global:
  appConfig:
    pages:
      enabled: true
      bucket: pages-gitlabbda-kup6s
      connection:
        secret: gitlab-s3-credentials
        key: connection

Registry Bucket

Name: registry-gitlabbda-kup6s

Purpose: Harbor container registry storage

Contents:

  • OCI image layers (Docker, containerd compatible)

  • Image manifests (layer lists)

  • Image tags (mutable pointers to manifests)

How Harbor uses S3:

1. docker push registry.example.com/project/image:tag
   → Harbor Registry receives layers
   → Stores in S3: /docker/registry/v2/blobs/sha256/{hash}

2. docker pull registry.example.com/project/image:tag
   → Harbor Registry reads the manifest from S3 (Harbor core tracks tag/artifact metadata in PostgreSQL)
   → Streams layers from S3 to client

Size estimates:

  • 2-5 users, 10-50 images: 10-50GB

  • 10-20 users, 100-200 images: 50-200GB

  • 50+ users, 500+ images: 200-1000GB

Lifecycle policy: Manual garbage collection

Garbage collection (via Harbor UI):

# Harbor JobService runs GC job
# Deletes unreferenced blobs (layers not used by any image)
# Frees up space in S3

Access pattern:

  • Write: Frequent (docker push)

  • Read: Very frequent (docker pull from CI, production)

  • Delete: Manual (via Harbor GC)

Example registry structure:

/registry-gitlabbda-kup6s/
  └── docker/
      └── registry/
          └── v2/
              ├── blobs/
              │   └── sha256/
              │       ├── ab/cd/abcdef.../data (image layer)
              │       └── 12/34/123456.../data (image layer)
              └── repositories/
                  └── project/
                      └── image/
                          └── _manifests/
                              └── tags/
                                  └── latest/

Harbor Registry configuration:

# In harbor.ts (environment variables)
env:
  - name: REGISTRY_STORAGE
    value: s3
  - name: REGISTRY_STORAGE_S3_BUCKET
    valueFrom: {secretKeyRef: {name: harbor-s3-credentials, key: bucket}}
  - name: REGISTRY_STORAGE_S3_REGIONENDPOINT
    valueFrom: {secretKeyRef: {name: harbor-s3-credentials, key: endpoint}}
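
These environment variables are Distribution-style overrides of the registry's storage configuration; the equivalent config.yml fragment is approximately (a sketch, not the rendered Harbor config):

# registry config.yml equivalent (sketch)
storage:
  s3:
    bucket: registry-gitlabbda-kup6s
    region: fsn1
    regionendpoint: https://fsn1.your-objectstorage.com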

Backups Bucket

Name: backups-gitlabbda-kup6s

Purpose: GitLab application backups (via Toolbox)

Contents:

  • Database dumps (PostgreSQL pg_dump)

  • Repository archives (tar.gz of all git repos)

  • Uploads backup (copy of uploads bucket)

  • LFS backup (copy of LFS bucket)

  • CI artifacts backup (copy of artifacts bucket)

Backup format:

TIMESTAMP_gitlab_backup.tar
  ├── db.sql.gz (PostgreSQL dump)
  ├── repositories.tar.gz (all git repos)
  ├── uploads.tar.gz (user uploads)
  ├── lfs.tar.gz (Git LFS objects)
  ├── artifacts.tar.gz (CI artifacts)
  └── pages.tar.gz (static sites)

Size estimates:

  • 2-5 users: 20-50GB (1-2 backups)

  • 10-20 users: 50-200GB (3-4 backups)

  • 50+ users: 200-1000GB (7+ backups)

Lifecycle policy: Configurable retention (see recommended settings below)

Recommended retention:

# GitLab backup_keep_time setting
backup_keep_time: 604800  # 7 days

# Or tiered retention (manual cleanup):
# - Keep 7 daily backups (last 7 days)
# - Keep 4 weekly backups (last 4 weeks)
# - Keep 3 monthly backups (last 3 months)
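
The nightly backup itself can be scheduled through the GitLab chart's Toolbox values; a sketch (the schedule is an assumption, check the actual gitlab-helm.ts):

gitlab:
  toolbox:
    backups:
      cron:
        enabled: true
        schedule: "0 2 * * *"  # daily at 02:00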

Access pattern:

  • Write: Daily (automated backup via CronJob)

  • Read: Rare (only during restore)

  • Delete: Automated (old backups expired)

Example backups:

/backups-gitlabbda-kup6s/
  ├── 1730000000_2025_10_27_gitlab_backup.tar (latest)
  ├── 1729913600_2025_10_26_gitlab_backup.tar
  ├── 1729827200_2025_10_25_gitlab_backup.tar
  └── ... (older backups)

GitLab Toolbox configuration:

global:
  appConfig:
    backups:
      bucket: backups-gitlabbda-kup6s
      tmpBucket: backups-gitlabbda-kup6s-tmp

Backup command:

kubectl exec -it deploy/gitlab-toolbox -n gitlabbda -- bash
gitlab-backup create
# Uploads to s3://backups-gitlabbda-kup6s/

PostgresBackups Bucket

Name: postgresbackups-gitlabbda-kup6s

Purpose: PostgreSQL CNPG backups via Barman Cloud Plugin

Contents:

  • Base backups (full database snapshot, daily)

  • WAL archives (write-ahead logs, continuous)

  • Backup metadata (PITR information)

How CNPG backups work:

1. PostgreSQL writes WAL segments
   → CNPG Barman plugin uploads to S3 (continuous)

2. Daily base backup (full snapshot)
   → CNPG creates pg_basebackup
   → Uploads to S3

3. Point-in-time recovery (PITR)
   → Restore base backup
   → Replay WAL segments to specific timestamp

Size estimates:

  • 2-5 users: 5-10GB (1-2 base backups + WAL)

  • 10-20 users: 10-30GB (2-3 base backups + WAL)

  • 50+ users: 30-100GB (3-5 base backups + WAL)

Lifecycle policy: 30 days (configurable in CNPG Cluster)

# In database.ts (CNPG Cluster spec)
spec:
  backup:
    retentionPolicy: 30d  # Keep backups for 30 days
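
Daily base backups are typically driven by a CNPG ScheduledBackup CR; a minimal sketch using the Barman Cloud Plugin method (resource names and schedule are assumptions):

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: gitlab-postgres-daily  # hypothetical name
  namespace: gitlabbda
spec:
  schedule: "0 0 0 * * *"  # CNPG cron includes seconds: daily at 00:00
  cluster:
    name: gitlab-postgres  # hypothetical Cluster name
  method: plugin
  pluginConfiguration:
    name: barman-cloud.cloudnative-pg.io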

Access pattern:

  • Write: Continuous (WAL archiving), daily (base backups)

  • Read: Rare (only during restore/recovery)

  • Delete: Automated (CNPG expires old backups)

Example backups:

/postgresbackups-gitlabbda-kup6s/
  ├── base/
  │   ├── 20251027T000000/ (base backup)
  │   └── 20251026T000000/
  └── wals/
      ├── 000000010000000000000001 (WAL segment)
      ├── 000000010000000000000002
      └── ...

CNPG Barman configuration:

# In database.ts (ObjectStore spec)
apiVersion: barmancloud.cnpg.io/v1
kind: ObjectStore
spec:
  configuration:
    destinationPath: s3://postgresbackups-gitlabbda-kup6s/
    endpointURL: https://fsn1.your-objectstorage.com
    s3Credentials:
      accessKeyId: {name: gitlab-s3-credentials, key: AWS_ACCESS_KEY_ID}  # SecretKeySelector: name + key
      secretAccessKey: {name: gitlab-s3-credentials, key: AWS_SECRET_ACCESS_KEY}

Cache Bucket

Name: cache-gitlabbda-kup6s

Purpose: GitLab Runner build cache

Contents:

  • Dependency caches (npm node_modules, pip packages, Maven .m2)

  • Build caches (incremental compilation, ccache)

  • Docker layer cache (for Docker-in-Docker builds)

How cache works:

# .gitlab-ci.yml
build:
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - node_modules/
      - .npm/

# First run:
# 1. npm install (downloads packages)
# 2. Runner uploads node_modules/ to S3
# 3. Job finishes

# Second run (same branch):
# 1. Runner downloads node_modules/ from S3
# 2. npm install (reuses cache, faster)
# 3. Job finishes

Size estimates:

  • 2-5 users, light CI: 5-10GB

  • 10-20 users, moderate CI: 10-30GB

  • 50+ users, heavy CI: 30-100GB

Lifecycle policy: LRU auto-expire (Least Recently Used)

GitLab Runner cache expiration:

  • Caches unused for 7 days → deleted

  • Configurable via Runner cache settings

Access pattern:

  • Write: Frequent (every CI job uploads cache)

  • Read: Very frequent (every CI job downloads cache)

  • Delete: Automated (LRU eviction)

Example cache:

/cache-gitlabbda-kup6s/
  ├── project-123/
  │   ├── main/
  │   │   └── cache.zip (node_modules)
  │   └── feature-456/
  │       └── cache.zip (node_modules)

GitLab configuration:

# In gitlab-helm.ts (Helm values)
global:
  appConfig:
    packages:  # Packages uses cache bucket
      enabled: true
      bucket: cache-gitlabbda-kup6s
      connection:
        secret: gitlab-s3-credentials
        key: connection
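
The Runner side of the distributed cache is configured in the gitlab-runner chart rather than in GitLab's appConfig; a hedged sketch of the relevant runners.config TOML (server address and shared flag are assumptions):

# gitlab-runner Helm values (sketch)
runners:
  config: |
    [[runners]]
      [runners.cache]
        Type = "s3"
        Shared = true
        [runners.cache.s3]
          ServerAddress = "fsn1.your-objectstorage.com"
          BucketName = "cache-gitlabbda-kup6s"
          Insecure = false
  # cache credentials are typically supplied via runners.cache.secretName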

Bucket Monitoring

Size Monitoring

Check bucket sizes:

# Via kubectl (Crossplane Bucket status)
kubectl get buckets -n crossplane-system

# Via aws CLI (Hetzner S3 compatible)
export AWS_ACCESS_KEY_ID=xxx
export AWS_SECRET_ACCESS_KEY=yyy
aws s3 ls s3://artifacts-gitlabbda-kup6s --endpoint-url=https://fsn1.your-objectstorage.com --recursive --summarize

Grafana dashboard (future):

  • S3 bucket sizes over time

  • Growth rate projections

  • Alerts for buckets >80% of expected size

Cost Monitoring

Hetzner S3 pricing (as of 2025):

  • €0.01/GB/month storage

  • No egress fees (within Hetzner network)

Cost estimation (2-5 users):

Artifacts: 20GB × €0.01 = €0.20/month
Uploads: 5GB × €0.01 = €0.05/month
LFS: 50GB × €0.01 = €0.50/month
Pages: 5GB × €0.01 = €0.05/month
Registry: 100GB × €0.01 = €1.00/month
Backups: 50GB × €0.01 = €0.50/month
PostgresBackups: 20GB × €0.01 = €0.20/month
Cache: 20GB × €0.01 = €0.20/month
---
Total: 270GB × €0.01 = €2.70/month

Scaling cost (50+ users):

Total: ~1TB × €0.01 = €10/month

Troubleshooting

Bucket Not Found

Symptom: GitLab logs show NoSuchBucket error

Diagnosis:

kubectl get bucket artifacts-gitlabbda-kup6s -n crossplane-system

Common causes:

  1. Bucket CR not synced - Check ArgoCD Application status

  2. Crossplane not ready - Check Crossplane operator logs

  3. ProviderConfig invalid - Check hetzner-s3 ProviderConfig

Solution: Ensure Bucket CR is READY=True, SYNCED=True

Access Denied

Symptom: GitLab logs show AccessDenied or 403 errors

Diagnosis:

kubectl get secret gitlab-s3-credentials -n gitlabbda -o yaml

Common causes:

  1. S3 credentials invalid - Check access key/secret key

  2. Bucket policy restrictive - (Hetzner S3 doesn’t support bucket policies)

  3. Secret not synced - Check ExternalSecret status

Solution: Verify S3 credentials in application-secrets namespace

Slow Uploads/Downloads

Symptom: CI jobs slow, timeouts uploading artifacts

Diagnosis:

# Test S3 upload speed from a throwaway pod (creates a 100MB test file first)
kubectl run s3-test -it --rm --restart=Never --image=amazon/aws-cli \
  --env=AWS_ACCESS_KEY_ID=xxx --env=AWS_SECRET_ACCESS_KEY=yyy \
  --command -- sh -c \
  'dd if=/dev/zero of=/tmp/test bs=1M count=100 && \
   aws s3 cp /tmp/test s3://artifacts-gitlabbda-kup6s/speed-test \
     --endpoint-url=https://fsn1.your-objectstorage.com'

Common causes:

  1. Network congestion - Check cluster network bandwidth

  2. S3 endpoint slow - (Rare, Hetzner infrastructure issue)

  3. Large files, no multipart - GitLab uses multipart for >10MB

Solution: Usually transient, retry job


Summary

8 S3 buckets, all in fsn1 region:

  1. artifacts - CI/CD artifacts (30d lifecycle)

  2. uploads - User uploads (never expires)

  3. lfs - Git LFS objects (never expires)

  4. pages - Static sites (manual cleanup)

  5. registry - Container images (manual GC)

  6. backups - GitLab backups (7d retention)

  7. postgresbackups - PostgreSQL backups (30d retention)

  8. cache - Build cache (LRU auto-expire)

Key characteristics:

  • Naming: {purpose}-gitlabbda-kup6s (global uniqueness)

  • Region: fsn1 (same as cluster)

  • Provisioning: Crossplane Bucket CRs

  • Management policies: [Observe, Create, Delete] (skip Update)

  • Deletion policy: Orphan (safety)

Total cost (2-5 users): ~€3/month (270GB × €0.01)

For implementation details: