S3 Buckets¶
Overview¶
GitLab BDA uses 8 S3 buckets for object storage, all provisioned via Crossplane in the fsn1 (Falkenstein) region.
Bucket naming pattern: {purpose}-gitlabbda-kup6s
Why this pattern? Hetzner S3 bucket names are globally unique across all customers (like AWS). The suffix -gitlabbda-kup6s ensures uniqueness.
Total approximate storage (2-5 users): 50-200GB
Bucket Catalog¶
| Bucket Name | Purpose | Est. Size (2-5 users) | Lifecycle | Critical? |
|---|---|---|---|---|
| artifacts-gitlabbda-kup6s | CI/CD artifacts | 5-20GB | 30 days (configurable) | No |
| uploads-gitlabbda-kup6s | User uploads | 1-5GB | Never | Yes |
| lfs-gitlabbda-kup6s | Git LFS objects | 0-50GB | Never | Yes |
| pages-gitlabbda-kup6s | Static sites | 0.1-5GB | Manual | No |
| registry-gitlabbda-kup6s | Container images | 10-100GB | Manual GC | No |
| backups-gitlabbda-kup6s | GitLab backups | 20-50GB | 7d/4w/3m | Yes |
| postgresbackups-gitlabbda-kup6s | CNPG backups | 5-20GB | 30 days | Yes |
| cache-gitlabbda-kup6s | Build cache | 5-20GB | LRU auto-expire | No |
Bucket Naming Convention¶
Pattern¶
{purpose}-{deployment}-{cluster}
Components:
{purpose} - Descriptive purpose (artifacts, uploads, backups, etc.)
{deployment} - Deployment name (gitlabbda)
{cluster} - Cluster identifier (kup6s)
Examples¶
| Bucket | Purpose | Deployment | Cluster | Full Name |
|---|---|---|---|---|
| Artifacts | artifacts | gitlabbda | kup6s | artifacts-gitlabbda-kup6s |
| Backups | backups | gitlabbda | kup6s | backups-gitlabbda-kup6s |
Why This Pattern?¶
Problem: Generic names fail because bucket names must be globally unique
# These will fail (already taken by other Hetzner customers):
✗ artifacts
✗ backups
✗ gitlab-artifacts
✗ uploads
Solution: Add deployment + cluster suffix for uniqueness
# These succeed (unique to your cluster):
✓ artifacts-gitlabbda-kup6s
✓ backups-gitlabbda-kup6s
✓ uploads-gitlabbda-kup6s
Benefits:
Globally unique - Very unlikely someone else uses the gitlabbda-kup6s suffix
Self-documenting - Bucket name tells you deployment and cluster
Multi-deployment safe - Can deploy multiple GitLab instances (different deployment name)
Crossplane Bucket Specification¶
All buckets are created via Crossplane Bucket CRs:
apiVersion: s3.aws.upbound.io/v1beta2
kind: Bucket
metadata:
name: {bucket-name}
namespace: crossplane-system # NOT gitlabbda namespace
labels:
app.kubernetes.io/managed-by: cdk8s
app.kubernetes.io/part-of: gitlab
app.kubernetes.io/component: storage
annotations:
argocd.argoproj.io/sync-wave: "1"
crossplane.io/external-name: {bucket-name} # Actual bucket name in Hetzner S3
description: {bucket-purpose}
spec:
forProvider:
region: fsn1 # Hetzner region (Falkenstein, Germany)
providerConfigRef:
name: hetzner-s3 # Cluster-managed ProviderConfig
managementPolicies: [Observe, Create, Delete] # Skip Update (no tagging)
deletionPolicy: Orphan # Safety: keep bucket on CR deletion
Critical Configuration Fields¶
managementPolicies¶
managementPolicies: [Observe, Create, Delete]
Why Skip Update?
Hetzner S3 doesn’t support tagging operations
Crossplane Update tries to apply tags → 501 Not Implemented error
Skipping Update avoids SYNCED=False errors
With Update (wrong):
NAME SYNCED READY EXTERNAL-NAME
artifacts-gitlabbda-kup6s False True artifacts-gitlabbda-kup6s
Without Update (correct):
NAME SYNCED READY EXTERNAL-NAME
artifacts-gitlabbda-kup6s True True artifacts-gitlabbda-kup6s
deletionPolicy¶
deletionPolicy: Orphan
What it does: Keeps S3 bucket when Bucket CR is deleted
Why Orphan?
Safety - Accidental CR deletion doesn’t delete data
Migration - Can recreate CR without losing bucket contents
Disaster recovery - Bucket survives cluster destruction
Alternative: deletionPolicy: Delete (dangerous - deletes bucket with CR)
forProvider.region¶
region: fsn1
Available Hetzner S3 regions:
fsn1 - Falkenstein, Germany (default)
nbg1 - Nuremberg, Germany
hel1 - Helsinki, Finland
Why fsn1?
Same datacenter region as cluster (low latency)
No cross-region egress fees
providerConfigRef¶
providerConfigRef:
name: hetzner-s3
ProviderConfig location: crossplane-system namespace (cluster infrastructure)
Contains:
Hetzner S3 endpoint (https://fsn1.your-objectstorage.com)
S3 credentials (access key, secret key)
S3-specific settings (skip_region_validation, s3_use_path_style)
For ProviderConfig details, see Main Cluster Docs: Crossplane S3.
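For orientation, a hedged sketch of what such a ProviderConfig can look like with the Upbound AWS provider family (the secret name and key are placeholders; the authoritative definition lives in the main cluster repo):
apiVersion: aws.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: hetzner-s3
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: hetzner-s3-credentials   # placeholder: holds the access/secret key pair
      key: creds
  endpoint:
    url:
      type: Static
      static: https://fsn1.your-objectstorage.com
  s3_use_path_style: true            # Hetzner S3 uses path-style addressing
  skip_region_validation: true       # fsn1 is not a real AWS region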
Bucket Details¶
Artifacts Bucket¶
Name: artifacts-gitlabbda-kup6s
Purpose: GitLab CI/CD artifacts storage
Contents:
Build outputs (compiled binaries, JAR files, Docker images)
Test results (JUnit XML, coverage reports)
Job logs (stdout/stderr from CI jobs)
Pipeline artifacts (downloadable files from GitLab UI)
Size estimates:
2-5 users: 5-20GB (10-50 pipelines/week)
10-20 users: 20-100GB (50-200 pipelines/week)
50+ users: 100-500GB (200+ pipelines/week)
Lifecycle policy:
Default: 30 days (configurable in GitLab settings)
Recommendation: 7 days for feature branches, 90 days for main/production (see the example below)
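The recommendation above can be set per job with expire_in; a minimal .gitlab-ci.yml sketch (job names, stages, and build commands are illustrative):
# .gitlab-ci.yml
build:
  stage: build
  script:
    - make build
  artifacts:
    paths:
      - dist/
    expire_in: 1 week    # short retention for feature-branch artifacts

release:
  stage: deploy
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  script:
    - make package
  artifacts:
    paths:
      - release/
    expire_in: 90 days   # longer retention for main/production artifacts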
Access pattern:
Write: Frequent (every CI job uploads artifacts)
Read: Occasional (downloading artifacts from GitLab UI)
Delete: Automatic (GitLab expires old artifacts)
Example artifacts:
/artifacts-gitlabbda-kup6s/
├── gitlab/project-123/
│ ├── 456-build-output.zip
│ ├── 457-coverage-report.html
│ └── 458-test-results.xml
GitLab configuration:
# In gitlab-helm.ts (Helm values)
global:
appConfig:
artifacts:
enabled: true
bucket: artifacts-gitlabbda-kup6s
connection:
secret: gitlab-s3-credentials
key: connection
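This block (and the ones for the other buckets below) points at the same gitlab-s3-credentials secret. Its connection key holds a fog-style connection document; a hedged sketch of what it typically contains for an S3-compatible endpoint (credential values are placeholders):
# Value of the `connection` key inside the gitlab-s3-credentials secret
provider: AWS
region: fsn1
aws_access_key_id: <ACCESS_KEY>
aws_secret_access_key: <SECRET_KEY>
endpoint: https://fsn1.your-objectstorage.com
path_style: true   # path-style addressing, as used by most S3-compatible providers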
Uploads Bucket¶
Name: uploads-gitlabbda-kup6s
Purpose: User uploads and attachments
Contents:
Issue attachments (screenshots, PDFs, documents)
Merge request comments (images, diagrams)
Wiki uploads (images, files)
User avatars
Group/project logos
Size estimates:
2-5 users: 1-5GB (100-500 uploads)
10-20 users: 5-20GB (500-2000 uploads)
50+ users: 20-100GB (2000+ uploads)
Lifecycle policy: Never expires (user data)
Access pattern:
Write: Occasional (when users upload files)
Read: Frequent (every time issue/MR with attachment is viewed)
Delete: Manual (when issue/MR deleted)
Example uploads:
/uploads-gitlabbda-kup6s/
├── @hashed/ab/cd/abcdef123.../
│ ├── screenshot.png
│ ├── architecture-diagram.pdf
│ └── user-avatar.jpg
GitLab configuration:
global:
appConfig:
uploads:
enabled: true
bucket: uploads-gitlabbda-kup6s
connection:
secret: gitlab-s3-credentials
key: connection
LFS Bucket¶
Name: lfs-gitlabbda-kup6s
Purpose: Git Large File Storage objects
Contents:
Large files tracked by git (videos, datasets, machine learning models)
Binary files (executables, compiled libraries)
Design files (PSD, Sketch, Figma exports)
How Git LFS works:
1. git add large-file.mp4
→ Git stores pointer file in repo (100 bytes)
→ Actual file uploaded to S3 (100 MB)
2. git clone repo
→ Git downloads pointer files
→ LFS downloads actual files from S3
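CI jobs that clone a repository but do not need the large files can skip the LFS download with the standard GIT_LFS_SKIP_SMUDGE variable; a small .gitlab-ci.yml sketch (the job name is illustrative):
# .gitlab-ci.yml
lint:
  variables:
    GIT_LFS_SKIP_SMUDGE: "1"   # check out only LFS pointer files, not the objects
  script:
    - npm run lint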
Size estimates:
2-5 users, no LFS: 0GB (most teams don’t use LFS)
2-5 users, with LFS: 10-50GB (ML/data science teams)
50+ users, heavy LFS: 100-500GB (game development, video production)
Lifecycle policy: Never expires (referenced by git commits)
Access pattern:
Write: Occasional (git push with LFS files)
Read: Frequent (git clone, git pull fetch LFS objects)
Delete: Manual (when LFS object no longer referenced)
Example LFS objects:
/lfs-gitlabbda-kup6s/
├── ab/cd/abcdef1234567890.../
│ ├── dataset.csv (100 MB)
│ ├── model.h5 (500 MB)
│ └── video.mp4 (1 GB)
GitLab configuration:
global:
appConfig:
lfs:
enabled: true
bucket: lfs-gitlabbda-kup6s
connection:
secret: gitlab-s3-credentials
key: connection
Pages Bucket¶
Name: pages-gitlabbda-kup6s
Purpose: GitLab Pages static site hosting
Contents:
HTML, CSS, JavaScript files
Images, fonts, static assets
Generated documentation (Sphinx, Doxygen, JSDoc)
Static site generators (Hugo, Jekyll, Gatsby outputs)
How Pages works:
1. CI job builds site → artifacts: [public/]
2. GitLab Pages daemon downloads artifacts
3. Extracts to S3 bucket
4. User visits https://project.pages.example.com
5. Pages server reads HTML from S3, serves to user
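The CI side of step 1 is a job named pages that publishes a public/ artifact; a minimal .gitlab-ci.yml sketch (the Hugo command is just one example of a static site build):
# .gitlab-ci.yml
pages:
  stage: deploy
  script:
    - hugo --minify          # any generator works, as long as output lands in public/
  artifacts:
    paths:
      - public
  rules:
    - if: $CI_COMMIT_BRANCH == "main"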
Size estimates:
2-5 users, 5-10 sites: 0.1-5GB
10-20 users, 20-50 sites: 5-20GB
50+ users, 100+ sites: 20-100GB
Lifecycle policy: Manual (delete when project deleted or Pages disabled)
Access pattern:
Write: Occasional (CI deployment updates site)
Read: Frequent (every page view)
Delete: Manual
Example pages:
/pages-gitlabbda-kup6s/
├── @hashed/ab/cd/project-123/
│ ├── index.html
│ ├── style.css
│ ├── script.js
│ └── images/logo.png
GitLab configuration:
global:
appConfig:
pages:
enabled: true
bucket: pages-gitlabbda-kup6s
connection:
secret: gitlab-s3-credentials
key: connection
Registry Bucket¶
Name: registry-gitlabbda-kup6s
Purpose: Harbor container registry storage
Contents:
OCI image layers (Docker, containerd compatible)
Image manifests (layer lists)
Image tags (mutable pointers to manifests)
How Harbor uses S3:
1. docker push registry.example.com/project/image:tag
→ Harbor Registry receives layers
→ Stores in S3: /docker/registry/v2/blobs/sha256/{hash}
2. docker pull registry.example.com/project/image:tag
→ Harbor Registry serves the manifest (artifact metadata is tracked in PostgreSQL)
→ Streams layers from S3 to client
Size estimates:
2-5 users, 10-50 images: 10-50GB
10-20 users, 100-200 images: 50-200GB
50+ users, 500+ images: 200-1000GB
Lifecycle policy: Manual garbage collection
Garbage collection (via Harbor UI):
# Harbor JobService runs GC job
# Deletes unreferenced blobs (layers not used by any image)
# Frees up space in S3
Access pattern:
Write: Frequent (docker push)
Read: Very frequent (docker pull from CI, production)
Delete: Manual (via Harbor GC)
Example registry structure:
/registry-gitlabbda-kup6s/
└── docker/
└── registry/
└── v2/
├── blobs/
│ └── sha256/
│ ├── ab/cd/abcdef.../data (image layer)
│ └── 12/34/123456.../data (image layer)
└── repositories/
└── project/
└── image/
└── _manifests/
└── tags/
└── latest/
Harbor Registry configuration:
# In harbor.ts (environment variables)
env:
- name: REGISTRY_STORAGE
value: s3
- name: REGISTRY_STORAGE_S3_BUCKET
valueFrom: {secretKeyRef: {name: harbor-s3-credentials, key: bucket}}
- name: REGISTRY_STORAGE_S3_REGIONENDPOINT
valueFrom: {secretKeyRef: {name: harbor-s3-credentials, key: endpoint}}
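A registry backed by the distribution S3 driver also needs credentials and region settings. A hedged sketch of the remaining environment variables, following the docker distribution naming convention (the secret key names are assumptions):
env:
  - name: REGISTRY_STORAGE_S3_ACCESSKEY
    valueFrom: {secretKeyRef: {name: harbor-s3-credentials, key: access-key}}
  - name: REGISTRY_STORAGE_S3_SECRETKEY
    valueFrom: {secretKeyRef: {name: harbor-s3-credentials, key: secret-key}}
  - name: REGISTRY_STORAGE_S3_REGION
    value: fsn1
  - name: REGISTRY_STORAGE_S3_SECURE
    value: "true"              # use HTTPS to the S3 endpoint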
Backups Bucket¶
Name: backups-gitlabbda-kup6s
Purpose: GitLab application backups (via Toolbox)
Contents:
Database dumps (PostgreSQL pg_dump)
Repository archives (tar.gz of all git repos)
Uploads backup (copy of uploads bucket)
LFS backup (copy of LFS bucket)
CI artifacts backup (copy of artifacts bucket)
Backup format:
TIMESTAMP_gitlab_backup.tar
├── db.sql.gz (PostgreSQL dump)
├── repositories.tar.gz (all git repos)
├── uploads.tar.gz (user uploads)
├── lfs.tar.gz (Git LFS objects)
├── artifacts.tar.gz (CI artifacts)
└── pages.tar.gz (static sites)
Size estimates:
2-5 users: 20-50GB (1-2 backups)
10-20 users: 50-200GB (3-4 backups)
50+ users: 200-1000GB (7+ backups)
Lifecycle policy: Retention policy (configurable)
Recommended retention:
# GitLab backup_keep_time setting
backup_keep_time: 604800 # 7 days
# Or tiered retention (manual cleanup):
# - Keep 7 daily backups (last 7 days)
# - Keep 4 weekly backups (last 4 weeks)
# - Keep 3 monthly backups (last 3 months)
Access pattern:
Write: Daily (automated backup via CronJob)
Read: Rare (only during restore)
Delete: Automated (old backups expired)
Example backups:
/backups-gitlabbda-kup6s/
├── 1730000000_2025_10_27_gitlab_backup.tar (latest)
├── 1729913600_2025_10_26_gitlab_backup.tar
├── 1729827200_2025_10_25_gitlab_backup.tar
└── ... (older backups)
GitLab Toolbox configuration:
global:
appConfig:
backups:
bucket: backups-gitlabbda-kup6s
tmpBucket: backups-gitlabbda-kup6s-tmp
Backup command:
kubectl exec -it deploy/gitlab-toolbox -n gitlabbda -- bash
gitlab-backup create
# Uploads to s3://backups-gitlabbda-kup6s/
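The daily backup noted in the access pattern above is typically driven by the chart's Toolbox backup CronJob; a hedged sketch of the relevant Helm values (value paths as in the upstream gitlab chart's toolbox backup cron, schedule is an example):
gitlab:
  toolbox:
    backups:
      cron:
        enabled: true
        schedule: "0 2 * * *"   # daily at 02:00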
PostgresBackups Bucket¶
Name: postgresbackups-gitlabbda-kup6s
Purpose: PostgreSQL CNPG backups via Barman Cloud Plugin
Contents:
Base backups (full database snapshot, daily)
WAL archives (write-ahead logs, continuous)
Backup metadata (PITR information)
How CNPG backups work:
1. PostgreSQL writes WAL segments
→ CNPG Barman plugin uploads to S3 (continuous)
2. Daily base backup (full snapshot)
→ CNPG creates pg_basebackup
→ Uploads to S3
3. Point-in-time recovery (PITR)
→ Restore base backup
→ Replay WAL segments to specific timestamp
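As a rough illustration of step 3, a point-in-time restore is expressed as a new Cluster bootstrapped from the backup source; a hedged sketch (cluster and source names are placeholders, and the external cluster wiring depends on how the Barman Cloud Plugin is configured):
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: gitlab-pg-restore          # placeholder name for the recovered cluster
spec:
  instances: 1
  bootstrap:
    recovery:
      source: origin
      recoveryTarget:
        targetTime: "2025-10-26 12:00:00+00"   # replay WAL up to this timestamp
  externalClusters:
    - name: origin                 # must point at the backups in postgresbackups-gitlabbda-kup6s
      # with the Barman Cloud Plugin this references the ObjectStore CR shown below,
      # rather than an inline barmanObjectStore section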
Size estimates:
2-5 users: 5-10GB (1-2 base backups + WAL)
10-20 users: 10-30GB (2-3 base backups + WAL)
50+ users: 30-100GB (3-5 base backups + WAL)
Lifecycle policy: 30 days (configurable in CNPG Cluster)
# In database.ts (CNPG Cluster spec)
spec:
backup:
retentionPolicy: 30d # Keep backups for 30 days
Access pattern:
Write: Continuous (WAL archiving), daily (base backups)
Read: Rare (only during restore/recovery)
Delete: Automated (CNPG expires old backups)
Example backups:
/postgresbackups-gitlabbda-kup6s/
├── base/
│ ├── 20251027T000000/ (base backup)
│ └── 20251026T000000/
└── wals/
├── 000000010000000000000001 (WAL segment)
├── 000000010000000000000002
└── ...
CNPG Barman configuration:
# In database.ts (ObjectStore spec)
apiVersion: barmancloud.cnpg.io/v1
kind: ObjectStore
spec:
configuration:
destinationPath: s3://postgresbackups-gitlabbda-kup6s/
endpointURL: https://fsn1.your-objectstorage.com
s3Credentials:
accessKeyId: {secretKeyRef: {name: gitlab-s3-credentials, key: AWS_ACCESS_KEY_ID}}
secretAccessKey: {secretKeyRef: {name: gitlab-s3-credentials, key: AWS_SECRET_ACCESS_KEY}}
Cache Bucket¶
Name: cache-gitlabbda-kup6s
Purpose: GitLab Runner build cache
Contents:
Dependency caches (npm node_modules, pip packages, Maven .m2)
Build caches (incremental compilation, ccache)
Docker layer cache (for Docker-in-Docker builds)
How cache works:
# .gitlab-ci.yml
build:
cache:
key: ${CI_COMMIT_REF_SLUG}
paths:
- node_modules/
- .npm/
# First run:
# 1. npm install (downloads packages)
# 2. Runner uploads node_modules/ to S3
# 3. Job finishes
# Second run (same branch):
# 1. Runner downloads node_modules/ from S3
# 2. npm install (reuses cache, faster)
# 3. Job finishes
Size estimates:
2-5 users, light CI: 5-10GB
10-20 users, moderate CI: 10-30GB
50+ users, heavy CI: 30-100GB
Lifecycle policy: LRU auto-expire (Least Recently Used)
GitLab Runner cache expiration:
Caches unused for 7 days → deleted
Configurable via Runner cache settings (see the sketch below)
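On the runner side, the cache is wired up in config.toml, embedded here in the gitlab-runner Helm chart's runners.config value; a hedged sketch (the credentials secret name is a placeholder):
# gitlab-runner Helm values
runners:
  config: |
    [[runners]]
      [runners.cache]
        Type = "s3"
        Shared = true
        [runners.cache.s3]
          ServerAddress = "fsn1.your-objectstorage.com"
          BucketName = "cache-gitlabbda-kup6s"
          BucketLocation = "fsn1"
          Insecure = false
  cache:
    secretName: s3-cache-credentials   # secret with accesskey/secretkey (placeholder name)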
Access pattern:
Write: Frequent (every CI job uploads cache)
Read: Very frequent (every CI job downloads cache)
Delete: Automated (LRU eviction)
Example cache:
/cache-gitlabbda-kup6s/
├── project-123/
│ ├── main/
│ │ └── cache.zip (node_modules)
│ └── feature-456/
│ └── cache.zip (node_modules)
GitLab configuration:
# In gitlab-helm.ts (Helm values)
global:
appConfig:
packages: # Packages uses cache bucket
enabled: true
bucket: cache-gitlabbda-kup6s
connection:
secret: gitlab-s3-credentials
key: connection
Bucket Monitoring¶
Size Monitoring¶
Check bucket sizes:
# Via kubectl (Crossplane Bucket status)
kubectl get buckets -n crossplane-system
# Via aws CLI (Hetzner S3 compatible)
export AWS_ACCESS_KEY_ID=xxx
export AWS_SECRET_ACCESS_KEY=yyy
aws s3 ls s3://artifacts-gitlabbda-kup6s --endpoint-url=https://fsn1.your-objectstorage.com --recursive --summarize
Grafana dashboard (future):
S3 bucket sizes over time
Growth rate projections
Alerts for buckets >80% of expected size
Cost Monitoring¶
Hetzner S3 pricing (as of 2025):
€0.01/GB/month storage
No egress fees (within Hetzner network)
Cost estimation (2-5 users):
Artifacts: 20GB × €0.01 = €0.20/month
Uploads: 5GB × €0.01 = €0.05/month
LFS: 50GB × €0.01 = €0.50/month
Pages: 5GB × €0.01 = €0.05/month
Registry: 100GB × €0.01 = €1.00/month
Backups: 50GB × €0.01 = €0.50/month
PostgresBackups: 20GB × €0.01 = €0.20/month
Cache: 20GB × €0.01 = €0.20/month
---
Total: 270GB × €0.01 = €2.70/month
Scaling cost (50+ users):
Total: ~1TB × €0.01 = €10/month
Troubleshooting¶
Bucket Not Found¶
Symptom: GitLab logs show NoSuchBucket error
Diagnosis:
kubectl get bucket artifacts-gitlabbda-kup6s -n crossplane-system
Common causes:
Bucket CR not synced - Check ArgoCD Application status
Crossplane not ready - Check Crossplane operator logs
ProviderConfig invalid - Check hetzner-s3 ProviderConfig
Solution: Ensure Bucket CR is READY=True, SYNCED=True
Access Denied¶
Symptom: GitLab logs show AccessDenied or 403 errors
Diagnosis:
kubectl get secret gitlab-s3-credentials -n gitlabbda -o yaml
Common causes:
S3 credentials invalid - Check access key/secret key
Bucket policy restrictive - (Hetzner S3 doesn’t support bucket policies)
Secret not synced - Check ExternalSecret status
Solution: Verify S3 credentials in application-secrets namespace
Slow Uploads/Downloads¶
Symptom: CI jobs slow, timeouts uploading artifacts
Diagnosis:
# Test S3 speed from pod
kubectl run -it s3-test --image=amazon/aws-cli --rm -- \
s3 cp /tmp/test s3://artifacts-gitlabbda-kup6s/test \
--endpoint-url=https://fsn1.your-objectstorage.com
Common causes:
Network congestion - Check cluster network bandwidth
S3 endpoint slow - (Rare, Hetzner infrastructure issue)
Large files, no multipart - GitLab uses multipart for >10MB
Solution: Usually transient, retry job
Summary¶
8 S3 buckets, all in fsn1 region:
artifacts - CI/CD artifacts (30d lifecycle)
uploads - User uploads (never expires)
lfs - Git LFS objects (never expires)
pages - Static sites (manual cleanup)
registry - Container images (manual GC)
backups - GitLab backups (7d retention)
postgresbackups - PostgreSQL backups (30d retention)
cache - Build cache (LRU auto-expire)
Key characteristics:
Naming: {purpose}-gitlabbda-kup6s (globally unique)
Region: fsn1 (same as cluster)
Provisioning: Crossplane Bucket CRs
Management policies: [Observe, Create, Delete] (skip Update)
Deletion policy: Orphan (safety)
Total cost (2-5 users): ~€3/month (270GB × €0.01)
For implementation details:
Storage Architecture - S3 strategy
Constructs API Reference - Bucket provisioning
Configuration Reference - Configuration values