S3 Buckets Reference¶
Complete reference for S3 buckets used by the monitoring stack.
Overview¶
The monitoring stack uses 2 S3 buckets for long-term storage:
metrics-thanos-kup6s - Prometheus metrics (via Thanos)
logs-loki-kup6s - Loki log chunks and indexes
Both buckets are:
Hosted on Hetzner Object Storage (S3-compatible)
Located in fsn1 region (Falkenstein, Germany)
Managed by Crossplane (declarative provisioning)
Accessed with shared cluster-wide S3 credentials
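A quick way to confirm both buckets exist and are healthy is to query the Crossplane managed resources directly (a sketch, assuming kubectl access to the cluster; the fully qualified resource name matches the manifests below):
# Both buckets should report SYNCED=True and READY=True
kubectl get buckets.s3.aws.upbound.io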
Bucket: metrics-thanos-kup6s¶
Purpose¶
Long-term storage for Prometheus metrics via Thanos architecture.
Configuration¶
apiVersion: s3.aws.upbound.io/v1beta1
kind: Bucket
metadata:
  name: metrics-thanos-kup6s
  namespace: crossplane-system
spec:
  deletionPolicy: Delete
  managementPolicies:
    - Observe
    - Create
    - Delete
  forProvider:
    region: fsn1
  providerConfigRef:
    name: hetzner-s3
Bucket Properties¶
| Property | Value |
|---|---|
| Name | metrics-thanos-kup6s |
| Region | fsn1 (Falkenstein) |
| Endpoint | https://fsn1.your-objectstorage.com |
| Access Mode | Private (credentials required) |
| Versioning | Disabled |
| Encryption | Server-side (Hetzner default) |
Lifecycle Policy¶
apiVersion: s3.aws.upbound.io/v1beta1
kind: BucketLifecycleConfiguration
metadata:
  name: thanos-bucket-lifecycle
  namespace: crossplane-system
spec:
  deletionPolicy: Delete
  managementPolicies:
    - Observe
    - Create
    - Update
    - Delete
  forProvider:
    bucket: metrics-thanos-kup6s
    rule:
      - id: expire-old-metrics
        status: Enabled
        expiration:
          - days: 730 # 2 years
Retention Strategy:
Raw blocks (no downsampling): 30 days (Thanos Compactor marks for deletion)
5-minute downsampled: 180 days (6 months)
1-hour downsampled: 730 days (2 years)
Objects older than 730 days: Automatically deleted by S3 lifecycle
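The per-resolution retention in the list above is enforced by the Thanos Compactor, not by the S3 lifecycle rule. A minimal sketch of the corresponding compactor flags, with values taken from the list (how they are wired in depends on the deployment):
# Thanos Compactor retention flags matching the tiers above
- --retention.resolution-raw=30d
- --retention.resolution-5m=180d
- --retention.resolution-1h=730d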
Data Structure¶
metrics-thanos-kup6s/
├── 01HQABCDEFGHIJKLMNOPQRSTU/ # ULID block ID
│ ├── meta.json # Block metadata
│ ├── index # Series index (inverted index)
│ ├── chunks/
│ │ ├── 000001 # Compressed sample chunks
│ │ ├── 000002
│ │ └── ...
│ └── tombstones # Deletion markers
├── 01HQVWXYZ1234567890ABCDEF/ # Another block
│ └── ...
├── debug/
│ └── metas/ # Compactor debug metadata
└── compact/
    ├── 01HR*-5m/                      # 5-minute downsampled blocks
    └── 01HR*-1h/                      # 1-hour downsampled blocks
Block Naming¶
Blocks use ULID (Universally Unique Lexicographically Sortable Identifier):
Format: 01HQABCDEFGHIJKLMNOPQRSTU
Sortable by creation time (lexicographic order matches chronological order)
Example: 01HQABCD... was created before 01HQVWXY...
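Because ULIDs sort lexicographically in creation order, a plain sort of the top-level prefixes lists blocks chronologically. A sketch, assuming the shared S3 credentials are configured for the AWS CLI:
# List block ULIDs in creation order (top-level prefixes only)
aws s3 ls s3://metrics-thanos-kup6s/ --endpoint-url https://fsn1.your-objectstorage.com \
  | awk '$1 == "PRE" {print $2}' | sort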
Block Metadata (meta.json)¶
{
  "ulid": "01HQABCDEFGHIJKLMNOPQRSTU",
  "minTime": 1699000000000,
  "maxTime": 1699007200000,
  "stats": {
    "numSamples": 12500000,
    "numSeries": 50000,
    "numChunks": 125000
  },
  "compaction": {
    "level": 1,
    "sources": ["01HQABCDEFGHIJKLMNOPQRSTU"]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "prometheus": "kube-prometheus-stack-prometheus",
      "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "sidecar"
  }
}
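To review this metadata for every block without downloading individual meta.json files, the Thanos CLI can inspect the bucket. A sketch, assuming the objstore config is mounted at /etc/thanos/objstore.yml in the compactor pod (the mount path depends on the deployment):
# Lists each block with its ULID, time range, compaction level, and resolution
kubectl exec -n monitoring thanos-compactor-0 -- \
  thanos tools bucket inspect --objstore.config-file=/etc/thanos/objstore.yml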
Storage Growth¶
Ingestion Rate:
~100,000 samples/second
~500MB/day compressed (raw blocks)
After Compaction & Downsampling:
Raw data (30d): ~15GB
5m resolution (180d): ~50GB
1h resolution (730d): ~80GB
Total steady state: ~145GB
Growth Rate: Approximately linear with cluster size and scrape targets.
Access Patterns¶
Writers:
Thanos Sidecar (uploads 2-hour blocks every 2h)
Thanos Compactor (writes compacted/downsampled blocks)
Readers:
Thanos Store (reads blocks for queries)
Thanos Compactor (reads blocks for compaction)
Operations:
PUT (uploads): ~24/day (12 per Prometheus replica)
GET (queries): ~1000/day (varies with query load)
DELETE (lifecycle): Automatic via S3 lifecycle policy
Cost Analysis¶
Storage Cost:
Rate: €0.023/GB/month
Usage: ~145GB steady state
Monthly cost: €3.34
API Request Cost:
PUT: €0.005/1000 requests × 24/day × 30 = €0.004/month
GET: €0.0004/1000 requests × 1000/day × 30 = €0.012/month
Monthly cost: €0.016
Total Monthly Cost: ~€3.36
Bucket: logs-loki-kup6s¶
Purpose¶
Storage for Loki log chunks and TSDB indexes.
Configuration¶
apiVersion: s3.aws.upbound.io/v1beta1
kind: Bucket
metadata:
  name: logs-loki-kup6s
  namespace: crossplane-system
spec:
  deletionPolicy: Delete
  managementPolicies:
    - Observe
    - Create
    - Delete
  forProvider:
    region: fsn1
  providerConfigRef:
    name: hetzner-s3
Bucket Properties¶
| Property | Value |
|---|---|
| Name | logs-loki-kup6s |
| Region | fsn1 (Falkenstein) |
| Endpoint | https://fsn1.your-objectstorage.com |
| Access Mode | Private (credentials required) |
| Versioning | Disabled |
| Encryption | Server-side (Hetzner default) |
Lifecycle Policy¶
apiVersion: s3.aws.upbound.io/v1beta1
kind: BucketLifecycleConfiguration
metadata:
  name: loki-bucket-lifecycle
  namespace: crossplane-system
spec:
  deletionPolicy: Delete
  managementPolicies:
    - Observe
    - Create
    - Update
    - Delete
  forProvider:
    bucket: logs-loki-kup6s
    rule:
      - id: expire-old-logs
        status: Enabled
        expiration:
          - days: 90 # Safety margin (Loki internal: 31d)
Retention Strategy:
Loki compactor marks chunks for deletion after 31 days (744h)
S3 lifecycle deletes objects after 90 days (safety net for orphaned chunks)
Gap between Loki deletion (31d) and S3 deletion (90d) allows recovery if compactor fails
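For reference, a minimal sketch of the Loki settings implied by the 31-day internal retention described above (exact placement depends on the Loki Helm chart values in use):
limits_config:
  retention_period: 744h      # 31 days
compactor:
  retention_enabled: true     # compactor marks and deletes expired chunks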
Data Structure¶
logs-loki-kup6s/
├── fake/ # Loki tenant ID (single-tenant)
│ ├── chunks/
│ │ └── 20251101/ # Date-based partitioning
│ │ ├── 12:00:00-13:00:00/ # Hourly directories
│ │ │ ├── abc123def456.gz # Compressed log chunks
│ │ │ └── 789ghi012jkl.gz
│ │ └── 13:00:00-14:00:00/
│ │ └── mno345pqr678.gz
│ └── index/
│ ├── boltdb-shipper/ # Legacy index
│ │ └── compactor/
│ │ └── index_18900 # Daily index files
│ └── tsdb/ # TSDB index (current)
│ ├── index_18900/
│ │ ├── 1234567890.tsdb
│ │ └── meta.json
│ └── index_18901/
└── retention/
└── markers/ # Retention markers
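The layout can be browsed directly with any S3-compatible client. A sketch, assuming the shared S3 credentials are configured for the AWS CLI (paths follow the layout shown above):
# List shipped TSDB index tables for the single tenant
aws s3 ls s3://logs-loki-kup6s/fake/index/tsdb/ --endpoint-url https://fsn1.your-objectstorage.com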
Chunk Format¶
Chunk Naming: {hash}.gz
Example: abc123def456789012345678901234.gz
Hash: SHA256 of chunk contents
Compression: gzip
Chunk Contents:
Multiple log lines from same stream
Target size: 1.5MB compressed
Flush triggers:
Chunk reaches 1.5MB
15 minutes elapsed (max_chunk_age)
Ingester shutdown
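These flush triggers map to Loki ingester settings; a sketch with values taken from this section rather than from the actual chart values:
ingester:
  chunk_target_size: 1572864   # ~1.5MB compressed target
  max_chunk_age: 15m           # force a flush after 15 minutes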
Index Format¶
TSDB Index (current):
index_{day}/
├── {hash}.tsdb # Per-stream index files
├── meta.json # Index metadata
└── compactor.json # Compaction status
Index Content:
Stream labels → Chunk references
Label cardinality: ~1000 unique label combinations
Index size: ~10-50MB per day
Storage Growth¶
Ingestion Rate:
~50MB/day uncompressed logs
~5MB/day compressed (10:1 compression ratio)
31-Day Retention:
Chunks: 31 days × 5MB = ~155MB
Indexes: 31 days × 30MB = ~930MB
Total steady state: ~1.1GB
Growth Rate: Linear with log volume (scales with pod count and verbosity).
Access Patterns¶
Writers:
Loki Write (flushes chunks every 15min)
Loki Backend (writes indexes hourly)
Readers:
Loki Read (queries chunks for log retrieval)
Loki Backend (reads indexes for query planning)
Operations:
PUT (chunk uploads): ~96/day (every 15 min)
PUT (index uploads): ~24/day (hourly)
GET (queries): ~500/day (varies with dashboard usage)
DELETE (compactor): Daily cleanup of expired chunks
Cost Analysis¶
Storage Cost:
Rate: €0.023/GB/month
Usage: ~1.1GB steady state
Monthly cost: €0.025
API Request Cost:
PUT: €0.005/1000 requests × 120/day × 30 = €0.018/month
GET: €0.0004/1000 requests × 500/day × 30 = €0.006/month
Monthly cost: €0.024
Total Monthly Cost: ~€0.05 (negligible)
Credentials and Access¶
Credential Storage¶
All S3 credentials are managed by External Secrets Operator (ESO):
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: thanos-objstore-config
  namespace: monitoring
spec:
  secretStoreRef:
    name: crossplane-credentials
    kind: ClusterSecretStore
  target:
    name: thanos-objstore-config
    template:
      engineVersion: v2
      data:
        objstore.yml: |
          type: S3
          config:
            bucket: metrics-thanos-kup6s
            endpoint: {{ .endpoint }}
            access_key: {{ .access_key_id }}
            secret_key: {{ .secret_access_key }}
            insecure: false
  dataFrom:
    - extract:
        key: hetzner-s3-credentials
Source Secret: hetzner-s3-credentials in crossplane-system namespace
Target Secrets:
thanos-objstore-config (monitoring namespace) - for Thanos components
loki-s3-config (monitoring namespace) - for Loki components
monitoring-s3-credentials (monitoring namespace) - for Crossplane ProviderConfig
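For context, a sketch of how the loki-s3-config credentials typically surface inside Loki's storage configuration (the field names are standard Loki options, but the actual chart wiring may differ, and ${...} expansion requires -config.expand-env=true):
storage_config:
  aws:
    bucketnames: logs-loki-kup6s
    endpoint: fsn1.your-objectstorage.com
    s3forcepathstyle: true
    access_key_id: ${ACCESS_KEY_ID}            # injected from loki-s3-config
    secret_access_key: ${SECRET_ACCESS_KEY}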
Crossplane ProviderConfig¶
apiVersion: aws.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: hetzner-s3
spec:
  credentials:
    source: Secret
    secretRef:
      name: monitoring-s3-credentials
      namespace: monitoring
      key: credentials
  endpoint:
    url:
      static: https://fsn1.your-objectstorage.com
      type: Static
    hostnameImmutable: true
  skip_region_validation: true
  s3_use_path_style: true
Critical Settings:
hostnameImmutable: true - Required for S3-compatible storage
skip_region_validation: true - Allows Hetzner regions (fsn1, nbg1, hel1)
s3_use_path_style: true - Path-style access (bucket in URL path)
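The effect of path-style access is easy to see from the URL shape. An unauthenticated request is expected to be denied; the point is that the bucket name appears in the path rather than the hostname:
# Expect an AccessDenied/403 response without credentials
curl -I https://fsn1.your-objectstorage.com/metrics-thanos-kup6s/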
Monitoring and Observability¶
Thanos Metrics¶
S3 Upload Success:
rate(thanos_objstore_bucket_operations_total{operation="upload",bucket="metrics-thanos-kup6s"}[5m])
S3 Upload Failures:
rate(thanos_objstore_bucket_operation_failures_total{operation="upload",bucket="metrics-thanos-kup6s"}[5m]) > 0
S3 Request Duration:
histogram_quantile(0.95,
  rate(thanos_objstore_bucket_operation_duration_seconds_bucket{bucket="metrics-thanos-kup6s"}[5m])
)
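The upload-failure query above can be wrapped in an alert. A sketch using the PrometheusRule CRD from the Prometheus Operator (rule and alert names are placeholders):
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: thanos-s3-upload-failures
  namespace: monitoring
spec:
  groups:
    - name: thanos-objstore
      rules:
        - alert: ThanosS3UploadFailures
          expr: rate(thanos_objstore_bucket_operation_failures_total{operation="upload"}[5m]) > 0
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: Thanos is failing to upload blocks to S3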
Loki Metrics¶
Index Uploads:
rate(loki_boltdb_shipper_uploads_total[5m])
Chunk Flushes:
rate(loki_ingester_chunks_flushed_total[5m])
S3 Errors:
rate(loki_boltdb_shipper_upload_errors_total[5m]) > 0
Bucket Size Monitoring¶
Check bucket size (requires an S3-compatible CLI; the AWS CLI needs the Hetzner endpoint and credentials):
# Metrics bucket
aws s3 ls s3://metrics-thanos-kup6s/ --recursive --summarize --endpoint-url https://fsn1.your-objectstorage.com | grep "Total Size"
# Logs bucket
aws s3 ls s3://logs-loki-kup6s/ --recursive --summarize --endpoint-url https://fsn1.your-objectstorage.com | grep "Total Size"
Prometheus metric (if bucket metrics enabled):
s3_bucket_size_bytes{bucket="metrics-thanos-kup6s"}
Troubleshooting¶
Bucket Not Created¶
Symptom: Crossplane Bucket shows SYNCED=False
Diagnosis:
kubectl describe bucket metrics-thanos-kup6s -n crossplane-system
Common Causes:
Invalid credentials: Check the hetzner-s3-credentials secret
Endpoint unreachable: Verify https://fsn1.your-objectstorage.com is reachable
Name collision: Bucket name already exists (names are globally unique)
Solution:
# Check ProviderConfig
kubectl get providerconfig hetzner-s3 -o yaml
# Check credentials
kubectl get secret hetzner-s3-creds-standard -n crossplane-system -o yaml
Thanos Upload Failures¶
Symptom: thanos_objstore_bucket_operation_failures_total increasing
Diagnosis:
kubectl logs -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c thanos-sidecar | grep -i error
Common Causes:
Invalid credentials: Secret thanos-objstore-config is incorrect
Network issues: Cannot reach the S3 endpoint
Bucket doesn’t exist: Crossplane bucket not ready
Solution:
# Verify secret
kubectl get secret thanos-objstore-config -n monitoring -o jsonpath='{.data.objstore\.yml}' | base64 -d
# Test S3 access from pod
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c thanos-sidecar -- \
wget -O- https://fsn1.your-objectstorage.com
Loki Chunk Upload Failures¶
Symptom: loki_boltdb_shipper_upload_errors_total increasing
Diagnosis:
kubectl logs -n monitoring -l app.kubernetes.io/component=write | grep -i "s3\|error"
Common Causes:
S3 credentials wrong: Check the loki-s3-config secret
Bucket not accessible: Permissions or network issue
WAL disk full: Loki Write PVC full
Solution:
# Check Loki S3 config
kubectl get secret loki-s3-config -n monitoring -o yaml
# Check WAL PVC usage
kubectl exec -n monitoring loki-write-0 -- df -h /var/loki
High S3 Costs¶
Symptom: S3 bill higher than expected
Diagnosis:
# Count objects in bucket
aws s3 ls s3://metrics-thanos-kup6s/ --recursive --endpoint-url https://fsn1.your-objectstorage.com | wc -l
# Check lifecycle policy
kubectl get bucketlifecycleconfiguration thanos-bucket-lifecycle -o yaml
Common Causes:
Lifecycle not active: Old blocks not deleted
Compaction not running: Raw blocks accumulating
High retention: Retention too long
Solution:
# Verify compactor running
kubectl get pods -n monitoring -l app.kubernetes.io/name=thanos-compactor
# Check compactor logs
kubectl logs -n monitoring thanos-compactor-0 | grep -E "compact|delete"
Disaster Recovery¶
Bucket Deletion Protection¶
Current Setting: deletionPolicy: Delete (buckets are deleted when the Crossplane resource is deleted)
For production, consider changing to Orphan:
spec:
  deletionPolicy: Orphan # Bucket survives Crossplane resource deletion
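An existing bucket can also be switched without re-applying the full manifest; a sketch using the fully qualified resource name to avoid clashes with other Bucket CRDs:
kubectl patch buckets.s3.aws.upbound.io metrics-thanos-kup6s \
  --type merge -p '{"spec":{"deletionPolicy":"Orphan"}}'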
Backup and Restore¶
Metrics Bucket:
Backup: S3 bucket is the backup (source of truth)
Restore: Thanos Store automatically serves data from S3
Logs Bucket:
Backup: S3 bucket is the backup
Restore: Loki Read automatically queries S3
Bucket Versioning¶
Not currently enabled (Hetzner S3 supports versioning):
apiVersion: s3.aws.upbound.io/v1beta1
kind: BucketVersioning
metadata:
  name: metrics-thanos-versioning
spec:
  forProvider:
    bucket: metrics-thanos-kup6s
    versioningConfiguration:
      - status: Enabled
Consideration: Versioning increases storage costs but protects against accidental deletion.
See Also¶
Storage Architecture - Storage strategy
Prometheus-Thanos Integration - Metrics flow
Loki Architecture - Logs flow
Troubleshooting - Common issues