S3 Buckets Reference

Complete reference for S3 buckets used by the monitoring stack.

Overview

The monitoring stack uses two S3 buckets for long-term storage:

  1. metrics-thanos-kup6s - Prometheus metrics (via Thanos)

  2. logs-loki-kup6s - Loki log chunks and indexes

Both buckets are:

  • Hosted on Hetzner Object Storage (S3-compatible)

  • Located in fsn1 region (Falkenstein, Germany)

  • Managed by Crossplane (declarative provisioning)

  • Accessed with shared cluster-wide S3 credentials

Bucket: metrics-thanos-kup6s

Purpose

Long-term storage for Prometheus metrics, written and read through Thanos.

Configuration

apiVersion: s3.aws.upbound.io/v1beta1
kind: Bucket
metadata:
  name: metrics-thanos-kup6s
  namespace: crossplane-system
spec:
  deletionPolicy: Delete
  managementPolicies:
    - Observe
    - Create
    - Delete
  forProvider:
    region: fsn1
  providerConfigRef:
    name: hetzner-s3

Bucket Properties

Property      Value
Name          metrics-thanos-kup6s
Region        fsn1 (Falkenstein)
Endpoint      https://fsn1.your-objectstorage.com
Access Mode   Private (credentials required)
Versioning    Disabled
Encryption    Server-side (Hetzner default)

Lifecycle Policy

apiVersion: s3.aws.upbound.io/v1beta1
kind: BucketLifecycleConfiguration
metadata:
  name: thanos-bucket-lifecycle
  namespace: crossplane-system
spec:
  deletionPolicy: Delete
  managementPolicies:
    - Observe
    - Create
    - Update
    - Delete
  forProvider:
    bucket: metrics-thanos-kup6s
    rule:
      - id: expire-old-metrics
        status: Enabled
        expiration:
          - days: 730  # 2 years

Retention Strategy:

  • Raw blocks (no downsampling): 30 days (Thanos Compactor marks for deletion)

  • 5-minute downsampled: 180 days (6 months)

  • 1-hour downsampled: 730 days (2 years)

  • Objects older than 730 days: Automatically deleted by S3 lifecycle
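
The tiers above can be written down as a small lookup. This is a minimal sketch, not Thanos Compactor's actual logic: the function and constant names are hypothetical, and the resolution keys (0, 300000 and 3600000 milliseconds) are assumed to match the thanos.downsample.resolution field in meta.json.

```python
from datetime import datetime, timedelta, timezone

# Retention per downsampling resolution (milliseconds -> days), from the list above.
RETENTION_DAYS = {
    0: 30,           # raw blocks
    300_000: 180,    # 5-minute downsampled
    3_600_000: 730,  # 1-hour downsampled
}

def is_past_retention(resolution_ms: int, max_time_ms: int, now: datetime) -> bool:
    """Return True if a block's newest sample is older than its tier's retention."""
    retention = timedelta(days=RETENTION_DAYS[resolution_ms])
    block_end = datetime.fromtimestamp(max_time_ms / 1000, tz=timezone.utc)
    return now - block_end > retention

# Hypothetical example: two blocks whose data ends 40 days before "now".
now = datetime(2025, 11, 1, tzinfo=timezone.utc)
forty_days_ago_ms = int((now - timedelta(days=40)).timestamp() * 1000)
print(is_past_retention(0, forty_days_ago_ms, now))          # raw block
print(is_past_retention(3_600_000, forty_days_ago_ms, now))  # 1-hour block
```

A 40-day-old raw block is past its 30-day tier, while a 1-hour block of the same age is kept. The 730-day S3 lifecycle rule only matters for objects the Compactor never cleaned up.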

Data Structure

metrics-thanos-kup6s/
├── 01HQABCDEFGHIJKLMNOPQRSTU/          # ULID block ID
│   ├── meta.json                       # Block metadata
│   ├── index                           # Series index (inverted index)
│   ├── chunks/
│   │   ├── 000001                      # Compressed sample chunks
│   │   ├── 000002
│   │   └── ...
│   └── tombstones                      # Deletion markers
├── 01HQVWXYZ1234567890ABCDEF/          # Another block
│   └── ...
├── debug/
│   └── metas/                          # Compactor debug metadata
└── compact/
    ├── 01HR*-5m/                       # 5-minute downsampled blocks
    └── 01HR*-1h/                       # 1-hour downsampled blocks

Block Naming

Blocks use ULID (Universally Unique Lexicographically Sortable Identifier):

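  • Format: 26 characters in Crockford Base32, a 48-bit millisecond timestamp followed by 80 bits of randomness (IDs shown in this document, such as 01HQABCDEFGHIJKLMNOPQRSTU, are illustrative placeholders)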

  • Sortable by creation time

  • Example: 01HQABCD... created before 01HQVWXY...
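
Because the first 10 characters of a ULID encode a millisecond timestamp, a block's creation time can be recovered from its name. The sketch below assumes standard ULIDs (26 characters, Crockford Base32); the function names are hypothetical. The example ID comes from the ULID specification rather than this bucket, since the placeholder IDs in this document would not decode.

```python
from datetime import datetime, timezone

# Crockford Base32 alphabet used by ULIDs (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def ulid_timestamp_ms(ulid: str) -> int:
    """Decode the 48-bit millisecond timestamp from the first 10 characters."""
    if len(ulid) != 26:
        raise ValueError("a ULID is 26 characters long")
    value = 0
    for char in ulid[:10].upper():
        value = value * 32 + CROCKFORD.index(char)  # ValueError on invalid characters
    return value

def ulid_created_at(ulid: str) -> datetime:
    """Return the creation time encoded in the ULID as a UTC datetime."""
    return datetime.fromtimestamp(ulid_timestamp_ms(ulid) / 1000, tz=timezone.utc)

# Example ID from the ULID specification, not a block from this bucket.
example = "01ARZ3NDEKTSV4RRFFQ69G5FAV"
print(ulid_created_at(example).isoformat())
```

Sorting block directory names as plain strings gives the same order as sorting by this timestamp, which is why a bucket listing appears in creation order.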

Block Metadata (meta.json)

{
  "ulid": "01HQABCDEFGHIJKLMNOPQRSTU",
  "minTime": 1699000000000,
  "maxTime": 1699007200000,
  "stats": {
    "numSamples": 12500000,
    "numSeries": 50000,
    "numChunks": 125000
  },
  "compaction": {
    "level": 1,
    "sources": ["01HQABCDEFGHIJKLMNOPQRSTU"]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "prometheus": "kube-prometheus-stack-prometheus",
      "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "sidecar"
  }
}
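
A few useful figures can be derived from these fields. The following sketch parses a trimmed copy of the example above; summarize_block is a hypothetical helper, not part of Thanos.

```python
import json

# Trimmed copy of the example meta.json above.
meta_json = """
{
  "ulid": "01HQABCDEFGHIJKLMNOPQRSTU",
  "minTime": 1699000000000,
  "maxTime": 1699007200000,
  "stats": {"numSamples": 12500000, "numSeries": 50000, "numChunks": 125000},
  "thanos": {"downsample": {"resolution": 0}, "source": "sidecar"}
}
"""

def summarize_block(meta: dict) -> dict:
    """Derive a few human-readable figures from a block's meta.json."""
    stats = meta["stats"]
    return {
        # minTime and maxTime are Unix timestamps in milliseconds.
        "duration_hours": (meta["maxTime"] - meta["minTime"]) / 3_600_000,
        "samples_per_series": stats["numSamples"] / stats["numSeries"],
        "samples_per_chunk": stats["numSamples"] / stats["numChunks"],
        # Resolution 0 means raw, non-downsampled data.
        "is_raw": meta["thanos"]["downsample"]["resolution"] == 0,
    }

summary = summarize_block(json.loads(meta_json))
print(summary)
```

For the example block this gives a 2-hour raw block with 250 samples per series, consistent with the 2-hour blocks the Sidecar uploads.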

Storage Growth

Ingestion Rate:

  • ~100,000 samples/second

  • ~500MB/day compressed (raw blocks)

After Compaction & Downsampling:

  • Raw data (30d): ~15GB

  • 5m resolution (180d): ~50GB

  • 1h resolution (730d): ~80GB

  • Total steady state: ~145GB

Growth Rate: Approximately linear with cluster size and scrape targets.

Access Patterns

Writers:

  • Thanos Sidecar (uploads 2-hour blocks every 2h)

  • Thanos Compactor (writes compacted/downsampled blocks)

Readers:

  • Thanos Store (reads blocks for queries)

  • Thanos Compactor (reads blocks for compaction)

Operations:

  • PUT (uploads): ~24/day (12 per Prometheus replica)

  • GET (queries): ~1000/day (varies with query load)

  • DELETE (lifecycle): Automatic via S3 lifecycle policy

Cost Analysis

Storage Cost:

  • Rate: €0.023/GB/month

  • Usage: ~145GB steady state

  • Monthly cost: €3.34

API Request Cost:

  • PUT: €0.005/1000 requests × 24/day × 30 = €0.004/month

  • GET: €0.0004/1000 requests × 1000/day × 30 = €0.012/month

  • Monthly cost: €0.016

Total Monthly Cost: ~€3.36
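
The arithmetic above can be reproduced with a short helper, which makes it easy to try other storage sizes or request volumes. This is a sketch using the rates quoted in this section; the function name and parameters are hypothetical, and a 30-day month is assumed.

```python
def monthly_cost_eur(storage_gb: float, puts_per_day: float, gets_per_day: float,
                     storage_rate: float = 0.023,      # EUR per GB per month
                     put_rate_per_1000: float = 0.005,  # EUR per 1000 PUT requests
                     get_rate_per_1000: float = 0.0004,  # EUR per 1000 GET requests
                     days: int = 30) -> dict:
    """Estimate the monthly bucket cost from storage size and daily request counts."""
    storage = storage_gb * storage_rate
    puts = puts_per_day * days / 1000 * put_rate_per_1000
    gets = gets_per_day * days / 1000 * get_rate_per_1000
    return {"storage": storage, "requests": puts + gets, "total": storage + puts + gets}

metrics_cost = monthly_cost_eur(storage_gb=145, puts_per_day=24, gets_per_day=1000)
print({name: round(value, 3) for name, value in metrics_cost.items()})
```

Summing the unrounded terms gives a total within one cent of the figure above. Storage accounts for more than 99% of the cost, so retention settings matter far more than request volume.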


Bucket: logs-loki-kup6s

Purpose

Storage for Loki log chunks and TSDB indexes.

Configuration

apiVersion: s3.aws.upbound.io/v1beta1
kind: Bucket
metadata:
  name: logs-loki-kup6s
  namespace: crossplane-system
spec:
  deletionPolicy: Delete
  managementPolicies:
    - Observe
    - Create
    - Delete
  forProvider:
    region: fsn1
  providerConfigRef:
    name: hetzner-s3

Bucket Properties

Property      Value
Name          logs-loki-kup6s
Region        fsn1 (Falkenstein)
Endpoint      https://fsn1.your-objectstorage.com
Access Mode   Private (credentials required)
Versioning    Disabled
Encryption    Server-side (Hetzner default)

Lifecycle Policy

apiVersion: s3.aws.upbound.io/v1beta1
kind: BucketLifecycleConfiguration
metadata:
  name: loki-bucket-lifecycle
  namespace: crossplane-system
spec:
  deletionPolicy: Delete
  managementPolicies:
    - Observe
    - Create
    - Update
    - Delete
  forProvider:
    bucket: logs-loki-kup6s
    rule:
      - id: expire-old-logs
        status: Enabled
        expiration:
          - days: 90  # Safety margin (Loki internal: 31d)

Retention Strategy:

  • Loki compactor marks chunks for deletion after 31 days (744h)

  • S3 lifecycle deletes objects after 90 days (safety net for orphaned chunks)

  • The 59-day gap between Loki deletion (31d) and S3 deletion (90d) leaves time to recover if the compactor fails

Data Structure

logs-loki-kup6s/
├── fake/                                    # Loki tenant ID (single-tenant)
│   ├── chunks/
│   │   └── 20251101/                        # Date-based partitioning
│   │       ├── 12:00:00-13:00:00/           # Hourly directories
│   │       │   ├── abc123def456.gz          # Compressed log chunks
│   │       │   └── 789ghi012jkl.gz
│   │       └── 13:00:00-14:00:00/
│   │           └── mno345pqr678.gz
│   └── index/
│       ├── boltdb-shipper/                  # Legacy index
│       │   └── compactor/
│       │       └── index_18900              # Daily index files
│       └── tsdb/                            # TSDB index (current)
│           ├── index_18900/
│           │   ├── 1234567890.tsdb
│           │   └── meta.json
│           └── index_18901/
└── retention/
    └── markers/                             # Retention markers

Chunk Format

Chunk Naming: {hash}.gz

  • Example: abc123def456789012345678901234.gz

  • Hash: SHA256 of chunk contents

  • Compression: gzip

Chunk Contents:

  • Multiple log lines from same stream

  • Target size: 1.5MB compressed

  • Flush triggers:

    • Chunk reaches 1.5MB

    • 15 minutes elapsed (max_chunk_age)

    • Ingester shutdown
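
The three flush triggers amount to a simple decision function. This is an illustrative sketch of the rules listed above, not Loki's ingester code; the class, function and constant names are hypothetical, and 1.5MB is interpreted as 1.5 × 1024 × 1024 bytes.

```python
from dataclasses import dataclass
from typing import Optional

TARGET_CHUNK_BYTES = int(1.5 * 1024 * 1024)  # 1.5MB target size
MAX_CHUNK_AGE_SECONDS = 15 * 60              # max_chunk_age of 15 minutes

@dataclass
class ChunkState:
    size_bytes: int      # compressed size accumulated so far
    age_seconds: float   # time since the chunk was opened

def flush_reason(chunk: ChunkState, shutting_down: bool = False) -> Optional[str]:
    """Return why a chunk should be flushed, or None if it can stay in memory."""
    if shutting_down:
        return "shutdown"
    if chunk.size_bytes >= TARGET_CHUNK_BYTES:
        return "size"
    if chunk.age_seconds >= MAX_CHUNK_AGE_SECONDS:
        return "age"
    return None

print(flush_reason(ChunkState(size_bytes=2_000_000, age_seconds=60)))
print(flush_reason(ChunkState(size_bytes=10_000, age_seconds=16 * 60)))
print(flush_reason(ChunkState(size_bytes=10_000, age_seconds=60)))
```

With the low ingestion rate described under Storage Growth, most chunks are likely to hit the age trigger well before they reach the size target.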

Index Format

TSDB Index (current):

index_{day}/
├── {hash}.tsdb          # Per-stream index files
├── meta.json            # Index metadata
└── compactor.json       # Compaction status

Index Content:

  • Stream labels → Chunk references

  • Label cardinality: ~1000 unique label combinations

  • Index size: ~10-50MB per day
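
The {day} suffix in an index directory name can be converted to a calendar date. The sketch below assumes the number counts 24-hour periods since the Unix epoch, which is how Loki names tables when the index period is 24h; the helper names are hypothetical.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def index_table_date(table_name: str) -> date:
    """Map a table name such as 'index_18900' to the UTC day it covers."""
    day_number = int(table_name.split("_", 1)[1])
    return EPOCH + timedelta(days=day_number)

def index_table_name(day: date) -> str:
    """Return the index table name that would cover the given UTC day."""
    return f"index_{(day - EPOCH).days}"

print(index_table_date("index_18900"))
print(index_table_name(date(2025, 11, 1)))
```

This helps when checking whether the oldest index directories in the bucket fall inside the 31-day retention window.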

Storage Growth

Ingestion Rate:

  • ~50MB/day uncompressed logs

  • ~5MB/day compressed (10:1 compression ratio)

31-Day Retention:

  • Chunks: 31 days × 5MB = ~155MB

  • Indexes: 31 days × 30MB = ~930MB

  • Total steady state: ~1.1GB

Growth Rate: Linear with log volume (scales with pod count and verbosity).
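
The steady-state estimate is sensitive to the index size, which this document gives as a range of 10-50MB per day. The sketch below recomputes the figures above and shows that range; the function name is hypothetical and the inputs are the numbers from this section.

```python
def loki_steady_state_mb(raw_mb_per_day: float, compression_ratio: float,
                         index_mb_per_day: float, retention_days: int) -> dict:
    """Estimate bucket usage once the retention window is full."""
    chunks = raw_mb_per_day / compression_ratio * retention_days
    indexes = index_mb_per_day * retention_days
    return {"chunks_mb": chunks, "indexes_mb": indexes, "total_mb": chunks + indexes}

# 30MB/day is the midpoint used in the estimate above; 10 and 50 are the range bounds.
for index_mb in (10, 30, 50):
    estimate = loki_steady_state_mb(raw_mb_per_day=50, compression_ratio=10,
                                    index_mb_per_day=index_mb, retention_days=31)
    print(index_mb, estimate)
```

At the bounds of the index range the total moves between roughly 0.5GB and 1.7GB. Under these numbers the indexes, not the log chunks, dominate the bucket size.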

Access Patterns

Writers:

  • Loki Write (flushes chunks every 15min)

  • Loki Backend (writes indexes hourly)

Readers:

  • Loki Read (queries chunks for log retrieval)

  • Loki Backend (reads indexes for query planning)

Operations:

  • PUT (chunk uploads): ~96/day (every 15 min)

  • PUT (index uploads): ~24/day (hourly)

  • GET (queries): ~500/day (varies with dashboard usage)

  • DELETE (compactor): Daily cleanup of expired chunks

Cost Analysis

Storage Cost:

  • Rate: €0.023/GB/month

  • Usage: ~1.1GB steady state

  • Monthly cost: €0.025

API Request Cost:

  • PUT: €0.005/1000 requests × 120/day × 30 = €0.018/month

  • GET: €0.0004/1000 requests × 500/day × 30 = €0.006/month

  • Monthly cost: €0.024

Total Monthly Cost: ~€0.05 (negligible)


Credentials and Access

Credential Storage

All S3 credentials are managed by External Secrets Operator (ESO):

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: thanos-objstore-config
  namespace: monitoring
spec:
  secretStoreRef:
    name: crossplane-credentials
    kind: ClusterSecretStore
  target:
    name: thanos-objstore-config
    template:
      engineVersion: v2
      data:
        objstore.yml: |
          type: S3
          config:
            bucket: metrics-thanos-kup6s
            endpoint: {{ .endpoint }}
            access_key: {{ .access_key_id }}
            secret_key: {{ .secret_access_key }}
            insecure: false
  dataFrom:
    - extract:
        key: hetzner-s3-credentials

Source Secret: hetzner-s3-credentials in crossplane-system namespace

Target Secrets:

  • thanos-objstore-config (monitoring namespace) - for Thanos components

  • loki-s3-config (monitoring namespace) - for Loki components

  • monitoring-s3-credentials (monitoring namespace) - for Crossplane ProviderConfig

Shared Credentials Model

Important: All monitoring components use the same Hetzner S3 credentials:

  • Thanos Sidecar

  • Thanos Query

  • Thanos Store

  • Thanos Compactor

  • Loki Write

  • Loki Read

  • Loki Backend

Rationale: Hetzner S3 access keys are scoped to the project, not to individual buckets as AWS IAM policies allow.

Crossplane ProviderConfig

apiVersion: aws.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: hetzner-s3
spec:
  credentials:
    source: Secret
    secretRef:
      name: monitoring-s3-credentials
      namespace: monitoring
      key: credentials
  endpoint:
    url:
      static: https://fsn1.your-objectstorage.com
      type: Static
    hostnameImmutable: true
  skip_region_validation: true
  s3_use_path_style: true

Critical Settings:

  • hostnameImmutable: true - Stops the AWS SDK from rewriting the endpoint hostname; required for S3-compatible storage

  • skip_region_validation: true - Allows Hetzner regions (fsn1, nbg1, hel1)

  • s3_use_path_style: true - Path-style access (bucket in URL path)
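
To illustrate what path-style access means in practice, the sketch below builds the object URL both ways. It only shows the addressing scheme and is not how the provider or the AWS SDK constructs requests; object_url is a hypothetical helper.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def object_url(endpoint: str, bucket: str, key: str, path_style: bool = True) -> str:
    """Build the URL of an object using path-style or virtual-hosted-style addressing."""
    parts = urlsplit(endpoint)
    encoded_key = quote(key)  # "/" is left unescaped by default
    if path_style:
        # Bucket is the first path segment; the hostname stays unchanged.
        host, path = parts.netloc, f"/{bucket}/{encoded_key}"
    else:
        # Bucket becomes a subdomain of the endpoint hostname.
        host, path = f"{bucket}.{parts.netloc}", f"/{encoded_key}"
    return urlunsplit((parts.scheme, host, path, "", ""))

endpoint = "https://fsn1.your-objectstorage.com"
print(object_url(endpoint, "metrics-thanos-kup6s", "debug/metas/example.json"))
print(object_url(endpoint, "metrics-thanos-kup6s", "debug/metas/example.json", path_style=False))
```

With path-style addressing every request goes to the configured endpoint hostname, which is consistent with keeping hostnameImmutable set to true.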


Monitoring and Observability

Thanos Metrics

S3 Upload Success:

rate(thanos_objstore_bucket_operations_total{operation="upload",bucket="metrics-thanos-kup6s"}[5m])

S3 Upload Failures:

rate(thanos_objstore_bucket_operation_failures_total{operation="upload",bucket="metrics-thanos-kup6s"}[5m]) > 0

S3 Request Duration:

histogram_quantile(0.95,
  rate(thanos_objstore_bucket_operation_duration_seconds_bucket{bucket="metrics-thanos-kup6s"}[5m])
)

Loki Metrics

Index Uploads:

rate(loki_boltdb_shipper_uploads_total[5m])

Chunk Flushes:

rate(loki_ingester_chunks_flushed_total[5m])

S3 Errors:

rate(loki_boltdb_shipper_upload_errors_total[5m]) > 0

Bucket Size Monitoring

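Check bucket size (requires the AWS CLI or a compatible client, pointed at the Hetzner endpoint):

# Metrics bucket
aws s3 ls s3://metrics-thanos-kup6s/ --recursive --summarize \
  --endpoint-url https://fsn1.your-objectstorage.com | grep "Total Size"

# Logs bucket
aws s3 ls s3://logs-loki-kup6s/ --recursive --summarize \
  --endpoint-url https://fsn1.your-objectstorage.com | grep "Total Size"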
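
For scripted checks, the summary lines can be parsed instead of read by eye. The sketch below assumes the output of aws s3 ls with --summarize and without --human-readable, so the size is reported in bytes. The sample listing is made up for illustration, and parse_summary is a hypothetical helper.

```python
import re

def parse_summary(output: str) -> dict:
    """Extract object count and total size in bytes from `aws s3 ls --summarize` output."""
    objects = re.search(r"Total Objects:\s*(\d+)", output)
    size = re.search(r"Total Size:\s*(\d+)", output)
    if not objects or not size:
        raise ValueError("no summary found; was --summarize passed?")
    return {"objects": int(objects.group(1)), "bytes": int(size.group(1))}

# Made-up sample output, not taken from a real bucket.
sample_output = """\
2025-11-01 12:00:00   52428800 example/chunks/000001
2025-11-01 12:00:01   52428800 example/chunks/000002
2025-11-01 12:00:02   52428800 example/index

Total Objects: 3
   Total Size: 157286400
"""

summary = parse_summary(sample_output)
print(summary["objects"], "objects,", round(summary["bytes"] / 1024**3, 2), "GiB")
```

Comparing the parsed size against the steady-state estimates in this document gives a quick signal that retention or compaction has stopped working.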

Prometheus metric (if bucket metrics enabled):

s3_bucket_size_bytes{bucket="metrics-thanos-kup6s"}

Troubleshooting

Bucket Not Created

Symptom: Crossplane Bucket shows SYNCED=False

Diagnosis:

kubectl describe bucket metrics-thanos-kup6s -n crossplane-system

Common Causes:

  1. Invalid credentials: Check hetzner-s3-credentials secret

  2. Endpoint unreachable: Verify https://fsn1.your-objectstorage.com

  3. Name collision: A bucket with this name already exists (bucket names are globally unique)

Solution:

# Check ProviderConfig
kubectl get providerconfig hetzner-s3 -o yaml

# Check credentials
kubectl get secret hetzner-s3-credentials -n crossplane-system -o yaml

Thanos Upload Failures

Symptom: thanos_objstore_bucket_operation_failures_total increasing

Diagnosis:

kubectl logs -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c thanos-sidecar | grep -i error

Common Causes:

  1. Invalid credentials: Secret thanos-objstore-config incorrect

  2. Network issues: Cannot reach S3 endpoint

  3. Bucket doesn’t exist: Crossplane bucket not ready

Solution:

# Verify secret
kubectl get secret thanos-objstore-config -n monitoring -o jsonpath='{.data.objstore\.yml}' | base64 -d

# Test S3 access from pod
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c thanos-sidecar -- \
  wget -O- https://fsn1.your-objectstorage.com

Loki Chunk Upload Failures

Symptom: loki_boltdb_shipper_upload_errors_total increasing

Diagnosis:

kubectl logs -n monitoring -l app.kubernetes.io/component=write | grep -i "s3\|error"

Common Causes:

  1. S3 credentials wrong: Check loki-s3-config secret

  2. Bucket not accessible: Permissions or network issue

  3. WAL disk full: Loki Write PVC full

Solution:

# Check Loki S3 config
kubectl get secret loki-s3-config -n monitoring -o yaml

# Check WAL PVC usage
kubectl exec -n monitoring loki-write-0 -- df -h /var/loki

High S3 Costs

Symptom: S3 bill higher than expected

Diagnosis:

# Count objects in bucket
aws s3 ls s3://metrics-thanos-kup6s/ --recursive \
  --endpoint-url https://fsn1.your-objectstorage.com | wc -l

# Check lifecycle policy
kubectl get bucketlifecycleconfiguration thanos-bucket-lifecycle -o yaml

Common Causes:

  1. Lifecycle not active: Old blocks not deleted

  2. Compaction not running: Raw blocks accumulating

  3. Retention too long: Configured retention periods keep more data than needed

Solution:

# Verify compactor running
kubectl get pods -n monitoring -l app.kubernetes.io/name=thanos-compactor

# Check compactor logs
kubectl logs -n monitoring thanos-compactor-0 | grep -E "compact|delete"

Disaster Recovery

Bucket Deletion Protection

Current Setting: deletionPolicy: Delete (the bucket is deleted when its Crossplane resource is deleted)

For production, consider changing to Orphan:

spec:
  deletionPolicy: Orphan  # Bucket survives Crossplane resource deletion

Backup and Restore

Metrics Bucket:

  • Backup: S3 bucket is the backup (source of truth)

  • Restore: Thanos Store automatically serves data from S3

Logs Bucket:

  • Backup: S3 bucket is the backup

  • Restore: Loki Read automatically queries S3

Bucket Versioning

Versioning is not currently enabled, although Hetzner S3 supports it. To enable it for the metrics bucket:

apiVersion: s3.aws.upbound.io/v1beta1
kind: BucketVersioning
metadata:
  name: metrics-thanos-versioning
spec:
  forProvider:
    bucket: metrics-thanos-kup6s
    region: fsn1
    versioningConfiguration:
      - status: Enabled
  providerConfigRef:
    name: hetzner-s3

Consideration: Versioning increases storage costs but protects against accidental deletion.

