Explanation

GitLab Components Architecture


Overview

GitLab is a microservices application - not a single monolithic container. The GitLab BDA deployment uses 8+ microservices working together, plus external PostgreSQL and Redis. Understanding how these components interact is essential for operations, troubleshooting, and scaling.

This document explains HOW GitLab components work together and WHY we use external dependencies instead of GitLab’s built-in database/cache.


Component Architecture

GitLab Components (Official Helm Chart)

The GitLab platform consists of these interconnected microservices:

| Component | Role | Protocol | Dependencies |
|---|---|---|---|
| Webservice | Rails app handling HTTP | HTTP (8080) | PostgreSQL, Redis, Gitaly |
| Gitaly | Git server | gRPC (8075) | Hetzner Cloud Volumes (storage) |
| Workhorse | Smart reverse proxy | HTTP (8181) | Webservice, Gitaly, S3 |
| Sidekiq | Background job processor | Internal | PostgreSQL, Redis, Gitaly, S3 |
| GitLab Shell | SSH git operations | SSH (22) | Gitaly, Redis |
| GitLab Pages | Static site hosting | HTTP (8090) | S3 (pages bucket) |
| Toolbox | Backup/restore utility | Internal | PostgreSQL, Gitaly, S3 |

External Dependencies

| Component | Role | Why External? |
|---|---|---|
| PostgreSQL (CNPG) | Primary database | HA, automated backups, pooling, monitoring |
| Redis | Cache + job queue | Dedicated resources, persistence control |
| Harbor | Container registry | Vulnerability scanning, OAuth, separate lifecycle |

Why separate from GitLab?

  • High availability - CNPG provides automated failover for PostgreSQL

  • Resource isolation - Database/cache don’t compete with GitLab pods

  • Independent scaling - Scale database separately from application

  • Better observability - CNPG provides rich PostgreSQL metrics

  • Backup control - Barman Cloud Plugin for PostgreSQL-aware backups

For storage details, see Storage Architecture.


Core Components Explained

Webservice: The Rails Application

What it does: HTTP request handling, UI rendering, API endpoints

Architecture:

  • Ruby on Rails application

  • GitLab CE (Community Edition) v18.5.1

  • 2 replicas for high availability (can handle pod restarts without downtime)

  • Stateless (all state in PostgreSQL/Redis/Gitaly)

Handles:

  • Web UI (projects, issues, merge requests, CI/CD pipelines)

  • REST API (/api/v4/*)

  • GraphQL API (/api/graphql)

  • User authentication and authorization

  • Git HTTP operations (/gitlab/my-project.git/*)
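
For example, the REST API answers under the /api/v4 prefix; a quick check from any machine with network access (the token is a placeholder):

# List projects visible to the token owner
curl --header "PRIVATE-TOKEN: <personal-access-token>" \
  "https://gitlab.staging.bluedynamics.eu/api/v4/projects"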

Communication:

User → Traefik Ingress → Workhorse → Webservice
                                        ├→ PostgreSQL
                                        ├→ Redis
                                        └→ Gitaly

Resource profile:

  • CPU: 300m request, 1 core limit (usage scales with request volume)

  • Memory: 2Gi request, 2.5Gi limit (Rails is memory-hungry)
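
As a rough sketch, this profile maps onto a standard Kubernetes resources stanza (values mirror the bullets above; the real stanza is set through the Helm values):

# Sketch only - illustrates the profile above, not the deployed manifest
resources:
  requests:
    cpu: 300m
    memory: 2Gi
  limits:
    cpu: "1"
    memory: 2.5Gi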

Why 2 replicas?

  • Zero-downtime deployments - Rolling updates keep 1 instance available

  • Load balancing - Distribute HTTP requests across instances

  • Pod failure resilience - Service stays available if 1 pod crashes

Stateless design: No local disk needed - all data in PostgreSQL, Redis, S3, or Gitaly.

Workhorse: Smart Reverse Proxy

What it does: Efficient file handling, offloading slow operations from Rails

Architecture:

  • Written in Go (fast, low memory)

  • Sits between Traefik and Webservice

  • 2 replicas (same as webservice)

  • Handles large file uploads/downloads without tying up Rails workers

Key features:

  1. Direct S3 uploads - Files uploaded directly to S3 (artifacts, LFS), bypassing Rails

    User → Workhorse → S3
    (Rails just generates presigned URL, Workhorse handles upload)
    
  2. Git HTTP traffic - Proxies git clone/push to Gitaly (not through Rails)

    git clone https://gitlab.../repo.git
    → Workhorse → Gitaly (gRPC) → Hetzner Volume (20Gi)
    
  3. Artifact streaming - Streams CI artifacts from S3 without buffering in memory

  4. Websocket proxying - Terminal access in CI jobs, interactive web IDE

Why separate from Webservice?

  • Performance - Go is faster than Ruby for I/O operations

  • Resource efficiency - Doesn’t consume Rails worker threads for file transfers

  • Scalability - Can scale independently if file traffic increases

Communication:

  • Inbound: HTTP from Traefik (port 8181)

  • Outbound: HTTP to Webservice (port 8080), gRPC to Gitaly (port 8075)

Gitaly: Git Repository Server

What it does: All git operations (clone, fetch, push, diff, blame, etc.)

Architecture:

  • Written in Go (git performance-critical)

  • gRPC API for git operations

  • 1 replica (stateful - bound to specific PVC)

  • 20Gi Hetzner Cloud Volume (network-attached, not Longhorn)

Why separate from Webservice?

  • Performance - Native git operations (libgit2) faster than shelling out from Ruby

  • Resource isolation - Git operations (especially large repos) are CPU/memory intensive

  • Security - Sandboxed git execution (prevents repository corruption from bad commits)

Storage architecture:

Gitaly pod → PVC (hcloud-volumes, 20Gi) → Hetzner Cloud Volume
                                           (network-attached SSD)
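
A minimal sketch of the underlying claim, assuming the hcloud-volumes CSI storage class (the PVC name is illustrative; the chart creates it from a volumeClaimTemplate):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: repo-data-gitlab-gitaly-0
  namespace: gitlabbda
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: hcloud-volumes
  resources:
    requests:
      storage: 20Gi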

Why Hetzner Volumes, not Longhorn?

  • Simplicity - Hetzner manages redundancy (no Longhorn replication needed)

  • Network-attached - Pod can reschedule to any node (not node-specific like local storage)

  • Cost-effective - €1/month for 20Gi (vs cluster storage overhead)

For storage rationale, see Storage Architecture.

Communication:

  • Inbound: gRPC from Webservice, Workhorse, Sidekiq (port 8075)

  • Outbound: None (Gitaly is the source of truth for git data)

Operations handled:

  • git clone - Pack objects and send to client

  • git push - Receive pack, write to refs, run hooks

  • git diff - Tree comparison for merge requests

  • git blame - Line-by-line history (expensive operation)

  • Repository maintenance - Garbage collection, repacking

Resource profile:

  • CPU: 200m request, 500m limit (bursty during git operations)

  • Memory: 768Mi request, 1536Mi limit (large repos need memory for pack operations)

Sidekiq: Background Job Processor

What it does: Asynchronous work that doesn’t need immediate response

Architecture:

  • Ruby process (shares Rails codebase with Webservice)

  • Redis-based job queue

  • 1 replica (for 2-5 users, 1 worker sufficient)

  • Stateless (jobs defined in Redis, data in PostgreSQL/S3)

Job types:

  1. CI/CD pipeline execution

    • Create runner jobs

    • Process pipeline events

    • Update pipeline status

  2. Email delivery

    • Notification emails (new issues, merge requests)

    • Password resets, sign-up confirmations

  3. Git housekeeping

    • Repository garbage collection (via Gitaly)

    • Repository statistics calculation

    • Mirror updates (pull/push to external repos)

  4. Search indexing

    • Update search index for code, issues, merge requests

  5. Webhook delivery

    • HTTP POST to external services on git push, issue create, etc.

Why separate from Webservice?

  • Responsiveness - Web requests stay fast (no blocking on slow jobs)

  • Resource isolation - Long-running jobs don’t starve web workers

  • Retry logic - Failed jobs retried automatically (webhooks, emails)

  • Scalability - Can add more Sidekiq replicas independently (when scaling beyond 5 users)

Communication:

Webservice → Enqueue job → Redis
Redis → Dequeue job → Sidekiq → Process job → PostgreSQL/Gitaly/S3
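
To see whether jobs are actually flowing, one option is a one-liner through the toolbox pod described later (a sketch; queue names other than "default" exist as well):

# Show how many jobs are waiting in the default queue
kubectl exec -it deploy/gitlab-toolbox -n gitlabbda -- \
  gitlab-rails runner 'puts Sidekiq::Queue.new("default").size'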

Resource profile:

  • CPU: 200m request, 500m limit (lower than webservice - less bursty)

  • Memory: 1.5Gi request, 2Gi limit (similar to webservice - same Rails app)

GitLab Shell: SSH Git Access

What it does: Handles git@gitlab.example.com:user/repo.git SSH operations

Architecture:

  • Go application handling SSH protocol

  • Authenticates against Webservice API

  • Proxies git operations to Gitaly

  • 1 replica (SSH traffic low for 2-5 users)

SSH flow:

ssh git@gitlab.example.com
gitlab-shell (pod) → Authenticate via Webservice API
gitlab-shell → Gitaly (gRPC) → Execute git command
Gitaly → Hetzner Volume (read/write .git data)
gitlab-shell → Return git response to client

Why separate from Gitaly?

  • Security - SSH authentication and authorization separate from git operations

  • Protocol handling - SSH protocol (keys, agent forwarding) separate from git logic

  • Flexibility - Can swap authentication methods without changing Gitaly

SSH routing via Traefik:

GitLab BDA uses Traefik TCP routing instead of dedicated LoadBalancer:

# Traefik routes SSH port 22 to gitlab-shell service
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
spec:
  entryPoints: [ssh]
  routes:
    - match: HostSNI(`*`)
      services:
        - name: gitlab-gitlab-shell
          port: 22

Benefits:

  • Cost savings - No separate LoadBalancer for SSH (shares Traefik LB)

  • Consistent ingress - All traffic (HTTP, HTTPS, SSH) through single entry point

  • Easy port management - SSH on port 22 (standard), no custom ports
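
To verify the SSH path end to end (hostname as used elsewhere in this document; the repository path is a placeholder):

# Prints a "Welcome to GitLab, @username!" greeting when the key is accepted
ssh -T git@gitlab.staging.bluedynamics.eu

# Clone over SSH through the Traefik TCP route on port 22
git clone git@gitlab.staging.bluedynamics.eu:group/project.git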

For SSH routing details, see Constructs API Reference.

Resource profile:

  • CPU: 50m request, 200m limit (lightweight - mostly I/O wait)

  • Memory: 128Mi request, 256Mi limit (minimal - just SSH protocol handling)

GitLab Pages: Static Site Hosting

What it does: Serves static websites built from git repositories

Architecture:

  • Go application serving HTTP

  • Reads HTML/CSS/JS from S3 (pages-gitlabbda-kup6s bucket)

  • 1 replica (static sites, low traffic for 2-5 users)

  • Project sites served under the wildcard *.pages.staging.bluedynamics.eu (custom domains possible via additional Ingress resources)

Pages deployment flow:

1. Push to repo with .gitlab-ci.yml:
   pages:
     script: npm run build
     artifacts:
       paths: [public/]

2. CI job builds site → artifacts saved to S3 (artifacts bucket)

3. Pages daemon downloads artifacts → extracts to pages bucket

4. User visits https://project.pages.staging.bluedynamics.eu
   → Pages reads from S3 → Serves HTML
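
A complete Pages job looks roughly like this (build commands are project-specific; this sketch assumes an npm-based site and publishes the public/ directory, which is what the Pages daemon expects):

# .gitlab-ci.yml (sketch)
pages:
  stage: deploy
  image: node:20
  script:
    - npm ci
    - npm run build
  artifacts:
    paths:
      - public
  rules:
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'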

Why S3 storage?

  • Scalability - No disk space limits (S3 grows on demand)

  • CDN-ready - Can front S3 with Cloudflare later (global distribution)

  • Cost - €0.01/GB/month vs block storage (static sites are read-heavy)

Ingress routing:

Wildcard ingress for all Pages sites:

# Simplified (see constructs-api.md for full spec)
spec:
  rules:
    - host: '*.pages.staging.bluedynamics.eu'
      http:
        paths:
          - backend:
              service:
                name: gitlab-gitlab-pages

Benefits:

  • Zero configuration per site - New projects automatically get <project>.pages.staging.bluedynamics.eu

  • Custom domains - Can add specific domains via additional Ingress resources

Resource profile:

  • CPU: 50m request, 200m limit (lightweight - serving static files)

  • Memory: 128Mi request, 256Mi limit (minimal - mostly streaming from S3)

Toolbox: Backup and Restore Utility

What it does: gitlab-rake commands for maintenance, backup, restore

Architecture:

  • Deployment (not CronJob) - runs continuously, exec into for tasks

  • Rails environment (same as Webservice) with CLI tools

  • Accesses all data sources (PostgreSQL, Gitaly, S3)

Common operations:

# Exec into toolbox pod
kubectl exec -it deploy/gitlab-toolbox -n gitlabbda -- bash

# Create full GitLab backup
gitlab-backup create
# → Uploads to S3 (backups-gitlabbda-kup6s bucket)
# Includes: Database dump, repositories, uploads, LFS, artifacts, pages

# Restore from backup
gitlab-backup restore BACKUP=<timestamp>

# Check database status
gitlab-rake db:migrate:status

# Clear Redis cache
gitlab-rake cache:clear

# Clean up orphaned repositories
gitlab-rake gitlab:cleanup:repos

Backup strategy:

GitLab Toolbox backups are application-aware (vs volume snapshots):

| Backup Type | Scope | Frequency | Recovery Time |
|---|---|---|---|
| Toolbox S3 | Full GitLab state | Daily (configured via CronJob) | 2-4 hours |
| CNPG Barman | PostgreSQL only (PITR) | Continuous WAL + daily base | 30-60 min |
| Longhorn CIFS | Volume snapshots | Daily | 15-30 min (restore PVC) |
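
The daily Toolbox backup is typically driven by the chart's toolbox CronJob; a hedged values sketch (key names assumed from the upstream gitlab/toolbox sub-chart, not verified against this deployment):

# values.yaml (sketch) - nightly application-aware backup to the S3 backups bucket
gitlab:
  toolbox:
    backups:
      cron:
        enabled: true
        schedule: "0 1 * * *"   # 01:00 UTC daily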

When to use Toolbox backups:

  • Migration - Moving to different cluster (export/import all data)

  • Version upgrade rollback - Restore to pre-upgrade state

  • Disaster recovery - Entire namespace lost (restore from S3)

For backup details, see Storage Architecture.


External Dependencies

PostgreSQL (CloudNativePG)

What it does: Primary data store for GitLab application data

Data stored:

  • Users, groups, projects metadata

  • Issues, merge requests, CI/CD pipelines

  • Repository metadata (not .git data - that’s in Gitaly)

  • Access control (permissions, tokens)

  • Configuration settings

Why External PostgreSQL?

GitLab Helm chart includes built-in PostgreSQL - so why use CNPG instead?

| Feature | Built-in PostgreSQL | CNPG PostgreSQL |
|---|---|---|
| High availability | Single instance (manual failover) | 2 instances (auto-failover < 30s) |
| Backups | Manual via GitLab Toolbox | Automatic via Barman Cloud Plugin |
| Point-in-time recovery | No (full backups only) | Yes (WAL archiving to S3) |
| Connection pooling | PgBouncer sidecar | Dedicated Pooler CRD (better isolation) |
| Monitoring | Basic metrics | Rich metrics via PodMonitor |
| Storage management | Static PVC size | Longhorn snapshots, resizing |
| Upgrades | Manual (risky) | Automated via rolling updates |

Key benefits of CNPG:

  1. Automated failover - Primary crashes? Standby promoted automatically (< 30 seconds)

  2. Continuous backups - WAL archiving to S3 (can recover to any point in time)

  3. Read replicas - Can add read-only replicas for scaling (future)

  4. Pooling - PgBouncer Pooler handles 1000 client connections with 25 backend connections

  5. Kubernetes-native - Managed via CRDs (GitOps-friendly)

CNPG Architecture:

GitLab Webservice/Sidekiq
         ↓
gitlab-postgres-pooler (PgBouncer, 2 replicas)
         ↓
gitlab-postgres-1 (Primary) ⟷ gitlab-postgres-2 (Standby)
         ↓                              ↓
    PVC (10Gi)                      PVC (10Gi)
 longhorn-redundant-app         longhorn-redundant-app
     (1 replica)                    (1 replica)

Replication flow:

  1. Write to Primary → PostgreSQL streaming replication → Standby

  2. Standby stays in sync (< 1 second lag typically)

  3. Primary fails → CNPG promotes Standby → Pooler redirects connections
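
To observe replication and failover state (the detailed view assumes the cnpg kubectl plugin is installed; the plain get works without it):

# Basic cluster state (primary, instances, status)
kubectl get cluster gitlab-postgres -n gitlabbda

# Detailed view including streaming replication lag (requires the cnpg plugin)
kubectl cnpg status gitlab-postgres -n gitlabbda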

Storage rationale:

  • longhorn-redundant-app (1 replica) - Each CNPG instance has 1 Longhorn replica

  • Why not 2 Longhorn replicas? - CNPG provides application-level replication (2 instances)

  • Total copies: 2 (CNPG instances) = adequate redundancy without waste

For storage details, see Storage Architecture.

Configuration:

  • Instances: 2 (primary + standby)

  • Pooler: 2 replicas (transaction pooling)

  • Max connections: 200 (DB), 1000 (pooler)

  • Backup: S3 (postgresbackups-gitlabbda-kup6s) via Barman Cloud Plugin
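
As a sketch of what this configuration looks like as CloudNativePG resources (field values are illustrative and backup configuration is omitted; see the Constructs API Reference for the deployed spec):

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: gitlab-postgres
  namespace: gitlabbda
spec:
  instances: 2                      # primary + standby, automated failover
  storage:
    size: 10Gi
    storageClass: longhorn-redundant-app
  postgresql:
    parameters:
      max_connections: "200"
---
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: gitlab-postgres-pooler
  namespace: gitlabbda
spec:
  cluster:
    name: gitlab-postgres
  instances: 2
  type: rw
  pgbouncer:
    poolMode: transaction
    parameters:
      max_client_conn: "1000"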

For CNPG cluster specification, see Constructs API Reference.

Redis: Cache and Job Queue

What it does: Fast in-memory data store for caching and background jobs

Data stored:

  1. Session cache - User sessions, temporary auth tokens

  2. Job queue - Sidekiq job definitions (pending, running, retrying)

  3. Cache store - Rendered Markdown, API responses (volatile, can rebuild)

  4. Shared state - Rate limiting counters, feature flags

Why External Redis?

GitLab Helm chart includes built-in Redis - why separate instance?

| Feature | Built-in Redis | Dedicated Redis |
|---|---|---|
| Resource isolation | Shares pod resources | Dedicated CPU/memory |
| Persistence | Optional | Controlled (AOF + snapshots) |
| Monitoring | Limited | Dedicated metrics |
| Scaling | Coupled to GitLab | Independent |
| Sentinel HA | No (single instance) | Future: Redis Sentinel |

Key benefits of dedicated Redis:

  1. Resource guarantees - 128-512Mi memory reserved (GitLab cache won’t evict job queue data)

  2. Persistence control - AOF (append-only file) + snapshots ensure job queue survives pod restart

  3. Independent scaling - Can increase memory without scaling entire GitLab deployment

  4. Future-proof - Can migrate to Redis Sentinel (HA) when scaling beyond 5 users

Redis Architecture:

GitLab Webservice (cache reads/writes)
         ↓
Redis Service (ClusterIP)
         ↓
Redis StatefulSet (1 replica)
         ↓
PVC (10Gi, longhorn, 2 replicas)
         ↓
/data (AOF + snapshots)

Persistence strategy:

redis-server --appendonly yes --save 60 1

  • AOF (append-only file) - Every write logged to disk (fsync every second)

  • Snapshot - Full data dump if ≥1 key changed in 60 seconds

  • Recovery - Pod restart → Replay AOF → Full state recovered
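
To confirm persistence is active (deployment name as used in the troubleshooting section below):

# aof_enabled should be 1 and the last RDB save should report ok
kubectl exec -it deploy/redis -n gitlabbda -- \
  redis-cli INFO persistence | grep -E 'aof_enabled|rdb_last_bgsave_status'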

Why persistence?

  • Job queue - Sidekiq jobs must survive restarts (email delivery, CI jobs)

  • Sessions - User sessions preserved (no forced logouts during pod restart)

  • Cache - Nice-to-have (can rebuild, but faster to restore)

Storage rationale:

  • longhorn (2 replicas) - Redis is single instance (no clustering for 2-5 users)

  • Why 2 Longhorn replicas? - No application-level replication (unlike CNPG)

  • Cost: 10Gi × 2 = 20Gi cluster storage (acceptable for HA)

For storage details, see Storage Architecture.

Configuration:

  • Replicas: 1 (single instance, sufficient for 2-5 users)

  • Memory: 128-512Mi (cache + queue for small team)

  • Persistence: AOF + snapshots (every 60s if data changed)

For Redis specification, see Constructs API Reference.

Future: Redis Sentinel

When scaling beyond 5 users, consider Redis Sentinel (HA):

  • 3 Redis instances (1 primary + 2 replicas)

  • Sentinel processes monitor health

  • Auto-failover if primary fails (< 10 seconds)

  • Cost: 3× storage and memory (e.g. 30Gi vs 10Gi of persistent volumes)


Component Communication Patterns

HTTP Request Flow

Web UI request (e.g., view merge request):

1. User → https://gitlab.staging.bluedynamics.eu/project/merge_requests/1

2. Traefik Ingress (TLS termination, routing)
3. Workhorse (port 8181) → Check cache (Redis)
   ↓ Cache miss
4. Workhorse → Webservice (port 8080)
5. Webservice → PostgreSQL (query MR data)
   Webservice → Gitaly (get diff via gRPC)
   Webservice → Redis (update cache)
6. Webservice → Render HTML
7. Workhorse → Return to user

Git HTTP operation (e.g., git clone):

1. git clone https://gitlab.staging.bluedynamics.eu/user/repo.git

2. Traefik → Workhorse
3. Workhorse → Webservice (authenticate user, authorize repository access)
4. Webservice → PostgreSQL (check permissions)
5. Webservice → Return OK + Gitaly endpoint to Workhorse
6. Workhorse → Gitaly (gRPC: SmartHTTPService.PostUploadPack)
7. Gitaly → Read .git data from Hetzner Volume
8. Gitaly → Stream pack file to Workhorse
9. Workhorse → Stream to git client

Key insight: Workhorse handles streaming (steps 6-9) without involving Rails (efficient).

Background Job Flow

CI/CD pipeline trigger (e.g., git push):

1. git push origin main

2. GitLab Shell (SSH) or Workhorse (HTTPS) → Gitaly (write to repo)
3. Gitaly → Post-receive hook → Webservice API (notify of push)
4. Webservice → Check .gitlab-ci.yml (via Gitaly)
   Webservice → Create pipeline record (PostgreSQL)
   Webservice → Enqueue job (Redis: CreatePipelineWorker)
5. Sidekiq → Dequeue CreatePipelineWorker from Redis
6. Sidekiq → Create CI jobs (PostgreSQL)
   Sidekiq → Enqueue job (Redis: PipelineProcessWorker)
7. Sidekiq → Dequeue PipelineProcessWorker
8. Sidekiq → Assign jobs to GitLab Runners
9. Runners → Execute jobs → Upload artifacts to S3
10. Runners → Report status to Webservice API
11. Webservice → Update pipeline status (PostgreSQL)
12. User refreshes UI → See pipeline results

Why async jobs?

  • Push responsiveness - git push returns immediately (doesn’t wait for CI)

  • Retry logic - Job fails? Sidekiq retries automatically

  • Scalability - Can add more Sidekiq workers to process jobs faster

Storage Access Patterns

Different components access different storage tiers:

| Component | Hetzner Volumes | Longhorn PVC | S3 Buckets | PostgreSQL | Redis |
|---|---|---|---|---|---|
| Webservice | | | ✅ (pre-signed URLs) | ✅ (read/write) | ✅ (cache) |
| Gitaly | ✅ (20Gi) | | | | |
| Sidekiq | | | ✅ (upload artifacts) | ✅ (read/write) | ✅ (job queue) |
| Pages | | | ✅ (pages bucket) | | |
| Workhorse | | | ✅ (direct upload) | | |
| PostgreSQL | | ✅ (10Gi × 2) | ✅ (backups) | N/A | |
| Redis | | ✅ (10Gi) | | | N/A |

Key pattern: Storage separation by access type

  • Shared state (database, cache) → PostgreSQL/Redis (low-latency access needed)

  • Large files (artifacts, uploads) → S3 (scalable, cost-effective)

  • Git repositories → Hetzner Volumes (simple, reliable block storage)


Scaling Considerations

Current Configuration (2-5 users)

| Component | Replicas | Rationale |
|---|---|---|
| Webservice | 2 | HA (zero-downtime deploys) |
| Workhorse | 2 | Follows webservice |
| Sidekiq | 1 | Low job volume |
| Gitaly | 1 | Stateful (single PVC) |
| Shell | 1 | Low SSH traffic |
| Pages | 1 | Low page views |
| PostgreSQL | 2 | HA (CNPG instances) |
| Redis | 1 | Sufficient for cache + queue |

Scaling to 10-20 Users

Bottlenecks to watch:

  1. Sidekiq queue depth

    • Symptom: Jobs waiting >1 minute before processing

    • Solution: Increase Sidekiq replicas (1 → 2 or 3)

  2. Database connections

    • Symptom: “remaining connection slots are reserved” errors from PostgreSQL

    • Solution: Increase pooler replicas or max_connections (200 → 400)

  3. Redis memory

    • Symptom: Cache evictions, slow job queue

    • Solution: Increase Redis memory (512Mi → 1Gi)

  4. Gitaly storage

    • Symptom: PVC 80% full

    • Solution: Resize Hetzner Volume (20Gi → 50Gi) - no downtime needed (see the sketch after this list)
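
For the Gitaly storage case, a hedged example of the online resize (PVC name is illustrative; hcloud-volumes supports volume expansion):

# Grow the Gitaly volume; the CSI driver expands the Hetzner Volume and filesystem online
kubectl patch pvc repo-data-gitlab-gitaly-0 -n gitlabbda \
  --type merge -p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'

# Watch until the new capacity is reported
kubectl get pvc repo-data-gitlab-gitaly-0 -n gitlabbda -w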

For resource sizing, see Resource Requirements Reference.

Scaling Beyond 20 Users

Architectural changes needed:

  1. Horizontal Gitaly scaling - Shard repositories across multiple Gitaly instances (Praefect)

  2. Redis Sentinel - HA Redis with auto-failover (3 instances)

  3. PostgreSQL read replicas - Offload read traffic from primary (CNPG supports this)

  4. Separate Harbor PostgreSQL - Dedicated database for registry (reduce contention)

For scaling architecture, see Architecture Overview.


Health Checks and Probes

Liveness Probes (Restart on failure)

Webservice:

livenessProbe:
  httpGet:
    path: /-/liveness
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10

Gitaly:

livenessProbe:
  exec:
    command: ['/scripts/healthcheck']
  initialDelaySeconds: 30

Redis:

livenessProbe:
  tcpSocket:
    port: 6379
  initialDelaySeconds: 30

Readiness Probes (Remove from service on failure)

Webservice:

readinessProbe:
  httpGet:
    path: /-/readiness
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5

PostgreSQL (CNPG built-in):

  • Checks replication lag (< 1MB behind primary)

  • Checks database accepting connections

  • Removes replica from pooler if unhealthy

Why both probes?

  • Liveness: Detect deadlocks (restart pod)

  • Readiness: Detect slow startup (don’t send traffic until ready)


Troubleshooting Component Issues

Webservice Pod CrashLoopBackOff

Symptoms:

  • Webservice pods repeatedly crashing

  • Logs show PG::ConnectionBad or Redis::CannotConnectError

Diagnosis:

kubectl logs deploy/gitlab-webservice -n gitlabbda | grep -i error

Common causes:

  1. PostgreSQL not ready - Check CNPG cluster status

    kubectl get cluster gitlab-postgres -n gitlabbda
    
  2. Redis not ready - Check Redis pod

    kubectl get pods -l app.kubernetes.io/name=redis -n gitlabbda
    
  3. S3 credentials invalid - Check ExternalSecret sync status

    kubectl get externalsecret gitlab-s3-credentials -n gitlabbda
    

Solution: Fix dependency, webservice will recover automatically (readiness probe).

Gitaly High CPU

Symptoms:

  • Gitaly pod using 80-100% CPU

  • Git operations slow (clone, diff take minutes)

Diagnosis:

kubectl top pod -l app.kubernetes.io/name=gitaly -n gitlabbda

Common causes:

  1. Large git operation (e.g., cloning 5GB repo) - Normal, temporary

  2. Repository garbage collection - Scheduled housekeeping task

  3. Many concurrent git operations - Multiple users pushing simultaneously

Solution:

  • Temporary spike: Wait for operation to complete

  • Sustained high CPU: Consider adding CPU limit headroom or optimizing repository

Sidekiq Jobs Not Processing

Symptoms:

  • Emails not sending

  • CI pipelines stuck in “pending”

  • Background tasks accumulating

Diagnosis:

kubectl logs deploy/gitlab-sidekiq -n gitlabbda | tail -n 100

Common causes:

  1. Sidekiq pod crashed - Check pod status

  2. Redis connection lost - Check Redis logs

  3. Job stuck (infinite loop or deadlock) - Check Sidekiq logs for repeated errors

Solution:

# Restart Sidekiq
kubectl rollout restart deploy/gitlab-sidekiq -n gitlabbda

# If jobs still stuck, check Redis queue
kubectl exec -it deploy/redis -n gitlabbda -- redis-cli
> LLEN resque:queue:default  # Check queue length

For complete troubleshooting guide, see Troubleshooting Reference.


Summary

GitLab is a microservices platform with clear separation of concerns:

  • Webservice - HTTP requests, UI rendering

  • Workhorse - Efficient file handling, proxying

  • Gitaly - Git operations (the source of truth for .git data)

  • Sidekiq - Asynchronous jobs (CI, emails, housekeeping)

  • GitLab Shell - SSH git access

  • GitLab Pages - Static site hosting

External dependencies provide critical HA and management benefits:

  • PostgreSQL (CNPG) - Automated failover, point-in-time recovery, connection pooling

  • Redis - Dedicated resources, persistence control, independent scaling

Communication patterns:

  • Synchronous - HTTP/gRPC for user-facing operations (Webservice ↔ Gitaly)

  • Asynchronous - Redis job queue for background work (Webservice → Sidekiq)

  • Storage tier separation - Block storage (Gitaly), replicated storage (DB/cache), object storage (artifacts/uploads)

For implementation details: