Explanation

GitLab Components Architecture


Overview

GitLab is a microservices application - not a single monolithic container. The GitLab BDA deployment uses 8+ microservices working together, plus external PostgreSQL and Redis. Understanding how these components interact is essential for operations, troubleshooting, and scaling.

This document explains HOW GitLab components work together and WHY we use external dependencies instead of GitLab’s built-in database/cache.


Component Architecture

GitLab Components (Official Helm Chart)

The GitLab platform consists of these interconnected microservices:

| Component | Role | Protocol | Dependencies |
|---|---|---|---|
| Webservice | Rails app handling HTTP | HTTP (8080) | PostgreSQL, Redis, Gitaly |
| Gitaly | Git server | gRPC (8075) | Hetzner Cloud Volumes (storage) |
| Workhorse | Smart reverse proxy | HTTP (8181) | Webservice, Gitaly, S3 |
| Sidekiq | Background job processor | Internal | PostgreSQL, Redis, Gitaly, S3 |
| GitLab Shell | SSH git operations | SSH (22) | Gitaly, Redis |
| GitLab Pages | Static site hosting | HTTP (8090) | S3 (pages bucket) |
| Toolbox | Backup/restore utility | Internal | PostgreSQL, Gitaly, S3 |

External Dependencies

| Component | Role | Why External? |
|---|---|---|
| PostgreSQL (CNPG) | Primary database | HA, automated backups, pooling, monitoring |
| Redis | Cache + job queue | Dedicated resources, persistence control |
| Harbor | Container registry | Vulnerability scanning, OAuth, separate lifecycle |

Why separate from GitLab?

  • High availability - CNPG provides automated failover for PostgreSQL

  • Resource isolation - Database/cache don’t compete with GitLab pods

  • Independent scaling - Scale database separately from application

  • Better observability - CNPG provides rich PostgreSQL metrics

  • Backup control - Barman Cloud Plugin for PostgreSQL-aware backups

For storage details, see Storage Architecture.


Core Components Explained

Webservice: The Rails Application

What it does: HTTP request handling, UI rendering, API endpoints

Architecture:

  • Ruby on Rails application

  • GitLab CE (Community Edition) v18.5.1

  • 2 replicas for high availability (can handle pod restarts without downtime)

  • Stateless (all state in PostgreSQL/Redis/Gitaly)

Handles:

  • Web UI (projects, issues, merge requests, CI/CD pipelines)

  • REST API (/api/v4/*)

  • GraphQL API (/api/graphql)

  • User authentication and authorization

  • Git HTTP operations (/gitlab/my-project.git/*)
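
For example, the REST API answers under the /api/v4 prefix; a quick check from any machine with network access (the token is a placeholder):

# List projects visible to the token owner
curl --header "PRIVATE-TOKEN: <personal-access-token>" \
  "https://gitlab.staging.bluedynamics.eu/api/v4/projects"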

Communication:

User → Traefik Ingress → Workhorse → Webservice
                                        ├→ PostgreSQL
                                        ├→ Redis
                                        └→ Gitaly

Resource profile:

  • CPU: 300m request, 1 core limit (usage scales with request volume)

  • Memory: 2Gi request, 2.5Gi limit (Rails is memory-hungry)
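
As a rough sketch, this profile maps onto a standard Kubernetes resources stanza (values mirror the bullets above; the real stanza is set through the Helm values):

# Sketch only - illustrates the profile above, not the deployed manifest
resources:
  requests:
    cpu: 300m
    memory: 2Gi
  limits:
    cpu: "1"
    memory: 2.5Gi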

Why 2 replicas?

  • Zero-downtime deployments - Rolling updates keep 1 instance available

  • Load balancing - Distribute HTTP requests across instances

  • Pod failure resilience - Service stays available if 1 pod crashes

Stateless design: No local disk needed - all data in PostgreSQL, Redis, S3, or Gitaly.

Workhorse: Smart Reverse Proxy

What it does: Efficient file handling, offloading slow operations from Rails

Architecture:

  • Written in Go (fast, low memory)

  • Sits between Traefik and Webservice

  • 2 replicas (same as webservice)

  • Handles large file uploads/downloads without tying up Rails workers

Key features:

  1. Direct S3 uploads - Files uploaded directly to S3 (artifacts, LFS), bypassing Rails

    User → Workhorse → S3
    (Rails just generates presigned URL, Workhorse handles upload)
    
  2. Git HTTP traffic - Proxies git clone/push to Gitaly (not through Rails)

    git clone https://gitlab.../repo.git
    → Workhorse → Gitaly (gRPC) → Hetzner Volume (20Gi)
    
  3. Artifact streaming - Streams CI artifacts from S3 without buffering in memory

  4. Websocket proxying - Terminal access in CI jobs, interactive web IDE

Why separate from Webservice?

  • Performance - Go is faster than Ruby for I/O operations

  • Resource efficiency - Doesn’t consume Rails worker threads for file transfers

  • Scalability - Can scale independently if file traffic increases

Communication:

  • Inbound: HTTP from Traefik (port 8181)

  • Outbound: HTTP to Webservice (port 8080), gRPC to Gitaly (port 8075)

Gitaly: Git Repository Server

What it does: All git operations (clone, fetch, push, diff, blame, etc.)

Architecture:

  • Written in Go (git performance-critical)

  • gRPC API for git operations

  • 1 replica (stateful - bound to specific PVC)

  • 20Gi Hetzner Cloud Volume (network-attached, not Longhorn)

Why separate from Webservice?

  • Performance - Native git operations (libgit2) faster than shelling out from Ruby

  • Resource isolation - Git operations (especially large repos) are CPU/memory intensive

  • Security - Sandboxed git execution (prevents repository corruption from bad commits)

Storage architecture:

Gitaly pod → PVC (hcloud-volumes, 20Gi) → Hetzner Cloud Volume
                                           (network-attached SSD)
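
A minimal sketch of the underlying claim, assuming the hcloud-volumes CSI storage class (the PVC name is illustrative; the chart creates it from a volumeClaimTemplate):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: repo-data-gitlab-gitaly-0
  namespace: gitlabbda
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: hcloud-volumes
  resources:
    requests:
      storage: 20Gi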

Why Hetzner Volumes, not Longhorn?

  • Simplicity - Hetzner manages redundancy (no Longhorn replication needed)

  • Network-attached - Pod can reschedule to any node (not node-specific like local storage)

  • Cost-effective - €1/month for 20Gi (vs cluster storage overhead)

For storage rationale, see Storage Architecture.

Communication:

  • Inbound: gRPC from Webservice, Workhorse, Sidekiq (port 8075)

  • Outbound: None (Gitaly is the source of truth for git data)

Operations handled:

  • git clone - Pack objects and send to client

  • git push - Receive pack, write to refs, run hooks

  • git diff - Tree comparison for merge requests

  • git blame - Line-by-line history (expensive operation)

  • Repository maintenance - Garbage collection, repacking

Resource profile:

  • CPU: 200m request, 500m limit (bursty during git operations)

  • Memory: 768Mi request, 1536Mi limit (large repos need memory for pack operations)

Sidekiq: Background Job Processor

What it does: Asynchronous work that doesn’t need immediate response

Architecture:

  • Ruby process (shares Rails codebase with Webservice)

  • Redis-based job queue

  • 1 replica (for 2-5 users, 1 worker sufficient)

  • Stateless (jobs defined in Redis, data in PostgreSQL/S3)

Job types:

  1. CI/CD pipeline execution

    • Create runner jobs

    • Process pipeline events

    • Update pipeline status

  2. Email delivery

    • Notification emails (new issues, merge requests)

    • Password resets, sign-up confirmations

  3. Git housekeeping

    • Repository garbage collection (via Gitaly)

    • Repository statistics calculation

    • Mirror updates (pull/push to external repos)

  4. Search indexing

    • Update search index for code, issues, merge requests

  5. Webhook delivery

    • HTTP POST to external services on git push, issue create, etc.

Why separate from Webservice?

  • Responsiveness - Web requests stay fast (no blocking on slow jobs)

  • Resource isolation - Long-running jobs don’t starve web workers

  • Retry logic - Failed jobs retried automatically (webhooks, emails)

  • Scalability - Can add more Sidekiq replicas independently (when scaling beyond 5 users)

Communication:

Webservice → Enqueue job → Redis
Redis → Dequeue job → Sidekiq → Process job → PostgreSQL/Gitaly/S3
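
To see whether jobs are actually flowing, one option is a one-liner through the toolbox pod described later (a sketch; queue names other than "default" exist as well):

# Show how many jobs are waiting in the default queue
kubectl exec -it deploy/gitlab-toolbox -n gitlabbda -- \
  gitlab-rails runner 'puts Sidekiq::Queue.new("default").size'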

Resource profile:

  • CPU: 200m request, 500m limit (lower than webservice - less bursty)

  • Memory: 1.5Gi request, 2Gi limit (similar to webservice - same Rails app)

GitLab Shell: SSH Git Access

What it does: Handles git@gitlab.example.com:user/repo.git SSH operations

Architecture:

  • Go application handling SSH protocol

  • Authenticates against Webservice API

  • Proxies git operations to Gitaly

  • 1 replica (SSH traffic low for 2-5 users)

SSH flow:

ssh git@gitlab.example.com
gitlab-shell (pod) → Authenticate via Webservice API
gitlab-shell → Gitaly (gRPC) → Execute git command
Gitaly → Hetzner Volume (read/write .git data)
gitlab-shell → Return git response to client

Why separate from Gitaly?

  • Security - SSH authentication and authorization separate from git operations

  • Protocol handling - SSH protocol (keys, agent forwarding) separate from git logic

  • Flexibility - Can swap authentication methods without changing Gitaly

SSH routing via Traefik:

GitLab BDA uses Traefik TCP routing instead of dedicated LoadBalancer:

# Traefik routes SSH port 22 to gitlab-shell service
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
spec:
  entryPoints: [ssh]
  routes:
    - match: HostSNI(`*`)
      services:
        - name: gitlab-gitlab-shell
          port: 22

Benefits:

  • Cost savings - No separate LoadBalancer for SSH (shares Traefik LB)

  • Consistent ingress - All traffic (HTTP, HTTPS, SSH) through single entry point

  • Easy port management - SSH on port 22 (standard), no custom ports
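
To verify the SSH path end to end (hostname as used elsewhere in this document; the repository path is a placeholder):

# Prints a "Welcome to GitLab, @username!" greeting when the key is accepted
ssh -T git@gitlab.staging.bluedynamics.eu

# Clone over SSH through the Traefik TCP route on port 22
git clone git@gitlab.staging.bluedynamics.eu:group/project.git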

For SSH routing details, see Constructs API Reference.

Resource profile:

  • CPU: 50m request, 200m limit (lightweight - mostly I/O wait)

  • Memory: 128Mi request, 256Mi limit (minimal - just SSH protocol handling)

GitLab Pages: Static Site Hosting

What it does: Serves static websites built from git repositories

Architecture:

  • Go application serving HTTP

  • Reads HTML/CSS/JS from S3 (pages-gitlabbda-kup6s bucket)

  • 1 replica (static sites, low traffic for 2-5 users)

  • Project sites served under the wildcard *.pages.staging.bluedynamics.eu (custom domains possible via additional Ingress resources)

Pages deployment flow:

1. Push to repo with .gitlab-ci.yml:
   pages:
     script: npm run build
     artifacts:
       paths: [public/]

2. CI job builds site → artifacts saved to S3 (artifacts bucket)

3. Pages daemon downloads artifacts → extracts to pages bucket

4. User visits https://project.pages.staging.bluedynamics.eu
   → Pages reads from S3 → Serves HTML
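
A complete Pages job looks roughly like this (build commands are project-specific; this sketch assumes an npm-based site and publishes the public/ directory, which is what the Pages daemon expects):

# .gitlab-ci.yml (sketch)
pages:
  stage: deploy
  image: node:20
  script:
    - npm ci
    - npm run build
  artifacts:
    paths:
      - public
  rules:
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'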

Why S3 storage?

  • Scalability - No disk space limits (S3 grows on demand)

  • CDN-ready - Can front S3 with Cloudflare later (global distribution)

  • Cost - €0.01/GB/month vs block storage (static sites are read-heavy)

Ingress routing:

Wildcard ingress for all Pages sites:

# Simplified (see constructs-api.md for full spec)
spec:
  rules:
    - host: '*.pages.staging.bluedynamics.eu'
      http:
        paths:
          - backend:
              service:
                name: gitlab-gitlab-pages

Benefits:

  • Zero configuration per site - New projects automatically get <project>.pages.staging.bluedynamics.eu

  • Custom domains - Can add specific domains via additional Ingress resources

Resource profile:

  • CPU: 50m request, 200m limit (lightweight - serving static files)

  • Memory: 128Mi request, 256Mi limit (minimal - mostly streaming from S3)

Toolbox: Backup and Restore Utility

What it does: gitlab-rake commands for maintenance, backup, restore

Architecture:

  • Deployment (not CronJob) - runs continuously, exec into for tasks

  • Rails environment (same as Webservice) with CLI tools

  • Accesses all data sources (PostgreSQL, Gitaly, S3)

Common operations:

# Exec into toolbox pod
kubectl exec -it deploy/gitlab-toolbox -n gitlabbda -- bash

# Create full GitLab backup
gitlab-backup create
# → Uploads to S3 (backups-gitlabbda-kup6s bucket)
# Includes: Database dump, repositories, uploads, LFS, artifacts, pages

# Restore from backup
gitlab-backup restore BACKUP=<timestamp>

# Check database status
gitlab-rake db:migrate:status

# Clear Redis cache
gitlab-rake cache:clear

# Clean up orphaned repositories
gitlab-rake gitlab:cleanup:repos

Backup strategy:

GitLab Toolbox backups are application-aware (vs volume snapshots):

| Backup Type | Scope | Frequency | Recovery Time |
|---|---|---|---|
| Toolbox S3 | Full GitLab state | Daily (configured via CronJob) | 2-4 hours |
| CNPG Barman | PostgreSQL only (PITR) | Continuous WAL + daily base | 30-60 min |
| Longhorn CIFS | Volume snapshots | Daily | 15-30 min (restore PVC) |
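
The daily Toolbox backup is typically driven by the chart's toolbox CronJob; a hedged values sketch (key names assumed from the upstream gitlab/toolbox sub-chart, not verified against this deployment):

# values.yaml (sketch) - nightly application-aware backup to the S3 backups bucket
gitlab:
  toolbox:
    backups:
      cron:
        enabled: true
        schedule: "0 1 * * *"   # 01:00 UTC daily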

When to use Toolbox backups:

  • Migration - Moving to different cluster (export/import all data)

  • Version upgrade rollback - Restore to pre-upgrade state

  • Disaster recovery - Entire namespace lost (restore from S3)

For backup details, see Storage Architecture.


External Dependencies

PostgreSQL (CloudNativePG)

What it does: Primary data store for GitLab application data

Data stored:

  • Users, groups, projects metadata

  • Issues, merge requests, CI/CD pipelines

  • Repository metadata (not .git data - that’s in Gitaly)

  • Access control (permissions, tokens)

  • Configuration settings

Why External PostgreSQL?

GitLab Helm chart includes built-in PostgreSQL - so why use CNPG instead?

| Feature | Built-in PostgreSQL | CNPG PostgreSQL |
|---|---|---|
| High availability | Single instance (manual failover) | 2 instances (auto-failover < 30s) |
| Backups | Manual via GitLab Toolbox | Automatic via Barman Cloud Plugin |
| Point-in-time recovery | No (full backups only) | Yes (WAL archiving to S3) |
| Connection pooling | PgBouncer sidecar | Dedicated Pooler CRD (better isolation) |
| Monitoring | Basic metrics | Rich metrics via PodMonitor |
| Storage management | Static PVC size | Longhorn snapshots, resizing |
| Upgrades | Manual (risky) | Automated via rolling updates |

Key benefits of CNPG:

  1. Automated failover - Primary crashes? Standby promoted automatically (< 30 seconds)

  2. Continuous backups - WAL archiving to S3 (can recover to any point in time)

  3. Read replicas - Can add read-only replicas for scaling (future)

  4. Pooling - PgBouncer Pooler handles 1000 client connections with 25 backend connections

  5. Kubernetes-native - Managed via CRDs (GitOps-friendly)

CNPG Architecture:

GitLab Webservice/Sidekiq
         ↓
gitlab-postgres-pooler (PgBouncer, 2 replicas)
         ↓
gitlab-postgres-1 (Primary) ⟷ gitlab-postgres-2 (Standby)
         ↓                              ↓
    PVC (10Gi)                      PVC (10Gi)
 longhorn-redundant-app         longhorn-redundant-app
     (1 replica)                    (1 replica)

Replication flow:

  1. Write to Primary → PostgreSQL streaming replication → Standby

  2. Standby stays in sync (< 1 second lag typically)

  3. Primary fails → CNPG promotes Standby → Pooler redirects connections
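
To observe replication and failover state (the detailed view assumes the cnpg kubectl plugin is installed; the plain get works without it):

# Basic cluster state (primary, instances, status)
kubectl get cluster gitlab-postgres -n gitlabbda

# Detailed view including streaming replication lag (requires the cnpg plugin)
kubectl cnpg status gitlab-postgres -n gitlabbda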

Storage rationale:

  • longhorn-redundant-app (1 replica) - Each CNPG instance has 1 Longhorn replica

  • Why not 2 Longhorn replicas? - CNPG provides application-level replication (2 instances)

  • Total copies: 2 (CNPG instances) = adequate redundancy without waste

For storage details, see Storage Architecture.

Configuration:

  • Instances: 2 (primary + standby)

  • Pooler: 2 replicas (transaction pooling)

  • Max connections: 200 (DB), 1000 (pooler)

  • Backup: S3 (postgresbackups-gitlabbda-kup6s) via Barman Cloud Plugin
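
As a sketch of what this configuration looks like as CloudNativePG resources (field values are illustrative and backup configuration is omitted; see the Constructs API Reference for the deployed spec):

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: gitlab-postgres
  namespace: gitlabbda
spec:
  instances: 2                      # primary + standby, automated failover
  storage:
    size: 10Gi
    storageClass: longhorn-redundant-app
  postgresql:
    parameters:
      max_connections: "200"
---
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: gitlab-postgres-pooler
  namespace: gitlabbda
spec:
  cluster:
    name: gitlab-postgres
  instances: 2
  type: rw
  pgbouncer:
    poolMode: transaction
    parameters:
      max_client_conn: "1000"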

For CNPG cluster specification, see Constructs API Reference.

Redis: Cache and Job Queue

What it does: Fast in-memory data store for caching and background jobs

Data stored:

  1. Session cache - User sessions, temporary auth tokens

  2. Job queue - Sidekiq job definitions (pending, running, retrying)

  3. Cache store - Rendered Markdown, API responses (volatile, can rebuild)

  4. Shared state - Rate limiting counters, feature flags

Why External Redis?

GitLab Helm chart includes built-in Redis - why separate instance?

| Feature | Built-in Redis | Dedicated Redis |
|---|---|---|
| Resource isolation | Shares pod resources | Dedicated CPU/memory |
| Persistence | Optional | Controlled (AOF + snapshots) |
| Monitoring | Limited | Dedicated metrics |
| Scaling | Coupled to GitLab | Independent |
| Sentinel HA | No (single instance) | Future: Redis Sentinel |

Key benefits of dedicated Redis:

  1. Resource guarantees - 128-512Mi memory reserved (GitLab cache won’t evict job queue data)

  2. Persistence control - AOF (append-only file) + snapshots ensure job queue survives pod restart

  3. Independent scaling - Can increase memory without scaling entire GitLab deployment

  4. Future-proof - Can migrate to Redis Sentinel (HA) when scaling beyond 5 users

Redis Architecture:

GitLab Webservice (cache reads/writes)
         ↓
Redis Service (ClusterIP)
         ↓
Redis StatefulSet (1 replica)
         ↓
PVC (10Gi, longhorn, 2 replicas)
         ↓
/data (AOF + snapshots)

Persistence strategy:

redis-server --appendonly yes --save 60 1

  • AOF (append-only file) - Every write logged to disk (fsync every second)

  • Snapshot - Full data dump if ≥1 key changed in 60 seconds

  • Recovery - Pod restart → Replay AOF → Full state recovered
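
To confirm persistence is active (deployment name as used in the troubleshooting section below):

# aof_enabled should be 1 and the last RDB save should report ok
kubectl exec -it deploy/redis -n gitlabbda -- \
  redis-cli INFO persistence | grep -E 'aof_enabled|rdb_last_bgsave_status'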

Why persistence?

  • Job queue - Sidekiq jobs must survive restarts (email delivery, CI jobs)

  • Sessions - User sessions preserved (no forced logouts during pod restart)

  • Cache - Nice-to-have (can rebuild, but faster to restore)

Storage rationale:

  • longhorn (2 replicas) - Redis is single instance (no clustering for 2-5 users)

  • Why 2 Longhorn replicas? - No application-level replication (unlike CNPG)

  • Cost: 10Gi × 2 = 20Gi cluster storage (acceptable for HA)

For storage details, see Storage Architecture.

Configuration:

  • Replicas: 1 (single instance, sufficient for 2-5 users)

  • Memory: 128-512Mi (cache + queue for small team)

  • Persistence: AOF + snapshots (every 60s if data changed)

For Redis specification, see Constructs API Reference.

Future: Redis Sentinel

When scaling beyond 5 users, consider Redis Sentinel (HA):

  • 3 Redis instances (1 primary + 2 replicas)

  • Sentinel processes monitor health

  • Auto-failover if primary fails (< 10 seconds)

  • Cost: 3× storage and memory (e.g. 30Gi vs 10Gi of persistent volumes)


Component Communication Patterns

HTTP Request Flow

Web UI request (e.g., view merge request):

1. User → https://gitlab.staging.bluedynamics.eu/project/merge_requests/1

2. Traefik Ingress (TLS termination, routing)
3. Workhorse (port 8181) → Check cache (Redis)
   ↓ Cache miss
4. Workhorse → Webservice (port 8080)
5. Webservice → PostgreSQL (query MR data)
   Webservice → Gitaly (get diff via gRPC)
   Webservice → Redis (update cache)
6. Webservice → Render HTML
7. Workhorse → Return to user

Git HTTP operation (e.g., git clone):

1. git clone https://gitlab.staging.bluedynamics.eu/user/repo.git

2. Traefik → Workhorse
3. Workhorse → Webservice (authenticate user, authorize repository access)
4. Webservice → PostgreSQL (check permissions)
5. Webservice → Return OK + Gitaly endpoint to Workhorse
6. Workhorse → Gitaly (gRPC: SmartHTTPService.PostUploadPack)
7. Gitaly → Read .git data from Hetzner Volume
8. Gitaly → Stream pack file to Workhorse
9. Workhorse → Stream to git client

Key insight: Workhorse handles streaming (steps 6-9) without involving Rails (efficient).

Background Job Flow

CI/CD pipeline trigger (e.g., git push):

1. git push origin main

2. GitLab Shell (SSH) or Workhorse (HTTPS) → Gitaly (write to repo)
3. Gitaly → Post-receive hook → Webservice API (notify of push)
4. Webservice → Check .gitlab-ci.yml (via Gitaly)
   Webservice → Create pipeline record (PostgreSQL)
   Webservice → Enqueue job (Redis: CreatePipelineWorker)
5. Sidekiq → Dequeue CreatePipelineWorker from Redis
6. Sidekiq → Create CI jobs (PostgreSQL)
   Sidekiq → Enqueue job (Redis: PipelineProcessWorker)
7. Sidekiq → Dequeue PipelineProcessWorker
8. Sidekiq → Assign jobs to GitLab Runners
9. Runners → Execute jobs → Upload artifacts to S3
10. Runners → Report status to Webservice API
11. Webservice → Update pipeline status (PostgreSQL)
12. User refreshes UI → See pipeline results

Why async jobs?

  • Push responsiveness - git push returns immediately (doesn’t wait for CI)

  • Retry logic - Job fails? Sidekiq retries automatically

  • Scalability - Can add more Sidekiq workers to process jobs faster

Storage Access Patterns

Different components access different storage tiers:

| Component | Hetzner Volumes | Longhorn PVC | S3 Buckets | PostgreSQL | Redis |
|---|---|---|---|---|---|
| Webservice | | | ✅ (pre-signed URLs) | ✅ (read/write) | ✅ (cache) |
| Gitaly | ✅ (20Gi) | | | | |
| Sidekiq | | | ✅ (upload artifacts) | ✅ (read/write) | ✅ (job queue) |
| Pages | | | ✅ (pages bucket) | | |
| Workhorse | | | ✅ (direct upload) | | |
| PostgreSQL | | ✅ (10Gi × 2) | ✅ (backups) | N/A | |
| Redis | | ✅ (10Gi) | | | N/A |

Key pattern: Storage separation by access type

  • Shared state (database, cache) → PostgreSQL/Redis (low-latency access needed)

  • Large files (artifacts, uploads) → S3 (scalable, cost-effective)

  • Git repositories → Hetzner Volumes (simple, reliable block storage)


Scaling Considerations

Current Configuration (2-5 users)

| Component | Replicas | Rationale |
|---|---|---|
| Webservice | 2 | HA (zero-downtime deploys) |
| Workhorse | 2 | Follows webservice |
| Sidekiq | 1 | Low job volume |
| Gitaly | 1 | Stateful (single PVC) |
| Shell | 1 | Low SSH traffic |
| Pages | 1 | Low page views |
| PostgreSQL | 2 | HA (CNPG instances) |
| Redis | 1 | Sufficient for cache + queue |

Scaling to 10-20 Users

Bottlenecks to watch:

  1. Sidekiq queue depth

    • Symptom: Jobs waiting >1 minute before processing

    • Solution: Increase Sidekiq replicas (1 → 2 or 3)

  2. Database connections

    • Symptom: “remaining connection slots are reserved” errors from PostgreSQL

    • Solution: Increase pooler replicas or max_connections (200 → 400)

  3. Redis memory

    • Symptom: Cache evictions, slow job queue

    • Solution: Increase Redis memory (512Mi → 1Gi)

  4. Gitaly storage

    • Symptom: PVC 80% full

    • Solution: Resize Hetzner Volume (20Gi → 50Gi) - no downtime needed (see the sketch after this list)
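
For the Gitaly storage case, a hedged example of the online resize (PVC name is illustrative; hcloud-volumes supports volume expansion):

# Grow the Gitaly volume; the CSI driver expands the Hetzner Volume and filesystem online
kubectl patch pvc repo-data-gitlab-gitaly-0 -n gitlabbda \
  --type merge -p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'

# Watch until the new capacity is reported
kubectl get pvc repo-data-gitlab-gitaly-0 -n gitlabbda -w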

For resource sizing, see Resource Requirements Reference.

Scaling Beyond 20 Users

Architectural changes needed:

  1. Horizontal Gitaly scaling - Shard repositories across multiple Gitaly instances (Praefect)

  2. Redis Sentinel - HA Redis with auto-failover (3 instances)

  3. PostgreSQL read replicas - Offload read traffic from primary (CNPG supports this)

  4. Separate Harbor PostgreSQL - Dedicated database for registry (reduce contention)

For scaling architecture, see Architecture Overview.


Health Checks and Probes

Liveness Probes (Restart on failure)

Webservice:

livenessProbe:
  httpGet:
    path: /-/liveness
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10

Gitaly:

livenessProbe:
  exec:
    command: ['/scripts/healthcheck']
  initialDelaySeconds: 30

Redis:

livenessProbe:
  tcpSocket:
    port: 6379
  initialDelaySeconds: 30

Readiness Probes (Remove from service on failure)

Webservice:

readinessProbe:
  httpGet:
    path: /-/readiness
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5

PostgreSQL (CNPG built-in):

  • Checks replication lag (< 1MB behind primary)

  • Checks database accepting connections

  • Removes replica from pooler if unhealthy

Why both probes?

  • Liveness: Detect deadlocks (restart pod)

  • Readiness: Detect slow startup (don’t send traffic until ready)


Troubleshooting Component Issues

Webservice Pod CrashLoopBackOff

Symptoms:

  • Webservice pods repeatedly crashing

  • Logs show PG::ConnectionBad or Redis::CannotConnectError

Diagnosis:

kubectl logs deploy/gitlab-webservice -n gitlabbda | grep -i error

Common causes:

  1. PostgreSQL not ready - Check CNPG cluster status

    kubectl get cluster gitlab-postgres -n gitlabbda
    
  2. Redis not ready - Check Redis pod

    kubectl get pods -l app.kubernetes.io/name=redis -n gitlabbda
    
  3. S3 credentials invalid - Check ExternalSecret sync status

    kubectl get externalsecret gitlab-s3-credentials -n gitlabbda
    

Solution: Fix dependency, webservice will recover automatically (readiness probe).

Gitaly High CPU

Symptoms:

  • Gitaly pod using 80-100% CPU

  • Git operations slow (clone, diff take minutes)

Diagnosis:

kubectl top pod -l app.kubernetes.io/name=gitaly -n gitlabbda

Common causes:

  1. Large git operation (e.g., cloning 5GB repo) - Normal, temporary

  2. Repository garbage collection - Scheduled housekeeping task

  3. Many concurrent git operations - Multiple users pushing simultaneously

Solution:

  • Temporary spike: Wait for operation to complete

  • Sustained high CPU: Consider adding CPU limit headroom or optimizing repository

Sidekiq Jobs Not Processing

Symptoms:

  • Emails not sending

  • CI pipelines stuck in “pending”

  • Background tasks accumulating

Diagnosis:

kubectl logs deploy/gitlab-sidekiq -n gitlabbda | tail -n 100

Common causes:

  1. Sidekiq pod crashed - Check pod status

  2. Redis connection lost - Check Redis logs

  3. Job stuck (infinite loop or deadlock) - Check Sidekiq logs for repeated errors

Solution:

# Restart Sidekiq
kubectl rollout restart deploy/gitlab-sidekiq -n gitlabbda

# If jobs still stuck, check Redis queue
kubectl exec -it deploy/redis -n gitlabbda -- redis-cli
> LLEN resque:queue:default  # Check queue length

For complete troubleshooting guide, see Troubleshooting Reference.


Summary

GitLab is a microservices platform with clear separation of concerns:

  • Webservice - HTTP requests, UI rendering

  • Workhorse - Efficient file handling, proxying

  • Gitaly - Git operations (the source of truth for .git data)

  • Sidekiq - Asynchronous jobs (CI, emails, housekeeping)

  • GitLab Shell - SSH git access

  • GitLab Pages - Static site hosting

External dependencies provide critical HA and management benefits:

  • PostgreSQL (CNPG) - Automated failover, point-in-time recovery, connection pooling

  • Redis - Dedicated resources, persistence control, independent scaling

Communication patterns:

  • Synchronous - HTTP/gRPC for user-facing operations (Webservice ↔ Gitaly)

  • Asynchronous - Redis job queue for background work (Webservice → Sidekiq)

  • Storage tier separation - Block storage (Gitaly), replicated storage (DB/cache), object storage (artifacts/uploads)

For implementation details: