GitLab Components Architecture¶
Overview¶
GitLab is a microservices application - not a single monolithic container. The GitLab BDA deployment uses 8+ microservices working together, plus external PostgreSQL and Redis. Understanding how these components interact is essential for operations, troubleshooting, and scaling.
This document explains HOW GitLab components work together and WHY we use external dependencies instead of GitLab’s built-in database/cache.
Component Architecture¶
GitLab Components (Official Helm Chart)¶
The GitLab platform consists of these interconnected microservices:
| Component | Role | Protocol | Dependencies |
|---|---|---|---|
| Webservice | Rails app handling HTTP | HTTP (8080) | PostgreSQL, Redis, Gitaly |
| Gitaly | Git server | gRPC (8075) | Hetzner Cloud Volumes (storage) |
| Workhorse | Smart reverse proxy | HTTP (8181) | Webservice, Gitaly, S3 |
| Sidekiq | Background job processor | Internal | PostgreSQL, Redis, Gitaly, S3 |
| GitLab Shell | SSH git operations | SSH (22) | Gitaly, Redis |
| GitLab Pages | Static site hosting | HTTP (8090) | S3 (pages bucket) |
| Toolbox | Backup/restore utility | Internal | PostgreSQL, Gitaly, S3 |
External Dependencies¶
| Component | Role | Why External? |
|---|---|---|
| PostgreSQL (CNPG) | Primary database | HA, automated backups, pooling, monitoring |
| Redis | Cache + job queue | Dedicated resources, persistence control |
| Harbor | Container registry | Vulnerability scanning, OAuth, separate lifecycle |
Why separate from GitLab?
High availability - CNPG provides automated failover for PostgreSQL
Resource isolation - Database/cache don’t compete with GitLab pods
Independent scaling - Scale database separately from application
Better observability - CNPG provides rich PostgreSQL metrics
Backup control - Barman Cloud Plugin for PostgreSQL-aware backups
For storage details, see Storage Architecture.
Core Components Explained¶
Webservice: The Rails Application¶
What it does: HTTP request handling, UI rendering, API endpoints
Architecture:
Ruby on Rails application
GitLab CE (Community Edition) v18.5.1
2 replicas for high availability (can handle pod restarts without downtime)
Stateless (all state in PostgreSQL/Redis/Gitaly)
Handles:
Web UI (projects, issues, merge requests, CI/CD pipelines)
REST API (/api/v4/*)
GraphQL API (/api/graphql)
User authentication and authorization
Git HTTP operations (/gitlab/my-project.git/*)
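The REST and GraphQL endpoints listed above can be exercised directly. A minimal sketch using curl against the staging hostname used elsewhere in this document; the token and project values are placeholders you must supply:

```bash
# Placeholders: set to your instance hostname and a personal access token
GITLAB_HOST="https://gitlab.staging.bluedynamics.eu"
GITLAB_TOKEN="<personal-access-token>"

# REST API: list projects the token owner is a member of
curl --silent --header "PRIVATE-TOKEN: ${GITLAB_TOKEN}" \
  "${GITLAB_HOST}/api/v4/projects?membership=true"

# GraphQL API: query the currently authenticated user
curl --silent --request POST \
  --header "Authorization: Bearer ${GITLAB_TOKEN}" \
  --header "Content-Type: application/json" \
  --data '{"query": "query { currentUser { username } }"}' \
  "${GITLAB_HOST}/api/graphql"
```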
Communication:
User → Traefik Ingress → Workhorse → Webservice
↓
PostgreSQL ← → Redis
↓
Gitaly
Resource profile:
CPU: 300m request, 1 core limit (request-intensive)
Memory: 2Gi request, 2.5Gi limit (Rails is memory-hungry)
Why 2 replicas?
Zero-downtime deployments - Rolling updates keep 1 instance available
Load balancing - Distribute HTTP requests across instances
Pod failure resilience - Service stays available if 1 pod crashes
Stateless design: No local disk needed - all data in PostgreSQL, Redis, S3, or Gitaly.
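A quick way to observe the zero-downtime behaviour, assuming the Deployment name gitlab-webservice and namespace gitlabbda used elsewhere in this document (the label selector is an assumption; adjust to your chart's labels):

```bash
# Trigger a rolling restart; with 2 replicas, one pod keeps serving
# while the other is recreated
kubectl rollout restart deployment/gitlab-webservice -n gitlabbda

# Watch the rollout complete without dropping below one ready replica
kubectl rollout status deployment/gitlab-webservice -n gitlabbda --timeout=5m
kubectl get pods -n gitlabbda -l app=webservice
```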
Workhorse: Smart Reverse Proxy¶
What it does: Efficient file handling, offloading slow operations from Rails
Architecture:
Written in Go (fast, low memory)
Sits between Traefik and Webservice
2 replicas (same as webservice)
Handles large file uploads/downloads without tying up Rails workers
Key features:
Direct S3 uploads - Files uploaded directly to S3 (artifacts, LFS), bypassing Rails
User → Workhorse → S3 (Rails just generates presigned URL, Workhorse handles upload)
Git HTTP traffic - Proxies git clone/push to Gitaly (not through Rails)
git clone https://gitlab.../repo.git → Workhorse → Gitaly (gRPC) → Hetzner Volume (20Gi)
Artifact streaming - Streams CI artifacts from S3 without buffering in memory
Websocket proxying - Terminal access in CI jobs, interactive web IDE
Why separate from Webservice?
Performance - Go is faster than Ruby for I/O operations
Resource efficiency - Doesn’t consume Rails worker threads for file transfers
Scalability - Can scale independently if file traffic increases
Communication:
Inbound: HTTP from Traefik (port 8181)
Outbound: HTTP to Webservice (port 8080), gRPC to Gitaly (port 8075)
Gitaly: Git Repository Server¶
What it does: All git operations (clone, fetch, push, diff, blame, etc.)
Architecture:
Written in Go (git performance-critical)
gRPC API for git operations
1 replica (stateful - bound to specific PVC)
20Gi Hetzner Cloud Volume (network-attached, not Longhorn)
Why separate from Webservice?
Performance - Native git operations (libgit2) faster than shelling out from Ruby
Resource isolation - Git operations (especially large repos) are CPU/memory intensive
Security - Sandboxed git execution (prevents repository corruption from bad commits)
Storage architecture:
Gitaly pod → PVC (hcloud-volumes, 20Gi) → Hetzner Cloud Volume
(network-attached SSD)
Why Hetzner Volumes, not Longhorn?
Simplicity - Hetzner manages redundancy (no Longhorn replication needed)
Network-attached - Pod can reschedule to any node (not node-specific like local storage)
Cost-effective - €1/month for 20Gi (vs cluster storage overhead)
For storage rationale, see Storage Architecture.
Communication:
Inbound: gRPC from Webservice, Workhorse, Sidekiq (port 8075)
Outbound: None (Gitaly is the source of truth for git data)
Operations handled:
git clone - Pack objects and send to client
git push - Receive pack, write to refs, run hooks
git diff - Tree comparison for merge requests
git blame - Line-by-line history (expensive operation)
Repository maintenance - Garbage collection, repacking
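A rough way to check how much of the Gitaly volume these operations are consuming. This sketch assumes the chart's default storage path /home/git/repositories and an app=gitaly pod label; both are assumptions, so adjust to your deployment:

```bash
# Pick the Gitaly pod (label selector is an assumption; adjust if needed)
GITALY_POD=$(kubectl get pods -n gitlabbda -l app=gitaly \
  -o jsonpath='{.items[0].metadata.name}')

# Usage of the 20Gi Hetzner volume (path assumes the chart default)
kubectl exec -n gitlabbda "$GITALY_POD" -- df -h /home/git/repositories
kubectl exec -n gitlabbda "$GITALY_POD" -- du -sh /home/git/repositories
```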
Resource profile:
CPU: 200m request, 500m limit (bursty during git operations)
Memory: 768Mi request, 1536Mi limit (large repos need memory for pack operations)
Sidekiq: Background Job Processor¶
What it does: Asynchronous work that doesn’t need immediate response
Architecture:
Ruby process (shares Rails codebase with Webservice)
Redis-based job queue
1 replica (for 2-5 users, 1 worker sufficient)
Stateless (jobs defined in Redis, data in PostgreSQL/S3)
Job types:
CI/CD pipeline execution
Create runner jobs
Process pipeline events
Update pipeline status
Email delivery
Notification emails (new issues, merge requests)
Password resets, sign-up confirmations
Git housekeeping
Repository garbage collection (via Gitaly)
Repository statistics calculation
Mirror updates (pull/push to external repos)
Search indexing
Update search index for code, issues, merge requests
Webhook delivery
HTTP POST to external services on git push, issue create, etc.
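To see which of these job types are currently queued, the Sidekiq API can be queried from the toolbox pod's Rails environment. A sketch: Sidekiq::Queue, RetrySet, and ScheduledSet are standard Sidekiq classes, and the pod name follows the conventions used elsewhere in this document:

```bash
kubectl exec -it deploy/gitlab-toolbox -n gitlabbda -- gitlab-rails runner '
  # Standard Sidekiq introspection classes
  Sidekiq::Queue.all.each { |q| puts "#{q.name}: #{q.size} pending" }
  puts "retrying:  #{Sidekiq::RetrySet.new.size}"
  puts "scheduled: #{Sidekiq::ScheduledSet.new.size}"
'
```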
Why separate from Webservice?
Responsiveness - Web requests stay fast (no blocking on slow jobs)
Resource isolation - Long-running jobs don’t starve web workers
Retry logic - Failed jobs retried automatically (webhooks, emails)
Scalability - Can add more Sidekiq replicas independently (when scaling beyond 5 users)
Communication:
Webservice → Enqueue job → Redis
↓
Sidekiq → Process job → PostgreSQL/Gitaly/S3
Resource profile:
CPU: 200m request, 500m limit (lower than webservice - less bursty)
Memory: 1.5Gi request, 2Gi limit (similar to webservice - same Rails app)
GitLab Shell: SSH Git Access¶
What it does: Handles git@gitlab.example.com:user/repo.git SSH operations
Architecture:
Go application handling SSH protocol
Authenticates against Webservice API
Proxies git operations to Gitaly
1 replica (SSH traffic low for 2-5 users)
SSH flow:
ssh git@gitlab.example.com
↓
gitlab-shell (pod) → Authenticate via Webservice API
↓
gitlab-shell → Gitaly (gRPC) → Execute git command
↓
Gitaly → Hetzner Volume (read/write .git data)
↓
gitlab-shell → Return git response to client
Why separate from Gitaly?
Security - SSH authentication and authorization separate from git operations
Protocol handling - SSH protocol (keys, agent forwarding) separate from git logic
Flexibility - Can swap authentication methods without changing Gitaly
SSH routing via Traefik:
GitLab BDA uses Traefik TCP routing instead of dedicated LoadBalancer:
# Traefik routes SSH port 22 to gitlab-shell service
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
spec:
  entryPoints: [ssh]
  routes:
    - match: HostSNI(`*`)
      services:
        - name: gitlab-gitlab-shell
          port: 22
Benefits:
Cost savings - No separate LoadBalancer for SSH (shares Traefik LB)
Consistent ingress - All traffic (HTTP, HTTPS, SSH) through single entry point
Easy port management - SSH on port 22 (standard), no custom ports
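A quick end-to-end check of the SSH path (Traefik → gitlab-shell → Gitaly); the project path is a placeholder:

```bash
# Should print "Welcome to GitLab, @<username>!" if your SSH key is registered
ssh -T git@gitlab.staging.bluedynamics.eu

# Clone over SSH through the same Traefik entry point (placeholder project path)
git clone git@gitlab.staging.bluedynamics.eu:group/project.git
```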
For SSH routing details, see Constructs API Reference.
Resource profile:
CPU: 50m request, 200m limit (lightweight - mostly I/O wait)
Memory: 128Mi request, 256Mi limit (minimal - just SSH protocol handling)
GitLab Pages: Static Site Hosting¶
What it does: Serves static websites built from git repositories
Architecture:
Go application serving HTTP
Reads HTML/CSS/JS from S3 (pages-gitlabbda-kup6s bucket)
1 replica (static sites, low traffic for 2-5 users)
Supports custom domains (via wildcard *.pages.staging.bluedynamics.eu)
Pages deployment flow:
1. Push to repo with .gitlab-ci.yml:
   pages:
     script: npm run build
     artifacts:
       paths: [public/]
2. CI job builds site → artifacts saved to S3 (artifacts bucket)
3. Pages daemon downloads artifacts → extracts to pages bucket
4. User visits https://project.pages.staging.bluedynamics.eu
→ Pages reads from S3 → Serves HTML
Why S3 storage?
Scalability - No disk space limits (S3 grows on demand)
CDN-ready - Can front S3 with Cloudflare later (global distribution)
Cost - €0.01/GB/month vs block storage (static sites are read-heavy)
Ingress routing:
Wildcard ingress for all Pages sites:
# Simplified (see constructs-api.md for full spec)
spec:
  rules:
    - host: '*.pages.staging.bluedynamics.eu'
      http:
        paths:
          - backend:
              service:
                name: gitlab-gitlab-pages
Benefits:
Zero configuration per site - New projects automatically get <project>.pages.staging.bluedynamics.eu
Custom domains - Can add specific domains via additional Ingress resources
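A hypothetical check that a published site is being served through the wildcard ingress (my-project is a placeholder project path):

```bash
# Expect an HTTP 200 response served by the gitlab-pages daemon,
# with the content streamed from the S3 pages bucket behind it
curl -I https://my-project.pages.staging.bluedynamics.eu
```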
Resource profile:
CPU: 50m request, 200m limit (lightweight - serving static files)
Memory: 128Mi request, 256Mi limit (minimal - mostly streaming from S3)
Toolbox: Backup and Restore Utility¶
What it does: gitlab-rake commands for maintenance, backup, restore
Architecture:
Deployment (not CronJob) - runs continuously, exec into for tasks
Rails environment (same as Webservice) with CLI tools
Accesses all data sources (PostgreSQL, Gitaly, S3)
Common operations:
# Exec into toolbox pod
kubectl exec -it deploy/gitlab-toolbox -n gitlabbda -- bash
# Create full GitLab backup
gitlab-backup create
# → Uploads to S3 (backups-gitlabbda-kup6s bucket)
# Includes: Database dump, repositories, uploads, LFS, artifacts, pages
# Restore from backup
gitlab-backup restore BACKUP=<timestamp>
# Check database status
gitlab-rake db:migrate:status
# Clear Redis cache
gitlab-rake cache:clear
# Recalculate repository sizes
gitlab-rake gitlab:cleanup:repos
Backup strategy:
GitLab Toolbox backups are application-aware (vs volume snapshots):
| Backup Type | Scope | Frequency | Recovery Time |
|---|---|---|---|
| Toolbox S3 | Full GitLab state | Daily (configured via CronJob) | 2-4 hours |
| CNPG Barman | PostgreSQL only (PITR) | Continuous WAL + daily base | 30-60 min |
| Longhorn CIFS | Volume snapshots | Daily | 15-30 min (restore PVC) |
When to use Toolbox backups:
Migration - Moving to different cluster (export/import all data)
Version upgrade rollback - Restore to pre-upgrade state
Disaster recovery - Entire namespace lost (restore from S3)
For backup details, see Storage Architecture.
External Dependencies¶
PostgreSQL (CloudNativePG)¶
What it does: Primary data store for GitLab application data
Data stored:
Users, groups, projects metadata
Issues, merge requests, CI/CD pipelines
Repository metadata (not .git data - that's in Gitaly)
Access control (permissions, tokens)
Configuration settings
Why External PostgreSQL?
GitLab Helm chart includes built-in PostgreSQL - so why use CNPG instead?
| Feature | Built-in PostgreSQL | CNPG PostgreSQL |
|---|---|---|
| High availability | Single instance (manual failover) | 2 instances (auto-failover < 30s) |
| Backups | Manual via GitLab Toolbox | Automatic via Barman Cloud Plugin |
| Point-in-time recovery | No (full backups only) | Yes (WAL archiving to S3) |
| Connection pooling | PgBouncer sidecar | Dedicated Pooler CRD (better isolation) |
| Monitoring | Basic metrics | Rich metrics via PodMonitor |
| Storage management | Static PVC size | Longhorn snapshots, resizing |
| Upgrades | Manual (risky) | Automated via rolling updates |
Key benefits of CNPG:
Automated failover - Primary crashes? Standby promoted automatically (< 30 seconds)
Continuous backups - WAL archiving to S3 (can recover to any point in time)
Read replicas - Can add read-only replicas for scaling (future)
Pooling - PgBouncer Pooler handles 1000 client connections with 25 backend connections
Kubernetes-native - Managed via CRDs (GitOps-friendly)
CNPG Architecture:
GitLab Webservice/Sidekiq
↓
gitlab-postgres-pooler (PgBouncer, 2 replicas)
↓
gitlab-postgres-1 (Primary) ⟷ gitlab-postgres-2 (Standby)
↓ ↓
PVC (10Gi) PVC (10Gi)
longhorn-redundant-app longhorn-redundant-app
(1 replica) (1 replica)
Replication flow:
Write to Primary → PostgreSQL streaming replication → Standby
Standby stays in sync (< 1 second lag typically)
Primary fails → CNPG promotes Standby → Pooler redirects connections
Storage rationale:
longhorn-redundant-app (1 replica) - Each CNPG instance has 1 Longhorn replica
Why not 2 Longhorn replicas? - CNPG provides application-level replication (2 instances)
Total copies: 2 (CNPG instances) = adequate redundancy without waste
For storage details, see Storage Architecture.
Configuration:
Instances: 2 (primary + standby)
Pooler: 2 replicas (transaction pooling)
Max connections: 200 (DB), 1000 (pooler)
Backup: S3 (postgresbackups-gitlabbda-kup6s) via Barman Cloud Plugin
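Cluster health, failover state, and backup status can be inspected via the CNPG CRDs, and in more detail with the kubectl-cnpg plugin if it is installed; names follow the configuration above:

```bash
# Cluster overview: instances, current primary, ready status
kubectl get cluster gitlab-postgres -n gitlabbda

# Richer view (requires the kubectl-cnpg plugin): replication lag,
# WAL archiving status, last base backup
kubectl cnpg status gitlab-postgres -n gitlabbda

# PgBouncer pooler sitting in front of the cluster
kubectl get pooler gitlab-postgres-pooler -n gitlabbda
```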
For CNPG cluster specification, see Constructs API Reference.
Redis: Cache and Job Queue¶
What it does: Fast in-memory data store for caching and background jobs
Data stored:
Session cache - User sessions, temporary auth tokens
Job queue - Sidekiq job definitions (pending, running, retrying)
Cache store - Rendered Markdown, API responses (volatile, can rebuild)
Shared state - Rate limiting counters, feature flags
Why External Redis?
GitLab Helm chart includes built-in Redis - why separate instance?
| Feature | Built-in Redis | Dedicated Redis |
|---|---|---|
| Resource isolation | Shares pod resources | Dedicated CPU/memory |
| Persistence | Optional | Controlled (AOF + snapshots) |
| Monitoring | Limited | Dedicated metrics |
| Scaling | Coupled to GitLab | Independent |
| Sentinel HA | No (single instance) | Future: Redis Sentinel |
Key benefits of dedicated Redis:
Resource guarantees - 128-512Mi memory reserved (GitLab cache won’t evict job queue data)
Persistence control - AOF (append-only file) + snapshots ensure job queue survives pod restart
Independent scaling - Can increase memory without scaling entire GitLab deployment
Future-proof - Can migrate to Redis Sentinel (HA) when scaling beyond 5 users
Redis Architecture:
GitLab Webservice (cache reads/writes)
↓
Redis Service (ClusterIP)
↓
Redis StatefulSet (1 replica)
↓
PVC (10Gi, longhorn, 2 replicas)
↓
/data (AOF + snapshots)
Persistence strategy:
redis-server --appendonly yes --save 60 1
AOF (append-only file) - Every write logged to disk (fsync every second)
Snapshot - Full data dump if ≥1 key changed in 60 seconds
Recovery - Pod restart → Replay AOF → Full state recovered
Why persistence?
Job queue - Sidekiq jobs must survive restarts (email delivery, CI jobs)
Sessions - User sessions preserved (no forced logouts during pod restart)
Cache - Nice-to-have (can rebuild, but faster to restore)
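These persistence settings can be verified from inside the running instance. A sketch assuming the deploy/redis name used in the troubleshooting section below; adjust if your release names differ:

```bash
# AOF and snapshot configuration as seen by the server
kubectl exec deploy/redis -n gitlabbda -- redis-cli CONFIG GET appendonly
kubectl exec deploy/redis -n gitlabbda -- redis-cli CONFIG GET save

# Persistence status: AOF enabled, last RDB save, rewrite in progress, ...
kubectl exec deploy/redis -n gitlabbda -- redis-cli INFO persistence
```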
Storage rationale:
longhorn (2 replicas) - Redis is single instance (no clustering for 2-5 users)
Why 2 Longhorn replicas? - No application-level replication (unlike CNPG)
Cost: 10Gi × 2 = 20Gi cluster storage (acceptable for HA)
For storage details, see Storage Architecture.
Configuration:
Replicas: 1 (single instance, sufficient for 2-5 users)
Memory: 128-512Mi (cache + queue for small team)
Persistence: AOF + snapshots (every 60s if data changed)
For Redis specification, see Constructs API Reference.
Future: Redis Sentinel
When scaling beyond 5 users, consider Redis Sentinel (HA):
3 Redis instances (1 primary + 2 replicas)
Sentinel processes monitor health
Auto-failover if primary fails (< 10 seconds)
Cost: 3× the storage footprint (30Gi vs 10Gi), plus additional memory for the extra instances
Component Communication Patterns¶
HTTP Request Flow¶
Web UI request (e.g., view merge request):
1. User → https://gitlab.staging.bluedynamics.eu/project/merge_requests/1
2. Traefik Ingress (TLS termination, routing)
↓
3. Workhorse (port 8181) → Check cache (Redis)
↓ Cache miss
4. Workhorse → Webservice (port 8080)
↓
5. Webservice → PostgreSQL (query MR data)
Webservice → Gitaly (get diff via gRPC)
Webservice → Redis (update cache)
↓
6. Webservice → Render HTML
↓
7. Workhorse → Return to user
Git HTTP operation (e.g., git clone):
1. git clone https://gitlab.staging.bluedynamics.eu/user/repo.git
2. Traefik → Workhorse
↓
3. Workhorse → Webservice (authenticate user, authorize repository access)
↓
4. Webservice → PostgreSQL (check permissions)
↓
5. Webservice → Return OK + Gitaly endpoint to Workhorse
↓
6. Workhorse → Gitaly (gRPC: RepositoryService.GetObjects)
↓
7. Gitaly → Read .git data from Hetzner Volume
↓
8. Gitaly → Stream pack file to Workhorse
↓
9. Workhorse → Stream to git client
Key insight: Workhorse handles streaming (step 6-9) without involving Rails (efficient).
Background Job Flow¶
CI/CD pipeline trigger (e.g., git push):
1. git push origin main
2. GitLab Shell (SSH) or Workhorse (HTTPS) → Gitaly (write to repo)
↓
3. Gitaly → Post-receive hook → Webservice API (notify of push)
↓
4. Webservice → Check .gitlab-ci.yml (via Gitaly)
Webservice → Create pipeline record (PostgreSQL)
Webservice → Enqueue job (Redis: CreatePipelineWorker)
↓
5. Sidekiq → Dequeue CreatePipelineWorker from Redis
↓
6. Sidekiq → Create CI jobs (PostgreSQL)
Sidekiq → Enqueue job (Redis: PipelineProcessWorker)
↓
7. Sidekiq → Dequeue PipelineProcessWorker
↓
8. Sidekiq → Assign jobs to GitLab Runners
↓
9. Runners → Execute jobs → Upload artifacts to S3
↓
10. Runners → Report status to Webservice API
↓
11. Webservice → Update pipeline status (PostgreSQL)
↓
12. User refreshes UI → See pipeline results
Why async jobs?
Push responsiveness - git push returns immediately (doesn't wait for CI)
Retry logic - Job fails? Sidekiq retries automatically
Scalability - Can add more Sidekiq workers to process jobs faster
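The asynchronous behaviour is easy to observe via the API: creating a pipeline returns immediately while Sidekiq processes it in the background. A sketch; project ID, ref, and token are placeholders:

```bash
GITLAB_HOST="https://gitlab.staging.bluedynamics.eu"
GITLAB_TOKEN="<personal-access-token>"
PROJECT_ID="<project-id>"

# Returns immediately with the pipeline in "created"/"pending" state
curl --silent --request POST --header "PRIVATE-TOKEN: ${GITLAB_TOKEN}" \
  "${GITLAB_HOST}/api/v4/projects/${PROJECT_ID}/pipeline?ref=main"

# Poll the latest pipeline; status advances as Sidekiq and the runners progress
curl --silent --header "PRIVATE-TOKEN: ${GITLAB_TOKEN}" \
  "${GITLAB_HOST}/api/v4/projects/${PROJECT_ID}/pipelines?per_page=1"
```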
Storage Access Patterns¶
Different components access different storage tiers:
| Component | Hetzner Volumes | Longhorn PVC | S3 Buckets | PostgreSQL | Redis |
|---|---|---|---|---|---|
| Webservice | ❌ | ❌ | ✅ (pre-signed URLs) | ✅ (read/write) | ✅ (cache) |
| Gitaly | ✅ (20Gi) | ❌ | ❌ | ❌ | ❌ |
| Sidekiq | ❌ | ❌ | ✅ (upload artifacts) | ✅ (read/write) | ✅ (job queue) |
| Pages | ❌ | ❌ | ✅ (pages bucket) | ❌ | ❌ |
| Workhorse | ❌ | ❌ | ✅ (direct upload) | ❌ | ❌ |
| PostgreSQL | ❌ | ✅ (10Gi × 2) | ✅ (backups) | N/A | ❌ |
| Redis | ❌ | ✅ (10Gi) | ❌ | ❌ | N/A |
Key pattern: Storage separation by access type
Shared state (database, cache) → PostgreSQL/Redis (low-latency access needed)
Large files (artifacts, uploads) → S3 (scalable, cost-effective)
Git repositories → Hetzner Volumes (simple, reliable block storage)
Scaling Considerations¶
Current Configuration (2-5 users)¶
| Component | Replicas | Rationale |
|---|---|---|
| Webservice | 2 | HA (zero-downtime deploys) |
| Workhorse | 2 | Follows webservice |
| Sidekiq | 1 | Low job volume |
| Gitaly | 1 | Stateful (single PVC) |
| Shell | 1 | Low SSH traffic |
| Pages | 1 | Low page views |
| PostgreSQL | 2 | HA (CNPG instances) |
| Redis | 1 | Sufficient for cache + queue |
Scaling to 10-20 Users¶
Bottlenecks to watch:
Sidekiq queue depth
Symptom: Jobs waiting >1 minute before processing
Solution: Increase Sidekiq replicas (1 → 2 or 3)
Database connections
Symptom: "remaining connection slots reserved" errors
Solution: Increase pooler replicas or max_connections (200 → 400)
Redis memory
Symptom: Cache evictions, slow job queue
Solution: Increase Redis memory (512Mi → 1Gi)
Gitaly storage
Symptom: PVC 80% full
Solution: Resize Hetzner Volume (20Gi → 50Gi) - no downtime needed
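The corresponding operational levers, sketched as direct kubectl commands; in a GitOps setup you would change the CDK8S/Helm values instead, and the Gitaly PVC name below is an assumption to confirm with kubectl get pvc:

```bash
# 1. Sidekiq queue depth: add a worker replica
kubectl scale deploy/gitlab-sidekiq -n gitlabbda --replicas=2

# 2. Redis memory / DB connections: usually a values change, but check
#    current usage first (label selector is an assumption)
kubectl top pod -n gitlabbda -l app=redis
kubectl get pooler gitlab-postgres-pooler -n gitlabbda

# 3. Gitaly storage: expand the PVC in place (hcloud-volumes supports
#    online expansion). PVC name is an assumption; verify with: kubectl get pvc
kubectl patch pvc repo-data-gitlab-gitaly-0 -n gitlabbda \
  -p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'
```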
For resource sizing, see Resource Requirements Reference.
Scaling Beyond 20 Users¶
Architectural changes needed:
Horizontal Gitaly scaling - Shard repositories across multiple Gitaly instances (Praefect)
Redis Sentinel - HA Redis with auto-failover (3 instances)
PostgreSQL read replicas - Offload read traffic from primary (CNPG supports this)
Separate Harbor PostgreSQL - Dedicated database for registry (reduce contention)
For scaling architecture, see Architecture Overview.
Health Checks and Probes¶
Liveness Probes (Restart on failure)¶
Webservice:
livenessProbe:
  httpGet:
    path: /-/liveness
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
Gitaly:
livenessProbe:
  exec:
    command: ['/scripts/healthcheck']
  initialDelaySeconds: 30
Redis:
livenessProbe:
  tcpSocket:
    port: 6379
  initialDelaySeconds: 30
Readiness Probes (Remove from service on failure)¶
Webservice:
readinessProbe:
  httpGet:
    path: /-/readiness
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
PostgreSQL (CNPG built-in):
Checks replication lag (< 1MB behind primary)
Checks database accepting connections
Removes replica from pooler if unhealthy
Why both probes?
Liveness: Detect deadlocks (restart pod)
Readiness: Detect slow startup (don’t send traffic until ready)
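When a probe fails, the same endpoints can be queried by hand, for example through a port-forward. A sketch using the Deployment name from the troubleshooting section below:

```bash
# Forward the webservice port locally
kubectl port-forward -n gitlabbda deploy/gitlab-webservice 8080:8080 &

# Hit the same endpoints the probes use
curl -s http://localhost:8080/-/readiness
curl -s http://localhost:8080/-/liveness

# Stop the port-forward when done
kill %1
```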
Troubleshooting Component Issues¶
Webservice Pod CrashLoopBackOff¶
Symptoms:
Webservice pods repeatedly crashing
Logs show PG::ConnectionBad or Redis::CannotConnectError
Diagnosis:
kubectl logs deploy/gitlab-webservice -n gitlabbda | grep -i error
Common causes:
PostgreSQL not ready - Check CNPG cluster status: kubectl get cluster gitlab-postgres -n gitlabbda
Redis not ready - Check Redis pod: kubectl get pods -l app.kubernetes.io/name=redis -n gitlabbda
S3 credentials invalid - Check ExternalSecret sync status: kubectl get externalsecret gitlab-s3-credentials -n gitlabbda
Solution: Fix dependency, webservice will recover automatically (readiness probe).
Gitaly High CPU¶
Symptoms:
Gitaly pod using 80-100% CPU
Git operations slow (clone, diff take minutes)
Diagnosis:
kubectl top pod -l app.kubernetes.io/name=gitaly -n gitlabbda
Common causes:
Large git operation (e.g., cloning 5GB repo) - Normal, temporary
Repository garbage collection - Scheduled housekeeping task
Many concurrent git operations - Multiple users pushing simultaneously
Solution:
Temporary spike: Wait for operation to complete
Sustained high CPU: Consider adding CPU limit headroom or optimizing repository
Sidekiq Jobs Not Processing¶
Symptoms:
Emails not sending
CI pipelines stuck in “pending”
Background tasks accumulating
Diagnosis:
kubectl logs deploy/gitlab-sidekiq -n gitlabbda | tail -n 100
Common causes:
Sidekiq pod crashed - Check pod status
Redis connection lost - Check Redis logs
Job stuck (infinite loop or deadlock) - Check Sidekiq logs for repeated errors
Solution:
# Restart Sidekiq
kubectl rollout restart deploy/gitlab-sidekiq -n gitlabbda
# If jobs still stuck, check Redis queue
kubectl exec -it deploy/redis -n gitlabbda -- redis-cli
> LLEN resque:queue:default # Check queue length
For complete troubleshooting guide, see Troubleshooting Reference.
Summary¶
GitLab is a microservices platform with clear separation of concerns:
Webservice - HTTP requests, UI rendering
Workhorse - Efficient file handling, proxying
Gitaly - Git operations (the source of truth for .git data)
Sidekiq - Asynchronous jobs (CI, emails, housekeeping)
GitLab Shell - SSH git access
GitLab Pages - Static site hosting
External dependencies provide critical HA and management benefits:
PostgreSQL (CNPG) - Automated failover, point-in-time recovery, connection pooling
Redis - Dedicated resources, persistence control, independent scaling
Communication patterns:
Synchronous - HTTP/gRPC for user-facing operations (Webservice ↔ Gitaly)
Asynchronous - Redis job queue for background work (Webservice → Sidekiq)
Storage tier separation - Block storage (Gitaly), replicated storage (DB/cache), object storage (artifacts/uploads)
For implementation details:
Constructs API Reference - CDK8S components
Configuration Reference - Configuration fields
Storage Architecture - Storage tier details
Resource Requirements - Resource sizing