Harbor Container Registry Integration¶
Overview¶
GitLab BDA uses Harbor v2.14.0 as a separate container registry instead of GitLab’s built-in registry. This document explains WHY we chose Harbor, HOW it integrates with GitLab, and the architectural decisions behind this setup.
Key points:
OAuth authentication via GitLab (single sign-on, unified user management)
Vulnerability scanning capability (Trivy integration available)
Independent lifecycle (upgrade Harbor without touching GitLab)
AMD64 architecture (official images not ARM64-compatible)
Why Harbor Instead of GitLab Registry?¶
The Problem with GitLab’s Built-in Registry¶
GitLab includes a built-in container registry (enabled via Helm chart) - so why use Harbor?
| Feature | GitLab Built-in Registry | Harbor Registry |
|---|---|---|
| Authentication | GitLab-managed (automatic) | OAuth via GitLab (SSO) |
| Vulnerability scanning | Requires GitLab Ultimate (paid) | Built-in Trivy support (free) |
| UI | Basic (list images, delete) | Rich web UI (projects, tags, labels, replication) |
| Lifecycle | Tied to GitLab upgrade | Independent (upgrade separately) |
| ARM64 support | Yes (multi-arch) | No (AMD64 only) |
| RBAC | GitLab project permissions | Separate project RBAC |
| Image replication | No | Yes (multi-region, DR) |
| Webhook notifications | Limited | Comprehensive (push, pull, scan complete) |
Decision Rationale¶
Harbor chosen for these reasons:
Vulnerability scanning - Free tier includes Trivy scanner (vs. GitLab Ultimate requirement)
Separate concerns - Registry lifecycle independent of GitLab platform lifecycle
Better observability - Dedicated UI for image management (not buried in GitLab UI)
Future-proof - Can add features (replication, RBAC) without GitLab dependency
Trade-offs accepted:
AMD64-only - Harbor official images don’t support ARM64 (pods run on AMD64 nodes)
Auth complexity - Requires OAuth setup (vs. automatic GitLab integration)
Shared database - Harbor uses the `harbor` database in the GitLab PostgreSQL cluster (simpler than a separate cluster)
For 2-5 users, Harbor’s benefits (vulnerability scanning, better UI) outweigh the complexity overhead.
Harbor Architecture¶
Components¶
Harbor is itself a microservices application:
| Component | Role | Port | Dependencies |
|---|---|---|---|
| Core | API server, webhook handler | 8080 | PostgreSQL, Redis |
| Registry | OCI image storage (Docker Registry v2) | 5000 | S3, Redis (cache) |
| JobService | Async jobs (scanning, garbage collection, replication) | 8080 | Redis |
| Portal | Web UI (nginx + Angular) | 8080 | Core API |
Missing from GitLab BDA (future additions):
Trivy Scanner - Vulnerability scanning (not yet deployed)
Notary - Image signing (not needed for 2-5 users)
ChartMuseum - Helm chart storage (not needed, using git for charts)
Deployment Topology¶
User (docker push/pull)
        ↓
Traefik Ingress (registry.staging.bluedynamics.eu)
        ↓
Harbor Portal (nginx) → Harbor Core (API) → Harbor Registry (OCI storage)
                              ↓                        ↓
                         PostgreSQL              S3 (registry bucket)
                         (harbor DB)             (registry-gitlabbda-kup6s)
                              ↓
                         Redis (cache DB 2)
Key architectural decisions:
Shared PostgreSQL - Harbor uses the `harbor` database in the GitLab CNPG cluster
Why: Simpler than a separate database cluster (adequate for 2-5 users)
Trade-off: Harbor and GitLab share database resources
Future: Separate CNPG cluster for Harbor when scaling beyond 20 users
Shared Redis - Harbor uses database 2 in GitLab Redis instance
Why: Redis lightweight (cache only, no critical data)
Trade-off: Redis restart affects both GitLab and Harbor
Future: Separate Redis for Harbor when scaling
S3 object storage - Images stored in the `registry-gitlabbda-kup6s` bucket
Why: Scalable, no local disk needed
Benefit: Can serve images directly from S3 (future CDN integration)
For storage details, see Storage Architecture.
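To see how these shared backends are wired in, you can inspect the harbor-core environment directly. A minimal check, assuming the chart-style variable names (POSTGRESQL_*, _REDIS_URL_CORE) that upstream Harbor uses:

```bash
# Which PostgreSQL host/database and Redis URL harbor-core was started with
kubectl exec -n gitlabbda deploy/harbor-core -- env | grep -E 'POSTGRESQL_(HOST|DATABASE)|_REDIS_URL'
# Expected (roughly): POSTGRESQL_HOST=gitlab-postgres-pooler, POSTGRESQL_DATABASE=harbor,
#                     _REDIS_URL_CORE=redis://redis:6379/2
```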
OAuth Integration with GitLab¶
How It Works¶
Harbor uses OpenID Connect (OIDC) to authenticate users via GitLab:
1. User → https://registry.staging.bluedynamics.eu (Harbor UI)
2. Harbor → Redirect to GitLab OAuth endpoint
URL: https://gitlab.staging.bluedynamics.eu/oauth/authorize
Params: client_id, redirect_uri, scope=openid profile email
3. User → Login to GitLab (if not already logged in)
GitLab → Show OAuth consent (first time only)
User → Click "Authorize"
4. GitLab → Redirect back to Harbor with authorization code
URL: https://registry.staging.bluedynamics.eu/c/oidc/callback?code=...
5. Harbor → Exchange code for tokens (access token, ID token)
POST https://gitlab.staging.bluedynamics.eu/oauth/token
6. GitLab → Return tokens with user info (email, name, username)
7. Harbor → Create/update Harbor user account
- Email from GitLab becomes Harbor username
- User role defaults to "developer" (can push/pull images)
8. Harbor → Return session cookie to user
9. User → Access Harbor UI (logged in)
Key benefits:
Single sign-on - No separate Harbor password (use GitLab credentials)
Unified user management - Add user to GitLab → automatic Harbor access
Token-based API access - GitLab personal access tokens work for `docker login`
OAuth Configuration¶
GitLab side (OAuth Application):
# Created via GitLab UI: Admin → Applications → New Application
Name: Harbor Registry
Redirect URI: https://registry.staging.bluedynamics.eu/c/oidc/callback
Scopes: openid, profile, email
Secrets stored in the application-secrets namespace:
`gitlab-oauth-client-id` - Application ID from GitLab
`gitlab-oauth-client-secret` - Secret from GitLab
Replicated to the `gitlabbda` namespace via ESO (ExternalSecret)
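A sketch of what that ESO replication can look like (the ExternalSecret and SecretStore names below are illustrative, not taken from the actual manifests; only the secret keys and target namespace match this setup):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: harbor-oauth              # illustrative name
  namespace: gitlabbda
spec:
  secretStoreRef:
    name: application-secrets     # assumed store backed by the application-secrets namespace
    kind: ClusterSecretStore
  target:
    name: harbor-secrets          # consumed by harbor-core (see Troubleshooting)
  data:
    - secretKey: gitlab-oauth-client-id
      remoteRef:
        key: gitlab-oauth-client-id
    - secretKey: gitlab-oauth-client-secret
      remoteRef:
        key: gitlab-oauth-client-secret
```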
Harbor side (OIDC configuration):
# Environment variables in harbor-core deployment
AUTH_MODE: oidc_auth
OIDC_NAME: gitlab
OIDC_ENDPOINT: https://gitlab.staging.bluedynamics.eu
OIDC_CLIENT_ID: <from secret>
OIDC_CLIENT_SECRET: <from secret>
OIDC_SCOPE: openid,profile,email
User experience:
First time: User clicks “Login via GitLab” → OAuth consent → Harbor account created
Subsequent: User clicks “Login via GitLab” → Instant login (no consent)
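If the OIDC settings need verifying (for example after rotating the client secret), Harbor's admin API exposes the effective configuration. A hedged example, assuming a local admin account is still available alongside OIDC:

```bash
# Read the effective auth configuration from Harbor (admin credentials required)
curl -s -u admin:"$HARBOR_ADMIN_PASSWORD" \
  https://registry.staging.bluedynamics.eu/api/v2.0/configurations \
  | jq '{auth_mode: .auth_mode.value, oidc_endpoint: .oidc_endpoint.value, oidc_scope: .oidc_scope.value}'
```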
For secrets configuration, see Secrets Reference.
Docker Login Flow¶
CLI authentication:
# Option 1: GitLab personal access token
export GITLAB_TOKEN=glpat-xxxxxxxxxxxxxxxxxxxx
echo $GITLAB_TOKEN | docker login registry.staging.bluedynamics.eu -u gitlab-token --password-stdin
# Option 2: Harbor robot account (future, for CI/CD)
docker login registry.staging.bluedynamics.eu -u 'robot$myapp' -p <robot-token>  # quote the robot name so the shell doesn't expand $myapp
How docker login works with OAuth:
1. docker login → Send credentials to Harbor
2. Harbor → Validate via GitLab OAuth token endpoint
POST https://gitlab.staging.bluedynamics.eu/oauth/token
Body: grant_type=password, username=user@example.com, password=<gitlab-token>
3. GitLab → Return access token if valid
4. Harbor → Create session, return "Login Succeeded"
5. docker push → Include token in Authorization header
Harbor → Validate token → Store image in S3
Why this works:
Docker Registry v2 API supports Basic Auth (username/password)
Harbor translates Basic Auth to OAuth token validation
GitLab personal access tokens work as passwords (secure, revocable)
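The same username/token pair can also back a Kubernetes imagePullSecret so cluster workloads can pull from Harbor; a sketch (the secret and namespace names are examples):

```bash
# Create an image pull secret from the GitLab token
kubectl create secret docker-registry harbor-pull \
  --docker-server=registry.staging.bluedynamics.eu \
  --docker-username=gitlab-token \
  --docker-password="$GITLAB_TOKEN" \
  -n myapp-namespace
# Reference it from workloads via spec.imagePullSecrets: [{name: harbor-pull}]
```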
AMD64 Architecture Requirement¶
Why AMD64-Only?¶
Harbor official images are AMD64-only (no ARM64 multi-arch builds):
# Check image manifest
docker manifest inspect goharbor/harbor-core:v2.14.0
# Architectures: [amd64]
# Missing: arm64
Implication: Harbor pods must run on AMD64 nodes.
KUP6S cluster is multi-architecture (3 ARM64 nodes + 2 AMD64 nodes), so we use nodeSelector + tolerations:
# All Harbor pods
spec:
  nodeSelector:
    kubernetes.io/arch: amd64
  tolerations:
    - key: kubernetes.io/arch
      operator: Equal
      value: amd64
      effect: NoSchedule
Why nodeSelector + tolerations?
KUP6S cluster has taints on AMD64 nodes to prefer ARM64 scheduling:
# AMD64 nodes have taint
taints:
  - key: kubernetes.io/arch
    value: amd64
    effect: NoSchedule
Without tolerations: Harbor pods would be Pending (AMD64 node rejects scheduling)
With tolerations: Harbor pods can schedule on AMD64 nodes (taint tolerated)
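To confirm the scheduling behaviour in the cluster, check the node taints and where the Harbor pods actually landed:

```bash
# Node architecture labels and taints
kubectl get nodes -L kubernetes.io/arch
kubectl describe nodes | grep -E '^Name:|^Taints:'

# Harbor pods should show AMD64 nodes in the NODE column
kubectl get pods -n gitlabbda -l app.kubernetes.io/part-of=harbor -o wide
```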
For cluster architecture, see Main Cluster Docs: Multi-Architecture.
Node Placement in Practice¶
GitLab BDA workload distribution:
| Workload | Architecture | Rationale |
|---|---|---|
| GitLab pods (webservice, gitaly, sidekiq, shell, pages) | ARM64 preferred | Official images are multi-arch |
| Harbor pods (core, registry, jobservice, portal) | AMD64 required | No ARM64 images |
| PostgreSQL (CNPG) | ARM64 preferred | PostgreSQL supports multi-arch |
| Redis | ARM64 preferred | Redis supports multi-arch |
Resource allocation:
ARM64 nodes (3 nodes): GitLab + PostgreSQL + Redis (majority of pods)
AMD64 nodes (2 nodes): Harbor + other AMD64-only workloads
Scaling consideration: If Harbor becomes bottleneck, add AMD64 nodes (not ARM64).
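Before adding nodes, it is worth checking the headroom on the existing AMD64 nodes (requires metrics-server for kubectl top):

```bash
# CPU/memory headroom on the AMD64 nodes
kubectl top nodes -l kubernetes.io/arch=amd64

# What is currently packed onto a given AMD64 node
kubectl get pods -A -o wide --field-selector spec.nodeName=<amd64-node-name>
```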
Vulnerability Scanning¶
Trivy Integration (Available, Not Yet Deployed)¶
Harbor supports Trivy vulnerability scanning, but it’s not currently deployed in GitLab BDA.
How Trivy scanning would work:
1. docker push registry.staging.bluedynamics.eu/project/image:tag
→ Harbor receives image, stores in S3
2. Harbor → Trigger scan job (JobService)
→ Launch Trivy scanner pod
3. Trivy → Pull image from Registry
→ Scan image layers for known vulnerabilities
→ Check against CVE databases (Alpine, Debian, npm, etc.)
4. Trivy → Report vulnerabilities to Harbor Core
→ Store scan results in PostgreSQL
5. Harbor UI → Show scan results
→ Critical: 3, High: 12, Medium: 45, Low: 120
→ Block deployment if critical vulnerabilities (policy)
6. Webhook → Notify external systems
→ Send scan results to GitLab (issue creation)
→ Send to Slack (alert on critical CVEs)
Future deployment:
When implementing Trivy scanner, add these components:
# Trivy scanner deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: harbor-trivy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: harbor-trivy
  template:
    metadata:
      labels:
        app: harbor-trivy
    spec:
      nodeSelector:
        kubernetes.io/arch: amd64        # Trivy adapter image also AMD64-only
      tolerations:                       # AMD64 nodes are tainted (see above)
        - key: kubernetes.io/arch
          operator: Equal
          value: amd64
          effect: NoSchedule
      containers:
        - name: trivy
          image: goharbor/trivy-adapter-photon:v2.14.0
          env:
            - name: SCANNER_TRIVY_VULN_TYPE
              value: os,library
            - name: SCANNER_TRIVY_SEVERITY
              value: UNKNOWN,LOW,MEDIUM,HIGH,CRITICAL
Why not deployed yet?
2-5 users - Manual image review sufficient (scan before deploy)
Resource cost - Trivy databases large (1-2GB), scans CPU-intensive
Complexity - Additional deployment to maintain
When to deploy Trivy:
Compliance requirements - PCI-DSS, SOC2 require automated scanning
Scaling beyond 10 users - Too many images to manually review
CI/CD integration - Block deploys with critical CVEs
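Once a Trivy adapter is registered, scans can also be triggered on demand through Harbor's API. A hedged sketch using the v2.0 scan endpoint (project, repository, and tag are placeholders):

```bash
# Trigger a vulnerability scan for one artifact
curl -s -X POST -u admin:"$HARBOR_ADMIN_PASSWORD" \
  "https://registry.staging.bluedynamics.eu/api/v2.0/projects/myapp/repositories/myapp/artifacts/latest/scan"

# Read back the scan summary once the job completes
curl -s -u admin:"$HARBOR_ADMIN_PASSWORD" \
  "https://registry.staging.bluedynamics.eu/api/v2.0/projects/myapp/repositories/myapp/artifacts/latest?with_scan_overview=true" \
  | jq '.scan_overview'
```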
For future improvements, see Storage Architecture: Future Improvements.
Manual Scanning Workaround (Current Approach)¶
Without Trivy deployment, scan images manually:
# Scan image locally before push
trivy image myapp:latest
# Output: Vulnerabilities by severity
# Fix vulnerabilities
# Update base image: alpine:3.18 → alpine:3.19
# Update dependencies: npm audit fix
# Re-scan
trivy image myapp:latest
# Output: 0 critical, 0 high (safe to deploy)
# Push to Harbor
docker push registry.staging.bluedynamics.eu/project/myapp:latest
Trade-off: Manual process (slower, error-prone) vs. automated scanning overhead.
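To make the manual step harder to skip, the same check can gate a build or CI job by failing on findings above a chosen severity (the threshold below is a policy choice, not a fixed rule):

```bash
# Fail (non-zero exit) if any HIGH or CRITICAL vulnerabilities are found, otherwise push
trivy image --exit-code 1 --severity HIGH,CRITICAL registry.staging.bluedynamics.eu/project/myapp:latest \
  && docker push registry.staging.bluedynamics.eu/project/myapp:latest
```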
Harbor Storage Architecture¶
S3 Backend¶
Harbor stores all image layers in S3 (not local disk):
Bucket: registry-gitlabbda-kup6s (fsn1 region)
Why S3?
Scalability - No disk space limits (grows on demand)
Cost - €0.01/GB/month (much cheaper than block storage)
Durability - Hetzner manages redundancy (multi-AZ replication)
Future CDN - Can front S3 with Cloudflare (global image pulls)
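For reference, the Registry component uses the upstream Docker Distribution S3 driver; a sketch of the relevant storage section (region, endpoint, and redirect behaviour are assumptions based on the bucket and endpoint named elsewhere in this document):

```yaml
# Sketch of the harbor-registry storage config (Docker Distribution s3 driver keys)
storage:
  s3:
    bucket: registry-gitlabbda-kup6s
    region: fsn1                                         # assumed region identifier
    regionendpoint: https://fsn1.your-objectstorage.com  # Hetzner S3-compatible endpoint
    secure: true
    # accesskey/secretkey come from the harbor-s3-credentials secret
  redirect:
    disable: true   # stream blobs through the registry instead of redirecting clients to S3
```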
Image push flow:
1. docker push registry.staging.bluedynamics.eu/project/myapp:tag
2. Client → Upload layers to Harbor Registry (HTTP POST)
3. Harbor Registry → Write layers directly to S3
Bucket: registry-gitlabbda-kup6s
Path: /docker/registry/v2/blobs/sha256/<hash>
4. Harbor Core → Record manifest in PostgreSQL
Table: artifact (image metadata)
Table: artifact_blob (layer references)
5. Harbor → Return success to client
Image pull flow:
1. docker pull registry.staging.bluedynamics.eu/project/myapp:tag
2. Client → Request manifest from Harbor Core
3. Harbor Core → Query PostgreSQL (artifact + artifact_blob tables)
→ Return manifest (list of layer hashes)
4. Client → Request each layer from Harbor Registry
5. Harbor Registry → Stream layer from S3
S3 GET /docker/registry/v2/blobs/sha256/<hash>
→ Stream to client (no local disk buffering)
6. Client → Assemble image from layers
Performance:
Pull: 50-200 MB/s (limited by S3 → client bandwidth)
Push: 10-100 MB/s (limited by client → S3 upload speed)
Concurrent pulls: Unlimited (S3 scales horizontally)
For S3 configuration, see S3 Buckets Reference.
Redis Caching¶
Harbor uses Redis database 2 for caching:
Cache types:
Blob descriptor cache - Layer metadata (size, digest)
Manifest cache - Image manifests (avoid PostgreSQL queries)
Tag cache - Image tags → manifest mapping
Why Redis cache?
Performance - Manifest queries 10-100× faster (1ms vs 50ms PostgreSQL)
Reduced DB load - Frequent `docker pull` doesn't hit PostgreSQL
Ephemeral - Cache can be cleared without data loss (rebuilds from PostgreSQL + S3)
Redis configuration:
# Harbor Registry config (ConfigMap: harbor-registry-config)
redis:
  addr: redis:6379
  db: 2  # Separate database from GitLab (DB 0)
  pool:
    maxidle: 100
    maxactive: 500
Why database 2?
Isolation - GitLab uses DB 0, Harbor uses DB 2 (no key conflicts)
Observability - Can monitor Harbor cache separately (`INFO keyspace`)
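To look at the Harbor cache on its own, run INFO keyspace (or DBSIZE against database 2) inside the shared Redis instance; the deployment name below is an assumption:

```bash
# Key counts per logical database: db0 = GitLab, db2 = Harbor cache
kubectl exec -n gitlabbda deploy/redis -- redis-cli INFO keyspace

# Number of cached entries in Harbor's database 2 only
kubectl exec -n gitlabbda deploy/redis -- redis-cli -n 2 DBSIZE
```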
For Redis architecture, see GitLab Components: Redis.
Harbor Web UI¶
Features¶
Harbor UI provides:
Project management - Create projects (image repositories)
Image browsing - View tags, layers, size, push time
Tag management - Delete old tags, add labels
User management - Invite users, assign roles (admin, developer, guest)
Replication - (Future) Replicate images to other registries
Vulnerability reports - (Future with Trivy) View scan results per image
Webhooks - Configure notifications (image push/pull events)
Access: https://registry.staging.bluedynamics.eu
Authentication: Login via GitLab OAuth (single sign-on)
Workflow Example¶
Creating a project and pushing images:
# 1. Create project via UI
# Navigate to: https://registry.staging.bluedynamics.eu
# → Projects → New Project → Name: myapp → Public/Private → Create
# 2. Tag image locally
docker tag myapp:latest registry.staging.bluedynamics.eu/myapp/myapp:latest
# 3. Login to registry
echo $GITLAB_TOKEN | docker login registry.staging.bluedynamics.eu -u gitlab-token --password-stdin
# 4. Push image
docker push registry.staging.bluedynamics.eu/myapp/myapp:latest
# 5. View in UI
# Navigate to: Projects → myapp → Repositories → myapp
# See: Tags, size, layers, push time
RBAC (Role-Based Access Control):
| Role | Can View | Can Pull | Can Push | Can Delete | Can Manage Users |
|---|---|---|---|---|---|
| Guest | ✅ | ✅ | ❌ | ❌ | ❌ |
| Developer | ✅ | ✅ | ✅ | ❌ | ❌ |
| Maintainer | ✅ | ✅ | ✅ | ✅ | ❌ |
| Admin | ✅ | ✅ | ✅ | ✅ | ✅ |
Default role: Developer (users from GitLab OAuth automatically get developer role)
Lifecycle Management¶
Harbor Upgrades¶
Harbor lifecycle is independent of GitLab:
# Upgrade Harbor (e.g., v2.14.0 → v2.15.0)
# 1. Update version in config.yaml
versions:
  harbor: v2.15.0
# 2. Rebuild manifests
npm run build
# 3. Commit and push
git add . && git commit -m "Upgrade Harbor to v2.15.0" && git push
# 4. ArgoCD auto-syncs
# Harbor pods rolling update (Core → Registry → JobService → Portal)
# 5. Verify
kubectl get pods -n gitlabbda -l app.kubernetes.io/part-of=harbor
# All pods Running with new version
Why independent lifecycle?
Risk isolation - Harbor upgrade doesn’t touch GitLab (separate failure domain)
Flexibility - Can upgrade Harbor for security fix without GitLab maintenance window
Rollback - Harbor rollback doesn’t affect GitLab availability
For upgrade procedures, see How-To: Upgrade Harbor (future).
Database Schema Migrations¶
Harbor uses automatic schema migrations (like GitLab):
1. New Harbor Core version deployed
→ Init container runs migrations
2. Harbor Core → Check PostgreSQL schema_migrations table
→ Compare with required version for v2.15.0
3. Migrations needed → Run SQL migration scripts
→ Add new tables (e.g., p2p_preheat_instance)
→ Add columns (e.g., artifact.icon)
4. Migrations complete → Harbor Core starts
5. Old Harbor Core pods → Drain connections, terminate
Migration safety:
Backward-compatible - Old version can read new schema (during rolling update)
Idempotent - Re-running migration has no effect (already applied)
Automatic - No manual SQL scripts (built into Harbor image)
Rollback limitation: Can’t rollback to version before schema change (one-way migration).
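The migration bookkeeping can be inspected directly in the shared cluster; Harbor keeps a golang-migrate style schema_migrations table, so the column names below are an assumption based on that layout:

```bash
# Which schema version Harbor is at, and whether a migration was left dirty
kubectl exec -it gitlab-postgres-1 -n gitlabbda -- \
  psql -U postgres -d harbor -c 'SELECT version, dirty FROM schema_migrations;'
```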
For database management, see GitLab Components: PostgreSQL.
Troubleshooting¶
Harbor Core Pod CrashLoopBackOff¶
Symptoms:
Harbor UI inaccessible (502 Bad Gateway)
`harbor-core` pod repeatedly crashing
Diagnosis:
kubectl logs deploy/harbor-core -n gitlabbda | tail -n 50
Common causes:
PostgreSQL connection failure - Check the `gitlab-postgres-pooler` service
kubectl get svc gitlab-postgres-pooler -n gitlabbda
Harbor database not initialized - Check if the `harbor` database exists
kubectl exec -it gitlab-postgres-1 -n gitlabbda -- psql -U postgres -l | grep harbor
OAuth misconfiguration - Check the `harbor-secrets` secret
kubectl get secret harbor-secrets -n gitlabbda -o jsonpath='{.data.gitlab-oauth-client-id}' | base64 -d
Solution: Fix dependency, Harbor Core will restart automatically.
Image Push Fails (S3 Error)¶
Symptoms:
`docker push` hangs or fails with “blob upload invalid”
Harbor Registry logs show S3 errors
Diagnosis:
kubectl logs deploy/harbor-registry -n gitlabbda | grep -i s3
Common causes:
S3 credentials invalid - Check the `harbor-s3-credentials` secret
S3 bucket doesn’t exist - Check the Crossplane Bucket CR
kubectl get bucket registry-gitlabbda-kup6s -n crossplane-system
S3 endpoint unreachable - Check network connectivity
kubectl exec -it deploy/harbor-registry -n gitlabbda -- wget -O- https://fsn1.your-objectstorage.com
Solution: Fix S3 configuration, restart Harbor Registry deployment.
For complete troubleshooting, see Troubleshooting Reference.
Future Improvements¶
Short-term (Next 6 months)¶
Trivy scanner deployment - Enable automated vulnerability scanning
Harbor robot accounts - CI/CD authentication (no personal tokens)
Image retention policies - Auto-delete old tags (save S3 costs)
Medium-term (6-12 months)¶
Separate PostgreSQL cluster - Dedicated CNPG cluster for Harbor (when scaling)
Redis Sentinel for Harbor - HA cache (when scaling)
Content Trust (Notary) - Image signing and verification (security)
Long-term (12+ months)¶
Multi-region replication - Replicate images to hel1/nbg1 (DR)
CDN integration - Cloudflare in front of S3 (global image pulls)
Cosign integration - Modern image signing (vs. Notary)
Summary¶
Harbor integration provides:
OAuth SSO - Single sign-on via GitLab (unified user management)
Vulnerability scanning capability - Trivy integration available (deploy when needed)
Independent lifecycle - Upgrade Harbor without touching GitLab
Better UI - Dedicated registry management interface
S3 storage - Scalable, cost-effective image storage
Architectural decisions:
AMD64-only - Harbor official images not ARM64-compatible (use nodeSelector + tolerations)
Shared PostgreSQL - Harbor uses the `harbor` database in the GitLab CNPG cluster (adequate for 2-5 users)
Shared Redis - Harbor uses DB 2 in the GitLab Redis instance (cache only, ephemeral)
Trade-offs:
Complexity - Separate deployment vs. built-in GitLab registry
Architecture constraint - Requires AMD64 nodes in cluster
Shared resources - Harbor and GitLab share PostgreSQL/Redis (acceptable for small scale)
For implementation details:
Constructs API Reference - Harbor deployment specification
Secrets Reference - OAuth configuration
S3 Buckets Reference - Registry bucket details