Explanation

Harbor Container Registry Integration


Overview

GitLab BDA uses Harbor v2.14.0 as a separate container registry instead of GitLab’s built-in registry. This document explains WHY we chose Harbor, HOW it integrates with GitLab, and the architectural decisions behind this setup.

Key points:

  • OAuth authentication via GitLab (single sign-on, unified user management)

  • Vulnerability scanning capability (Trivy integration available)

  • Independent lifecycle (upgrade Harbor without touching GitLab)

  • AMD64-only deployment (official Harbor images are not ARM64-compatible)


Why Harbor Instead of GitLab Registry?

The Problem with GitLab’s Built-in Registry

GitLab includes a built-in container registry (enabled via its Helm chart), so why use Harbor?

| Feature | GitLab Built-in Registry | Harbor Registry |
|---|---|---|
| Authentication | GitLab-managed (automatic) | OAuth via GitLab (SSO) |
| Vulnerability scanning | Requires GitLab Ultimate (paid) | Built-in Trivy support (free) |
| UI | Basic (list images, delete) | Rich web UI (projects, tags, labels, replication) |
| Lifecycle | Tied to GitLab upgrades | Independent (upgrade separately) |
| ARM64 support | Yes (multi-arch) | No (AMD64 only) |
| RBAC | GitLab project permissions | Separate project RBAC |
| Image replication | No | Yes (multi-region, DR) |
| Webhook notifications | Limited | Comprehensive (push, pull, scan complete) |

Decision Rationale

Harbor chosen for these reasons:

  1. Vulnerability scanning - Free tier includes Trivy scanner (vs. GitLab Ultimate requirement)

  2. Separate concerns - Registry lifecycle independent of GitLab platform lifecycle

  3. Better observability - Dedicated UI for image management (not buried in GitLab UI)

  4. Future-proof - Can add features (replication, RBAC) without GitLab dependency

Trade-offs accepted:

  1. AMD64-only - Harbor official images don’t support ARM64 (pods run on AMD64 nodes)

  2. Auth complexity - Requires OAuth setup (vs. automatic GitLab integration)

  3. Shared database - Harbor uses harbor database in GitLab PostgreSQL cluster (simpler than separate cluster)

For 2-5 users, Harbor’s benefits (vulnerability scanning, better UI) outweigh the complexity overhead.


Harbor Architecture

Components

Harbor is itself a microservices application:

| Component | Role | Port | Dependencies |
|---|---|---|---|
| Core | API server, webhook handler | 8080 | PostgreSQL, Redis |
| Registry | OCI image storage (Docker Registry v2) | 5000 | S3, Redis (cache) |
| JobService | Async jobs (scanning, garbage collection, replication) | 8080 | Redis |
| Portal | Web UI (nginx + Angular) | 8080 | Core API |

Missing from GitLab BDA (future additions):

  • Trivy Scanner - Vulnerability scanning (not yet deployed)

  • Notary - Image signing (not needed for 2-5 users)

  • ChartMuseum - Helm chart storage (not needed, using git for charts)

Deployment Topology

User (docker push/pull)
        ↓
Traefik Ingress (registry.staging.bluedynamics.eu)
        ↓
Harbor Portal (nginx) → Harbor Core (API) → Harbor Registry (OCI storage)
                              ↓                       ↓
                        PostgreSQL              S3 (registry bucket)
                          (harbor DB)           (registry-gitlabbda-kup6s)
                        Redis (cache DB 2)
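
To relate this topology to the running cluster, the Harbor components can be listed via their common label (a quick check; it assumes the app.kubernetes.io/part-of=harbor label used in the upgrade verification further down):

# List Harbor pods and services by the shared label
kubectl get pods,svc -n gitlabbda -l app.kubernetes.io/part-of=harbor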

Key architectural decisions:

  1. Shared PostgreSQL - Harbor uses the harbor database in the GitLab CNPG cluster (see the connection sketch after this list)

    • Why: Simpler than separate database cluster (adequate for 2-5 users)

    • Trade-off: Harbor and GitLab share database resources

    • Future: Separate CNPG cluster for Harbor when scaling beyond 20 users

  2. Shared Redis - Harbor uses database 2 in GitLab Redis instance

    • Why: Redis load is light (cache only, no critical data)

    • Trade-off: Redis restart affects both GitLab and Harbor

    • Future: Separate Redis for Harbor when scaling

  3. S3 object storage - Images stored in registry-gitlabbda-kup6s bucket

    • Why: Scalable, no local disk needed

    • Benefit: Can serve images directly from S3 (future CDN integration)
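
The shared PostgreSQL and Redis decisions show up as connection settings on the harbor-core deployment. A minimal sketch, assuming the gitlab-postgres-pooler and redis services named elsewhere in this document; the exact variable set depends on the Harbor version and chart:

# Environment variables in harbor-core deployment (illustrative sketch)
POSTGRESQL_HOST: gitlab-postgres-pooler   # shared GitLab CNPG cluster
POSTGRESQL_PORT: "5432"
POSTGRESQL_DATABASE: harbor               # dedicated database, shared cluster
POSTGRESQL_USERNAME: harbor               # assumed role name
POSTGRESQL_SSLMODE: require               # assumed
_REDIS_URL_CORE: redis://redis:6379/2     # shared Redis instance, DB 2
_REDIS_URL_REG: redis://redis:6379/2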

For storage details, see Storage Architecture.


OAuth Integration with GitLab

How It Works

Harbor uses OpenID Connect (OIDC) to authenticate users via GitLab:

1. User → https://registry.staging.bluedynamics.eu (Harbor UI)

2. Harbor → Redirect to GitLab OAuth endpoint
   URL: https://gitlab.staging.bluedynamics.eu/oauth/authorize
   Params: client_id, redirect_uri, scope=openid profile email

3. User → Login to GitLab (if not already logged in)
   GitLab → Show OAuth consent (first time only)
   User → Click "Authorize"

4. GitLab → Redirect back to Harbor with authorization code
   URL: https://registry.staging.bluedynamics.eu/c/oidc/callback?code=...

5. Harbor → Exchange code for tokens (access token, ID token)
   POST https://gitlab.staging.bluedynamics.eu/oauth/token

6. GitLab → Return tokens with user info (email, name, username)

7. Harbor → Create/update Harbor user account
   - Email from GitLab becomes Harbor username
   - User role defaults to "developer" (can push/pull images)

8. Harbor → Return session cookie to user

9. User → Access Harbor UI (logged in)

Key benefits:

  • Single sign-on - No separate Harbor password (use GitLab credentials)

  • Unified user management - Add user to GitLab → automatic Harbor access

  • Token-based API access - GitLab personal access tokens work for docker login

OAuth Configuration

GitLab side (OAuth Application):

# Created via GitLab UI: Admin → Applications → New Application
Name: Harbor Registry
Redirect URI: https://registry.staging.bluedynamics.eu/c/oidc/callback
Scopes: openid, profile, email

Secrets stored in application-secrets namespace:

  • gitlab-oauth-client-id - Application ID from GitLab

  • gitlab-oauth-client-secret - Secret from GitLab

  • Replicated to gitlabbda namespace via ESO (ExternalSecret)
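
A minimal sketch of that ExternalSecret, assuming a ClusterSecretStore backed by the application-secrets namespace (store name and refresh interval are assumptions; the target harbor-secrets keys match the Troubleshooting section below):

# ExternalSecret replicating OAuth credentials into gitlabbda (illustrative sketch)
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: harbor-oauth
  namespace: gitlabbda
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: application-secrets        # assumed store name
  target:
    name: harbor-secrets             # consumed by harbor-core
  data:
    - secretKey: gitlab-oauth-client-id
      remoteRef:
        key: gitlab-oauth-client-id
    - secretKey: gitlab-oauth-client-secret
      remoteRef:
        key: gitlab-oauth-client-secret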

Harbor side (OIDC configuration):

# Environment variables in harbor-core deployment
AUTH_MODE: oidc_auth
OIDC_NAME: gitlab
OIDC_ENDPOINT: https://gitlab.staging.bluedynamics.eu
OIDC_CLIENT_ID: <from secret>
OIDC_CLIENT_SECRET: <from secret>
OIDC_SCOPE: openid,profile,email

User experience:

  1. First time: User clicks “Login via GitLab” → OAuth consent → Harbor account created

  2. Subsequent: User clicks “Login via GitLab” → Instant login (no consent)

For secrets configuration, see Secrets Reference.

Docker Login Flow

CLI authentication:

# Option 1: GitLab personal access token
export GITLAB_TOKEN=glpat-xxxxxxxxxxxxxxxxxxxx
echo $GITLAB_TOKEN | docker login registry.staging.bluedynamics.eu -u gitlab-token --password-stdin

# Option 2: Harbor robot account (future, for CI/CD)
docker login registry.staging.bluedynamics.eu -u robot$myapp -p <robot-token>

How docker login works with OAuth:

1. docker login → Send credentials to Harbor

2. Harbor → Validate via GitLab OAuth token endpoint
   POST https://gitlab.staging.bluedynamics.eu/oauth/token
   Body: grant_type=password, username=user@example.com, password=<gitlab-token>

3. GitLab → Return access token if valid

4. Harbor → Create session, return "Login Succeeded"

5. docker push → Include token in Authorization header
   Harbor → Validate token → Store image in S3

Why this works:

  • Docker Registry v2 API supports Basic Auth (username/password)

  • Harbor translates Basic Auth to OAuth token validation

  • GitLab personal access tokens work as passwords (secure, revocable)
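
One way to observe the Basic Auth to token translation is to request a token from Harbor's token service with a GitLab token as the password, roughly what docker login does under the hood (illustrative; the service and scope parameters are assumptions based on Harbor defaults):

# 1. The registry advertises its token service in the 401 challenge
curl -si https://registry.staging.bluedynamics.eu/v2/ | grep -i www-authenticate

# 2. Request a pull token with Basic Auth (GitLab PAT as password)
curl -s -u "user@example.com:${GITLAB_TOKEN}" \
  "https://registry.staging.bluedynamics.eu/service/token?service=harbor-registry&scope=repository:myapp/myapp:pull"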


AMD64 Architecture Requirement

Why AMD64-Only?

Harbor official images are AMD64-only (no ARM64 multi-arch builds):

# Check image manifest
docker manifest inspect goharbor/harbor-core:v2.14.0
# Architectures: [amd64]
# Missing: arm64

Implication: Harbor pods must run on AMD64 nodes.

KUP6S cluster is multi-architecture (3 ARM64 nodes + 2 AMD64 nodes), so we use nodeSelector + tolerations:

# All Harbor pods
spec:
  nodeSelector:
    kubernetes.io/arch: amd64
  tolerations:
    - key: kubernetes.io/arch
      operator: Equal
      value: amd64
      effect: NoSchedule

Why nodeSelector + tolerations?

KUP6S cluster has taints on AMD64 nodes to prefer ARM64 scheduling:

# AMD64 nodes have taint
taints:
  - key: kubernetes.io/arch
    value: amd64
    effect: NoSchedule

Without tolerations: Harbor pods would be Pending (AMD64 node rejects scheduling)

With tolerations: Harbor pods can schedule on AMD64 nodes (taint tolerated)
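
To confirm the taint and the node architectures the scheduler sees, something like this works (illustrative):

# Show node architecture and taint keys side by side
kubectl get nodes -o custom-columns='NAME:.metadata.name,ARCH:.status.nodeInfo.architecture,TAINTS:.spec.taints[*].key'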

For cluster architecture, see Main Cluster Docs: Multi-Architecture.

Node Placement in Practice

GitLab BDA workload distribution:

| Workload | Architecture | Rationale |
|---|---|---|
| GitLab pods (webservice, gitaly, sidekiq, shell, pages) | ARM64 preferred | Official images are multi-arch |
| Harbor pods (core, registry, jobservice, portal) | AMD64 required | No ARM64 images |
| PostgreSQL (CNPG) | ARM64 preferred | PostgreSQL supports multi-arch |
| Redis | ARM64 preferred | Redis supports multi-arch |

Resource allocation:

  • ARM64 nodes (3 nodes): GitLab + PostgreSQL + Redis (majority of pods)

  • AMD64 nodes (2 nodes): Harbor + other AMD64-only workloads

Scaling consideration: If Harbor becomes bottleneck, add AMD64 nodes (not ARM64).
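
To verify the split in practice, list nodes with their architecture label and check which node each Harbor pod landed on (illustrative commands, reusing the label from the upgrade verification below):

# Node architectures at a glance
kubectl get nodes -L kubernetes.io/arch

# Harbor pods and the nodes they run on (should all be AMD64 nodes)
kubectl get pods -n gitlabbda -l app.kubernetes.io/part-of=harbor -o wide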


Vulnerability Scanning

Trivy Integration (Available, Not Yet Deployed)

Harbor supports Trivy vulnerability scanning, but it’s not currently deployed in GitLab BDA.

How Trivy scanning would work:

1. docker push registry.staging.bluedynamics.eu/project/image:tag
   → Harbor receives image, stores in S3

2. Harbor → Trigger scan job (JobService)
   → Launch Trivy scanner pod

3. Trivy → Pull image from Registry
   → Scan image layers for known vulnerabilities
   → Check against CVE databases (Alpine, Debian, npm, etc.)

4. Trivy → Report vulnerabilities to Harbor Core
   → Store scan results in PostgreSQL

5. Harbor UI → Show scan results
   → Critical: 3, High: 12, Medium: 45, Low: 120
   → Block deployment if critical vulnerabilities (policy)

6. Webhook → Notify external systems
   → Send scan results to GitLab (issue creation)
   → Send to Slack (alert on critical CVEs)

Future deployment:

When implementing Trivy scanner, add these components:

# Trivy scanner deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: harbor-trivy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: harbor-trivy
  template:
    metadata:
      labels:
        app: harbor-trivy
    spec:
      nodeSelector:
        kubernetes.io/arch: amd64  # Trivy also AMD64-only
      tolerations:                 # AMD64 nodes are tainted (see above)
        - key: kubernetes.io/arch
          operator: Equal
          value: amd64
          effect: NoSchedule
      containers:
        - name: trivy
          image: goharbor/trivy-adapter-photon:v2.14.0
          env:
            - name: SCANNER_TRIVY_VULN_TYPE
              value: os,library
            - name: SCANNER_TRIVY_SEVERITY
              value: UNKNOWN,LOW,MEDIUM,HIGH,CRITICAL

Why not deployed yet?

  • 2-5 users - Manual image review sufficient (scan before deploy)

  • Resource cost - Trivy databases large (1-2GB), scans CPU-intensive

  • Complexity - Additional deployment to maintain

When to deploy Trivy:

  • Compliance requirements - PCI-DSS, SOC2 require automated scanning

  • Scaling beyond 10 users - Too many images to manually review

  • CI/CD integration - Block deploys with critical CVEs

For future improvements, see Storage Architecture: Future Improvements.

Manual Scanning Workaround (Current Approach)

Without Trivy deployment, scan images manually:

# Scan image locally before push
trivy image myapp:latest
# Output: Vulnerabilities by severity

# Fix vulnerabilities
# Update base image: alpine:3.18 → alpine:3.19
# Update dependencies: npm audit fix

# Re-scan
trivy image myapp:latest
# Output: 0 critical, 0 high (safe to deploy)

# Push to Harbor
docker push registry.staging.bluedynamics.eu/project/myapp:latest

Trade-off: Manual process (slower, error-prone) vs. automated scanning overhead.


Harbor Storage Architecture

S3 Backend

Harbor stores all image layers in S3 (not local disk):

Bucket: registry-gitlabbda-kup6s (fsn1 region)

Why S3?

  • Scalability - No disk space limits (grows on demand)

  • Cost - €0.01/GB/month (much cheaper than block storage)

  • Durability - Hetzner manages redundancy (multi-AZ replication)

  • Future CDN - Can front S3 with Cloudflare (global image pulls)

Image push flow:

1. docker push registry.staging.bluedynamics.eu/project/myapp:tag

2. Client → Upload layers to Harbor Registry (HTTP POST)

3. Harbor Registry → Write layers directly to S3
   Bucket: registry-gitlabbda-kup6s
   Path: /docker/registry/v2/blobs/sha256/<hash>

4. Harbor Core → Record manifest in PostgreSQL
   Table: artifact (image metadata)
   Table: artifact_blob (layer references)

5. Harbor → Return success to client
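
The Registry component reaches S3 through the standard Docker Distribution s3 storage driver. A minimal sketch of the relevant config section, with bucket and endpoint taken from this document and credentials assumed to come from the harbor-s3-credentials secret:

# Harbor Registry storage section (illustrative sketch of the Distribution config)
storage:
  s3:
    region: fsn1
    regionendpoint: https://fsn1.your-objectstorage.com
    bucket: registry-gitlabbda-kup6s
    accesskey: <from harbor-s3-credentials>
    secretkey: <from harbor-s3-credentials>
    secure: true
    rootdirectory: /                  # layers land under /docker/registry/v2/...
  cache:
    blobdescriptor: redis             # see "Redis Caching" below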

Image pull flow:

1. docker pull registry.staging.bluedynamics.eu/project/myapp:tag

2. Client → Request manifest from Harbor Core

3. Harbor Core → Query PostgreSQL (artifact + artifact_blob tables)
   → Return manifest (list of layer hashes)

4. Client → Request each layer from Harbor Registry

5. Harbor Registry → Stream layer from S3
   S3 GET /docker/registry/v2/blobs/sha256/<hash>
   → Stream to client (no local disk buffering)

6. Client → Assemble image from layers

Performance:

  • Pull: 50-200 MB/s (limited by S3 → client bandwidth)

  • Push: 10-100 MB/s (limited by client → S3 upload speed)

  • Concurrent pulls: Unlimited (S3 scales horizontally)

For S3 configuration, see S3 Buckets Reference.

Redis Caching

Harbor uses Redis database 2 for caching:

Cache types:

  1. Blob descriptor cache - Layer metadata (size, digest)

  2. Manifest cache - Image manifests (avoid PostgreSQL queries)

  3. Tag cache - Image tags → manifest mapping

Why Redis cache?

  • Performance - Manifest queries 10-100× faster (1ms vs 50ms PostgreSQL)

  • Reduced DB load - Frequent docker pull doesn’t hit PostgreSQL

  • Ephemeral - Cache can be cleared without data loss (rebuilds from PostgreSQL + S3)

Redis configuration:

# Harbor Registry config (ConfigMap: harbor-registry-config)
redis:
  addr: redis:6379
  db: 2  # Separate database from GitLab (DB 0)
  pool:
    maxidle: 100
    maxactive: 500

Why database 2?

  • Isolation - GitLab uses DB 0, Harbor uses DB 2 (no key conflicts)

  • Observability - Can monitor Harbor cache separately (INFO keyspace)
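
To look at the Harbor cache in isolation, run redis-cli against DB 2 (illustrative; the deployment name for Redis is assumed):

# Key count and keyspace stats for Harbor's cache database
kubectl exec -it deploy/redis -n gitlabbda -- redis-cli -n 2 DBSIZE
kubectl exec -it deploy/redis -n gitlabbda -- redis-cli INFO keyspace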

For Redis architecture, see GitLab Components: Redis.


Harbor Web UI

Features

Harbor UI provides:

  1. Project management - Create projects (image repositories)

  2. Image browsing - View tags, layers, size, push time

  3. Tag management - Delete old tags, add labels

  4. User management - Invite users, assign roles (admin, developer, guest)

  5. Replication - (Future) Replicate images to other registries

  6. Vulnerability reports - (Future with Trivy) View scan results per image

  7. Webhooks - Configure notifications (image push/pull events)

Access: https://registry.staging.bluedynamics.eu

Authentication: Login via GitLab OAuth (single sign-on)

Workflow Example

Creating a project and pushing images:

# 1. Create project via UI
# Navigate to: https://registry.staging.bluedynamics.eu
# → Projects → New Project → Name: myapp → Public/Private → Create

# 2. Tag image locally
docker tag myapp:latest registry.staging.bluedynamics.eu/myapp/myapp:latest

# 3. Login to registry
echo $GITLAB_TOKEN | docker login registry.staging.bluedynamics.eu -u gitlab-token --password-stdin

# 4. Push image
docker push registry.staging.bluedynamics.eu/myapp/myapp:latest

# 5. View in UI
# Navigate to: Projects → myapp → Repositories → myapp
# See: Tags, size, layers, push time

RBAC (Role-Based Access Control):

| Role | Can View | Can Pull | Can Push | Can Delete | Can Manage Users |
|---|---|---|---|---|---|
| Guest | ✓ | ✓ | ✗ | ✗ | ✗ |
| Developer | ✓ | ✓ | ✓ | ✗ | ✗ |
| Maintainer | ✓ | ✓ | ✓ | ✓ | ✗ |
| Admin | ✓ | ✓ | ✓ | ✓ | ✓ |

(The permission matrix above follows Harbor's standard project roles.)

Default role: Developer (users from GitLab OAuth automatically get developer role)


Lifecycle Management

Harbor Upgrades

Harbor lifecycle is independent of GitLab:

# Upgrade Harbor (e.g., v2.14.0 → v2.15.0)

# 1. Update version in config.yaml
versions:
  harbor: v2.15.0

# 2. Rebuild manifests
npm run build

# 3. Commit and push
git add . && git commit -m "Upgrade Harbor to v2.15.0" && git push

# 4. ArgoCD auto-syncs
# Harbor pods rolling update (Core → Registry → JobService → Portal)

# 5. Verify
kubectl get pods -n gitlabbda -l app.kubernetes.io/part-of=harbor
# All pods Running with new version

Why independent lifecycle?

  • Risk isolation - Harbor upgrade doesn’t touch GitLab (separate failure domain)

  • Flexibility - Can upgrade Harbor for security fix without GitLab maintenance window

  • Rollback - Harbor rollback doesn’t affect GitLab availability

For upgrade procedures, see How-To: Upgrade Harbor (future).

Database Schema Migrations

Harbor uses automatic schema migrations (like GitLab):

1. New Harbor Core version deployed
   → Init container runs migrations

2. Harbor Core → Check PostgreSQL schema_migrations table
   → Compare with required version for v2.15.0

3. Migrations needed → Run SQL migration scripts
   → Add new tables (e.g., p2p_preheat_instance)
   → Add columns (e.g., artifact.icon)

4. Migrations complete → Harbor Core starts

5. Old Harbor Core pods → Drain connections, terminate
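
The applied migration version can be inspected directly in the harbor database (illustrative; pod name matches the Troubleshooting section below):

# Check the current Harbor schema migration version
kubectl exec -it gitlab-postgres-1 -n gitlabbda -- \
  psql -U postgres -d harbor -c 'SELECT * FROM schema_migrations;'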

Migration safety:

  • Backward-compatible - Old version can read new schema (during rolling update)

  • Idempotent - Re-running migration has no effect (already applied)

  • Automatic - No manual SQL scripts (built into Harbor image)

Rollback limitation: Can’t rollback to version before schema change (one-way migration).

For database management, see GitLab Components: PostgreSQL.


Troubleshooting

Harbor Core Pod CrashLoopBackOff

Symptoms:

  • Harbor UI inaccessible (502 Bad Gateway)

  • harbor-core pod repeatedly crashing

Diagnosis:

kubectl logs deploy/harbor-core -n gitlabbda | tail -n 50

Common causes:

  1. PostgreSQL connection failure - Check gitlab-postgres-pooler service

    kubectl get svc gitlab-postgres-pooler -n gitlabbda
    
  2. Harbor database not initialized - Check if harbor database exists

    kubectl exec -it gitlab-postgres-1 -n gitlabbda -- psql -U postgres -l | grep harbor
    
  3. OAuth misconfiguration - Check harbor-secrets secret

    kubectl get secret harbor-secrets -n gitlabbda -o jsonpath='{.data.gitlab-oauth-client-id}' | base64 -d
    

Solution: Fix dependency, Harbor Core will restart automatically.

Image Push Fails (S3 Error)

Symptoms:

  • docker push hangs or fails with “blob upload invalid”

  • Harbor Registry logs show S3 errors

Diagnosis:

kubectl logs deploy/harbor-registry -n gitlabbda | grep -i s3

Common causes:

  1. S3 credentials invalid - Check harbor-s3-credentials secret

  2. S3 bucket doesn’t exist - Check Crossplane Bucket CR

    kubectl get bucket registry-gitlabbda-kup6s -n crossplane-system
    
  3. S3 endpoint unreachable - Check network connectivity

    kubectl exec -it deploy/harbor-registry -n gitlabbda -- wget -O- https://fsn1.your-objectstorage.com
    

Solution: Fix S3 configuration, restart Harbor Registry deployment.
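
Restarting the Registry after fixing the configuration can be done with a rolling restart (illustrative):

# Roll the Harbor Registry deployment after fixing S3 settings
kubectl rollout restart deploy/harbor-registry -n gitlabbda
kubectl rollout status deploy/harbor-registry -n gitlabbda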

For complete troubleshooting, see Troubleshooting Reference.


Future Improvements

Short-term (Next 6 months)

  1. Trivy scanner deployment - Enable automated vulnerability scanning

  2. Harbor robot accounts - CI/CD authentication (no personal tokens)

  3. Image retention policies - Auto-delete old tags (save S3 costs)

Medium-term (6-12 months)

  1. Separate PostgreSQL cluster - Dedicated CNPG cluster for Harbor (when scaling)

  2. Redis Sentinel for Harbor - HA cache (when scaling)

  3. Content Trust (Notary) - Image signing and verification (security)

Long-term (12+ months)

  1. Multi-region replication - Replicate images to hel1/nbg1 (DR)

  2. CDN integration - Cloudflare in front of S3 (global image pulls)

  3. Cosign integration - Modern image signing (vs. Notary)


Summary

Harbor integration provides:

  • OAuth SSO - Single sign-on via GitLab (unified user management)

  • Vulnerability scanning capability - Trivy integration available (deploy when needed)

  • Independent lifecycle - Upgrade Harbor without touching GitLab

  • Better UI - Dedicated registry management interface

  • S3 storage - Scalable, cost-effective image storage

Architectural decisions:

  • AMD64-only - Harbor official images not ARM64-compatible (use nodeSelector + tolerations)

  • Shared PostgreSQL - Harbor uses harbor database in GitLab CNPG cluster (adequate for 2-5 users)

  • Shared Redis - Harbor uses DB 2 in GitLab Redis instance (cache only, ephemeral)

Trade-offs:

  • Complexity - Separate deployment vs. built-in GitLab registry

  • Architecture constraint - Requires AMD64 nodes in cluster

  • Shared resources - Harbor and GitLab share PostgreSQL/Redis (acceptable for small scale)

For implementation details: