Explanation

Harbor Container Registry Integration


Overview

GitLab BDA uses Harbor v2.14.0 as a separate container registry instead of GitLab’s built-in registry. This document explains WHY we chose Harbor, HOW it integrates with GitLab, and the architectural decisions behind this setup.

Key points:

  • OAuth authentication via GitLab (single sign-on, unified user management)

  • Vulnerability scanning capability (Trivy integration available)

  • Independent lifecycle (upgrade Harbor without touching GitLab)

  • AMD64-only deployment (official Harbor images are not ARM64-compatible)


Why Harbor Instead of GitLab Registry?

The Problem with GitLab’s Built-in Registry

GitLab includes a built-in container registry (enabled via its Helm chart), so why use Harbor?

| Feature | GitLab Built-in Registry | Harbor Registry |
|---|---|---|
| Authentication | GitLab-managed (automatic) | OAuth via GitLab (SSO) |
| Vulnerability scanning | Requires GitLab Ultimate (paid) | Built-in Trivy support (free) |
| UI | Basic (list images, delete) | Rich web UI (projects, tags, labels, replication) |
| Lifecycle | Tied to GitLab upgrades | Independent (upgrade separately) |
| ARM64 support | Yes (multi-arch) | No (AMD64 only) |
| RBAC | GitLab project permissions | Separate project RBAC |
| Image replication | No | Yes (multi-region, DR) |
| Webhook notifications | Limited | Comprehensive (push, pull, scan complete) |

Decision Rationale

Harbor chosen for these reasons:

  1. Vulnerability scanning - Free tier includes Trivy scanner (vs. GitLab Ultimate requirement)

  2. Separate concerns - Registry lifecycle independent of GitLab platform lifecycle

  3. Better observability - Dedicated UI for image management (not buried in GitLab UI)

  4. Future-proof - Can add features (replication, RBAC) without GitLab dependency

Trade-offs accepted:

  1. AMD64-only - Harbor official images don’t support ARM64 (pods run on AMD64 nodes)

  2. Auth complexity - Requires OAuth setup (vs. automatic GitLab integration)

  3. Shared database - Harbor uses harbor database in GitLab PostgreSQL cluster (simpler than separate cluster)

For 2-5 users, Harbor’s benefits (vulnerability scanning, better UI) outweigh the complexity overhead.


Harbor Architecture

Components

Harbor is itself a microservices application:

| Component | Role | Port | Dependencies |
|---|---|---|---|
| Core | API server, webhook handler | 8080 | PostgreSQL, Redis |
| Registry | OCI image storage (Docker Registry v2) | 5000 | S3, Redis (cache) |
| JobService | Async jobs (scanning, garbage collection, replication) | 8080 | Redis |
| Portal | Web UI (nginx + Angular) | 8080 | Core API |

Missing from GitLab BDA (future additions):

  • Trivy Scanner - Vulnerability scanning (not yet deployed)

  • Notary - Image signing (not needed for 2-5 users)

  • ChartMuseum - Helm chart storage (not needed, using git for charts)

Deployment Topology

User (docker push/pull)
        ↓
Traefik Ingress (registry.staging.bluedynamics.eu)
        ↓
Harbor Portal (nginx) → Harbor Core (API) → Harbor Registry (OCI storage)
                              ↓                       ↓
                        PostgreSQL              S3 (registry bucket)
                          (harbor DB)           (registry-gitlabbda-kup6s)
                        Redis (cache DB 2)
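
To relate this topology to the running cluster, the Harbor components can be listed via their common label (a quick check; it assumes the app.kubernetes.io/part-of=harbor label used in the upgrade verification further down):

# List Harbor pods and services by the shared label
kubectl get pods,svc -n gitlabbda -l app.kubernetes.io/part-of=harbor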

Key architectural decisions:

  1. Shared PostgreSQL - Harbor uses the harbor database in the GitLab CNPG cluster (see the connection sketch after this list)

    • Why: Simpler than separate database cluster (adequate for 2-5 users)

    • Trade-off: Harbor and GitLab share database resources

    • Future: Separate CNPG cluster for Harbor when scaling beyond 20 users

  2. Shared Redis - Harbor uses database 2 in GitLab Redis instance

    • Why: Redis load is light (cache only, no critical data)

    • Trade-off: Redis restart affects both GitLab and Harbor

    • Future: Separate Redis for Harbor when scaling

  3. S3 object storage - Images stored in registry-gitlabbda-kup6s bucket

    • Why: Scalable, no local disk needed

    • Benefit: Can serve images directly from S3 (future CDN integration)
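
The shared PostgreSQL and Redis decisions show up as connection settings on the harbor-core deployment. A minimal sketch, assuming the gitlab-postgres-pooler and redis services named elsewhere in this document; the exact variable set depends on the Harbor version and chart:

# Environment variables in harbor-core deployment (illustrative sketch)
POSTGRESQL_HOST: gitlab-postgres-pooler   # shared GitLab CNPG cluster
POSTGRESQL_PORT: "5432"
POSTGRESQL_DATABASE: harbor               # dedicated database, shared cluster
POSTGRESQL_USERNAME: harbor               # assumed role name
POSTGRESQL_SSLMODE: require               # assumed
_REDIS_URL_CORE: redis://redis:6379/2     # shared Redis instance, DB 2
_REDIS_URL_REG: redis://redis:6379/2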

For storage details, see Storage Architecture.


OAuth Integration with GitLab

How It Works

Harbor uses OpenID Connect (OIDC) to authenticate users via GitLab:

1. User → https://registry.staging.bluedynamics.eu (Harbor UI)

2. Harbor → Redirect to GitLab OAuth endpoint
   URL: https://gitlab.staging.bluedynamics.eu/oauth/authorize
   Params: client_id, redirect_uri, scope=openid profile email

3. User → Login to GitLab (if not already logged in)
   GitLab → Show OAuth consent (first time only)
   User → Click "Authorize"

4. GitLab → Redirect back to Harbor with authorization code
   URL: https://registry.staging.bluedynamics.eu/c/oidc/callback?code=...

5. Harbor → Exchange code for tokens (access token, ID token)
   POST https://gitlab.staging.bluedynamics.eu/oauth/token

6. GitLab → Return tokens with user info (email, name, username)

7. Harbor → Create/update Harbor user account
   - Email from GitLab becomes Harbor username
   - User role defaults to "developer" (can push/pull images)

8. Harbor → Return session cookie to user

9. User → Access Harbor UI (logged in)

Key benefits:

  • Single sign-on - No separate Harbor password (use GitLab credentials)

  • Unified user management - Add user to GitLab → automatic Harbor access

  • Token-based API access - GitLab personal access tokens work for docker login

OAuth Configuration

GitLab side (OAuth Application):

# Created via GitLab UI: Admin → Applications → New Application
Name: Harbor Registry
Redirect URI: https://registry.staging.bluedynamics.eu/c/oidc/callback
Scopes: openid, profile, email

Secrets stored in application-secrets namespace:

  • gitlab-oauth-client-id - Application ID from GitLab

  • gitlab-oauth-client-secret - Secret from GitLab

  • Replicated to gitlabbda namespace via ESO (ExternalSecret)
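
A minimal sketch of that ExternalSecret, assuming a ClusterSecretStore backed by the application-secrets namespace (store name and refresh interval are assumptions; the target harbor-secrets keys match the Troubleshooting section below):

# ExternalSecret replicating OAuth credentials into gitlabbda (illustrative sketch)
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: harbor-oauth
  namespace: gitlabbda
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: application-secrets        # assumed store name
  target:
    name: harbor-secrets             # consumed by harbor-core
  data:
    - secretKey: gitlab-oauth-client-id
      remoteRef:
        key: gitlab-oauth-client-id
    - secretKey: gitlab-oauth-client-secret
      remoteRef:
        key: gitlab-oauth-client-secret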

Harbor side (OIDC configuration):

# Environment variables in harbor-core deployment
AUTH_MODE: oidc_auth
OIDC_NAME: gitlab
OIDC_ENDPOINT: https://gitlab.staging.bluedynamics.eu
OIDC_CLIENT_ID: <from secret>
OIDC_CLIENT_SECRET: <from secret>
OIDC_SCOPE: openid,profile,email

User experience:

  1. First time: User clicks “Login via GitLab” → OAuth consent → Harbor account created

  2. Subsequent: User clicks “Login via GitLab” → Instant login (no consent)

For secrets configuration, see Secrets Reference.

Docker Login Flow

CLI authentication:

# Option 1: GitLab personal access token
export GITLAB_TOKEN=glpat-xxxxxxxxxxxxxxxxxxxx
echo $GITLAB_TOKEN | docker login registry.staging.bluedynamics.eu -u gitlab-token --password-stdin

# Option 2: Harbor robot account (future, for CI/CD)
docker login registry.staging.bluedynamics.eu -u robot$myapp -p <robot-token>

How docker login works with OAuth:

1. docker login → Send credentials to Harbor

2. Harbor → Validate via GitLab OAuth token endpoint
   POST https://gitlab.staging.bluedynamics.eu/oauth/token
   Body: grant_type=password, username=user@example.com, password=<gitlab-token>

3. GitLab → Return access token if valid

4. Harbor → Create session, return "Login Succeeded"

5. docker push → Include token in Authorization header
   Harbor → Validate token → Store image in S3

Why this works:

  • Docker Registry v2 API supports Basic Auth (username/password)

  • Harbor translates Basic Auth to OAuth token validation

  • GitLab personal access tokens work as passwords (secure, revocable)
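
One way to observe the Basic Auth to token translation is to request a token from Harbor's token service with a GitLab token as the password, roughly what docker login does under the hood (illustrative; the service and scope parameters are assumptions based on Harbor defaults):

# 1. The registry advertises its token service in the 401 challenge
curl -si https://registry.staging.bluedynamics.eu/v2/ | grep -i www-authenticate

# 2. Request a pull token with Basic Auth (GitLab PAT as password)
curl -s -u "user@example.com:${GITLAB_TOKEN}" \
  "https://registry.staging.bluedynamics.eu/service/token?service=harbor-registry&scope=repository:myapp/myapp:pull"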


AMD64 Architecture Requirement

Why AMD64-Only?

Harbor official images are AMD64-only (no ARM64 multi-arch builds):

# Check image manifest
docker manifest inspect goharbor/harbor-core:v2.14.0
# Architectures: [amd64]
# Missing: arm64

Implication: Harbor pods must run on AMD64 nodes.

KUP6S cluster is multi-architecture (3 ARM64 nodes + 2 AMD64 nodes), so we use nodeSelector + tolerations:

# All Harbor pods
spec:
  nodeSelector:
    kubernetes.io/arch: amd64
  tolerations:
    - key: kubernetes.io/arch
      operator: Equal
      value: amd64
      effect: NoSchedule

Why nodeSelector + tolerations?

KUP6S cluster has taints on AMD64 nodes to prefer ARM64 scheduling:

# AMD64 nodes have taint
taints:
  - key: kubernetes.io/arch
    value: amd64
    effect: NoSchedule

Without tolerations: Harbor pods would be Pending (AMD64 node rejects scheduling)

With tolerations: Harbor pods can schedule on AMD64 nodes (taint tolerated)
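
To confirm the taint and the node architectures the scheduler sees, something like this works (illustrative):

# Show node architecture and taint keys side by side
kubectl get nodes -o custom-columns='NAME:.metadata.name,ARCH:.status.nodeInfo.architecture,TAINTS:.spec.taints[*].key'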

For cluster architecture, see Main Cluster Docs: Multi-Architecture.

Node Placement in Practice

GitLab BDA workload distribution:

| Workload | Architecture | Rationale |
|---|---|---|
| GitLab pods (webservice, gitaly, sidekiq, shell, pages) | ARM64 preferred | Official images are multi-arch |
| Harbor pods (core, registry, jobservice, portal) | AMD64 required | No ARM64 images |
| PostgreSQL (CNPG) | ARM64 preferred | PostgreSQL supports multi-arch |
| Redis | ARM64 preferred | Redis supports multi-arch |

Resource allocation:

  • ARM64 nodes (3 nodes): GitLab + PostgreSQL + Redis (majority of pods)

  • AMD64 nodes (2 nodes): Harbor + other AMD64-only workloads

Scaling consideration: If Harbor becomes bottleneck, add AMD64 nodes (not ARM64).
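
To verify the split in practice, list nodes with their architecture label and check which node each Harbor pod landed on (illustrative commands, reusing the label from the upgrade verification below):

# Node architectures at a glance
kubectl get nodes -L kubernetes.io/arch

# Harbor pods and the nodes they run on (should all be AMD64 nodes)
kubectl get pods -n gitlabbda -l app.kubernetes.io/part-of=harbor -o wide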


Vulnerability Scanning

Trivy Integration (Available, Not Yet Deployed)

Harbor supports Trivy vulnerability scanning, but it’s not currently deployed in GitLab BDA.

How Trivy scanning would work:

1. docker push registry.staging.bluedynamics.eu/project/image:tag
   → Harbor receives image, stores in S3

2. Harbor → Trigger scan job (JobService)
   → Launch Trivy scanner pod

3. Trivy → Pull image from Registry
   → Scan image layers for known vulnerabilities
   → Check against CVE databases (Alpine, Debian, npm, etc.)

4. Trivy → Report vulnerabilities to Harbor Core
   → Store scan results in PostgreSQL

5. Harbor UI → Show scan results
   → Critical: 3, High: 12, Medium: 45, Low: 120
   → Block deployment if critical vulnerabilities (policy)

6. Webhook → Notify external systems
   → Send scan results to GitLab (issue creation)
   → Send to Slack (alert on critical CVEs)

Future deployment:

When implementing Trivy scanner, add these components:

# Trivy scanner deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: harbor-trivy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: harbor-trivy
  template:
    metadata:
      labels:
        app: harbor-trivy
    spec:
      nodeSelector:
        kubernetes.io/arch: amd64  # Trivy also AMD64-only
      tolerations:                 # AMD64 nodes are tainted (see above)
        - key: kubernetes.io/arch
          operator: Equal
          value: amd64
          effect: NoSchedule
      containers:
        - name: trivy
          image: goharbor/trivy-adapter-photon:v2.14.0
          env:
            - name: SCANNER_TRIVY_VULN_TYPE
              value: os,library
            - name: SCANNER_TRIVY_SEVERITY
              value: UNKNOWN,LOW,MEDIUM,HIGH,CRITICAL

Why not deployed yet?

  • 2-5 users - Manual image review sufficient (scan before deploy)

  • Resource cost - Trivy databases large (1-2GB), scans CPU-intensive

  • Complexity - Additional deployment to maintain

When to deploy Trivy:

  • Compliance requirements - PCI-DSS, SOC2 require automated scanning

  • Scaling beyond 10 users - Too many images to manually review

  • CI/CD integration - Block deploys with critical CVEs

For future improvements, see Storage Architecture: Future Improvements.

Manual Scanning Workaround (Current Approach)

Without Trivy deployment, scan images manually:

# Scan image locally before push
trivy image myapp:latest
# Output: Vulnerabilities by severity

# Fix vulnerabilities
# Update base image: alpine:3.18 → alpine:3.19
# Update dependencies: npm audit fix

# Re-scan
trivy image myapp:latest
# Output: 0 critical, 0 high (safe to deploy)

# Push to Harbor
docker push registry.staging.bluedynamics.eu/project/myapp:latest

Trade-off: Manual process (slower, error-prone) vs. automated scanning overhead.


Harbor Storage Architecture

S3 Backend

Harbor stores all image layers in S3 (not local disk):

Bucket: registry-gitlabbda-kup6s (fsn1 region)

Why S3?

  • Scalability - No disk space limits (grows on demand)

  • Cost - €0.01/GB/month (much cheaper than block storage)

  • Durability - Hetzner manages redundancy (multi-AZ replication)

  • Future CDN - Can front S3 with Cloudflare (global image pulls)

Image push flow:

1. docker push registry.staging.bluedynamics.eu/project/myapp:tag

2. Client → Upload layers to Harbor Registry (HTTP POST)

3. Harbor Registry → Write layers directly to S3
   Bucket: registry-gitlabbda-kup6s
   Path: /docker/registry/v2/blobs/sha256/<hash>

4. Harbor Core → Record manifest in PostgreSQL
   Table: artifact (image metadata)
   Table: artifact_blob (layer references)

5. Harbor → Return success to client
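
The Registry component reaches S3 through the standard Docker Distribution s3 storage driver. A minimal sketch of the relevant config section, with bucket and endpoint taken from this document and credentials assumed to come from the harbor-s3-credentials secret:

# Harbor Registry storage section (illustrative sketch of the Distribution config)
storage:
  s3:
    region: fsn1
    regionendpoint: https://fsn1.your-objectstorage.com
    bucket: registry-gitlabbda-kup6s
    accesskey: <from harbor-s3-credentials>
    secretkey: <from harbor-s3-credentials>
    secure: true
    rootdirectory: /                  # layers land under /docker/registry/v2/...
  cache:
    blobdescriptor: redis             # see "Redis Caching" below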

Image pull flow:

1. docker pull registry.staging.bluedynamics.eu/project/myapp:tag

2. Client → Request manifest from Harbor Core

3. Harbor Core → Query PostgreSQL (artifact + artifact_blob tables)
   → Return manifest (list of layer hashes)

4. Client → Request each layer from Harbor Registry

5. Harbor Registry → Stream layer from S3
   S3 GET /docker/registry/v2/blobs/sha256/<hash>
   → Stream to client (no local disk buffering)

6. Client → Assemble image from layers

Performance:

  • Pull: 50-200 MB/s (limited by S3 → client bandwidth)

  • Push: 10-100 MB/s (limited by client → S3 upload speed)

  • Concurrent pulls: Unlimited (S3 scales horizontally)

For S3 configuration, see S3 Buckets Reference.

Redis Caching

Harbor uses Redis database 2 for caching:

Cache types:

  1. Blob descriptor cache - Layer metadata (size, digest)

  2. Manifest cache - Image manifests (avoid PostgreSQL queries)

  3. Tag cache - Image tags → manifest mapping

Why Redis cache?

  • Performance - Manifest queries 10-100× faster (1ms vs 50ms PostgreSQL)

  • Reduced DB load - Frequent docker pull doesn’t hit PostgreSQL

  • Ephemeral - Cache can be cleared without data loss (rebuilds from PostgreSQL + S3)

Redis configuration:

# Harbor Registry config (ConfigMap: harbor-registry-config)
redis:
  addr: redis:6379
  db: 2  # Separate database from GitLab (DB 0)
  pool:
    maxidle: 100
    maxactive: 500

Why database 2?

  • Isolation - GitLab uses DB 0, Harbor uses DB 2 (no key conflicts)

  • Observability - Can monitor Harbor cache separately (INFO keyspace)
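
To look at the Harbor cache in isolation, run redis-cli against DB 2 (illustrative; the deployment name for Redis is assumed):

# Key count and keyspace stats for Harbor's cache database
kubectl exec -it deploy/redis -n gitlabbda -- redis-cli -n 2 DBSIZE
kubectl exec -it deploy/redis -n gitlabbda -- redis-cli INFO keyspace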

For Redis architecture, see GitLab Components: Redis.


Harbor Web UI

Features

Harbor UI provides:

  1. Project management - Create projects (image repositories)

  2. Image browsing - View tags, layers, size, push time

  3. Tag management - Delete old tags, add labels

  4. User management - Invite users, assign roles (admin, developer, guest)

  5. Replication - (Future) Replicate images to other registries

  6. Vulnerability reports - (Future with Trivy) View scan results per image

  7. Webhooks - Configure notifications (image push/pull events)

Access: https://registry.staging.bluedynamics.eu

Authentication: Login via GitLab OAuth (single sign-on)

Workflow Example

Creating a project and pushing images:

# 1. Create project via UI
# Navigate to: https://registry.staging.bluedynamics.eu
# → Projects → New Project → Name: myapp → Public/Private → Create

# 2. Tag image locally
docker tag myapp:latest registry.staging.bluedynamics.eu/myapp/myapp:latest

# 3. Login to registry
echo $GITLAB_TOKEN | docker login registry.staging.bluedynamics.eu -u gitlab-token --password-stdin

# 4. Push image
docker push registry.staging.bluedynamics.eu/myapp/myapp:latest

# 5. View in UI
# Navigate to: Projects → myapp → Repositories → myapp
# See: Tags, size, layers, push time

RBAC (Role-Based Access Control):

| Role | Can View | Can Pull | Can Push | Can Delete | Can Manage Users |
|---|---|---|---|---|---|
| Guest | ✓ | ✓ | ✗ | ✗ | ✗ |
| Developer | ✓ | ✓ | ✓ | ✗ | ✗ |
| Maintainer | ✓ | ✓ | ✓ | ✓ | ✗ |
| Admin | ✓ | ✓ | ✓ | ✓ | ✓ |

(The permission matrix above follows Harbor's standard project roles.)

Default role: Developer (users from GitLab OAuth automatically get developer role)


Lifecycle Management

Harbor Upgrades

Harbor lifecycle is independent of GitLab:

# Upgrade Harbor (e.g., v2.14.0 → v2.15.0)

# 1. Update version in config.yaml
versions:
  harbor: v2.15.0

# 2. Rebuild manifests
npm run build

# 3. Commit and push
git add . && git commit -m "Upgrade Harbor to v2.15.0" && git push

# 4. ArgoCD auto-syncs
# Harbor pods rolling update (Core → Registry → JobService → Portal)

# 5. Verify
kubectl get pods -n gitlabbda -l app.kubernetes.io/part-of=harbor
# All pods Running with new version

Why independent lifecycle?

  • Risk isolation - Harbor upgrade doesn’t touch GitLab (separate failure domain)

  • Flexibility - Can upgrade Harbor for security fix without GitLab maintenance window

  • Rollback - Harbor rollback doesn’t affect GitLab availability

For upgrade procedures, see How-To: Upgrade Harbor (future).

Database Schema Migrations

Harbor uses automatic schema migrations (like GitLab):

1. New Harbor Core version deployed
   → Init container runs migrations

2. Harbor Core → Check PostgreSQL schema_migrations table
   → Compare with required version for v2.15.0

3. Migrations needed → Run SQL migration scripts
   → Add new tables (e.g., p2p_preheat_instance)
   → Add columns (e.g., artifact.icon)

4. Migrations complete → Harbor Core starts

5. Old Harbor Core pods → Drain connections, terminate
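
The applied migration version can be inspected directly in the harbor database (illustrative; pod name matches the Troubleshooting section below):

# Check the current Harbor schema migration version
kubectl exec -it gitlab-postgres-1 -n gitlabbda -- \
  psql -U postgres -d harbor -c 'SELECT * FROM schema_migrations;'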

Migration safety:

  • Backward-compatible - Old version can read new schema (during rolling update)

  • Idempotent - Re-running migration has no effect (already applied)

  • Automatic - No manual SQL scripts (built into Harbor image)

Rollback limitation: Can’t rollback to version before schema change (one-way migration).

For database management, see GitLab Components: PostgreSQL.


Troubleshooting

Harbor Core Pod CrashLoopBackOff

Symptoms:

  • Harbor UI inaccessible (502 Bad Gateway)

  • harbor-core pod repeatedly crashing

Diagnosis:

kubectl logs deploy/harbor-core -n gitlabbda | tail -n 50

Common causes:

  1. PostgreSQL connection failure - Check gitlab-postgres-pooler service

    kubectl get svc gitlab-postgres-pooler -n gitlabbda
    
  2. Harbor database not initialized - Check if harbor database exists

    kubectl exec -it gitlab-postgres-1 -n gitlabbda -- psql -U postgres -l | grep harbor
    
  3. OAuth misconfiguration - Check harbor-secrets secret

    kubectl get secret harbor-secrets -n gitlabbda -o jsonpath='{.data.gitlab-oauth-client-id}' | base64 -d
    

Solution: Fix dependency, Harbor Core will restart automatically.

Image Push Fails (S3 Error)

Symptoms:

  • docker push hangs or fails with “blob upload invalid”

  • Harbor Registry logs show S3 errors

Diagnosis:

kubectl logs deploy/harbor-registry -n gitlabbda | grep -i s3

Common causes:

  1. S3 credentials invalid - Check harbor-s3-credentials secret

  2. S3 bucket doesn’t exist - Check Crossplane Bucket CR

    kubectl get bucket registry-gitlabbda-kup6s -n crossplane-system
    
  3. S3 endpoint unreachable - Check network connectivity

    kubectl exec -it deploy/harbor-registry -n gitlabbda -- wget -O- https://fsn1.your-objectstorage.com
    

Solution: Fix S3 configuration, restart Harbor Registry deployment.
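
Restarting the Registry after fixing the configuration can be done with a rolling restart (illustrative):

# Roll the Harbor Registry deployment after fixing S3 settings
kubectl rollout restart deploy/harbor-registry -n gitlabbda
kubectl rollout status deploy/harbor-registry -n gitlabbda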

For complete troubleshooting, see Troubleshooting Reference.


Future Improvements

Short-term (Next 6 months)

  1. Trivy scanner deployment - Enable automated vulnerability scanning

  2. Harbor robot accounts - CI/CD authentication (no personal tokens)

  3. Image retention policies - Auto-delete old tags (save S3 costs)

Medium-term (6-12 months)

  1. Separate PostgreSQL cluster - Dedicated CNPG cluster for Harbor (when scaling)

  2. Redis Sentinel for Harbor - HA cache (when scaling)

  3. Content Trust (Notary) - Image signing and verification (security)

Long-term (12+ months)

  1. Multi-region replication - Replicate images to hel1/nbg1 (DR)

  2. CDN integration - Cloudflare in front of S3 (global image pulls)

  3. Cosign integration - Modern image signing (vs. Notary)


Summary

Harbor integration provides:

  • OAuth SSO - Single sign-on via GitLab (unified user management)

  • Vulnerability scanning capability - Trivy integration available (deploy when needed)

  • Independent lifecycle - Upgrade Harbor without touching GitLab

  • Better UI - Dedicated registry management interface

  • S3 storage - Scalable, cost-effective image storage

Architectural decisions:

  • AMD64-only - Harbor official images not ARM64-compatible (use nodeSelector + tolerations)

  • Shared PostgreSQL - Harbor uses harbor database in GitLab CNPG cluster (adequate for 2-5 users)

  • Shared Redis - Harbor uses DB 2 in GitLab Redis instance (cache only, ephemeral)

Trade-offs:

  • Complexity - Separate deployment vs. built-in GitLab registry

  • Architecture constraint - Requires AMD64 nodes in cluster

  • Shared resources - Harbor and GitLab share PostgreSQL/Redis (acceptable for small scale)

For implementation details: