Explanation

Resource Management and QoS in Kubernetes

Last Updated: 2025-10-29

Introduction

Resource management is critical for Kubernetes cluster stability and efficiency. This document explains why resource requests and limits matter, how Kubernetes uses them for scheduling and quality of service (QoS), and best practices for the kup6s cluster.

Why Resource Requests and Limits Matter

The Scheduling Problem

Without resource requests, Kubernetes has no way to know:

  • How much CPU/memory a pod needs

  • Where to schedule the pod

  • When nodes are overloaded

This leads to:

  • Random pod placement (may land on overloaded nodes)

  • Unpredictable performance (resource contention)

  • Cascading failures (node runs out of memory, kills random pods)

The Eviction Problem

When a node runs out of memory, Kubernetes must evict pods to reclaim resources. Without resource requests, your critical infrastructure can be evicted first.

Eviction order:

  1. BestEffort pods (no requests) - evicted first

  2. Burstable pods exceeding requests - evicted next

  3. Burstable pods within requests - relatively safe

  4. Guaranteed pods (requests = limits) - last resort

Quality of Service (QoS) Classes

Kubernetes automatically assigns QoS classes based on resource configuration:

BestEffort (No Requests or Limits)

resources: {}  # No configuration

Characteristics:

  • Lowest priority

  • First to be evicted during memory pressure

  • Can use unlimited resources (if available)

  • Unpredictable scheduling

When to use:

  • Non-critical batch jobs

  • Testing/experimentation

  • Canary pods

DO NOT use for: Production infrastructure, critical applications

Burstable (Requests < Limits)

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

Characteristics:

  • Medium priority

  • Protected from eviction if using less than requests

  • Can burst above requests (up to limits)

  • Predictable scheduling (based on requests)

When to use:

  • Most production workloads (recommended)

  • Infrastructure components

  • Applications with variable load

Current kup6s configuration: All critical infrastructure uses Burstable
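
For context, a resources block like the one above sits under each container in a workload spec. A minimal sketch of a Burstable Deployment, with illustrative names and image:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app              # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: example/app:1.0     # illustrative image
          resources:                 # requests < limits -> Burstable QoS
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi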

Guaranteed (Requests = Limits)

resources:
  requests:
    cpu: 1000m
    memory: 2Gi
  limits:
    cpu: 1000m
    memory: 2Gi

Characteristics:

  • Highest priority

  • Evicted only as a last resort during node pressure, never because of other pods' usage

  • Cannot burst beyond limits

  • Reserves resources exclusively

When to use:

  • Ultra-critical stateful workloads

  • Databases with strict performance requirements

  • Applications sensitive to CPU throttling

Downside: Wastes cluster capacity (no sharing, no bursting)
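
Kubernetes records the assigned class in the pod status, so you can verify which class a pod actually received (pod and namespace names are placeholders):

# Show the QoS class assigned to a pod
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.qosClass}'
# Output is one of: BestEffort, Burstable, Guaranteed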

CPU vs Memory: Key Differences

CPU (Compressible Resource)

CPU is compressible - pods can share CPU time:

  • If a pod needs more CPU than its limit, it gets throttled (slowed down)

  • Throttling doesn’t kill the pod, just makes it slower

  • Node CPU overcommitment (limits > capacity) is acceptable (up to ~120%)

Example: 10 pods each with 1 CPU limit on an 8 CPU node

  • Total: 10 CPU limits on 8 physical CPUs (125% overcommit)

  • Reality: If only 5 pods are active, each gets plenty of CPU

  • If all 10 pods burst simultaneously, each is throttled to roughly 0.8 CPU (80% of its limit)

Memory (Non-Compressible Resource)

Memory is non-compressible - memory already in use cannot be reclaimed by slowing a pod down:

  • If a pod exceeds its memory limit, or the node runs out of memory, it gets OOM killed (terminated)

  • Node memory overcommitment is dangerous (can cause cascading failures)

  • Keep node memory limits <110% of capacity

Example: 10 pods each with 1Gi limit on an 8Gi memory node

  • Total: 10Gi limits on 8Gi physical memory (125% overcommit) - DANGEROUS

  • Reality: If 9 pods collectively use 8Gi, the node runs out of memory and pods start getting killed

  • Risk: One pod’s memory leak can crash multiple other pods

Practical Implications

CPU:

  • Can set higher limits (2-5x requests) to allow bursting

  • Slight overcommitment acceptable

  • Monitor for excessive throttling

Memory:

  • Set conservative limits (1.5-2x requests)

  • Avoid overcommitment

  • Monitor for OOM kills
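
Putting both rules together, a hedged example of a container sized with these ratios (values are illustrative):

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 400m      # 4x request: room to burst, throttled beyond this
    memory: 512Mi  # 2x request: catches leaks before they threaten the node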

Resource Allocation Strategy

Step 1: Measure Actual Usage

Always start by measuring before setting requests:

# Deploy without constraints
kubectl apply -f my-app.yaml

# Monitor for 24-48 hours
kubectl top pods -n my-namespace --sort-by=memory
kubectl top pods -n my-namespace --sort-by=cpu

# Record peak usage
# Peak CPU: 50m
# Peak Memory: 200Mi

Step 2: Calculate Requests

Add headroom to peak usage:

CPU Requests: Peak × 1.25-1.5

  • Example: 50m peak → 75m request (50% headroom)

Memory Requests: Peak × 1.5-2.0

  • Example: 200Mi peak → 400Mi request (100% headroom)

  • Memory needs more headroom (can’t be compressed)

Step 3: Calculate Limits

Allow bursting while preventing runaway processes:

CPU Limits: Requests × 2-5

  • Example: 75m request → 300m limit (4x)

  • Allows bursting to 4x normal without waste

Memory Limits: Requests × 1.5-2

  • Example: 400Mi request → 800Mi limit (2x)

  • Catches memory leaks before node exhaustion
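
Applying the worked numbers from Steps 2 and 3 (50m CPU / 200Mi memory peak), the resulting manifest fragment would be:

resources:
  requests:
    cpu: 75m       # 50m peak + 50% headroom
    memory: 400Mi  # 200Mi peak + 100% headroom
  limits:
    cpu: 300m      # 4x request
    memory: 800Mi  # 2x request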

Step 4: Monitor and Adjust

After deployment:

# Check actual vs requested
kubectl top pods -n my-namespace

# If pod uses >80% of requests consistently: increase requests
# If pod uses <25% of requests consistently: decrease requests
# If pod hits limits frequently: increase limits
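
One way to line up configured requests next to live usage is kubectl's custom-columns output (my-namespace is a placeholder); compare it against kubectl top:

# Configured requests per pod
kubectl get pods -n my-namespace -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'

# Observed usage for the same pods
kubectl top pods -n my-namespace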

Common Sizing Patterns

Microservices (Stateless Applications)

resources:
  requests:
    cpu: 50m       # Low baseline
    memory: 128Mi  # Minimal memory
  limits:
    cpu: 200m      # 4x for request spikes
    memory: 256Mi  # 2x for safety

Backend APIs (Medium Load)

resources:
  requests:
    cpu: 100m      # Handle baseline traffic
    memory: 512Mi  # Caching, session data
  limits:
    cpu: 500m      # 5x for traffic spikes
    memory: 1Gi    # 2x for safety

Data Processing (CPU-Intensive)

resources:
  requests:
    cpu: 500m      # High baseline CPU
    memory: 512Mi  # Moderate memory
  limits:
    cpu: 2000m     # 4x for parallel processing
    memory: 1Gi    # 2x for data buffering

Caching Layer (Memory-Intensive)

resources:
  requests:
    cpu: 100m      # Low CPU usage
    memory: 2Gi    # Large cache
  limits:
    cpu: 500m      # 5x for eviction processing
    memory: 4Gi    # 2x for cache growth

kup6s Cluster: Resource Configuration

Infrastructure Components

All critical infrastructure components are properly configured for Burstable QoS:

ArgoCD (GitOps Controller):

  • Application Controller: 250m/768Mi (handles all cluster syncs)

  • Server/Repo: 50m/128Mi each (API and git operations)

  • Supporting components: 25-50m/64Mi

Monitoring Stack:

  • Prometheus: 100m/2500Mi (time-series database)

  • Loki: 100m/256Mi per component (log aggregation)

  • Grafana: 50m/512Mi (visualization)

Storage:

  • Longhorn Manager: 100m/256Mi per node (volume management)

  • Longhorn CSI: 50m/128Mi per node (volume mounting)

Provisioning:

  • Crossplane Providers: 100m/512Mi each (resource provisioning)
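
These values live in the source templates rather than in manual kubectl patches (see the best practices below). As an illustration only, the documented ArgoCD controller requests might be expressed roughly like this in a Helm values override; the key names are assumptions here and depend on the chart version in use, so verify against that chart:

controller:              # key name assumed, check the argo-cd chart
  resources:
    requests:
      cpu: 250m          # values from the list above
      memory: 768Mi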

Resource Optimization History

October 2025 Optimization Project:

The cluster underwent comprehensive resource optimization:

Problem:

  • Worker nodes at 126-148% CPU limits (overcommitted)

  • ArgoCD had no resource requests (BestEffort QoS)

  • Loki requesting 5-10x actual usage (massive waste)

Solution:

  • Added requests to all ArgoCD components

  • Right-sized Loki from 500m/1Gi to 100m/256Mi per pod

  • Added resource requests to Longhorn and Crossplane, giving them QoS guarantees

Results:

  • 2.4 CPU cores freed (from Loki optimization)

  • Overcommitment eliminated: 148% → 85%

  • All infrastructure now has QoS guarantees

  • Zero service disruptions during changes

See cluster-resource-optimization.md for complete details.

Monitoring and Troubleshooting

Check Node Resource Allocation

# Overall allocation per node
kubectl describe nodes | grep -A 5 "Allocated resources"

Healthy targets:

  • CPU requests: 30-60% of node capacity

  • CPU limits: <120% of node capacity (slight overcommit OK)

  • Memory requests: 40-70% of node capacity

  • Memory limits: <110% of node capacity (avoid overcommit)

Check Pod Resource Usage

# Top memory consumers
kubectl top pods -A --sort-by=memory | head -20

# Top CPU consumers
kubectl top pods -A --sort-by=cpu | head -20

# Specific namespace
kubectl top pods -n my-namespace

Identify Resource Issues

Pods stuck in Pending:

kubectl get pods -A | grep Pending
kubectl describe pod <pod-name> -n <namespace>
# Look for: "Insufficient cpu" or "Insufficient memory"

Pods being OOM killed:

kubectl get events -A | grep OOMKilled
kubectl describe pod <pod-name> -n <namespace>
# Look for: "Last State: Terminated (Reason: OOMKilled)"

CPU throttling:

# Check if pods are being throttled
kubectl top pods -n <namespace>
# If CPU usage equals limit consistently, likely throttled
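
For a more direct signal, the CPU cgroup statistics inside the container show how often the kernel has throttled it. The file path depends on whether the node runs cgroup v2 or v1 (pod and namespace are placeholders):

# cgroup v2: nr_throttled and throttled_usec
kubectl exec <pod-name> -n <namespace> -- cat /sys/fs/cgroup/cpu.stat

# cgroup v1: same counters under the cpu controller
kubectl exec <pod-name> -n <namespace> -- cat /sys/fs/cgroup/cpu/cpu.stat

# A steadily growing nr_throttled relative to nr_periods means the limit is too tight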

Anti-Patterns and Mistakes

❌ Using Default Helm Values Blindly

Many Helm charts have overly conservative defaults:

# Loki default (DO NOT USE)
resources:
  requests:
    cpu: 500m
    memory: 1Gi
# Actual usage: 20m CPU, 150Mi memory
# Waste: 96% CPU, 85% memory

Solution: Always measure first, then set appropriate values.

❌ No Requests on Production Workloads

# Dangerous for production
resources: {}

Problem: Pod can be evicted at any time, no scheduling guarantees.

Solution: Always set at minimum requests for production.

❌ Requests = Limits (Overuse of Guaranteed QoS)

# Wastes cluster capacity
resources:
  requests:
    cpu: 1000m
    memory: 2Gi
  limits:
    cpu: 1000m    # Same as request
    memory: 2Gi   # Same as request

Problem: Reserves resources exclusively, prevents sharing, wastes capacity.

Solution: Use Burstable (requests < limits) for most workloads.

❌ Memory Limits Too Low

# Will cause OOM kills
resources:
  requests:
    memory: 128Mi
  limits:
    memory: 128Mi  # No headroom!

Problem: Any memory spike causes immediate OOM kill.

Solution: Set memory limits to 1.5-2x requests so normal spikes have headroom.

❌ CPU Limits Too Restrictive

# Will cause excessive throttling
resources:
  requests:
    cpu: 100m
  limits:
    cpu: 120m  # Only 20% headroom

Problem: Constant CPU throttling under normal load.

Solution: Set CPU limits to 2-5x requests to allow bursting.

Best Practices Summary

  1. Always measure before setting requests - deploy, monitor, then configure

  2. Use Burstable QoS for most workloads - requests < limits

  3. Reserve Guaranteed QoS for databases - only when strictly necessary

  4. Never use BestEffort in production - always set requests (see the audit command after this list)

  5. CPU can overcommit slightly - up to 120% limits acceptable

  6. Memory cannot overcommit - keep below 110% limits

  7. Monitor actual vs requested - adjust based on real usage

  8. Update source templates - don’t rely on manual kubectl patches
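
To audit the cluster for workloads that have slipped into BestEffort, one simple approach is to list every pod's QoS class and filter:

# Pods without any requests or limits end up as BestEffort
kubectl get pods -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,QOS:.status.qosClass' | grep BestEffort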

Further Reading