Explanation

Resource Management and QoS in Kubernetes

Last Updated: 2025-10-29

Introduction

Resource management is critical for Kubernetes cluster stability and efficiency. This document explains why resource requests and limits matter, how Kubernetes uses them for scheduling and quality of service (QoS), and best practices for the kup6s cluster.

Why Resource Requests and Limits Matter

The Scheduling Problem

Without resource requests, Kubernetes has no way to know:

  • How much CPU/memory a pod needs

  • Where to schedule the pod

  • When nodes are overloaded

This leads to:

  • Random pod placement (may land on overloaded nodes)

  • Unpredictable performance (resource contention)

  • Cascading failures (node runs out of memory, kills random pods)

The Eviction Problem

When a node runs out of memory, Kubernetes must evict pods to reclaim resources. Without resource requests, your critical infrastructure can be evicted first.

Eviction order:

  1. BestEffort pods (no requests) - evicted first

  2. Burstable pods exceeding requests - evicted next

  3. Burstable pods within requests - relatively safe

  4. Guaranteed pods (requests = limits) - last resort

Quality of Service (QoS) Classes

Kubernetes automatically assigns QoS classes based on resource configuration:

BestEffort (No Requests or Limits)

resources: {}  # No configuration

Characteristics:

  • Lowest priority

  • First to be evicted during memory pressure

  • Can use unlimited resources (if available)

  • Unpredictable scheduling

When to use:

  • Non-critical batch jobs

  • Testing/experimentation

  • Canary pods

DO NOT use for: Production infrastructure, critical applications

Burstable (Requests < Limits)

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

Characteristics:

  • Medium priority

  • Protected from eviction if using less than requests

  • Can burst above requests (up to limits)

  • Predictable scheduling (based on requests)

When to use:

  • Most production workloads (recommended)

  • Infrastructure components

  • Applications with variable load

Current kup6s configuration: All critical infrastructure uses Burstable
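
For context, a resources block like the one above sits under each container in a workload spec. A minimal sketch of a Burstable Deployment, with illustrative names and image:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app              # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: example/app:1.0     # illustrative image
          resources:                 # requests < limits -> Burstable QoS
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi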

Guaranteed (Requests = Limits)

resources:
  requests:
    cpu: 1000m
    memory: 2Gi
  limits:
    cpu: 1000m
    memory: 2Gi

Characteristics:

  • Highest priority

  • Evicted only as a last resort during node pressure, never because of other pods' usage

  • Cannot burst beyond limits

  • Reserves resources exclusively

When to use:

  • Ultra-critical stateful workloads

  • Databases with strict performance requirements

  • Applications sensitive to CPU throttling

Downside: Wastes cluster capacity (no sharing, no bursting)
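
Kubernetes records the assigned class in the pod status, so you can verify which class a pod actually received (pod and namespace names are placeholders):

# Show the QoS class assigned to a pod
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.qosClass}'
# Output is one of: BestEffort, Burstable, Guaranteed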

CPU vs Memory: Key Differences

CPU (Compressible Resource)

CPU is compressible - pods can share CPU time:

  • If a pod needs more CPU than its limit, it gets throttled (slowed down)

  • Throttling doesn’t kill the pod, just makes it slower

  • Node CPU overcommitment (limits > capacity) is acceptable (up to ~120%)

Example: 10 pods each with 1 CPU limit on an 8 CPU node

  • Total: 10 CPU limits on 8 physical CPUs (125% overcommit)

  • Reality: If only 5 pods are active, each gets plenty of CPU

  • If all 10 pods burst simultaneously, each is throttled to roughly 0.8 CPU (80% of its limit)

Memory (Non-Compressible Resource)

Memory is non-compressible - memory already in use cannot be reclaimed by slowing a pod down:

  • If a pod exceeds its memory limit, or the node runs out of memory, it gets OOM killed (terminated)

  • Node memory overcommitment is dangerous (can cause cascading failures)

  • Keep node memory limits <110% of capacity

Example: 10 pods each with 1Gi limit on an 8Gi memory node

  • Total: 10Gi limits on 8Gi physical memory (125% overcommit) - DANGEROUS

  • Reality: If 9 pods collectively use 8Gi, the node runs out of memory and pods start getting killed

  • Risk: One pod’s memory leak can crash multiple other pods

Practical Implications

CPU:

  • Can set higher limits (2-5x requests) to allow bursting

  • Slight overcommitment acceptable

  • Monitor for excessive throttling

Memory:

  • Set conservative limits (1.5-2x requests)

  • Avoid overcommitment

  • Monitor for OOM kills
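
Putting both rules together, a hedged example of a container sized with these ratios (values are illustrative):

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 400m      # 4x request: room to burst, throttled beyond this
    memory: 512Mi  # 2x request: catches leaks before they threaten the node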

Resource Allocation Strategy

Step 1: Measure Actual Usage

Always start by measuring before setting requests:

# Deploy without constraints
kubectl apply -f my-app.yaml

# Monitor for 24-48 hours
kubectl top pods -n my-namespace --sort-by=memory
kubectl top pods -n my-namespace --sort-by=cpu

# Record peak usage
# Peak CPU: 50m
# Peak Memory: 200Mi

Step 2: Calculate Requests

Add headroom to peak usage:

CPU Requests: Peak × 1.25-1.5

  • Example: 50m peak → 75m request (50% headroom)

Memory Requests: Peak × 1.5-2.0

  • Example: 200Mi peak → 400Mi request (100% headroom)

  • Memory needs more headroom (can’t be compressed)

Step 3: Calculate Limits

Allow bursting while preventing runaway processes:

CPU Limits: Requests × 2-5

  • Example: 75m request → 300m limit (4x)

  • Allows bursting to 4x normal without waste

Memory Limits: Requests × 1.5-2

  • Example: 400Mi request → 800Mi limit (2x)

  • Catches memory leaks before node exhaustion
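
Applying the worked numbers from Steps 2 and 3 (50m CPU / 200Mi memory peak), the resulting manifest fragment would be:

resources:
  requests:
    cpu: 75m       # 50m peak + 50% headroom
    memory: 400Mi  # 200Mi peak + 100% headroom
  limits:
    cpu: 300m      # 4x request
    memory: 800Mi  # 2x request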

Step 4: Monitor and Adjust

After deployment:

# Check actual vs requested
kubectl top pods -n my-namespace

# If pod uses >80% of requests consistently: increase requests
# If pod uses <25% of requests consistently: decrease requests
# If pod hits limits frequently: increase limits
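
One way to line up configured requests next to live usage is kubectl's custom-columns output (my-namespace is a placeholder); compare it against kubectl top:

# Configured requests per pod
kubectl get pods -n my-namespace -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'

# Observed usage for the same pods
kubectl top pods -n my-namespace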

Common Sizing Patterns

Microservices (Stateless Applications)

resources:
  requests:
    cpu: 50m       # Low baseline
    memory: 128Mi  # Minimal memory
  limits:
    cpu: 200m      # 4x for request spikes
    memory: 256Mi  # 2x for safety

Backend APIs (Medium Load)

resources:
  requests:
    cpu: 100m      # Handle baseline traffic
    memory: 512Mi  # Caching, session data
  limits:
    cpu: 500m      # 5x for traffic spikes
    memory: 1Gi    # 2x for safety

Data Processing (CPU-Intensive)

resources:
  requests:
    cpu: 500m      # High baseline CPU
    memory: 512Mi  # Moderate memory
  limits:
    cpu: 2000m     # 4x for parallel processing
    memory: 1Gi    # 2x for data buffering

Caching Layer (Memory-Intensive)

resources:
  requests:
    cpu: 100m      # Low CPU usage
    memory: 2Gi    # Large cache
  limits:
    cpu: 500m      # 5x for eviction processing
    memory: 4Gi    # 2x for cache growth

kup6s Cluster: Resource Configuration

Infrastructure Components

All critical infrastructure components are properly configured for Burstable QoS:

ArgoCD (GitOps Controller):

  • Application Controller: 250m/768Mi (handles all cluster syncs)

  • Server/Repo: 50m/128Mi each (API and git operations)

  • Supporting components: 25-50m/64Mi

Monitoring Stack:

  • Prometheus: 100m/2500Mi (time-series database)

  • Loki: 100m/256Mi per component (log aggregation)

  • Grafana: 50m/512Mi (visualization)

Storage:

  • Longhorn Manager: 100m/256Mi per node (volume management)

  • Longhorn CSI: 50m/128Mi per node (volume mounting)

Provisioning:

  • Crossplane Providers: 100m/512Mi each (resource provisioning)
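
These values live in the source templates rather than in manual kubectl patches (see the best practices below). As an illustration only, the documented ArgoCD controller requests might be expressed roughly like this in a Helm values override; the key names are assumptions here and depend on the chart version in use, so verify against that chart:

controller:              # key name assumed, check the argo-cd chart
  resources:
    requests:
      cpu: 250m          # values from the list above
      memory: 768Mi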

Resource Optimization History

October 2025 Optimization Project:

The cluster underwent comprehensive resource optimization:

Problem:

  • Worker nodes at 126-148% CPU limits (overcommitted)

  • ArgoCD had no resource requests (BestEffort QoS)

  • Loki requesting 5-10x actual usage (massive waste)

Solution:

  • Added requests to all ArgoCD components

  • Right-sized Loki from 500m/1Gi to 100m/256Mi per pod

  • Added resource requests to Longhorn and Crossplane, giving them QoS guarantees

Results:

  • 2.4 CPU cores freed (from Loki optimization)

  • Overcommitment eliminated: 148% → 85%

  • All infrastructure now has QoS guarantees

  • Zero service disruptions during changes

See cluster-resource-optimization.md for complete details.

Monitoring and Troubleshooting

Check Node Resource Allocation

# Overall allocation per node
kubectl describe nodes | grep -A 5 "Allocated resources"

Healthy targets:

  • CPU requests: 30-60% of node capacity

  • CPU limits: <120% of node capacity (slight overcommit OK)

  • Memory requests: 40-70% of node capacity

  • Memory limits: <110% of node capacity (avoid overcommit)

Check Pod Resource Usage

# Top memory consumers
kubectl top pods -A --sort-by=memory | head -20

# Top CPU consumers
kubectl top pods -A --sort-by=cpu | head -20

# Specific namespace
kubectl top pods -n my-namespace

Identify Resource Issues

Pods stuck in Pending:

kubectl get pods -A | grep Pending
kubectl describe pod <pod-name> -n <namespace>
# Look for: "Insufficient cpu" or "Insufficient memory"

Pods being OOM killed:

kubectl get events -A | grep OOMKilled
kubectl describe pod <pod-name> -n <namespace>
# Look for: "Last State: Terminated (Reason: OOMKilled)"

CPU throttling:

# Check if pods are being throttled
kubectl top pods -n <namespace>
# If CPU usage equals limit consistently, likely throttled
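
For a more direct signal, the CPU cgroup statistics inside the container show how often the kernel has throttled it. The file path depends on whether the node runs cgroup v2 or v1 (pod and namespace are placeholders):

# cgroup v2: nr_throttled and throttled_usec
kubectl exec <pod-name> -n <namespace> -- cat /sys/fs/cgroup/cpu.stat

# cgroup v1: same counters under the cpu controller
kubectl exec <pod-name> -n <namespace> -- cat /sys/fs/cgroup/cpu/cpu.stat

# A steadily growing nr_throttled relative to nr_periods means the limit is too tight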

Anti-Patterns and Mistakes

❌ Using Default Helm Values Blindly

Many Helm charts have overly conservative defaults:

# Loki default (DO NOT USE)
resources:
  requests:
    cpu: 500m
    memory: 1Gi
# Actual usage: 20m CPU, 150Mi memory
# Waste: 96% CPU, 85% memory

Solution: Always measure first, then set appropriate values.

❌ No Requests on Production Workloads

# Dangerous for production
resources: {}

Problem: Pod can be evicted at any time, no scheduling guarantees.

Solution: Always set at minimum requests for production.

❌ Requests = Limits (Overuse of Guaranteed QoS)

# Wastes cluster capacity
resources:
  requests:
    cpu: 1000m
    memory: 2Gi
  limits:
    cpu: 1000m    # Same as request
    memory: 2Gi   # Same as request

Problem: Reserves resources exclusively, prevents sharing, wastes capacity.

Solution: Use Burstable (requests < limits) for most workloads.

❌ Memory Limits Too Low

# Will cause OOM kills
resources:
  requests:
    memory: 128Mi
  limits:
    memory: 128Mi  # No headroom!

Problem: Any memory spike causes immediate OOM kill.

Solution: Set memory limits to 1.5-2x requests so normal spikes have headroom.

❌ CPU Limits Too Restrictive

# Will cause excessive throttling
resources:
  requests:
    cpu: 100m
  limits:
    cpu: 120m  # Only 20% headroom

Problem: Constant CPU throttling under normal load.

Solution: Set CPU limits to 2-5x requests to allow bursting.

Best Practices Summary

  1. Always measure before setting requests - deploy, monitor, then configure

  2. Use Burstable QoS for most workloads - requests < limits

  3. Reserve Guaranteed QoS for databases - only when strictly necessary

  4. Never use BestEffort in production - always set requests (see the audit command after this list)

  5. CPU can overcommit slightly - up to 120% limits acceptable

  6. Memory cannot overcommit - keep below 110% limits

  7. Monitor actual vs requested - adjust based on real usage

  8. Update source templates - don’t rely on manual kubectl patches
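
To audit the cluster for workloads that have slipped into BestEffort, one simple approach is to list every pod's QoS class and filter:

# Pods without any requests or limits end up as BestEffort
kubectl get pods -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,QOS:.status.qosClass' | grep BestEffort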

Further Reading