# Resource Management and QoS in Kubernetes

Last Updated: 2025-10-29

## Introduction

Resource management is critical for Kubernetes cluster stability and efficiency. This document explains why resource requests and limits matter, how Kubernetes uses them for scheduling and quality of service (QoS), and best practices for the kup6s cluster.
## Why Resource Requests and Limits Matter

### The Scheduling Problem

Without resource requests, Kubernetes has no way to know:

- How much CPU/memory a pod needs
- Where to schedule the pod
- When nodes are overloaded

This leads to:

- Random pod placement (pods may land on already overloaded nodes)
- Unpredictable performance (resource contention)
- Cascading failures (a node runs out of memory and kills seemingly random pods)
### The Eviction Problem

When a node runs out of memory, Kubernetes must evict pods to reclaim resources. Without resource requests, your critical infrastructure can be evicted first.

**Eviction order:**

1. BestEffort pods (no requests) - evicted first
2. Burstable pods exceeding their requests - evicted next
3. Burstable pods within their requests - relatively safe
4. Guaranteed pods (requests = limits) - evicted only as a last resort
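
To confirm which class a given pod actually received, you can read the `qosClass` field that the control plane sets on the pod status (pod and namespace names below are placeholders):

```bash
# Print the QoS class Kubernetes assigned to a pod (BestEffort, Burstable, or Guaranteed)
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.qosClass}'
```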
## Quality of Service (QoS) Classes

Kubernetes automatically assigns QoS classes based on resource configuration:

### BestEffort (No Requests or Limits)

```yaml
resources: {}  # No configuration
```

**Characteristics:**

- Lowest priority
- First to be evicted during memory pressure
- Can use unlimited resources (if available)
- Unpredictable scheduling

**When to use:**

- Non-critical batch jobs
- Testing/experimentation
- Canary pods

**DO NOT use for:** production infrastructure or critical applications.
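
To find which pods are currently running as BestEffort, list every pod's QoS class and filter; a minimal sketch using kubectl's custom-columns output:

```bash
# List all pods with their QoS class and keep only the BestEffort ones
kubectl get pods -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,QOS:.status.qosClass' | grep BestEffort
```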
### Burstable (Requests < Limits)

```yaml
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
```

**Characteristics:**

- Medium priority
- Protected from eviction while using less than its requests
- Can burst above requests (up to limits)
- Predictable scheduling (based on requests)

**When to use:**

- Most production workloads (recommended)
- Infrastructure components
- Applications with variable load

**Current kup6s configuration:** all critical infrastructure uses Burstable.
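
To check what a running pod's requests and limits actually are (and therefore whether it falls into Burstable), project them into columns; `my-namespace` is a placeholder:

```bash
# Show configured CPU/memory requests and limits side by side for each pod
kubectl get pods -n my-namespace -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,CPU_LIM:.spec.containers[*].resources.limits.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory,MEM_LIM:.spec.containers[*].resources.limits.memory'
```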
### Guaranteed (Requests = Limits)

```yaml
resources:
  requests:
    cpu: 1000m
    memory: 2Gi
  limits:
    cpu: 1000m
    memory: 2Gi
```

**Characteristics:**

- Highest priority
- Never evicted due to other pods
- Cannot burst beyond limits
- Reserves resources exclusively

**When to use:**

- Ultra-critical stateful workloads
- Databases with strict performance requirements
- Applications sensitive to CPU throttling

**Downside:** wastes cluster capacity (no sharing, no bursting).
## CPU vs Memory: Key Differences

### CPU (Compressible Resource)

CPU is compressible - pods can share CPU time:

- If a pod needs more CPU than its limit, it gets throttled (slowed down)
- Throttling doesn't kill the pod, it just runs slower
- Slight node CPU overcommitment (total limits > capacity) is acceptable (up to ~120%)

**Example:** 10 pods, each with a 1 CPU limit, on an 8 CPU node

- Total: 10 CPUs of limits on 8 physical CPUs (125% overcommit)
- Reality: if only 5 pods are active, each gets plenty of CPU
- If all 10 pods burst simultaneously, each is throttled to roughly 0.8 CPU (80% of its limit)
### Memory (Non-Compressible Resource)

Memory is non-compressible - pods cannot share memory:

- If a pod needs more memory than is available, it gets OOM killed (terminated)
- Node memory overcommitment is dangerous (it can cause cascading failures)
- Keep total node memory limits below 110% of capacity

**Example:** 10 pods, each with a 1Gi limit, on an 8Gi node

- Total: 10Gi of limits on 8Gi of physical memory (125% overcommit) - DANGEROUS
- Reality: if 9 pods collectively use 8Gi, the 10th pod may be killed
- Risk: one pod's memory leak can take down multiple other pods
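
If you suspect a node is already short on memory, the MemoryPressure node condition is what the kubelet reports when it is close to evicting pods; a small sketch that prints it per node:

```bash
# Print each node's MemoryPressure condition (True means the node is under memory pressure)
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}{end}'
```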
### Practical Implications

**CPU:**

- Can set higher limits (2-5x requests) to allow bursting
- Slight overcommitment is acceptable
- Monitor for excessive throttling

**Memory:**

- Set conservative limits (1.5-2x requests)
- Avoid overcommitment
- Monitor for OOM kills
## Resource Allocation Strategy

### Step 1: Measure Actual Usage

Always start by measuring before setting requests:

```bash
# Deploy without constraints
kubectl apply -f my-app.yaml

# Monitor for 24-48 hours
kubectl top pods -n my-namespace --sort-by=memory
kubectl top pods -n my-namespace --sort-by=cpu

# Record peak usage, e.g.:
#   Peak CPU:    50m
#   Peak Memory: 200Mi
```
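
In addition to spot checks, a crude sampling loop can capture peaks over a day or two if you are not querying Prometheus for this; the interval and file name here are arbitrary:

```bash
# Sample pod usage every 5 minutes and append it to a log for later review (Ctrl-C to stop)
while true; do
  date >> usage.log
  kubectl top pods -n my-namespace --no-headers >> usage.log
  sleep 300
done
```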
### Step 2: Calculate Requests

Add headroom to the measured peak:

- **CPU requests:** peak × 1.25-1.5
    - Example: 50m peak → 75m request (50% headroom)
- **Memory requests:** peak × 1.5-2.0
    - Example: 200Mi peak → 400Mi request (100% headroom)
    - Memory needs more headroom because it cannot be compressed

### Step 3: Calculate Limits

Allow bursting while preventing runaway processes:

- **CPU limits:** requests × 2-5
    - Example: 75m request → 300m limit (4x)
    - Allows bursting to 4x normal without waste
- **Memory limits:** requests × 1.5-2
    - Example: 400Mi request → 800Mi limit (2x)
    - Catches memory leaks before they exhaust the node
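
Putting Steps 2 and 3 together, a minimal shell sketch of the arithmetic, using the example peaks from Step 1 and the 1.5x / 4x / 2x / 2x multipliers chosen above:

```bash
# Turn measured peaks into suggested requests and limits (multipliers as in the examples above)
PEAK_CPU_M=50      # peak CPU in millicores, from `kubectl top`
PEAK_MEM_MI=200    # peak memory in MiB, from `kubectl top`

CPU_REQ=$(( PEAK_CPU_M * 3 / 2 ))   # peak x 1.5  -> 75m
CPU_LIM=$(( CPU_REQ * 4 ))          # request x 4 -> 300m
MEM_REQ=$(( PEAK_MEM_MI * 2 ))      # peak x 2    -> 400Mi
MEM_LIM=$(( MEM_REQ * 2 ))          # request x 2 -> 800Mi

echo "requests: cpu=${CPU_REQ}m, memory=${MEM_REQ}Mi"
echo "limits:   cpu=${CPU_LIM}m, memory=${MEM_LIM}Mi"
```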
### Step 4: Monitor and Adjust

After deployment, compare actual usage against what you requested:

```bash
# Check actual vs requested
kubectl top pods -n my-namespace
```

- If a pod consistently uses >80% of its requests: increase requests
- If a pod consistently uses <25% of its requests: decrease requests
- If a pod hits its limits frequently: increase limits
## Common Sizing Patterns

### Microservices (Stateless Applications)

```yaml
resources:
  requests:
    cpu: 50m        # Low baseline
    memory: 128Mi   # Minimal memory
  limits:
    cpu: 200m       # 4x requests for spikes
    memory: 256Mi   # 2x requests for safety
```

### Backend APIs (Medium Load)

```yaml
resources:
  requests:
    cpu: 100m       # Handle baseline traffic
    memory: 512Mi   # Caching, session data
  limits:
    cpu: 500m       # 5x requests for traffic spikes
    memory: 1Gi     # 2x requests for safety
```

### Data Processing (CPU-Intensive)

```yaml
resources:
  requests:
    cpu: 500m       # High baseline CPU
    memory: 512Mi   # Moderate memory
  limits:
    cpu: 2000m      # 4x requests for parallel processing
    memory: 1Gi     # 2x requests for data buffering
```

### Caching Layer (Memory-Intensive)

```yaml
resources:
  requests:
    cpu: 100m       # Low CPU usage
    memory: 2Gi     # Large cache
  limits:
    cpu: 500m       # 5x requests for eviction processing
    memory: 4Gi     # 2x requests for cache growth
```
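
For quick experiments, one of these patterns can be applied to a running Deployment directly with `kubectl set resources` (names below are placeholders); for anything permanent, bake the values into the chart or manifest instead, as the best practices section notes.

```bash
# Apply the "Backend APIs" sizing to an existing Deployment (triggers a rolling restart)
kubectl set resources deployment my-api -n my-namespace \
  --requests=cpu=100m,memory=512Mi \
  --limits=cpu=500m,memory=1Gi
```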
## kup6s Cluster: Resource Configuration

### Infrastructure Components

All critical infrastructure components are properly configured for Burstable QoS:

**ArgoCD (GitOps controller):**

- Application Controller: 250m / 768Mi (handles all cluster syncs)
- Server and Repo Server: 50m / 128Mi each (API and git operations)
- Supporting components: 25-50m / 64Mi

**Monitoring stack:**

- Prometheus: 100m / 2500Mi (time-series database)
- Loki: 100m / 256Mi per component (log aggregation)
- Grafana: 50m / 512Mi (visualization)

**Storage:**

- Longhorn Manager: 100m / 256Mi per node (volume management)
- Longhorn CSI: 50m / 128Mi per node (volume mounting)

**Provisioning:**

- Crossplane Providers: 100m / 512Mi each (resource provisioning)
### Resource Optimization History

**October 2025 optimization project:** the cluster underwent a comprehensive resource optimization.

**Problem:**

- Worker node CPU limits at 126-148% of capacity (overcommitted)
- ArgoCD had no resource requests (BestEffort QoS)
- Loki requested 5-10x its actual usage (massive waste)

**Solution:**

- Added requests to all ArgoCD components
- Right-sized Loki from 500m/1Gi to 100m/256Mi per pod
- Added resource guarantees (requests) to Longhorn and Crossplane

**Results:**

- 2.4 CPU cores freed (from the Loki optimization)
- Overcommitment eliminated: 148% → 85%
- All infrastructure now has QoS guarantees
- Zero service disruptions during the changes

See cluster-resource-optimization.md for complete details.
## Monitoring and Troubleshooting

### Check Node Resource Allocation

```bash
# Overall allocation per node
kubectl describe nodes | grep -A 5 "Allocated resources"
```

**Healthy targets:**

- CPU requests: 30-60% of node capacity
- CPU limits: <120% of node capacity (slight overcommit is OK)
- Memory requests: 40-70% of node capacity
- Memory limits: <110% of node capacity (avoid overcommit)

### Check Pod Resource Usage

```bash
# Top memory consumers
kubectl top pods -A --sort-by=memory | head -20

# Top CPU consumers
kubectl top pods -A --sort-by=cpu | head -20

# Specific namespace
kubectl top pods -n my-namespace
```
### Identify Resource Issues

**Pods stuck in Pending:**

```bash
kubectl get pods -A | grep Pending
kubectl describe pod <pod-name> -n <namespace>
# Look for: "Insufficient cpu" or "Insufficient memory"
```

**Pods being OOM killed:**

```bash
kubectl get events -A | grep OOMKilled
kubectl describe pod <pod-name> -n <namespace>
# Look for: "Last State: Terminated (Reason: OOMKilled)"
```

**CPU throttling:**

```bash
# Check whether pods are hitting their CPU limits
kubectl top pods -n <namespace>
# If CPU usage sits at the limit consistently, the pod is likely being throttled
```
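
`kubectl top` alone cannot distinguish a busy pod from a throttled one. If the nodes run cgroup v2 (an assumption) and the image includes `cat`, the container's own CFS counters show throttling directly:

```bash
# Read the container's CPU throttling counters (cgroup v2 path; requires cat in the image)
kubectl exec -n <namespace> <pod-name> -- cat /sys/fs/cgroup/cpu.stat
# A steadily rising nr_throttled relative to nr_periods means the CPU limit is being hit
```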
## Anti-Patterns and Mistakes

### ❌ Using Default Helm Values Blindly

Many Helm charts ship defaults sized far above typical usage:

```yaml
# Loki default (DO NOT USE)
resources:
  requests:
    cpu: 500m
    memory: 1Gi

# Actual usage: 20m CPU, 150Mi memory
# Waste: 96% of the CPU request, 85% of the memory request
```

**Solution:** always measure first, then set appropriate values.
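
In practice that means overriding the chart's defaults with measured values. As a sketch, assuming a chart whose values expose a top-level `resources` block (the exact key path differs per chart, and the limits here are illustrative):

```yaml
# values-override.yaml - right-sized values based on measured usage
resources:
  requests:
    cpu: 100m        # measured ~20m, plus headroom
    memory: 256Mi    # measured ~150Mi, plus headroom
  limits:
    cpu: 500m        # illustrative: allow bursting
    memory: 512Mi    # illustrative: catch leaks before node exhaustion
```

Apply it with something like `helm upgrade <release> <chart> -f values-override.yaml`.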
### ❌ No Requests on Production Workloads

```yaml
# Dangerous for production
resources: {}
```

**Problem:** the pod can be evicted at any time, and there are no scheduling guarantees.

**Solution:** always set at least requests for production workloads. A namespace-level guard is sketched below.
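
To guard against this at the namespace level, a LimitRange can inject default requests and limits into any container that omits them; a minimal sketch with illustrative values (it only affects pods created after it exists):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-resources   # illustrative name
  namespace: my-namespace             # placeholder namespace
spec:
  limits:
    - type: Container
      defaultRequest:        # applied when a container declares no requests
        cpu: 50m
        memory: 128Mi
      default:               # applied when a container declares no limits
        cpu: 200m
        memory: 256Mi
```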
### ❌ Requests = Limits (Overuse of Guaranteed QoS)

```yaml
# Wastes cluster capacity
resources:
  requests:
    cpu: 1000m
    memory: 2Gi
  limits:
    cpu: 1000m    # Same as request
    memory: 2Gi   # Same as request
```

**Problem:** reserves resources exclusively, prevents sharing, and wastes capacity.

**Solution:** use Burstable (requests < limits) for most workloads.
### ❌ Memory Limits Too Low

```yaml
# Will cause OOM kills
resources:
  requests:
    memory: 128Mi
  limits:
    memory: 128Mi   # No headroom!
```

**Problem:** any memory spike causes an immediate OOM kill.

**Solution:** set memory limits to 1.5-2x requests for safety.
### ❌ CPU Limits Too Restrictive

```yaml
# Will cause excessive throttling
resources:
  requests:
    cpu: 100m
  limits:
    cpu: 120m   # Only 20% headroom
```

**Problem:** constant CPU throttling under normal load.

**Solution:** set CPU limits to 2-5x requests to allow bursting.
## Best Practices Summary

1. **Always measure before setting requests** - deploy, monitor, then configure
2. **Use Burstable QoS for most workloads** - requests < limits
3. **Reserve Guaranteed QoS for databases** - only when strictly necessary
4. **Never use BestEffort in production** - always set requests
5. **CPU can be overcommitted slightly** - limits up to ~120% of capacity are acceptable
6. **Memory cannot be overcommitted safely** - keep limits below 110% of capacity
7. **Monitor actual vs. requested usage** - adjust based on real usage
8. **Update source templates** - don't rely on manual kubectl patches