Monitoring Basics¶
In this tutorial, you’ll learn to use the KUP6S monitoring stack (Prometheus, Grafana, and Loki) to observe your cluster and applications. You’ll explore dashboards, query metrics, and search logs.
What you’ll learn¶
Access and navigate Grafana
View cluster and application metrics with Prometheus
Query logs with Loki and LogQL
Create simple alerts
Debug application issues using observability tools
Step 1: Access Grafana¶
Get Grafana credentials¶
# Get admin password
kubectl get secret -n monitoring kube-prometheus-stack-grafana \
-o jsonpath='{.data.admin-password}' | base64 -d
echo
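If you'd rather capture the password in a shell variable for reuse, a minimal variant of the same command:
# Store the admin password for later use
GRAFANA_PASSWORD=$(kubectl get secret -n monitoring kube-prometheus-stack-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d)
echo "$GRAFANA_PASSWORD"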
Open Grafana UI¶
Visit: https://grafana.ops.kup6s.net
Login with:
Username: admin
Password: (from command above)
Tip
If you see an SSL warning, Let’s Encrypt may still need a few minutes to issue the certificate. Wait, or accept the self-signed certificate temporarily.
Step 2: Explore pre-installed dashboards¶
Grafana comes with comprehensive dashboards for Kubernetes monitoring.
Key dashboards to explore¶
Kubernetes / Compute Resources / Cluster
Overall cluster CPU and memory usage
Node resource utilization
Pod count and distribution
Try it:
Open the dashboard
Observe the cluster overview
Notice the time range selector (top right)
Kubernetes / Compute Resources / Namespace (Pods)
Select namespace: hello-kup6s (from previous tutorial)
View pod CPU and memory usage
See network traffic
Kubernetes / Networking / Pod
Select a pod from the hello-kup6s namespace
View incoming/outgoing bandwidth
See packet drops and errors
Step 3: Query metrics with Prometheus¶
Access Prometheus¶
Click Explore (compass icon) in left sidebar, then select Prometheus as data source.
Basic PromQL queries¶
Try these queries in the query editor:
Node CPU usage:
rate(node_cpu_seconds_total{mode!="idle"}[5m])
Click Run query (or press Shift+Enter)
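That query returns one series per CPU core and mode. To get a single per-node figure, aggregate it (plain PromQL over the same metric):
sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))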
Pod memory usage:
container_memory_usage_bytes{namespace="hello-kup6s"}
HTTP requests per second (if your app exports metrics):
rate(http_requests_total[5m])
Number of running pods per namespace:
count by (namespace) (kube_pod_info)
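These building blocks compose: for example, summing container memory across the cluster per namespace (the container!="" matcher drops the duplicate pod-level cgroup series):
sum by (namespace) (container_memory_usage_bytes{container!=""})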
Tip
Click the Metrics explorer button to browse available metrics instead of typing them.
Step 4: Create a custom dashboard¶
Let’s create a simple dashboard for your hello-kup6s app.
Create new dashboard¶
Click Dashboards → New → New Dashboard
Click Add visualization
Select Prometheus as data source
Add panel: Pod CPU Usage¶
In the query editor, enter:
rate(container_cpu_usage_seconds_total{namespace="hello-kup6s"}[5m])
Configure panel:
Panel title: “Hello KUP6S CPU Usage”
Legend: {{pod}}
Click Apply (top right)
Add panel: Pod Memory Usage¶
Click Add → Visualization
Query:
container_memory_usage_bytes{namespace="hello-kup6s"}
Configure:
Panel title: “Hello KUP6S Memory Usage”
Unit: bytes (in Standard options)
Legend: {{pod}}
Click Apply
Save dashboard¶
Click Save dashboard (disk icon, top right)
Name: “Hello KUP6S Monitoring”
Click Save
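If you prefer to keep dashboards as code, kube-prometheus-stack's Grafana sidecar can load them from labeled ConfigMaps. A hedged sketch (grafana_dashboard is the chart's default sidecar label; verify it against your Helm values):
# Export the dashboard JSON (Share → Export), save as dashboard.json, then:
kubectl create configmap hello-kup6s-dashboard -n monitoring \
  --from-file=dashboard.json --dry-run=client -o yaml | kubectl apply -f -
kubectl label configmap hello-kup6s-dashboard -n monitoring grafana_dashboard="1"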
Step 5: Query logs with Loki¶
Now let’s explore your application logs.
Access Loki in Explore¶
Click Explore (compass icon)
Select Loki as data source (dropdown at top)
Basic LogQL queries¶
All logs from hello-kup6s namespace:
{namespace="hello-kup6s"}
Click Run query
You’ll see log lines from your pods. Click ▶ to expand individual log entries.
Filter for specific pod:
{namespace="hello-kup6s", pod=~"hello-kup6s-.*"}
Search for specific text:
{namespace="hello-kup6s"} |= "GET"
This finds all logs containing “GET” (HTTP GET requests).
Case-insensitive search:
{namespace="hello-kup6s"} |~ `(?i)error`
Finds “error”, “Error”, “ERROR”, etc.
Exclude unwanted logs:
{namespace="hello-kup6s"} != "healthcheck"
Advanced LogQL: Parsing and filtering¶
Parse JSON logs and filter:
{namespace="hello-kup6s"}
| json
| status >= 400
Count error rates:
sum(rate({namespace="hello-kup6s"} |= "error" [5m]))
Top 5 log sources:
topk(5, count_over_time({namespace="hello-kup6s"}[1h]))
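Metric queries compose like PromQL, so you can also take a ratio, e.g. a sketch of the fraction of log lines that mention "error":
sum(rate({namespace="hello-kup6s"} |= "error" [5m]))
  /
sum(rate({namespace="hello-kup6s"}[5m]))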
Tip
Use the Label browser button to discover available labels without typing.
Step 6: Debug an application issue¶
Let’s simulate an issue and use monitoring to debug it.
Generate some load¶
# Install hey for load testing
go install github.com/rakyll/hey@latest
# Generate requests
hey -n 1000 -c 10 https://hello.sites.kup6s.com
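If you don't have a Go toolchain handy, a throwaway pod can generate similar load (a sketch; assumes only busybox's wget and the URL above):
kubectl run load-test --rm -it --restart=Never --image=busybox -- \
  sh -c 'for i in $(seq 1 1000); do wget -q -O /dev/null https://hello.sites.kup6s.com; done'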
Monitor in Grafana¶
Go to your “Hello KUP6S Monitoring” dashboard
Watch CPU and memory increase
Notice the time it takes to handle requests
Check logs for errors¶
In Explore (Loki):
{namespace="hello-kup6s"}
|= "error"
OR |= "timeout"
OR |= "failed"
View pod resource usage¶
kubectl top pods -n hello-kup6s
Compare with Grafana metrics.
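To watch usage change while the load test runs (assumes the watch utility on your workstation):
watch -n 5 kubectl top pods -n hello-kup6s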
Step 7: Set up a basic alert¶
Let’s create an alert when pod memory exceeds a threshold.
Create alert rule¶
Go to Alerting → Alert rules (bell icon)
Click New alert rule
Configure:
Rule name: “High Memory Usage - Hello KUP6S”
Query A: container_memory_usage_bytes{namespace="hello-kup6s"} > 100000000 (alerts when memory exceeds 100 MB)
Condition: A last() IS ABOVE 100000000
Evaluation interval: 1m
For duration: 5m (fires after 5 minutes above the threshold)
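An absolute byte threshold goes stale as the app changes; alerting relative to the container's memory limit is often more robust. A hedged sketch using kube-state-metrics (label matching can vary across metric versions, so verify the join in Explore first):
# Fires when a container uses more than 80% of its memory limit
container_memory_usage_bytes{namespace="hello-kup6s", container!=""}
  / on(namespace, pod, container)
    kube_pod_container_resource_limits{resource="memory"}
  > 0.8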
Configure notification¶
Folder: Create new: “Hello KUP6S Alerts”
Evaluation group: “Resource Alerts”
Summary: “Pod {{$labels.pod}} memory usage is {{$value}} bytes”
Click Save rule and exit
Test the alert¶
The quickest way to make the rule fire is to lower its threshold below the pods' current usage:
# Check current memory usage
kubectl top pods -n hello-kup6s
# Then edit the alert rule in Grafana and set the threshold just below that value
Alternatively, generate load with hey (Step 6) to push memory usage up.
Wait 5-6 minutes and check Alerting → Alert rules to see if it fires.
Step 8: Explore cluster-wide monitoring¶
View node metrics¶
Dashboard: Kubernetes / Compute Resources / Node (Pods)
Select a node
See CPU, memory, disk I/O
Check which pods are consuming resources
Check persistent volumes¶
Dashboard: Longhorn / Volume
See Longhorn volume usage
Check replication health
Monitor I/O performance
Traefik ingress metrics¶
Dashboard: Traefik
HTTP request rates
Response times
Status code distribution
Backend health
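If you want the raw series behind that dashboard, Traefik's Prometheus exporter exposes request counters; a sketch (metric name from Traefik v2 — confirm it in the Metrics explorer):
# Requests per second by service and status code
sum by (service, code) (rate(traefik_service_requests_total[5m]))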
Step 9: Best practices learned¶
Monitoring strategy¶
Start with pre-built dashboards - Don’t reinvent the wheel
Create app-specific dashboards - For your own applications
Use labels effectively - namespace, pod, container tags help filter
Set meaningful alerts - Alert on symptoms, not causes
Query tips¶
Prometheus (PromQL):
Use rate() for counters (e.g., requests per second)
Use avg_over_time() for gauges (e.g., average memory)
Always specify time ranges: [5m], [1h]
Loki (LogQL):
Start broad, then filter: {namespace="x"} → |= "error"
Use regex for patterns: |~ "error|failed|timeout"
Parse structured logs: | json or | logfmt
Congratulations! 🎉¶
You’ve mastered the basics of Kubernetes observability!
What you’ve learned¶
Navigate Grafana and explore dashboards
Query metrics with Prometheus and PromQL
Search and analyze logs with Loki and LogQL
Create custom dashboards and panels
Set up basic alerting rules
Debug application issues using observability
What’s next?¶
Deep dive into specific topics:
Query Loki logs (How-To) - Advanced LogQL
Create alerts (How-To) - Comprehensive alerting guide
Monitoring stack reference - Technical details
Add more capabilities:
Create S3 bucket - For log retention
Backup and restore - Disaster recovery
Understand the architecture:
Monitoring philosophy - Why Prometheus + Loki + Grafana
Storage architecture - Where logs and metrics are stored
Troubleshooting¶
Grafana won’t load¶
# Check Grafana pod
kubectl get pods -n monitoring -l app.kubernetes.io/name=grafana
# Check logs
kubectl logs -n monitoring -l app.kubernetes.io/name=grafana
No metrics showing¶
# Check Prometheus is scraping
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
# Open http://localhost:9090/targets
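If a target is missing entirely rather than marked down, check that a ServiceMonitor exists for it (the prometheus-operator resource that tells Prometheus what to scrape):
kubectl get servicemonitors -A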
No logs in Loki¶
# Check Loki pods
kubectl get pods -n monitoring -l app.kubernetes.io/name=loki
# Check Alloy (log shipper)
kubectl get pods -n monitoring -l app.kubernetes.io/name=alloy
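If the pods are running but logs still don't arrive, the shipper's own logs usually explain why:
kubectl logs -n monitoring -l app.kubernetes.io/name=alloy --tail=50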
Alert not firing¶
Verify query returns data in Explore first
Check evaluation interval and “For” duration
Look at Alerting → Alert rules → State history