Tutorial

Monitoring Basics

In this tutorial, you’ll learn to use the KUP6S monitoring stack (Prometheus, Grafana, and Loki) to observe your cluster and applications. You’ll explore dashboards, query metrics, and search logs.

What you’ll learn

  • Access and navigate Grafana

  • View cluster and application metrics with Prometheus

  • Query logs with Loki and LogQL

  • Create simple alerts

  • Debug application issues using observability tools

Step 1: Access Grafana

Get Grafana credentials

# Get admin password
kubectl get secret -n monitoring kube-prometheus-stack-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d
echo
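
If the grafana.ops.kup6s.net hostname isn't reachable yet, you can also open Grafana through a port-forward. A minimal sketch, assuming the Grafana service follows the same kube-prometheus-stack-grafana naming as the secret above and serves on port 80:

# Forward Grafana to http://localhost:3000 (service name assumed to match the secret above)
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80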

Open Grafana UI

Visit: https://grafana.ops.kup6s.net

Login with:

  • Username: admin

  • Password: (from command above)

Tip

If you see an SSL warning, it’s because Let’s Encrypt needs a few minutes to issue the certificate. Wait, or accept the self-signed certificate temporarily.
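
You can also check the certificate status from the command line. A quick sketch, assuming cert-manager handles the Let’s Encrypt issuance:

# READY=True means the certificate has been issued
kubectl get certificate -n monitoring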

Step 2: Explore pre-installed dashboards

Grafana comes with comprehensive dashboards for Kubernetes monitoring.

Key dashboards to explore

Kubernetes / Compute Resources / Cluster

  • Overall cluster CPU and memory usage

  • Node resource utilization

  • Pod count and distribution

Try it:

  1. Open the dashboard

  2. Observe the cluster overview

  3. Notice the time range selector (top right)
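
To cross-check the dashboard’s node figures from the command line (kubectl top is also used in Step 6):

# Per-node CPU and memory usage as reported by the metrics API
kubectl top nodes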

Kubernetes / Compute Resources / Namespace (Pods)

  • Select namespace: hello-kup6s (from previous tutorial)

  • View pod CPU and memory usage

  • See network traffic

Kubernetes / Networking / Pod

  • Select pod from hello-kup6s namespace

  • View incoming/outgoing bandwidth

  • See packet drops and errors

Step 3: Query metrics with Prometheus

Access Prometheus

Click Explore (compass icon) in the left sidebar, then select Prometheus as the data source.

Basic PromQL queries

Try these queries in the query editor:

Node CPU usage:

rate(node_cpu_seconds_total{mode!="idle"}[5m])

Click Run query (or press Shift+Enter)

Pod memory usage:

container_memory_usage_bytes{namespace="hello-kup6s"}

HTTP requests per second (if your app exports metrics):

rate(http_requests_total[5m])

Number of running pods per namespace:

count by (namespace) (kube_pod_info)
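
The same queries can also be run outside Grafana against the Prometheus HTTP API. A minimal sketch, reusing the port-forward from the Troubleshooting section below:

# In one terminal: expose Prometheus locally
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090

# In another terminal: run a PromQL query via the API
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=count by (namespace) (kube_pod_info)'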

Tip

Click the Metrics explorer button to browse available metrics instead of typing them by hand.

Step 4: Create a custom dashboard

Let’s create a simple dashboard for your hello-kup6s app.

Create new dashboard

  1. Click Dashboards → New → New Dashboard

  2. Click Add visualization

  3. Select Prometheus as data source

Add panel: Pod CPU Usage

In the query editor, enter:

rate(container_cpu_usage_seconds_total{namespace="hello-kup6s"}[5m])

Configure panel:

  1. Panel title: “Hello KUP6S CPU Usage”

  2. Legend: {{pod}}

  3. Click Apply (top right)

Add panel: Pod Memory Usage

Click Add → Visualization

Query:

container_memory_usage_bytes{namespace="hello-kup6s"}

Configure:

  1. Panel title: “Hello KUP6S Memory Usage”

  2. Unit: bytes (in Standard options)

  3. Legend: {{pod}}

  4. Click Apply

Save dashboard

  1. Click Save dashboard (disk icon, top right)

  2. Name: “Hello KUP6S Monitoring”

  3. Click Save

Step 5: Query logs with Loki

Now let’s explore your application logs.

Access Loki in Explore

  1. Click Explore (compass icon)

  2. Select Loki as data source (dropdown at top)

Basic LogQL queries

All logs from hello-kup6s namespace:

{namespace="hello-kup6s"}

Click Run query

You’ll see log lines from your pods. Click to expand individual log entries.

Filter for specific pod:

{namespace="hello-kup6s", pod=~"hello-kup6s-.*"}

Search for specific text:

{namespace="hello-kup6s"} |= "GET"

This finds all logs containing “GET” (HTTP GET requests).

Case-insensitive search:

{namespace="hello-kup6s"} |~ `(?i)error`

Finds “error”, “Error”, “ERROR”, etc.

Exclude unwanted logs:

{namespace="hello-kup6s"} != "healthcheck"

Advanced LogQL: Parsing and filtering

Parse JSON logs and filter:

{namespace="hello-kup6s"}
  | json
  | status >= 400

Count error rates:

sum(rate({namespace="hello-kup6s"} |= "error" [5m]))

Top 5 log sources:

topk(5, count_over_time({namespace="hello-kup6s"}[1h]))
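
Loki exposes a similar HTTP API, so the same LogQL can be run from the command line. A sketch, assuming the Loki query endpoint is a service named loki on port 3100 in the monitoring namespace (adjust to your install):

# In one terminal: expose Loki locally (service name may differ in your install)
kubectl port-forward -n monitoring svc/loki 3100:3100

# In another terminal: run a LogQL query (time range defaults to roughly the last hour)
curl -sG http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode 'query={namespace="hello-kup6s"} |= "GET"' \
  --data-urlencode 'limit=20'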

Tip

Use the Label browser button to discover available labels without typing.

Step 6: Debug an application issue

Let’s simulate an issue and use monitoring to debug it.

Generate some load

# Install hey for load testing
go install github.com/rakyll/hey@latest

# Generate requests
hey -n 1000 -c 10 https://hello.sites.kup6s.com
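
If you don’t have a Go toolchain available, a plain shell loop with curl generates comparable (if slower) traffic. A rough sketch:

# 1000 sequential requests; prints a count per HTTP status code at the end
for i in $(seq 1 1000); do
  curl -s -o /dev/null -w "%{http_code}\n" https://hello.sites.kup6s.com
done | sort | uniq -c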

Monitor in Grafana

  1. Go to your “Hello KUP6S Monitoring” dashboard

  2. Watch CPU and memory increase

  3. Notice the time it takes to handle requests

Check logs for errors

In Explore (Loki):

{namespace="hello-kup6s"} |~ "error|timeout|failed"

This matches lines containing any of “error”, “timeout”, or “failed” (LogQL has no OR keyword for line filters, so a regex filter is used instead).

View pod resource usage

kubectl top pods -n hello-kup6s

Compare with Grafana metrics.
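
To watch usage change while the load test is running, a simple sketch:

# Refresh pod resource usage every 5 seconds
watch -n 5 kubectl top pods -n hello-kup6s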

Step 7: Set up a basic alert

Let’s create an alert when pod memory exceeds a threshold.

Create alert rule

  1. Go to Alerting → Alert rules (bell icon)

  2. Click New alert rule

Configure:

  • Rule name: “High Memory Usage - Hello KUP6S”

  • Query A:

    container_memory_usage_bytes{namespace="hello-kup6s"}

  • Condition: last() of A IS ABOVE 100000000 (alert when memory > 100 MB)

  • Evaluation interval: 1m

  • For duration: 5m (alert after 5 minutes above threshold)

Configure notification

  1. Folder: Create new: “Hello KUP6S Alerts”

  2. Evaluation group: “Resource Alerts”

  3. Summary: “Pod {{$labels.pod}} memory usage is {{$value}} bytes”

Click Save rule and exit

Test the alert

Push pod memory above the threshold, or make the threshold easier to reach:

# Option 1: generate sustained load (see Step 6) so memory usage climbs
hey -n 5000 -c 50 https://hello.sites.kup6s.com

# Option 2: edit the alert rule in Grafana and temporarily lower the threshold
# (e.g. to 50000000 bytes) so that normal memory usage already exceeds it

Wait 5-6 minutes and check Alerting → Alert rules to see if it fires.
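
While you wait, you can confirm that the alert expression actually returns data. A sketch using the Prometheus port-forward from the Troubleshooting section:

# In one terminal: expose Prometheus locally
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090

# In another terminal: does any pod currently exceed the 100MB threshold?
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=container_memory_usage_bytes{namespace="hello-kup6s"} > 100000000'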

Step 8: Explore cluster-wide monitoring

View node metrics

Dashboard: Kubernetes / Compute Resources / Node (Pods)

  • Select a node

  • See CPU, memory, disk I/O

  • Check which pods are consuming resources

Check persistent volumes

Dashboard: Longhorn / Volume

  • See Longhorn volume usage

  • Check replication health

  • Monitor I/O performance
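
The same volumes can be listed from the command line. A sketch, assuming Longhorn runs in its default longhorn-system namespace:

# Longhorn volumes with their state and robustness (namespace assumed to be the default)
kubectl get volumes.longhorn.io -n longhorn-system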

Traefik ingress metrics

Dashboard: Traefik

  • HTTP request rates

  • Response times

  • Status code distribution

  • Backend health

Step 9: Best practices learned

Monitoring strategy

  1. Start with pre-built dashboards - Don’t reinvent the wheel

  2. Create app-specific dashboards - For your own applications

  3. Use labels effectively - namespace, pod, container tags help filter

  4. Set meaningful alerts - Alert on symptoms, not causes

Query tips

Prometheus (PromQL):

  • Use rate() for counters (e.g., requests per second)

  • Use avg_over_time() for gauges (e.g., average memory)

  • Always specify time ranges: [5m], [1h]

Loki (LogQL):

  • Start broad, then filter: {namespace="x"} |= "error"

  • Use regex for patterns: |~ "error|failed|timeout"

  • Parse structured logs: | json or | logfmt

Congratulations! 🎉

You’ve mastered the basics of Kubernetes observability!

What you’ve learned

  • Navigate Grafana and explore dashboards

  • Query metrics with Prometheus and PromQL

  • Search and analyze logs with Loki and LogQL

  • Create custom dashboards and panels

  • Set up basic alerting rules

  • Debug application issues using observability

What’s next?

Deep dive into specific topics:

Add more capabilities:

Understand the architecture:

Troubleshooting

Grafana won’t load

# Check Grafana pod
kubectl get pods -n monitoring -l app.kubernetes.io/name=grafana

# Check logs
kubectl logs -n monitoring -l app.kubernetes.io/name=grafana

No metrics showing

# Check Prometheus is scraping
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
# Open http://localhost:9090/targets

No logs in Loki

# Check Loki pods
kubectl get pods -n monitoring -l app.kubernetes.io/name=loki

# Check Alloy (log shipper)
kubectl get pods -n monitoring -l app.kubernetes.io/name=alloy
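
If the pods are running but logs still don’t show up, the shipper’s own logs usually explain why:

# Recent Alloy logs - look for errors pushing to Loki
kubectl logs -n monitoring -l app.kubernetes.io/name=alloy --tail=50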

Alert not firing

  • Verify query returns data in Explore first

  • Check evaluation interval and “For” duration

  • Look at Alerting → Alert rules → State history