Tutorial

Monitoring Basics

In this tutorial, you’ll learn to use the KUP6S monitoring stack (Prometheus, Grafana, and Loki) to observe your cluster and applications. You’ll explore dashboards, query metrics, and search logs.

What you’ll learn

  • Access and navigate Grafana

  • View cluster and application metrics with Prometheus

  • Query logs with Loki and LogQL

  • Create simple alerts

  • Debug application issues using observability tools

Step 1: Access Grafana

Get Grafana credentials

# Get admin password
kubectl get secret -n monitoring kube-prometheus-stack-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d
echo
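
If the grafana.ops.kup6s.net hostname isn't reachable yet, you can also open Grafana through a port-forward. A minimal sketch, assuming the Grafana service follows the same kube-prometheus-stack-grafana naming as the secret above and serves on port 80:

# Forward Grafana to http://localhost:3000 (service name assumed to match the secret above)
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80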

Open Grafana UI

Visit: https://grafana.ops.kup6s.net

Login with:

  • Username: admin

  • Password: (from command above)

Tip

If you see an SSL warning, it’s because Let’s Encrypt needs a few minutes to issue the certificate. Wait, or accept the self-signed certificate temporarily.
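
You can also check the certificate status from the command line. A quick sketch, assuming cert-manager handles the Let’s Encrypt issuance:

# READY=True means the certificate has been issued
kubectl get certificate -n monitoring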

Step 2: Explore pre-installed dashboards

Grafana comes with comprehensive dashboards for Kubernetes monitoring.

Key dashboards to explore

Kubernetes / Compute Resources / Cluster

  • Overall cluster CPU and memory usage

  • Node resource utilization

  • Pod count and distribution

Try it:

  1. Open the dashboard

  2. Observe the cluster overview

  3. Notice the time range selector (top right)
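
To cross-check the dashboard’s node figures from the command line (kubectl top is also used in Step 6):

# Per-node CPU and memory usage as reported by the metrics API
kubectl top nodes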

Kubernetes / Compute Resources / Namespace (Pods)

  • Select namespace: hello-kup6s (from previous tutorial)

  • View pod CPU and memory usage

  • See network traffic

Kubernetes / Networking / Pod

  • Select pod from hello-kup6s namespace

  • View incoming/outgoing bandwidth

  • See packet drops and errors

Step 3: Query metrics with Prometheus

Access Prometheus

Click Explore (compass icon) in the left sidebar, then select Prometheus as the data source.

Basic PromQL queries

Try these queries in the query editor:

Node CPU usage:

rate(node_cpu_seconds_total{mode!="idle"}[5m])

Click Run query (or press Shift+Enter)

Pod memory usage:

container_memory_usage_bytes{namespace="hello-kup6s"}

HTTP requests per second (if your app exports metrics):

rate(http_requests_total[5m])

Number of running pods per namespace:

count by (namespace) (kube_pod_info)
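
The same queries can also be run outside Grafana against the Prometheus HTTP API. A minimal sketch, reusing the port-forward from the Troubleshooting section below:

# In one terminal: expose Prometheus locally
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090

# In another terminal: run a PromQL query via the API
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=count by (namespace) (kube_pod_info)'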

Tip

Click the Metrics explorer button to browse available metrics instead of typing them by hand.

Step 4: Create a custom dashboard

Let’s create a simple dashboard for your hello-kup6s app.

Create new dashboard

  1. Click Dashboards → New → New Dashboard

  2. Click Add visualization

  3. Select Prometheus as data source

Add panel: Pod CPU Usage

In the query editor, enter:

rate(container_cpu_usage_seconds_total{namespace="hello-kup6s"}[5m])

Configure panel:

  1. Panel title: “Hello KUP6S CPU Usage”

  2. Legend: {{pod}}

  3. Click Apply (top right)

Add panel: Pod Memory Usage

Click Add → Visualization

Query:

container_memory_usage_bytes{namespace="hello-kup6s"}

Configure:

  1. Panel title: “Hello KUP6S Memory Usage”

  2. Unit: bytes (in Standard options)

  3. Legend: {{pod}}

  4. Click Apply

Save dashboard

  1. Click Save dashboard (disk icon, top right)

  2. Name: “Hello KUP6S Monitoring”

  3. Click Save

Step 5: Query logs with Loki

Now let’s explore your application logs.

Access Loki in Explore

  1. Click Explore (compass icon)

  2. Select Loki as data source (dropdown at top)

Basic LogQL queries

All logs from hello-kup6s namespace:

{namespace="hello-kup6s"}

Click Run query

You’ll see log lines from your pods. Click to expand individual log entries.

Filter for specific pod:

{namespace="hello-kup6s", pod=~"hello-kup6s-.*"}

Search for specific text:

{namespace="hello-kup6s"} |= "GET"

This finds all logs containing “GET” (HTTP GET requests).

Case-insensitive search:

{namespace="hello-kup6s"} |~ `(?i)error`

Finds “error”, “Error”, “ERROR”, etc.

Exclude unwanted logs:

{namespace="hello-kup6s"} != "healthcheck"

Advanced LogQL: Parsing and filtering

Parse JSON logs and filter:

{namespace="hello-kup6s"}
  | json
  | status >= 400

Count error rates:

sum(rate({namespace="hello-kup6s"} |= "error" [5m]))

Top 5 log sources:

topk(5, count_over_time({namespace="hello-kup6s"}[1h]))
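
Loki exposes a similar HTTP API, so the same LogQL can be run from the command line. A sketch, assuming the Loki query endpoint is a service named loki on port 3100 in the monitoring namespace (adjust to your install):

# In one terminal: expose Loki locally (service name may differ in your install)
kubectl port-forward -n monitoring svc/loki 3100:3100

# In another terminal: run a LogQL query (time range defaults to roughly the last hour)
curl -sG http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode 'query={namespace="hello-kup6s"} |= "GET"' \
  --data-urlencode 'limit=20'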

Tip

Use the Label browser button to discover available labels without typing.

Step 6: Debug an application issue

Let’s simulate an issue and use monitoring to debug it.

Generate some load

# Install hey for load testing
go install github.com/rakyll/hey@latest

# Generate requests
hey -n 1000 -c 10 https://hello.sites.kup6s.com
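
If you don’t have a Go toolchain available, a plain shell loop with curl generates comparable (if slower) traffic. A rough sketch:

# 1000 sequential requests; prints a count per HTTP status code at the end
for i in $(seq 1 1000); do
  curl -s -o /dev/null -w "%{http_code}\n" https://hello.sites.kup6s.com
done | sort | uniq -c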

Monitor in Grafana

  1. Go to your “Hello KUP6S Monitoring” dashboard

  2. Watch CPU and memory increase

  3. Notice the time it takes to handle requests

Check logs for errors

In Explore (Loki):

{namespace="hello-kup6s"} |~ "error|timeout|failed"

This matches lines containing any of “error”, “timeout”, or “failed” (LogQL has no OR keyword for line filters, so a regex filter is used instead).

View pod resource usage

kubectl top pods -n hello-kup6s

Compare with Grafana metrics.
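
To watch usage change while the load test is running, a simple sketch:

# Refresh pod resource usage every 5 seconds
watch -n 5 kubectl top pods -n hello-kup6s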

Step 7: Set up a basic alert

Let’s create an alert when pod memory exceeds a threshold.

Create alert rule

  1. Go to Alerting → Alert rules (bell icon)

  2. Click New alert rule

Configure:

  • Rule name: “High Memory Usage - Hello KUP6S”

  • Query A:

    container_memory_usage_bytes{namespace="hello-kup6s"}

  • Condition: last() of A IS ABOVE 100000000 (alert when memory > 100 MB)

  • Evaluation interval: 1m

  • For duration: 5m (alert after 5 minutes above threshold)

Configure notification

  1. Folder: Create new: “Hello KUP6S Alerts”

  2. Evaluation group: “Resource Alerts”

  3. Summary: “Pod {{$labels.pod}} memory usage is {{$value}} bytes”

Click Save rule and exit

Test the alert

Push pod memory above the threshold, or make the threshold easier to reach:

# Option 1: generate sustained load (see Step 6) so memory usage climbs
hey -n 5000 -c 50 https://hello.sites.kup6s.com

# Option 2: edit the alert rule in Grafana and temporarily lower the threshold
# (e.g. to 50000000 bytes) so that normal memory usage already exceeds it

Wait 5-6 minutes and check Alerting → Alert rules to see if it fires.
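
While you wait, you can confirm that the alert expression actually returns data. A sketch using the Prometheus port-forward from the Troubleshooting section:

# In one terminal: expose Prometheus locally
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090

# In another terminal: does any pod currently exceed the 100MB threshold?
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=container_memory_usage_bytes{namespace="hello-kup6s"} > 100000000'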

Step 8: Explore cluster-wide monitoring

View node metrics

Dashboard: Kubernetes / Compute Resources / Node (Pods)

  • Select a node

  • See CPU, memory, disk I/O

  • Check which pods are consuming resources

Check persistent volumes

Dashboard: Longhorn / Volume

  • See Longhorn volume usage

  • Check replication health

  • Monitor I/O performance
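
The same volumes can be listed from the command line. A sketch, assuming Longhorn runs in its default longhorn-system namespace:

# Longhorn volumes with their state and robustness (namespace assumed to be the default)
kubectl get volumes.longhorn.io -n longhorn-system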

Traefik ingress metrics

Dashboard: Traefik

  • HTTP request rates

  • Response times

  • Status code distribution

  • Backend health

Step 9: Best practices learned

Monitoring strategy

  1. Start with pre-built dashboards - Don’t reinvent the wheel

  2. Create app-specific dashboards - For your own applications

  3. Use labels effectively - namespace, pod, container tags help filter

  4. Set meaningful alerts - Alert on symptoms, not causes

Query tips

Prometheus (PromQL):

  • Use rate() for counters (e.g., requests per second)

  • Use avg_over_time() for gauges (e.g., average memory)

  • Always specify time ranges: [5m], [1h]

Loki (LogQL):

  • Start broad, then filter: {namespace="x"} |= "error"

  • Use regex for patterns: |~ "error|failed|timeout"

  • Parse structured logs: | json or | logfmt

Congratulations! 🎉

You’ve mastered the basics of Kubernetes observability!

What you’ve learned

  • Navigate Grafana and explore dashboards

  • Query metrics with Prometheus and PromQL

  • Search and analyze logs with Loki and LogQL

  • Create custom dashboards and panels

  • Set up basic alerting rules

  • Debug application issues using observability

What’s next?

Deep dive into specific topics:

Add more capabilities:

Understand the architecture:

Troubleshooting

Grafana won’t load

# Check Grafana pod
kubectl get pods -n monitoring -l app.kubernetes.io/name=grafana

# Check logs
kubectl logs -n monitoring -l app.kubernetes.io/name=grafana

No metrics showing

# Check Prometheus is scraping
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
# Open http://localhost:9090/targets

No logs in Loki

# Check Loki pods
kubectl get pods -n monitoring -l app.kubernetes.io/name=loki

# Check Alloy (log shipper)
kubectl get pods -n monitoring -l app.kubernetes.io/name=alloy
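
If the pods are running but logs still don’t show up, the shipper’s own logs usually explain why:

# Recent Alloy logs - look for errors pushing to Loki
kubectl logs -n monitoring -l app.kubernetes.io/name=alloy --tail=50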

Alert not firing

  • Verify query returns data in Explore first

  • Check evaluation interval and “For” duration

  • Look at Alerting → Alert rules → State history