Troubleshoot CNPG Issues
Common CloudNativePG troubleshooting scenarios and solutions.
Operator Issues
Operator Pod Not Starting
Check operator pod status:
kubectl get pods -n cnpg-system
kubectl describe pod <operator-pod> -n cnpg-system
kubectl logs <operator-pod> -n cnpg-system
Common causes:
Image pull errors (check image tag)
Resource constraints (check node resources)
Invalid configuration (check HelmChart values)
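To narrow these down, recent events in the operator namespace and node capacity are usually enough (kubectl top assumes metrics-server is installed):
kubectl get events -n cnpg-system --sort-by=.lastTimestamp
kubectl top nodes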
ArgoCD Application OutOfSync
Check ArgoCD sync status:
kubectl get application cnpg-app-* -n argocd
kubectl describe application cnpg-app-* -n argocd
Manual sync:
argocd app sync cnpg-app-* -n argocd
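To see exactly which resources differ from the desired state before syncing, the ArgoCD CLI can show a diff (assuming you are logged in via argocd login):
argocd app diff cnpg-app-*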
Cluster Issues
Cluster Stuck in “Creating” State
Check cluster events:
kubectl describe cluster <cluster-name> -n <namespace>
Check pod status:
kubectl get pods -n <namespace> -l cnpg.io/cluster=<cluster-name>
kubectl logs <pod-name> -n <namespace>
Common causes:
Storage provisioning issues (check PVC status)
Init container failures (check initdb logs)
Resource constraints (check resource requests)
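For the storage cause, a quick PVC check usually reveals provisioning problems (the cnpg.io/cluster label selector assumes default CNPG labeling of PVCs):
kubectl get pvc -n <namespace> -l cnpg.io/cluster=<cluster-name>
kubectl describe pvc <pvc-name> -n <namespace>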
Primary Pod CrashLooping
Check pod logs:
kubectl logs <primary-pod> -n <namespace>
kubectl logs <primary-pod> -n <namespace> --previous
Check PostgreSQL logs inside pod:
kubectl exec -it <primary-pod> -n <namespace> -- \
tail -100 /controller/log/postgres.log
Common causes:
Disk space exhaustion
Configuration errors
Corrupted data directory
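To rule out disk space exhaustion, check usage on the data volume (the mount path below assumes the default CNPG data volume location):
kubectl exec -it <primary-pod> -n <namespace> -- df -h /var/lib/postgresql/data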
Failover Not Working
Check cluster status:
kubectl get cluster <cluster-name> -n <namespace> -o yaml
Force manual failover:
kubectl cnpg promote <cluster-name> <instance-number> -n <namespace>
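To confirm whether the promotion is progressing, compare the current and target primary reported in the Cluster status (field names assume the standard CNPG status fields):
kubectl get cluster <cluster-name> -n <namespace> \
-o jsonpath='{.status.currentPrimary} -> {.status.targetPrimary}{"\n"}'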
Backup Issues
Backup Job Failing
Check backup job logs:
kubectl get backup -n <namespace>
kubectl describe backup <backup-name> -n <namespace>
kubectl logs -n <namespace> -l job-name=<backup-job-name>
Test S3 connectivity:
kubectl exec -it <pod-name> -n <namespace> -- \
barman-cloud-backup-list s3://bucket-name/ <server-name>
Common causes:
Invalid S3 credentials
Network connectivity issues
Insufficient S3 permissions
Disk space exhaustion
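For the credential causes, verify that the secret referenced by the backup configuration exists and contains the expected keys (the secret name below is a placeholder for whatever your Cluster references):
kubectl get secret <backup-credentials-secret> -n <namespace> -o yaml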
WAL Archiving Failed
Check WAL archive status:
kubectl exec -it <primary-pod> -n <namespace> -- \
psql -U postgres -c "SELECT * FROM pg_stat_archiver;"
Check failed WAL files:
kubectl exec -it <primary-pod> -n <namespace> -- \
ls -la /var/lib/postgresql/data/pgdata/pg_wal/archive_status/
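The Cluster status also surfaces archiving health as a condition; a sketch assuming the standard ContinuousArchiving condition type:
kubectl get cluster <cluster-name> -n <namespace> \
-o jsonpath='{.status.conditions[?(@.type=="ContinuousArchiving")].message}{"\n"}'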
Connection Issues
Cannot Connect to Database
Verify service exists:
kubectl get svc -n <namespace> -l cnpg.io/cluster=<cluster-name>
Test connection from pod:
kubectl run -it --rm psql --image=postgres:16 --restart=Never -- \
psql "postgresql://user:pass@service-name.namespace.svc:5432/dbname"
Check PostgreSQL logs:
kubectl logs <primary-pod> -n <namespace>
Common causes:
Wrong connection string
Invalid credentials
pg_hba.conf restrictions
Network policies blocking access
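To check the last two causes, inspect pg_hba rules for errors and list any NetworkPolicies in the namespace (pg_hba_file_rules is available in PostgreSQL 10 and later):
kubectl exec -it <primary-pod> -n <namespace> -- \
psql -U postgres -c "SELECT line_number, error FROM pg_hba_file_rules WHERE error IS NOT NULL;"
kubectl get networkpolicy -n <namespace>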
Pooler Not Working
Check pooler status:
kubectl get pooler <pooler-name> -n <namespace>
kubectl get pods -n <namespace> -l cnpg.io/pooler=<pooler-name>
kubectl logs -n <namespace> -l cnpg.io/pooler=<pooler-name>
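If the pooler pods are healthy, test a connection through the pooler service itself (this assumes the default pooler port of 5432):
kubectl run -it --rm psql --image=postgres:16 --restart=Never -- \
psql "postgresql://user:pass@<pooler-name>.<namespace>.svc:5432/dbname"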
Performance Issues
High CPU Usage
Check PostgreSQL queries:
kubectl exec -it <primary-pod> -n <namespace> -- \
psql -U postgres -c "SELECT * FROM pg_stat_activity ORDER BY state_change;"
Check slow queries:
kubectl exec -it <primary-pod> -n <namespace> -- \
psql -U postgres -c "SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10;"
High Memory Usage
Check PostgreSQL memory settings:
kubectl exec -it <primary-pod> -n <namespace> -- \
psql -U postgres -c "SHOW shared_buffers; SHOW work_mem; SHOW maintenance_work_mem;"
Adjust in Cluster spec:
spec:
  postgresql:
    parameters:
      shared_buffers: "512MB"
      work_mem: "16MB"
Slow Queries
Enable query logging:
spec:
  postgresql:
    parameters:
      log_min_duration_statement: "1000" # Log queries > 1 second
Getting Help
If issues persist:
Review existing GitHub issues for similar reports
Gather diagnostic information:
kubectl cnpg status <cluster-name> -n <namespace>
kubectl describe cluster <cluster-name> -n <namespace>
kubectl logs -n <namespace> -l cnpg.io/cluster=<cluster-name>
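Recent versions of the cnpg kubectl plugin can also bundle most of this into a single archive for sharing (command availability depends on your plugin version):
kubectl cnpg report cluster <cluster-name> -n <namespace>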