Troubleshoot CNPG Issues
Common CloudNativePG troubleshooting scenarios and solutions.
Operator Issues
Operator Pod Not Starting
Check operator pod status:
kubectl get pods -n cnpg-system
kubectl describe pod <operator-pod> -n cnpg-system
kubectl logs <operator-pod> -n cnpg-system
Common causes:
Image pull errors (check image tag)
Resource constraints (check node resources)
Invalid configuration (check HelmChart values)
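To narrow these down, recent events in the operator namespace and node capacity are usually enough (kubectl top assumes metrics-server is installed):
kubectl get events -n cnpg-system --sort-by=.lastTimestamp
kubectl top nodes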
ArgoCD Application OutOfSync
Check ArgoCD sync status:
kubectl get application cnpg-app-* -n argocd
kubectl describe application cnpg-app-* -n argocd
Manual sync:
argocd app sync cnpg-app-* -n argocd
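To see exactly which resources differ from the desired state before syncing, the ArgoCD CLI can show a diff (assuming you are logged in via argocd login):
argocd app diff cnpg-app-*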
Cluster Issues
Cluster Stuck in “Creating” State
Check cluster events:
kubectl describe cluster <cluster-name> -n <namespace>
Check pod status:
kubectl get pods -n <namespace> -l cnpg.io/cluster=<cluster-name>
kubectl logs <pod-name> -n <namespace>
Common causes:
Storage provisioning issues (check PVC status)
Init container failures (check initdb logs)
Resource constraints (check resource requests)
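For the storage cause, a quick PVC check usually reveals provisioning problems (the cnpg.io/cluster label selector assumes default CNPG labeling of PVCs):
kubectl get pvc -n <namespace> -l cnpg.io/cluster=<cluster-name>
kubectl describe pvc <pvc-name> -n <namespace>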
Primary Pod CrashLooping
Check pod logs:
kubectl logs <primary-pod> -n <namespace>
kubectl logs <primary-pod> -n <namespace> --previous
Check PostgreSQL logs inside pod:
kubectl exec -it <primary-pod> -n <namespace> -- \
tail -100 /controller/log/postgres.log
Common causes:
Disk space exhaustion
Configuration errors
Corrupted data directory
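To rule out disk space exhaustion, check usage on the data volume (the mount path below assumes the default CNPG data volume location):
kubectl exec -it <primary-pod> -n <namespace> -- df -h /var/lib/postgresql/data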
Failover Not Working
Check cluster status:
kubectl get cluster <cluster-name> -n <namespace> -o yaml
Force manual failover:
kubectl cnpg promote <cluster-name> <instance-number> -n <namespace>
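To confirm whether the promotion is progressing, compare the current and target primary reported in the Cluster status (field names assume the standard CNPG status fields):
kubectl get cluster <cluster-name> -n <namespace> \
-o jsonpath='{.status.currentPrimary} -> {.status.targetPrimary}{"\n"}'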
Backup Issues
Backup Job Failing
Check backup job logs:
kubectl get backup -n <namespace>
kubectl describe backup <backup-name> -n <namespace>
kubectl logs -n <namespace> -l job-name=<backup-job-name>
Test S3 connectivity:
kubectl exec -it <pod-name> -n <namespace> -- \
barman-cloud-backup-list s3://bucket-name/ <server-name>
Common causes:
Invalid S3 credentials
Network connectivity issues
Insufficient S3 permissions
Disk space exhaustion
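For the credential causes, verify that the secret referenced by the backup configuration exists and contains the expected keys (the secret name below is a placeholder for whatever your Cluster references):
kubectl get secret <backup-credentials-secret> -n <namespace> -o yaml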
WAL Archiving Failed
Check WAL archive status:
kubectl exec -it <primary-pod> -n <namespace> -- \
psql -U postgres -c "SELECT * FROM pg_stat_archiver;"
Check failed WAL files:
kubectl exec -it <primary-pod> -n <namespace> -- \
ls -la /var/lib/postgresql/data/pgdata/pg_wal/archive_status/
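The Cluster status also surfaces archiving health as a condition; a sketch assuming the standard ContinuousArchiving condition type:
kubectl get cluster <cluster-name> -n <namespace> \
-o jsonpath='{.status.conditions[?(@.type=="ContinuousArchiving")].message}{"\n"}'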
Connection Issues
Cannot Connect to Database
Verify service exists:
kubectl get svc -n <namespace> -l cnpg.io/cluster=<cluster-name>
Test connection from pod:
kubectl run -it --rm psql --image=postgres:16 --restart=Never -- \
psql "postgresql://user:pass@service-name.namespace.svc:5432/dbname"
Check PostgreSQL logs:
kubectl logs <primary-pod> -n <namespace>
Common causes:
Wrong connection string
Invalid credentials
pg_hba.conf restrictions
Network policies blocking access
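To check the last two causes, inspect pg_hba rules for errors and list any NetworkPolicies in the namespace (pg_hba_file_rules is available in PostgreSQL 10 and later):
kubectl exec -it <primary-pod> -n <namespace> -- \
psql -U postgres -c "SELECT line_number, error FROM pg_hba_file_rules WHERE error IS NOT NULL;"
kubectl get networkpolicy -n <namespace>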
Pooler Not Working
Check pooler status:
kubectl get pooler <pooler-name> -n <namespace>
kubectl get pods -n <namespace> -l cnpg.io/pooler=<pooler-name>
kubectl logs -n <namespace> -l cnpg.io/pooler=<pooler-name>
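If the pooler pods are healthy, test a connection through the pooler service itself (this assumes the default pooler port of 5432):
kubectl run -it --rm psql --image=postgres:16 --restart=Never -- \
psql "postgresql://user:pass@<pooler-name>.<namespace>.svc:5432/dbname"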
Performance Issues
High CPU Usage
Check PostgreSQL queries:
kubectl exec -it <primary-pod> -n <namespace> -- \
psql -U postgres -c "SELECT * FROM pg_stat_activity ORDER BY state_change;"
Check slow queries:
kubectl exec -it <primary-pod> -n <namespace> -- \
psql -U postgres -c "SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10;"
High Memory Usage
Check PostgreSQL memory settings:
kubectl exec -it <primary-pod> -n <namespace> -- \
psql -U postgres -c "SHOW shared_buffers; SHOW work_mem; SHOW maintenance_work_mem;"
Adjust in Cluster spec:
spec:
  postgresql:
    parameters:
      shared_buffers: "512MB"
      work_mem: "16MB"
Slow Queries
Enable query logging:
spec:
  postgresql:
    parameters:
      log_min_duration_statement: "1000" # Log queries > 1 second
Getting Help
If issues persist:
Review existing GitHub issues for similar reports
Gather diagnostic information:
kubectl cnpg status <cluster-name> -n <namespace>
kubectl describe cluster <cluster-name> -n <namespace>
kubectl logs -n <namespace> -l cnpg.io/cluster=<cluster-name>
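Recent versions of the cnpg kubectl plugin can also bundle most of this into a single archive for sharing (command availability depends on your plugin version):
kubectl cnpg report cluster <cluster-name> -n <namespace>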