# Postgres

## Prerequisites
- template-operator is installed
- canary-checker is installed
- An S3-compatible object store is available to store logical backups of Postgres clusters
## Deploy
Postgres databases can be deployed using the Zalando Postgres Operator.
`karina.yml`

```yaml
postgresOperator:
  version: v1.3.4.flanksource.1
  backupPassword: passwordForResticBackup # Optional, but can't be changed once the operator is deployed
  defaultBackupRetention: # Optionally specify the backup retention; this can be overridden per cluster in the db.flanksource.com/v1 PostgresqlDB CR, see example below
    keepHourly: 24
    keepDaily: 7
    keepWeekly: 4
    keepMonthly: 6
    keepYearly: 1
templateOperator:
  version: v0.1.11
canaryChecker:
  version: v0.15.1
## Below are optional configurations:
# S3 connection information (to store logical backups of all Postgres clusters)
s3:
  access_key: minio
  secret_key: minio123
  endpoint: http://minio.minio.svc:9000
# Only applicable if you want to use MinIO as the S3 object store
minio:
  version: RELEASE.2020-09-02T18-19-50Z
  access_key: minio
  secret_key: minio123
  replicas: 1
```
Deploy using karina:

```shell
karina deploy postgres-operator -c karina.yml
```
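Once the deploy completes, a quick way to confirm the operator is running is to check its pods (the `postgres-operator` namespace is an assumption based on the examples below):

```shell
# all pods should reach the Running state
kubectl get pods -n postgres-operator
```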
A CRD called PostgresqlDB is used as a wrapper around the core Zalando objects. Once the operator is deployed, you can create a new database:
`db.yml`

```yaml
apiVersion: db.flanksource.com/v1
kind: PostgresqlDB
metadata:
  name: db
  namespace: postgres-operator
spec:
  backup:
    bucket: postgres-backups
    schedule: "0 */4 * * *"
    retention: # Optionally specify the backup retention for this cluster; this overrides the backup retention defined in the operator configuration above
      keepHourly: 10
      keepDaily: 3
      keepWeekly: 2
      keepMonthly: 1
      keepYearly: 1
  cpu: 4000m
  memory: 8Gi
  replicas: 3
  storage:
    size: 200Gi
  parameters:
    archive_mode: "on"
    archive_timeout: 60s
    log_destination: stderr
    max_connections: "600"
    shared_buffers: 2048MB
```
```shell
kubectl apply -f db.yml
```
The template operator will pick up the new db.flanksource.com/v1 object and create the underlying Zalando Postgres objects, CronJobs for backups, and two Canaries: one for backup freshness and another for connectivity to the Postgres instance.
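To see what was generated for the new database, one option is to list the resources by kind (the `postgresql` and `canary` resource names assume the Zalando and canary-checker CRDs are installed as above):

```shell
# Zalando cluster, backup CronJobs and canaries created for the database
kubectl get postgresql,cronjob,canary -n postgres-operator
```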
## Day 2 Tasks
### Failover
### Clone
This command will create a new database cluster restored from WAL backup of another cluster.
```shell
karina db clone --name test1-clone --clone-cluster-name postgres-test1 --clone-timestamp "2020-04-05 14:01:00 UTC"
```
See the `karina db clone` documentation for all command line arguments.
### Backup
This command will perform a logical backup of the given cluster:

```shell
# Run backup once
karina db backup --name test1

# Deploy a cron job to run a backup every day at 04:00 AM
karina db backup --name test1 --schedule "0 4 * * *"
```
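When a schedule is supplied, the backup runs as a CronJob; a quick way to confirm it was created (the namespace and name pattern are assumptions):

```shell
# the CronJob name is derived from the cluster name
kubectl get cronjobs -n postgres-operator | grep test1
```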
See the `karina db backup` documentation for all command line arguments.
### Restore
This command will list all backups of a given cluster:

```shell
karina db backup list --name test1
```
This command will restore a given cluster from a previous logical backup:

```shell
karina db restore "/backup-file-path.sql" --name test1
```
You can also restore a given cluster from a logical backup located in a different bucket:

```shell
karina db restore bucket-name "/backup-file-name.sql" --name test1
```
See the `karina db restore` documentation for all command line arguments.
### Port Forwarding
- Retrieve the password:

```shell
kubectl get secret postgres.postgres-{DB-NAME}.credentials -o json -n postgres-operator | jq -r '.data.password' | base64 -D
```

- Port forward the DB port:

```shell
kubectl port-forward pod/postgres-{DB-NAME}-0 5432 -n postgres-operator
```

- Connect to the database via `localhost:5432`
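For example, connecting with `psql` using the password retrieved in the first step (the `postgres` user and database are defaults and may differ in your cluster):

```shell
# prompts for the password decoded from the secret above
psql -h localhost -p 5432 -U postgres postgres
```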
### Disaster recovery
In the event that the patroni cluster fails completely, the following steps can be followed to reset the cluster state and recover from backup:
Scale down the cluster:
```shell
kubectl scale statefulset broken-cluster --replicas=0
```
Delete the existing persistent volume claims:
```shell
# get list of PVCs
kubectl get pvc | grep broken-cluster

# run the delete command for each
kubectl delete pvc pgdata-broken-cluster-0
```
Create new persistent volume claims:
```shell
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pgdata-broken-cluster-0
  namespace: postgres-operator
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: vsan
  volumeMode: Filesystem
EOF
```
Scale the replicas back up:

```shell
kubectl scale statefulset broken-cluster --replicas=n
```
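To watch for the leader election mentioned below, one option is to follow each pod's Patroni logs (pod names are examples):

```shell
# repeat for each replica, e.g. broken-cluster-1
kubectl logs -f broken-cluster-0 | grep -i leader
```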
At this point, if both pods report `leader: none` in their logs, delete all the endpoints related to the cluster:

```shell
kubectl get endpoints | grep broken-cluster
```
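Then delete each endpoint returned above. The exact names come from your `kubectl get` output; the ones below are illustrative:

```shell
kubectl delete endpoints broken-cluster broken-cluster-repl broken-cluster-config
```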
Once the cluster reports a leader and replicas, a database restore can be run (see `karina db restore` above).
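One way to confirm a leader has been elected, assuming the Spilo image's bundled `patronictl`, is:

```shell
# shows each member's role (Leader / Replica) and state
kubectl exec -it broken-cluster-0 -- patronictl list
```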
### Useful utilities
- `pg_ctl stop` should be used to stop the Postgres server if required. This can only be done after `sv stop patroni` has been run.
- `pg_resetwal` can be used to recover from corrupted or missing WAL files.