Backup and Recovery

Backup and Recovery

Backups

PostgreSQL Backups: A cluster can have multiple backups. They are created

  1. When a cluster is created

  2. When the PostgreSQL version is changed to a higher major version

  3. When a Point-In-Time-Recovery operation is conducted.

At any time, Postgres only ships to one backup. We use base backups combined with continuous WAL archiving. A base backup is done via pg_basebackup regularly, and then WAL records are continuously added to the backup. Thus, a backup doesn't represent a point in time but a time range. We keep backups for the last 7 days so recovery is possible for up to one week in the past.

Data is added to the backup in chunks of 16MB or after 30 minutes, whichever comes first. Failures and delays in archiving do not prevent writes to the cluster. If you restore from a backup then only the data that is present in the backup will be restored. This means that you may lose up to the last 30 minutes or 16MB of data if all replicas lose their data at the same time.

You can restore from any backup of any PostgreSQL cluster as long as the backup was created with the same or an older PostgreSQL major version.

Backups are stored encrypted in an IONOS S3 Object Storage bucket in the same region your database is in. Databases in regions without IONOS S3 Object Storage will be backed up to eu-central-2.

Warning: When a database is stopped all transactions since the last WAL segment are written to a (partial) WAL file and shipped to the IONOS S3 Object Storage. This also happens when you delete a database. We provide an additional security timeout of 5 minutes to stop and delete the database gracefully. However, under rare circumstances it could happen that this last WAL Segment is not written to the IONOS S3 Object Storage (e.g. due to errors in the communication with the IONOS S3 Object Storage) and these transactions get lost.

As an additional security mechanism you can check which data has been backed up before deleting the database. To verify which was the last archived WAL segment and at what time it was written you can connect to the database and get information from the pg_stat_archiver.

SELECT now(); # verify server time
SELECT * FROM pg_stat_archiver; # get information about last archival wal and time

The `last_archived_time might be older than 30 minutes (WAL files are created with a specific timeout, see above) which is normal if there is no new data added.

Recovery

We provide Point-in-Time-Recovery (PITR). When recovering from a backup, the user chooses a specific backup and provides a time (optional), so that the new cluster will have all the data from the old cluster up until that time (exclusively). If the time was not provided, the current time will be used.

It is possible to set the recoveryTargetTime to a time in the future. If the end of the backup is reached before the recovery target time is met then the recovery will complete with the latest data available.

Note: WAL records shipping is a continuous process and the backup is continuously catching up with the workload. Should you require that all the data from the old cluster is completely available in the new cluster, stop the workload before recovery.

Failover and Upgrade

Failover procedures

Planned failover: During a failure or planned failover, the client must reconnect to the database. A planned failover is signaled to the client by the closing of the TCP connection on the server. The client must also close the connection and reconnect.

In the event of a failure, the connection might not be closed correctly. The new leader will send a gratuitous ARP packet to update the MAC address in the client's ARP table. Open TCP connections will be reset once the client sends a TCP packet. We recommend re-establishing a connection to the database by using an exponential back-off retry with an initial immediate retry.

Uncontrolled disconnection: Since we do not allow read connections to standby nodes, only primary disconnections are possible. However, uncontrolled disconnections can happen during maintenance windows, a cluster change, and during unexpected situations such as loss of storage disk space. Such disconnections are destructive for the ongoing transactions and also clients should reconnect.

If a node is disconnected from the cluster, then a new node will be created and provisioned. Losing a primary node leads to the same situation when a client should reconnect. Losing a replica is not noticeable to the customer.

PostgreSQL upgrades

IONOS Cloud updates and patches your database cluster to achieve high standards of functionality and security. This includes minor patches for PostgreSQL, as well as patches for the underlying OS. We try to make these updates unnoticeable to your operation. However, occasionally, we might have to restart your PostgreSQL instance to allow the changes to take effect. These interruptions will only occur during the maintenance window for your database, which is a weekly four-hour window.

When your cluster only contains one replica you might experience a short down-time during this maintenance window, while your database instance is being updated. In a replicated cluster, we only update standbys, but we might perform a switchover in order to change the leader node.

Considerations: Updates to a new minor version are always backward compatible. Such updates are done during the maintenance window with no additional actions from the user side.

Major Version Upgrades

Caution: Major changes of the PostgreSQL version are irreversible and can fail. You should read the official migration guide and test major version upgrades with an appropriate development cluster first.

Prerequisites:

  • Read the migration guide from Postgres (e.g. to version 13) and make sure your database cluster can be upgraded

  • Test the upgrade on development cluster with similar / the same data (you can create a new database cluster as a clone of your existing cluster)

  • Prepare for a downtime during the major version upgrade

  • Ensure the database cluster has enough available storage. While the upgrade is space-efficient (i.e. it does not copy the data directory), some temporary data is written to disk.

Before upgrading PostgreSQL major versions, customers should be aware that IONOS Cloud is not responsible for customer data or any utilized postgreSQL functionality. Hence, it is the responsibility of the customer to ensure that the migration to a new PostgreSQL major version does not impact their operations.

As per PostgreSQL official documentation: "New major versions also typically introduce some user-visible incompatibilities, so application programming changes might be required."

Supported Versions

Starting with version 10, PostgreSQL moved to a yearly release schedule, where each major version is supported for 5 years after initial release. You can find more details at https://www.postgresql.org/support/versioning/. We strive to support new versions as soon as possible.

When a major version approaches its end of life (EOL), we will announce the deprecation and removal of the version at least 3 months in advance. About 1 month before the EOL, no new database can be created with the deprecated version (the exact date will be part of the first announcement). When the EOL is reached, not yet upgraded databases will be upgraded in their next maintenance window.

Last updated

Revision created

Minor update from the comments