High Availability and Scaling
Cluster options
Single-node cluster: A single-node cluster has only one node, called the primary node. This node accepts customer connections and performs read/write operations. It is a single point of truth as well as a single point of failure.
Multi-node cluster: A multi-node cluster consists of a primary node and one or more standby nodes for maximum resilience. If the primary node fails, the system promotes a standby node to the primary role. Currently, standby nodes operate in warm standby mode and do not serve read requests. Hot standby functionality, which would let standby nodes serve read requests as active replicas, is on the roadmap.
Database scaling
You can scale existing clusters in two ways:
Horizontal scaling: Lets you configure the number of instances that run in parallel.
The number of nodes in a cluster can be increased or decreased.
Increasing the number of instances does not cause disruption. However, decreasing the number of instances may trigger a switchover if the operation removes the current primary node.
Note: Horizontal scaling provides high availability; it does not increase performance.
Vertical scaling: Lets you configure the size of individual instances to handle more data and queries.
You can adjust the number of CPU cores and the amount of memory to match your requirements. Each instance runs on a dedicated node. When you scale up or down, the system creates a new node for each instance.
Once the new node becomes available, the system switches the instance from the old node to the new node and then removes the old node. If the cluster contains multiple nodes, the system performs this process sequentially. It always replaces the standby nodes first, then the primary node, resulting in only one switchover.
When the system performs the switch, it terminates any application connections to the database. The system also aborts all ongoing queries, which causes some disruption. For this reason, you should perform scaling operations outside of peak hours.
You can also increase the storage size. However, you cannot decrease the storage size or change the storage type. The system performs storage increases on the fly without disruption.
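The node replacement order described above for vertical scaling can be sketched as a small Python model. This is an illustration only, not the actual DBaaS implementation; the Node class, the function names, and the node names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str  # "primary" or "standby"


def replacement_order(nodes):
    """Return the nodes in replacement order: standbys first, primary last."""
    standbys = [n for n in nodes if n.role == "standby"]
    primaries = [n for n in nodes if n.role == "primary"]
    return standbys + primaries


def plan_vertical_scaling(nodes):
    """Build the sequential list of steps and count the switchovers."""
    steps = []
    switchovers = 0
    for node in replacement_order(nodes):
        steps.append(f"create new node for {node.name}")
        if node.role == "primary":
            # Only replacing the primary requires a switchover.
            steps.append(f"switchover: move primary role off {node.name}")
            switchovers += 1
        steps.append(f"remove old node {node.name}")
    return steps, switchovers


# Hypothetical three-node cluster.
cluster = [
    Node("node-a", "primary"),
    Node("node-b", "standby"),
    Node("node-c", "standby"),
]
steps, switchovers = plan_vertical_scaling(cluster)
for step in steps:
    print(step)
print("switchovers:", switchovers)
```

Because the primary is replaced last, the model produces exactly one switchover regardless of how many standby nodes the cluster has.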
Replication modes
The synchronization_mode determines how transactions are replicated between multiple nodes before a transaction is confirmed to the client. IONOS Cloud DBaaS supports two modes of replication:
Asynchronous (default)
Strictly Synchronous
In either mode, the transaction is first committed on the leader and then replicated to the standby node(s).
Note:
We recommend choosing either Asynchronous or Strictly Synchronous, as Synchronous replication mode is deprecated and will be fully removed from the Data Center Designer (DCD) in upcoming releases.
You can update existing clusters that use Synchronous mode to either Asynchronous or Strictly Synchronous using the Update PostgreSQL Replication Mode API.
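A minimal sketch of how such an update request could be built in Python. The base URL, the use of PATCH on the cluster resource, and the synchronizationMode property name are assumptions that this page does not confirm; check the API reference before use. The cluster ID and token are placeholders. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumption: base URL of the PostgreSQL DBaaS API (not stated on this page).
API_BASE = "https://api.ionos.com/databases/postgresql"

# Only the two non-deprecated modes are accepted as targets.
ALLOWED_MODES = {"ASYNCHRONOUS", "STRICTLY_SYNCHRONOUS"}


def build_mode_update_request(cluster_id, mode, token):
    """Build (but do not send) a request that changes the replication mode."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported target mode: {mode}")
    # Assumption: the mode is exposed as properties.synchronizationMode.
    body = json.dumps({"properties": {"synchronizationMode": mode}})
    return urllib.request.Request(
        url=f"{API_BASE}/clusters/{cluster_id}",
        data=body.encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder cluster ID and token.
req = build_mode_update_request("cluster-123", "ASYNCHRONOUS", "my-token")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would perform the call; that step is left out so the sketch has no side effects.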
Asynchronous replication
Asynchronous replication does not wait for the standby before confirming a transaction back to the client. Transactions are confirmed after being written to disk on the primary node, and replication takes place in the background. To ensure availability, a cluster in asynchronous mode is allowed to lose some committed but not yet replicated transactions during a failover.
The benefit of asynchronous replication is lower latency. The downside is that recent transactions might be lost if a standby is promoted to primary. The lag between the primary and the standby tends to be a few milliseconds.
Caution: Data loss might happen if the server crashes and the data has not been replicated yet.
Strictly Synchronous replication
Strictly Synchronous replication works like synchronous replication, except that standalone mode is not permitted. This mode prevents PostgreSQL from switching off synchronous replication on the primary when no synchronous standby candidates are available. If no standby is available, the primary accepts no further writes, so this mode sacrifices availability for replicated durability.
If the replication mode is set to synchronous (either strict or non-strict), committed data cannot be lost during a failover, for example after a node failure. The additional benefit of strict replication is that data is not lost if the primary node's storage fails and all standby nodes fail at the same time.
Synchronous replication
Note:
PostgreSQL deprecates SYNCHRONOUS replication mode for new clusters. You can update the existing clusters from SYNCHRONOUS mode to either ASYNCHRONOUS or STRICTLY_SYNCHRONOUS using this API.
Synchronous replication ensures that a transaction is committed to at least one standby before it is confirmed back to the client. This standby is known as the synchronous standby. If the primary node fails, only a synchronous standby can take over as primary, which ensures that committed transactions are not lost during a failover. If the synchronous standby fails and another standby is available, the role of synchronous standby moves to that standby. If no standby is available, the primary can continue in standalone mode. In standalone mode, the primary role cannot change until at least one standby has caught up and regained the role of synchronous standby. Latency is generally higher than with asynchronous replication, but no data is lost during a failover.
At any time there will be at most one synchronous standby. If the synchronous standby fails then another healthy standby is automatically selected as the synchronous standby.
Caution: Turning on non-strict synchronous replication does not guarantee multi-node durability of commits under all circumstances. When no suitable standby is available, the primary node still accepts writes but does not guarantee their replication.
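The difference between the three modes when a client commits can be summarized in a small Python model. This is a simplified sketch of the behavior described above, not the actual implementation; the CommitResult class and the commit function are hypothetical names, and only the mode identifiers come from this page.

```python
from dataclasses import dataclass


@dataclass
class CommitResult:
    accepted: bool    # does the primary accept and confirm the write?
    replicated: bool  # is the commit on a standby before it is confirmed?


def commit(mode, healthy_standbys):
    """Model what the client can rely on when a commit is confirmed."""
    if mode == "ASYNCHRONOUS":
        # Confirmed after the write to disk on the primary; replication
        # happens in the background.
        return CommitResult(accepted=True, replicated=False)
    if mode == "SYNCHRONOUS":
        # Falls back to standalone mode when no standby is available.
        return CommitResult(accepted=True, replicated=healthy_standbys > 0)
    if mode == "STRICTLY_SYNCHRONOUS":
        # No standalone mode: without a standby, writes are refused.
        available = healthy_standbys > 0
        return CommitResult(accepted=available, replicated=available)
    raise ValueError(f"unknown replication mode: {mode}")


for mode in ("ASYNCHRONOUS", "SYNCHRONOUS", "STRICTLY_SYNCHRONOUS"):
    for standbys in (1, 0):
        print(mode, standbys, commit(mode, standbys))
```

Only the strict mode couples acceptance to replication, which is why it trades availability for durability.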
Synchronization mode considerations
The synchronization mode can impact DBaaS in several ways:
| Aspect | Asynchronous | Strictly Synchronous |
| --- | --- | --- |
| Primary failure | A healthy standby will be promoted if the primary node becomes unavailable. | Only standby nodes that contain all confirmed transactions can be promoted. |
| Standby failure | No effect on primary. Standby catches up once it is back online. | At least one standby must be available to accept write requests. There is a short delay in transaction processing if the synchronous standby changes. |
| Consistency model | Strongly consistent (except for lost data). | Strongly consistent (except for lost data). |
| Data loss during failover | Non-replicated data is lost. | Not possible. |
| Data loss during primary storage failure | Non-replicated data is lost. | Not possible. |
| Latency | Limited by the performance of the primary. | Limited by the performance of the primary, the strictly synchronous standby, and the latency between them (usually below 1 ms). |
The performance penalty of synchronous over asynchronous replication depends on the workload. The primary handles transactions the same way in all replication modes, except for COMMIT statements (including implicit transactions). When synchronous replication is enabled, the commit can be confirmed to the client only after it is replicated. Thus, there is a constant latency overhead per transaction, independent of its size or duration.
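Because the overhead is paid once per commit, its relative cost depends on how many transactions a workload issues. The following Python sketch works through the arithmetic with illustrative numbers: the 200 µs of work per statement and the 1000 µs commit overhead are assumptions chosen for the example, not measurements.

```python
def total_time_us(statements, transactions, work_us_per_statement,
                  commit_overhead_us):
    """Total time: work for every statement plus one overhead per commit."""
    return (statements * work_us_per_statement
            + transactions * commit_overhead_us)


STATEMENTS = 1000
WORK_US = 200              # assumed work per statement, in microseconds
SYNC_OVERHEAD_US = 1000    # assumed replication wait per commit

# One statement per transaction: the overhead is paid 1000 times.
one_per_tx = total_time_us(STATEMENTS, 1000, WORK_US, SYNC_OVERHEAD_US)

# 100 statements per transaction: the overhead is paid only 10 times.
batched = total_time_us(STATEMENTS, 10, WORK_US, SYNC_OVERHEAD_US)

# Same work without any replication wait, for comparison.
no_wait = total_time_us(STATEMENTS, 1000, WORK_US, 0)

print("one statement per transaction:", one_per_tx, "us")
print("100 statements per transaction:", batched, "us")
print("no replication wait:", no_wait, "us")
```

Workloads with many small transactions see the largest relative penalty, while long-running transactions barely notice it. Grouping statements into fewer transactions also changes what is rolled back together, so it is not always an option.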
Change the commit guarantees per transaction
By default, the database cluster's replication mode determines the guarantees for a committed transaction. However, some workloads might have very diverse requirements regarding accepted data loss vs performance. To address this need, commit guarantees can be changed on a per-transaction basis. For more information, refer to the PostgreSQL Documentation.
Caution: You cannot enforce a synchronous commit when the cluster is configured to use asynchronous replication. Without a synchronous standby, any setting higher than local is equivalent to local, which does not wait for replication to complete. Instead, you can configure your cluster to use synchronous replication and choose synchronous_commit=local whenever data loss is acceptable.
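A minimal sketch of relaxing the commit guarantee for a single transaction from Python. It relies on the PostgreSQL statement SET LOCAL, which limits a setting to the current transaction. The helper name, the audit_log table, and the RecordingCursor stand-in are hypothetical; with a real driver you would pass a DB-API cursor from a connection in autocommit mode so that the explicit BEGIN and COMMIT are honored.

```python
# Values PostgreSQL accepts for synchronous_commit.
VALID_LEVELS = ("off", "local", "remote_write", "on", "remote_apply")


def run_with_commit_level(cursor, level, statements):
    """Run statements in one transaction with its own synchronous_commit."""
    if level not in VALID_LEVELS:
        raise ValueError(f"invalid synchronous_commit level: {level}")
    cursor.execute("BEGIN")
    try:
        # SET LOCAL lasts only until the end of the current transaction.
        cursor.execute(f"SET LOCAL synchronous_commit TO {level}")
        for sql in statements:
            cursor.execute(sql)
        cursor.execute("COMMIT")
    except Exception:
        cursor.execute("ROLLBACK")
        raise


class RecordingCursor:
    """Stand-in for a database cursor that only records the SQL it is given."""

    def __init__(self):
        self.executed = []

    def execute(self, sql):
        self.executed.append(sql)


cursor = RecordingCursor()
run_with_commit_level(
    cursor,
    "local",
    ["INSERT INTO audit_log (message) VALUES ('low-value event')"],
)
for sql in cursor.executed:
    print(sql)
```

On a cluster that uses synchronous replication, this transaction is confirmed without waiting for the standby, while all other transactions keep the cluster-wide guarantee.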