# Monitor PostgreSQL v2 Clusters using Grafana

**Level:** Intermediate | **Estimated completion time:** 20 minutes

## Overview

This tutorial demonstrates how to configure observability for [<mark style="color:blue;">PostgreSQL v2</mark>](https://docs.ionos.com/cloud/databases/postgresql/api/v2-api) clusters on <code class="expression">space.vars.ionos\_cloud</code> using the [<mark style="color:blue;">Logging Service</mark>](https://docs.ionos.com/cloud/observability/logging-service) and [<mark style="color:blue;">Monitoring Service</mark>](https://docs.ionos.com/cloud/observability/monitoring-service). Once both services are active at the contract level and log and metric collection is enabled on the cluster, metrics and logs stream to a regional Grafana endpoint where you can view pre-built dashboards and configure alert thresholds.

PostgreSQL v2 on <code class="expression">space.vars.ionos\_cloud</code> provides a set of observable metrics that describe the following:

* Cluster health and availability, including primary host status, active instance count, and storage usage across `DBDATA` and `BACKUP` volumes.
* Node-level resource consumption, including CPU time across idle, user, system, and I/O wait states, system load averages, memory availability, disk throughput, IOPS per device, I/O saturation ratios, and filesystem capacity per mount point.
* Network telemetry and log data, including inbound and outbound byte totals per adapter, log line counts, error rates, critical event rates, and per-instance log volume with live filtering by cluster ID in Grafana.

PostgreSQL v2 metrics provide actionable insight into cluster performance, enabling reliable operations, faster incident resolution, and data-driven capacity planning for production deployments.

## Metrics for PostgreSQL v2

The following metrics are available to gauge the health of PostgreSQL v2 clusters.

<details>

<summary><strong>CPU and System Load</strong></summary>

| **Metric**                   | **Description**                                                                                                                                                                                                                                                                    |
| ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **`node_cpu_seconds_total`** | Cumulative CPU time spent in each mode (idle, user, system, I/O wait) per core. Apply `rate()` to convert to per-second usage. Monitor the I/O wait share to detect storage bottlenecks that degrade query execution. Alert if I/O wait exceeds 1% of CPU time. |
| **`node_load1`**             | Average number of runnable or uninterruptible processes over the last minute. Reflects immediate CPU demand on the node. Monitor this metric to detect contention that delays query processing. A sustained value above the number of available CPU cores indicates load pressure. |
| **`node_load5`**             | Average number of runnable or uninterruptible processes over the last 5 minutes. Smooths short-lived spikes to confirm whether a `node_load1` spike represents a genuine load trend. Monitor this metric to distinguish transient spikes from sustained pressure.                  |
| **`node_load15`**            | Average number of runnable or uninterruptible processes over the last 15 minutes. Serves as a long-term trend indicator. Monitor this metric to assess sustained workload growth and plan capacity adjustments for the cluster.                                                    |
| **`node_boot_time_seconds`** | Unix timestamp recording when the node last started. Subtract this value from the current time (`time() - node_boot_time_seconds`) to derive node uptime. Monitor this metric to detect unexpected restarts or failover events in the cluster.                                     |

</details>

<details>

<summary><strong>Memory</strong></summary>

| **Metric**                           | **Description**                                                                                                                                                                                                                                          |
| ------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **`node_memory_MemAvailable_bytes`** | Estimated memory available for new allocations without swapping, accounting for reclaimable memory and buffers. Monitor this metric to detect memory pressure that can slow queries. Alert if the value falls below 10% of `node_memory_MemTotal_bytes`. |
| **`node_memory_MemTotal_bytes`**     | Total physical memory installed on the node. Use the value as the denominator when calculating the memory utilisation ratio (`1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes`). |

</details>

<details>

<summary><strong>Disk I/O</strong></summary>

| **Metric**                             | **Description**                                                                                                                                                                                                                                                                                 |
| -------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **`node_disk_io_time_seconds_total`**  | Cumulative time the disk device was actively processing I/O requests. Apply `rate()` to obtain a utilisation ratio between 0 and 1. Monitor this metric to detect I/O saturation. A ratio above 0.9 indicates saturation that impacts write-ahead log flushing and checkpoint completion times. |
| **`node_disk_read_bytes_total`**       | Cumulative bytes read from the disk device since node start. Apply `rate()` to derive read throughput in bytes per second. Monitor this metric to detect sequential scans or index rebuilds on the data volume.                                                                                 |
| **`node_disk_written_bytes_total`**    | Cumulative bytes written to the disk device since node start. Apply `rate()` to derive write throughput in bytes per second. Monitor this metric to track transaction commit rates, checkpoint activity, and WAL archiving.                                                                     |
| **`node_disk_reads_completed_total`**  | Cumulative read operations successfully completed by the disk device. Apply `rate()` to compute read IOPS. Monitor this metric to detect frequent buffer misses in shared memory that force data retrieval from disk.                                                                           |
| **`node_disk_writes_completed_total`** | Cumulative write operations successfully completed by the disk device. Apply `rate()` to compute write IOPS. Monitor this metric to identify spikes from transaction commits, `autovacuum` runs, or bulk load operations.                                                                       |

</details>

<details>

<summary><strong>Filesystem</strong></summary>

| **Metric**                        | **Description**                                                                                                                                                                                                                         |
| --------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **`node_filesystem_avail_bytes`** | Bytes available to unprivileged processes on the filesystem, monitored separately for the DBDATA and BACKUP volumes. Monitor this metric to prevent write failures. Alert if the value falls below 15% of `node_filesystem_size_bytes`. |
| **`node_filesystem_size_bytes`**  | Total capacity of the filesystem in bytes. Use the value as the denominator for the capacity utilisation ratio (`1 - node_filesystem_avail_bytes / node_filesystem_size_bytes`). Tracked across DBDATA and BACKUP mount points to support storage sizing decisions. |

</details>

<details>

<summary><strong>Network</strong></summary>

| **Metric**                              | **Description**                                                                                                                                                                                                                                              |
| --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **`node_network_receive_bytes_total`**  | Cumulative bytes received on the network adapter since node start. Apply `rate()` to derive inbound throughput in bytes per second. Monitor this metric to detect spikes in client connections, replication traffic, or backup data ingestion.               |
| **`node_network_transmit_bytes_total`** | Cumulative bytes transmitted on the network adapter since node start. Apply `rate()` to derive outbound throughput in bytes per second. Monitor this metric to track large query result sets, logical replication output, or WAL streaming to standby nodes. |

</details>
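
Most of the metrics above are cumulative counters, so the useful signal is their per-second rate. As a rough illustration of what `rate()` computes, the sketch below derives a per-second rate from two samples of a counter. The function name `counter_rate` and the sample values are hypothetical, and the real PromQL function also handles counter resets and window extrapolation, which this sketch ignores.

```bash
#!/usr/bin/env bash
# Hypothetical helper: per-second rate between two samples of a cumulative counter.
# Usage: counter_rate <earlier_value> <later_value> <seconds_between_samples>
counter_rate() {
  awk -v earlier="$1" -v later="$2" -v seconds="$3" \
    'BEGIN { printf "%.2f\n", (later - earlier) / seconds }'
}

# Made-up samples of node_disk_io_time_seconds_total taken 60 seconds apart:
# the device was busy for 57 of those 60 seconds.
counter_rate 1200 1257 60   # prints 0.95, which is above the 0.9 saturation ratio
```

For `node_disk_io_time_seconds_total` the result is a utilisation ratio between 0 and 1; for byte and operation counters the same arithmetic yields bytes per second and IOPS.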

## Recommended alert thresholds

Configure the following alert rules in Grafana to detect the most common PostgreSQL v2 failure modes early:

| **Alert**             | **Threshold**                                           | **What it indicates**                                 |
| --------------------- | ------------------------------------------------------- | ----------------------------------------------------- |
| CPU I/O wait          | Above `1%` (warning), above `2%` (critical)             | Storage bottleneck degrading query execution          |
| Available memory      | Below `10%` of `node_memory_MemTotal_bytes`             | Memory pressure that can slow queries                 |
| Filesystem free space | Below `15%` of `node_filesystem_size_bytes`             | Risk of write failures on DBDATA or BACKUP volumes    |
| Disk I/O saturation   | `rate(node_disk_io_time_seconds_total[5m])` above `0.9` | I/O saturation impacting WAL flushing and checkpoints |
| System load           | `node_load1` above the CPU core count                   | Sustained load pressure delaying query processing     |

Example `PromQL` for the available memory alert:

```promql
(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 < 10
```

Example `PromQL` for read IOPS:

```promql
rate(node_disk_reads_completed_total[5m])
```
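
To sanity-check a percentage threshold outside Grafana, the same ratio can be reproduced with plain arithmetic. The following sketch mirrors the available memory expression above. The helper names `percent_of` and `is_below` are hypothetical and the byte values are made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: part as a percentage of total, two decimal places.
percent_of() {
  awk -v part="$1" -v total="$2" 'BEGIN { printf "%.2f\n", (part / total) * 100 }'
}

# Hypothetical helper: succeeds (exit code 0) when value is strictly below limit.
is_below() {
  awk -v value="$1" -v limit="$2" 'BEGIN { exit !(value < limit) }'
}

# Made-up sample: 1 GiB available out of 16 GiB total.
available_pct=$(percent_of 1073741824 17179869184)   # 6.25

if is_below "$available_pct" 10; then
  echo "ALERT: available memory at ${available_pct}% (threshold 10%)"
fi
```

The same two helpers apply to the filesystem free space threshold by passing `node_filesystem_avail_bytes` and `node_filesystem_size_bytes` values with a limit of 15.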

## Target audience

This tutorial targets database administrators, DevOps engineers, and platform engineers who manage PostgreSQL workloads on <code class="expression">space.vars.ionos\_cloud</code>. Readers benefit from prior experience with REST APIs, Bearer token authentication, and basic `PromQL` or Grafana dashboard concepts.

## What you will learn

By following this tutorial, you will learn how to:

* Activate observability services and enable log and metric collection on a PostgreSQL v2 cluster.
* Access the regional Grafana endpoint and verify that logs and metrics appear correctly.
* Apply recommended alert thresholds for CPU, memory, disk, and I/O metrics.

## Prerequisites

Ensure you have:

* An active <code class="expression">space.vars.ionos\_cloud</code> contract with permissions to manage observability and database resources.
* A valid <code class="expression">space.vars.ionos\_cloud</code> API token exported as `IONOS_TOKEN`.
* A PostgreSQL v2 cluster, or permission to create one. For more information, see [<mark style="color:blue;">PostgreSQL v2 API specification</mark>](https://api.ionos.com/docs/postgresql/v2.ea).
* `curl` or any HTTP client for API calls.
* A web browser to use Grafana.

## Cost considerations

The following resources used in this tutorial are billable:

* **Logging and Monitoring services:** Charges apply per region after activation.
* **PostgreSQL v2 clusters:** Compute, storage, and backup charges apply independent of observability settings.

{% hint style="info" %}
**Note:** To avoid ongoing charges, delete clusters you no longer need and deactivate observability services that are not in use.
{% endhint %}

## Procedure

The activation flow has two independent layers. You activate the Logging and Monitoring services once per region at the contract level, and you toggle log and metric collection per cluster. Both layers must show `true` before data appears in Grafana.

Use the following procedure to configure observability for a PostgreSQL v2 cluster:

{% stepper %}
{% step %}

### Activate the Logging service for a region

Send a request to the regional Logging API endpoint with `enabled` set to `true`. The Logging service activates for the contract in that region.

Refer to the full schema in the [<mark style="color:blue;">Logging API v1 documentation</mark>](https://api.ionos.com/docs/logging/v1/).

{% hint style="info" %}
**Note:** You can also enable Central Logging through the DCD. For more information, see [<mark style="color:blue;">Send Logs to the Platform</mark>](https://docs.ionos.com/cloud/observability/logging-service/quick-start/send-logs-to-platform).
{% endhint %}

```bash
curl --request PUT \
  --url 'https://logging.de-fra.ionos.com/central' \
  --header "Authorization: Bearer $IONOS_TOKEN" \
  --header 'Content-Type: application/json' \
  --data '{
    "properties": {
      "enabled": true
    }
  }'
```

The response returns a resource that includes the `grafanaEndpoint` URL and the `enabled: true` property. The `grafanaEndpoint` value is identical for all customers in that region.
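
If you script the activation, you may want to capture the `grafanaEndpoint` value from the response for later use. The sketch below is one way to do that with standard tools. It assumes only that the response JSON contains a `"grafanaEndpoint": "<url>"` pair on a single line. The function name and the sample response are hypothetical, and a JSON parser such as `jq` is a more robust choice when it is available.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first grafanaEndpoint value found in JSON read from stdin.
# This is a plain text match, so it does not depend on where the key sits in the response.
extract_grafana_endpoint() {
  sed -n 's/.*"grafanaEndpoint"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Illustrative response fragment, not the documented response schema.
sample_response='{"properties": {"enabled": true, "grafanaEndpoint": "https://grafana.logging.fr-par.ionos.com"}}'

printf '%s\n' "$sample_response" | extract_grafana_endpoint
```

In practice you would pipe the output of the activation request into `extract_grafana_endpoint` instead of the sample string. An empty result means the key was not found in the response.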
{% endstep %}

{% step %}

### Activate the Monitoring service for the same region

Repeat the activation against the Monitoring API endpoint. The Monitoring service activates independently from Logging.

Refer to the full schema in the [<mark style="color:blue;">Monitoring API v1 documentation</mark>](https://api.ionos.com/docs/observability-monitoring/v1/).

```bash
curl --request PUT \
  --url 'https://monitoring.de-fra.ionos.com/central' \
  --header "Authorization: Bearer $IONOS_TOKEN" \
  --header 'Content-Type: application/json' \
  --data '{
    "properties": {
      "enabled": true
    }
  }'
```
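
Because the two activation requests differ only in the host name, they can be wrapped in a small loop. The following is a minimal sketch under the assumption that other regions follow the same host pattern as the `de-fra` examples above. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the contract-level endpoint for a service and region.
# Assumes other regions follow the same host pattern as the de-fra examples.
central_url() {
  local service="$1" region="$2"
  printf 'https://%s.%s.ionos.com/central\n' "$service" "$region"
}

# Hypothetical helper: activate one service in one region.
# --fail makes curl return a non-zero exit code on HTTP errors.
# The ${VAR:?message} form aborts with a message if IONOS_TOKEN is unset.
activate_service() {
  curl --fail --silent --show-error --request PUT \
    --url "$(central_url "$1" "$2")" \
    --header "Authorization: Bearer ${IONOS_TOKEN:?IONOS_TOKEN is not set}" \
    --header 'Content-Type: application/json' \
    --data '{"properties": {"enabled": true}}'
}

# Hypothetical helper: activate Logging and Monitoring for the same region.
activate_observability() {
  local region="$1" service
  for service in logging monitoring; do
    activate_service "$service" "$region" || return 1
  done
}

# Usage (sends real requests, so it is left commented out):
# activate_observability de-fra
```

Stopping at the first failed request keeps the two services from ending up in different states without you noticing.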

{% hint style="info" %}
**Note:**

* Activation is regional. Activate Logging and Monitoring in each region where your PostgreSQL clusters run. The `grafanaEndpoint` returned by both services resolves to the same Grafana instance per region, for example `https://grafana.logging.fr-par.ionos.com`.
* You can also activate both services in the DCD. For more information, see [<mark style="color:blue;">Define logging pipeline properties</mark>](https://docs.ionos.com/cloud/observability/logging-service/dcd-how-tos/create-logging-service-pipeline#define-logging-pipeline-properties) and [<mark style="color:blue;">Send Metrics to the Platform</mark>](https://docs.ionos.com/cloud/observability/monitoring-service/quick-start/send-metrics-to-platform).

{% endhint %}
{% endstep %}

{% step %}

### Allow logs and metrics on a PostgreSQL cluster

Set `logsEnabled` and `metricsEnabled` to `true` in the cluster `properties` block. Apply the setting in one of two ways:

* `POST /clusters` to create a new cluster with observability enabled.
* `PUT /clusters/{clusterId}` to update an existing cluster.

There is no separate activation endpoint for the cluster-level toggles.

```json
{
  "properties": {
    "displayName": "javier-is-testing",
    "postgresVersion": "18",
    "instances": 1,
    "cores": 2,
    "ram": 18,
    "storageSize": 32,
    "storageType": "SSD",
    "connections": [
      {
        "datacenterId": "6d2ce7d4-dea2-4ba9-872e-b05329b1041a",
        "lanId": "2",
        "cidr": "10.3.0.18/24"
      }
    ],
    "location": "de/fra",
    "credentials": {
      "username": "mydbuser",
      "password": "<your-password>"
    },
    "maintenanceWindow": {
      "dayOfTheWeek": "Tuesday",
      "time": "13:37:00"
    },
    "synchronizationMode": "ASYNCHRONOUS",
    "connectionPooler": {
      "enabled": true,
      "poolMode": "TRANSACTION"
    },
    "backupLocation": "us",
    "logsEnabled": true,
    "metricsEnabled": true
  }
}
```

Each cluster manages logs and metrics independently. Use these per-cluster toggles to keep test clusters quiet while production clusters stream full telemetry.
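
The request body above can be saved to a file and sent with `curl`. The sketch below shows one way to do this for `PUT /clusters/{clusterId}`. The function names and the `PG_API_URL` variable are hypothetical placeholders; take the real base URL for your region from the PostgreSQL v2 API specification.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the JSON file sets both cluster-level flags to true.
# A plain text check that expects the same key names as the request body above.
has_observability_flags() {
  local body_file="$1"
  grep -Eq '"logsEnabled"[[:space:]]*:[[:space:]]*true' "$body_file" &&
    grep -Eq '"metricsEnabled"[[:space:]]*:[[:space:]]*true' "$body_file"
}

# Hypothetical helper: send the full cluster body with PUT /clusters/{clusterId}.
# PG_API_URL is a placeholder for the PostgreSQL v2 API base URL of your region.
update_cluster() {
  local cluster_id="$1" body_file="$2"
  has_observability_flags "$body_file" || {
    echo "logsEnabled and metricsEnabled must both be true in $body_file" >&2
    return 1
  }
  curl --fail --silent --show-error --request PUT \
    --url "${PG_API_URL:?PG_API_URL is not set}/clusters/${cluster_id}" \
    --header "Authorization: Bearer ${IONOS_TOKEN:?IONOS_TOKEN is not set}" \
    --header 'Content-Type: application/json' \
    --data "@${body_file}"
}

# Usage (sends a real request, so it is left commented out):
# update_cluster "<cluster-id>" cluster.json
```

Checking the flags locally before sending is a convenience only. As the warning in this step explains, the API accepts the flags regardless of whether the regional services are active.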

{% hint style="warning" %}
**Warning:**

* If you set `logsEnabled` or `metricsEnabled` to `true` while the contract-level Logging or Monitoring service is inactive, the API accepts the request without error.
* No data flows to Grafana until you activate the corresponding service in that region. The design prevents activation order from blocking cluster creation.

{% endhint %}
{% endstep %}

{% step %}

### Verify the activation matrix

Both layers must report `true` for data to appear in Grafana. The following matrix summarizes every combination.

| **Logging or monitoring service enabled** | **PostgreSQL cluster logs or metrics enabled** | **Visibility result**         |
| ----------------------------------------- | ---------------------------------------------- | ----------------------------- |
| `true`                                    | `true`                                         | Logs visible, metrics visible |
| `true`                                    | `false`                                        | Not visible                   |
| `false`                                   | `true`                                         | Not visible                   |
| `false`                                   | `false`                                        | Not visible                   |

{% endstep %}

{% step %}

### Access the Grafana dashboard

1\. Open the regional Grafana URL returned by the activation response. The URL follows the pattern `https://grafana.logging.{region}.ionos.com`, for example:

```
https://grafana.logging.fr-par.ionos.com
```

2\. Sign in with your <code class="expression">space.vars.ionos\_cloud</code> contract user credentials. Only administrator users and users explicitly granted access to the observability product can sign in.

{% hint style="info" %}
**Note:** If you have not previously used the observability product, you cannot sign in to Grafana until your permissions propagate.
{% endhint %}

The following default dashboards will be available after you sign in:

**PostgreSQL Logs** shows total log lines, error counts, `FATAL` counts, warning counts, instances sending logs, log volume over time per instance, error rate over time, `FATAL` rate over time, and a live log stream filtered by cluster ID.

![PostgreSQL Logs dashboard in Grafana](/files/4be0K4mA3SKuHIyGwbiI)

**Cluster Metrics** shows cluster info and availability, master host status, disk space for DBDATA and BACKUP volumes, CPU usage, CPU load, memory usage, and data disk bandwidth.

![Cluster Metrics dashboard in Grafana](/files/s6yBQd4p9IdTRIEIOCch)

**Main Metrics Dashboard** shows CPU usage and CPU usage by mode, memory usage with total, used, and available breakdowns, disk I/O throughput, and filesystem usage per mount.

![Main Metrics Dashboard in Grafana](/files/m28qzxzhkopUGwsLkQAI)

Each dashboard supports filters for cluster ID, instance, log level, and maintenance windows.
{% endstep %}

{% step %}

### Review the curated metric catalog

For the full list of node-level metrics grouped by category, see [<mark style="color:blue;">Metrics for PostgreSQL v2</mark>](#metrics-for-postgresql-v2).
{% endstep %}

{% step %}

### Apply recommended alert thresholds

Configure alert rules in Grafana to catch the most common PostgreSQL failure modes. For threshold values and `PromQL` expressions, see [<mark style="color:blue;">Recommended alert thresholds</mark>](#recommended-alert-thresholds).
{% endstep %}

{% step %}

### Final result

The metrics and logs for your PostgreSQL v2 clusters are now available in the Grafana dashboards. For details on the metrics available, see [<mark style="color:blue;">Metrics for PostgreSQL v2</mark>](#metrics-for-postgresql-v2).
{% endstep %}
{% endstepper %}

## Conclusion

You configured end-to-end observability for a PostgreSQL v2 cluster on <code class="expression">space.vars.ionos\_cloud</code>. The two-layer activation model gives you contract-level control over billing and per-cluster control over telemetry granularity. The curated metric catalog and recommended thresholds provide a starting point for production-grade alerting.

## Next steps

* Extend the default dashboards with custom `PromQL` panels for query latency and connection pool saturation.
* Explore the [<mark style="color:blue;">PostgreSQL v2 API reference</mark>](https://api.ionos.com/docs/postgresql/v2.ea) for additional cluster properties such as connection pooler tuning and maintenance windows.

## Troubleshooting

### No data appears in Grafana after activation

Both layers must report `true` before data flows. Verify that the contract-level Logging and Monitoring services are active for the region and that the cluster `properties` block shows `logsEnabled: true` and `metricsEnabled: true`. If either layer is `false`, no data flows regardless of the other layer's state.
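
The rule is a logical AND of the two layers. As a quick illustration, the sketch below prints the outcome for every combination. The function name `data_visible` is hypothetical and the values are passed in by hand, not read from the API.

```bash
#!/usr/bin/env bash
# Hypothetical helper: data is visible only when both layers are "true".
# Usage: data_visible <service_enabled> <cluster_flag_enabled>
data_visible() {
  [ "$1" = "true" ] && [ "$2" = "true" ]
}

# Print every combination of the two layers.
for service in true false; do
  for cluster in true false; do
    if data_visible "$service" "$cluster"; then
      result="visible"
    else
      result="not visible"
    fi
    echo "service=$service cluster=$cluster -> $result"
  done
done
```

Only the first printed line reports `visible`, so when data is missing, check both the regional service and the cluster flag.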

### API accepts `logsEnabled: true` but no data flows

This is expected behaviour: the request is a silent no-op. If you set `logsEnabled` or `metricsEnabled` to `true` while the contract-level service is inactive, the API accepts the request without error, but no data flows to Grafana. Activate the corresponding Logging or Monitoring service for the region, then verify that the cluster property is still set to `true`.

### Grafana sign-in fails or shows a permission error

Only administrator users and users who have been explicitly granted access to the observability product can sign in. If you have never previously interacted with the observability product, your permissions may not have propagated yet. Wait a few minutes and try again, or ask your contract administrator to confirm that observability access has been granted.

### Activation response does not include a `grafanaEndpoint`

A missing `grafanaEndpoint` in the activation response indicates that the request body was malformed or that the service was not activated successfully. Confirm that the request body includes `"enabled": true` inside a valid `properties` object and that the `Content-Type: application/json` header is present. Re-send the request and check the response for an `enabled: true` field alongside the endpoint URL.

## Related topics

* [<mark style="color:blue;">Monitor MongoDB Databases</mark>](/cloud/tutorials/databases/mongodb/monitor-mongodb-databases.md): Parallel tutorial for configuring Grafana observability on MongoDB Enterprise clusters.
* [<mark style="color:blue;">Logging Service</mark>](https://docs.ionos.com/cloud/observability/logging-service): Overview of the Logging Service, including pipeline creation, regional activation, and log storage.
* [<mark style="color:blue;">Monitoring Service</mark>](https://docs.ionos.com/cloud/observability/monitoring-service): Overview of the Monitoring Service, including metric collection, regional activation, and Grafana integration.
* [<mark style="color:blue;">PostgreSQL v2 API reference</mark>](https://api.ionos.com/docs/postgresql/v2.ea): Full API specification for cluster creation, updates, and observability property configuration.

