Overview

Monitoring Service is a cloud-based service that lets you ingest, aggregate, and analyze data to better understand your system's performance and behavior. The service collects data from various parts of your environment into a central system that is responsible for storage, aggregation, and visualization, and for initiating automated responses and alerts when certain conditions are met.

Components of Monitoring Service

The architecture of the Monitoring Service includes the following four main components that can be used to aggregate metrics from various sources, analyze the gathered metrics, and create visualizations for monitoring and report generation.

Data Collection: Data collection involves gathering metrics to track system performance and health. During this phase, metrics are collected by an agent running on each monitored resource. These metrics are then pushed to the Monitoring Service, ensuring that all relevant performance and behavior data from various parts of the environment are efficiently gathered.
Data Aggregation & Processing: Once the data reaches the Monitoring Service, it undergoes aggregation and processing. This step prepares the data for storage by transforming and organizing it to optimize subsequent retrieval and analysis.
Indexing & Storage: Indexing and storage in monitoring services ensure that collected data is organized and accessible for quick retrieval and analysis. Indexing involves creating searchable metadata for the data, enabling efficient queries. Storage involves saving the data in a scalable and durable manner, often using databases or specialized storage systems designed for high write and read performance.
Visualization & Analysis: Visualization in monitoring services involves displaying collected and processed data in an intuitive and accessible format. Users can access their metrics for visualization, reporting, analysis, and alerting through a dedicated Grafana Instance. This provides comprehensive tools for interpreting the collected metrics, enabling informed decision-making and proactive system management.

Features and Benefits

When the Monitoring Service is set up properly and receives a suitable set of metrics, it offers clients the following advantages:

Centralized Metric Aggregation: Users can push metrics from various Prometheus-compatible agents to a central endpoint, where the service aggregates and stores them for comprehensive analysis.
Unified Querying Interface: Metrics and logs (from the logging service) are accessible through a single Grafana instance, allowing users to perform detailed queries and gain insights across all their data sources.
Scalable Architecture: The service is designed to handle extensive volumes of metrics, with the ability to scale up or down according to user needs, ensuring consistent performance regardless of workload.
High-Performance Data Processing: Optimized for speed and efficiency, the service delivers rapid data ingestion, aggregation, and querying, enabling efficient monitoring and analysis.

Benefits

In addition to the features listed above, the service offers the following business-related benefits:

Metrics Unification: Unify metrics across multiple data centers and applications, providing a holistic view of infrastructure and application performance.
Reduced Cost: By centralizing and scaling the monitoring infrastructure, users can lower costs associated with maintaining and operating separate monitoring solutions.
Enhanced Data Consistency: Aggregating all metrics in one place ensures consistent and accurate data, improving the reliability of insights and reporting.
In-Depth Analysis: Leverage Grafana's powerful visualization and exploration capabilities to uncover insights from metric data.

Metric Formats

Various metric formats are employed in monitoring systems to measure performance, consumption, and other software properties. These formats serve as standardized ways to represent and transmit metric data. Here are some notable metric formats used in monitoring systems:

Prometheus format

Prometheus, a widely used monitoring system, defines its own text-based format for metrics. The format supports the metric types Counter, Gauge, Histogram, and Summary.
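Of these types, the Histogram is the least obvious when written out: in the Prometheus text format it appears as a series of cumulative buckets labeled with `le` (less than or equal), plus a `_sum` and a `_count` series. The following is a minimal sketch of that layout, not an excerpt from any agent. The function name, the metric name, and the bucket boundaries are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


def render_histogram(name, observations, bounds):
    """Render observations as Prometheus text-format histogram lines.

    `name` and `bounds` are illustrative; real agents choose their own.
    """
    bounds = sorted(bounds)
    # counts[i] holds the observations that fall into bucket i;
    # the extra last slot collects everything above the largest bound (+Inf).
    counts = [0] * (len(bounds) + 1)
    for value in observations:
        # bisect_left puts a value equal to a bound into that bound's bucket,
        # which matches the "less than or equal" meaning of the le label.
        counts[bisect_left(bounds, value)] += 1

    lines = [f"# TYPE {name} histogram"]
    cumulative = 0
    for bound, count in zip(bounds, counts):
        cumulative += count  # buckets are cumulative, not per-interval
        lines.append(f'{name}_bucket{{le="{bound}"}} {cumulative}')
    cumulative += counts[-1]
    lines.append(f'{name}_bucket{{le="+Inf"}} {cumulative}')
    lines.append(f"{name}_sum {sum(observations)}")
    lines.append(f"{name}_count {len(observations)}")
    return lines


if __name__ == "__main__":
    for line in render_histogram("demo_request_duration_ms",
                                 [50, 200, 200, 1500], [100, 500, 1000]):
        print(line)
```

Counters and Gauges need no such structure: each is a single `name{labels} value` line, with a Counter only ever increasing and a Gauge free to move in both directions.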

JSON format

JSON is a lightweight, easy-to-read data-interchange format.

Warning: JSON data should be compressed using Snappy. Snappy provides fast compression and decompression, which makes storing and transmitting JSON data more efficient.

Metric source classification

From a technical standpoint, metric sources are not divided into distinct categories. Any device, software, or application that can generate and expose Prometheus-compatible metrics counts as a metric source. In most cases, however, the Prometheus agent is the best choice. The following are some common agents that can produce Prometheus-compatible metrics:

Prometheus
Grafana Agent
OpenTelemetry
FluentBit

Metrics examples

Metrics in JSON format

{
  "input": {
    "cpu.0": {
      "records": 8,
      "bytes": 2536
    }
  },
  "output": {
    "stdout.0": {
      "proc_records": 5,
      "proc_bytes": 1585,
      "errors": 0,
      "retries": 0,
      "retries_failed": 0
    }
  }
}

Metrics in Prometheus format

fluentbit_input_records_total{name="cpu.0"} 57 1509150350542
fluentbit_input_bytes_total{name="cpu.0"} 18069 1509150350542
fluentbit_output_proc_records_total{name="stdout.0"} 54 1509150350542
fluentbit_output_proc_bytes_total{name="stdout.0"} 17118 1509150350542
fluentbit_output_errors_total{name="stdout.0"} 0 1509150350542
fluentbit_output_retries_total{name="stdout.0"} 0 1509150350542
fluentbit_output_retries_failed_total{name="stdout.0"} 0 1509150350542
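The two samples above carry the same kind of counters in different shapes. The following sketch shows how the nested JSON form could be flattened into Prometheus-style lines. The naming rule (`<prefix>_<section>_<counter>_total` with the plugin instance as the `name` label) is inferred from comparing the two samples and is an assumption, not a documented mapping. The function name `json_to_prometheus` is hypothetical.

```python
import json

# The JSON sample from this page, embedded so the sketch is self-contained.
SAMPLE_JSON = """
{
  "input": {"cpu.0": {"records": 8, "bytes": 2536}},
  "output": {"stdout.0": {"proc_records": 5, "proc_bytes": 1585,
                          "errors": 0, "retries": 0, "retries_failed": 0}}
}
"""


def json_to_prometheus(doc, prefix="fluentbit", timestamp_ms=None):
    """Flatten {section: {plugin: {counter: value}}} into text-format lines."""
    lines = []
    for section, plugins in doc.items():
        for plugin_name, counters in plugins.items():
            for counter, value in counters.items():
                line = f'{prefix}_{section}_{counter}_total{{name="{plugin_name}"}} {value}'
                if timestamp_ms is not None:
                    # The optional trailing field is a Unix timestamp in milliseconds.
                    line += f" {timestamp_ms}"
                lines.append(line)
    return lines


if __name__ == "__main__":
    for line in json_to_prometheus(json.loads(SAMPLE_JSON)):
        print(line)
```

The values differ from the Prometheus sample above because the two samples were evidently captured at different moments; only the structure is comparable.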

Metric Pipelines

A metric pipeline is an instance or configuration of the Monitoring Service that you create using the REST API. To create an instance of the Monitoring Service, send a request to the regional endpoint for your desired location:

Berlin: https://monitoring.de-txl.ionos.com/pipelines
Frankfurt: https://monitoring.de-fra.ionos.com/pipelines
London: https://monitoring.gb-lhr.ionos.com/pipelines
Paris: https://monitoring.fr-par.ionos.com/pipelines
Logroño: https://monitoring.es-vit.ionos.com/pipelines

When creating a metric pipeline instance, you can define multiple metric streams within each pipeline. Each stream functions as a separate metric source, allowing you to organize and manage different sources of metrics within your monitoring system.

After the pipeline is set up, a unique endpoint is assigned to each pipeline, thus establishing a connection with an independent metric server. This endpoint serves as the designated destination for sending metrics generated by all the metric sources within the pipeline.
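As a rough illustration, a request to one of the regional endpoints listed earlier might be prepared as follows. Only the endpoint URLs come from this page. The use of `POST`, the bearer-token header, and the shape of the payload are assumptions made for the sketch; consult the API reference for the actual request schema. The function name is hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request

# Regional endpoints as listed on this page.
PIPELINE_ENDPOINTS = {
    "Berlin": "https://monitoring.de-txl.ionos.com/pipelines",
    "Frankfurt": "https://monitoring.de-fra.ionos.com/pipelines",
    "London": "https://monitoring.gb-lhr.ionos.com/pipelines",
    "Paris": "https://monitoring.fr-par.ionos.com/pipelines",
    "Logroño": "https://monitoring.es-vit.ionos.com/pipelines",
}


def build_create_pipeline_request(location, token, payload):
    """Prepare (but do not send) a pipeline creation request.

    Assumptions: POST method, bearer-token authentication, JSON body.
    """
    try:
        url = PIPELINE_ENDPOINTS[location]
    except KeyError:
        raise ValueError(f"unknown location: {location!r}") from None
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # The payload below is a hypothetical placeholder, not the real schema.
    request = build_create_pipeline_request(
        "Berlin", "dummy-token", {"name": "example-pipeline"}
    )
    print(request.get_method(), request.full_url)
```

Keeping the location lookup separate from the request makes it easy to reject a mistyped location before any network call is attempted.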

Contract limitations

Each contract has a maximum number of pipelines that you can create. The default limit is 10. If you need a higher limit, contact the IONOS Cloud Support team to discuss your requirements and have the limit adjusted.

Metric Sources

Monitoring Service supports metrics from various Prometheus-compatible metric sources. The agents or exporters you choose push metrics to the selected regional endpoint for ingestion. The collected metrics are processed, stored, and made available for querying and visualization.

Note: You can send Prometheus-compatible metrics from anywhere, as long as you can reach the service endpoint.

Examples provide some suggested and functional Prometheus configurations for our available metric sources.

Prometheus

Prometheus is a monitoring and alerting toolkit that is designed for reliability and scalability. It is a pull-based system that scrapes metrics from instrumented jobs, either directly or via an intermediary push gateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data to either aggregate and record new time series from existing data or generate alerts. We also recommend you try our sample configuration.

Grafana Agent

Grafana Agent is an OpenTelemetry Collector distribution with configuration inspired by Terraform. It is designed to be flexible, performant, and compatible with multiple ecosystems such as Prometheus and OpenTelemetry. We also recommend you try our sample configuration.

OpenTelemetry

OpenTelemetry is a collection of APIs, SDKs, and tools. Use it to instrument, generate, collect, and export telemetry data (metrics) to help you analyze your software’s performance and behavior. We also recommend you try our sample configuration.

FluentBit

FluentBit is a fast, lightweight, and highly scalable logging and metrics processor and forwarder. It is the preferred choice for cloud and containerized environments. We also recommend you try our sample configuration. This repository provides a basic example of how to configure FluentBit to send metrics to the IONOS Monitoring Service.

Metric Types

To understand which types of metrics are available, it helps to start with the kinds of information that can be tracked.

Host-based metrics

These would be anything involved in evaluating the health or performance of an individual machine, disregarding for the moment its application stacks and services.

CPU
Memory
Disk Space
Processes
Network traffic
Storage I/O
System Metrics
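To make the host-based category concrete, the sketch below reads a few such values using only the Python standard library and renders them as gauge-style `name value` lines. It is an illustration of what an agent gathers, not a replacement for one. The metric names (`host_cpu_count` and so on) are hypothetical, and the load average is only reported where the platform provides it.

```python
import os
import shutil


def collect_host_metrics(path=os.path.abspath(os.sep)):
    """Return a few host-level readings as a {metric_name: value} mapping."""
    usage = shutil.disk_usage(path)
    metrics = {
        "host_cpu_count": os.cpu_count() or 0,
        "host_disk_total_bytes": usage.total,
        "host_disk_free_bytes": usage.free,
    }
    # os.getloadavg exists only on Unix-like systems and may fail there too.
    if hasattr(os, "getloadavg"):
        try:
            metrics["host_load1"] = os.getloadavg()[0]
        except OSError:
            pass
    return metrics


def to_gauge_lines(metrics):
    """Render the mapping as 'name value' lines, sorted by metric name."""
    return [f"{name} {value}" for name, value in sorted(metrics.items())]


if __name__ == "__main__":
    for line in to_gauge_lines(collect_host_metrics()):
        print(line)
```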

Application metrics

These are metrics concerned with units of processing or work that depend on the host-level resources, like services or applications. The specific types of metrics to look at depend on what the service is providing, what dependencies it has, and what other components it interacts with.

Error and success rates
Service failures and restarts
Performance and latency of responses
Resource usage
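The first and third items in this list can be derived from very little raw data. The following sketch tracks request outcomes and latencies in memory and computes an error rate and a nearest-rank latency percentile. The class and method names are hypothetical; in practice an instrumentation library would usually maintain these values for you.

```python
import math
from dataclasses import dataclass, field


@dataclass
class RequestStats:
    """In-memory tally of request outcomes and latencies (illustrative only)."""
    successes: int = 0
    errors: int = 0
    latencies_ms: list = field(default_factory=list)

    def record(self, ok, latency_ms):
        """Register one finished request."""
        if ok:
            self.successes += 1
        else:
            self.errors += 1
        self.latencies_ms.append(latency_ms)

    @property
    def error_rate(self):
        """Fraction of requests that failed; 0.0 when nothing was recorded."""
        total = self.successes + self.errors
        return self.errors / total if total else 0.0

    def latency_percentile(self, pct):
        """Nearest-rank percentile of recorded latencies, or None if empty."""
        if not self.latencies_ms:
            return None
        ordered = sorted(self.latencies_ms)
        rank = max(1, math.ceil(pct / 100 * len(ordered)))
        return ordered[rank - 1]


if __name__ == "__main__":
    stats = RequestStats()
    for ok, latency in [(True, 10), (True, 20), (False, 30), (True, 40)]:
        stats.record(ok, latency)
    print(stats.error_rate, stats.latency_percentile(95))
```

Percentiles are generally more informative than averages for latency, because a small number of slow responses can hide behind a healthy-looking mean.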

Network and connectivity metrics

These are important gauges of outward-facing availability but are also essential in ensuring that services are accessible to other machines for any systems that span more than one machine.

Connectivity
Error rates and packet loss
Latency
Bandwidth utilization
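As a small worked example, packet loss and latency can both be summarized from a series of probe results. The sketch below assumes each probe yields either a round-trip time in milliseconds or `None` when no reply arrived. That input convention and the function name are assumptions made for illustration; how the probes are actually sent is out of scope here.

```python
def summarize_probes(round_trips_ms):
    """Summarize probe results.

    round_trips_ms: list of round-trip times in milliseconds, with None
    standing for a probe that received no reply.
    """
    if not round_trips_ms:
        raise ValueError("no probe results")
    replies = [rtt for rtt in round_trips_ms if rtt is not None]
    lost = len(round_trips_ms) - len(replies)
    return {
        "packet_loss": lost / len(round_trips_ms),
        # Latency statistics are undefined when every probe was lost.
        "avg_latency_ms": sum(replies) / len(replies) if replies else None,
        "max_latency_ms": max(replies) if replies else None,
    }


if __name__ == "__main__":
    print(summarize_probes([12.0, None, 15.0, 9.0]))
```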

Server pool metrics

While metrics about individual servers are useful, at scale a service is better represented as the ability of a collection of machines to perform work and respond adequately to requests. This type of metric is in many ways just a higher-level extrapolation of application and server metrics, but the resources in this case are homogeneous servers instead of machine-level components.

Pooled resource usage
Scaling adjustment indicators
Degraded instances
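The three items above can all be derived from per-instance readings. The following sketch takes a mapping of instance name to CPU utilization and returns the pooled usage, a list of degraded instances, and a scaling hint. Treating "degraded" as "utilization above a threshold" is a simplifying assumption, and the threshold values, instance names, and function name are hypothetical.

```python
def summarize_pool(instances, degraded_above=0.9,
                   scale_up_above=0.75, scale_down_below=0.25):
    """Aggregate per-instance utilization (0.0 to 1.0) into pool-level metrics."""
    if not instances:
        raise ValueError("pool is empty")
    pooled = sum(instances.values()) / len(instances)
    degraded = sorted(name for name, used in instances.items()
                      if used > degraded_above)
    if pooled > scale_up_above:
        hint = "scale_up"
    elif pooled < scale_down_below:
        hint = "scale_down"
    else:
        hint = "hold"
    return {"pooled_usage": pooled, "degraded": degraded, "scaling_hint": hint}


if __name__ == "__main__":
    print(summarize_pool({"web-1": 0.5, "web-2": 0.5, "web-3": 0.95}))
```

Note that a pool can be healthy on average while still containing a degraded instance, which is why the two values are reported separately.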

External dependency metrics

Other metrics you may wish to add to your system are those related to external dependencies. Often, services provide status pages or an API to discover service outages, but tracking these within your systems—as well as your actual interactions with the service—can help you identify problems with your providers that may affect your operations.

Service status and availability
Success and error rates
Run rate and operational costs
Resource exhaustion