Once the basic infrastructure has been set up, you are ready to deploy services to the cluster. To do this, you must provide Kubernetes with service descriptions for each service you wish to deploy.
Deploy an Apache ZooKeeper instance to your cluster:
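A minimal service description is sketched below. The resource name simple-zk, the replica count, and the product version are illustrative, and exact CRD field names vary between Stackable operator releases, so check the ZooKeeper operator documentation for your version:

```yaml
# Sketch of a ZookeeperCluster service description for the Stackable
# ZooKeeper operator. Names and versions are placeholders.
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk        # illustrative name
  namespace: default
spec:
  version: 3.8.0         # illustrative product version
  servers:
    roleGroups:
      default:
        replicas: 1
```

Apply it with `kubectl apply -f` as you would any other Kubernetes manifest.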
Next, deploy an Apache Kafka broker that depends on the ZooKeeper service you just deployed. The zookeeperReference property in the broker's service description points to the namespace and name you gave to the ZooKeeper service deployed previously.
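As a sketch (again with illustrative names and versions, and with CRD fields that may differ between operator releases), such a broker description could look like this:

```yaml
# Sketch of a KafkaCluster service description. The zookeeperReference
# points at the previously deployed ZooKeeper service; newer operator
# releases may use a discovery ConfigMap reference instead.
apiVersion: kafka.stackable.tech/v1alpha1
kind: KafkaCluster
metadata:
  name: simple-kafka     # illustrative name
  namespace: default
spec:
  version: 3.4.0         # illustrative product version
  zookeeperReference:
    namespace: default
    name: simple-zk      # namespace/name of the ZooKeeper service
  brokers:
    roleGroups:
      default:
        replicas: 1
```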
The next step is to deploy an Apache NiFi server:
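A NiFi service description follows the same pattern; the sketch below uses placeholder names and versions, and real NifiCluster resources typically require additional settings (for example a sensitive-properties key), so consult the NiFi operator documentation:

```yaml
# Sketch of a NifiCluster service description. Placeholder names and
# versions; real deployments usually need additional required fields.
apiVersion: nifi.stackable.tech/v1alpha1
kind: NifiCluster
metadata:
  name: simple-nifi      # illustrative name
  namespace: default
spec:
  version: 1.18.0        # illustrative product version
  zookeeperReference:
    namespace: default
    name: simple-zk      # namespace/name of the ZooKeeper service
  nodes:
    roleGroups:
      default:
        replicas: 1
```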
You can use `kubectl get pods` to check the status of the services. This returns the status of all pods currently running in the default namespace. Note that the pods only appear once the corresponding services (simple-kafka, simple-nifi, etc.) have been deployed.
Note: The software download from the Stackable repository and deployment of the services will take time because this is the first time that each service has been deployed to these nodes. Your cluster is prepared for use once the pods are in the running state.
Prerequisites: To get your DataPlatformCluster up and running, please make sure you are working within a provisioned Data Center and you have the appropriate permissions. The data center must be created upfront and must be accessible and editable by the user issuing the request. Only Contract Owners, Administrators, or Users with the Manage Dataplatform permission can create a cluster.
Note: To interact with this API, a user-specific authentication token is required. The IONOS CLI can generate this token.
Before using the managed Stackable solution, you need to create a new DataPlatformCluster.
To create a cluster, use the Create DataPlatformCluster API endpoint. The cluster will be provisioned in the data center matching the provided `datacenterId`. The request for cluster creation expects a string value for `dataPlatformVersion`. Currently, only version "23.4" can be used; the list of supported versions can be obtained from the versions API endpoint at https://api.ionos.com/dataplatform/versions.
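As an illustrative sketch, a creation request body might look like the following; the name and datacenterId values are placeholders, and exact property names should be checked against the API reference:

```json
{
  "properties": {
    "name": "example-cluster",
    "datacenterId": "00000000-0000-0000-0000-000000000000",
    "dataPlatformVersion": "23.4"
  }
}
```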
You should see a response similar to this one:
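An abridged sketch of such a response, using the cluster ID referenced below, is shown here; the exact field layout and state values may differ from the live API:

```json
{
  "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "type": "cluster",
  "properties": {
    "name": "example-cluster",
    "dataPlatformVersion": "23.4"
  },
  "metadata": {
    "state": "DEPLOYING"
  }
}
```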
In the response, the `id` field contains the cluster ID of the Managed Stackable cluster that is being created. When we use the term cluster ID, we refer to the value of this field (in this case `3fa85f64-5717-4562-b3fc-2c963f66afa6`).
Provisioning the cluster might take some time, that is, until the cluster reaches the state `AVAILABLE`. To check the current provisioning status, query the API by calling the Get endpoint with the cluster ID.
To deploy and run a Stackable service, the cluster must have enough computational resources. The node pool that is provisioned along with the cluster is reserved for Stackable operators. You may create further node pools with resources tailored to your use case.
To create a new node pool, use the Create DataPlatformNodepool endpoint. This creates a new node pool and assigns the node pool resources exclusively to the defined managed cluster.
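As a sketch, a node pool creation request body might look like the following; the name and sizing values are placeholders, and the exact property names should be verified against the API reference:

```json
{
  "properties": {
    "name": "example-nodepool",
    "nodeCount": 2,
    "coresCount": 4,
    "ramSize": 8192,
    "storageType": "SSD",
    "storageSize": 100
  }
}
```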
Once the DataPlatformCluster has been created, its kubeconfig can be retrieved via the API. The kubeconfig allows you to interact with the provided cluster as with any regular Kubernetes cluster. To protect the deployment of the Stackable distribution, the kubeconfig does not grant you admin rights for the cluster; your actions and deployments are limited to the default namespace.
If you still want to group your deployments, you have the option to create subnamespaces within the default namespace. This is made possible by the concept of Hierarchical Namespaces (HNS). For more information see Introducing Hierarchical Namespaces.
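Assuming the Hierarchical Namespace Controller is available in the cluster, a subnamespace of default can be declared with a SubnamespaceAnchor resource (the name my-workloads is a placeholder):

```yaml
# Declares a subnamespace of "default" via the Hierarchical Namespace
# Controller (HNC). Requires HNC to be present in the cluster.
apiVersion: hnc.x-k8s.io/v1alpha2
kind: SubnamespaceAnchor
metadata:
  namespace: default     # parent namespace
  name: my-workloads     # placeholder subnamespace name
```

If the kubectl-hns plugin is installed, `kubectl hns create my-workloads -n default` creates an equivalent anchor.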
The kubeconfig can be downloaded with the Get Kubeconfig endpoint. This call retrieves the Kubernetes configuration file (kubeconfig) for the specified DataPlatformCluster by its cluster ID.
To make the call, you need to use the cluster ID of the created DataPlatformCluster.
A plain IONOS cluster is ready when `kubectl get pods` shows the operator deployment pods up and running. If all goes well, you will have successfully deployed a Stackable cluster and used it to start three services that should now be ready for you.
We can test ZooKeeper by running the ZooKeeper CLI shell. The easiest way to do this is to run the CLI shell (`bin/zkCli.sh`) on the pod that is running ZooKeeper, for example via `kubectl exec`.
The shell should connect automatically to the ZooKeeper server running on the pod. You can run the `ls /` command to see the list of znodes in the root path, which should include those created by Apache Kafka and Apache NiFi.
More information on how to use Apache ZooKeeper can be found here.
To test Kafka, we'll create a topic and then verify that it exists. Create the topic with Kafka's `kafka-topics.sh` script on the broker pod (for example via `kubectl exec`), then list the topics with the script's `--list` option to confirm it was actually created.
More information on how to use Apache Kafka can be found here.
Apache NiFi provides a web interface, and the easiest way to test it is to view it in a web browser. To access the web interface, we first need the IP address and port that NiFi is listening on. Determine the IP address of the Kubernetes node to connect to (in this case 172.18.0.2), for example with `kubectl get nodes -o wide`, and the node port of the NiFi service (in this case 30247), for example with `kubectl get services`. Then browse to the address of your Kubernetes node on that port, e.g. https://172.18.0.2:30247/nifi, and you should see the NiFi login screen.
To provide some security out of the box, the Apache NiFi operator automatically generates credentials for the admin user with a random password and stores them as a Kubernetes Secret. You can retrieve this password for the admin user with a `kubectl get secret` command.
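The exact secret name depends on the operator release and the name you gave the NiFi service, so the name below is a placeholder. Note that kubectl returns secret data base64-encoded, so the value must be decoded:

```shell
# Placeholder secret name; list secrets with "kubectl get secrets" to
# find the one created by the NiFi operator for your cluster:
#
#   kubectl get secret nifi-admin-credentials-simple \
#       -o jsonpath='{.data.password}' | base64 -d
#
# Secret values are base64-encoded; the decode step works like this:
echo 'c3VwZXJzZWNyZXQ=' | base64 -d   # prints: supersecret
```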
Once you have these credentials you can log in and you should see a blank NiFi canvas.
More information on how to use Apache NiFi can be found here.
Note that IONOS installs and maintains only the Stackable operators; services such as ZooKeeper, Kafka, and NiFi must be deployed by you before they are available for use in the cluster.
Core Technology: Stackable is a universal Big Data distribution that serves as the base technology to orchestrate, deploy, scale, and manage Big Data within your IONOS Cloud infrastructure.
Managed Platform: The platform lets you deploy, scale, and manage your big data tools on IONOS Cloud servers in a fully automated way.
The open and modular approach of this distribution allows you to implement different data stacks on the same platform.
Supported Tools: This platform is based on a preconfigured Kubernetes cluster with pre-installed and fully managed Stackable Operators. Stackable offers the best available open-source tools and lets you bundle them in a way that makes it easy to deploy a fully working software stack. You can directly configure and manage these services in order to build your desired application on top of them.
Managed Service: A Kubernetes-powered Stackable distribution that is ready to run any application available in the Stackable ecosystem.
Compatibility: Customize and manage data stacks flexibly based on the latest open-source software (e.g. Apache Kafka, Apache Spark, Apache NiFi).
Infrastructure as Data: The distribution allows you to configure and customize existing open-source software in a manageable manner.
Security: Managed Stackable adds common authentication and authorization to all delivered products, which is key to building successful customer data platforms.
Stackable Operators are components that translate service definitions deployed via Kubernetes into deployment services on worker nodes. They are pre-installed on a control plane node. Managed Stackable supports the following operators:
| Apache Airflow | Apache Druid | Apache HBase |
| --- | --- | --- |
Managed Stackable consists of two components: DataPlatformCluster and DataPlatformNodePools.
A DataPlatformCluster is the virtual instance of the customer's services and operations, running managed services such as the Stackable operators. A DataPlatformCluster is a Kubernetes cluster in the customer's VDC. It is therefore possible to integrate the cluster with other resources, such as vLANs, to shape the data center to the customer's specifications and to integrate the cluster within the topology the customer wants to build.
In addition to the Kubernetes cluster, a small node pool is provided which is exclusively used to run the Stackable operators.
A DataPlatformNodePool represents the physical machines that a DataPlatformCluster is built on top of. In terms of configuration, all nodes in a node pool are identical. The nodes of a pool are provisioned into virtual data centers at a location of your choice, and you can freely specify the properties of all the nodes in a pool at once before creation.
Note: Nodes in node pools provisioned by the Managed Data Stack Solution Cloud API are read-only in the customer's VDC and can only be modified or deleted via the API.
| Apache Hadoop HDFS | Apache Hive | Apache Kafka |
| --- | --- | --- |
| Apache NiFi | Apache Spark | Apache Superset |
| Apache ZooKeeper | Trino | |