Once the basic infrastructure has been set up, you are ready to deploy services to the cluster. To do this, you must provide Kubernetes with service descriptions for each service you wish to deploy.
Deploy an Apache ZooKeeper instance to your cluster:
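A minimal service description is sketched below. The resource name simple-zk, the replica count, and the product version are illustrative, and exact CRD field names vary between Stackable operator releases, so check the ZooKeeper operator documentation for your version:

```yaml
# Sketch of a ZookeeperCluster service description for the Stackable
# ZooKeeper operator. Names and versions are placeholders.
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk        # illustrative name
  namespace: default
spec:
  version: 3.8.0         # illustrative product version
  servers:
    roleGroups:
      default:
        replicas: 1
```

Apply it with `kubectl apply -f` as you would any other Kubernetes manifest.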
Next, deploy an Apache Kafka broker that depends on the ZooKeeper service you just deployed. The zookeeperReference property in the broker's service description points to the namespace and name you gave to the ZooKeeper service deployed previously.
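As a sketch (again with illustrative names and versions, and with CRD fields that may differ between operator releases), such a broker description could look like this:

```yaml
# Sketch of a KafkaCluster service description. The zookeeperReference
# points at the previously deployed ZooKeeper service; newer operator
# releases may use a discovery ConfigMap reference instead.
apiVersion: kafka.stackable.tech/v1alpha1
kind: KafkaCluster
metadata:
  name: simple-kafka     # illustrative name
  namespace: default
spec:
  version: 3.4.0         # illustrative product version
  zookeeperReference:
    namespace: default
    name: simple-zk      # namespace/name of the ZooKeeper service
  brokers:
    roleGroups:
      default:
        replicas: 1
```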
The next step is to deploy an Apache NiFi server:
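A NiFi service description follows the same pattern; the sketch below uses placeholder names and versions, and real NifiCluster resources typically require additional settings (for example a sensitive-properties key), so consult the NiFi operator documentation:

```yaml
# Sketch of a NifiCluster service description. Placeholder names and
# versions; real deployments usually need additional required fields.
apiVersion: nifi.stackable.tech/v1alpha1
kind: NifiCluster
metadata:
  name: simple-nifi      # illustrative name
  namespace: default
spec:
  version: 1.18.0        # illustrative product version
  zookeeperReference:
    namespace: default
    name: simple-zk      # namespace/name of the ZooKeeper service
  nodes:
    roleGroups:
      default:
        replicas: 1
```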
You can use `kubectl get pods` to check the status of the services. This returns the status of all pods currently running in the default namespace. Note that the pods only appear once the corresponding services (simple-kafka, simple-nifi, etc.) have been deployed.
Note: The software download from the Stackable repository and deployment of the services will take time because this is the first time that each service has been deployed to these nodes. Your cluster is prepared for use once the pods are in the running state.
Prerequisites: To get your DataPlatformCluster up and running, please make sure you are working within a provisioned Data Center and you have the appropriate permissions. The data center must be created upfront and must be accessible and editable by the user issuing the request. Only Contract Owners, Administrators, or Users with the Manage Dataplatform permission can create a cluster.
Note: To interact with this API, a user-specific authentication token is required. The IONOS CLI can generate this token.
Before using the managed Stackable solution, you need to create a new DataPlatformCluster.
To create a cluster, use the Create DataPlatformCluster API endpoint. The cluster will be provisioned in the data center matching the provided `datacenterId`. The request for cluster creation expects a string value for `dataPlatformVersion`. Currently, only version "23.4" can be used; the list of supported versions can be obtained from the versions API endpoint at https://api.ionos.com/dataplatform/versions.
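As an illustrative sketch, a creation request body might look like the following; the name and datacenterId values are placeholders, and exact property names should be checked against the API reference:

```json
{
  "properties": {
    "name": "example-cluster",
    "datacenterId": "00000000-0000-0000-0000-000000000000",
    "dataPlatformVersion": "23.4"
  }
}
```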
You should see a response similar to this one:
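An abridged sketch of such a response, using the cluster ID referenced below, is shown here; the exact field layout and state values may differ from the live API:

```json
{
  "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "type": "cluster",
  "properties": {
    "name": "example-cluster",
    "dataPlatformVersion": "23.4"
  },
  "metadata": {
    "state": "DEPLOYING"
  }
}
```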
In the response, the `id` field contains the cluster ID of the Managed Stackable cluster that is being created. When we use the term cluster ID, we refer to the value of this field (in this case `3fa85f64-5717-4562-b3fc-2c963f66afa6`).
Provisioning the cluster might take some time, that is, until the cluster reaches the state `AVAILABLE`. To check the current provisioning status, query the API by calling the Get endpoint with the cluster ID.
To deploy and run a Stackable service, the cluster must have enough computational resources. The node pool that is provisioned along with the cluster is reserved for Stackable operators. You may create further node pools with resources tailored to your use case.
To create a new node pool, use the Create DataPlatformNodepool endpoint. This creates a new node pool and assigns the node pool resources exclusively to the defined managed cluster.
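As a sketch, a node pool creation request body might look like the following; the name and sizing values are placeholders, and the exact property names should be verified against the API reference:

```json
{
  "properties": {
    "name": "example-nodepool",
    "nodeCount": 2,
    "coresCount": 4,
    "ramSize": 8192,
    "storageType": "SSD",
    "storageSize": 100
  }
}
```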
Once the DataPlatformCluster has been created, its kubeconfig can be retrieved via the API. The kubeconfig allows you to interact with the provided cluster as with any regular Kubernetes cluster. To protect the deployment of the Stackable distribution, the kubeconfig does not grant you admin rights for the cluster; your actions and deployments are limited to the default namespace.
If you still want to group your deployments, you have the option to create subnamespaces within the default namespace. This is made possible by the concept of Hierarchical Namespaces (HNS). For more information see Introducing Hierarchical Namespaces.
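Assuming the Hierarchical Namespace Controller is available in the cluster, a subnamespace of default can be declared with a SubnamespaceAnchor resource (the name my-workloads is a placeholder):

```yaml
# Declares a subnamespace of "default" via the Hierarchical Namespace
# Controller (HNC). Requires HNC to be present in the cluster.
apiVersion: hnc.x-k8s.io/v1alpha2
kind: SubnamespaceAnchor
metadata:
  namespace: default     # parent namespace
  name: my-workloads     # placeholder subnamespace name
```

If the kubectl-hns plugin is installed, `kubectl hns create my-workloads -n default` creates an equivalent anchor.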
The kubeconfig can be downloaded with the Get Kubeconfig endpoint. This call retrieves the Kubernetes configuration file (kubeconfig) for the specified DataPlatformCluster by its cluster ID.
To make the call, you need to use the cluster ID of the created DataPlatformCluster.
A plain IONOS cluster is ready when `kubectl get pods` shows the operator deployment pods up and running. If all goes well, you will have successfully deployed a Stackable cluster and used it to start three services that should now be ready for you.
We can test ZooKeeper by running the ZooKeeper CLI shell. The easiest way to do this is to run the CLI shell (`bin/zkCli.sh`) on the pod that is running ZooKeeper, for example via `kubectl exec`.
The shell should connect automatically to the ZooKeeper server running on the pod. You can run the `ls /` command to see the list of znodes in the root path, which should include those created by Apache Kafka and Apache NiFi.
More information on how to use Apache ZooKeeper can be found here.
To test Kafka, we'll create a topic and then verify that it exists. Create the topic with Kafka's `kafka-topics.sh` script on the broker pod (for example via `kubectl exec`), then list the topics with the script's `--list` option to confirm it was actually created.
More information on how to use Apache Kafka can be found here.
Apache NiFi provides a web interface, and the easiest way to test it is to view it in a web browser. To access the web interface, we first need the IP address and port that NiFi is listening on. Determine the IP address of the Kubernetes node to connect to (in this case 172.18.0.2), for example with `kubectl get nodes -o wide`, and the node port of the NiFi service (in this case 30247), for example with `kubectl get services`. Then browse to the address of your Kubernetes node on that port, e.g. https://172.18.0.2:30247/nifi, and you should see the NiFi login screen.
To provide some security out of the box, the Apache NiFi operator automatically generates credentials for the admin user with a random password and stores them as a Kubernetes Secret. You can retrieve this password for the admin user with a `kubectl get secret` command.
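The exact secret name depends on the operator release and the name you gave the NiFi service, so the name below is a placeholder. Note that kubectl returns secret data base64-encoded, so the value must be decoded:

```shell
# Placeholder secret name; list secrets with "kubectl get secrets" to
# find the one created by the NiFi operator for your cluster:
#
#   kubectl get secret nifi-admin-credentials-simple \
#       -o jsonpath='{.data.password}' | base64 -d
#
# Secret values are base64-encoded; the decode step works like this:
echo 'c3VwZXJzZWNyZXQ=' | base64 -d   # prints: supersecret
```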
Once you have these credentials you can log in and you should see a blank NiFi canvas.
More information on how to use Apache NiFi can be found here.
Note that IONOS installs and maintains only the Stackable operators; services such as ZooKeeper, Kafka, and NiFi must be deployed by you before they are available for use in the cluster.
Core Technology: Stackable is a universal Big Data distribution that serves as the base technology to orchestrate, deploy, scale, and manage Big Data within your IONOS Cloud infrastructure.
Managed Platform: The platform lets you deploy, scale, and manage your big data tools on IONOS Cloud servers in a fully automated way.
The open and modular approach of this distribution allows you to implement different data stacks on the same platform.
Supported Tools: This platform is based on a preconfigured Kubernetes cluster with pre-installed and fully managed Stackable Operators. Stackable offers the best available open-source tools and lets you bundle them in a way that makes it easy to deploy a fully working software stack. You can directly configure and manage these services in order to build your desired application on top of them.
Managed Service: A Kubernetes-powered Stackable distribution that is ready to run any application available in the Stackable ecosystem.
Compatibility: Customize and manage data stacks flexibly based on the latest open-source software (e.g. Apache Kafka, Apache Spark, Apache NiFi).
Infrastructure as Data: The distribution allows you to configure and customize existing open-source software in a manageable manner.
Security: Managed Stackable adds common authentication and authorization to all delivered products, which is key to building successful customer data platforms.
Stackable Operators are components that translate service definitions deployed via Kubernetes into deployment services on worker nodes. They are pre-installed on a control plane node. Managed Stackable supports the following operators:
| Apache Airflow | Apache Druid | Apache HBase |
| --- | --- | --- |
Managed Stackable consists of two components: DataPlatformCluster and DataPlatformNodePools.
A DataPlatformCluster is the virtual instance of the customer's services and operations, running managed services such as the Stackable operators. A DataPlatformCluster is a Kubernetes cluster in the customer's VDC. It is therefore possible to integrate the cluster with other resources, such as vLANs, to shape the data center to the customer's specifications and to integrate the cluster within the topology the customer wants to build.
In addition to the Kubernetes cluster, a small node pool is provided which is exclusively used to run the Stackable operators.
A DataPlatformNodePool represents the physical machines that a DataPlatformCluster is built on top of. In terms of configuration, all nodes in a node pool are identical. The nodes of a pool are provisioned into virtual data centers at a location of your choice, and you can freely specify the properties of all the nodes in a pool at once before creation.
Note: Nodes in node pools provisioned by the Managed Data Stack Solution Cloud API are read-only in the customer's VDC and can only be modified or deleted via the API.
| Apache Hadoop HDFS | Apache Hive | Apache Kafka |
| --- | --- | --- |
| Apache NiFi | Apache Spark | Apache Superset |
| Apache ZooKeeper | Trino | |