1 of 8

Tutorials

AI Model Hub for free: From December 1, 2024 until March 31, 2025, IONOS offers all foundation models of the AI Model Hub for free. Create your contract now and get your AI journey started today!

The IONOS AI Model Hub offers powerful AI capabilities to meet various needs. Here are five pivotal use cases you can implement with this service:

Prerequisite: Authentication Tokens

The IONOS AI Model Hub uses authentication tokens to ensure that only users with corresponding permissions can make use of it.

Key Features:

Authentication tokens are bound to IONOS Public Cloud users.
Usage is billed via the IONOS Public Cloud contract owner responsible for these users.

The Access Management tutorial includes step-by-step instructions for generating an IONOS Public Cloud contract, users, and authentication tokens.

Use Case 1: Text Generation

Text generation models enable advanced language processing tasks, such as content creation, summarization, conversational responses, and question-answering. These models are pre-trained on extensive datasets, enabling high-quality text generation with minimal setup.

Key Features:

Access open-source Large Language Models (LLMs) via an OpenAI-compatible API.
Ensure data privacy with processing confined within Germany.

For step-by-step instructions on text generation, see the Text Generation tutorial.

Use Case 2: Image Generation

Image generation models allow you to create high-quality, detailed images from descriptive text prompts. These models can be used for applications in creative design, marketing visuals, and more.

Key Features:

Generate photorealistic or stylized images based on specific prompts.
Choose from models optimized for authenticity or creative and artistic outputs.

To learn how to implement image generation, see the Image Generation tutorial.

Use Case 3: Text Embeddings

Embedding models allow you to create numerical representations of texts, which are similar if the texts are semantically similar. These models are ideal for applications like text retrieval, comparison, ranking, etc.

Key Features:

Identify texts that answer a query based on semantic similarity between query and potential answer.
Compare texts to determine their semantic closeness or difference.

To learn how to derive embeddings and calculate similarity of texts, see the Text Embeddings tutorial.

Use Case 4: Document Collections

Vector databases enable you to store and query large collections of documents based on semantic similarity. Converting documents into embeddings allows you to perform effective similarity searches, making it ideal for applications like document retrieval and recommendation systems.

Key Features:

Persist documents and search for semantically similar content.
Manage document collections through simple API endpoints.

For detailed instructions, see Document Collections tutorial.

Use Case 5: Retrieval Augmented Generation (RAG)

RAG combines the strengths of foundation models and vector databases. It retrieves the most relevant documents from the database and uses them to augment the output of a foundation model. This approach enriches the responses, making them more accurate and context-aware.

Key Features:

Combine foundation models with additional context retrieved from document collections.
Enhance response accuracy and relevance for user queries.

To learn how to implement Retrieval Augmented Generation, see the Retrieval Augmented Generation tutorial.

Use Case 6: Tool Integration

The IONOS AI Model Hub can be seamlessly integrated into various frontend tools that use Large Language Models or text-to-image models through its OpenAI-compatible API. This integration allows you to leverage foundation models in applications without complex setups. For example, using the tool AnythingLLM, you can configure and connect to the IONOS AI Model Hub to serve as the backend for Large Language Model functionalities.

Key Features:

Easily connect to third-party tools with the OpenAI-compatible API.
Enable custom applications with IONOS-hosted foundation models.

For detailed guidance on integrating with tools, see the Tool Integration tutorial.

These tutorials will guide you through each use case, providing clear and actionable steps to integrate advanced AI capabilities into your applications using the IONOS AI Model Hub.

Access Management

AI Model Hub for free: From December 1, 2024 until March 31, 2025, IONOS offers all foundation models of the AI Model Hub for free. Create your contract using this tutorial now and get your AI journey started today!

The IONOS AI Model Hub grants access to users with authentication tokens. An authentication token is a unique string assigned to a specific user. Do not share your authentication tokens with others; each authentication token grants access to modifying or using the corresponding IONOS solutions.

The IONOS authentication tokens used for the AI Model Hub are bound to users of our public cloud offering. The central advantage of this approach is that existing users can use their authentication tokens to access the AI Model Hub with no or only minor changes. However, new users interested solely in the AI Model Hub must first create a public cloud contract.

This tutorial helps both new and existing users to get access to the AI Model Hub.

Overview

This tutorial is intended for users without prior programming knowledge.

By the end of this tutorial, you will be able to:

Create a new contract for the IONOS Public Cloud offering.
Add and edit users to gain access to the AI Model Hub.
Generate an authentication token for an existing user.

If you have an existing contract with IONOS Public Cloud, you can directly log into the DCD and proceed with adding and editing users to gain access to the AI Model Hub.

Create Public Cloud contract

To create a public cloud contract, proceed as follows:

Open the IONOS signup page.
Select the Country you are living in.
Enter your Email address and a Password you want to use.
Accept the Pricing and Terms and Conditions by marking the corresponding checkbox.
Click Test now.

The page with the details will look similar to this:

You are now informed that you will receive an email from IONOS. Remember to click on the hyperlink specified in this email to validate your email address.

After confirming the email address is valid, enter the following details:

Enter your First Name and Last Name.
Enter your Phone Number.
Click Test now.

Your IONOS Public Cloud contract is now created and can be used to log in.

Activate unlimited access

After logging in, you can with the new contract for the first time, you can neither create new users nor generate authentication tokens. To do this, you have to "activate unlimited access" by entering your contact data:

Click Get full access on the top of the screen.
Enter your Street address, ZIP and the City you life in.
CLick Save and Continue.

After entering the contact data, please specify your payment details:

Select whether to specify your Credit Card data our SEPA Direct Debit data.
Enter the relevant information.
Click Save and Continue

We now manually check whether the data you provided is correct. After this evaluation, we inform you and actiate unlimited access. This process can take up to 24 hours.

Grant users access to the AI Model Hub

Every contract owner has sufficient rights to access the AI Model Hub. In addition, the contract owner can use every API endpoint of the IONOS Public Cloud offering. That is, the contract owner can create new users, setup infrastructure in the IONOS cloud and configure existing setups.

We therefore suggest that you create a new user to be specifically used for the AI Model Hub and grant this user only the rights they will need.

You can create a corresponding user in a few simple steps.

Log into the DCD

You first need to log into the Data Center Designer (DCD) our frontend for our Public Cloud offering:

Open the URL https://dcd.ionos.com in a browser.
Enter the Email address and the Password you specified when creating your Public Cloud contract.
Click Sign in.

The filled screen is similar to:

Open IONOS User Manager

Next open the IONOS User Manager by clicking "Management" -> "Users & Groups".

Create an AI Model Hub user group

In the User Manager you create a new user group:

Click Groups -> + Create.
Enter a name for your user group in the field Group Name.
Click Create.

You have now created a new user group without any rights.

Grant AI Model Hub access rights to your user group

To grant your new user group the necessary access rights:

Select your user group in the left part of the screen.
Your Group Name is now displayed in the upper right area of the screen.
Scroll to the end of the list in the lower right part of the screen.
Select the checkbox for Access and manage AI Model Hub.

Your user group is now granted access to the AI Model Hub.

Create a new user

Next, create a user and add this user to the newly created user group:

Click Users -> + Create.
Enter First Name, Last Name, Email and Password for this user.
Click Create.

The new user is now created.

Grant access to the new user

To add the new user to the user group with access rights to the AI Model Hub:

Click Users in the IONOS User Manager.
Select the newly generated User in the left part of the screen.
Select Groups in the right area of the screen.
Click + Add to Group.
Click on the name of the newly created group.

Your user is now in the user group of the AI Model Hub and can access to corresponding service.

Generate an authentication token

To generate an authentication token log into the Data Center Designer (DCD) our frontend for our Public Cloud offering with the user for which you want to create an authentication token.

In our example, log out of the contract owner’s account and log in with the newly created user account to proceed with token generation.

After logging in, create a token as follows:

Click Management -> Token Management.
Select the Time to Live (TTL). This is the duration for which the authentication token is valid.
Click Generate Token.

On the next screen you are shown the authentication token. Click "Download" to save this token locally or mark and copy it.

Note:

You need to copy your authentication token in this step. It will not be displayed afterwards in any dialog and you have no chance to recover the authentication token, if you missed copying it here.

Summary

In this tutorial, you learned how to create an authentication token for the IONOS AI Model Hub.

Namely, you learned how to:

Create a new contract for the IONOS Public Cloud.
Create a new user with the access rights to use the AI Model Hub.
Create an authentication token to use with the API of the AI Model Hub.

Text Generation

AI Model Hub for free: From December 1, 2024 until March 31, 2025, IONOS offers all foundation models of the AI Model Hub for free. Create your contract now and get your AI journey started today!

The IONOS AI Model Hub offers an OpenAI-compatible API that enables powerful text generation capabilities through foundation models. These Large Language Models (LLMs) can perform a wide variety of tasks, such as generating conversational responses, summaries, and contextual answers, without requiring you to manage hardware or extensive infrastructure.

Supported Text Generation Models

The following models are currently available for text generation, each suited to different applications:

Model Provider

Model Name

Purpose

Teuken Commercial v0.4 Instruct (7B)

Ideal for dialogue use cases and natural language tasks in all 24 EU languages.

Llama 3.3 Instruct (70B)

Ideal for dialogue use cases and natural language tasks: conversational agents, virtual assistants, and chatbots. Exceeds the capabilities of Llama 3.1 Instruct in quality (Llama 3.1 70B) and performance (Llama 3.1 405B).

Llama 3.1 Instruct (8B, 70B and 405B)

Ideal for dialogue use cases and natural language tasks: conversational agents, virtual assistants, and chatbots.

Code Llama Instruct HF (13B)

Focuses on generating different kinds of computer code, understands programming languages.

Mistral Instruct v0.3 (7B), Mixtral (8x7B)

Ideal for: Conversational agents, virtual assistants, and chatbots; Comparison to Llama 3: better with European languages; supports longer context length.

Overview

In this tutorial, you will learn how to generate text using foundation models via the IONOS API. This tutorial is intended for developers with basic knowledge of:

REST APIs
A programming language for handling REST API endpoints (Python and Bash examples are provided)

By the end, you will be able to: 1. Retrieve a list of text generation models available in the IONOS AI Model Hub. 2. Apply prompts to these models to generate text responses, supporting applications like virtual assistants and content creation.

Getting Started with Text Generation

To use text generation models, first set up your environment and authenticate using the OpenAI-compatible API endpoints.

Download the respective code files to easily access text generation-specific scripts and examples and generate the intended output:

Download this Python Notebook file to easily access text generation-specific scripts and examples and generate the intended output.

Download this Python code file to easily access text generation-specific scripts and examples and generate the intended output.

Download this Bash code file to easily access text generation-specific scripts and examples and generate the intended output.

Step 1: Retrieve Available Models

Fetch a list of models to see which are available for your use case:

# Python example to retrieve available models
import requests

IONOS_API_TOKEN = "[YOUR API TOKEN HERE]"

endpoint = "https://openai.inference.de-txl.ionos.com/v1/models"

header = {
    "Authorization": f"Bearer {IONOS_API_TOKEN}", 
    "Content-Type": "application/json"
}
requests.get(endpoint, headers=header).json()

#!/bin/bash

IONOS_API_TOKEN=[YOUR API TOKEN HERE]

curl -H "Authorization: Bearer ${IONOS_API_TOKEN}" \
     --get https://openai.inference.de-txl.ionos.com/v1/models

This query returns a JSON document listing each models name, which you’ll use to specify a model for text generation in later steps.

Step 2: Generate Text with Your Prompt

To generate text, send a prompt to the chat/completions endpoint.

# Python example for text generation
import requests

IONOS_API_TOKEN = "[YOUR API TOKEN HERE]"
MODEL_NAME = "[MODEL NAME HERE]"
PROMPT = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ]

endpoint = "https://openai.inference.de-txl.ionos.com/v1/chat/completions"

header = {
    "Authorization": f"Bearer {IONOS_API_TOKEN}", 
    "Content-Type": "application/json"
}
body = {
    "model": MODEL_NAME,
    "messages": PROMPT,
}
requests.post(endpoint, json=body, headers=header).json()

#!/bin/bash

IONOS_API_TOKEN=[YOUR API TOKEN HERE]
MODEL_NAME=meta-llama/Meta-Llama-3.1-8B-Instruct 
PROMPT='[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
]'

BODY="{ 
    \"model\": \"$MODEL_NAME\",
    \"messages\": $PROMPT
}"
echo $BODY

curl -H "Authorization: Bearer ${IONOS_API_TOKEN}" \
     -H "Content-Type: application/json" \
     -d "$BODY" \
     https://openai.inference.de-txl.ionos.com/v1/chat/completions

Step 3: Extract and Interpret the Result

The returned JSON includes several key fields, most importantly:

choices.[].message.content: The generated text based on your prompt.
usage.prompt_tokens: Token count for the input prompt.
usage.completion_tokens: Token count for the generated output.

Summary

In this tutorial, you learned how to:

Access available text generation models.
Use prompts to generate text responses, ideal for applications such as conversational agents, content creation, and more.

For information on image generation, refer to our dedicated tutorial on text-to-image models.

Image Generation

AI Model Hub for free: From December 1, 2024 until March 31, 2025, IONOS offers all foundation models of the AI Model Hub for free. Create your contract now and get your AI journey started today!

The IONOS AI Model Hub provides an OpenAI-compatible API that enables high-quality image generation using state-of-the-art foundation models. By inputting descriptive prompts, users can create detailed images directly through the API, without the need for managing underlying hardware or infrastructure.

Supported Image Generation Models

The following models are currently available for image generation, each suited to different types of visual outputs:

Model Provider

Model Name

Purpose

Stable Diffusion XL

Generates photorealistic images, ideal for marketing visuals, product mockups, and natural scenes.

FLUX.1-schnell

Generates artistic, stylized images, well-suited for creative projects, digital art, and unique concept designs.

Overview

In this tutorial, you will learn how to generate images using foundation models via the IONOS API. This tutorial is intended for developers with basic knowledge of:

REST APIs
A programming language for handling REST API endpoints (Python and Bash examples are provided)

By the end, you will be able to:

Retrieve a list of available image generation models in the IONOS AI Model Hub.
Use prompts to generate images with these models.

Getting Started with Image Generation

To use image generation models, first set up your environment and authenticate using the OpenAI-compatible API endpoints.

Download the respective code files to easily access image generation-specific scripts and examples and generate the intended output:

Download this Python Notebook file to easily access image generation-specific scripts and examples and generate the intended output.

Download this Python code file to easily access image generation-specific scripts and examples and generate the intended output.

Download this Bash code file to easily access image generation-specific scripts and examples and generate the intended output.

Step 1: Retrieve Available Models

Fetch a list of models to see which are available for your use case:

# Python example to retrieve available models
import requests

IONOS_API_TOKEN = "[YOUR API TOKEN HERE]"

endpoint = "https://openai.inference.de-txl.ionos.com/v1/models"

header = {
    "Authorization": f"Bearer {IONOS_API_TOKEN}", 
    "Content-Type": "application/json"
}
requests.get(endpoint, headers=header).json()

#!/bin/bash

IONOS_API_TOKEN=[YOUR API TOKEN HERE]

curl -H "Authorization: Bearer ${IONOS_API_TOKEN}" \
        --get https://openai.inference.de-txl.ionos.com/v1/models

This query returns a JSON document listing each model's name, which you’ll use to specify a model for image generation in later steps.

Step 2: Generate an Image with Your Prompt

To generate an image, send a prompt to the /images/generations endpoint. Customize parameters like size for the resolution of the output image.

# Python example for image generation
import requests

IONOS_API_TOKEN = "[YOUR API TOKEN HERE]"
MODEL_NAME = "[MODEL NAME HERE]"
PROMPT = "A futuristic cityscape at sunset, highly detailed"

endpoint = "https://openai.inference.de-txl.ionos.com/v1/images/generations"

header = {
    "Authorization": f"Bearer {IONOS_API_TOKEN}", 
    "Content-Type": "application/json"
}
body = {
    "model": MODEL_NAME,
    "prompt": PROMPT,
    "size": "1024x1024"
}
requests.post(endpoint, json=body, headers=header)

#!/bin/bash

IONOS_API_TOKEN=[YOUR API TOKEN HERE]
MODEL_NAME=[MODEL NAME HERE]
PROMPT="A futuristic cityscape at sunset, highly detailed"

BODY="{ 
    \"model\": \"$MODEL_NAME\",
    \"prompt\": \"$PROMPT\",
    \"size\": \"1024x1024\"
}"

curl -H "Authorization: Bearer ${IONOS_API_TOKEN}" \
     -H "Content-Type: application/json" \
     -d "$BODY" \
     https://openai.inference.de-txl.ionos.com/v1/images/generations

Step 3: Extract and Interpret the Result

The returned JSON includes several key fields, most importantly:

data.[].b64_json: The generated image in base64 format.
usage.prompt_tokens: Token count for the input prompt.
usage.total_tokens: Token count for the entire process (usually zero for image generation, as billing is per image).

Summary

In this tutorial, you learned how to:

Access available image generation models.
Use descriptive prompts to generate high-quality images, ideal for applications in design, creative work, and more.

For information on text generation, refer to our dedicated tutorial on text generation models.

Text Embeddings

AI Model Hub for free: From December 1, 2024 until March 31, 2025, IONOS offers all foundation models of the AI Model Hub for free. Create your contract now and get your AI journey started today!

The IONOS AI Model Hub provides an OpenAI-compatible API that enables embedding generation for text input using state-of-the-art embedding models. Embeddings are multi-dimensional vectors, i.e. lists of numerical values. The more semantically similar the text input, the more similar the embeddings.

Supported embedding models

The following models are currently available for embedding generation in the IONOS AI Model Hub, each suited for different use cases:

Model Source

Model Name

Description

Paraphrase Multilingual MPNet base v2

Transformer model supporting several different languages with high performance and short input length (128 tokens).

BAAI Large EN V1.5

Embedding model specific for english, medium sized inputs (512 tokens).

BAAI M3

Multipurpose embedding model for multilingual text(100 working languages) and large documents (8192 tokens).

Embeddings and semantic similarity

One trivial task for each of us is identifying semantically similar concepts. Think, for example, of the following three texts:

"AI Model Hub"
"Micheal Jackson"
"best selling music artists"

For a human, it would be trivial to decide that "Michael Jackson" and "best selling music artists" are semantically similar, while "AI Model Hub" is not.

Embeddings, a central concept in modern foundation models, offer an effective solution for identifying semantically similar texts. An embedding is a numerical vector of the text with one central property: Semantically similar texts have similar vectors.

In this sense, the three texts above could be transferred into embeddings:

"AI Model Hub": (0.10; 0.10)
"Michael Jackson": (0.95; 0.90)
"best selling music artists": (0.96; 0.87)

Embedding vectors typically have dozens to thousands of dimensions, but for simplicity, we use 2D vectors in this example. One could illustrate these embeddings in a chart as follows:

As you can see, the texts "Michael Jackson" and "best selling music arists" are close to each other, while "AI Model Hub" is not.

Embedding models are models which transfer texts into such embeddings. They are available for texts in a single language, multiple languages, images, spoken language, and more. The IONOS AI Model Hub currently supports embedding models for texts in English as well as models for multiple languages.

Additional information

Mathematically, to compare the similarity of two texts, one has to define a "similarity score". A frequently used "similarity score" is the dot product, sometimes called "cosine similarity". Cosine similarity measures the cosine of the angle between two vectors. It returns values between -1 and 1, where concepts semantically similar are close to 1, while concepts very different are close to -1.

Overview

In this tutorial, you will learn how to generate embeddings via the OpenAI compatible API. This tutorial is intended for developers with basic knowledge of:

REST APIs
A programming language for handling REST API endpoints (Python and Bash examples are provided)

By the end, you will be able to:

Retrieve a list of available embedding models in the IONOS AI Model Hub.
Use the API to generate embeddings with these models.
Use the generated embeddings as input to calculate similarity scores.

Getting Started with Embedding Generation

To use embedding models, first set up your environment and authenticate using the OpenAI-compatible API endpoints.

Download the respective code files to easily access embedding-specific scripts and examples and generate the intended output:

Download this Python Notebook file to easily access embedding-specific scripts and examples and generate the intended output.

Download this Python code file to easily access embedding-specific scripts and examples and generate the intended output.

Download this Bash code file to easily access embedding-specific scripts and examples and generate the intended output.

Step 1: Retrieve Available Models

Fetch a list of embedding models to see which models are available for your use case:

# Python example to retrieve available models
import requests

IONOS_API_TOKEN = "[YOUR API TOKEN HERE]"

endpoint = "https://openai.inference.de-txl.ionos.com/v1/models"

header = {
    "Authorization": f"Bearer {IONOS_API_TOKEN}", 
    "Content-Type": "application/json"
}
requests.get(endpoint, headers=header).json()

#!/bin/bash

IONOS_API_TOKEN=[YOUR API TOKEN HERE]

curl -H "Authorization: Bearer ${IONOS_API_TOKEN}" \
        --get https://openai.inference.de-txl.ionos.com/v1/models

Output

      {
         "id":"sentence-transformers/paraphrase-multilingual-mpnet-base-v2",
         "object":"model",
         "created":1677610602,
      },
      {
         "id":"BAAI/bge-m3",
         "object":"model",
         "created":1677610602,
      },
      {
         "id":"BAAI/bge-large-en-v1.5",
         "object":"model",
         "created":1677610602,
      },

This query returns a JSON document listing each model's name, which you’ll use to specify a model for embedding generation in later steps.

Step 2: Generate Embeddings with Your Prompt

To generate an embedding, send the text to the /embeddings endpoint.

# Python example for embedding generation
import requests

IONOS_API_TOKEN = "[YOUR API TOKEN HERE]"
MODEL_NAME = "[MODEL NAME HERE]"
INPUT = ["Michael Jackson", "Metallica"]

endpoint = "https://openai.inference.de-txl.ionos.com/v1/embeddings"

header = {
    "Authorization": f"Bearer {IONOS_API_TOKEN}", 
    "Content-Type": "application/json"
}
body = {
    "model": MODEL_NAME,
    "input": INPUT
}
result = requests.post(endpoint, json=body, headers=header)

#!/bin/bash

IONOS_API_TOKEN=[YOUR API TOKEN HERE]
MODEL_NAME=[MODEL NAME HERE]
INPUT='["Michael Jackson", "Metallica"]'

BODY='{
    "model": "'$MODEL_NAME'",
    "input": '$INPUT'
}'

curl -X POST -H "Authorization: Bearer ${IONOS_API_TOKEN}" \
     -H "Content-Type: application/json" \
     -d "$BODY" \
     https://openai.inference.de-txl.ionos.com/v1/embeddings

Step 3: Calculate Similarity Scores

The returned JSON includes several key fields, most importantly:

data.[..].embedding: The generated embedding as a vector of numeric values.
usage.prompt_tokens: Token count for the input prompt.
usage.total_tokens: Token count for the entire process.

Using python, you can calculate the similarity of two results:

# Python example for similarity scoring
import numpy as np
import requests

IONOS_API_TOKEN = "[YOUR API TOKEN HERE]"
MODEL_NAME = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
INPUT = ["Michael Jackson", "Metallica"]

endpoint = "https://openai.inference.de-txl.ionos.com/v1/embeddings"

header = {
    "Authorization": f"Bearer {IONOS_API_TOKEN}", 
    "Content-Type": "application/json"
}
body = {
    "model": MODEL_NAME,
    "input": INPUT
}
result = requests.post(endpoint, json=body, headers=header).json()

embedding_1 = result['data'][0]['embedding']
embedding_2 = result['data'][1]['embedding']

similarity = np.dot(embedding_1, embedding_2)

# 0.18887

The Embeddings API uses standard HTTP error codes to indicate the outcome of a request. The error codes and their description are as below:

200 OK: The request was successful.
401 Unauthorized: The request was unauthorized.
404 Not Found: The requested resource was not found.
500 Internal Server Error: An internal server error occurred.

Summary

In this tutorial, you learned how to:

Access available embedding models.
Generate embeddings with these models.
Calculate similarity scores using the numpy library.

For information on how to use embeddings in document collections, refer to our dedicated tutorial on Document Collections.

Document Collections

AI Model Hub for free: From December 1, 2024 until March 31, 2025, IONOS offers all foundation models of the AI Model Hub for free. Create your contract now and get your AI journey started today!

The IONOS AI Model Hub API allows you to access vector databases to persist your document collections and find semantically similar documents.

The vector database is used to persist documents in document collections. Each document is any form of pure text. In the document collection not only the input text is persisted, but also a transformation of the input text into an embedding. Each embedding is a vector of numbers. Input texts which are semantically similar have similar embeddings. A similarity search on a document collection finds the most similar embeddings for a given input text. These embeddings and the corresponding input text are returned to the user.

Overview

This tutorial is intended for developers. It assumes you have basic knowledge of:

REST APIs and how to call them
A programming language to handle REST API endpoints (for illustration purposes, the tutorials uses Python and Bash scripting)

By the end of this tutorial, you'll be able to:

Create, delete and query a document collection in the IONOS vector database
Save, delete and modify documents in the document collection and
Answer customer queries using the document collection.

Background

The IONOS AI Model Hub API offers a vector database that you can use to persist text in document collections without having to manage corresponding hardware yourself.
Our AI Model Hub API provides all required functionality without your data being transfered out of Germany.

Before you begin

To get started, you should open your IDE to enter Python code.

Next generate a header document to authenticate yourself against the endpoints of our REST API:

# Python example to specify header

API_TOKEN = [YOUR API TOKEN HERE]
header = {
    "Authorization": f"Bearer {API_TOKEN}", 
    "Content-Type": "application/json"
}

After this step, you have one variable header you can use to access our vector database.

To get started, you should open a terminal and ensure that curl and jq is installed. While curl is essential for communicating with our API service, we use jq throughout our examples the improve the readability of the results of our API.

Download the respective code files to easily access document collection-specific scripts and examples and generate the intended output:

Download this Python Notebook file to easily access document collection-specific scripts and examples and generate the intended output.

Download this Python code file to easily access document collection-specific scripts and examples and generate the intended output.

Download this Bash code file to easily access document collection-specific scripts and examples and generate the intended output.

Manage document collections

In this section you learn how to create a document collection. We will use this document collection to fill it with the data from your knowledge base in the next step.

To track, if something went wrong this section also shows how to:

List existing document collections
Remove document collections
Get meta data of a document collection

Create document collections

To create a document collection, you have to specify the name of the collection and a description and invoke the endpoint to generate document collections:

# Python example to specify header
import requests

COLLECTION_NAME = [ YOUR COLLECTION NAME HERE ]
COLLECTION_DESCRIPTION = [ YOUR COLLECTION DESCRIPTION HERE ]
CHUNK_OVERLAP = [ NUMBER OF OVERLAPPING TOKENS ]
CHUNK_SIZE = [ MAXIMUM NUMBER OF TOKENS PER DOCUMENT ]
EMBEDDING_MODEL = [ YOUR EMBEDDING MODEL HERE ]
DATA_BACKEND = [ THE BACKEND TO USE ]
endpoint = "https://inference.de-txl.ionos.com/collections"
body = {
    "properties": {
        "name": COLLECTION_NAME,
        "description": COLLECTION_DESCRIPTION,
        "chunking": {
            "enabled": True,
            "strategy": {
                "config": {
                    "chunk_overlap": CHUNK_OVERLAP, 
                    "chunk_size": CHUNK_SIZE
                }
            }
        },
        "embedding": {
            "model": EMBEDDING_MODEL
        },
        "engine": {
            "db_type": DATA_BACKEND
        }
    }
}
requests.post(endpoint, json=body, headers=header)

#!/bin/bash

COLLECTION_NAME=[ YOUR COLLECTION NAME HERE ]
COLLECTION_DESCRIPTION=[ YOUR COLLECTION DESCRIPTION HERE ]
CHUNK_OVERLAP=[ NUMBER OF OVERLAPPING TOKENS ]
CHUNK_SIZE=[ MAXIMUM NUMBER OF TOKENS PER DOCUMENT ]
EMBEDDING_MODEL=[ YOUR EMBEDDING MODEL HERE ]
DATA_BACKEND=[ THE BACKEND TO USE ]
BODY='{ 
    "properties": { 
        "name": "'${COLLECTION_NAME}'", 
        "description": "'${COLLECTION_DESCRIPTION}'",
        "chunking": {
            "enabled": True,
            "strategy": {
                "config": {
                    "chunk_overlap": "'${CHUNK_OVERLAP}'", 
                    "chunk_size": "'${CHUNK_SIZE}'"
                }
            }
        },
        "embedding": {
            "model": "'${EMBEDDING_MODEL}'"
        },
        "engine": {
            "db_type": "'${DATA_BACKEND}'"
        }
    } 
}'

curl -H "Authorization: Bearer ${API_TOKEN}" \
        -H "Content-Type: application/json" \
        -d "$BODY" \
        https://inference.de-txl.ionos.com/collections

You can specify the following parameters when you create your document collection:

Chunking

The AI Model Hub supports fixed-length chunking. If you apply chunking, long documents are split before being uploaded into the document collection. Is is beneficial in the following cases:

If your document exceeds the length of the text, your embedding model can cope with.
If your document spans over different semantic topics.

You can control chunking using:

CHUNK_OVERLAP: The number of overlaping tokens in two subsequent chunks.
CHUNK_SIZE: The maximum number of tokens per chunk.

Embedding model

The AI Model Hub supports different embedding models. You can use any of them when saving your documents to the document collection by setting the parameter EMBEDDING_MODEL.

Database engine

The AI Model Hub supports different databases in the backend to persist your data. Upon creation of your collection, you can choose which of them to use by setting DATA_BACKEND to:

pgvector: PGVector uses a PostgreSQL database as the backend to persist the document collection. We offer the corresponding PostgreSQL as a Database as a Service (DBaaS) offering. This allows you to scale as your demands grow by switching from the managed PostgreSQL to your own DBaaS instance.
chromadb: ChromaDB is a state-of-the-art database optimized for persisting document collections. It is strongly optimized but does not support relational database features like PostgreSQL.

If you remove the chunking, embedding, and engine sections from the body of your request, we will create a document collection with default parameters. Our approach does not apply chunking; sentence transformers will be used as the embedding models, and the data will be stored in our managed ChromaDB.

If the creation of the document collection was successful, the status code of the request is 201 and it returns a JSON document with all relevant information concerning the document collection.

To modify the document collection you need its identifier. You can extract it from the returned JSON document in the variable id.

List existing document collections

To ensure that the previous step went as expected, you can list the existing document collections.

To retrieve a list of all document collections saved by you:

# Python example to list existing document collections
import requests

endpoint = "https://inference.de-txl.ionos.com/collections"
requests.get(endpoint, headers=header).json()

#!/bin/bash

curl -H "Authorization: Bearer ${API_TOKEN}" \
        -H "Content-Type: application/json" \
        https://inference.de-txl.ionos.com/collections

This query returns a JSON document consisting of your document collections and corresponding meta information

The result consists of 8 attributes per collection of which 3 are relevant for you:

id: The identifier of the document collection
properties.description: The textual description of the document collection
properties.documentsCount: The number of documents persisted in the document collection

If you have not created a collection yet, the field items is an empty list.

Remove a document collection

If the list of document collections consists of document collections you do not need anymore, you can remove a document collection by invoking:

# Python example to delete collections
import requests

COLLECTION_ID = [ YOUR COLLECTION ID HERE ]
endpoint = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}"
requests.delete(endpoint, headers=header)

#!/bin/bash
COLLECTION_ID=[ YOUR COLLECTION ID HERE ]
curl -H "Authorization: Bearer ${API_TOKEN}" \
     -X DELETE \
     -i https://inference.de-txl.ionos.com/collections/${COLLECTION_ID}

This query returns a status code which indicates whether the deletion was successful:

204: Status code for successfull deletion
404: Status code given the collection did not exist

Get meta data for a document collection

If you are interested in the meta data of a collection, you can extract it by invoking:

# Python example to extract meta information
import requests

COLLECTION_ID = [ YOUR COLLECTION ID HERE ]
endpoint = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}"
result = requests.get(endpoint, headers=header)
result.status_code

#!/bin/bash

COLLECTION_ID=[ YOUR COLLECTION ID HERE ]

curl -H "Authorization: Bearer ${API_TOKEN}" \
     https://inference.de-txl.ionos.com/collections/${COLLECTION_ID}

This query returns a status code which indicates whether the collection exists:

200: Status code if the collection exists
404: Status code given the collection does not exist

The body of the request consists of all meta data of the document collection.

Manage documents in document collection

In this section, you learn how to add documents to the newly created document collection. To validate your insertion, this section also shows how to

List the documents in the document collection,
Get meta data for a document,
Update an existing document and
Prune a document collection.

Add documents to document collection

To add an entry to the document collection, you need to at least specify the content, the name of the content and the contentType:

# Python example to add documents to a collection
import requests
import base64

COLLECTION_ID = [ YOUR COLLECTION ID HERE ]
CONTENT = [ YOUR CONTENT HERE ]
NAME = [ YOUR NAME HERE]
content_base64 = base64.b64encode(CONTENT.encode('utf-8')).decode("utf-8")
body = { 
    "items": [{ 
        "properties": { 
            "name": NAME, 
            "contentType": "text/plain", 
            "content": content_base64
        }
    }]
}
endpoint = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}/documents"
requests.put(endpoint, json=body, headers=header)

#!/bin/bash

COLLECTION_ID=[ YOUR COLLECTION ID HERE ]
CONTENT=[ YOUR CONTENT HERE ]
NAME=[ YOUR NAME HERE]

CONTENT_BASE64=$(echo -n ${CONTENT} | base64)
BODY='{
    "items": [{
        "properties": {
            "name": "'${NAME}'", 
            "contentType": "text/plain", 
            "content": "'${CONTENT_BASE64}'"
        }
    }]
}'

curl -H "Authorization: Bearer ${API_TOKEN}" \
     -H "Content-Type: application/json" \
     -d "$BODY" \
     -X PUT \
     https://inference.de-txl.ionos.com/collections/${COLLECTION_ID}/documents

Note:

You need to encode your content using base64 prior to adding it to the document collection. This is done here in line 7 of the source code. We imply a document limit of 65535 characters for each document you upload. Please ensure that your documents do not exceed this limit.

This request returns a status code 200 if adding the document to the document collection was successful.

List existing documents in document collection

To ensure that the previous step went as expected, you can list the existing documents of your document collection.

To retrieve a list of all documents in the document collection saved by you:

# Python example to list all existing documents in a document collections
import requests

COLLECTION_ID = [ YOUR COLLECTION ID HERE ]

endpoint_coll = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}"
endpoint = f"{endpoint_coll}/documents"

requests.get(endpoint, headers=header).json()

#!/bin/bash

COLLECTION_ID=[ YOUR COLLECTION ID HERE ]

curl -H "Authorization: Bearer ${API_TOKEN}" \
     -H "Content-Type: application/json" \
     https://inference.de-txl.ionos.com/collections/${COLLECTION_ID}/documents

This query returns a JSON document consisting of your documents in the document collection and corresponding meta information

The result has a field items with all documents in the collection. This field consists of 10 attributes per entry of which 5 are relevant for you:

id: The identifier of the document
properties.content: The base64 encoded content of the document
properties.name: The name of the document
properties.description: The description of the document
properties.labels.number_of_tokens: The number of tokens in the document

If you have not created the collection yet, the request will return a status code 404. It will return a JSON document with the field items set to an empty list if no documents were added yet.

Get meta data for a document

If you are interested in the metadata of a document, you can extract it by invoking:

# Python example to access meta data from a document
import requests

COLLECTION_ID = [ YOUR COLLECTION ID HERE ]
DOCUMENT_ID = [ YOUR DOCUMENT ID HERE ]

endpoint_coll = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}"
endpoint = f"{endpoint_coll}/documents/{DOCUMENT_ID}"

requests.get(endpoint, headers=header).json()

#!/bin/bash

COLLECTION_ID=[ YOUR COLLECTION ID HERE ]
DOCUMENT_ID=[ YOUR DOCUMENT ID HERE ]

curl -H "Authorization: Bearer ${API_TOKEN}" \
     -H "Content-Type: application/json" \
     https://inference.de-txl.ionos.com/collections/${COLLECTION_ID}/documents/${DOCUMENT_ID}

This query returns a status code which indicates whether the document exists:

200: Status code if the document exists
404: Status code given the document does not exist

The body of the request consists of all meta data of the document.

Update a document

If you want to update a document, invoke:

# Python example to update a document
import requests
import base64

COLLECTION_ID = [ YOUR COLLECTION ID HERE ]
DOCUMENT_ID = [ YOUR DOCUMENT ID HERE ]
CONTENT = [ YOUR CONTENT HERE ]
NAME = [ YOUR NAME HERE ]

endpoint_coll = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}"
endpoint = f"{endpoint_coll}/documents/{DOCUMENT_ID}"
content_base64 = base64.b64encode(CONTENT.encode('utf-8')).decode("utf-8")
body = { 
    "properties": { 
        "id": DOCUMENT_ID, 
        "name": NAME, 
        "contentType": 
        "text/plain", 
        "content": content_base64
    }
}

requests.put(endpoint, json=body, headers=header)

#!/bin/bash

COLLECTION_ID=[ YOUR COLLECTION ID HERE ]
DOCUMENT_ID=[ YOUR DOCUMENT ID HERE ]
CONTENT=[ YOUR CONTENT HERE ]
NAME=[ YOUR NAME HERE ]
CONTENT_BASE64=$(echo -n ${CONTENT} | base64)
BODY='{
    "properties": {
        "name": "'${NAME}'", 
        "id": "'${DOCUMENT_ID}'", 
        "contentType": "text/plain", 
        "content": "'${CONTENT_BASE64}'"
    }
}'
curl -H "Authorization: Bearer ${API_TOKEN}" \
     -H "Content-Type: application/json" \
     -d "$BODY" \
     -X PUT \
     https://inference.de-txl.ionos.com/collections/${COLLECTION_ID}/documents/${DOCUMENT_ID}

This will replace the existing entry in the document collection with the given id by the payload of this request.

Prune a document collection

If you want to remove all documents from a document collection invoke:

# Python example to prune a document collection
import requests

COLLECTION_ID = [ YOUR COLLECTION ID HERE ]
endpoint_coll = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}"
endpoint = f"{endpoint_coll}/documents"
requests.delete(endpoint, headers=header)

#!/bin/bash

COLLECTION_ID=[ YOUR COLLECTION ID HERE ]
curl -H "Authorization: Bearer ${API_TOKEN}" \
     -X DELETE \
     -i https://inference.de-txl.ionos.com/collections/${COLLECTION_ID}/documents

This query returns the status code 204 if pruning the document collection was successful.

Query documents in the document collection

Finally, this section shows how to use the document collection and the contained documents to answer a user query.

To retrieve the documents relevant for answering the user query, invoke the query endpoint as follows:

# Python example to retrieve documents for answering a user query
import requests
import base64

COLLECTION_ID = [ YOUR COLLECTION ID HERE ]
USER_QUERY = [ USER QUERY HERE ]
NUM_OF_DOCUMENTS = [ NUMBER OF DOCUMENTS TO CONSIDER HERE ]

endpoint = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}/query"
body = {"query": USER_QUERY, "limit": NUM_OF_DOCUMENTS }

relevant_documents = requests.post(endpoint, json=body, headers=header)

[
    base64.b64decode(entry['document']['properties']['content']).decode()
    for entry in relevant_documents.json()['properties']['matches']
]

#!/bin/bash

COLLECTION_ID=[ YOUR COLLECTION ID HERE ]
USER_QUERY=[ USER QUERY HERE ]
NUM_OF_DOCUMENTS=[ NUMBER OF DOCUMENTS TO CONSIDER HERE ]
BODY='{"query": "'${USER_QUERY}'", "limit": "'${NUM_OF_DOCUMENTS}'" }'

RESULT=$(curl -H "Authorization: Bearer ${API_TOKEN}" \
     -H "Content-Type: application/json" \
     -d "$BODY" \
     -X POST \
     https://inference.de-txl.ionos.com/collections/${COLLECTION_ID}/query)

echo "$RESULT" | \
jq -r '.["properties"]["matches"][].["document"]["properties"]["content"]' | \
base64 --decode

This will return a list of the NUM_OF_DOCUMENTS most relevant documents in your document collection for answering the user query.

Summary

In this tutorial you learned how to use the IONOS AI Model Hub API to conduct semantic similarity searches using our vector database.

Namely, you learned how to:

Create a necessary document collection in the vector database and modify it
Insert your documents into the document collection and modify the documents
Conduct semantic similarity searches using your document collection.

Retrieval Augmented Generation

AI Model Hub for free: From December 1, 2024 until March 31, 2025, IONOS offers all foundation models of the AI Model Hub for free. Create your contract now and get your AI journey started today!

The IONOS AI Model Hub allows you to combine Large Language Models and a vector database to implement Retrieval Augmented Generation use cases.

Retrieval Augmented Generation is an approach that allows you to teach an existing Large Language Model, such as LLama or Mistral, to answer not only based on the knowledge the model learned during training, but also based on the knowledge you specified yourself.

Retrieval Augmented Generation uses two components:

a Large Language Model (we offer corresponding models for text generation) and
Document Collections

If one of your users queries your Retrieval Augmented Generation system, you first get the most similar documents from the corresponding document collection. Second, you ask the Large Language Model to answer the query by using both the knowledge it was trained on and the most similar documents from your document collection.

Overview

This tutorial is intended for developers. It assumes you have basic knowledge of:

REST APIs and how to call them
A programming language to handle REST API endpoints (for illustration purposes, the tutorials use Python and Bash scripting)

You should also be familiar with the IONOS:

Text Generation
Document Collections

By the end of this tutorial, you'll be able to: Answer customer queries using a Large Language Model which adds data from your document collections to the answers.

Background

The IONOS AI Model Hub API offers both document collections and Large Language Models that you can use to implement retrieval augmented generation without having to manage corresponding hardware yourself.
Our AI Model Hub API provides all required functionality without your data being transferred out of Germany.

Before you begin

To get started, set up a document collection using Document Collections and get the identifier of this document collection.

You will need this identifier in the subsequent steps.

To get started, you should open your IDE to enter Python code.

Next generate a header document to authenticate yourself against the endpoints of our REST API:

# Python example to specify header

API_TOKEN = [YOUR API TOKEN HERE]
header = {
    "Authorization": f"Bearer {API_TOKEN}", 
    "Content-Type": "application/json"
}

After this step, you have one variable header you can use to access our vector database.

To get started, you should open a terminal and ensure that curl and jq are installed. While curl is essential for communicating with our API service, we use jq throughout our examples the improve the readability of the results of our API.

Download the respective code files to easily access retrieval augmented generation-specific scripts and examples and generate the intended output:

Download this Python Notebook file to easily access retrieval augmented generation-specific scripts and examples and generate the intended output.

Download this Python code file to easily access retrieval augmented generation-specific scripts and examples and generate the intended output.

Download this Bash code file to easily access retrieval augmented generation-specific scripts and examples and generate the intended output.

Access list of available Large Language Models

To retrieve a list of Large Language Models supported by the IONOS AI Model Hub API enter:

# Python example to get all available models
import requests

API_TOKEN = [ YOUR API TOKEN HERE ]
header = {
    "Authorization": f"Bearer {API_TOKEN}", 
    "Content-Type": "application/json"
}
requests.get("https://inference.de-txl.ionos.com/models", headers=header).json()

#!/bin/bash

API_TOKEN=[ YOUR API TOKEN HERE ]

curl -H "Authorization: Bearer ${API_TOKEN}" \
     --get https://inference.de-txl.ionos.com/models

This query returns a JSON document consisting of all foundation models and corresponding meta information.

The JSON document consists an entry items*. This is a list of all available foundation models. Of the 7 attributes per foundation model 3 are relevant for you:

id: The identifier of the foundation model
properties.description: The textual description of the model
properties.name: The name of the model

Note:

The identifiers for the foundation models differ between our API for Retrival Augmented Generation and for the image generation and text generation endpoints compatible with OpenAI.

From the list you generated in the previous step, choose the model you want to use and the id. You will use this id in the next step to use the foundation model.

Manual retrieval augmented generation

This section shows how to use the document collection and the contained documents to answer a user query.

Step 1: Retrieve relevant documents

To retrieve the documents relevant to answering the user query, invoke the query endpoint as follows:

# Python example to retrieve relevant documents
import requests
import base64

COLLECTION_ID = [ YOUR COLLECTION ID HERE ]
USER_QUERY = [ USER QUERY HERE ]
NUM_OF_DOCUMENTS = [ NUMBER OF DOCUMENTS TO CONSIDER HERE ]

endpoint = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}/query"
body = {"query": USER_QUERY, "limit": NUM_OF_DOCUMENTS }
relevant_documents = requests.post(endpoint, json=body, headers=header)

relevant_documents_decoded = [
    base64.b64decode(entry['document']['properties']['content']).decode()
    for entry in relevant_documents.json()['properties']['matches']
]

#!/bin/bash

COLLECTION_ID=[ YOUR COLLECTION ID HERE ]
USER_QUERY=[ USER QUERY HERE ]
NUM_OF_DOCUMENTS=[ NUMBER OF DOCUMENTS TO CONSIDER HERE ]

BODY='{"query": "'$USER_QUERY'", "limit": '$NUM_OF_DOCUMENTS' }'
RESULT=$(\
    curl -H "Authorization: Bearer ${API_TOKEN}" \
         -H "Content-Type: application/json" \
         -d "$BODY" \
         https://inference.de-txl.ionos.com/collections/${COLLECTION_ID}/query \
)
RELEVANT_DOCUMENTS_DECODED=$( \
    echo "$RESULT" | \
    jq -r '.["properties"]["matches"][].["document"]["properties"]["content"]' | \
    base64 --decode \
)

This will return a list of the NUM_OF_DOCUMENTS most relevant documents in your document collection for answering the user query.

Step 2: Generate final answer

Now, combine the user query and the result from the document collection in one prompt:

# Python example to retrieve relevant documents
import requests

MODEL_ID = [ YOUR MODEL ID HERE]
endpoint = f"https://inference.de-txl.ionos.com/models/{MODEL_ID}/predictions"
prompt = f"""
    <|begin_of_text|><|start_header_id|>system<|end_header_id|>
    Please use the information specified as context to answer the question.
    Formulate you answer in one sentence and be an honest AI.<|eot_id|>
    <|begin_of_text|><|start_header_id|>context<|end_header_id|>
    {"; ".join(relevant_documents_decoded)}<|eot_id|>
    <|start_header_id|>user<|end_header_id|>
    {USER_QUERY}<|eot_id|>
    <|start_header_id|>assistant<|end_header_id|>
"""
body = { "properties": {"input": prompt} }
requests.post(endpoint, json=body, headers=header).json()

#!/bin/bash

MODEL_ID=[ YOUR MODEL ID HERE]
QUERY="<|begin_of_text|><|start_header_id|>system<|end_header_id|> \
    Please use the information specified as context to answer the question. \
    Formulate you answer in one sentence and be an honest AI.<|eot_id|> \
    <|begin_of_text|><|start_header_id|>context<|end_header_id|> \
    ${RELEVANT_DOCUMENTS_DECODED}<|eot_id|> \
    <|start_header_id|>user<|end_header_id|> \
    ${USER_QUERY}<|eot_id|> \
    <|start_header_id|>assistant<|end_header_id|>"
BODY='{ "properties": {"input": "'${QUERY}'"} }'
RESULT=$( \
    curl -H "Authorization: Bearer ${API_TOKEN}" \
            -H "Content-Type: application/json" \
            -d "$BODY" \
            https://inference.de-txl.ionos.com/models/${MODEL_ID}/predictions \
)

The result will be a JSON-Document consisting of the answer to the customer and some meta information. You can access it in the field at properties.output

Note:

The best prompt strongly depends on the Large Language Model used. You might need to adapt your prompt to improve results.

Automated Retrieval Augmented Generation

The IONOS AI Model Hub allows for automating the process described above. Namely, by specifying the collection ID and the collection query directly to our foundation model endpoint, it first queries the document collection and returns it in a variable which you can then directly use in your prompt. This section describes how to do this.

Apply combined retrieval augmented generation prompt to foundation model

To implement a Retrieval Augmented Generation use case with only one prompt, you have to invoke the /predictions endpoint of the Large Language Model you want to use and send the prompt as part of the body of this query:

# Python example to retrieve relevant documents
import requests

MODEL_ID = [ YOUR MODEL ID HERE]
COLLECTION_ID = [ YOUR COLLECTION ID HERE]
USER_QUERY = [ USER QUERY HERE ]

endpoint = f"https://inference.de-txl.ionos.com/models/{MODEL_ID}/predictions"
body = { "properties": {
    "input": f"""
    <|begin_of_text|><|start_header_id|>system<|end_header_id|>
    Please use the information specified as context to answer the question.
    Formulate you answer in one sentence and be a honest AI.<|eot_id|>
    <|begin_of_text|><|start_header_id|>context<|end_header_id|>
    {{{{.context}}}}<|eot_id|>
    <|start_header_id|>user<|end_header_id|>
    {{{{.collection_query}}}} <|eot_id|>
    <|start_header_id|>assistant<|end_header_id|>
    """,
    "collectionId": COLLECTION_ID,
    "collectionQuery": USER_QUERY,
    "options": {  
        "max_length": "500",  
        "temperature": "0.01"
    }  
}}
requests.post(endpoint, json=body, headers=header).json()

#!/bin/bash

MODEL_ID=[ YOUR MODEL ID HERE]
COLLECTION_ID=[ YOUR COLLECTION ID HERE]
USER_QUERY=[ USER QUERY HERE ]

QUERY="<|begin_of_text|><|start_header_id|>system<|end_header_id|> \
    Please use the information specified as context to answer the question. \
    Formulate your answer in one sentence and be an honest AI.<|eot_id|> \
    <|begin_of_text|><|start_header_id|>context<|end_header_id|> \
    {{.context}}<|eot_id|> \
    <|start_header_id|>user<|end_header_id|> \
    {{.collection_query}} <|eot_id|> \
    <|start_header_id|>assistant<|end_header_id|>"
BODY='{ "properties": { 
    "input": "'${QUERY}'", 
    "collectionId": "'${COLLECTION_ID}'",
    "collectionQuery": "'${USER_QUERY}'",
    "options": { 
        "max_length": "500", 
        "temperature": "0.01"
    } 
}}'

curl -H "Authorization: Bearer ${API_TOKEN}" \
     -H "Content-Type: application/json" \
     -d "$BODY" \
     https://inference.de-txl.ionos.com/models/${MODEL_ID}/predictions

This query conducts all steps necessary to answer a user query using Retrieval Augmented Generation:

The user query (saved at collectionQuery) is sent to the collection (specified at collectionId).
The results of this query are saved in a variable .context, while the user query is saved in a variable .collection_query. You can use both variables in your prompt.
The example prompt uses the variables .context and .collection_query to answer the customer query.

Note:

The best prompt strongly depends on the Large Language Model used. You might need to adapt your prompt to improve results.

Summary

In this tutorial, you learned how to use the IONOS AI Model Hub API to implement Retrieval Augmented Generation use cases.

Namely, you learned how to: Derive answers to user queries using the content of your document collection and one of the IONOS foundation models.

Tool Integration

AI Model Hub for free: From December 1, 2024 until March 31, 2025, IONOS offers all foundation models of the AI Model Hub for free. Create your contract now and get your AI journey started today!

The IONOS AI Model Hub provides an OpenAI-compatible API, allowing seamless integration with various frontend tools that use Large Language Models (LLMs). This guide walks you through the setup process, using AnythingLLM as an example tool.

By the end of this tutorial, you will be able to configure AnythingLLM to use the IONOS AI Model Hub as its backend for AI-powered responses.

Step 1: Get an Authentication Token

You will need an authentication token to access the IONOS AI Model Hub. For more information about how to generate your token in the IONOS DCD, see Generate authentication token.

Save this token in a secure place, as you’ll need to enter it into AnythingLLM during setup.

Step 2: Select a Language Model

The IONOS AI Model Hub offers a variety of Large Language Models to suit different needs. Choose the model that best fits your use case from the table below:

Foundation Model

Model Name

Purpose

Teuken v0.4 Instruct, 7B

openGPT-X/Teuken-7B-instruct-commercial-v0.4

Ideal for conversational agents and virtual assistants in any of the 24 EU languages.

Llama 3.3 Instruct, 70B

meta-llama/Llama-3.3-70B-Instruct

Ideal for more complex conversational agents and virtual assistants.

Llama 3.1 Instruct, 8B

meta-llama/Meta-Llama-3.1-8B-Instruct

Suitable for general-purpose dialogue and language tasks.

Llama 3.1 Instruct, 70B

meta-llama/Meta-Llama-3.1-70B-Instruct

Ideal for more complex conversational agents and virtual assistants.

Llama 3.1 Instruct, 405B

meta-llama/Meta-Llama-3.1-405B-Instruct-FP8

Optimized for extensive dialogue tasks, supporting large context windows.

Mistral Instruct v0.3, 7B

mistralai/Mistral-7B-Instruct-v0.3

Designed for conversational agents, with enhanced European language support.

Mixtral, 8x7B

mistralai/Mixtral-8x7B-Instruct-v0.1

Supports multilingual interactions and is optimized for diverse contexts.

During setup, you’ll enter the model’s "Model Name" value into AnythingLLM’s configuration.

Step 3: Obtain the Base URL

For connecting to the IONOS AI Model Hub, use the following Base URL for the OpenAI-compatible API:

    https://openai.inference.de-txl.ionos.com/v1

You will enter this URL in the configuration settings of AnythingLLM.

Step 4: Configure AnythingLLM

With your authentication token, selected model name, and base URL in hand, you’re ready to set up AnythingLLM:

Open AnythingLLM and go to the configuration page for the Large Language Model (LLM) settings.
- In AnythingLLM, this can be accessed by clicking the wrench icon in the lower left corner, then navigating to AI Providers -> LLM.
Choose Generic OpenAI as the provider.
Enter the following information in the respective fields:
- API Key: Your IONOS authentication token.
- Model Name: The name of the model you selected from the table (e.g., meta-llama/Meta-Llama-3.1-8B-Instruct).
- Base URL: https://openai.inference.de-txl.ionos.com/v1

Your screen should look similar to the image below:

Click Save Changes to apply the settings.

From now on, AnythingLLM will use the IONOS AI Model Hub as its backend, enabling AI-powered functionality based on your chosen Large Language Model.

Summary

This guide provides a straightforward path for integrating the IONOS AI Model Hub into third-party frontend tools using the OpenAI-compatible API. For other tools and more advanced configurations, the steps will be similar: generate an API key, select a model, and configure the tool’s API settings.

Document Collections

AI Model Hub for free: From December 1, 2024 until March 31, 2025, IONOS offers all foundation models of the AI Model Hub for free. Create your contract now and get your AI journey started today!

The IONOS AI Model Hub API allows you to access vector databases to persist your document collections and find semantically similar documents.

Overview

This tutorial is intended for developers. It assumes you have basic knowledge of:

REST APIs and how to call them
A programming language to handle REST API endpoints (for illustration purposes, the tutorials uses Python and Bash scripting)

By the end of this tutorial, you'll be able to:

Create, delete and query a document collection in the IONOS vector database
Save, delete and modify documents in the document collection and
Answer customer queries using the document collection.

Background

The IONOS AI Model Hub API offers a vector database that you can use to persist text in document collections without having to manage corresponding hardware yourself.
Our AI Model Hub API provides all required functionality without your data being transfered out of Germany.

Before you begin

To get started, you should open your IDE to enter Python code.

Next generate a header document to authenticate yourself against the endpoints of our REST API:

# Python example to specify header

API_TOKEN = [YOUR API TOKEN HERE]
header = {
    "Authorization": f"Bearer {API_TOKEN}", 
    "Content-Type": "application/json"
}

After this step, you have one variable header you can use to access our vector database.

Download the respective code files to easily access document collection-specific scripts and examples and generate the intended output:

Download this Python Notebook file to easily access document collection-specific scripts and examples and generate the intended output.

44KB

ai-model-hub-document-collections.ipynb

Download this Python code file to easily access document collection-specific scripts and examples and generate the intended output.

5KB

ai-model-hub-document-collections.py

Download this Bash code file to easily access document collection-specific scripts and examples and generate the intended output.

5KB

ai-model-hub-document-collections.sh

Manage document collections

In this section you learn how to create a document collection. We will use this document collection to fill it with the data from your knowledge base in the next step.

To track, if something went wrong this section also shows how to:

List existing document collections
Remove document collections
Get meta data of a document collection

Create document collections

To create a document collection, you have to specify the name of the collection and a description and invoke the endpoint to generate document collections:

# Python example to specify header
import requests

COLLECTION_NAME = [ YOUR COLLECTION NAME HERE ]
COLLECTION_DESCRIPTION = [ YOUR COLLECTION DESCRIPTION HERE ]
CHUNK_OVERLAP = [ NUMBER OF OVERLAPPING TOKENS ]
CHUNK_SIZE = [ MAXIMUM NUMBER OF TOKENS PER DOCUMENT ]
EMBEDDING_MODEL = [ YOUR EMBEDDING MODEL HERE ]
DATA_BACKEND = [ THE BACKEND TO USE ]
endpoint = "https://inference.de-txl.ionos.com/collections"
body = {
    "properties": {
        "name": COLLECTION_NAME,
        "description": COLLECTION_DESCRIPTION,
        "chunking": {
            "enabled": True,
            "strategy": {
                "config": {
                    "chunk_overlap": CHUNK_OVERLAP, 
                    "chunk_size": CHUNK_SIZE
                }
            }
        },
        "embedding": {
            "model": EMBEDDING_MODEL
        },
        "engine": {
            "db_type": DATA_BACKEND
        }
    }
}
requests.post(endpoint, json=body, headers=header)

#!/bin/bash

COLLECTION_NAME=[ YOUR COLLECTION NAME HERE ]
COLLECTION_DESCRIPTION=[ YOUR COLLECTION DESCRIPTION HERE ]
CHUNK_OVERLAP=[ NUMBER OF OVERLAPPING TOKENS ]
CHUNK_SIZE=[ MAXIMUM NUMBER OF TOKENS PER DOCUMENT ]
EMBEDDING_MODEL=[ YOUR EMBEDDING MODEL HERE ]
DATA_BACKEND=[ THE BACKEND TO USE ]
BODY='{ 
    "properties": { 
        "name": "'${COLLECTION_NAME}'", 
        "description": "'${COLLECTION_DESCRIPTION}'",
        "chunking": {
            "enabled": True,
            "strategy": {
                "config": {
                    "chunk_overlap": "'${CHUNK_OVERLAP}'", 
                    "chunk_size": "'${CHUNK_SIZE}'"
                }
            }
        },
        "embedding": {
            "model": "'${EMBEDDING_MODEL}'"
        },
        "engine": {
            "db_type": "'${DATA_BACKEND}'"
        }
    } 
}'

curl -H "Authorization: Bearer ${API_TOKEN}" \
        -H "Content-Type: application/json" \
        -d "$BODY" \
        https://inference.de-txl.ionos.com/collections

You can specify the following parameters when you create your document collection:

Chunking

The AI Model Hub supports fixed-length chunking. If you apply chunking, long documents are split before being uploaded into the document collection. Is is beneficial in the following cases:

If your document exceeds the length of the text, your embedding model can cope with.
If your document spans over different semantic topics.

You can control chunking using:

CHUNK_OVERLAP: The number of overlaping tokens in two subsequent chunks.
CHUNK_SIZE: The maximum number of tokens per chunk.

Embedding model

The AI Model Hub supports different embedding models. You can use any of them when saving your documents to the document collection by setting the parameter EMBEDDING_MODEL.

Database engine

The AI Model Hub supports different databases in the backend to persist your data. Upon creation of your collection, you can choose which of them to use by setting DATA_BACKEND to:

pgvector: PGVector uses a PostgreSQL database as the backend to persist the document collection. We offer the corresponding PostgreSQL as a Database as a Service (DBaaS) offering. This allows you to scale as your demands grow by switching from the managed PostgreSQL to your own DBaaS instance.
chromadb: ChromaDB is a state-of-the-art database optimized for persisting document collections. It is strongly optimized but does not support relational database features like PostgreSQL.

If the creation of the document collection was successful, the status code of the request is 201 and it returns a JSON document with all relevant information concerning the document collection.

To modify the document collection you need its identifier. You can extract it from the returned JSON document in the variable id.

List existing document collections

To ensure that the previous step went as expected, you can list the existing document collections.

To retrieve a list of all document collections saved by you:

# Python example to list existing document collections
import requests

endpoint = "https://inference.de-txl.ionos.com/collections"
requests.get(endpoint, headers=header).json()

#!/bin/bash

curl -H "Authorization: Bearer ${API_TOKEN}" \
        -H "Content-Type: application/json" \
        https://inference.de-txl.ionos.com/collections

This query returns a JSON document consisting of your document collections and corresponding meta information

The result consists of 8 attributes per collection of which 3 are relevant for you:

id: The identifier of the document collection
properties.description: The textual description of the document collection
properties.documentsCount: The number of documents persisted in the document collection

If you have not created a collection yet, the field items is an empty list.

Remove a document collection

If the list of document collections consists of document collections you do not need anymore, you can remove a document collection by invoking:

# Python example to delete collections
import requests

COLLECTION_ID = [ YOUR COLLECTION ID HERE ]
endpoint = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}"
requests.delete(endpoint, headers=header)

#!/bin/bash
COLLECTION_ID=[ YOUR COLLECTION ID HERE ]
curl -H "Authorization: Bearer ${API_TOKEN}" \
     -X DELETE \
     -i https://inference.de-txl.ionos.com/collections/${COLLECTION_ID}

This query returns a status code which indicates whether the deletion was successful:

204: Status code for successfull deletion
404: Status code given the collection did not exist

Get meta data for a document collection

If you are interested in the meta data of a collection, you can extract it by invoking:

# Python example to extract meta information
import requests

COLLECTION_ID = [ YOUR COLLECTION ID HERE ]
endpoint = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}"
result = requests.get(endpoint, headers=header)
result.status_code

#!/bin/bash

COLLECTION_ID=[ YOUR COLLECTION ID HERE ]

curl -H "Authorization: Bearer ${API_TOKEN}" \
     https://inference.de-txl.ionos.com/collections/${COLLECTION_ID}

This query returns a status code which indicates whether the collection exists:

200: Status code if the collection exists
404: Status code given the collection does not exist

The body of the request consists of all meta data of the document collection.

Manage documents in document collection

In this section, you learn how to add documents to the newly created document collection. To validate your insertion, this section also shows how to

List the documents in the document collection,
Get meta data for a document,
Update an existing document and
Prune a document collection.

Add documents to document collection

To add an entry to the document collection, you need to at least specify the content, the name of the content and the contentType:

# Python example to add documents to a collection
import requests
import base64

COLLECTION_ID = [ YOUR COLLECTION ID HERE ]
CONTENT = [ YOUR CONTENT HERE ]
NAME = [ YOUR NAME HERE]
content_base64 = base64.b64encode(CONTENT.encode('utf-8')).decode("utf-8")
body = { 
    "items": [{ 
        "properties": { 
            "name": NAME, 
            "contentType": "text/plain", 
            "content": content_base64
        }
    }]
}
endpoint = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}/documents"
requests.put(endpoint, json=body, headers=header)

#!/bin/bash

COLLECTION_ID=[ YOUR COLLECTION ID HERE ]
CONTENT=[ YOUR CONTENT HERE ]
NAME=[ YOUR NAME HERE]

CONTENT_BASE64=$(echo -n ${CONTENT} | base64)
BODY='{
    "items": [{
        "properties": {
            "name": "'${NAME}'", 
            "contentType": "text/plain", 
            "content": "'${CONTENT_BASE64}'"
        }
    }]
}'

curl -H "Authorization: Bearer ${API_TOKEN}" \
     -H "Content-Type: application/json" \
     -d "$BODY" \
     -X PUT \
     https://inference.de-txl.ionos.com/collections/${COLLECTION_ID}/documents

Note:

This request returns a status code 200 if adding the document to the document collection was successful.

List existing documents in document collection

To ensure that the previous step went as expected, you can list the existing documents of your document collection.

To retrieve a list of all documents in the document collection saved by you:

# Python example to list all existing documents in a document collections
import requests

COLLECTION_ID = [ YOUR COLLECTION ID HERE ]

endpoint_coll = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}"
endpoint = f"{endpoint_coll}/documents"

requests.get(endpoint, headers=header).json()

#!/bin/bash

COLLECTION_ID=[ YOUR COLLECTION ID HERE ]

curl -H "Authorization: Bearer ${API_TOKEN}" \
     -H "Content-Type: application/json" \
     https://inference.de-txl.ionos.com/collections/${COLLECTION_ID}/documents

This query returns a JSON document consisting of your documents in the document collection and corresponding meta information

The result has a field items with all documents in the collection. This field consists of 10 attributes per entry of which 5 are relevant for you:

id: The identifier of the document
properties.content: The base64 encoded content of the document
properties.name: The name of the document
properties.description: The description of the document
properties.labels.number_of_tokens: The number of tokens in the document

If you have not created the collection yet, the request will return a status code 404. It will return a JSON document with the field items set to an empty list if no documents were added yet.

Get meta data for a document

If you are interested in the metadata of a document, you can extract it by invoking:

# Python example to access meta data from a document
import requests

COLLECTION_ID = [ YOUR COLLECTION ID HERE ]
DOCUMENT_ID = [ YOUR DOCUMENT ID HERE ]

endpoint_coll = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}"
endpoint = f"{endpoint_coll}/documents/{DOCUMENT_ID}"

requests.get(endpoint, headers=header).json()

#!/bin/bash

COLLECTION_ID=[ YOUR COLLECTION ID HERE ]
DOCUMENT_ID=[ YOUR DOCUMENT ID HERE ]

curl -H "Authorization: Bearer ${API_TOKEN}" \
     -H "Content-Type: application/json" \
     https://inference.de-txl.ionos.com/collections/${COLLECTION_ID}/documents/${DOCUMENT_ID}

This query returns a status code which indicates whether the document exists:

200: Status code if the document exists
404: Status code given the document does not exist

The body of the request consists of all meta data of the document.

Update a document

If you want to update a document, invoke:

# Python example to update a document
import requests
import base64

COLLECTION_ID = [ YOUR COLLECTION ID HERE ]
DOCUMENT_ID = [ YOUR DOCUMENT ID HERE ]
CONTENT = [ YOUR CONTENT HERE ]
NAME = [ YOUR NAME HERE ]

endpoint_coll = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}"
endpoint = f"{endpoint_coll}/documents/{DOCUMENT_ID}"
content_base64 = base64.b64encode(CONTENT.encode('utf-8')).decode("utf-8")
body = { 
    "properties": { 
        "id": DOCUMENT_ID, 
        "name": NAME, 
        "contentType": 
        "text/plain", 
        "content": content_base64
    }
}

requests.put(endpoint, json=body, headers=header)

#!/bin/bash

COLLECTION_ID=[ YOUR COLLECTION ID HERE ]
DOCUMENT_ID=[ YOUR DOCUMENT ID HERE ]
CONTENT=[ YOUR CONTENT HERE ]
NAME=[ YOUR NAME HERE ]
CONTENT_BASE64=$(echo -n ${CONTENT} | base64)
BODY='{
    "properties": {
        "name": "'${NAME}'", 
        "id": "'${DOCUMENT_ID}'", 
        "contentType": "text/plain", 
        "content": "'${CONTENT_BASE64}'"
    }
}'
curl -H "Authorization: Bearer ${API_TOKEN}" \
     -H "Content-Type: application/json" \
     -d "$BODY" \
     -X PUT \
     https://inference.de-txl.ionos.com/collections/${COLLECTION_ID}/documents/${DOCUMENT_ID}

This will replace the existing entry in the document collection with the given id by the payload of this request.

Prune a document collection

If you want to remove all documents from a document collection invoke:

# Python example to prune a document collection
import requests

COLLECTION_ID = [ YOUR COLLECTION ID HERE ]
endpoint_coll = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}"
endpoint = f"{endpoint_coll}/documents"
requests.delete(endpoint, headers=header)

#!/bin/bash

COLLECTION_ID=[ YOUR COLLECTION ID HERE ]
curl -H "Authorization: Bearer ${API_TOKEN}" \
     -X DELETE \
     -i https://inference.de-txl.ionos.com/collections/${COLLECTION_ID}/documents

This query returns the status code 204 if pruning the document collection was successful.

Query documents in the document collection

Finally, this section shows how to use the document collection and the contained documents to answer a user query.

To retrieve the documents relevant for answering the user query, invoke the query endpoint as follows:

# Python example to retrieve documents for answering a user query
import requests
import base64

COLLECTION_ID = [ YOUR COLLECTION ID HERE ]
USER_QUERY = [ USER QUERY HERE ]
NUM_OF_DOCUMENTS = [ NUMBER OF DOCUMENTS TO CONSIDER HERE ]

endpoint = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}/query"
body = {"query": USER_QUERY, "limit": NUM_OF_DOCUMENTS }

relevant_documents = requests.post(endpoint, json=body, headers=header)

[
    base64.b64decode(entry['document']['properties']['content']).decode()
    for entry in relevant_documents.json()['properties']['matches']
]

#!/bin/bash

COLLECTION_ID=[ YOUR COLLECTION ID HERE ]
USER_QUERY=[ USER QUERY HERE ]
NUM_OF_DOCUMENTS=[ NUMBER OF DOCUMENTS TO CONSIDER HERE ]
BODY='{"query": "'${USER_QUERY}'", "limit": "'${NUM_OF_DOCUMENTS}'" }'

RESULT=$(curl -H "Authorization: Bearer ${API_TOKEN}" \
     -H "Content-Type: application/json" \
     -d "$BODY" \
     -X POST \
     https://inference.de-txl.ionos.com/collections/${COLLECTION_ID}/query)

echo "$RESULT" | \
jq -r '.["properties"]["matches"][].["document"]["properties"]["content"]' | \
base64 --decode

This will return a list of the NUM_OF_DOCUMENTS most relevant documents in your document collection for answering the user query.

Summary

In this tutorial you learned how to use the IONOS AI Model Hub API to conduct semantic similarity searches using our vector database.

Namely, you learned how to:

Create a necessary document collection in the vector database and modify it
Insert your documents into the document collection and modify the documents
Conduct semantic similarity searches using your document collection.

Retrieval Augmented Generation

AI Model Hub for free: From December 1, 2024 until March 31, 2025, IONOS offers all foundation models of the AI Model Hub for free. Create your contract now and get your AI journey started today!

The IONOS AI Model Hub allows you to combine Large Language Models and a vector database to implement Retrieval Augmented Generation use cases.

Retrieval Augmented Generation uses two components:

a Large Language Model (we offer corresponding models for text generation) and
Document Collections

Overview

This tutorial is intended for developers. It assumes you have basic knowledge of:

REST APIs and how to call them
A programming language to handle REST API endpoints (for illustration purposes, the tutorials use Python and Bash scripting)

You should also be familiar with the IONOS:

Text Generation
Document Collections

By the end of this tutorial, you'll be able to: Answer customer queries using a Large Language Model which adds data from your document collections to the answers.

Background

The IONOS AI Model Hub API offers both document collections and Large Language Models that you can use to implement retrieval augmented generation without having to manage corresponding hardware yourself.
Our AI Model Hub API provides all required functionality without your data being transferred out of Germany.

Before you begin

To get started, set up a document collection using Document Collections and get the identifier of this document collection.

You will need this identifier in the subsequent steps.

To get started, you should open your IDE to enter Python code.

Next generate a header document to authenticate yourself against the endpoints of our REST API:

# Python example to specify header

API_TOKEN = [YOUR API TOKEN HERE]
header = {
    "Authorization": f"Bearer {API_TOKEN}", 
    "Content-Type": "application/json"
}

After this step, you have one variable header you can use to access our vector database.

Download the respective code files to easily access retrieval augmented generation-specific scripts and examples and generate the intended output:

Download this Python Notebook file to easily access retrieval augmented generation-specific scripts and examples and generate the intended output.

19KB

ai-model-hub-retrieval-augmented-generation.ipynb

Download this Python code file to easily access retrieval augmented generation-specific scripts and examples and generate the intended output.

4KB

ai-model-hub-retrieval-augmented-generation.py

Download this Bash code file to easily access retrieval augmented generation-specific scripts and examples and generate the intended output.

4KB

ai-model-hub-retrieval-augmented-generation.sh

Access list of available Large Language Models

To retrieve a list of Large Language Models supported by the IONOS AI Model Hub API enter:

# Python example to get all available models
import requests

API_TOKEN = [ YOUR API TOKEN HERE ]
header = {
    "Authorization": f"Bearer {API_TOKEN}", 
    "Content-Type": "application/json"
}
requests.get("https://inference.de-txl.ionos.com/models", headers=header).json()

#!/bin/bash

API_TOKEN=[ YOUR API TOKEN HERE ]

curl -H "Authorization: Bearer ${API_TOKEN}" \
     --get https://inference.de-txl.ionos.com/models

This query returns a JSON document consisting of all foundation models and corresponding meta information.

The JSON document consists an entry items*. This is a list of all available foundation models. Of the 7 attributes per foundation model 3 are relevant for you:

id: The identifier of the foundation model
properties.description: The textual description of the model
properties.name: The name of the model

Note:

The identifiers for the foundation models differ between our API for Retrival Augmented Generation and for the image generation and text generation endpoints compatible with OpenAI.

From the list you generated in the previous step, choose the model you want to use and the id. You will use this id in the next step to use the foundation model.

Manual retrieval augmented generation

This section shows how to use the document collection and the contained documents to answer a user query.

Step 1: Retrieve relevant documents

To retrieve the documents relevant to answering the user query, invoke the query endpoint as follows:

# Python example to retrieve relevant documents
import requests
import base64

COLLECTION_ID = [ YOUR COLLECTION ID HERE ]
USER_QUERY = [ USER QUERY HERE ]
NUM_OF_DOCUMENTS = [ NUMBER OF DOCUMENTS TO CONSIDER HERE ]

endpoint = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}/query"
body = {"query": USER_QUERY, "limit": NUM_OF_DOCUMENTS }
relevant_documents = requests.post(endpoint, json=body, headers=header)

relevant_documents_decoded = [
    base64.b64decode(entry['document']['properties']['content']).decode()
    for entry in relevant_documents.json()['properties']['matches']
]

#!/bin/bash

COLLECTION_ID=[ YOUR COLLECTION ID HERE ]
USER_QUERY=[ USER QUERY HERE ]
NUM_OF_DOCUMENTS=[ NUMBER OF DOCUMENTS TO CONSIDER HERE ]

BODY='{"query": "'$USER_QUERY'", "limit": '$NUM_OF_DOCUMENTS' }'
RESULT=$(\
    curl -H "Authorization: Bearer ${API_TOKEN}" \
         -H "Content-Type: application/json" \
         -d "$BODY" \
         https://inference.de-txl.ionos.com/collections/${COLLECTION_ID}/query \
)
RELEVANT_DOCUMENTS_DECODED=$( \
    echo "$RESULT" | \
    jq -r '.["properties"]["matches"][].["document"]["properties"]["content"]' | \
    base64 --decode \
)

This will return a list of the NUM_OF_DOCUMENTS most relevant documents in your document collection for answering the user query.

Step 2: Generate final answer

Now, combine the user query and the result from the document collection in one prompt:

# Python example to retrieve relevant documents
import requests

MODEL_ID = [ YOUR MODEL ID HERE]
endpoint = f"https://inference.de-txl.ionos.com/models/{MODEL_ID}/predictions"
prompt = f"""
    <|begin_of_text|><|start_header_id|>system<|end_header_id|>
    Please use the information specified as context to answer the question.
    Formulate you answer in one sentence and be an honest AI.<|eot_id|>
    <|begin_of_text|><|start_header_id|>context<|end_header_id|>
    {"; ".join(relevant_documents_decoded)}<|eot_id|>
    <|start_header_id|>user<|end_header_id|>
    {USER_QUERY}<|eot_id|>
    <|start_header_id|>assistant<|end_header_id|>
"""
body = { "properties": {"input": prompt} }
requests.post(endpoint, json=body, headers=header).json()

#!/bin/bash

MODEL_ID=[ YOUR MODEL ID HERE]
QUERY="<|begin_of_text|><|start_header_id|>system<|end_header_id|> \
    Please use the information specified as context to answer the question. \
    Formulate you answer in one sentence and be an honest AI.<|eot_id|> \
    <|begin_of_text|><|start_header_id|>context<|end_header_id|> \
    ${RELEVANT_DOCUMENTS_DECODED}<|eot_id|> \
    <|start_header_id|>user<|end_header_id|> \
    ${USER_QUERY}<|eot_id|> \
    <|start_header_id|>assistant<|end_header_id|>"
BODY='{ "properties": {"input": "'${QUERY}'"} }'
RESULT=$( \
    curl -H "Authorization: Bearer ${API_TOKEN}" \
            -H "Content-Type: application/json" \
            -d "$BODY" \
            https://inference.de-txl.ionos.com/models/${MODEL_ID}/predictions \
)

The result will be a JSON-Document consisting of the answer to the customer and some meta information. You can access it in the field at properties.output

Note:

The best prompt strongly depends on the Large Language Model used. You might need to adapt your prompt to improve results.

Automated Retrieval Augmented Generation

Apply combined retrieval augmented generation prompt to foundation model

# Python example to retrieve relevant documents
import requests

MODEL_ID = [ YOUR MODEL ID HERE]
COLLECTION_ID = [ YOUR COLLECTION ID HERE]
USER_QUERY = [ USER QUERY HERE ]

endpoint = f"https://inference.de-txl.ionos.com/models/{MODEL_ID}/predictions"
body = { "properties": {
    "input": f"""
    <|begin_of_text|><|start_header_id|>system<|end_header_id|>
    Please use the information specified as context to answer the question.
    Formulate you answer in one sentence and be a honest AI.<|eot_id|>
    <|begin_of_text|><|start_header_id|>context<|end_header_id|>
    {{{{.context}}}}<|eot_id|>
    <|start_header_id|>user<|end_header_id|>
    {{{{.collection_query}}}} <|eot_id|>
    <|start_header_id|>assistant<|end_header_id|>
    """,
    "collectionId": COLLECTION_ID,
    "collectionQuery": USER_QUERY,
    "options": {  
        "max_length": "500",  
        "temperature": "0.01"
    }  
}}
requests.post(endpoint, json=body, headers=header).json()

#!/bin/bash

MODEL_ID=[ YOUR MODEL ID HERE]
COLLECTION_ID=[ YOUR COLLECTION ID HERE]
USER_QUERY=[ USER QUERY HERE ]

QUERY="<|begin_of_text|><|start_header_id|>system<|end_header_id|> \
    Please use the information specified as context to answer the question. \
    Formulate your answer in one sentence and be an honest AI.<|eot_id|> \
    <|begin_of_text|><|start_header_id|>context<|end_header_id|> \
    {{.context}}<|eot_id|> \
    <|start_header_id|>user<|end_header_id|> \
    {{.collection_query}} <|eot_id|> \
    <|start_header_id|>assistant<|end_header_id|>"
BODY='{ "properties": { 
    "input": "'${QUERY}'", 
    "collectionId": "'${COLLECTION_ID}'",
    "collectionQuery": "'${USER_QUERY}'",
    "options": { 
        "max_length": "500", 
        "temperature": "0.01"
    } 
}}'

curl -H "Authorization: Bearer ${API_TOKEN}" \
     -H "Content-Type: application/json" \
     -d "$BODY" \
     https://inference.de-txl.ionos.com/models/${MODEL_ID}/predictions

This query conducts all steps necessary to answer a user query using Retrieval Augmented Generation:

The user query (saved at collectionQuery) is sent to the collection (specified at collectionId).
The results of this query are saved in a variable .context, while the user query is saved in a variable .collection_query. You can use both variables in your prompt.
The example prompt uses the variables .context and .collection_query to answer the customer query.

Note:

The best prompt strongly depends on the Large Language Model used. You might need to adapt your prompt to improve results.

Summary

In this tutorial, you learned how to use the IONOS AI Model Hub API to implement Retrieval Augmented Generation use cases.

Namely, you learned how to: Derive answers to user queries using the content of your document collection and one of the IONOS foundation models.