The IONOS AI Model Hub allows you to combine Large Language Models and a vector database to implement Retrieval Augmented Generation use cases.
Retrieval Augmented Generation is an approach that lets an existing Large Language Model, such as Llama or Mistral, answer not only from the knowledge it learned during training but also from knowledge you supply yourself.
Retrieval Augmented Generation uses two components:
a Large Language Model (we offer corresponding models for text generation) and
a document collection stored in a vector database, which holds the knowledge you supply.
If one of your users queries your Retrieval Augmented Generation system, you first get the most similar documents from the corresponding document collection. Second, you ask the Large Language Model to answer the query by using both the knowledge it was trained on and the most similar documents from your document collection.
This tutorial is intended for developers. It assumes you have basic knowledge of:
REST APIs and how to call them
A programming language to handle REST API endpoints (for illustration purposes, the tutorials use Python and Bash scripting)
You should also be familiar with the IONOS:
By the end of this tutorial, you'll be able to: Answer customer queries using a Large Language Model that enriches its answers with data from your document collections.
The IONOS AI Model Hub API offers both document embeddings and Large Language Models that you can use to implement retrieval augmented generation without having to manage corresponding hardware yourself.
Our AI Model Hub API provides all required functionality without your data being transferred out of Germany.
To get started, set up a document collection using Document Embeddings and get the identifier of this document collection.
You will need this identifier in the subsequent steps.
To get started, you should open your IDE to enter Python code.
Next, generate a header to authenticate yourself against the endpoints of our REST API:
After this step, you have a variable, header, that you can use to access our vector database.
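As a minimal sketch of this step, the header can be built as a plain dictionary. The bearer-token scheme, the helper name `build_header`, and the environment variable `IONOS_API_TOKEN` are assumptions made for illustration; check them against your account setup.

```python
import os


def build_header(api_token: str) -> dict:
    """Return HTTP headers that carry the API token (bearer scheme assumed)."""
    if not api_token:
        raise ValueError("an API token is required")
    return {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }


# IONOS_API_TOKEN is a hypothetical environment variable name; the fallback
# string is only a placeholder and will not authenticate.
header = build_header(os.environ.get("IONOS_API_TOKEN", "replace-with-your-token"))
```

Reading the token from the environment keeps it out of source control.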
To get started, you should open a terminal and ensure that curl and jq are installed. While curl is essential for communicating with our API service, we use jq throughout our examples to improve the readability of the results of our API.
To retrieve a list of Large Language Models supported by the IONOS AI Model Hub API, enter:
This query returns a JSON document consisting of all foundation models and corresponding meta information.
The JSON document contains an entry named items. This is a list of all available foundation models. Of the 7 attributes per foundation model, 3 are relevant for you:
id: The identifier of the foundation model
properties.description: The textual description of the model
properties.name: The name of the model
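The three attributes above can be pulled out of the response with a few lines of Python. This is a sketch only: `summarize_models` is a hypothetical helper, and the sample response is made up to match the structure described here (an `items` list whose entries carry `id`, `properties.name`, and `properties.description`).

```python
def summarize_models(models_response: dict) -> list:
    """Keep only the id, name, and description of each foundation model."""
    summaries = []
    for item in models_response.get("items", []):
        properties = item.get("properties", {})
        summaries.append(
            {
                "id": item.get("id"),
                "name": properties.get("name"),
                "description": properties.get("description"),
            }
        )
    return summaries


# Made-up sample data for illustration; real responses contain more attributes.
sample_response = {
    "items": [
        {
            "id": "example-model-id",
            "properties": {
                "name": "Example Model",
                "description": "An illustrative entry",
            },
        }
    ]
}

for model in summarize_models(sample_response):
    print(model["id"], model["name"], model["description"], sep=" | ")
```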
Note:
The identifiers for the foundation models differ between our API for Retrieval Augmented Generation and the OpenAI-compatible endpoints for image generation and text generation.
From the list you generated in the previous step, choose the model you want to use and note its id. You will use this id in the next step to call the foundation model.
This section shows how to use the document collection and the contained documents to answer a user query.
To retrieve the documents relevant to answering the user query, invoke the query endpoint as follows:
This will return a list of the NUM_OF_DOCUMENTS most relevant documents in your document collection for answering the user query.
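A sketch of how such a query request might be assembled and sent with the Python standard library follows. The base URL, the `/collections/{id}/query` path, and the body fields `query` and `limit` are assumptions made for illustration, not confirmed by this tutorial; `build_collection_query` and `post_json` are hypothetical helpers.

```python
import json
import urllib.request

# Assumed base URL; replace it with the endpoint documented for your account.
BASE_URL = "https://inference.de-txl.ionos.com"


def build_collection_query(collection_id, user_query, num_of_documents, base_url=BASE_URL):
    """Return the (url, body) pair for querying a document collection."""
    if num_of_documents < 1:
        raise ValueError("num_of_documents must be at least 1")
    url = f"{base_url}/collections/{collection_id}/query"  # assumed path
    body = {"query": user_query, "limit": num_of_documents}  # assumed field names
    return url, body


def post_json(url, body, header):
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers=header,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


url, body = build_collection_query("my-collection-id", "How do I reset my password?", 3)
print(url)
print(json.dumps(body))
```

Calling `post_json(url, body, header)` with the authentication header from earlier would send the request; it is not called here so that the sketch runs without network access.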
Now, combine the user query and the result from the document collection in one prompt:
The result will be a JSON document consisting of the answer to the customer and some meta information. You can access the answer in the field properties.output.
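The following sketch shows one way to combine the user query and the retrieved documents into a single prompt, and to read the answer from `properties.output`. The prompt wording, the `properties.input` field (chosen to mirror `properties.output`), and all helper names are assumptions; the documents are assumed to be available as plain-text strings already.

```python
def build_prompt(user_query, documents):
    """Combine the retrieved documents and the user query into one prompt."""
    context = "\n\n".join(
        f"[{index}] {text}" for index, text in enumerate(documents, start=1)
    )
    return (
        "Answer the question using the context below and your own knowledge.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}\n"
    )


def build_prediction_body(prompt):
    """Wrap the prompt in a request body; the 'input' field name is assumed."""
    return {"properties": {"input": prompt}}


def extract_answer(prediction_response):
    """Read the model's answer from the field properties.output."""
    return prediction_response["properties"]["output"]


# Made-up documents for illustration.
documents = [
    "Passwords can be reset from the account settings page.",
    "Support is available on weekdays.",
]
prompt = build_prompt("How do I reset my password?", documents)
print(prompt)
```

Numbering the documents in the prompt makes it easier to ask the model to cite which document an answer came from.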
Note:
The best prompt strongly depends on the Large Language Model used. You might need to adapt your prompt to improve results.
The IONOS AI Model Hub can automate the process described above. If you pass the collection ID and the collection query directly to our foundation model endpoint, it first queries the document collection and then makes the results available in a variable that you can use directly in your prompt. This section describes how to do this.
To implement a Retrieval Augmented Generation use case with only one prompt, you have to invoke the /predictions endpoint of the Large Language Model you want to use and send the prompt as part of the body of this query:
This query conducts all steps necessary to answer a user query using Retrieval Augmented Generation:
The user query (saved at collectionQuery) is sent to the collection (specified at collectionId).
The results of this query are saved in a variable .context, while the user query is saved in a variable .collection_query. You can use both variables in your prompt.
The example prompt uses the variables .context and .collection_query to answer the customer query.
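The steps above can be sketched as a single request body. The names `collectionId`, `collectionQuery`, `.context`, and `.collection_query` come from this tutorial; the `{{ }}` delimiters around the variables, the `option` and `properties.input` fields, the base URL, and the `/models/{id}` path prefix are assumptions made for illustration, and `build_rag_prediction` is a hypothetical helper.

```python
import json

# Assumed base URL; replace it with the endpoint documented for your account.
BASE_URL = "https://inference.de-txl.ionos.com"

# The {{ }} delimiters are an assumed template syntax for the two variables.
PROMPT_TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "Context: {{.context}}\n\n"
    "Question: {{.collection_query}}\n"
)


def build_rag_prediction(model_id, collection_id, user_query, base_url=BASE_URL):
    """Return the (url, body) pair for a one-request RAG prediction."""
    url = f"{base_url}/models/{model_id}/predictions"  # path prefix assumed
    body = {
        "properties": {"input": PROMPT_TEMPLATE},  # assumed field name
        "option": {  # assumed wrapper for the two collection settings
            "collectionId": collection_id,
            "collectionQuery": user_query,
        },
    }
    return url, body


url, body = build_rag_prediction(
    "example-model-id", "my-collection-id", "How do I reset my password?"
)
print(url)
print(json.dumps(body, indent=2))
```

Note that the prompt is sent with the variables unexpanded; the service fills them in after it has queried the collection.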
Note:
The best prompt strongly depends on the Large Language Model used. You might need to adapt your prompt to improve results.
In this tutorial, you learned how to use the IONOS AI Model Hub API to implement Retrieval Augmented Generation use cases.
Namely, you learned how to: Derive answers to user queries using the content of your document collection and one of the IONOS foundation models.