Retrieval Augmented Generation

The IONOS AI Model Hub allows you to combine Large Language Models and a vector database to implement Retrieval Augmented Generation use cases.

Retrieval Augmented Generation is an approach that lets an existing Large Language Model, such as Llama or Mistral, answer not only from the knowledge it learned during training but also from knowledge you provide yourself.

Retrieval Augmented Generation uses two components: a Large Language Model and a document collection stored in a vector database.

If one of your users queries your Retrieval Augmented Generation system, you first get the most similar documents from the corresponding document collection. Second, you ask the Large Language Model to answer the query by using both the knowledge it was trained on and the most similar documents from your document collection.
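The two steps can be illustrated without calling any API. The following is a minimal sketch under simplifying assumptions: a toy word-overlap score stands in for the embedding similarity that a vector database computes, and the second step only builds the prompt instead of calling a model. The function names `score`, `retrieve`, and `build_prompt` and the sample documents are hypothetical and not part of the IONOS API.

```python
# Toy illustration of the two Retrieval Augmented Generation steps.
# Word overlap is a stand-in for real embedding similarity.


def score(query: str, document: str) -> int:
    """Count how many distinct query words also occur in the document."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    return len(query_words & document_words)


def retrieve(query: str, documents: list[str], limit: int) -> list[str]:
    """Step 1: return the `limit` documents most similar to the query."""
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:limit]


def build_prompt(query: str, context_documents: list[str]) -> str:
    """Step 2: combine the retrieved documents and the query into one prompt."""
    context = "; ".join(context_documents)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"


# Made-up sample documents for illustration only.
documents = [
    "The support hotline is open from 8 am to 6 pm",
    "Invoices are sent at the start of each month",
    "The data center is located in Berlin",
]
query = "When is the support hotline open"
top_documents = retrieve(query, documents, limit=1)
prompt = build_prompt(query, top_documents)
print(prompt)
```

In the rest of this tutorial, the document collection takes the role of `retrieve` and the Large Language Model consumes the resulting prompt.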

Overview

This tutorial is intended for developers. It assumes you have basic knowledge of:

  • REST APIs and how to call them

  • A programming language to handle REST API endpoints (for illustration purposes, the tutorials use Python and Bash scripting)

You should also be familiar with the IONOS:

By the end of this tutorial, you'll be able to answer customer queries using a Large Language Model that adds data from your document collections to its answers.

Background

  • The IONOS AI Model Hub API offers both document embeddings and Large Language Models that you can use to implement retrieval augmented generation without having to manage corresponding hardware yourself.

  • Our AI Model Hub API provides all required functionality without your data being transferred out of Germany.

Before you begin

To get started, set up a document collection using Document Embeddings and get the identifier of this document collection.

You will need this identifier in the subsequent steps.

Then open your IDE to enter Python code.

Next, generate a header to authenticate yourself against the endpoints of our REST API:

# Python example to specify header

API_TOKEN = "YOUR API TOKEN HERE"  # replace with your API token
header = {
    "Authorization": f"Bearer {API_TOKEN}", 
    "Content-Type": "application/json"
}

After this step, you have a variable, header, that you can use to authenticate your requests to our REST API, including the vector database.
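If you prefer not to paste the token into your source code, one option is to read it from an environment variable. This is a minimal sketch, not an IONOS requirement: the variable name `IONOS_API_TOKEN` and the helper `build_header` are assumptions made for this example.

```python
import os


def build_header(api_token: str) -> dict[str, str]:
    """Return the HTTP headers for authenticated JSON requests."""
    if not api_token:
        raise ValueError("API token is empty; set it before calling the API")
    return {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }


# IONOS_API_TOKEN is a hypothetical variable name chosen for this sketch.
API_TOKEN = os.environ.get("IONOS_API_TOKEN", "")
# Leave header unset when no token is available instead of failing at import.
header = build_header(API_TOKEN) if API_TOKEN else None
```

Failing early on an empty token avoids sending requests that the API would reject anyway.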

Access list of available Large Language Models

To retrieve a list of Large Language Models supported by the IONOS AI Model Hub API, enter:

# Python example to get all available models
import requests

API_TOKEN = "YOUR API TOKEN HERE"  # replace with your API token
header = {
    "Authorization": f"Bearer {API_TOKEN}", 
    "Content-Type": "application/json"
}
requests.get("https://inference.de-txl.ionos.com/models", headers=header).json()

This query returns a JSON document consisting of all foundation models and corresponding meta information.

The JSON document contains an entry items, which is a list of all available foundation models. Of the seven attributes per foundation model, three are relevant for you:

  • id: The identifier of the foundation model

  • properties.description: The textual description of the model

  • properties.name: The name of the model

Note:

The identifiers for the foundation models differ between our API for Retrieval Augmented Generation and the OpenAI-compatible endpoints for image generation and text generation.

From the list you generated in the previous step, choose the model you want to use and note its id. You will use this id in the next step to call the foundation model.
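To pick a model programmatically, you could reduce the response to the three relevant attributes and search by name. The following is a sketch that assumes the response layout described above (an items list whose entries carry id and properties). The helper names and the sample data, including the identifiers, are made up for illustration.

```python
from typing import Optional


def summarize_models(models_response: dict) -> list[dict]:
    """Keep only id, name, and description for each entry in `items`."""
    summary = []
    for item in models_response.get("items", []):
        properties = item.get("properties", {})
        summary.append({
            "id": item.get("id"),
            "name": properties.get("name"),
            "description": properties.get("description"),
        })
    return summary


def find_model_id(models_response: dict, name_fragment: str) -> Optional[str]:
    """Return the id of the first model whose name contains `name_fragment`."""
    for model in summarize_models(models_response):
        name = model["name"] or ""
        if name_fragment.lower() in name.lower():
            return model["id"]
    return None


# Made-up sample response; real identifiers and names will differ.
sample_response = {
    "items": [
        {"id": "example-id-1",
         "properties": {"name": "Example Model A", "description": "First made-up entry"}},
        {"id": "example-id-2",
         "properties": {"name": "Example Model B", "description": "Second made-up entry"}},
    ]
}
MODEL_ID = find_model_id(sample_response, "model b")
print(MODEL_ID)
```

With a real response, pass the parsed JSON from the models request in place of `sample_response`.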

Manual retrieval augmented generation

This section shows how to use the document collection and the contained documents to answer a user query.

Step 1: Retrieve relevant documents

To retrieve the documents relevant to answering the user query, invoke the query endpoint as follows:

# Python example to retrieve relevant documents
import requests
import base64

COLLECTION_ID = "YOUR COLLECTION ID HERE"  # replace with your collection identifier
USER_QUERY = "USER QUERY HERE"  # replace with the user's question
NUM_OF_DOCUMENTS = 3  # example value; number of documents to consider

endpoint = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}/query"
body = {"query": USER_QUERY, "limit": NUM_OF_DOCUMENTS }
relevant_documents = requests.post(endpoint, json=body, headers=header)

relevant_documents_decoded = [
    base64.b64decode(entry['document']['properties']['content']).decode()
    for entry in relevant_documents.json()['properties']['matches']
]

This will return a list of the NUM_OF_DOCUMENTS most relevant documents in your document collection for answering the user query.
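The decoding step can be wrapped in a small helper that tolerates an empty result. This sketch assumes the response layout used in the example above (properties.matches, each with a Base64-encoded document.properties.content). The helper name `decode_matches` and the sample payload are hypothetical.

```python
import base64


def decode_matches(query_response: dict) -> list[str]:
    """Decode the Base64 content of every match in a query response."""
    matches = query_response.get("properties", {}).get("matches", [])
    return [
        base64.b64decode(match["document"]["properties"]["content"]).decode("utf-8")
        for match in matches
    ]


# Made-up payload that mimics the structure accessed in the example above.
encoded = base64.b64encode("Berlin is the capital of Germany.".encode("utf-8")).decode("ascii")
sample_query_response = {
    "properties": {
        "matches": [{"document": {"properties": {"content": encoded}}}]
    }
}
decoded_documents = decode_matches(sample_query_response)
print(decoded_documents)
```

An empty list signals that the collection returned no matches, which you may want to handle before building the prompt.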

Step 2: Generate final answer

Now, combine the user query and the result from the document collection in one prompt:

# Python example to generate the final answer
import requests

MODEL_ID = "YOUR MODEL ID HERE"  # replace with the id of the model you chose
endpoint = f"https://inference.de-txl.ionos.com/models/{MODEL_ID}/predictions"
prompt = f"""
    <|begin_of_text|><|start_header_id|>system<|end_header_id|>
    Please use the information specified as context to answer the question.
    Formulate your answer in one sentence and be an honest AI.<|eot_id|>
    <|begin_of_text|><|start_header_id|>context<|end_header_id|>
    {"; ".join(relevant_documents_decoded)}<|eot_id|>
    <|start_header_id|>user<|end_header_id|>
    {USER_QUERY}<|eot_id|>
    <|start_header_id|>assistant<|end_header_id|>
"""
body = { "properties": {"input": prompt} }
requests.post(endpoint, json=body, headers=header).json()

The result is a JSON document consisting of the answer to the customer and some meta information. You can access the answer in the field properties.output.
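Reading the answer from the response can be sketched as follows. The field path properties.output comes from the description above; the helper name `extract_answer`, the assumption that the output is text, and the sample response are illustrative only.

```python
def extract_answer(prediction_response: dict) -> str:
    """Return the text stored at properties.output, stripped of whitespace."""
    output = prediction_response.get("properties", {}).get("output")
    if output is None:
        raise KeyError("properties.output is missing from the response")
    # Assumption: the output is text; str() guards against other types.
    return str(output).strip()


# Made-up response for illustration; real responses contain more metadata.
sample_prediction = {"properties": {"output": "  Berlin is the capital of Germany.\n"}}
answer = extract_answer(sample_prediction)
print(answer)
```

Raising an error when the field is missing makes failed requests visible instead of silently returning an empty answer.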

Note:

The best prompt strongly depends on the Large Language Model used. You might need to adapt your prompt to improve results.

Automated Retrieval Augmented Generation

The IONOS AI Model Hub lets you automate the process described above. When you pass the collection ID and the collection query directly to the foundation model endpoint, it first queries the document collection and makes the results available in a variable that you can use directly in your prompt. This section describes how to do this.

Apply combined retrieval augmented generation prompt to foundation model

To implement a Retrieval Augmented Generation use case with only one prompt, you have to invoke the /predictions endpoint of the Large Language Model you want to use and send the prompt as part of the body of this query:

# Python example for automated Retrieval Augmented Generation
import requests

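MODEL_ID = "YOUR MODEL ID HERE"  # replace with the id of the model you chose
COLLECTION_ID = "YOUR COLLECTION ID HERE"  # replace with your collection identifier
USER_QUERY = "USER QUERY HERE"  # replace with the user's question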

endpoint = f"https://inference.de-txl.ionos.com/models/{MODEL_ID}/predictions"
body = { "properties": {
    "input": f"""
    <|begin_of_text|><|start_header_id|>system<|end_header_id|>
    Please use the information specified as context to answer the question.
    Formulate your answer in one sentence and be an honest AI.<|eot_id|>
    <|begin_of_text|><|start_header_id|>context<|end_header_id|>
    {{{{.context}}}}<|eot_id|>
    <|start_header_id|>user<|end_header_id|>
    {{{{.collection_query}}}} <|eot_id|>
    <|start_header_id|>assistant<|end_header_id|>
    """,
    "collectionId": COLLECTION_ID,
    "collectionQuery": USER_QUERY,
    "options": {  
        "max_length": "500",  
        "temperature": "0.01"
    }  
}}
requests.post(endpoint, json=body, headers=header).json()

This query conducts all steps necessary to answer a user query using Retrieval Augmented Generation:

  • The user query (saved at collectionQuery) is sent to the collection (specified at collectionId).

  • The results of this query are saved in a variable .context, while the user query is saved in a variable .collection_query. You can use both variables in your prompt.

  • The example prompt uses the variables .context and .collection_query to answer the customer query.
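The quadruple braces in the example are a consequence of Python f-strings, where `{{` and `}}` each produce a single literal brace. The sketch below shows that both spellings yield the same placeholder text and wraps the body construction in a helper. The helper name `build_rag_body` is hypothetical, and the shortened prompt is for illustration only.

```python
# In an f-string, "{{{{" renders as "{{" and "}}}}" renders as "}}".
from_f_string = f"Context: {{{{.context}}}}\nQuestion: {{{{.collection_query}}}}"
plain_string = "Context: {{.context}}\nQuestion: {{.collection_query}}"
assert from_f_string == plain_string


def build_rag_body(prompt_template: str, collection_id: str, user_query: str) -> dict:
    """Build the request body and check that both placeholders are present."""
    for placeholder in ("{{.context}}", "{{.collection_query}}"):
        if placeholder not in prompt_template:
            raise ValueError(f"prompt template is missing {placeholder}")
    return {
        "properties": {
            "input": prompt_template,
            "collectionId": collection_id,
            "collectionQuery": user_query,
        }
    }


# Placeholder values for illustration; no request is sent here.
rag_body = build_rag_body(plain_string, "YOUR COLLECTION ID HERE", "USER QUERY HERE")
print(rag_body["properties"]["input"])
```

If the prompt contains no Python variables, a regular string with double braces is enough and avoids the extra escaping.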

Note:

The best prompt strongly depends on the Large Language Model used. You might need to adapt your prompt to improve results.

Summary

In this tutorial, you learned how to use the IONOS AI Model Hub API to implement Retrieval Augmented Generation use cases.

Specifically, you learned how to derive answers to user queries using the content of your document collection and one of the IONOS foundation models.
