Retrieval Augmented Generation

The IONOS AI Model Hub allows you to combine foundation models and a vector database to implement retrieval augmented generation use cases.

Retrieval augmented generation is an approach that lets an existing Large Language Model, such as Llama or Mistral, answer not only from the knowledge it learned during training but also from knowledge you supply yourself.

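Retrieval augmented generation uses two components: a Large Language Model from the Foundation Models offering and a document collection created with Document Embeddings.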

When a user queries your retrieval augmented generation system, you first retrieve the most similar documents from the corresponding document collection. You then ask the Large Language Model to answer the query using both the knowledge it was trained on and those retrieved documents.
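
The two steps can be sketched in a few lines of Python. This is a toy illustration only: it ranks documents by word overlap as a stand-in for vector similarity, and it stops at building the prompt instead of calling a model. The names toy_retrieve and toy_prompt and the sample documents are made up for this example and are not part of the IONOS API.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_retrieve(query, documents, limit):
    """Return the `limit` documents sharing the most words with the query.

    Word overlap is a simple stand-in for the vector similarity that a
    real document collection would use.
    """
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:limit]


def toy_prompt(query, context_documents):
    """Combine the retrieved documents and the query into one prompt."""
    context = "; ".join(context_documents)
    return f"Context: {context}\nQuestion: {query}"


# Made-up sample data for illustration.
documents = [
    "The data center is located in Berlin.",
    "Invoices are sent monthly.",
    "Support answers within one day.",
]
query = "Where is the data center located?"

top_documents = toy_retrieve(query, documents, limit=1)
print(toy_prompt(query, top_documents))
```

In the real workflow, the document collection performs the retrieval step and the Large Language Model receives the resulting prompt.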

Overview

This tutorial is intended for developers. It assumes you have basic knowledge of:

  • REST APIs and how to call them

  • A programming language to handle REST API endpoints (for illustration purposes, the tutorials use Python and Bash scripting)

You should also be familiar with the IONOS AI Model Hub features this tutorial builds on: Document Embeddings and Foundation Models.

By the end of this tutorial, you'll be able to answer customer queries using a Large Language Model that draws on data from your document collections.

Background

  • The IONOS AI Model Hub API offers both document embeddings and Large Language Models that you can use to implement retrieval augmented generation without having to manage corresponding hardware yourself.

  • Our AI Model Hub API provides all required functionality without your data being transferred out of Germany.

Before you begin

To get started:

  • Set up a document collection using Document Embeddings and note the identifier of this document collection.

  • Choose a Large Language Model from our Foundation Models and note its identifier.

You will need both identifiers in the subsequent steps.

Next, open your IDE or notebook to enter the Python code.

  1. Install required libraries

Install the modules requests and pandas in your Python environment. The leading ! is notebook syntax; in a regular shell, run the commands without it:

!pip install requests
!pip install pandas

  2. Import required libraries

You need to import the following modules:

import requests
import pandas as pd
import base64

  3. Generate header for API requests

Next, generate a header document to authenticate yourself against the REST API:

API_TOKEN = "YOUR API TOKEN HERE"  # replace with your API token
header = {
    "Authorization": f"Bearer {API_TOKEN}",
    "Content-Type": "application/json"
}
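
If you prefer not to hardcode the token in your script, one option is to read it from an environment variable. The following is a minimal sketch under that assumption; the variable name IONOS_API_TOKEN and the helper names build_header and header_from_env are hypothetical and chosen only for this example.

```python
import os


def build_header(token):
    """Return the request header for a given API token."""
    if not token:
        raise ValueError("API token is empty")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


def header_from_env(var_name="IONOS_API_TOKEN"):
    """Build the header from an environment variable (hypothetical name)."""
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return build_header(token)
```

With the variable set in your shell, header = header_from_env() yields the same dictionary as the snippet above.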

After this step, you have installed all Python modules and have one variable header you can use to implement your first retrieval augmented generation use case.

Manual retrieval augmented generation

This section shows how to use the document collection and the contained documents to answer a user query.

  1. Retrieve documents relevant for querying

To retrieve the documents relevant to answering the user query, invoke the query endpoint as follows:

COLLECTION_ID = "YOUR COLLECTION ID HERE"  # identifier of your document collection
USER_QUERY = "USER QUERY HERE"  # question asked by the user
NUM_OF_DOCUMENTS = 3  # number of documents to consider (example value, adjust as needed)

endpoint = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}/query"
body = {"query": USER_QUERY, "limit": NUM_OF_DOCUMENTS }
relevant_documents = requests.post(endpoint, json=body, headers=header)

This will return a list of the NUM_OF_DOCUMENTS most relevant documents in your document collection for answering the user query.
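
The call above does not check whether the request succeeded. A minimal sketch of a wrapper that raises on HTTP errors is shown below. The helper name query_collection and the 30-second timeout are assumptions made for this example; the function takes the posting function as a parameter so that you can pass requests.post in practice or a stub when testing.

```python
def query_collection(post, collection_id, user_query, limit, headers):
    """Query a document collection and return the parsed JSON response.

    `post` is expected to behave like requests.post.
    """
    endpoint = (
        f"https://inference.de-txl.ionos.com/collections/{collection_id}/query"
    )
    body = {"query": user_query, "limit": limit}
    # The timeout value is an arbitrary choice for this sketch.
    response = post(endpoint, json=body, headers=headers, timeout=30)
    # Raise an exception for 4xx and 5xx responses instead of failing later.
    response.raise_for_status()
    return response.json()
```

For example, query_collection(requests.post, COLLECTION_ID, USER_QUERY, NUM_OF_DOCUMENTS, header) returns the same JSON content as relevant_documents.json().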

  2. Decode Base64 encoded documents

Now, decode the retrieved documents back to a string using:

relevant_documents_decoded = [
    base64.b64decode(entry['document']['properties']['content']).decode()
    for entry in relevant_documents.json()['properties']['matches']
]
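
To see the decoding step in isolation, the sketch below applies the same logic to a hand-built response. The nesting of the sample mirrors the keys accessed above; the sample text and the helper name decode_matches are made up for this example, and a real response may contain additional fields.

```python
import base64


def decode_matches(response_json):
    """Decode the Base64 content of every match into a UTF-8 string."""
    return [
        base64.b64decode(entry["document"]["properties"]["content"]).decode("utf-8")
        for entry in response_json["properties"]["matches"]
    ]


# Made-up sample with the same nesting as the query response used above.
sample_text = "Berlin is the capital of Germany."
sample_response = {
    "properties": {
        "matches": [
            {
                "document": {
                    "properties": {
                        "content": base64.b64encode(
                            sample_text.encode("utf-8")
                        ).decode("ascii")
                    }
                }
            }
        ]
    }
}

decoded = decode_matches(sample_response)
print(decoded)
```

With a real response, decode_matches(relevant_documents.json()) gives the same list as relevant_documents_decoded.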

  3. Generate final answer

Now, combine the user query and the result from the document collection in one prompt:

MODEL_ID = "YOUR MODEL ID HERE"  # identifier of the chosen Large Language Model
endpoint = f"https://inference.de-txl.ionos.com/models/{MODEL_ID}/predictions"
prompt = f"""
    <|begin_of_text|><|start_header_id|>system<|end_header_id|>
    Please use the information specified as context to answer the question.
    Formulate your answer in one sentence and be an honest AI.<|eot_id|>
    <|begin_of_text|><|start_header_id|>context<|end_header_id|>
    {"; ".join(relevant_documents_decoded)}<|eot_id|>
    <|start_header_id|>user<|end_header_id|>
    {USER_QUERY}<|eot_id|>
    <|start_header_id|>assistant<|end_header_id|>
"""
body = { "properties": {"input": prompt} }
result = requests.post(endpoint, json=body, headers=header)

The result is a JSON document consisting of the answer to the customer and some meta information. You can access the answer using:

result.json()['properties']['output']
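
The three manual steps can be combined into one function. The following is a minimal sketch, not a definitive implementation: the name answer_with_rag is hypothetical, the prompt is the one from this section without the leading indentation, and the posting function is passed in so that you can use requests.post in practice or a stub when testing.

```python
import base64

BASE_URL = "https://inference.de-txl.ionos.com"


def answer_with_rag(post, collection_id, model_id, user_query, limit, headers):
    """Run retrieval, decoding, and generation; return the model's answer.

    `post` is expected to behave like requests.post.
    """
    # Step 1: retrieve the most relevant documents.
    query_response = post(
        f"{BASE_URL}/collections/{collection_id}/query",
        json={"query": user_query, "limit": limit},
        headers=headers,
    )
    query_response.raise_for_status()

    # Step 2: decode the Base64 encoded document contents.
    documents = [
        base64.b64decode(entry["document"]["properties"]["content"]).decode()
        for entry in query_response.json()["properties"]["matches"]
    ]

    # Step 3: combine context and query in one prompt and ask the model.
    context = "; ".join(documents)
    prompt = (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
        "Please use the information specified as context to answer the question.\n"
        "Formulate your answer in one sentence and be an honest AI.<|eot_id|>\n"
        "<|begin_of_text|><|start_header_id|>context<|end_header_id|>\n"
        f"{context}<|eot_id|>\n"
        "<|start_header_id|>user<|end_header_id|>\n"
        f"{user_query}<|eot_id|>\n"
        "<|start_header_id|>assistant<|end_header_id|>\n"
    )
    model_response = post(
        f"{BASE_URL}/models/{model_id}/predictions",
        json={"properties": {"input": prompt}},
        headers=headers,
    )
    model_response.raise_for_status()
    return model_response.json()["properties"]["output"]
```

A call such as answer_with_rag(requests.post, COLLECTION_ID, MODEL_ID, USER_QUERY, NUM_OF_DOCUMENTS, header) then returns the answer text directly.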

Note:

  • For details on how to use the foundation model, see Foundation Models.

  • The best prompt strongly depends on the Large Language Model used. You might need to adapt your prompt to improve results.

Automated retrieval augmented generation

Our Foundation Model API can automate the process described above. When you pass the collection ID and the collection query directly to the foundation model endpoint, the API first queries the document collection and stores the result in a variable that you can use directly in your prompt. This section describes how to do this.

Apply combined retrieval augmented generation prompt to foundation model

To implement a retrieval augmented generation use case with only one prompt, you have to invoke the /predictions endpoint of the Large Language Model you want to use and send the prompt as part of the body of this query:

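MODEL_ID = "YOUR MODEL ID HERE"  # identifier of the chosen Large Language Model
COLLECTION_ID = "YOUR COLLECTION ID HERE"  # identifier of your document collection
USER_QUERY = "USER QUERY HERE"  # question asked by the user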

endpoint = f"https://inference.de-txl.ionos.com/models/{MODEL_ID}/predictions"
body = { "properties": {
    "input": f"""
    <|begin_of_text|><|start_header_id|>system<|end_header_id|>
    Please use the information specified as context to answer the question.
    Formulate your answer in one sentence and be an honest AI.<|eot_id|>
    <|begin_of_text|><|start_header_id|>context<|end_header_id|>
    {{{{.context}}}}<|eot_id|>
    <|start_header_id|>user<|end_header_id|>
    {{{{.collection_query}}}} <|eot_id|>
    <|start_header_id|>assistant<|end_header_id|>
    """,
    "collectionId": COLLECTION_ID,
    "collectionQuery": USER_QUERY,
    "options": {  
        "max_length": "500",  
        "temperature": "0.01"
    }  
}}
result = requests.post(endpoint, json=body, headers=header)
result.json()['properties']['output']

This query conducts all steps necessary to answer a user query using retrieval augmented generation:

  • The user query (passed as collectionQuery) is sent to the collection (specified by collectionId).

  • The results of this query are saved in a variable .context, while the user query is saved in a variable .collection_query. You can use both variables in your prompt.

  • The example prompt uses the variables .context and .collection_query to answer the customer query.
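
The quadruple braces in the request body exist only because the prompt is written as a Python f-string, where each doubled brace produces one literal brace. The sketch below shows this and, as an alternative, builds the same body from a plain string that needs no escaping. The names PROMPT_TEMPLATE and build_rag_body are hypothetical, and the option values are copied from the example above.

```python
# In an f-string, "{{" produces one literal "{", so four braces produce two.
escaped = f"{{{{.context}}}} / {{{{.collection_query}}}}"
print(escaped)  # {{.context}} / {{.collection_query}}

# A plain string (no f prefix) can contain the placeholders as they are sent.
PROMPT_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
    "Please use the information specified as context to answer the question.\n"
    "Formulate your answer in one sentence and be an honest AI.<|eot_id|>\n"
    "<|begin_of_text|><|start_header_id|>context<|end_header_id|>\n"
    "{{.context}}<|eot_id|>\n"
    "<|start_header_id|>user<|end_header_id|>\n"
    "{{.collection_query}}<|eot_id|>\n"
    "<|start_header_id|>assistant<|end_header_id|>\n"
)


def build_rag_body(collection_id, user_query):
    """Build the request body for the automated approach."""
    return {
        "properties": {
            "input": PROMPT_TEMPLATE,
            "collectionId": collection_id,
            "collectionQuery": user_query,
            "options": {"max_length": "500", "temperature": "0.01"},
        }
    }
```

You could then send the request with requests.post(endpoint, json=build_rag_body(COLLECTION_ID, USER_QUERY), headers=header).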

Note:

  • For details on how to use the foundation model, see Foundation Models.

  • The best prompt strongly depends on the Large Language Model used. You might need to adapt your prompt to improve results.

Summary

In this tutorial, you learned how to use the IONOS AI Model Hub API to implement retrieval augmented generation use cases.

Namely, you learned how to derive answers to user queries using the content of your document collection and one of the IONOS foundation models.
