Retrieval Augmented Generation
The IONOS AI Model Hub allows you to combine foundation models and a vector database to implement retrieval augmented generation use cases.
Retrieval augmented generation is an approach that lets an existing Large Language Model, such as Llama or Mistral, answer based not only on the knowledge it learned during training, but also on knowledge you supply yourself.
Retrieval augmented generation uses two components:
a Large Language Model (we offer a corresponding model as part of our Foundation Models) and
a document collection stored in a vector database (you can set one up using Document Embeddings).
If one of your users queries your retrieval augmented generation system, you first get the most similar documents from the corresponding document collection. Second, you ask the Large Language Model to answer the query by using both the knowledge it was trained on and the most similar documents from your document collection.
Overview
This tutorial is intended for developers. It assumes you have basic knowledge of:
REST APIs and how to call them
A programming language to handle REST API endpoints (for illustration purposes, the tutorials use Python and Bash scripting)
You should also be familiar with the IONOS:
By the end of this tutorial, you'll be able to: Answer customer queries using a Large Language Model which adds data from your document collections to the answers.
Background
The IONOS AI Model Hub API offers both document embeddings and Large Language Models that you can use to implement retrieval augmented generation without having to manage corresponding hardware yourself.
Our AI Model Hub API provides all required functionality without your data being transferred out of Germany.
Before you begin
To get started,
set up a document collection using Document Embeddings and get the identifier of this document collection.
choose a Large Language Model out of our Foundation Models and derive the identifier of this Large Language Model.
You will need both identifiers in the subsequent steps.
Next, you should open your IDE to enter Python code.
1. Install required libraries
You need to install the modules requests and pandas in your Python environment:
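The installation itself is typically done from a shell with `pip install requests pandas`. As a minimal sketch that is not part of the original tutorial, the following check reports which of the two modules are still missing from the current environment. The helper name `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Check the two third-party modules this tutorial relies on.
missing = missing_modules(["requests", "pandas"])
if missing:
    print("Install the missing modules with: pip install " + " ".join(missing))
else:
    print("requests and pandas are available.")
```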
2. Import required libraries
You need to import the following modules:
3. Generate header for API requests
Next, generate a header document to authenticate yourself against the REST API:
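A minimal sketch of such a header, assuming the API expects a bearer token and JSON request bodies. The environment variable name `IONOS_API_TOKEN` and the helper `build_header` are hypothetical; substitute however you store your own token.

```python
import os


def build_header(api_token):
    """Build HTTP headers for authenticated JSON requests (bearer scheme assumed)."""
    return {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }


# IONOS_API_TOKEN is a hypothetical variable name; an empty token is used if it is unset.
API_TOKEN = os.environ.get("IONOS_API_TOKEN", "")
header = build_header(API_TOKEN)
```

Reading the token from the environment keeps it out of your source code.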
After this step, you have installed all Python modules and have one variable header you can use to implement your first retrieval augmented generation use case.
Manual retrieval augmented generation
This section shows how to use the document collection and the contained documents to answer a user query.
Retrieve documents relevant for querying
To retrieve the documents relevant to answering the user query, invoke the query endpoint as follows:
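The call might look like the sketch below. The base URL, the `/collections/{id}/query` path, and the body fields `query` and `limit` are assumptions that should be checked against the API reference. The function names are hypothetical, and `header` is the variable created in the previous step.

```python
# Assumed base URL of the inference API; verify against the API reference.
BASE_URL = "https://inference.de-txl.ionos.com"


def build_collection_query(collection_id, user_query, num_of_documents, base_url=BASE_URL):
    """Return the (endpoint, body) pair for a similarity query against a collection."""
    endpoint = f"{base_url}/collections/{collection_id}/query"
    # Field names "query" and "limit" are assumptions.
    body = {"query": user_query, "limit": num_of_documents}
    return endpoint, body


def query_collection(collection_id, user_query, num_of_documents, header):
    """Send the query and return the parsed JSON response."""
    import requests  # third-party; imported here so the builder above has no dependencies

    endpoint, body = build_collection_query(collection_id, user_query, num_of_documents)
    response = requests.post(endpoint, json=body, headers=header, timeout=30)
    response.raise_for_status()
    return response.json()
```

For example, `query_collection(COLLECTION_ID, USER_QUERY, NUM_OF_DOCUMENTS, header)` would send the request, where the three upper-case names are placeholders for your own values.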
This will return a list of the NUM_OF_DOCUMENTS most relevant documents in your document collection for answering the user query.
Decode Base64 encoded documents
Now, decode the retrieved documents back to a string using:
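A minimal decoding sketch follows. The nesting of the response (`properties` → `matches` → `document` → `properties` → `content`) is an assumption about the query result; adjust the keys to the JSON you actually receive. The sample response is made up for illustration.

```python
import base64


def decode_documents(query_result):
    """Decode the Base64 content of every matched document into a UTF-8 string."""
    matches = query_result["properties"]["matches"]  # assumed response layout
    return [
        base64.b64decode(match["document"]["properties"]["content"]).decode("utf-8")
        for match in matches
    ]


# Hypothetical sample response containing one Base64-encoded document.
sample_result = {
    "properties": {
        "matches": [
            {"document": {"properties": {"content": base64.b64encode(b"Hello").decode("ascii")}}}
        ]
    }
}
print(decode_documents(sample_result))  # prints ['Hello']
```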
Generate final answer
Now, combine the user query and the result from the document collection in one prompt:
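One possible way to do this is sketched below. The prompt wording is only an example, the body layout (`properties.input`) and the base URL are assumptions, and the function names are hypothetical. The `/predictions` path under the model identifier follows the endpoint named later in this tutorial.

```python
# Assumed base URL of the inference API; verify against the API reference.
BASE_URL = "https://inference.de-txl.ionos.com"


def build_prompt(user_query, documents):
    """Combine the decoded documents and the user query into a single prompt."""
    context = "\n\n".join(documents)
    return (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}\n"
        "Answer:"
    )


def build_prediction_request(model_id, prompt, base_url=BASE_URL):
    """Return the (endpoint, body) pair for a prediction request."""
    endpoint = f"{base_url}/models/{model_id}/predictions"
    body = {"properties": {"input": prompt}}  # assumed body layout
    return endpoint, body


def ask_model(model_id, prompt, header):
    """Send the prompt to the model and return the parsed JSON response."""
    import requests  # third-party; imported here so the builders above have no dependencies

    endpoint, body = build_prediction_request(model_id, prompt)
    response = requests.post(endpoint, json=body, headers=header, timeout=60)
    response.raise_for_status()
    return response.json()
```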
The result will be a JSON document consisting of the answer to the customer and some meta information. You can access the answer using:
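A minimal sketch, assuming the answer is stored under `properties.output` in the response. Both this key path and the sample response are assumptions; inspect the JSON you receive and adjust accordingly.

```python
def extract_answer(result):
    """Return the generated answer from a parsed prediction response."""
    return result["properties"]["output"]  # assumed key path


# Hypothetical response used only to illustrate the access pattern.
sample_response = {
    "properties": {"input": "Question: ...", "output": "This is the generated answer."},
    "metadata": {},
}
print(extract_answer(sample_response))  # prints This is the generated answer.
```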
Note:
For details on how to use the foundation model, see Foundation Models.
The best prompt strongly depends on the Large Language Model used. You might need to adapt your prompt to improve results.
Automated retrieval augmented generation
Our Foundation Model API allows you to automate the process described above. If you pass the collection ID and the collection query directly to the foundation model endpoint, the endpoint first queries the document collection and stores the results in a variable that you can use directly in your prompt. This section describes how to do this.
Apply combined retrieval augmented generation prompt to foundation model
To implement a retrieval augmented generation use case with only one prompt, you have to invoke the /predictions endpoint of the Large Language Model you want to use and send the prompt as part of the body of this query:
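Such a request might be built as in the sketch below. The names `collectionId`, `collectionQuery`, `.context`, and `.collection_query` come from this tutorial. Their placement inside `properties`, the `{{ }}` template delimiters, the base URL, and the function name are assumptions to verify against the API reference.

```python
# Assumed base URL of the inference API; verify against the API reference.
BASE_URL = "https://inference.de-txl.ionos.com"

# Plain string (not an f-string), so the template placeholders are sent to the API unchanged.
# The "{{ }}" delimiters around the variables are an assumption.
RAG_PROMPT = (
    "Use the following context to answer the question.\n\n"
    "Context: {{.context}}\n\n"
    "Question: {{.collection_query}}\n"
    "Answer:"
)


def build_rag_request(model_id, collection_id, user_query, base_url=BASE_URL):
    """Return the (endpoint, body) pair for a single-call retrieval augmented request."""
    endpoint = f"{base_url}/models/{model_id}/predictions"
    body = {
        "properties": {
            "input": RAG_PROMPT,
            "collectionId": collection_id,   # collection to search
            "collectionQuery": user_query,   # query sent to the collection
        }
    }
    return endpoint, body
```

The resulting pair can be sent with `requests.post(endpoint, json=body, headers=header)`, using the `header` variable from the setup steps.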
This query conducts all steps necessary to answer a user query using retrieval augmented generation:
The user query (saved at collectionQuery) is sent to the collection (specified at collectionId).
The results of this query are saved in a variable .context, while the user query is saved in a variable .collection_query. You can use both variables in your prompt.
The example prompt uses the variables .context and .collection_query to answer the customer query.
Note:
For details on how to use the foundation model, see Foundation Models.
The best prompt strongly depends on the Large Language Model used. You might need to adapt your prompt to improve results.
Summary
In this tutorial, you learned how to use the IONOS AI Model Hub API to implement retrieval augmented generation use cases.
Namely, you learned how to: Derive answers to user queries using the content of your document collection and one of the IONOS foundation models.