# Migration Guide from Predictions Endpoint

{% hint style="warning" %}
**Scheduled to Retire**: The native `/predictions` endpoint is scheduled to retire on May 5, 2026.
{% endhint %}

The `/collections`, `/documents`, and `/query` endpoints for managing document collections and documents remain unaffected.

This guide explains how to migrate from the native `/predictions` endpoint to the OpenAI-compatible API for text generation, image generation, and Retrieval Augmented Generation (RAG) in the IONOS AI Model Hub.

***

If you are currently using the native `/predictions` endpoint (Example: `https://inference.de-txl.ionos.com/models/{modelId}/predictions`), you can migrate to the OpenAI-compatible API for standard text and image generation use cases. This migration simplifies integration with OpenAI-compatible tools and SDKs and provides a more standardized developer experience.

***

## Text Generation Migration Example

* **Native Endpoint:** `POST https://inference.de-txl.ionos.com/models/{modelId}/predictions`
* **Native Request Body:**

  ```json
  {
    "type": "prediction",
    "properties": {
      "input": "Please give me 5 domain suggestions for a flower shop in Berlin. Provide for each domain name a paragraph explaining the domain name and why it is valuable.",
      "options": {
        "max_length": "1000",
        "temperature": "0.5"
      }
    }
  }
  ```
* **OpenAI-Compatible Endpoint:** `POST https://openai.inference.de-txl.ionos.com/v1/chat/completions`
* **Model Selection:** The `modelId` for the OpenAI-compatible API is taken from the list of available models at `https://openai.inference.de-txl.ionos.com/v1/models`. For example, you can use `openai/gpt-oss-120b` as the model ID.
* **OpenAI-Compatible Request Body:**

  ```json
  {
    "model": "openai/gpt-oss-120b",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Please give me 5 domain suggestions for a flower shop in Berlin. Provide for each domain name a paragraph explaining the domain name and why it is valuable."
      }
    ],
    "max_tokens": 2000,
    "temperature": 0.5
  }
  ```
* **Field Mapping:**
  * `properties.input` → `messages[].content` (user role)
  * `properties.options.max_length` → `max_tokens`
  * `properties.options.temperature` → `temperature`
  * `modelId` in the URL → `model` field in the request body
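The field mapping above can be expressed as a small helper. This is an illustrative sketch, not part of any IONOS SDK; the function name `to_chat_completions` and the default system prompt are assumptions for the example. Note that the native options carry numeric values as strings, while the OpenAI-compatible API expects numbers.

```python
def to_chat_completions(native_body: dict, model_id: str) -> dict:
    """Map a native /predictions request body to an OpenAI-compatible
    /v1/chat/completions request body (hypothetical helper)."""
    props = native_body["properties"]
    options = props.get("options", {})
    body = {
        # modelId moves from the URL into the request body.
        "model": model_id,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            # properties.input becomes the user message content.
            {"role": "user", "content": props["input"]},
        ],
    }
    # Native option values are strings; convert to the numeric types
    # the OpenAI-compatible API expects.
    if "max_length" in options:
        body["max_tokens"] = int(options["max_length"])
    if "temperature" in options:
        body["temperature"] = float(options["temperature"])
    return body
```

The resulting dictionary can then be posted as JSON to `https://openai.inference.de-txl.ionos.com/v1/chat/completions`.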

***

## Image Generation Migration Example

* **Native Endpoint:** `POST https://inference.de-txl.ionos.com/models/{modelId}/predictions`
* **Native Request Body:**

  ```json
  {
    "type": "prediction",
    "properties": {
      "input": "Draw an image of a futuristic city skyline at sunset, digital art.",
      "options": {
        "size": "1024x1024"
      }
    }
  }
  ```
* **OpenAI-Compatible Endpoint:** `POST https://openai.inference.de-txl.ionos.com/v1/images/generations`
* **Model Selection:** The `modelId` for the OpenAI-compatible API is taken from the list of available models at `https://openai.inference.de-txl.ionos.com/v1/models`. For example, you can use `black-forest-labs/FLUX.1-schnell` as the model ID.
* **OpenAI-Compatible Request Body:**

  ```json
  {
    "model": "black-forest-labs/FLUX.1-schnell",
    "prompt": "A futuristic city skyline at sunset, digital art.",
    "n": 1,
    "size": "1024x1024"
  }
  ```
* **Field Mapping:**
  * `properties.input` → `prompt`
  * `properties.options.size` → `size`
  * `modelId` in the URL → `model` field in the request body
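The image-generation mapping can be sketched the same way. As above, the helper name `to_images_generations` is illustrative and not part of any SDK.

```python
def to_images_generations(native_body: dict, model_id: str) -> dict:
    """Map a native /predictions request body to an OpenAI-compatible
    /v1/images/generations request body (hypothetical helper)."""
    props = native_body["properties"]
    options = props.get("options", {})
    body = {
        # modelId moves from the URL into the request body.
        "model": model_id,
        # properties.input becomes the prompt.
        "prompt": props["input"],
        "n": 1,
    }
    # properties.options.size carries over unchanged (e.g. "1024x1024").
    if "size" in options:
        body["size"] = options["size"]
    return body
```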

***

## Retrieval Augmented Generation (RAG) Migration Example

Users who require Retrieval Augmented Generation (RAG) or document-based querying should migrate to a two-step flow that combines the native `/query` endpoint with the OpenAI-compatible API. The new approach separates document retrieval from text generation:

### Step 1: Query Your Document Collection

**Endpoint:** `POST https://inference.de-txl.ionos.com/collections/{collectionId}/query`

**Request Body:**

```json
{
  "query": "What are the supported models for AI Model Hub?",
  "limit": 5
}
```

**Response Example:**

```json
{
  "results": [
    {
      "documentId": "doc-123",
      "content": "IONOS AI Model Hub supports various models including GPT-OSS-120B, Llama 3, Mistral, and FLUX.1 for image generation...",
      "score": 0.92
    },
    {
      "documentId": "doc-456",
      "content": "The following embedding models are available: text-embedding-ada-002...",
      "score": 0.87
    }
  ]
}
```

### Step 2: Generate a Response Using Retrieved Context

**Endpoint:** `POST https://openai.inference.de-txl.ionos.com/v1/chat/completions`

**Request Body:**

```json
{
  "model": "openai/gpt-oss-120b",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant. Answer the user's question based only on the provided context. If the context doesn't contain enough information, say so.\n\nContext:\n---\nIONOS AI Model Hub supports various models including GPT-OSS-120B, Llama 3, Mistral, and FLUX.1 for image generation...\n\nThe following embedding models are available: text-embedding-ada-002...\n---"
    },
    {
      "role": "user",
      "content": "Which models does IONOS AI Model Hub offer?"
    }
  ],
  "max_tokens": 1000,
  "temperature": 0
}
```

**Best Practice:** Place the retrieved context in the system message to clearly separate instructional context from the user's question. This approach provides cleaner separation of concerns, easier conversation continuation for follow-up questions, and better model adherence to grounding information.
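The glue between the two steps, assembling a `messages` array from the `/query` results, can be sketched as follows. This is an illustrative helper under the best practice above; the function name `build_rag_messages` is an assumption, not an SDK function.

```python
def build_rag_messages(question: str, results: list) -> list:
    """Build an OpenAI-compatible messages array from /query results
    (hypothetical helper). Retrieved content goes into the system message;
    the user's question stays in its own user message."""
    # Join the retrieved chunks into a single context block.
    context = "\n\n".join(item["content"] for item in results)
    system = (
        "You are a helpful assistant. Answer the user's question based only "
        "on the provided context. If the context doesn't contain enough "
        "information, say so.\n\nContext:\n---\n" + context + "\n---"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

The returned list slots directly into the `messages` field of the Step 2 request body; follow-up questions can be appended as additional user messages without rebuilding the context.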
