# Reranking

The <code class="expression">space.vars.ionos\_cloud\_ai\_model\_hub</code> provides an OpenAI-compatible API that enables reranking of candidate documents against a query. Reranking models score the relevance between a query and each document using a cross-encoder architecture, making them ideal as a precision refinement step after an initial recall phase in two-stage retrieval pipelines.
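The two-stage idea can be illustrated with a minimal, offline sketch. Everything in it is hypothetical: `recall`, `rerank`, and `toy_score` are illustrative names, and the word-overlap scorer only stands in for a real reranking model so that the example runs without network access.

```python
import re


def words(text):
    """Lowercase a text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def recall(query, corpus, k=3):
    """Stage 1: cheap recall by counting words shared with the query."""
    scored = [(len(words(query) & words(doc)), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for overlap, doc in scored[:k] if overlap > 0]


def rerank(query, candidates, score_fn, top_n=2):
    """Stage 2: re-score only the recalled candidates and keep the best."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]


def toy_score(query, doc):
    """Stand-in scorer (Jaccard overlap); a real pipeline would call a reranking model here."""
    q, d = words(query), words(doc)
    return len(q & d) / len(q | d)


corpus = [
    "Paris is the capital of France.",
    "London is the capital of England.",
    "Bananas are rich in potassium.",
]
query = "capital of France"
candidates = recall(query, corpus)
print(rerank(query, candidates, toy_score, top_n=1))
```

The recall stage is deliberately coarse and fast, while the rerank stage spends more effort on a short candidate list.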

## Supported Reranking Models

The <code class="expression">space.vars.ionos\_cloud\_ai\_model\_hub</code> [<mark style="color:blue;">Models</mark>](/cloud/ai/ai-model-hub/models.md) shows all models available for reranking. Refer to the relevant model cards for each reranking model's suitable use cases.

## Overview

In this guide, you learn how to rerank documents using the OpenAI-compatible API. It targets developers who already understand:

* REST APIs.
* A programming language for working with REST endpoints (such as Python or Bash).
* The basics of retrieval pipelines and relevance scoring.

By the end, you will be able to:

1. Retrieve a list of available reranking models in the <code class="expression">space.vars.ionos\_cloud\_ai\_model\_hub</code>.
2. Use the API to rerank a set of candidate documents against a query.
3. Use the returned relevance scores to filter or sort results.

## Getting Started with Reranking

To use reranking models, first set up your environment and authenticate using the OpenAI-compatible API endpoints.

### Step 1: Retrieve Available Models

Fetch a list of models to see which reranking models are available for your use case:

{% tabs %}
{% tab title="Python" %}

```python
import requests

IONOS_API_TOKEN = "[YOUR API TOKEN HERE]"

endpoint = "https://openai.inference.de-txl.ionos.com/v1/models"

header = {
    "Authorization": f"Bearer {IONOS_API_TOKEN}",
    "Content-Type": "application/json"
}
# Print the parsed JSON response so the model list is visible when run as a script
print(requests.get(endpoint, headers=header).json())
```

{% endtab %}

{% tab title="Bash" %}

```bash
#!/bin/bash

IONOS_API_TOKEN="[YOUR API TOKEN HERE]"

curl -H "Authorization: Bearer ${IONOS_API_TOKEN}" \
        --get https://openai.inference.de-txl.ionos.com/v1/models
```

{% endtab %}
{% endtabs %}

#### Output

```
{
   "id": "Qwen/Qwen3-VL-Reranker-8B",
   "object": "model",
   "created": 1677610602
}
```

This query returns a JSON document with one entry per model; the excerpt above shows a single entry. Use the `id` value of an entry as the model name for reranking in later steps.
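As a rough sketch, the reranking models can be picked out of the response with a small helper. It assumes that the response wraps the model entries in a `data` list, as OpenAI-compatible `/v1/models` endpoints typically do, and that reranking model ids contain the substring `rerank`. Both are assumptions, and `find_reranker_ids` is a hypothetical helper name, so verify them against the actual response and the model cards.

```python
def find_reranker_ids(models_response, keyword="rerank"):
    """Return ids of models whose id contains `keyword` (case-insensitive).

    Assumes an OpenAI-style payload: {"data": [{"id": ...}, ...]}.
    The substring match is a heuristic, not an official filter.
    """
    entries = models_response.get("data", [])
    return [
        entry["id"]
        for entry in entries
        if keyword.lower() in entry.get("id", "").lower()
    ]


# Illustrative payload; the second id is hypothetical.
sample = {
    "data": [
        {"id": "Qwen/Qwen3-VL-Reranker-8B", "object": "model"},
        {"id": "example/chat-model", "object": "model"},
    ]
}
print(find_reranker_ids(sample))
```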

### Step 2: Rerank Documents

To rerank a set of candidate documents, send the query and documents to the `/rerank` endpoint:

{% tabs %}
{% tab title="Python" %}

```python
import requests

IONOS_API_TOKEN = "[YOUR API TOKEN HERE]"
MODEL_NAME = "[MODEL NAME HERE]"
QUERY = "What is the capital of France?"
DOCUMENTS = [
    "Paris is the capital of France.",
    "London is the capital of England.",
    "Berlin is the capital of Germany."
]

endpoint = "https://openai.inference.de-txl.ionos.com/v1/rerank"

header = {
    "Authorization": f"Bearer {IONOS_API_TOKEN}",
    "Content-Type": "application/json"
}
body = {
    "model": MODEL_NAME,
    "query": QUERY,
    "documents": DOCUMENTS,
    "top_n": 2
}
result = requests.post(endpoint, json=body, headers=header).json()
```

{% endtab %}

{% tab title="Bash" %}

```bash
#!/bin/bash

IONOS_API_TOKEN="[YOUR API TOKEN HERE]"
MODEL_NAME="[MODEL NAME HERE]"

BODY='{
    "model": "'"$MODEL_NAME"'",
    "query": "What is the capital of France?",
    "documents": [
        "Paris is the capital of France.",
        "London is the capital of England.",
        "Berlin is the capital of Germany."
    ],
    "top_n": 2
}'

curl -X POST -H "Authorization: Bearer ${IONOS_API_TOKEN}" \
     -H "Content-Type: application/json" \
     -d "$BODY" \
     https://openai.inference.de-txl.ionos.com/v1/rerank
```

{% endtab %}
{% endtabs %}

### Image Size and Token Budget

When reranking image documents, the model tokenises each image at **1 token per 32×32 pixel block**. Images exceeding 1,310,720 pixels in total are downscaled proportionally before tokenisation. Each (query, document) pair must fit within the 32,768-token context window.

To estimate the token cost of an image:

1. If `width × height > 1,310,720`, apply the scale factor: `scale = √(1,310,720 / (width × height))`
2. Round the scaled dimensions down to the nearest multiple of 32.
3. Calculate: `tokens = (scaled_width / 32) × (scaled_height / 32)`

| Image resolution | Total pixels | Tokens after downscaling |
| :--------------: | :----------: | :----------------------: |
|     512 × 512    |    262,144   |           \~256          |
|    1024 × 768    |    786,432   |           \~768          |
|    1296 × 1936   |   2,509,056  |          \~1,247         |
|    1920 × 1080   |   2,073,600  |          \~1,222         |

With \~200 tokens reserved for the query and prompt overhead, a single document can hold approximately **25 images at 1296×1936 px** (about 1,247 tokens each) before reaching the context limit. Use smaller or lower-resolution images to fit more images into a single document.
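The three estimation steps can be sketched as a small helper. This is only an approximation that follows the rules stated in this section; `estimate_image_tokens` is a hypothetical name, and the service's actual tokeniser may differ slightly, for example through per-image overhead tokens.

```python
import math

MAX_PIXELS = 1_310_720  # images above this pixel count are downscaled
BLOCK = 32              # one token per 32x32 pixel block


def estimate_image_tokens(width, height):
    """Estimate the token cost of one image from its pixel dimensions."""
    scale = 1.0
    if width * height > MAX_PIXELS:
        # Step 1: proportional downscale to fit the pixel budget.
        scale = math.sqrt(MAX_PIXELS / (width * height))
    # Step 2: round each (possibly scaled) dimension down to whole 32-pixel blocks.
    blocks_w = int(width * scale // BLOCK)
    blocks_h = int(height * scale // BLOCK)
    # Step 3: one token per block.
    return blocks_w * blocks_h


for size in [(512, 512), (1024, 768), (1296, 1936), (1920, 1080)]:
    print(size, estimate_image_tokens(*size))
```

Multiplying the result by the number of images in a document, plus the query and prompt overhead, gives a quick check against the 32,768-token context window before you send a request.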

### Step 3: Use Relevance Scores

The returned JSON includes the following key fields:

* **`results.[..].index`**: The position of the document in the original input list.
* **`results.[..].document.text`**: The document text that was scored.
* **`results.[..].relevance_score`**: A score between 0 and 1 indicating how relevant the document is to the query. Higher values indicate greater relevance.
* **`usage.prompt_tokens`**: Token count for the input.
* **`usage.total_tokens`**: Token count for the entire request.
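To show how these fields fit together, the following sketch maps each result back to its original document through `index` and orders the output by `relevance_score`. The response is an illustrative stand-in with made-up scores, and `rank_documents` is a hypothetical helper name.

```python
def rank_documents(response, documents):
    """Pair each result with its original document, highest score first."""
    ranked = sorted(
        response["results"],
        key=lambda r: r["relevance_score"],
        reverse=True,
    )
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked]


documents = [
    "Paris is the capital of France.",
    "London is the capital of England.",
    "Berlin is the capital of Germany.",
]
# Illustrative response; the scores are invented for this example.
sample_response = {
    "results": [
        {"index": 1, "relevance_score": 0.02},
        {"index": 0, "relevance_score": 0.92},
        {"index": 2, "relevance_score": 0.01},
    ]
}
for text, score in rank_documents(sample_response, documents):
    print(f"{score:.2f}  {text}")
```

Sorting explicitly avoids relying on the order in which the API returns results.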

Using Python, you can filter results above a relevance threshold:

{% tabs %}
{% tab title="Python" %}

```python
import requests

IONOS_API_TOKEN = "[YOUR API TOKEN HERE]"
MODEL_NAME = "Qwen/Qwen3-VL-Reranker-8B"
QUERY = "What is the capital of France?"
DOCUMENTS = [
    "Paris is the capital of France.",
    "London is the capital of England.",
    "Berlin is the capital of Germany."
]
THRESHOLD = 0.5

endpoint = "https://openai.inference.de-txl.ionos.com/v1/rerank"

header = {
    "Authorization": f"Bearer {IONOS_API_TOKEN}",
    "Content-Type": "application/json"
}
body = {
    "model": MODEL_NAME,
    "query": QUERY,
    "documents": DOCUMENTS
}
result = requests.post(endpoint, json=body, headers=header).json()

# Keep only the results whose score is at or above the threshold
relevant = [
    r for r in result["results"]
    if r["relevance_score"] >= THRESHOLD
]
print(relevant)

# Example output:
# [{'index': 0, 'document': {'text': 'Paris is the capital of France.'}, 'relevance_score': 0.9234}]
```

{% endtab %}
{% endtabs %}

The Rerank API uses standard HTTP status codes to indicate the outcome of a request:

* `200 OK`: The request was successful.
* `401 Unauthorized`: The request was unauthorized.
* `404 Not Found`: The requested resource was not found.
* `500 Internal Server Error`: An internal server error occurred.
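A minimal sketch of how client code might react to these status codes is shown below. The `describe_status` helper and the troubleshooting hints are assumptions for illustration, not documented behaviour of the API.

```python
# Hints are suggestions only; the API documents just the status meanings.
STATUS_HINTS = {
    200: "success",
    401: "unauthorized: check the API token",
    404: "not found: check the endpoint URL and model name",
    500: "internal server error: retry later",
}


def describe_status(status_code):
    """Map an HTTP status code to a short hint; unknown codes get a generic label."""
    return STATUS_HINTS.get(status_code, f"unexpected status {status_code}")


for code in (200, 401, 404, 500, 429):
    print(code, "->", describe_status(code))
```

With the `requests` library, the code to pass in is the `status_code` attribute of the response object, checked before calling `.json()`.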

## Summary

In this guide, you learned how to:

1. Access available reranking models.
2. Rerank a set of candidate documents against a query.
3. Filter results using relevance score thresholds.

For information on how to use reranking in a full retrieval pipeline with embeddings and document collections, see [<mark style="color:blue;">Retrieval Augmented Generation</mark>](/cloud/ai/ai-model-hub/how-tos/retrieval-augmented-generation.md).

