Reranking

The IONOS CLOUD AI Model Hub provides an OpenAI-compatible API that enables reranking of candidate documents against a query. Reranking models score the relevance between a query and each document using a cross-encoder architecture, making them ideal as a precision refinement step after an initial recall phase in two-stage retrieval pipelines.

Supported Reranking Models

The IONOS CLOUD AI Model Hub Models page lists all models available for reranking. Refer to each model's card for its suitable use cases.

Overview

In this guide, you learn how to rerank documents using the OpenAI-compatible API. It targets developers who already understand:

  • REST APIs.

  • A programming language for working with REST endpoints (such as Python or Bash).

  • The basics of retrieval pipelines and relevance scoring.

By the end, you will be able to:

  1. Retrieve a list of available reranking models in the IONOS CLOUD AI Model Hub.

  2. Use the API to rerank a set of candidate documents against a query.

  3. Use the returned relevance scores to filter or sort results.

Getting Started with Reranking

To use reranking models, first set up your environment and authenticate using the OpenAI-compatible API endpoints.

Step 1: Retrieve Available Models

Fetch a list of models to see which reranking models are available for your use case:
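A minimal Python sketch of this request could look as follows. The base URL, the `IONOS_API_TOKEN` environment variable name, and the OpenAI-style response shape (a `data` array whose entries carry an `id` field) are assumptions that this guide does not confirm, so adjust them to your account. The helper names are illustrative only.

```python
import json
import os
import urllib.request

# Assumed values: neither the base URL nor the variable name is stated in this guide.
BASE_URL = "https://openai.inference.de-txl.ionos.com/v1"
TOKEN_ENV_VAR = "IONOS_API_TOKEN"


def build_models_request(token: str) -> urllib.request.Request:
    """Build an authenticated GET request for the model list."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def extract_model_ids(payload: dict) -> list[str]:
    """Collect identifiers from an assumed {"data": [{"id": ...}]} response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models() -> list[str]:
    """Send the request and return the model identifiers (performs a network call)."""
    request = build_models_request(os.environ[TOKEN_ENV_VAR])
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_model_ids(json.load(response))
```

Calling `list_models()` with a valid token set in the environment returns the identifiers to choose from. Which of them are reranking models is stated on the model cards, not in the response itself.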


This query returns a JSON document that lists each model's name, which you will use to specify a model for reranking in later steps.

Step 2: Rerank Documents

To rerank a set of candidate documents, send the query and documents to the /rerank endpoint:
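The sketch below shows one way to build and send such a request in Python. The base URL and the body fields `model`, `query`, and `documents` are assumptions modelled on common rerank APIs and are not confirmed by this guide; the model name in the usage note is a placeholder, not a real identifier.

```python
import json
import urllib.request

# Assumed base URL; this guide only names the /rerank path.
BASE_URL = "https://openai.inference.de-txl.ionos.com/v1"


def build_rerank_request(
    token: str, model: str, query: str, documents: list[str]
) -> urllib.request.Request:
    """Build an authenticated POST request carrying the query and candidates."""
    # Assumed body schema: model, query, and a list of document texts.
    body = {"model": model, "query": query, "documents": documents}
    return urllib.request.Request(
        f"{BASE_URL}/rerank",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def rerank(token: str, model: str, query: str, documents: list[str]) -> dict:
    """Send the rerank request and return the parsed JSON (performs a network call)."""
    request = build_rerank_request(token, model, query, documents)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

A call such as `rerank(token, "your-reranking-model", "What is the capital of Germany?", ["Berlin is the capital of Germany.", "Paris is in France."])` would return the scored results, provided the placeholder is replaced with a model name from Step 1.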

Image size and token budget

When reranking image documents, the model tokenises each image at 1 token per 32×32 pixel block. Images exceeding 1,310,720 pixels in total are downscaled proportionally before tokenisation. Each (query, document) pair must fit within the 32,768-token context window.

To estimate the token cost of an image:

  1. If width × height > 1,310,720, apply the scale factor: scale = √(1,310,720 / (width × height))

  2. Round the scaled dimensions down to the nearest multiple of 32.

  3. Calculate: tokens = (scaled_width / 32) × (scaled_height / 32)
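The three steps above can be sketched as a small Python function. It assumes that the round-down to a multiple of 32 also applies to images that are not downscaled, which the steps do not state explicitly; the function name is illustrative.

```python
import math

MAX_PIXELS = 1_310_720  # images above this pixel count are downscaled
BLOCK = 32              # one token per 32x32 pixel block


def estimate_image_tokens(width: int, height: int) -> int:
    """Estimate the token cost of one image using the three steps above."""
    scale = 1.0
    if width * height > MAX_PIXELS:
        # Step 1: proportional downscale to fit the pixel budget.
        scale = math.sqrt(MAX_PIXELS / (width * height))
    # Step 2: round each (scaled) dimension down to a multiple of 32.
    scaled_width = int(width * scale) // BLOCK * BLOCK
    scaled_height = int(height * scale) // BLOCK * BLOCK
    # Step 3: count the 32x32 blocks.
    return (scaled_width // BLOCK) * (scaled_height // BLOCK)
```

For example, `estimate_image_tokens(512, 512)` returns 256, and `estimate_image_tokens(1296, 1936)` returns 1247 after downscaling to 928 × 1376.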

Image resolution    Total pixels    Tokens after downscaling
512 × 512           262,144         ~256
1024 × 768          786,432         ~768
1296 × 1936         2,509,056       ~1,247
1920 × 1080         2,073,600       ~1,222

With ~200 tokens reserved for the query and prompt overhead, a single document can hold approximately 25 images at 1296×1936 px (about 1,247 tokens each) before reaching the 32,768-token context limit. Use smaller or lower-resolution images to fit more images into each document.
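The budget arithmetic can be sketched in Python as follows. The function assumes the document contains images only; any accompanying text consumes part of the same budget. The function name and the default reserve are illustrative.

```python
CONTEXT_WINDOW = 32_768  # tokens available per (query, document) pair


def max_images_per_document(tokens_per_image: int, reserved_tokens: int = 200) -> int:
    """Return how many equally sized images fit after reserving query overhead."""
    if tokens_per_image <= 0:
        raise ValueError("tokens_per_image must be positive")
    available = CONTEXT_WINDOW - reserved_tokens
    return max(0, available // tokens_per_image)
```

With 1,247 tokens per image, `max_images_per_document(1247)` evaluates to 26, the same ballpark as the estimate above. Leaving some headroom for text in the document brings the practical figure slightly lower.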

Step 3: Use Relevance Scores

The returned JSON includes the following key fields:

  • results.[..].index: The position of the document in the original input list.

  • results.[..].document.text: The document text that was scored.

  • results.[..].relevance_score: A score between 0 and 1 indicating how relevant the document is to the query. Higher values indicate greater relevance.

  • usage.prompt_tokens: Token count for the input.

  • usage.total_tokens: Token count for the entire request.

Using Python, you can filter results above a relevance threshold:
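A minimal sketch of such a filter follows. It relies only on the `results`, `index`, `document.text`, and `relevance_score` fields listed above; the sample response is invented for illustration, and the 0.5 threshold is an arbitrary starting point rather than a recommended value.

```python
def filter_by_relevance(response: dict, threshold: float = 0.5) -> list[dict]:
    """Keep results scoring at or above the threshold, most relevant first."""
    kept = [
        result
        for result in response.get("results", [])
        if result["relevance_score"] >= threshold
    ]
    return sorted(kept, key=lambda result: result["relevance_score"], reverse=True)


# Hypothetical response, shaped like the fields described above.
sample_response = {
    "results": [
        {"index": 0, "document": {"text": "Paris is in France."}, "relevance_score": 0.12},
        {"index": 1, "document": {"text": "Berlin is the capital of Germany."}, "relevance_score": 0.91},
        {"index": 2, "document": {"text": "Germany is in Europe."}, "relevance_score": 0.57},
    ],
    "usage": {"prompt_tokens": 42, "total_tokens": 42},
}

relevant = filter_by_relevance(sample_response, threshold=0.5)
```

Here `relevant` holds the entries with `index` 1 and 2, in that order. The `index` field maps each kept result back to its position in the original input list.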

The Rerank API uses standard HTTP status codes to indicate the outcome of a request. The possible codes are:

  • 200 OK: The request was successful.

  • 401 Unauthorized: The request was unauthorized.

  • 404 Not Found: The requested resource was not found.

  • 500 Internal Server Error: An internal server error occurred.
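One way to act on these codes in Python is sketched below. The exception class and function name are hypothetical helpers, not part of any IONOS client library.

```python
# Meanings taken from the list of status codes above.
STATUS_MEANINGS = {
    200: "The request was successful.",
    401: "The request was unauthorized.",
    404: "The requested resource was not found.",
    500: "An internal server error occurred.",
}


class RerankAPIError(Exception):
    """Raised when the API answers with a non-success status code."""


def check_status(status_code: int) -> None:
    """Return silently on 200 and raise a descriptive error otherwise."""
    if status_code == 200:
        return
    meaning = STATUS_MEANINGS.get(status_code, "Unexpected status code.")
    raise RerankAPIError(f"{status_code}: {meaning}")
```

When using `urllib.request`, a non-2xx response surfaces as `urllib.error.HTTPError`; its `code` attribute can be passed to `check_status` to obtain a readable message.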

Summary

In this guide, you learned how to:

  1. Access available reranking models.

  2. Rerank a set of candidate documents against a query.

  3. Filter results using relevance score thresholds.

For information on how to use reranking in a full retrieval pipeline with embeddings and document collections, see Retrieval Augmented Generation.
