# Extract Text from PDF Documents

Vision models on the AI Model Hub accept visual input only as images, not as PDF files. To process a PDF document, render each page as an image before sending it to the model. This guide shows how to do that with [pypdfium2](https://pypi.org/project/pypdfium2/), a lightweight PDF rendering library, together with the OpenAI-compatible API.

## About this guide

In this guide, you will learn how to:

* Render PDF pages as images using `pypdfium2`
* Extract text from a PDF using LightOnOCR-2-1B, a model optimized for document OCR
* Extract and understand content from a PDF using Mistral Small 24B, a multimodal language model

## Prerequisites

Before you start, make sure you have:

* Python 3.11 or higher installed on your machine
* The `IONOS_API_TOKEN` environment variable set with your [authentication token](/cloud/ai/ai-model-hub/how-tos/access-management.md#generate-an-authentication-token)
* A PDF file to process
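
A missing token otherwise surfaces only as a `KeyError` when the client is created. The following is a minimal sketch of an early check; the helper name `require_token` is hypothetical and not part of any IONOS or OpenAI library.

```python
import os


def require_token(var_name: str = "IONOS_API_TOKEN") -> str:
    """Return the token from the environment or raise a readable error.

    Hypothetical helper for this guide; it only reads an environment variable.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"{var_name} is not set. Export it before running the examples."
        )
    return token
```

The returned value can be passed as `api_key` wherever the examples in this guide read `os.environ["IONOS_API_TOKEN"]`.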

Download the code files to follow along with ready-to-use examples:

{% tabs %}
{% tab title="Python Notebook" %}
Download the Python Notebook to explore PDF text extraction with ready-to-use examples.

{% file src="/files/SLFSHjVnH2FkizTTU8rR" %}
{% endtab %}

{% tab title="Python Code" %}
Download the standalone Python script for a quick implementation.

{% file src="/files/JXGCDRQLcm5rhDe6jxh8" %}
{% endtab %}
{% endtabs %}

## Step 1: Install dependencies

{% tabs %}
{% tab title="pip" %}

```bash
pip install pypdfium2 pillow openai
```

{% endtab %}

{% tab title="uv" %}

```bash
uv add pypdfium2 pillow openai
```

{% endtab %}
{% endtabs %}

## Step 2: Choose a model

The AI Model Hub offers two models suitable for PDF text extraction:

| **Model**         | **Identifier**                         | **Best for**                                                    |
| ----------------- | -------------------------------------- | --------------------------------------------------------------- |
| LightOnOCR-2-1B   | `lightonai/LightOnOCR-2-1B`            | Pure text extraction from scanned documents and complex layouts |
| Mistral Small 24B | `mistralai/Mistral-Small-24B-Instruct` | Text extraction combined with document understanding and Q\&A   |
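
If a script needs to switch between the two models, the table can be captured as a small lookup. This is only a sketch: the identifiers are copied from the table, while the task labels `"ocr"` and `"understanding"` and the function name `choose_model` are invented for illustration.

```python
# Identifiers come from the table above; the task labels are assumptions
# made up for this sketch.
MODEL_BY_TASK = {
    "ocr": "lightonai/LightOnOCR-2-1B",
    "understanding": "mistralai/Mistral-Small-24B-Instruct",
}


def choose_model(task: str) -> str:
    """Map a task label to a model identifier, failing loudly on typos."""
    try:
        return MODEL_BY_TASK[task]
    except KeyError:
        raise ValueError(
            f"Unknown task {task!r}; expected one of {sorted(MODEL_BY_TASK)}"
        ) from None
```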

## Step 3: Render a PDF page as an image

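`pypdfium2` rasterizes each PDF page to a bitmap, which is then encoded as a PNG image in memory. The `scale` parameter is a multiplier on the PDF base resolution of 72 DPI, so `scale=2.0` renders at 144 DPI, which balances text sharpness with request payload size.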

```python
import base64
import io

import pypdfium2 as pdfium


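def pdf_page_to_base64(pdf_path: str, page_index: int, scale: float = 2.0) -> str:
    """Render one page (zero-based index) and return it as a base64-encoded PNG."""
    doc = pdfium.PdfDocument(pdf_path)
    # Rasterize the page; scale multiplies the 72 DPI base resolution.
    bitmap = doc[page_index].render(scale=scale)
    # Encode the bitmap as PNG in memory instead of writing a file.
    buf = io.BytesIO()
    bitmap.to_pil().save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode()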
```

The function returns a base64-encoded PNG string ready to send directly to the API.
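
To target a specific resolution, it helps to convert between DPI and the `scale` factor. One PDF point is 1/72 inch, so a scale of 1.0 corresponds to 72 DPI. The sketch below shows the arithmetic; the helper names are hypothetical and not part of `pypdfium2`.

```python
POINTS_PER_INCH = 72  # one PDF point is 1/72 inch


def dpi_to_scale(dpi: float) -> float:
    """Convert a target resolution in DPI to a render scale factor."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return dpi / POINTS_PER_INCH


def scale_to_dpi(scale: float) -> float:
    """Convert a render scale factor back to the resulting DPI."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return scale * POINTS_PER_INCH
```

For example, `pdf_page_to_base64("document.pdf", 0, scale=dpi_to_scale(200))` would request a 200 DPI rendering, at the cost of a larger image.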

{% hint style="warning" %}
**Payload size limit:** The maximum allowed request payload size is 20 MB. Base64 encoding makes the image about one third larger than the PNG itself. For high-resolution or dense pages, lower the `scale` value, and send pages individually rather than combining several in one request.
{% endhint %}
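
The size check can be done before sending a request. The following sketch estimates the base64 size of a PNG and tries progressively lower scales until the page fits. It rests on assumptions that are not confirmed by this guide: the limit is read as 20 MiB, a fixed allowance covers the surrounding JSON, and `render_png` is a hypothetical callable you supply that returns the PNG bytes of a page for a given scale.

```python
import math
from typing import Callable, Sequence

MAX_PAYLOAD_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" interpreted as 20 MiB
JSON_OVERHEAD_BYTES = 1024  # assumption: rough allowance for the JSON around the image


def base64_length(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of raw_bytes bytes."""
    return 4 * math.ceil(raw_bytes / 3)


def fits_in_payload(png_size: int, limit: int = MAX_PAYLOAD_BYTES) -> bool:
    """True if a PNG of png_size bytes should fit once base64-encoded."""
    return base64_length(png_size) + JSON_OVERHEAD_BYTES <= limit


def first_scale_that_fits(
    render_png: Callable[[float], bytes],
    scales: Sequence[float] = (2.0, 1.5, 1.0),
) -> float:
    """Try each scale in order and return the first whose PNG fits the limit."""
    for scale in scales:
        if fits_in_payload(len(render_png(scale))):
            return scale
    raise ValueError("No candidate scale produced a page small enough to send")
```

In practice, `render_png` could wrap the rendering steps of `pdf_page_to_base64` and return `buf.getvalue()` before encoding.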

## Step 4: Send a page to the API

{% tabs %}
{% tab title="LightOnOCR-2-1B" %}
LightOnOCR-2-1B does not require a prompt. Its output behaviour is embedded in the model weights, so the request contains only the page image.

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["IONOS_API_TOKEN"],
    base_url="https://openai.inference.de-txl.ionos.com/v1",
)

image_b64 = pdf_page_to_base64("document.pdf", page_index=0)

response = client.chat.completions.create(
    model="lightonai/LightOnOCR-2-1B",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                }
            ],
        }
    ],
    max_tokens=4096,
    temperature=0.0,
)

print(response.choices[0].message.content)
```

{% endtab %}

{% tab title="Mistral Small 24B" %}
Mistral Small 24B accepts a text prompt alongside the image, which lets you control the extraction or ask questions about the page content.

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["IONOS_API_TOKEN"],
    base_url="https://openai.inference.de-txl.ionos.com/v1",
)

image_b64 = pdf_page_to_base64("document.pdf", page_index=0)

response = client.chat.completions.create(
    model="mistralai/Mistral-Small-24B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Extract all text from this document page and preserve the structure.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
    max_tokens=4096,
    temperature=0.0,
)

print(response.choices[0].message.content)
```

{% endtab %}
{% endtabs %}
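
The two requests differ only in whether a text part precedes the image in the message content. A minimal sketch of a shared message builder follows; the function name `build_user_message` is hypothetical, and the dictionary layout mirrors the requests shown in the tabs.

```python
from typing import Optional


def build_user_message(image_b64: str, prompt: Optional[str] = None) -> dict:
    """Build one user message: an optional text part followed by the page image."""
    content = []
    if prompt:
        # Only models that take instructions need a text part.
        content.append({"type": "text", "text": prompt})
    content.append(
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{image_b64}"},
        }
    )
    return {"role": "user", "content": content}
```

Passing `messages=[build_user_message(image_b64)]` reproduces the prompt-free request, while adding `prompt="..."` reproduces the request with a text instruction.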

## Step 5: Process all pages in a PDF

To extract text from every page, iterate over the page count and collect the results:

```python
import base64
import io
import os

import pypdfium2 as pdfium
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["IONOS_API_TOKEN"],
    base_url="https://openai.inference.de-txl.ionos.com/v1",
)


def pdf_page_to_base64(pdf_path: str, page_index: int, scale: float = 2.0) -> str:
    doc = pdfium.PdfDocument(pdf_path)
    bitmap = doc[page_index].render(scale=scale)
    buf = io.BytesIO()
    bitmap.to_pil().save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode()


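def extract_pdf(pdf_path: str, model: str) -> str:
    """Send every page of the PDF to the model and join the per-page results."""
    # Open the document here only to read the page count.
    doc = pdfium.PdfDocument(pdf_path)
    parts = []
    for i in range(len(doc)):
        # Render page i to a base64 PNG (the helper reopens the file on each call).
        image_b64 = pdf_page_to_base64(pdf_path, i)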
        response = client.chat.completions.create(
            model=model,
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "image_url",
                            "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                        }
                    ],
                }
            ],
            max_tokens=4096,
            temperature=0.0,
        )
        parts.append(f"--- Page {i + 1} ---\n\n{response.choices[0].message.content}")
    return "\n\n".join(parts)


print(extract_pdf("document.pdf", model="lightonai/LightOnOCR-2-1B"))
```
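
The page loop can also be written so that rendering and the API call are passed in as arguments, which makes it possible to try the loop without a PDF or network access. This is a sketch of that variation, not the method used in the downloadable files; `extract_pages` and its parameters are hypothetical names.

```python
from typing import Callable, Iterable


def extract_pages(
    page_images: Iterable[str],
    transcribe: Callable[[str], str],
) -> str:
    """Transcribe each base64 page image and join the results with page headers."""
    parts = []
    for number, image_b64 in enumerate(page_images, start=1):
        parts.append(f"--- Page {number} ---\n\n{transcribe(image_b64)}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Stand-in transcriber so the loop runs without calling the API.
    print(extract_pages(["aaa", "bbb"], lambda image: f"text for {image}"))
```

In a real run, `page_images` could be a generator that yields `pdf_page_to_base64(pdf_path, i)` for each page, and `transcribe` a function that wraps `client.chat.completions.create` and returns `response.choices[0].message.content`.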

## What you learned

In this guide, you learned how to:

1. Render PDF pages as base64-encoded PNG images using `pypdfium2`
2. Send page images to LightOnOCR-2-1B or Mistral Small 24B through the OpenAI-compatible API
3. Process all pages in a document and collect the results

For pure OCR tasks, LightOnOCR-2-1B delivers fast and structured Markdown output. For tasks that combine extraction with document understanding, Mistral Small 24B offers more flexibility through its text prompt.

For more information about OCR capabilities, see the [Optical Character Recognition (OCR)](/cloud/ai/ai-model-hub/how-tos/ocr.md) guide.

