Optical Character Recognition (OCR)
The IONOS Cloud AI Model Hub supports Optical Character Recognition (OCR) models, such as LightOnOCR-2-1B, a vision-language model that converts documents (PDFs, scans, images) into clean, naturally ordered text. OCR lets you extract editable text from visual content, making it valuable for document digitization, data extraction, and content accessibility.
Note: LightOnOCR-2-1B is an end-to-end vision-language model that processes images directly without requiring separate preprocessing or layout detection steps. It handles complex layouts such as tables, forms, receipts, and scientific notation. The model always outputs Markdown-formatted text (including LaTeX spans for mathematical notation); this behavior is embedded in the model weights and cannot be changed via text prompts.
OCR models supporting document conversion
Not all models in the AI Model Hub model list support OCR. LightOnOCR-2-1B is specifically designed for document-to-text tasks. Check the model cards for compatibility details.
Overview
In this guide, you will learn how to integrate the LightOnOCR-2-1B model through the IONOS Cloud OpenAI-compatible API to extract text from images and documents.
This guide is intended for developers with basic knowledge of:
REST APIs
A programming language capable of making HTTP requests (Python and Bash examples included)
IONOS Cloud AI Model Hub's OpenAI-compatible API
Getting started with OCR
First, set up your environment and authenticate using the OpenAI-compatible API endpoint.
Download the code files below to access ready-to-run OCR scripts and examples:
Download the Python Notebook to explore OCR with ready-to-use examples.
Download the standalone Python script for a quick implementation.
Download the Bash script for a command-line implementation.
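Before making any request, you need the endpoint and an authorization header. The following is a minimal sketch; the base URL and the environment variable name are assumptions for illustration, so confirm the actual endpoint in your AI Model Hub documentation:

```python
import os

# Assumption: base URL and env var name are illustrative placeholders;
# look up the actual OpenAI-compatible endpoint in your AI Model Hub account.
IONOS_API_BASE = "https://openai.inference.de-txl.ionos.com/v1"
API_TOKEN = os.environ.get("IONOS_API_TOKEN", "<your-token>")

# Standard bearer-token headers for the OpenAI-compatible API.
HEADERS = {
    "Authorization": f"Bearer {API_TOKEN}",
    "Content-Type": "application/json",
}
```

The same token works for all OpenAI-compatible endpoints in the Model Hub, so you only need to set it up once.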
Simple example
Input / Output — quick reference
Input parameters:
model: The OCR model identifier
messages: Array containing the user message with image content
image_url: The document image as a URL or base64-encoded data URI
max_tokens: Maximum number of tokens in the response
temperature: Controls output randomness (lower = more deterministic)
Output fields:
choices[0].message.content: The extracted text from the image
usage: Token counts for billing purposes
Step 1: Prepare your image input
LightOnOCR-2-1B accepts images in two formats:
URL: Provide a publicly accessible URL to the image.
Base64-encoded data URI: Encode a local image file as a base64 string.
Payload size limit: The maximum allowed request payload size is 20 MB. When using base64-encoded images, make sure the encoded content does not exceed this limit.
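Both formats map to the same `image_url` content part in the request body. A minimal sketch of the two variants (the URL is a placeholder and the data URI is truncated for brevity):

```python
# Option 1: a publicly accessible URL (placeholder shown here).
url_part = {
    "type": "image_url",
    "image_url": {"url": "https://example.com/invoice.png"},
}

# Option 2: a base64-encoded data URI (value truncated for brevity).
data_uri_part = {
    "type": "image_url",
    "image_url": {"url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg..."},
}
```

Either part can be placed in the `content` array of the user message; the model treats both identically once the request is received.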
Step 2: Make an OCR API request
Send the prepared image to the LightOnOCR-2-1B model via the OpenAI-compatible chat completions endpoint.
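The request can be sketched with only the standard library, as below. The endpoint URL, model identifier string, and image URL are placeholder assumptions; take the real values from your AI Model Hub account and the model card:

```python
import json
import urllib.request

# Assumptions: endpoint and model identifier are placeholders; confirm both
# against the model card before use.
ENDPOINT = "https://openai.inference.de-txl.ionos.com/v1/chat/completions"
MODEL = "LightOnOCR-2-1B"

payload = {
    "model": MODEL,
    "messages": [{
        "role": "user",
        "content": [{
            "type": "image_url",
            "image_url": {"url": "https://example.com/scan.png"},  # placeholder
        }],
    }],
    "max_tokens": 2048,
    "temperature": 0.0,  # low temperature keeps OCR output deterministic
}

def extract_text(api_token: str) -> str:
    """Send the OCR request and return the extracted Markdown text."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The extracted text lives in the first choice's message content.
    return body["choices"][0]["message"]["content"]
```

Because the endpoint is OpenAI-compatible, the official `openai` Python client can be used instead by pointing its `base_url` at the Model Hub endpoint.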
Step 3: OCR with local images using base64 encoding
For local image files, encode them as base64 data URIs before sending to the API.
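A small helper can handle the encoding and guard against the 20 MB payload limit mentioned above. The function name and default MIME type are illustrative:

```python
import base64

MAX_PAYLOAD_BYTES = 20 * 1024 * 1024  # 20 MB request payload limit

def build_image_message(path: str, mime: str = "image/jpeg") -> dict:
    """Encode a local image as a base64 data URI inside a user message.

    Raises ValueError if the encoded content would exceed the payload limit.
    """
    with open(path, "rb") as f:
        raw = f.read()
    data_uri = f"data:{mime};base64," + base64.b64encode(raw).decode("ascii")
    if len(data_uri) > MAX_PAYLOAD_BYTES:
        raise ValueError("encoded image exceeds the 20 MB payload limit")
    return {
        "role": "user",
        "content": [{"type": "image_url", "image_url": {"url": data_uri}}],
    }
```

Note that base64 encoding inflates the file size by roughly a third, so an image close to 15 MB on disk may already exceed the limit once encoded.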
Summary
In this guide, you learned how to:
Send images to the LightOnOCR-2-1B model for text extraction
Work with both URL-based and base64-encoded local images
Process the OCR response from the OpenAI-compatible API
LightOnOCR-2-1B always returns Markdown-formatted text, making it straightforward to integrate OCR output into downstream workflows. This is particularly valuable for document management, data entry automation, and content accessibility.
For more information about other AI capabilities, see our documentation on Text Generation, Image Generation, and Tool Calling.