# Optical Character Recognition (OCR)

The IONOS Cloud AI Model Hub supports Optical Character Recognition (OCR) models, such as LightOnOCR-2-1B, a vision-language model that converts documents (PDFs, scans, images) into clean, naturally ordered text. OCR lets you extract editable text from visual content, making it valuable for document digitization, data extraction, and content accessibility.

{% hint style="info" %}
**Note:** LightOnOCR-2-1B is an end-to-end vision-language model that processes images directly without requiring separate preprocessing or layout detection steps. It handles complex layouts such as tables, forms, receipts, and scientific notation. The model always outputs Markdown-formatted text (including LaTeX spans for mathematical notation); this behavior is embedded in the model weights and cannot be changed via text prompts.
{% endhint %}

## OCR models supporting document conversion

Not all models on the AI Model Hub [<mark style="color:blue;">models list</mark>](https://docs.ionos.com/sections-test/guides/ai/ai-model-hub/models) support OCR. LightOnOCR-2-1B is specifically designed for document-to-text tasks. Check the model cards for compatibility details.

## Overview

In this guide, you will learn how to integrate the LightOnOCR-2-1B model through the IONOS Cloud OpenAI-compatible API to extract text from images and documents.

This guide is intended for developers with basic knowledge of:

* REST APIs
* A programming language that can make HTTP requests (examples are provided in Python and Bash)
* The IONOS Cloud AI Model Hub's OpenAI-compatible API

## Getting started with OCR

First, set up your environment and authenticate using the OpenAI-compatible API endpoint.
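All examples in this guide read your API token from the `IONOS_API_TOKEN` environment variable. Export it once per shell session (the token value below is a placeholder):

```shell
# Export your IONOS API token so the examples below can read it.
# Replace the placeholder with your actual token.
export IONOS_API_TOKEN="your-api-token-here"
```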

Download the code files below for ready-to-run OCR scripts and examples:

{% tabs %}
{% tab title="Python Notebook" %}
Download the Python Notebook to explore OCR with ready-to-use examples.

{% file src="https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-55a7bb7c0fb751b942836595f0fb1ff80fde344f%2Fai-model-hub-ocr.ipynb?alt=media" %}
{% endtab %}

{% tab title="Python Code" %}
Download the standalone Python script for a quick implementation.

{% file src="https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-09959ce29aab7c28a82fdd610e8abbb1b90e97f6%2Fai-model-hub-ocr.py?alt=media" %}
{% endtab %}

{% tab title="Bash Code" %}
Download the Bash script for a command-line implementation.

{% file src="https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-60f037cc5cd5080583314e03720f42611d451b5d%2Fai-model-hub-ocr.sh?alt=media" %}
{% endtab %}
{% endtabs %}

### Simple example

```bash
curl -X POST "https://openai.inference.de-txl.ionos.com/v1/chat/completions" \
  -H "Authorization: Bearer $IONOS_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lightonai/LightOnOCR-2-1B",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/document.png"
            }
          }
        ]
      }
    ],
    "max_tokens": 4096,
    "temperature": 0.2,
    "top_p": 0.9
  }'
```

#### Input / Output — quick reference

{% tabs %}
{% tab title="Input (sent)" %}

```json
{
  "model": "lightonai/LightOnOCR-2-1B",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/document.png"
          }
        }
      ]
    }
  ],
  "max_tokens": 4096,
  "temperature": 0.2,
  "top_p": 0.9
}
```

* `model`: The OCR model identifier
* `messages`: Array containing the user message with image content
* `image_url`: The document image as a URL or base64-encoded data URI
* `max_tokens`: Maximum number of tokens in the response
* `temperature`: Controls output randomness (lower = more deterministic)
* `top_p`: Nucleus sampling threshold; restricts sampling to the most probable tokens
  {% endtab %}

{% tab title="Output (received)" %}

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "lightonai/LightOnOCR-2-1B",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Extracted text from the document..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 1234,
    "completion_tokens": 567,
    "total_tokens": 1801
  }
}
```

* `choices[0].message.content`: The extracted text from the image
* `usage`: Token counts for billing purposes
  {% endtab %}
  {% endtabs %}
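In application code you will typically pull the extracted text and the token usage out of the response. A minimal sketch of parsing the JSON shape shown above (the `response` dict here is a hand-built stand-in for a real API reply):

```python
# A stand-in response matching the shape returned by the API.
response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "# Invoice\n\nTotal: 42.00 EUR",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {
        "prompt_tokens": 1234,
        "completion_tokens": 567,
        "total_tokens": 1801,
    },
}

# The extracted Markdown text and the billable token counts.
extracted_text = response["choices"][0]["message"]["content"]
total_tokens = response["usage"]["total_tokens"]

print(extracted_text)
print(f"Billed tokens: {total_tokens}")
```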

### Step 1: Prepare your image input

LightOnOCR-2-1B accepts images in two formats:

1. **URL**: Provide a publicly accessible URL to the image.
2. **Base64-encoded data URI**: Encode a local image file as a base64 string.

{% hint style="warning" %}
**Payload size limit:** The maximum allowed request payload size is 20 MB. When using base64-encoded images, make sure the encoded content does not exceed this limit.
{% endhint %}
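Base64 encoding inflates a file by roughly one third, so an image of around 15 MB on disk can already exceed the 20 MB request limit. A small helper to check the encoded size before sending (a sketch; the limit comes from the hint above, and the JSON headroom value is an assumption):

```python
import os

MAX_PAYLOAD_BYTES = 20 * 1024 * 1024  # 20 MB request payload limit


def encoded_size_ok(path: str, headroom: int = 64 * 1024) -> bool:
    """Return True if the base64-encoded image, plus some headroom for
    the surrounding JSON, stays under the payload limit."""
    raw = os.path.getsize(path)
    encoded = 4 * ((raw + 2) // 3)  # exact base64 length without newlines
    return encoded + headroom < MAX_PAYLOAD_BYTES
```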

{% tabs %}
{% tab title="Python — URL" %}

```python
# Using an image URL
content = [
    {
        "type": "image_url",
        "image_url": {
            "url": "https://example.com/document.png"
        }
    }
]
```

{% endtab %}

{% tab title="Python — Base64" %}

```python
import base64

def encode_image_to_data_url(path: str) -> str:
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return f"data:image/png;base64,{b64}"

# Using a local image file
image_data_url = encode_image_to_data_url("document.png")

content = [
    {
        "type": "image_url",
        "image_url": {
            "url": image_data_url
        }
    }
]
```

{% endtab %}
{% endtabs %}

### Step 2: Make an OCR API request

Send the prepared image to the LightOnOCR-2-1B model via the OpenAI-compatible chat completions endpoint.

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("IONOS_API_TOKEN"),
    base_url="https://openai.inference.de-txl.ionos.com/v1",
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/document.png"
                }
            }
        ]
    }
]

response = client.chat.completions.create(
    model="lightonai/LightOnOCR-2-1B",
    messages=messages,
    max_tokens=4096,
    temperature=0.2,
    top_p=0.9,
)

print(response.choices[0].message.content)
```

{% endtab %}

{% tab title="Bash" %}

```bash
#!/bin/bash

# Fail early if the API token is not set
IONOS_API_TOKEN="${IONOS_API_TOKEN:?Set IONOS_API_TOKEN before running this script}"

curl -s -X POST "https://openai.inference.de-txl.ionos.com/v1/chat/completions" \
  -H "Authorization: Bearer ${IONOS_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lightonai/LightOnOCR-2-1B",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/document.png"
            }
          }
        ]
      }
    ],
    "max_tokens": 4096,
    "temperature": 0.2,
    "top_p": 0.9
  }'
```

{% endtab %}
{% endtabs %}
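Because the model returns Markdown, it is convenient to write the result straight to a `.md` file for downstream use. A minimal sketch (the filename and sample content are illustrative):

```python
from pathlib import Path

# The text returned in response.choices[0].message.content
extracted_markdown = "# Sample Document\n\nExtracted body text."

# Persist the OCR result as a Markdown file next to the source image.
output_path = Path("document.md")
output_path.write_text(extracted_markdown, encoding="utf-8")

print(f"Wrote {output_path} ({output_path.stat().st_size} bytes)")
```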

### Step 3: OCR with local images using base64 encoding

For local image files, encode them as base64 data URIs before sending to the API.

{% tabs %}
{% tab title="Python" %}

```python
import base64
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("IONOS_API_TOKEN"),
    base_url="https://openai.inference.de-txl.ionos.com/v1",
)

def encode_image_to_data_url(path: str) -> str:
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return f"data:image/png;base64,{b64}"

image_data_url = encode_image_to_data_url("document.png")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": image_data_url
                }
            }
        ]
    }
]

response = client.chat.completions.create(
    model="lightonai/LightOnOCR-2-1B",
    messages=messages,
    max_tokens=4096,
    temperature=0.2,
    top_p=0.9,
)

print(response.choices[0].message.content)
```

{% endtab %}

{% tab title="Bash" %}

```bash
#!/bin/bash

# Fail early if the API token is not set
IONOS_API_TOKEN="${IONOS_API_TOKEN:?Set IONOS_API_TOKEN before running this script}"

# Encode local image as base64 data URI
# (-w 0 disables line wrapping; GNU coreutils. On macOS use: base64 -i document.png)
IMAGE_BASE64=$(base64 -w 0 document.png)
IMAGE_DATA_URL="data:image/png;base64,${IMAGE_BASE64}"

curl -s -X POST "https://openai.inference.de-txl.ionos.com/v1/chat/completions" \
  -H "Authorization: Bearer ${IONOS_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"lightonai/LightOnOCR-2-1B\",
    \"messages\": [
      {
        \"role\": \"user\",
        \"content\": [
          {
            \"type\": \"image_url\",
            \"image_url\": {
              \"url\": \"${IMAGE_DATA_URL}\"
            }
          }
        ]
      }
    ],
    \"max_tokens\": 4096,
    \"temperature\": 0.2,
    \"top_p\": 0.9
  }"
```

{% endtab %}
{% endtabs %}
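To OCR several local files in one run, you can reuse the encoding helper from Step 3 in a loop, sending one request per image so each payload stays well under the 20 MB limit. A sketch (the file-name pattern and one-request-per-image approach are assumptions):

```python
import base64
from pathlib import Path


def encode_image_to_data_url(path: str) -> str:
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return f"data:image/png;base64,{b64}"


def build_ocr_messages(path: str) -> list:
    """Build the messages payload for a single image, as in Step 3."""
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": encode_image_to_data_url(path)},
                }
            ],
        }
    ]


# One request per page keeps each payload small. Send each messages list
# via client.chat.completions.create(...) exactly as shown in Step 2:
# for page in sorted(Path(".").glob("page-*.png")):
#     messages = build_ocr_messages(str(page))
#     ...
```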

## Summary

In this guide, you learned how to:

1. Send images to the LightOnOCR-2-1B model for text extraction
2. Work with both URL-based and base64-encoded local images
3. Process the OCR response from the OpenAI-compatible API

LightOnOCR-2-1B always returns Markdown-formatted text, making it straightforward to integrate OCR output into downstream workflows. This is particularly valuable for document management, data entry automation, and content accessibility.

For more information about other AI capabilities, see our documentation on [<mark style="color:blue;">Text Generation</mark>](https://docs.ionos.com/sections-test/guides/ai/ai-model-hub/how-tos/text-generation), [<mark style="color:blue;">Image Generation</mark>](https://docs.ionos.com/sections-test/guides/ai/ai-model-hub/how-tos/image-generation), and [<mark style="color:blue;">Tool Calling</mark>](https://docs.ionos.com/sections-test/guides/ai/ai-model-hub/how-tos/tool-calling).
