Optical Character Recognition (OCR)
The IONOS Cloud AI Model Hub supports Optical Character Recognition (OCR) models, such as LightOnOCR-2-1B, a vision-language model that converts documents (PDFs, scans, images) into clean, naturally ordered text. OCR lets you extract editable text from visual content, making it valuable for document digitization, data extraction, and content accessibility.
Note: LightOnOCR-2-1B is an end-to-end vision-language model that processes images directly without requiring separate preprocessing or layout detection steps. It handles complex layouts such as tables, forms, receipts, and scientific notation. The model always outputs Markdown-formatted text (including LaTeX spans for mathematical notation); this behavior is embedded in the model weights and cannot be changed via text prompts.
OCR models supporting document conversion
Not all models in the AI Model Hub model list support OCR. LightOnOCR-2-1B is specifically designed for document-to-text tasks. Check the model cards for compatibility details.
Overview
In this guide, you will learn how to integrate the LightOnOCR-2-1B model through the IONOS Cloud OpenAI-compatible API to extract text from images and documents.
This guide is intended for developers with basic knowledge of:
REST APIs
A programming language capable of making HTTP requests (Python and Bash examples included)
IONOS Cloud AI Model Hub's OpenAI-compatible API
Getting started with OCR
First, set up your environment and authenticate using the OpenAI-compatible API endpoint.
Download the code files below to access ready-to-run OCR scripts and examples:
Download the Python Notebook to explore OCR with ready-to-use examples.
Download the standalone Python script for a quick implementation.
Download the Bash script for a command-line implementation.
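Before making any request, you need the endpoint and an authorization header. The following is a minimal sketch; the base URL and the environment variable name are assumptions for illustration, so confirm the actual endpoint in your AI Model Hub documentation:

```python
import os

# Assumption: base URL and env var name are illustrative placeholders;
# look up the actual OpenAI-compatible endpoint in your AI Model Hub account.
IONOS_API_BASE = "https://openai.inference.de-txl.ionos.com/v1"
API_TOKEN = os.environ.get("IONOS_API_TOKEN", "<your-token>")

# Standard bearer-token headers for the OpenAI-compatible API.
HEADERS = {
    "Authorization": f"Bearer {API_TOKEN}",
    "Content-Type": "application/json",
}
```

The same token works for all OpenAI-compatible endpoints in the Model Hub, so you only need to set it up once.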
Simple example
Input / Output — quick reference
Input parameters:
model: The OCR model identifier
messages: Array containing the user message with image content
image_url: The document image as a URL or base64-encoded data URI
max_tokens: Maximum number of tokens in the response
temperature: Controls output randomness (lower = more deterministic)
Output fields:
choices[0].message.content: The extracted text from the image
usage: Token counts for billing purposes
Step 1: Prepare your image input
LightOnOCR-2-1B accepts images in two formats:
URL: Provide a publicly accessible URL to the image.
Base64-encoded data URI: Encode a local image file as a base64 string.
Payload size limit: The maximum allowed request payload size is 20 MB. When using base64-encoded images, make sure the encoded content does not exceed this limit.
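Both formats map to the same `image_url` content part in the request body. A minimal sketch of the two variants (the URL is a placeholder and the data URI is truncated for brevity):

```python
# Option 1: a publicly accessible URL (placeholder shown here).
url_part = {
    "type": "image_url",
    "image_url": {"url": "https://example.com/invoice.png"},
}

# Option 2: a base64-encoded data URI (value truncated for brevity).
data_uri_part = {
    "type": "image_url",
    "image_url": {"url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg..."},
}
```

Either part can be placed in the `content` array of the user message; the model treats both identically once the request is received.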
Step 2: Make an OCR API request
Send the prepared image to the LightOnOCR-2-1B model via the OpenAI-compatible chat completions endpoint.
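The request can be sketched with only the standard library, as below. The endpoint URL, model identifier string, and image URL are placeholder assumptions; take the real values from your AI Model Hub account and the model card:

```python
import json
import urllib.request

# Assumptions: endpoint and model identifier are placeholders; confirm both
# against the model card before use.
ENDPOINT = "https://openai.inference.de-txl.ionos.com/v1/chat/completions"
MODEL = "LightOnOCR-2-1B"

payload = {
    "model": MODEL,
    "messages": [{
        "role": "user",
        "content": [{
            "type": "image_url",
            "image_url": {"url": "https://example.com/scan.png"},  # placeholder
        }],
    }],
    "max_tokens": 2048,
    "temperature": 0.0,  # low temperature keeps OCR output deterministic
}

def extract_text(api_token: str) -> str:
    """Send the OCR request and return the extracted Markdown text."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The extracted text lives in the first choice's message content.
    return body["choices"][0]["message"]["content"]
```

Because the endpoint is OpenAI-compatible, the official `openai` Python client can be used instead by pointing its `base_url` at the Model Hub endpoint.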
Step 3: OCR with local images using base64 encoding
For local image files, encode them as base64 data URIs before sending to the API.
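A small helper can handle the encoding and guard against the 20 MB payload limit mentioned above. The function name and default MIME type are illustrative:

```python
import base64

MAX_PAYLOAD_BYTES = 20 * 1024 * 1024  # 20 MB request payload limit

def build_image_message(path: str, mime: str = "image/jpeg") -> dict:
    """Encode a local image as a base64 data URI inside a user message.

    Raises ValueError if the encoded content would exceed the payload limit.
    """
    with open(path, "rb") as f:
        raw = f.read()
    data_uri = f"data:{mime};base64," + base64.b64encode(raw).decode("ascii")
    if len(data_uri) > MAX_PAYLOAD_BYTES:
        raise ValueError("encoded image exceeds the 20 MB payload limit")
    return {
        "role": "user",
        "content": [{"type": "image_url", "image_url": {"url": data_uri}}],
    }
```

Note that base64 encoding inflates the file size by roughly a third, so an image close to 15 MB on disk may already exceed the limit once encoded.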
Summary
In this guide, you learned how to:
Send images to the LightOnOCR-2-1B model for text extraction
Work with both URL-based and base64-encoded local images
Process the OCR response from the OpenAI-compatible API
LightOnOCR-2-1B always returns Markdown-formatted text, making it straightforward to integrate OCR output into downstream workflows. This is particularly valuable for document management, data entry automation, and content accessibility.
For more information about other AI capabilities, see our documentation on Text Generation, Image Generation, and Tool Calling.