Optical Character Recognition (OCR)

The IONOS Cloud AI Model Hub supports Optical Character Recognition (OCR) models, such as LightOnOCR-2-1B, a vision-language model that converts documents (PDFs, scans, images) into clean, naturally ordered text. OCR lets you extract editable text from visual content, making it valuable for document digitization, data extraction, and content accessibility.


Note: LightOnOCR-2-1B is an end-to-end vision-language model that processes images directly without requiring separate preprocessing or layout detection steps. It handles complex layouts such as tables, forms, receipts, and scientific notation. The model always outputs Markdown-formatted text (including LaTeX spans for mathematical notation); this behavior is embedded in the model weights and cannot be changed via text prompts.

OCR models supporting document conversion

Not all models on the AI Model Hub models list support OCR. LightOnOCR-2-1B is specifically designed for document-to-text tasks. Check the model cards for compatibility details.

Overview

In this guide, you will learn how to integrate the LightOnOCR-2-1B model through the IONOS Cloud OpenAI-compatible API to extract text from images and documents.

This guide is intended for developers with basic knowledge of:

  • REST APIs

  • A programming language capable of making HTTP requests (Python and Bash examples included)

  • IONOS Cloud AI Model Hub's OpenAI-compatible API

Getting started with OCR

First, set up your environment and authenticate using the OpenAI-compatible API endpoint.
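A minimal setup sketch in Python. The environment variable name `IONOS_API_TOKEN` and the base URL shown here are illustrative assumptions; take the actual endpoint and token from your AI Model Hub console:

```python
import os

# Assumed OpenAI-compatible base URL; verify it in your AI Model Hub console.
API_BASE = "https://openai.inference.de-txl.ionos.com/v1"

def build_headers(token: str) -> dict:
    """Bearer-token headers used by every request in this guide."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

# Read the token from the environment so it never lands in source control.
headers = build_headers(os.environ.get("IONOS_API_TOKEN", "<your-token>"))
```

Keeping the token in an environment variable (rather than hard-coding it) makes the same script safe to share and to commit.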

Download the accompanying code files to access the OCR-specific scripts and examples and reproduce the outputs shown in this guide:

Download the Python Notebook to explore OCR with ready-to-use examples.

Simple example

Input / Output — quick reference

  • model: The OCR model identifier

  • messages: Array containing the user message with image content

  • image_url: The document image as a URL or base64-encoded data URI

  • max_tokens: Maximum number of tokens in the response

  • temperature: Controls output randomness (lower = more deterministic)
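Put together, these parameters form a single chat-completions request body. A sketch, assuming the model identifier `lightonai/LightOnOCR-2-1B` (check the model card for the exact name):

```python
def build_ocr_request(image_url: str,
                      model: str = "lightonai/LightOnOCR-2-1B",
                      max_tokens: int = 4096,
                      temperature: float = 0.0) -> dict:
    """Assemble an OpenAI-style chat-completions body for an OCR call."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,  # 0.0 keeps the transcription deterministic
    }
```

A temperature of `0.0` is a sensible default for OCR, where you want a faithful transcription rather than creative variation.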

Step 1: Prepare your image input

LightOnOCR-2-1B accepts images in two formats:

  1. URL: Provide a publicly accessible URL to the image.

  2. Base64-encoded data URI: Encode a local image file as a base64 string.
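For the second format, a small helper can turn a local file into a data URI. A sketch using only the standard library; the MIME type falls back to `image/png` when it cannot be guessed from the file name:

```python
import base64
import mimetypes

def to_data_uri(path: str) -> str:
    """Encode a local image file as a base64 data URI."""
    mime = mimetypes.guess_type(path)[0] or "image/png"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The resulting string can be dropped into the same `image_url` field that would otherwise hold a public URL.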


Step 2: Make an OCR API request

Send the prepared image to the LightOnOCR-2-1B model via the OpenAI-compatible chat completions endpoint.
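A sketch of the request using only the standard library. The base URL and model identifier are assumptions; substitute the values from your console and the model card:

```python
import json
import urllib.request

API_BASE = "https://openai.inference.de-txl.ionos.com/v1"  # assumed endpoint

def ocr_image_url(image_url: str, token: str,
                  model: str = "lightonai/LightOnOCR-2-1B") -> dict:
    """POST one image URL to the chat-completions endpoint; return the JSON reply."""
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [{"type": "image_url", "image_url": {"url": image_url}}],
        }],
        "max_tokens": 4096,
        "temperature": 0.0,
    }
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def extract_text(response: dict) -> str:
    """Pull the Markdown transcription out of an OpenAI-style completion."""
    return response["choices"][0]["message"]["content"]
```

The same call works with the official `openai` Python client by setting its `base_url` and `api_key`; the plain-HTTP version is shown here to make the wire format explicit.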

Step 3: OCR with local images using base64 encoding

For local image files, encode them as base64 data URIs before sending to the API.
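Combining the encoding step with the request, a local file can be sent directly. Same assumed endpoint and model name as in the previous sketch:

```python
import base64
import json
import urllib.request

API_BASE = "https://openai.inference.de-txl.ionos.com/v1"  # assumed endpoint

def build_local_payload(path: str,
                        model: str = "lightonai/LightOnOCR-2-1B") -> dict:
    """Base64-encode a local image and wrap it in a chat-completions body."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    data_uri = f"data:image/png;base64,{b64}"  # adjust the MIME type to your file
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [{"type": "image_url", "image_url": {"url": data_uri}}],
        }],
        "max_tokens": 4096,
        "temperature": 0.0,
    }

def ocr_local_image(path: str, token: str) -> dict:
    """Send a local image file to the OCR endpoint and return the JSON reply."""
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(build_local_payload(path)).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Note that base64 encoding inflates the payload by roughly a third, so very large scans may hit request-size limits; for those, a hosted URL is the safer route.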

Summary

In this guide, you learned how to:

  1. Send images to the LightOnOCR-2-1B model for text extraction

  2. Work with both URL-based and base64-encoded local images

  3. Process the OCR response from the OpenAI-compatible API

LightOnOCR-2-1B always returns Markdown-formatted text, making it straightforward to integrate OCR output into downstream workflows. This is particularly valuable for document management, data entry automation, and content accessibility.

For more information about other AI capabilities, see our documentation on Text Generation, Image Generation, and Tool Calling.
