Data Annotation
AI Model Studio for Free: AI Model Studio is available as a free beta offering in the German market from October 1 through December 30, 2025. Register today to fine-tune your first AI model.´
Attention: The AI Model Studio is a free beta. Breaking changes and limited availability of features may occur at any time. It is developed in cooperation with our partners at manufactAI and is based on their innovative fine-tuning solutions. While the service runs entirely on IONOS Cloud infrastructure, our partners manage it at manufactAI, and support for the service is only provided through contacting manufactAI. This service does not accept IONOS Cloud API tokens and log-in credentials.
The quality of your fine-tuned model begins and ends with the quality of your training data. This guide outlines the core principles of data annotation, provides task-specific specifications, and recommends powerful, free, open-source tools to help you create a world-class dataset for our fine-tuning service.
Building the ground truth
When fine-tuning a model using the AI Model Studio for a specific task, whether classifying customer emails or detecting product anomalies, the model needs a precise, reliable "answer key". This answer key is your annotated dataset, often referred to as the Ground Truth.
AI Model Studio is a powerful engine designed to learn complex patterns from this data. However, it cannot correct human errors or resolve ambiguity. While some cases can be realized using synthetic data, high-quality annotation is the single most critical factor determining your model’s accuracy and effectiveness in real-world scenarios and key for successfully implementing complex use cases.
Core principles
High-quality annotation is a structured process that relies on a clear rulebook. Before you begin labeling, you must define an Annotation Schema, a set of exhaustive, unambiguous guidelines.
Principle
Why It Matters for Your Business Case
How to Ensure It
Consistency
If the same invoice or the same type of customer email receives different labels from different people, the model will fail to automate your back-office processes reliably.
Train all annotators (even a single data scientist) using the same documented schema. Use the same labels for the same outcomes every time.
Clarity & Specificity
Ambiguous rules lead to ambiguous data. A lack of rules for edge cases, for example, a "mixed-sentiment" email, results in a model that cannot handle real-world complexity.
Your schema must define every label, every boundary, and provide examples of what to do when something is unclear.
The "Assistant" Role is the Answer
The final assistant
message in your JSON is the model’s target behavior. It must be the exact, desired output, whether it’s a full conversation reply, a simple classification tag, or a structured data block.
The final labeled data must be immediately usable by your business process without further cleanup.
Inter-Annotator Agreement (IAA)
IAA is the benchmark for your rulebook's quality. If your experts consistently disagree on labels, your guidelines are flawed, and your data quality is low.
Measure the agreement between 2-3 annotators on a small sample. If agreement is below $80%$, refine your guidelines before labeling the full dataset.
Annotation specifications by use case
All data must be formatted as JSON Lines, containing an array of messages
, each with a role
and content
.
Text Automation Cases
The system
context defines the AI's job; the user
content provides the input data; and the assistant
content is the correct, desired output.
Use Case
Annotation Focus
Example Data
Dialogue / Chatbot
Full, natural conversation flow; setting persona in the system
message.
{"messages":[{"role":"system","content":"You are a helpful customer support assistant."},{"role":"user","content":"I need help with my order #12345. It hasn't arrived yet."},{"role":"assistant","content":"I'd be happy to help you track order #12345. Let me check the shipping status for you."}]}
Email Classification
The model's final response is the classification. This is often a simple string.
{"messages":[{"role":"system","content":"You are an expert email classification system. Your response must be only one of the following: 'SUPPORT_REQUEST', 'SALES_INQUIRY', or 'GENERAL_FEEDBACK'."},{"role":"user","content":"Hi, can you tell me more about the pricing for your pro plan? Thanks!"},{"role":"assistant","content":"Classification: SALES_INQUIRY"}]}
Structured Extraction
The model is trained to extract key-value pairs, for example, from an invoice or text body, and output a predictable structure.
{"messages":[{"role":"system","content":"You are a data extraction bot. Extract the invoice ID, total amount with currency, and due date from the provided text. Respond only with a valid JSON object."},{"role":"user","content":"Invoice: Thank you for your business. Invoice #4567 is attached. The total due is 129.99 EUR and must be paid by November 15, 2025."},{"role":"assistant","content":"{ 'invoice_id': '4567', 'amount': '129.99 EUR', 'due_date': '2025-11-15' }"}]}
Summarization
The assistant role contains the human-written, high-quality summary of the user's input text.
{"messages":[{"role":"system","content":"You are a text summarization tool. Your task is to provide a concise, neutral summary of the customer's feedback."},{"role":"user","content":"The product quality was excellent—sturdy and well-made. However, the shipping took three weeks longer than promised, which was very frustrating. I would recommend the product, but warn others about the slow delivery."},{"role":"assistant","content":"The customer experienced excellent product quality but was dissatisfied with the long shipping time."}]}
Multimodal (Image and Text)
For any task involving an image, such as quality control, anomaly detection, or visual question answering, the image must be integrated as a Base64
encoded string. Additionally, the input data type must be defined for each message.
{"messages": [{"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant converting images to CAD code."}]}, {"role": "user", "content": [{"type": "text", "text": "Generate the CADQuery code needed to create the CAD for the provided image. Just the code, no other words."}, {"type": "image_url", "image_url": {"url": "..."}}]}, {"role": "assistant", "content": [{"type": "text", "text": "code"}]}]}
Therefore, typical use cases that require multi-modal fine-tuning look like this. Keep in mind that complex use cases may require labeling using bound boxes as used in some examples.
Use Case
Annotation Focus
Example Data
CAD Code Generation
Including a CAD image and providing the desired output CADquery code.
{"messages": [{"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant converting images to CAD code."}]}, {"role": "user", "content": [{"type": "text", "text": "Generate the CADQuery code needed to create the CAD for the provided image. Just the code, no other words."}, {"type": "image_url", "image_url": {"url": "..."}}]}, {"role": "assistant", "content": [{"type": "text", "text": "import cadquery as cq\n# Generating a workplane for sketch 0\nwp_sketch0 = cq.Workplane(cq.Plane(cq.Vector(-0.421875, -0.75, 0.0), cq.Vector(1.0, 0.0, 0.0),..."}]}]}
Quality Control & Defect Detection
Identifying and localizing specific manufacturing defects (e.g., scratches, cracks) on a part image. Requires a bounding box list.
{"messages": [{"role": "system", "content": [{"type": "text", "text": "You are an AI assistant specialized in computer vision for manufacturing quality control, tasked with identifying and localizing defects."}]}, {"role": "user", "content": [{"type": "text", "text": "Analyze the high-resolution image of the metal component and output the JSON list of defects detected."}, {"type": "image_url", "image_url": {"url": "..."}}]}, {"role": "assistant", "content": [{"type": "text", "text": "[{\"box_2d\": [150, 200, 250, 300], \"label\": \"scratch\"}, {\"box_2d\": [400, 50, 550, 150], \"label\": \"dent\"}, {\"box_2d\": [80, 800, 150, 950], \"label\": \"paint_chip\"}]"}]}]}
Predictive Maintenance
Analyzing a machine's thermal image to classify its state (e.g., Normal, Overheating) and identifying the critical hot spot.
{"messages": [{"role": "system", "content": [{"type": "text", "text": "You are an AI assistant for predictive maintenance, specialized in classifying industrial machine status from thermal images."}]}, {"role": "user", "content": [{"type": "text", "text": "Analyze the provided thermal image of the electric motor and output the overall condition status."}, {"type": "image_url", "image_url": {"url": "..."}}]}, {"role": "assistant", "content": [{"type": "text", "text": "STATUS: Warning (Bearing Overheating)\nMax_Temp_C: 98.5\nLocation: Rear Bearing Housing\nSeverity: High"}]}]}
Input format
Our fine-tuning service requires data in the JSON Lines (.jsonl) format, where each line is a self-contained, valid JSON object.
{"messages":[{"role":"system","content":"You are a helpful customer support assistant."},{"role":"user","content":"I need help with my order #12345. It hasn't arrived yet."},{"role":"assistant","content":"I'd be happy to help you track order #12345. Let me check the shipping status for you."}]}
{"messages":[{"role":"system","content":"You are an AI trained to classify customer feedback."},{"role":"user","content":"The product quality is excellent but the shipping took too long."},{"role":"assistant","content":"Classification: MIXED\nPositive aspects: Product quality\nNegative aspects: Shipping time"}]}
{"messages":[{"role":"system","content":"You are a quality control AI. Classify the image as DEFECT or PASS."},{"role":"user","content":"Inspect the following item: <image:product_001.jpg>"},{"role":"assistant","content":"Classification: PASS"}]}
A example file for a multi-modal case based on GenCAD Code Dataset can be downloaded here.
Annotation tools
We recommend the following community-driven, self-hostable tools to manage your annotation projects and export structured data.
Tool
Focus Areas
Why We Recommend It
Local/Self-Hostable?
Label Studio
Text (NER, Classification, Dialogue), Images
The most versatile platform supporting nearly all our use cases, including complex multi-modal projects. Excellent JSON export capabilities.
Yes (via Docker or pip install
)
Doccano
Text Only (Classification, NER, Seq2Seq)
Simple, fast, and optimized specifically for text annotation. Ideal if your project is purely focused on back-office text automation.
Yes (via Docker or pip install
)
CVAT
Image & Video (Detection, Segmentation, Classification)
Industry-standard for computer vision. Ideal for labeling your industrial quality control and predictive maintenance image datasets.
Yes (via Docker)
Attention: These tools may export your data as a standard JSON
or CSV
file. In this case, you will need to run a small, custom script to convert specific output format into the exact role
/content
JSON Lines structure required by our fine-tuning service. This script acts as the final quality check and conversion step in your data pipeline.
Last updated
Was this helpful?