Qwen3 VL Embedding 8B
Summary: Qwen3 VL Embedding 8B is a multimodal embedding model by Alibaba's Qwen team that generates semantic vector representations from both text and images. Supporting over 30 languages and a 32,768-token context window, this model excels in multimodal search, visual document retrieval, and cross-modal semantic matching, making it ideal for applications that require understanding across both textual and visual content such as image-text retrieval, screenshot search, and PDF document discovery.
Intelligence
Speed
Sovereignty
Input
Output
![]()
![]()
![]()
![]()
![]()
High
Medium
Medium
Text, Image
Number Vector
Central parameters
Description: Multimodal embedding model by Alibaba's Qwen team, generating 4096-dimensional vectors from text and image inputs across 30+ languages.
Model identifier: Qwen/Qwen3-VL-Embedding-8B
IONOS CLOUD AI Model Hub Lifecycle and Alternatives
IONOS Launch
End of Life
Alternative
Successor
May 12, 2026
N/A
Origin
Provider
Country
License
Flavor
Release
Technology
Input Length
Parameters
Tensor Type
Multilingual
Further details
Input and output: Each input produces one embedding vector, regardless of modality. A document can be text-only, image-only, or a combination of text and image, each produces a single vector up to 4096 dimensions.
Modalities
Text
Image
Audio
Input and output
Input
Not supported
Endpoints
Chat Completions
Embeddings
Image generation
Not supported
v1/embeddings
Not supported
Features
Streaming
Reasoning
Tool calling
Not supported
Not supported
Not supported
Usage examples
Text embeddings
The following example demonstrates how to generate text embeddings using Qwen3 VL Embedding 8B.
API Endpoint: POST https://openai.inference.de-txl.ionos.com/v1/embeddings
Request:
Response:
Multimodal embeddings
The following example demonstrates how to generate embeddings from combined text and image input using Qwen3 VL Embedding 8B.
Request:
Response:
Rate limits
Rate limits ensure fair usage and reliable access to the AI Model Hub. In addition to the contract-wide rate limits, no model-specific limits apply.
Last updated
Was this helpful?