GPT-OSS 120B
Summary: GPT-OSS 120B is a cutting-edge open-source Mixture-of-Experts (MoE) model optimized for agent workflows and complex reasoning tasks. This medium-sized model combines the efficiency of selective parameter activation with strong language understanding, making it well suited to research, development, and production environments where transparency, customization, and sophisticated AI capabilities are essential.
| Intelligence | Speed | Sovereignty | Input | Output |
| --- | --- | --- | --- | --- |
| High | Medium | Medium | Text | Text |
Central parameters
Description: Open-source Mixture-of-Experts architecture with efficient expert routing for optimized inference performance.
Model identifier: openai/gpt-oss-120b
IONOS AI Model Hub Lifecycle and Alternatives
| IONOS Launch | End of Life | Alternative | Successor |
| --- | --- | --- | --- |
|  |  |  |  |
Origin
| Property | Value |
| --- | --- |
| Provider | OpenAI |
| Country |  |
| License |  |
| Flavor |  |
| Release |  |
| Technology | Mixture of Experts (MoE) |
| Context window |  |
| Parameters | 120B |
| Quantization |  |
| Multilingual |  |
| Further details |  |
Modalities
| Text | Image | Audio |
| --- | --- | --- |
| Input and output | Not supported | Not supported |
Endpoints
| Chat Completions | Embeddings | Image generation |
| --- | --- | --- |
| `v1/chat/completions` | Not supported | Not supported |
Features
| Streaming | Reasoning | Tool calling |
| --- | --- | --- |
| Supported | Supported | Supported |
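All three features are exposed through the OpenAI-compatible chat completions endpoint, so a standard OpenAI client can use them directly. The following is a minimal sketch of streaming combined with tool calling, assuming the OpenAI Python SDK; the base URL and API token are placeholders to be taken from your IONOS account, and the `get_weather` tool is a hypothetical example, not a built-in function.

```python
from openai import OpenAI

# Placeholders: take the base URL and API token from your IONOS account.
client = OpenAI(
    base_url="https://<your-ionos-endpoint>/v1",
    api_key="<your-api-token>",
)

# Tool calling: declare a function the model is allowed to invoke.
# `get_weather` is a hypothetical example tool.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Streaming: response tokens arrive incrementally instead of in one block.
stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "What is the weather in Berlin?"}],
    tools=tools,
    stream=True,
)
for chunk in stream:
    # Print content deltas as they arrive; tool-call deltas would be
    # collected from chunk.choices[0].delta.tool_calls instead.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```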
Reasoning Example
GPT-OSS 120B supports advanced reasoning capabilities with configurable reasoning effort. Control how deeply the model thinks by setting the `reasoning_effort` parameter to `low`, `medium` (default), or `high`. The model's reasoning process is included in the output response.

Reasoning effort levels:

- **Low:** Fast responses with minimal internal reasoning. Best for straightforward questions and when speed is prioritized. Uses fewer tokens.
- **Medium (default):** Balanced approach with moderate reasoning depth. Suitable for most use cases requiring thoughtful responses.
- **High:** Deep analytical thinking with extensive internal reasoning. Ideal for complex problem-solving, mathematical proofs, and multi-step reasoning tasks. Uses more tokens due to an extended reasoning process.
Higher reasoning effort levels result in more comprehensive analysis but consume additional tokens and increase response time. The reasoning tokens are included in token usage.
Request
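The original request body is not reproduced here, so the following is a minimal sketch using Python's `requests` library, assuming the `v1/chat/completions` endpoint listed above. The base URL and token are placeholders, the prompt is only an example, and the key detail is the `reasoning_effort` field set to `high`.

```python
import requests

# Placeholders: take the base URL and API token from your IONOS account.
BASE_URL = "https://<your-ionos-endpoint>/v1"
API_TOKEN = "<your-api-token>"

payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [
        {
            "role": "user",
            "content": (
                "A bat and a ball cost $1.10 together. The bat costs "
                "$1.00 more than the ball. How much does the ball cost?"
            ),
        }
    ],
    # Reasoning effort: "low", "medium" (default), or "high".
    "reasoning_effort": "high",
}

response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=payload,
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```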
Response (shortened for readability)
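No verbatim response is reproduced here, so the JSON below only illustrates the general shape of an answer to the request above. The `reasoning_content` field is an assumption; the exact field that carries the model's reasoning depends on the serving stack, and all identifiers and token counts are invented for illustration.

```json
{
  "id": "chatcmpl-<id>",
  "object": "chat.completion",
  "model": "openai/gpt-oss-120b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "Let the ball cost x. Then the bat costs x + 1.00, so 2x + 1.00 = 1.10 and x = 0.05. [...]",
        "content": "The ball costs $0.05."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 318,
    "total_tokens": 360
  }
}
```

Note that `completion_tokens` covers the reasoning output as well as the final answer, which is why higher reasoning effort increases token usage, as described above.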
Rate limits
Rate limits ensure fair usage and reliable access to the AI Model Hub. No model-specific limits apply to this model; only the contract-wide rate limits apply.