GPT-OSS 120B

Summary: GPT-OSS 120B is an open-source Mixture-of-Experts (MoE) model optimized for agent workflows and complex reasoning tasks. It combines the efficiency of selective parameter activation with strong language understanding, making it well suited for research, development, and production environments where transparency, customization, and sophisticated AI capabilities are essential.

Intelligence: High

Speed: Medium

Sovereignty: Medium

Input: Text

Output: Text

Central parameters

Description: Open-source Mixture-of-Experts architecture with efficient expert routing for optimized inference performance.

Model identifier: openai/gpt-oss-120b

IONOS AI Model Hub Lifecycle and Alternatives

IONOS Launch: August 12, 2025

End of Life: N/A

Alternative: none listed

Successor: none listed

Origin

Provider: OpenAI

Country: USA

License: Apache 2.0

Flavor: Base

Release: August 2025

Technology

Context window: 128k tokens

Parameters: 120B

Quantization: MXFP4

Multilingual: Yes

Modalities

Text: Input and output

Image: Not supported

Audio: Not supported

Endpoints

Chat Completions: v1/chat/completions

Embeddings: Not supported

Image generation: Not supported

Features

Streaming: Supported

Reasoning: Supported

Tool calling: Supported
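Streaming and tool calling follow the OpenAI-compatible Chat Completions schema. As an illustrative sketch (the `get_weather` tool is a hypothetical example, not part of this page), a request payload offering the model one tool might be built like this:

```python
# Illustrative sketch of an OpenAI-compatible tool-calling request payload.
# The get_weather tool is a hypothetical example for demonstration only.
import json


def build_tool_call_payload(user_message: str) -> dict:
    """Build a Chat Completions payload that offers the model one tool."""
    return {
        "model": "openai/gpt-oss-120b",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical example tool
                    "description": "Return the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        "stream": True,  # streaming is supported by this model
    }


payload = build_tool_call_payload("What is the weather in Berlin?")
print(json.dumps(payload, indent=2))
```

If the model decides to use the tool, the response contains a `tool_calls` entry with the chosen function name and JSON-encoded arguments, which your application executes and feeds back as a `tool` message.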

Reasoning Example

GPT-OSS 120B supports advanced reasoning capabilities with configurable reasoning effort. Control how deeply the model thinks by setting the reasoning_effort parameter to low, medium (default), or high. The model's reasoning process is included in the output response.

Reasoning Effort Levels:

  • Low: Fast responses with minimal internal reasoning. Best for straightforward questions and when speed is prioritized. Uses fewer tokens.

  • Medium (default): Balanced approach with moderate reasoning depth. Suitable for most use cases requiring thoughtful responses.

  • High: Deep analytical thinking with extensive internal reasoning. Ideal for complex problem-solving, mathematical proofs, and multi-step reasoning tasks. Uses more tokens due to an extended reasoning process.

Higher reasoning effort levels result in more comprehensive analysis but consume additional tokens and increase response time. The reasoning tokens are included in token usage.

Request
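The original request example is not reproduced on this page. The following is a minimal sketch of what such a request could look like: the base URL and API token are placeholders (consult the AI Model Hub documentation for the actual endpoint), while the model identifier, the `v1/chat/completions` path, and the `reasoning_effort` levels come from this page.

```python
# Minimal sketch of a Chat Completions request with reasoning_effort.
# BASE_URL and API_TOKEN are placeholders, not official values.
import json
import urllib.request

BASE_URL = "https://example.invalid"  # placeholder: substitute the AI Model Hub endpoint
API_TOKEN = "YOUR_API_TOKEN"          # placeholder

payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [
        {
            "role": "user",
            "content": "If a train travels 60 km in 45 minutes, "
                       "what is its average speed in km/h?",
        }
    ],
    "reasoning_effort": "high",  # "low", "medium" (default), or "high"
}

request = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(request) would send the request; it is left out
# here so the sketch runs without network access or credentials.
print(json.dumps(payload, indent=2))
```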

Response (shortened for readability)
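The original response example is likewise not reproduced here. The shape below is an illustrative sketch of an OpenAI-compatible Chat Completions response in which the reasoning process and reasoning token usage appear alongside the answer; the exact field names and all values are assumptions for illustration, not confirmed by this page.

```python
# Illustrative, invented response shape (shortened); field names such as
# "reasoning_content" are assumptions, not confirmed by this page.
import json

response_text = """
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "45 minutes is 0.75 h, so 60 km / 0.75 h = 80 km/h.",
        "content": "The average speed is 80 km/h."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 190,
    "total_tokens": 218
  }
}
"""

response = json.loads(response_text)
message = response["choices"][0]["message"]
print(message["content"])
```

Note that `usage.completion_tokens` covers the reasoning tokens as well, which is why higher reasoning effort increases the reported token consumption.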

Rate limits

Rate limits ensure fair usage and reliable access to the AI Model Hub. For this model, only the contract-wide rate limits apply; there are no additional model-specific limits.
