GPT-OSS 120B

Summary: GPT-OSS 120B is an open-source Mixture-of-Experts (MoE) model optimized for agent workflows and complex reasoning tasks. It combines the efficiency of selective parameter activation with strong language understanding, making it well suited for research, development, and production environments where transparency, customization, and sophisticated AI capabilities are essential.

Intelligence: High

Speed: Medium

Sovereignty: Medium

Input: Text

Output: Text

Central parameters

Description: Open-source Mixture-of-Experts architecture with efficient expert routing for optimized inference performance.

Model identifier: openai/gpt-oss-120b

IONOS AI Model Hub Lifecycle and Alternatives

IONOS Launch: August 12, 2025

End of Life: N/A

Alternative: none listed

Successor: none listed

Origin

Provider: OpenAI

Country: USA

License: Apache 2.0

Flavor: Base

Release: August 2025

Technology

Context window: 128k tokens

Parameters: 120B

Quantization: MXFP4

Multilingual: Yes

Modalities

Text: Input and output

Image: Not supported

Audio: Not supported

Endpoints

Chat Completions: v1/chat/completions

Embeddings: Not supported

Image generation: Not supported

Features

Streaming: Supported

Reasoning: Supported

Tool calling: Supported
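Streaming and tool calling follow the OpenAI-compatible Chat Completions schema. As an illustrative sketch (the `get_weather` tool is a hypothetical example, not part of this page), a request payload offering the model one tool might be built like this:

```python
# Illustrative sketch of an OpenAI-compatible tool-calling request payload.
# The get_weather tool is a hypothetical example for demonstration only.
import json


def build_tool_call_payload(user_message: str) -> dict:
    """Build a Chat Completions payload that offers the model one tool."""
    return {
        "model": "openai/gpt-oss-120b",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical example tool
                    "description": "Return the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        "stream": True,  # streaming is supported by this model
    }


payload = build_tool_call_payload("What is the weather in Berlin?")
print(json.dumps(payload, indent=2))
```

If the model decides to use the tool, the response contains a `tool_calls` entry with the chosen function name and JSON-encoded arguments, which your application executes and feeds back as a `tool` message.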

Reasoning Example

GPT-OSS 120B supports advanced reasoning capabilities with configurable reasoning effort. Control how deeply the model thinks by setting the reasoning_effort parameter to low, medium (default), or high. The model's reasoning process is included in the output response.

Reasoning Effort Levels:

  • Low: Fast responses with minimal internal reasoning. Best for straightforward questions and when speed is prioritized. Uses fewer tokens.

  • Medium (default): Balanced approach with moderate reasoning depth. Suitable for most use cases requiring thoughtful responses.

  • High: Deep analytical thinking with extensive internal reasoning. Ideal for complex problem-solving, mathematical proofs, and multi-step reasoning tasks. Uses more tokens due to an extended reasoning process.

Higher reasoning effort levels result in more comprehensive analysis but consume additional tokens and increase response time. The reasoning tokens are included in token usage.

Request
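The original request example is not reproduced on this page. The following is a minimal sketch of what such a request could look like: the base URL and API token are placeholders (consult the AI Model Hub documentation for the actual endpoint), while the model identifier, the `v1/chat/completions` path, and the `reasoning_effort` levels come from this page.

```python
# Minimal sketch of a Chat Completions request with reasoning_effort.
# BASE_URL and API_TOKEN are placeholders, not official values.
import json
import urllib.request

BASE_URL = "https://example.invalid"  # placeholder: substitute the AI Model Hub endpoint
API_TOKEN = "YOUR_API_TOKEN"          # placeholder

payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [
        {
            "role": "user",
            "content": "If a train travels 60 km in 45 minutes, "
                       "what is its average speed in km/h?",
        }
    ],
    "reasoning_effort": "high",  # "low", "medium" (default), or "high"
}

request = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(request) would send the request; it is left out
# here so the sketch runs without network access or credentials.
print(json.dumps(payload, indent=2))
```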

Response (shortened for readability)
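The original response example is likewise not reproduced here. The shape below is an illustrative sketch of an OpenAI-compatible Chat Completions response in which the reasoning process and reasoning token usage appear alongside the answer; the exact field names and all values are assumptions for illustration, not confirmed by this page.

```python
# Illustrative, invented response shape (shortened); field names such as
# "reasoning_content" are assumptions, not confirmed by this page.
import json

response_text = """
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "45 minutes is 0.75 h, so 60 km / 0.75 h = 80 km/h.",
        "content": "The average speed is 80 km/h."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 190,
    "total_tokens": 218
  }
}
"""

response = json.loads(response_text)
message = response["choices"][0]["message"]
print(message["content"])
```

Note that `usage.completion_tokens` covers the reasoning tokens as well, which is why higher reasoning effort increases the reported token consumption.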

Rate limits

Rate limits ensure fair usage and reliable access to the AI Model Hub. For this model, only the contract-wide rate limits apply; there are no additional model-specific limits.
