Llama 3.3 70B

Summary: Llama 3.3 70B is a medium-sized language model that delivers quality comparable to 405B-parameter models at a much lower computational cost. It handles complex reasoning, nuanced language understanding, and multi-step problem-solving, which makes it a good fit for enterprise applications, chatbots, content generation, and professional AI assistants that need high-quality responses without the overhead of larger models.

Intelligence: High

Speed: Medium

Sovereignty: Low

Input: Text

Output: Text

Central parameters

Description: Text-only, instruction-tuned model from Meta with 70B parameters. According to Meta's benchmarks, it reaches quality comparable to the 405B model at 70B inference speeds.

Model identifier: meta-llama/Llama-3.3-70B-Instruct

IONOS AI Model Hub Lifecycle and Alternatives

IONOS Launch

End of Life

Alternative

Successor

March 15, 2025

N/A

gpt-oss-120b

Origin

The model available in AI Model Hub is an optimized variant, quantized by IONOS Cloud for high performance.

IONOS Variant

Provider: IONOS

Modification: FP8 Quantization

Release: May 20, 2025

Base Model

Provider: Meta

Country: USA

License: Llama 3.3 Community License

Flavor: Instruct

Release: December 9, 2024

Technology

Context window: 128k

Parameters: 70.6B

Quantization: FP8

Multilingual: Yes

Further details: Hugging Face
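As a rough illustration of why FP8 quantization matters for a model of this size, the arithmetic below estimates the memory needed for the weights alone. It is a back-of-the-envelope sketch that assumes 1 byte per parameter for FP8 and 2 bytes per parameter for a 16-bit format. It ignores the KV cache, activations, and runtime overhead, so real deployments need more memory than these figures.

```python
PARAMETERS = 70.6e9  # parameter count listed in the Technology section


def weight_memory_gb(parameters: float, bytes_per_parameter: int) -> float:
    """Weights-only memory estimate in gigabytes (1 GB = 1e9 bytes)."""
    return parameters * bytes_per_parameter / 1e9


fp8_gb = weight_memory_gb(PARAMETERS, 1)      # FP8: 1 byte per parameter
sixteen_bit_gb = weight_memory_gb(PARAMETERS, 2)  # 16-bit: 2 bytes per parameter

print(f"FP8 weights:    {fp8_gb:.1f} GB")
print(f"16-bit weights: {sixteen_bit_gb:.1f} GB")
```

Under these assumptions, the FP8 variant needs about half the weight memory of a 16-bit copy of the same model.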

Modalities

Text: Input and output

Image: Not supported

Audio: Not supported

Endpoints

Chat Completions: v1/chat/completions

Embeddings: Not supported

Image generation: Not supported

Features

Streaming: Supported

Reasoning: Not supported

Tool calling: Supported
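Because streaming is supported, a client can display text as it arrives instead of waiting for the full response. The sketch below shows one way to reassemble streamed text. It assumes the endpoint follows the OpenAI-style server-sent events format when "stream": true is set: lines prefixed with data:, a final data: [DONE] line, and text carried in choices[].delta.content. This page does not specify that format, so treat it as an assumption. The helper name collect_stream is hypothetical.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Join the text deltas from an OpenAI-style SSE stream (assumed format)."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)
```

In practice, the lines would come from iterating over the HTTP response body of a streaming request, decoded to text line by line.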

Usage example

Chat completions

The following example demonstrates how to use Structured Outputs to extract specific entities from unstructured text into a predefined JSON schema.

API Endpoint: POST https://openai.inference.de-txl.ionos.com/v1/chat/completions

Request:

{
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": "Extract the name, age, and occupation from this text - \"John Doe is a 30-year-old software engineer from Berlin.\""
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "person_info",
      "schema": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "age": { "type": "integer" },
          "occupation": { "type": "string" }
        },
        "required": ["name", "age", "occupation"],
        "additionalProperties": false
      },
      "strict": true
    }
  },
  "temperature": 0.1,
  "max_completion_tokens": 500
}

Response:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "{\"name\": \"John Doe\", \"age\": 30, \"occupation\": \"software engineer\"}"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 12,
    "total_tokens": 32
  }
}
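The request and response above can be reproduced from Python with the standard library alone. The following is a minimal sketch, not an official client. It assumes the endpoint accepts a bearer token in the Authorization header. The environment variable name IONOS_API_TOKEN and the helper names build_request_body, parse_person, and extract_person are made up for this example.

```python
import json
import os
import urllib.request

ENDPOINT = "https://openai.inference.de-txl.ionos.com/v1/chat/completions"
MODEL = "meta-llama/Llama-3.3-70B-Instruct"

# JSON schema taken from the request shown above.
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "occupation": {"type": "string"},
    },
    "required": ["name", "age", "occupation"],
    "additionalProperties": False,
}


def build_request_body(text: str) -> dict:
    """Build the chat completions payload shown in the request above."""
    prompt = f'Extract the name, age, and occupation from this text - "{text}"'
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "person_info",
                "schema": PERSON_SCHEMA,
                "strict": True,
            },
        },
        "temperature": 0.1,
        "max_completion_tokens": 500,
    }


def parse_person(response: dict) -> dict:
    """Decode the JSON string carried in the first choice's message content."""
    content = response["choices"][0]["message"]["content"]
    return json.loads(content)


def extract_person(text: str) -> dict:
    """Send the request and return the parsed entity (needs network access)."""
    token = os.environ["IONOS_API_TOKEN"]  # assumed variable name
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_request_body(text)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as reply:
        return parse_person(json.load(reply))
```

With a valid token, calling extract_person("John Doe is a 30-year-old software engineer from Berlin.") should return a dictionary matching the content field of the response above. Note that the structured output arrives as a JSON string inside message.content, so it needs a second decoding step, which parse_person performs.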

Rate limits

Rate limits ensure fair usage and reliable access to the AI Model Hub. Only the contract-wide rate limits apply to this model; there are no additional model-specific limits.
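Clients that may hit a rate limit often retry with exponential backoff. The sketch below illustrates that pattern under an assumption this page does not confirm: that a rate-limited request is answered with HTTP status 429. The helper call_with_backoff and its parameters are hypothetical. The send argument stands for any function that performs one request and returns a status code and body.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry `send` while it reports HTTP 429, doubling the wait each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:  # assumed rate-limit status code
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    return status, body  # still rate limited after the last attempt
```

The sleep parameter is injectable so the waiting behavior can be checked without real delays.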
