# Llama 3.3 70B

**Summary:** Llama 3.3 70B is a medium-sized language model from Meta that, according to Meta's benchmarks, approaches the quality of 405B-parameter models while requiring far less compute to serve. It performs well on complex reasoning, nuanced language understanding, and multi-step problem-solving. This makes it a good fit for enterprise applications, advanced chatbots, content generation, and professional AI assistants that need high-quality responses without the computational overhead of larger models.

|                                                                       **Intelligence**                                                                      |                                         **Speed**                                         |                   **Sovereignty**                  |                                                                 **Input**                                                                 |                                                                 **Output**                                                                |
| :---------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------: | :------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------: |
| ![Intelligence active](/files/dnDi7yuqXqkBFqwaxdnm) ![Intelligence active](/files/dnDi7yuqXqkBFqwaxdnm) ![Intelligence active](/files/dnDi7yuqXqkBFqwaxdnm) | ![Speed active](/files/evfYW3bq4dTBLlZH3dQf) ![Speed active](/files/evfYW3bq4dTBLlZH3dQf) | ![Sovereignty active](/files/bNpzGRJfez9SidEjNCoy) | ![Text active](/files/45qlqURbT8c2Ekr8HJfK) ![Image inactive](/files/0mPVwOtrYhZrpz9clC3D) ![Audio inactive](/files/PRglWWEC5Zoc5fgynNLM) | ![Text active](/files/45qlqURbT8c2Ekr8HJfK) ![Image inactive](/files/0mPVwOtrYhZrpz9clC3D) ![Audio inactive](/files/PRglWWEC5Zoc5fgynNLM) |
|                                                                            *High*                                                                           |                                          *Medium*                                         |                        *Low*                       |                                                                   *Text*                                                                  |                                                                   *Text*                                                                  |

## Central parameters

**Description:** Text-only model from Meta with 70B parameters. Meta's benchmarks show it approaching 405B-level quality while running at 70B inference speeds.

**Model identifier:** `meta-llama/Llama-3.3-70B-Instruct`

## IONOS Cloud AI Model Hub lifecycle and alternatives

| **IONOS Launch** | **End of Life** |                                                 **Alternative**                                                | **Successor** |
| :--------------: | :-------------: | :------------------------------------------------------------------------------------------------------------: | :-----------: |
| *March 15, 2025* |       N/A       | [<mark style="color:blue;">**gpt-oss-120b**</mark>](/cloud/ai/ai-model-hub/models/llms/openai-gpt-oss-120b.md) |               |

## Origin

The model available in the AI Model Hub is an optimized variant that IONOS Cloud has quantized for high performance.

### IONOS Variant

| **Provider** |  **Modification** |   **Release**  |
| :----------: | :---------------: | :------------: |
|     IONOS    | FP8 Quantization  | *May 20, 2025* |

### Base model

|                            **Provider**                            | **Country** |                                        **License**                                       | **Flavor** |     **Release**    |
| :----------------------------------------------------------------: | :---------: | :--------------------------------------------------------------------------------------: | :--------: | :----------------: |
| [<mark style="color:blue;">**Meta**</mark>](https://www.meta.com/) |     USA     | [<mark style="color:blue;">**License**</mark>](https://llama.meta.com/llama3_3/license/) |  Instruct  | *December 9, 2024* |

## Technology

| **Context window** | **Parameters** | **Quantization** | **Multilingual** |                                              **Further details**                                             |
| :----------------: | :------------: | :--------------: | :--------------: | :----------------------------------------------------------------------------------------------------------: |
|       *128k*       |     *70.6B*    |       *fp8*      |       *Yes*      | [<mark style="color:blue;">**Hugging Face**</mark>](https://huggingface.co/ionos/Llama-3.3-70B-Instruct-FP8) |

## Modalities

|     **Text**     |   **Image**   |   **Audio**   |
| :--------------: | :-----------: | :-----------: |
| Input and output | Not supported | Not supported |

## Endpoints

| **Chat Completions** | **Embeddings** | **Image generation** |
| :------------------: | :------------: | :------------------: |
|  v1/chat/completions |  Not supported |     Not supported    |

## Features

| **Streaming** | **Reasoning** | **Tool calling** |
| :-----------: | :-----------: | :--------------: |
|   Supported   | Not supported |     Supported    |
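Streaming is listed as supported. The following minimal sketch assumes the endpoint uses the OpenAI-compatible server-sent-events format. Under that format, you set `"stream": true` in the request body, and the server returns `data: {...}` lines in which `choices[0].delta.content` holds text fragments, ending with `data: [DONE]`. This page does not document that wire format, and `iter_stream_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style server-sent event lines.

    Assumption: each event line looks like 'data: {...}' and the stream
    ends with 'data: [DONE]'.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end of stream; ignore anything after it
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The final chunk may carry a null content; skip empty fragments.
            piece = choice.get("delta", {}).get("content")
            if piece:
                yield piece
```

Feeding the decoded lines of a streamed response into `"".join(iter_stream_text(lines))` rebuilds the full answer. You can also print each fragment as it arrives.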

## Usage example

### Chat completions

The following example demonstrates how to use **Structured Outputs** to extract specific entities from unstructured text into a predefined JSON schema.

**API Endpoint:** `POST https://openai.inference.de-txl.ionos.com/v1/chat/completions`

**Request:**

```json
{
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": "Extract the name, age, and occupation from this text - \"John Doe is a 30-year-old software engineer from Berlin.\""
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "person_info",
      "schema": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "age": { "type": "integer" },
          "occupation": { "type": "string" }
        },
        "required": ["name", "age", "occupation"],
        "additionalProperties": false
      },
      "strict": true
    }
  },
  "temperature": 0.1,
  "max_completion_tokens": 500
}
```
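The request above can also be sent from Python using only the standard library. This is a minimal sketch. `build_request` and `send` are hypothetical helper names, and the Bearer-token `Authorization` header is an assumption about how the endpoint authenticates, not something this page documents.

```python
import json
import urllib.request

# Endpoint and model identifier as documented on this page.
ENDPOINT = "https://openai.inference.de-txl.ionos.com/v1/chat/completions"
MODEL = "meta-llama/Llama-3.3-70B-Instruct"


def build_request(text: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the structured-output request shown above."""
    payload = {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": f'Extract the name, age, and occupation from this text - "{text}"',
            }
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "person_info",
                "schema": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "age": {"type": "integer"},
                        "occupation": {"type": "string"},
                    },
                    "required": ["name", "age", "occupation"],
                    "additionalProperties": False,
                },
                "strict": True,
            },
        },
        "temperature": 0.1,
        "max_completion_tokens": 500,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: the API accepts a Bearer token in this header.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the payload inspectable without network access. Suppose the token is stored in a hypothetical `IONOS_API_TOKEN` environment variable. A live call would then be `send(build_request("John Doe is a 30-year-old software engineer from Berlin.", os.environ["IONOS_API_TOKEN"]))` after `import os`.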

**Response:**

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "{\"name\": \"John Doe\", \"age\": 30, \"occupation\": \"software engineer\"}"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 12,
    "total_tokens": 32
  }
}
```
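With a `json_schema` response format, the structured result arrives as a JSON-encoded string in `choices[0].message.content`, so it has to be decoded a second time. The following minimal sketch assumes the OpenAI-compatible response shape shown above, and `extract_person` is a hypothetical helper name. It also rejects completions whose `finish_reason` is not `stop`. In OpenAI-compatible APIs, a value of `length` means the output was cut off at `max_completion_tokens`, which would leave the JSON incomplete.

```python
import json

# Expected field types, mirroring the schema in the request above.
REQUIRED_FIELDS = {"name": str, "age": int, "occupation": str}


def extract_person(response: dict) -> dict:
    """Decode and validate the structured object from a chat completion."""
    choice = response["choices"][0]
    if choice.get("finish_reason") != "stop":
        raise ValueError(f"incomplete completion: {choice.get('finish_reason')}")
    data = json.loads(choice["message"]["content"])
    for field, expected_type in REQUIRED_FIELDS.items():
        # Exact type check, so a boolean is not accepted as an integer age.
        if type(data.get(field)) is not expected_type:
            raise ValueError(f"missing or mistyped field: {field}")
    return data
```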

## Rate limits

Rate limits ensure fair usage and reliable access to the AI Model Hub. This model has no model-specific limits; only the [<mark style="color:blue;">contract-wide rate limits</mark>](/cloud/ai/ai-model-hub/how-tos/rate-limits.md) apply.
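Clients that may reach the contract-wide limits commonly retry with exponential backoff. The following minimal sketch assumes two things this page does not confirm: an exceeded limit is reported as HTTP 429, and the server may send a numeric `Retry-After` header. `with_backoff` is a hypothetical helper name. It wraps any zero-argument callable that performs the HTTP request, such as a function calling `urllib.request.urlopen`.

```python
import random
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on HTTP 429 with exponential backoff and jitter.

    Assumption: rate limiting is signalled by HTTP 429. Any other error,
    or a 429 after max_retries retries, is re-raised unchanged.
    """
    for attempt in range(max_retries + 1):
        try:
            return call()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_retries:
                raise
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after is not None and retry_after.isdigit():
                # Honor a numeric Retry-After value (seconds) if the server sends one.
                delay = float(retry_after)
            else:
                # Otherwise wait base_delay * 2^attempt plus up to base_delay of jitter.
                delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise AssertionError("unreachable: the loop always returns or raises")
```

The `sleep` parameter is injectable so that the retry schedule can be checked without actually waiting.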

