# Llama 3.1 405B

**Summary:** Llama 3.1 405B is Meta's flagship open-source large language model, offering strong reasoning ability and broad knowledge coverage. It targets the most demanding applications, such as advanced research, complex problem solving, sophisticated content creation, and enterprise-grade AI solutions where accuracy matters more than latency; inference is comparatively slow due to the model's scale.

| **Intelligence** | **Speed** | **Sovereignty** | **Input** | **Output** |
| :---: | :---: | :---: | :---: | :---: |
| ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-b23196ddc0cba1be0b981aa5572379cec1538be3%2Fai-model-hub-intelligence.png?alt=media) ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-b23196ddc0cba1be0b981aa5572379cec1538be3%2Fai-model-hub-intelligence.png?alt=media) ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-b23196ddc0cba1be0b981aa5572379cec1538be3%2Fai-model-hub-intelligence.png?alt=media) | ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-be3201cb2eba83650220699adf5b3d9120c83377%2Fai-model-hub-speed.png?alt=media) | ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-2c04b225a16490c4ff8c3e062bb166f25e05e1c2%2Fai-model-hub-sovereignty.png?alt=media) | ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-cc6707e286bceb4641047e45e095950e8db880fd%2Fai-model-hub-text.png?alt=media) ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-bac2752d06f18e86dc7f0b9531ab32ec58f30aec%2Fai-model-hub-image.png?alt=media) ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-8b0332538fcac6644893a504f9fbbd1ba2b56d21%2Fai-model-hub-audio.png?alt=media) | ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-cc6707e286bceb4641047e45e095950e8db880fd%2Fai-model-hub-text.png?alt=media) ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-bac2752d06f18e86dc7f0b9531ab32ec58f30aec%2Fai-model-hub-image.png?alt=media) ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-8b0332538fcac6644893a504f9fbbd1ba2b56d21%2Fai-model-hub-audio.png?alt=media) |
| *High* | *Low* | *Low* | *Text* | *Text* |

## Central parameters

**Description:** Largest open-source model from Meta with 405B parameters, optimized with FP8 quantization for maximum intelligence and knowledge coverage.

**Model identifier:** `meta-llama/Meta-Llama-3.1-405B-Instruct-FP8`

## IONOS AI Model Hub Lifecycle and Alternatives

| **IONOS Launch** | **End of Life** |                                                                                                                                            **Alternative**                                                                                                                                           | **Successor** |
| :--------------: | :-------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------: |
| *August 1, 2024* |       N/A       | [<mark style="color:blue;">**Llama 3.3 (70B)**</mark>](https://docs.ionos.com/sections-test/guides/ai/ai-model-hub/models/llms/meta-llama-3-3-70b), [<mark style="color:blue;">**GPT-OSS 120B**</mark>](https://docs.ionos.com/sections-test/guides/ai/ai-model-hub/models/llms/openai-gpt-oss-120b) |               |

## Origin

|                            **Provider**                            | **Country** |                                       **License**                                      | **Flavor** |   **Release**   |
| :----------------------------------------------------------------: | :---------: | :------------------------------------------------------------------------------------: | :--------: | :-------------: |
| [<mark style="color:blue;">**Meta**</mark>](https://www.meta.com/) |     USA     | [<mark style="color:blue;">**License**</mark>](https://llama.meta.com/llama3/license/) |  Instruct  | *July 23, 2024* |

## Technology

| **Context window** | **Parameters** | **Quantization** | **Multilingual** |                                               **Further details**                                              |
| :----------------: | :------------: | :--------------: | :--------------: | :------------------------------------------------------------------------------------------------------------: |
|       *128k*       |     *406B*     |       *FP8*      |       *Yes*      | [<mark style="color:blue;">**Hugging Face**</mark>](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct) |

## Modalities

|     **Text**     |   **Image**   |   **Audio**   |
| :--------------: | :-----------: | :-----------: |
| Input and output | Not supported | Not supported |

## Endpoints

| **Chat Completions** | **Embeddings** | **Image generation** |
| :------------------: | :------------: | :------------------: |
|  v1/chat/completions |  Not supported |     Not supported    |

## Features

| **Streaming** | **Reasoning** | **Tool calling** |
| :-----------: | :-----------: | :--------------: |
|   Supported   | Not supported |     Supported    |

## Usage example

### Chat completions

The following example demonstrates how to use **Llama 3.1 405B** for complex reasoning tasks.

**API Endpoint:** `POST https://openai.inference.de-txl.ionos.com/v1/chat/completions`

**Request:**

```json
{
  "model": "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
  "messages": [
    {
      "role": "user",
      "content": "Explain the concept of quantum entanglement to a 5-year-old using simple analogies."
    }
  ],
  "temperature": 0.7,
  "max_tokens": 100
}
```

**Response:**

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Imagine you have two magic dice. No matter how far apart they are—even if one is on Earth and the other is on Mars—if you roll a 6 on one, the other one will instantly show a 6 too! They are connected in a special way that lets them 'talk' to each other instantly."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 60,
    "total_tokens": 85
  }
}
```
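The request above can be sent with any OpenAI-compatible client. The sketch below assembles the same payload in Python; the model identifier and endpoint URL come from this page, while the helper name and the `IONOS_API_TOKEN` environment variable are illustrative assumptions.

```python
# Model identifier and endpoint as documented on this page.
MODEL = "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8"
BASE_URL = "https://openai.inference.de-txl.ionos.com/v1"

def build_chat_request(prompt: str, temperature: float = 0.7, max_tokens: int = 100) -> dict:
    """Assemble a chat-completions payload matching the example request above."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

payload = build_chat_request(
    "Explain the concept of quantum entanglement to a 5-year-old using simple analogies."
)

# To actually send the request (requires the `openai` package and a valid token;
# the env-var name is an assumption for illustration):
#   import os
#   from openai import OpenAI
#   client = OpenAI(base_url=BASE_URL, api_key=os.environ["IONOS_API_TOKEN"])
#   completion = client.chat.completions.create(**payload)
#   print(completion.choices[0].message.content)
```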

## Troubleshooting

### Infinite or repetitive response loops

Llama 3.1 405B can produce repetitive output that does not terminate naturally. When `max_tokens` is set to the context window maximum (128000) or left unset, the model keeps generating until it hits the context window ceiling.

**Recommended mitigations:**

1. **Set max\_tokens explicitly.** Avoid setting `max_tokens` (or `max_completion_tokens`) to the full context window (128000). Instead, limit the value to match your specific use case:

   | **Use case**                               | **Recommended value** |
   | ------------------------------------------ | :-------------------: |
   | Conversational use and short responses     |          2048         |
   | Detailed analysis and code generation      |          8192         |
   | Long-form documents and research summaries |         16384         |

{% hint style="info" %}
**Note:** Use values exceeding 16384 only when the task strictly requires them. When doing so, always implement stop sequences (see Step 2) to ensure the model terminates correctly.
{% endhint %}

2. **Add explicit stop sequences.** Add both Llama 3 end-of-turn tokens as stop strings in your request:

```json
"stop": ["<|eot_id|>", "<|end_of_text|>"]
```

3. **Use sampling instead of greedy decoding.** Avoid combining `temperature: 0` with ambiguous or contradictory prompts, as this often triggers infinite loops. Instead, use the following:

   ```json
   "temperature": 0.6,
   "top_p": 0.9
   ```
4. **Apply a frequency penalty.** Setting `frequency_penalty` to a small positive value reduces the likelihood of the model repeating the same tokens. A value between `0.1` and `0.3` is effective for most use cases.

   ```json
   "frequency_penalty": 0.1
   ```

**Example request with all mitigations applied:**

```json
{
  "model": "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
  "messages": [{ "role": "user", "content": "Hello" }],
  "max_tokens": 2048,
  "temperature": 0.6,
  "top_p": 0.9,
  "frequency_penalty": 0.1,
  "stop": ["<|eot_id|>", "<|end_of_text|>"]
}
```
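As a programmatic variant of the example above, the sketch below builds a request with all four mitigations applied. The helper name and the use-case keys are illustrative assumptions, not part of the API; the parameter values mirror the recommendations in this section.

```python
# Recommended max_tokens per use case, as listed in Step 1.
MAX_TOKENS_BY_USE_CASE = {
    "conversational": 2048,
    "analysis_or_code": 8192,
    "long_form": 16384,
}

def loop_safe_request(prompt: str, use_case: str = "conversational") -> dict:
    """Build a chat-completions payload with all loop mitigations applied."""
    return {
        "model": "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
        "messages": [{"role": "user", "content": prompt}],
        # 1. Explicit max_tokens well below the 128k context window.
        "max_tokens": MAX_TOKENS_BY_USE_CASE[use_case],
        # 2. Both Llama 3 end-of-turn tokens as stop sequences.
        "stop": ["<|eot_id|>", "<|end_of_text|>"],
        # 3. Sampling instead of greedy decoding.
        "temperature": 0.6,
        "top_p": 0.9,
        # 4. A small frequency penalty to discourage token repetition.
        "frequency_penalty": 0.1,
    }

request = loop_safe_request("Hello")
```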

## Rate limits

Rate limits ensure fair usage and reliable access to the AI Model Hub. Beyond the [<mark style="color:blue;">contract-wide rate limits</mark>](https://docs.ionos.com/sections-test/guides/ai/ai-model-hub/how-tos/rate-limits), no model-specific limits apply.
