# Llama 3.1 8B

**Summary:** Llama 3.1 8B is a compact, efficient language model from Meta's flagship Llama family, optimized for conversational agents and real-time applications. With a 128k-token context window and robust multilingual support, it is well suited to chatbots, virtual assistants, and interactive applications where speed and responsiveness are crucial, without sacrificing natural language understanding.

| **Intelligence** | **Speed** | **Sovereignty** | **Input** | **Output** |
| :---: | :---: | :---: | :---: | :---: |
| ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-b23196ddc0cba1be0b981aa5572379cec1538be3%2Fai-model-hub-intelligence.png?alt=media) | ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-be3201cb2eba83650220699adf5b3d9120c83377%2Fai-model-hub-speed.png?alt=media) ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-be3201cb2eba83650220699adf5b3d9120c83377%2Fai-model-hub-speed.png?alt=media) ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-be3201cb2eba83650220699adf5b3d9120c83377%2Fai-model-hub-speed.png?alt=media) | ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-2c04b225a16490c4ff8c3e062bb166f25e05e1c2%2Fai-model-hub-sovereignty.png?alt=media) | ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-cc6707e286bceb4641047e45e095950e8db880fd%2Fai-model-hub-text.png?alt=media) ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-bac2752d06f18e86dc7f0b9531ab32ec58f30aec%2Fai-model-hub-image.png?alt=media) ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-8b0332538fcac6644893a504f9fbbd1ba2b56d21%2Fai-model-hub-audio.png?alt=media) | ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-cc6707e286bceb4641047e45e095950e8db880fd%2Fai-model-hub-text.png?alt=media) ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-bac2752d06f18e86dc7f0b9531ab32ec58f30aec%2Fai-model-hub-image.png?alt=media) ![](https://1737632334-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MifAzdGvKLDTtvJP8sm%2Fuploads%2Fgit-blob-8b0332538fcac6644893a504f9fbbd1ba2b56d21%2Fai-model-hub-audio.png?alt=media) |
| *Low* | *High* | *Low* | *Text* | *Text* |

## Central parameters

**Description:** Latest small-sized model from Meta's Llama 3.1 series with optimized architecture for efficient inference.

**Model identifier:** `meta-llama/Meta-Llama-3.1-8B-Instruct`

## IONOS AI Model Hub Lifecycle and Alternatives

| **IONOS Launch** | **End of Life** |                                                      **Alternative**                                                     | **Successor** |
| :--------------: | :-------------: | :----------------------------------------------------------------------------------------------------------------------: | :-----------: |
|  *July 1, 2024*  |       N/A       | [<mark style="color:blue;">**Mistral Nemo (12B)**</mark>](https://docs.ionos.com/cloud/ai/ai-model-hub/models/llms/mistral-nemo) |               |

## Origin

|                            **Provider**                            | **Country** |                                       **License**                                      | **Flavor** |   **Release**   |
| :----------------------------------------------------------------: | :---------: | :------------------------------------------------------------------------------------: | :--------: | :-------------: |
| [<mark style="color:blue;">**Meta**</mark>](https://www.meta.com/) |     USA     | [<mark style="color:blue;">**License**</mark>](https://llama.meta.com/llama3/license/) |  Instruct  | *July 23, 2024* |

## Technology

| **Context window** | **Parameters** | **Quantization** | **Multilingual** |                                              **Further details**                                             |
| :----------------: | :------------: | :--------------: | :--------------: | :----------------------------------------------------------------------------------------------------------: |
|       *128k*       |     *8.03B*    |       *fp8*      |       *Yes*      | [<mark style="color:blue;">**Hugging Face**</mark>](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) |

## Modalities

|     **Text**     |   **Image**   |   **Audio**   |
| :--------------: | :-----------: | :-----------: |
| Input and output | Not supported | Not supported |

## Endpoints

| **Chat Completions** | **Embeddings** | **Image generation** |
| :------------------: | :------------: | :------------------: |
|  v1/chat/completions |  Not supported |     Not supported    |

## Features

| **Streaming** | **Reasoning** | **Tool calling** |
| :-----------: | :-----------: | :--------------: |
|   Supported   | Not supported |     Supported    |
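Since tool calling is supported, a request can declare callable functions alongside the conversation. The sketch below builds such a request body in the OpenAI-style `tools` format; the `get_weather` function and its parameters are hypothetical, and the exact schema accepted by the endpoint should be verified against the API reference.

```python
import json

def build_tool_call_request(prompt):
    # Request body with an OpenAI-style "tools" definition.
    # The get_weather function below is a hypothetical example.
    return {
        "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Return the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }

payload = build_tool_call_request("What is the weather in Berlin?")
print(json.dumps(payload, indent=2))
```

If the model decides to call the declared function, the response message carries a tool-call entry instead of plain text content.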

## Usage example

### Chat completions

The following example demonstrates how to use **Llama 3.1 8B** for conversational tasks.

**API Endpoint:** `POST https://openai.inference.de-txl.ionos.com/v1/chat/completions`

**Request:**

```json
{
  "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": "Compose a short poem about the sea."
    }
  ],
  "temperature": 0.7,
  "max_tokens": 100
}
```

**Response:**

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "The waves whisper secrets to the sand,\nA salt-kissed breeze sweeps across the land.\nBlue horizons stretch endlessly wide,\nWith the rhythm of the eternal tide."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 30,
    "total_tokens": 45
  }
}
```
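The request above can be sent from Python with only the standard library. The snippet below is a minimal sketch assuming the endpoint accepts a Bearer token; the `IONOS_API_TOKEN` environment variable name is an assumption, so substitute your own credential handling.

```python
import json
import os
import urllib.request

ENDPOINT = "https://openai.inference.de-txl.ionos.com/v1/chat/completions"

def build_chat_request(prompt, temperature=0.7, max_tokens=100):
    # Mirrors the JSON request body shown above.
    return {
        "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

def send_chat_request(payload, token):
    # Bearer-token auth is assumed here; check the AI Model Hub
    # authentication guide for the exact scheme.
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

payload = build_chat_request("Compose a short poem about the sea.")
token = os.environ.get("IONOS_API_TOKEN")  # assumed variable name
if token:
    reply = send_chat_request(payload, token)
    print(reply["choices"][0]["message"]["content"])
```

The assistant's text sits at `choices[0].message.content` in the response, as in the sample response above.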

## Rate limits

Rate limits ensure fair usage and reliable access to the AI Model Hub. Beyond the [<mark style="color:blue;">contract-wide rate limits</mark>](https://docs.ionos.com/cloud/ai/ai-model-hub/how-tos/rate-limits), no model-specific limits apply.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.ionos.com/cloud/ai/ai-model-hub/models/llms/meta-llama-3-1-8b.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
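The question passed in the `ask` parameter must be URL-encoded. A minimal sketch using the Python standard library to build such a query URL:

```python
from urllib.parse import quote

PAGE_URL = ("https://docs.ionos.com/cloud/ai/ai-model-hub/models/"
            "llms/meta-llama-3-1-8b.md")

def build_ask_url(question):
    # Percent-encode the natural-language question for the query string.
    return f"{PAGE_URL}?ask={quote(question)}"

url = build_ask_url("Which rate limits apply to Llama 3.1 8B?")
# The resulting URL can then be fetched with any HTTP client (GET request).
```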
