OpenAI Compatible Endpoints
Endpoints compatible with OpenAI's API specification
Create Chat Completions by calling an available model in a format that is compatible with the OpenAI API
model: ID of the model to use
temperature: The sampling temperature to be used. Default: 1
top_p: An alternative to sampling with temperature. Default: -1
n: The number of chat completion choices to generate for each input message. Default: 1
stream: If set to true, partial message deltas are sent. Default: false
stop: Up to 4 sequences where the API will stop generating further tokens
max_tokens: The maximum number of tokens to generate in the chat completion. This value is now deprecated in favor of max_completion_tokens. Default: 16
max_completion_tokens: An upper bound for the number of tokens that can be generated for a completion, including visible output tokens. Default: 16
presence_penalty: Penalizes new tokens based on whether they already appear in the text so far. Default: 0
frequency_penalty: Penalizes new tokens based on their frequency in the text so far. Default: 0
logit_bias: Used to modify the probability of specific tokens appearing in the completion
user: A unique identifier representing your end-user
tool_choice: Controls which (if any) tool is called by the model. "none" means the model will not call any tool and instead generates a message. "auto" means the model can pick between generating a message or calling one or more tools. "required" means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. "none" is the default when no tools are present; "auto" is the default if tools are present.
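For illustration, a request that forces a specific tool call could look like the following sketch using the openai Python package. The get_current_weather tool definition is a hypothetical example, not part of this API, and IONOS_API_TOKEN is an assumed environment variable name for your bearer token.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openai.inference.de-txl.ionos.com/v1",
    api_key=os.environ["IONOS_API_TOKEN"],  # assumed env var holding your bearer token
)

# Hypothetical tool definition for illustration only.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    messages=[{"role": "user", "content": "What is the weather in Berlin?"}],
    tools=tools,
    # Forces the model to call get_current_weather instead of answering directly.
    tool_choice={"type": "function", "function": {"name": "get_current_weather"}},
)
print(response.choices[0].message.tool_calls)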
POST /v1/chat/completions HTTP/1.1
Host: openai.inference.de-txl.ionos.com
Authorization: Bearer JWT
Content-Type: application/json
Accept: */*
Content-Length: 326
{
  "model": "meta-llama/Meta-Llama-3-70B-Instruct",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Please say hello."
    }
  ],
  "temperature": 0.7,
  "top_p": 0.9,
  "n": 1,
  "stream": false,
  "stop": [
    "\n"
  ],
  "max_tokens": 1000,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "logit_bias": {},
  "user": "user-123"
}
{
  "id": "text",
  "choices": [
    {
      "finish_reason": "text",
      "index": 1,
      "message": {
        "role": "text",
        "content": "text",
        "tool_calls": [
          {
            "id": "text",
            "type": "function",
            "function": {
              "name": "text",
              "arguments": "text"
            }
          }
        ]
      }
    }
  ],
  "created": 1,
  "object": "text",
  "model": "text",
  "system_fingerprint": "text",
  "usage": {
    "prompt_tokens": 1,
    "completion_tokens": 1,
    "total_tokens": 1
  }
}
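Because the endpoint follows the OpenAI specification, the official openai Python package can be pointed at it directly. A minimal sketch of the request above (IONOS_API_TOKEN is an assumed environment variable name for your bearer token):

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openai.inference.de-txl.ionos.com/v1",
    api_key=os.environ["IONOS_API_TOKEN"],  # the JWT from the Authorization header
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Please say hello."},
    ],
    temperature=0.7,
    max_tokens=1000,
    stream=False,  # set True to receive partial message deltas instead
)
print(response.choices[0].message.content)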
Create Completions by calling an available model in a format that is compatible with the OpenAI API
model: ID of the model to use
prompt: The prompt to generate completions from
temperature: The sampling temperature to be used
top_p: An alternative to sampling with temperature
n: The number of completion choices to generate for each input prompt
stream: If set to true, partial message deltas are sent
stop: Up to 4 sequences where the API will stop generating further tokens
max_tokens: The maximum number of tokens to generate in the completion
presence_penalty: Penalizes new tokens based on whether they already appear in the text so far
frequency_penalty: Penalizes new tokens based on their frequency in the text so far
logit_bias: Used to modify the probability of specific tokens appearing in the completion
user: A unique identifier representing your end-user
POST /v1/completions HTTP/1.1
Host: openai.inference.de-txl.ionos.com
Authorization: Bearer JWT
Content-Type: application/json
Accept: */*
Content-Length: 239
{
  "model": "meta-llama/Meta-Llama-3-70B-Instruct",
  "prompt": "Say this is a test",
  "temperature": 0.01,
  "top_p": 0.9,
  "n": 1,
  "stream": false,
  "stop": [
    "\n"
  ],
  "max_tokens": 1000,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "logit_bias": {},
  "user": "user-123"
}
{
  "id": "text",
  "choices": [
    {
      "finish_reason": "text",
      "index": 1,
      "text": "text"
    }
  ],
  "created": 1,
  "object": "text",
  "model": "text",
  "usage": {
    "prompt_tokens": 1,
    "completion_tokens": 1,
    "total_tokens": 1
  }
}
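A minimal sketch of the same request with the openai Python package (same assumptions as above):

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openai.inference.de-txl.ionos.com/v1",
    api_key=os.environ["IONOS_API_TOKEN"],
)

response = client.completions.create(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    prompt="Say this is a test",
    temperature=0.01,
    max_tokens=1000,
    stop=["\n"],
)
print(response.choices[0].text)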
Get the entire list of available models in a format that is compatible with the OpenAI API
GET /v1/models HTTP/1.1
Host: openai.inference.de-txl.ionos.com
Authorization: Bearer JWT
Accept: */*
Successful operation
No content
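A sketch of listing the available models with the openai Python package (same assumptions as above):

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openai.inference.de-txl.ionos.com/v1",
    api_key=os.environ["IONOS_API_TOKEN"],
)

# Iterate over the returned model list and print each model ID.
for model in client.models.list():
    print(model.id)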
Generate one or more images using a model in a format that is compatible with the OpenAI API
model: ID of the model to use. Please check /v1/models for available models
prompt: The prompt to generate images from
n: The number of images to generate. Default: 1
size: The size of the image to generate. Must be one of "1024*1024", "1792*1024", or "1024*1792"; the maximum supported resolution is "1792*1024". Default: "1024*1024"
response_format: The format of the response. Possible values: url, b64_json. Default: b64_json
user: A unique identifier representing your end-user
POST /v1/images/generations HTTP/1.1
Host: openai.inference.de-txl.ionos.com
Authorization: Bearer JWT
Content-Type: application/json
Accept: */*
Content-Length: 151
{
  "model": "stabilityai/stable-diffusion-xl-base-1.0",
  "prompt": "A beautiful sunset over the ocean",
  "n": 1,
  "size": "1024*1024",
  "response_format": "b64_json"
}
{
  "created": 1,
  "data": [
    {
      "url": "text",
      "b64_json": "text",
      "revised_prompt": "text"
    }
  ]
}
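A sketch of the image request with the openai Python package, decoding the b64_json payload to a file. Note that this API uses an asterisk-separated size string, which is passed through to the server as-is; strict type checkers may flag it because the SDK's type hints only list OpenAI's own size values.

import base64
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openai.inference.de-txl.ionos.com/v1",
    api_key=os.environ["IONOS_API_TOKEN"],
)

result = client.images.generate(
    model="stabilityai/stable-diffusion-xl-base-1.0",
    prompt="A beautiful sunset over the ocean",
    n=1,
    size="1024*1024",  # asterisk separator as used by this API
    response_format="b64_json",
)

# Decode the base64 payload and write the image to disk.
with open("sunset.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))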
Creates an embedding vector representing the input text.
model: ID of the model to use. Please check /v1/models for available models
input: The input text to create embeddings for, either a single string or a list of strings
POST /v1/embeddings HTTP/1.1
Host: openai.inference.de-txl.ionos.com
Authorization: Bearer JWT
Content-Type: application/json
Accept: */*
Content-Length: 83
{
  "input": [
    "The food was delicious and the waiter."
  ],
  "model": "intfloat/e5-large-v2"
}
{
  "model": "text",
  "object": "text",
  "data": [
    {
      "index": 1,
      "object": "text",
      "embedding": [
        1
      ]
    }
  ],
  "usage": {
    "prompt_tokens": 1,
    "total_tokens": 1
  }
}
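A sketch of the embedding request with the openai Python package (same assumptions as above):

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openai.inference.de-txl.ionos.com/v1",
    api_key=os.environ["IONOS_API_TOKEN"],
)

response = client.embeddings.create(
    model="intfloat/e5-large-v2",
    input=["The food was delicious and the waiter."],
)
# Each input string yields one vector; print its dimensionality.
print(len(response.data[0].embedding))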