POST /v1/chat/completions
Example request:

curl --request POST \
  --url https://llm-gateway.heurist.xyz/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "hermes-3-llama3.1-8b",
  "messages": [
    {
      "role": "system",
      "content": "<string>"
    }
  ],
  "max_tokens": 1024,
  "temperature": 0.5,
  "stream": false
}'
Example response:

{
  "id": "<string>",
  "object": "chat.completion",
  "created": 123,
  "model": "<string>",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "<string>"
      },
      "finish_reason": "stop",
      "index": 123
    }
  ]
}
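The same request can be assembled in Python before sending it with any HTTP client. This is a minimal sketch that only builds the URL, headers, and JSON body shown in the curl example above; `build_chat_request` and its parameter names are illustrative helpers, not part of the gateway API.

```python
import json

def build_chat_request(token, model, messages, max_tokens=1024,
                       temperature=0.5, stream=False):
    """Assemble the URL, headers, and JSON body for a chat completion call.

    Illustrative helper: mirrors the curl example above, including the
    documented defaults for max_tokens, temperature, and stream.
    """
    url = "https://llm-gateway.heurist.xyz/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": stream,
    }
    return url, headers, json.dumps(body)

url, headers, payload = build_chat_request(
    "<token>",
    "hermes-3-llama3.1-8b",
    [{"role": "system", "content": "You are a helpful assistant."}],
)
```

The returned tuple can be passed straight to an HTTP library, e.g. `requests.post(url, headers=headers, data=payload)`.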

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
Chat completion request parameters
model
enum<string>
required

ID of the model to use

Available options:
hermes-3-llama3.1-8b,
meta-llama/llama-3.3-70b-instruct,
mistralai/mistral-7b-instruct,
mistralai/mixtral-8x22b-instruct,
mistralai/mixtral-8x7b-instruct,
nvidia/llama-3.1-nemotron-70b-instruct,
qwen/qwen-2.5-coder-32b-instruct
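Because `model` is an enum, an unknown ID can be rejected client-side before the request is sent. A small sketch, with the allowed set copied from the options above (`validate_model` is an illustrative helper, not part of the API):

```python
# Model IDs copied verbatim from the documented enum.
ALLOWED_MODELS = {
    "hermes-3-llama3.1-8b",
    "meta-llama/llama-3.3-70b-instruct",
    "mistralai/mistral-7b-instruct",
    "mistralai/mixtral-8x22b-instruct",
    "mistralai/mixtral-8x7b-instruct",
    "nvidia/llama-3.1-nemotron-70b-instruct",
    "qwen/qwen-2.5-coder-32b-instruct",
}

def validate_model(model_id):
    # Fail fast locally instead of waiting on a gateway error response.
    if model_id not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {model_id}")
    return model_id
```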
messages
object[]
required

A list of messages comprising the conversation so far
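The request example above shows only a single system message. A typical multi-turn conversation alternates roles; the `system`/`user`/`assistant` role names here follow the common OpenAI-compatible convention and are an assumption beyond what this page documents explicitly:

```python
# Each entry carries a role and the text content for that turn.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "user", "content": "And of Italy?"},
]
```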

max_tokens
integer
default: 1024

The maximum number of tokens to generate

Example: 1024

temperature
number

Sampling temperature between 0 and 1

Required range: 0 <= x <= 1
stream
boolean
default: false

Whether to stream back partial progress
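When `stream` is `true`, OpenAI-compatible gateways typically return server-sent events: each line is prefixed with `data: ` and carries a JSON chunk, with a final `data: [DONE]` sentinel. This page does not document the exact chunk format, so the framing and the `delta` field below are assumptions; a minimal parsing sketch:

```python
import json

def parse_sse_line(line):
    """Parse one server-sent-events line into a chunk dict, or None.

    Assumes the common OpenAI-style 'data: {json}' framing with a
    'data: [DONE]' sentinel; this is not confirmed by the page above.
    """
    line = line.strip()
    if not line.startswith("data: "):
        return None  # comments, blank keep-alive lines, etc.
    data = line[len("data: "):]
    if data == "[DONE]":
        return None  # end-of-stream sentinel
    return json.loads(data)

chunk = parse_sse_line('data: {"choices": [{"delta": {"content": "Hi"}}]}')
```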

Response

200
application/json
Successful completion response
id
string

Unique identifier for the completion

object
enum<string>
Available options:
chat.completion
created
integer

Unix timestamp of when the completion was created

model
string

Model ID used

choices
object[]

The list of completion choices generated for the request
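Given a 200 response shaped like the example above, the assistant's text lives at `choices[0].message.content`. A small sketch using a sample payload modeled on that example (`extract_reply` is an illustrative helper):

```python
# Sample response modeled on the example payload above.
response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "hermes-3-llama3.1-8b",
    "choices": [
        {
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
            "index": 0,
        }
    ],
}

def extract_reply(resp):
    # Take the first choice; finish_reason "stop" means generation
    # ended naturally rather than hitting the max_tokens limit.
    choice = resp["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]

text, reason = extract_reply(response)
```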