Crun Chat Completions API
OpenAI-compatible /chat/completions endpoint for CRUN language models.
API Endpoint
Authentication
You can authenticate with either:Authorization: Bearer YOUR_API_KEYX-API-KEY: YOUR_API_KEY
Request Examples
Conversation History Example
CRUN does not manage multi-turn conversation context for your application. To continue a conversation, you must manage prior message history yourself and include the relevant history in each new request.Structured Output Example
Useresponse_format when you need the model to return JSON. For schema-enforced structured outputs, use a model and upstream provider that support json_schema.
Tool Calling Example
Chat Completions accepts OpenAI-compatibletools and tool_choice fields. Your application is responsible for executing tool calls and sending tool results back in a follow-up request.
Vision Input Example
When the selected model supports image input, pass multimodal content parts in the user message.Streaming Example
Setstream=true to receive Server-Sent Events. CRUN enables usage reporting for streaming requests when possible.
Response Examples
Notes
- Set
stream=trueto receive atext/event-streamresponse. - If you omit
stream_options.include_usage, CRUN enables it automatically for streaming requests. max_tokensandmax_completion_tokensare both accepted and are capped by the selected model’s output token limit.- Unknown model IDs return an OpenAI-style error body with
code: "model_not_found". - For new stateful, tool-heavy, or structured-output workflows, also consider the
/responsesendpoint. Responses usesinput,instructions, andtext.formatinstead ofmessagesandresponse_format.
Related Resources
Responses API
LLM Quickstart
Models Overview
Pricing
Authorizations
Use your CRUN API key as a Bearer token for OpenAI-compatible SDKs.
Body
OpenAI-compatible Chat Completions request. Additional compatible fields are accepted and passed through when supported by the upstream model.
Public model ID returned by GET /api/v1/models.
1 - 128"gpt-4o-mini"
Conversation messages in OpenAI format.
1Sampling temperature.
0 <= x <= 20.7
Nucleus sampling value.
0 <= x <= 11
Number of completion choices to generate.
1 <= x <= 81
Whether to return a Server-Sent Events stream.
false
Stop sequence or list of stop sequences.
"###"
Maximum output tokens. Capped by the selected model.
x >= 1512
Maximum completion tokens. Capped by the selected model.
x >= 1512
Presence penalty value.
-2 <= x <= 20
Frequency penalty value.
-2 <= x <= 20
End-user identifier passed for observability.
"user_123"
Streaming options. When stream=true, CRUN enables include_usage=true by default if omitted.
{ "include_usage": true }Structured output option in OpenAI-compatible format. Chat Completions uses response_format; Responses uses text.format.
{ "type": "json_object" }Tool definitions in OpenAI-compatible format. Your application executes tool calls and returns tool results in a follow-up request.
Tool selection strategy.
"auto"
Response
Successful completion response. Returns JSON when stream=false, or SSE when stream=true.
