# API Reference

DevToks exposes a verified OpenAI-compatible API surface for chat, Responses, Claude Messages, embeddings, rerank, media generation, asset management, and model discovery.

## Navigation

- [Base URL](#base-url)
- [Auth](#authentication)
- [Requests](#request-format)
- [Examples](#examples)
- [Parameters](#parameters-responses)
- [Chat](#chat)
- [Responses](#responses)
- [Messages](#messages)
- [Embeddings](#embeddings)
- [Rerank](#rerank)
- [Images](#images)
- [Audio](#audio)
- [Video](#video)
- [Moderation](#moderation)
- [Realtime](#realtime)
- [MCP](#mcp)
- [Assets](#assets)
- [Models](#model-discovery)
- [Balance](#balance)
- [Special](#special)
- [Streaming](#streaming)
- [Errors](#errors)
- [Unsupported](#unsupported)

## Base URL

Use the SDK base URL for OpenAI-compatible SDKs. The endpoint tables below show full HTTP paths under the API origin, so do not add /v1 twice.

- OpenAI-compatible SDK base URL: `https://api.devtoks.com/v1`
- HTTP API origin: `https://api.devtoks.com`
- Example: with the SDK base URL shown here, POST /v1/chat/completions is called as /chat/completions from OpenAI SDK clients.
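The relationship between the two URLs can be sketched in a few lines of Python. This is only an illustration of the rule above; the helper names `sdk_relative_path` and `full_url` are hypothetical and not part of any DevToks SDK.

```python
SDK_BASE_URL = "https://api.devtoks.com/v1"  # SDK base URL from the list above
API_ORIGIN = "https://api.devtoks.com"       # HTTP API origin from the list above


def sdk_relative_path(http_path: str) -> str:
    """Turn a full HTTP path (/v1/chat/completions) into the path an
    OpenAI-compatible SDK appends to SDK_BASE_URL (/chat/completions)."""
    if http_path == "/v1" or http_path.startswith("/v1/"):
        return http_path[len("/v1"):] or "/"
    raise ValueError(f"{http_path!r} is not under /v1; call it on the API origin")


def full_url(http_path: str) -> str:
    """Build the absolute URL for a path taken from the endpoint tables."""
    return API_ORIGIN + http_path
```

Both routes lead to the same URL: `SDK_BASE_URL + sdk_relative_path("/v1/models")` and `full_url("/v1/models")` each produce `https://api.devtoks.com/v1/models`.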

## Authentication

Authenticate every API request with a project API key.

```http
Authorization: Bearer sk-your-key
```

- Recommended header: `Authorization: Bearer sk-your-key`.
- Anthropic-compatible clients may also send `X-Api-Key: sk-your-key`.
- Never expose API keys in browser code, mobile apps, public repositories, or client-side logs.
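A minimal sketch of the two header styles, with the key read from the environment rather than hard-coded. The function names and the `DEVTOKS_API_KEY` variable name are assumptions made for this example.

```python
import os


def auth_headers(api_key: str, anthropic_style: bool = False) -> dict[str, str]:
    """Return the authentication header for one request."""
    if not api_key:
        raise ValueError("API key is empty")
    if anthropic_style:
        # Alternative header accepted from Anthropic-compatible clients.
        return {"X-Api-Key": api_key}
    # Recommended header.
    return {"Authorization": f"Bearer {api_key}"}


def key_from_env(var_name: str = "DEVTOKS_API_KEY") -> str:
    """Read the key from an environment variable (hypothetical name)."""
    return os.environ[var_name]
```

Reading the key from the server-side environment keeps it out of source control and client bundles, in line with the last bullet above.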

## Request Contract

Requests and responses use JSON unless a specific endpoint requires multipart upload, binary download, Server-Sent Events, or WebSocket upgrade semantics.

- Send `Content-Type: application/json` for JSON requests.
- Use `stream: true` on supported generation endpoints to receive Server-Sent Events.
- Use GET /v1/models with the same API key to discover the exact model IDs available to that key.
- Provider-specific features depend on the selected model and channel capability; unsupported options return a structured error.
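As a rough illustration of the contract above, the following sketch builds (but does not send) a JSON POST request with Python's standard library. `build_json_request` is a hypothetical helper, not part of any DevToks client.

```python
import json
import urllib.request


def build_json_request(path: str, body: dict, api_key: str,
                       stream: bool = False) -> urllib.request.Request:
    """Prepare a JSON POST for a path from the endpoint tables."""
    payload = dict(body)  # copy so the caller's dict is not modified
    if stream:
        payload["stream"] = True  # ask for Server-Sent Events
    return urllib.request.Request(
        "https://api.devtoks.com" + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Multipart, binary, and WebSocket endpoints need different handling and are not covered by this sketch.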

## Verified Examples

These examples use the same public routes registered by the API gateway.

### Chat Completions

OpenAI SDK-compatible request for text generation.

```bash
curl https://api.devtoks.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-key" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
      {"role": "user", "content": "Say hello in one sentence."}
    ]
  }'
```

### Responses API

OpenAI Responses request. Native-only actions require a native Responses channel.

```bash
curl https://api.devtoks.com/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-key" \
  -d '{
    "model": "gpt-4.1",
    "input": "Summarize the benefits of API routing."
  }'
```

### Claude Messages

Anthropic Messages format routed through the same API key.

```bash
curl https://api.devtoks.com/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-key" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Write a concise integration checklist."}
    ]
  }'
```

### Model Discovery

Use this before production rollout to confirm model IDs and permissions.

```bash
curl https://api.devtoks.com/v1/models \
  -H "Authorization: Bearer sk-your-key"
```

### Balance Snapshot

Check account quota, API key quota, and the effective available balance for this key.

```bash
curl https://api.devtoks.com/v1/token/balance \
  -H "Authorization: Bearer sk-your-key"
```

## Endpoints / Parameters and Responses

The gateway accepts the public compatibility contracts below. Unknown or provider-specific fields are forwarded only when supported by the selected model, channel, and sanitizer rules.

### Chat & Text Completions


- `POST /v1/chat/completions`: Chat completions — primary endpoint
- `POST /v1/completions`: Legacy text completions
- `POST /v1/edits`: Legacy edits endpoint

#### Chat Completions — POST /v1/chat/completions

OpenAI-compatible chat request body.

**Required**

- model: string
- messages: array of { role, content }

**Common optional fields**

- max_tokens or max_completion_tokens
- temperature, top_p, top_k, stop
- stream, stream_options.include_usage
- tools, tool_choice, parallel_tool_calls
- response_format, reasoning_effort, verbosity
- metadata, user, service_tier, seed

**Successful response**

- Non-streaming responses use choices[].message, choices[].finish_reason, model, created, and usage.
- Streaming responses use Server-Sent Events with choices[].delta and optional usage chunks.

**Notes**

- Values are validated for known ranges such as penalties, token limits, service tier, and reasoning effort.
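The non-streaming response fields listed above can be read as in the sketch below. The sample object is hand-written for illustration, not captured from a live call, and `usage.total_tokens` is assumed to follow the usual OpenAI usage shape.

```python
from typing import Any


def summarize_chat_response(response: dict[str, Any]) -> dict[str, Any]:
    """Pull the commonly used fields out of a chat completion object."""
    choice = response["choices"][0]
    usage = response.get("usage") or {}
    return {
        "text": choice["message"].get("content"),
        "finish_reason": choice.get("finish_reason"),
        "model": response.get("model"),
        "total_tokens": usage.get("total_tokens"),
    }


# Hand-written sample containing only the fields named in this section.
sample_chat_response = {
    "model": "gpt-4.1",
    "created": 1700000000,
    "choices": [
        {"message": {"role": "assistant", "content": "Hello!"},
         "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 8, "completion_tokens": 2, "total_tokens": 10},
}
```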

### Responses API

Create responses with the Responses API. The retrieve, input items, input tokens, cancel, delete, and compact actions require native Responses API upstream support.

- `POST /v1/responses`: Create a response
- `POST /v1/responses/compact`: Compact a response context (native Responses only)
- `POST /v1/responses/input_tokens`: Count input tokens (native Responses only)
- `GET /v1/responses/:id`: Retrieve a response (native Responses only)
- `GET /v1/responses/:id/input_items`: List response input items (native Responses only)
- `DELETE /v1/responses/:id`: Delete a response (native Responses only)
- `POST /v1/responses/:id/cancel`: Cancel an in-progress response (native Responses only)

#### Responses API — POST /v1/responses

OpenAI Responses contract. Native channels pass through Responses payloads; other compatible channels may use the chat fallback.

**Required**

- model: string
- input or prompt

**Common optional fields**

- instructions, previous_response_id, store, background
- max_output_tokens, temperature, top_p
- tools, tool_choice, parallel_tool_calls, max_tool_calls
- reasoning, text, include, truncation
- stream, metadata, user, service_tier, safety_identifier

**Successful response**

- Native Responses channels return upstream Responses API objects or typed SSE events.
- Fallback channels return a Responses-shaped object converted from Chat Completions output.

**Notes**

- Retrieve, input_items, input_tokens, cancel, delete, and compact actions require native Responses API upstream support.

### Messages (Anthropic Native)

Use Claude models with the Anthropic Messages format directly.

- `POST /v1/messages`: Anthropic-compatible Messages endpoint

#### Claude Messages — POST /v1/messages

Anthropic Messages-compatible request body.

**Required**

- model: string
- max_tokens: number
- messages: array of { role, content }

**Common optional fields**

- system, temperature, top_p, top_k
- stream, stop_sequences
- tools, tool_choice, thinking
- reasoning_effort, effort, output_config, metadata

**Successful response**

- JSON responses use id, type, role, model, content[], stop_reason, stop_sequence, and usage.
- Streaming responses use Anthropic Messages SSE events.
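A short sketch of reading the text out of a JSON Messages response. It assumes the standard Anthropic content-block shape, where text blocks look like `{"type": "text", "text": "..."}`; the sample is hand-written.

```python
from typing import Any


def message_text(response: dict[str, Any]) -> str:
    """Concatenate the text blocks in content[], skipping other block types."""
    return "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


# Hand-written sample using the fields listed above.
sample_message = {
    "id": "msg_example",
    "type": "message",
    "role": "assistant",
    "model": "claude-sonnet-4-20250514",
    "content": [
        {"type": "text", "text": "1. Create a key. "},
        {"type": "text", "text": "2. Call /v1/models."},
    ],
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 14},
}
```

Filtering on `type` matters when tools or thinking are enabled, because `content[]` can then contain blocks that carry no `text` field.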

### Embeddings


- `POST /v1/embeddings`: Create text embeddings
- `POST /v1/engines/:model/embeddings`: Legacy path (OpenAI v1 compat)

#### Embeddings — POST /v1/embeddings

OpenAI-compatible embeddings request body.

**Required**

- model: string
- input: string or array

**Common optional fields**

- encoding_format
- dimensions
- user

**Successful response**

- Responses use object, data[].embedding, data[].index, model, and usage.
- Numeric embedding arrays and base64-encoded embedding payloads are normalized for compatible clients.
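For clients that want to handle both embedding forms defensively, a decoder might look like the sketch below. It assumes that base64 payloads follow the OpenAI convention of packed little-endian 32-bit floats; confirm this against the model you use.

```python
import base64
import struct
from typing import Sequence, Union


def embedding_to_floats(embedding: Union[str, Sequence[float]]) -> list[float]:
    """Accept a numeric array or a base64 string and return a list of floats."""
    if isinstance(embedding, str):
        raw = base64.b64decode(embedding)
        if len(raw) % 4 != 0:
            raise ValueError("base64 payload is not a whole number of float32 values")
        count = len(raw) // 4
        # Assumption: little-endian float32, as in OpenAI's base64 encoding_format.
        return list(struct.unpack(f"<{count}f", raw))
    return [float(x) for x in embedding]
```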

### Rerank


- `POST /v1/rerank`: Rerank documents by relevance
- `POST /v2/rerank`: V2 endpoint (Cohere-compatible)

#### Rerank — POST /v1/rerank and /v2/rerank

Canonical rerank request body with Cohere-style compatibility.

**Required**

- model: string
- query: string
- documents: string[]

**Common optional fields**

- top_n
- max_tokens_per_doc
- priority
- input (legacy alias for query)

**Successful response**

- Responses are provider-shaped rerank results containing ranked document indexes and relevance scores when supported upstream.

### Images

Requires a channel with image generation capability.

- `POST /v1/images/generations`: Generate images from text
- `POST /v1/images/edits`: Edit an existing image

#### Images — POST /v1/images/generations and /v1/images/edits

Image generation uses JSON or form fields. Image edits require multipart form upload.

**Required**

- generations: prompt
- edits: image, mask, model, prompt

**Common optional fields**

- model, n, size, quality, style
- response_format, aspect_ratio, output_format, background
- moderation, user, image_prompt, input_fidelity

**Successful response**

- Responses use created, data[] with url or b64_json, revised_prompt when provided, and usage when upstream supplies it.
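Because each entry in `data[]` may carry either a URL or inline base64 data, client code usually branches on which field is present. The helper below is a hypothetical sketch of that branching.

```python
import base64
from typing import Any, Union


def image_payloads(response: dict[str, Any]) -> list[tuple[str, Union[str, bytes]]]:
    """Return ("bytes", decoded_image) or ("url", link) for each data[] entry."""
    results: list[tuple[str, Union[str, bytes]]] = []
    for item in response.get("data", []):
        if item.get("b64_json"):
            results.append(("bytes", base64.b64decode(item["b64_json"])))
        elif item.get("url"):
            results.append(("url", item["url"]))
    return results
```

Entries tagged `"bytes"` can be written straight to a file. Entries tagged `"url"` need a separate download.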

### Audio

Requires a channel with audio capability.

- `POST /v1/audio/speech`: Text-to-speech (TTS)
- `POST /v1/audio/transcriptions`: Audio to text (Whisper)
- `POST /v1/audio/translations`: Transcribe and translate to English

#### Audio — speech, transcriptions, translations

Speech uses JSON. Transcription and translation use multipart form upload.

**Required**

- speech: model, input, voice
- transcriptions/translations: file, model

**Common optional fields**

- speech: speed, response_format
- transcriptions/translations: prompt, response_format, temperature
- transcriptions: language, timestamp_granularity

**Successful response**

- Speech returns audio bytes in the requested format when supported.
- Transcription and translation return text, JSON, verbose_json, srt, or vtt according to response_format.

### Video

Requires a channel with video generation capability.

- `POST /v1/videos`: Submit a video task
- `GET /v1/videos`: List video tasks
- `GET /v1/videos/:id`: Get task status
- `GET /v1/videos/:id/content`: Download completed video
- `DELETE /v1/videos/:id`: Delete a video task
- `POST /v1/video/generations`: Legacy video generation submit path
- `GET /v1/video/generations/:id`: Legacy video generation status path

#### Video — /v1/videos and legacy /v1/video/generations

Video tasks are asynchronous on supported video channels.

**Required**

- model and/or prompt, depending on selected channel

**Common optional fields**

- seconds, duration, duration_seconds
- size, resolution, aspect_ratio
- remix_id, reference_id, reference_assets
- generate_audio, seed, return_last_frame, callback_url

**Successful response**

- Create and status responses are provider-compatible task objects.
- GET /v1/videos/:id/content streams completed video content when available.
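Since video tasks are asynchronous, clients typically poll the status route until the task finishes. The sketch below shows one way to structure that loop. The `status` field and the state names are assumptions, because task objects are provider-shaped; adjust them to what your channel returns.

```python
import time
from typing import Any, Callable


def wait_for_video(
    fetch_status: Callable[[str], dict[str, Any]],
    task_id: str,
    done_states: tuple[str, ...] = ("completed", "succeeded"),  # assumed names
    failed_states: tuple[str, ...] = ("failed", "cancelled"),   # assumed names
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll fetch_status(task_id) until the task finishes, fails, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_status(task_id)  # e.g. wraps GET /v1/videos/:id
        status = str(task.get("status", "")).lower()
        if status in done_states:
            return task
        if status in failed_states:
            raise RuntimeError(f"video task {task_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"video task {task_id} did not finish in {timeout}s")
        sleep(interval)
```

`fetch_status` is injected so the loop can be tested without network access. Once the task reports a finished state, the content route can be used to download the result.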

### Moderation


- `POST /v1/moderations`: Classify text for policy violations

#### Moderation — POST /v1/moderations

OpenAI-compatible moderation request body.

**Required**

- model: string
- input: string or array

**Common optional fields**

- metadata or provider-specific moderation options when supported

**Successful response**

- Responses contain provider-compatible moderation results and category scores when the upstream supplies them.

### Realtime


- `GET /v1/realtime`: WebSocket for real-time sessions
- `GET /v1/responses`: WebSocket upgrade for native Responses sessions

### MCP (Model Context Protocol)


- `POST /mcp`: MCP proxy — route tool calls

#### MCP — POST /mcp

Model Context Protocol proxy endpoint.

**Required**

- Provider-specific JSON-RPC or tool payload

**Common optional fields**

- Tool call metadata supported by the selected upstream channel

**Successful response**

- MCP responses follow the selected upstream capability and are returned in the provider-compatible shape.

### Asset Management

Manage Volcengine video reference assets (images). Requires a channel with asset management capability. All endpoints use POST with a JSON body.

- `POST /volc/asset/CreateAssetGroup`: Create an asset group
- `POST /volc/asset/CreateAsset`: Upload a reference asset to a group
- `POST /volc/asset/ListAssetGroups`: List your asset groups
- `POST /volc/asset/ListAssets`: List assets in a group
- `POST /volc/asset/GetAssetGroup`: Get asset group details
- `POST /volc/asset/GetAsset`: Get a single asset
- `POST /volc/asset/UpdateAssetGroup`: Rename or update an asset group
- `POST /volc/asset/UpdateAsset`: Update asset metadata

#### Assets — POST /volc/asset/*

Volcengine reference asset management for video-capable channels.

**Required**

- Asset group or asset JSON payload required by the selected asset operation

**Common optional fields**

- Asset metadata, group metadata, pagination, or update fields depending on operation

**Successful response**

- Asset management responses follow the selected upstream capability.

### Models


- `GET /v1/models`: List all available models
- `GET /v1/models/:model`: Retrieve model details

#### Models — GET /v1/models

Model discovery for the current API key.

**Required**

- No request body

**Common optional fields**

- GET /v1/models/:model returns one model detail when available

**Successful response**

- Responses contain the key-filtered model list, or a single model's details for GET /v1/models/:model.

### Balance

Read account, API key, and effective quota for the current API key. This is the same balance check shown in Quickstart.

- `GET /v1/token/balance`: Return a read-only balance snapshot for the authenticated API key
- `GET /api/token/balance`: Compatibility route for the same balance snapshot

#### Balance — GET /v1/token/balance

Read-only balance snapshot for the API key used to authenticate the request.

**Required**

- Authorization: Bearer sk-your-key

**Common optional fields**

- No query parameters or request body

**Successful response**

- The response envelope is { success, message, data }.
- data.object is balance_snapshot and legacy fields include remain_quota, used_quota, and unlimited_quota.
- data.account contains quota, used_quota, quota_per_credit, and credits.
- data.api_key contains public_id, name, status, remain_quota, used_quota, unlimited_quota, and expired_time.
- data.effective contains can_submit, available_quota, limited_by, and reason.
- data.display contains credit-denominated account_balance, api_key_balance, and available_balance.

**Notes**

- GET /api/token/balance is also registered for compatibility; OpenAI-compatible clients normally use /v1/token/balance through the configured SDK base URL.
- The endpoint never returns the secret API key value.
- limited_by is account, api_key, or none; reason is account_insufficient, api_key_exhausted, or empty when submission is allowed.
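The `data.effective` block is usually the part to check before submitting work. The sketch below turns a snapshot into a short status line; the function name and the sample values are illustrative only.

```python
from typing import Any


def describe_balance(snapshot: dict[str, Any]) -> str:
    """Summarize whether the key can submit requests and, if not, why."""
    if not snapshot.get("success"):
        return f"balance check failed: {snapshot.get('message', '')}"
    effective = snapshot["data"]["effective"]
    if effective["can_submit"]:
        return f"ok: {effective['available_quota']} quota available"
    return f"blocked by {effective['limited_by']}: {effective['reason']}"


# Hand-written sample containing only the fields this sketch reads.
sample_snapshot = {
    "success": True,
    "message": "",
    "data": {
        "object": "balance_snapshot",
        "effective": {
            "can_submit": False,
            "available_quota": 0,
            "limited_by": "api_key",
            "reason": "api_key_exhausted",
        },
    },
}
```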

### Special Endpoints


- `POST /api/paas/v4/layout_parsing`: Zhipu OCR / document parsing

#### OCR — POST /api/paas/v4/layout_parsing

Zhipu OCR and document layout parsing compatibility endpoint.

**Required**

- Provider-specific JSON payload or document reference

**Common optional fields**

- request_id, user_id, page range, crop, and visualization flags

**Successful response**

- OCR responses follow the selected upstream capability and document parsing format.

Endpoint availability is enforced by API key permissions, model access, channel capability, and account balance. A registered route can still return an error when the selected model or channel does not support that operation.

## Streaming and Realtime

Streaming is supported on compatible text, Responses, Claude Messages, and media routes when the upstream channel supports the requested mode.

- For Chat Completions and Responses, set stream: true to receive Server-Sent Events.
- Claude Messages streaming returns Anthropic-shaped SSE events.
- GET /v1/realtime and WebSocket upgrades on /v1/responses are WebSocket flows, not normal JSON GET requests.
- If a stream has already started and an error occurs, the gateway sends a protocol-appropriate SSE error event instead of corrupting the stream with a JSON body.
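To make the Chat Completions streaming format concrete, here is a small sketch of an SSE reader that accumulates `choices[].delta.content`. It assumes the stream ends with the OpenAI-style `data: [DONE]` sentinel; Responses and Claude Messages streams use typed events and need different handling.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of lines."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            yield "\n".join(buffer)  # a blank line ends the event
            buffer = []
    if buffer:
        yield "\n".join(buffer)


def collect_chat_text(lines: Iterable[str]) -> str:
    """Join the delta content of a Chat Completions stream into one string."""
    parts: list[str] = []
    for data in iter_sse_data(lines):
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):  # usage-only chunks have no choices
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```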

## Model Discovery

The model list is dynamic and filtered by the API key, user entitlement, channel status, and configured model mappings. Treat GET /v1/models as the source of truth for customer integrations.

## Special Compatibility

### Claude Code Path Normalization

Claude Code may send Messages API requests through different path prefixes. DevToks rewrites all of the following to POST /v1/messages:

- `/openai/v1/messages`
- `/v1/v1/messages`
- `/openai/v1/v1/messages`
- `/api/v1/v1/messages`

No client-side configuration is required.
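The mapping above can be expressed as a small lookup, which may be useful in a local mock or test double. This is only an illustration of the documented behavior, not the gateway's actual implementation.

```python
# Alternate prefixes listed above that resolve to the Messages endpoint.
MESSAGES_ALIASES = frozenset({
    "/openai/v1/messages",
    "/v1/v1/messages",
    "/openai/v1/v1/messages",
    "/api/v1/v1/messages",
})


def normalize_messages_path(path: str) -> str:
    """Map a known alias to /v1/messages; leave every other path unchanged."""
    return "/v1/messages" if path in MESSAGES_ALIASES else path
```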

### API Format Auto-Detection

If a request body is sent to the wrong endpoint, for example a Responses API payload to /v1/chat/completions, DevToks detects the mismatch and routes it to the correct handler by default. Deployments may be configured to return a 302 redirect instead.

## Error Codes

Relay endpoints return sanitized public error messages. Most JSON errors use the OpenAI-compatible { error: { message, type, param, code } } envelope; Claude Messages JSON errors use the Anthropic { type: "error", error: { type, message } } envelope.

- `400`: Invalid request, unsupported parameter, unsupported channel capability, or model mismatch
- `401`: Authentication failed. Check the API key and header format
- `403`: Permission denied or account balance is insufficient
- `404`: The requested resource or route was not found
- `408/504`: The request timed out
- `413`: The request body or model input is too large
- `429`: Rate limit, concurrency limit, or service busy condition
- `500/502/503`: Temporary service or upstream provider failure
- `501`: The route is registered but not supported

### Error Shape

```json
{
  "error": {
    "message": "The request is too long for the selected model.",
    "type": "invalid_request_error",
    "param": "",
    "code": "context_window_exceeded"
  }
}
```
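Clients that call both OpenAI-style and Claude Messages routes may want one function that understands both error envelopes. The following is a minimal sketch under that assumption; `parse_error` is a hypothetical helper.

```python
from typing import Any, Optional


def parse_error(body: dict[str, Any]) -> dict[str, Optional[str]]:
    """Normalize an OpenAI-style or Anthropic-style JSON error body."""
    error = body.get("error") or {}
    if body.get("type") == "error":
        # Anthropic envelope: { type: "error", error: { type, message } }
        return {
            "style": "anthropic",
            "type": error.get("type"),
            "message": error.get("message"),
            "code": None,
            "param": None,
        }
    # OpenAI envelope: { error: { message, type, param, code } }
    return {
        "style": "openai",
        "type": error.get("type"),
        "message": error.get("message"),
        "code": error.get("code"),
        "param": error.get("param"),
    }
```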

### Retry Guidance

- Retry 408, 429, 500, 502, 503, and 504 with exponential backoff and jitter.
- Do not retry 400, 401, 403, 404, or 413 until the request, credentials, permissions, balance, or input size is corrected.
- Include the request ID from the error message or response headers when contacting support.
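The retry rules above could be implemented along these lines. This sketch uses "full jitter" backoff, which is one common variant; the base delay, cap, and attempt count are arbitrary example values, and `send` stands in for whatever function performs the request.

```python
import random
import time
from typing import Any, Callable

RETRYABLE_STATUSES = frozenset({408, 429, 500, 502, 503, 504})


def is_retryable(status: int) -> bool:
    """True for the status codes listed as safe to retry."""
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter delay for a 0-based attempt: random in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retries(send: Callable[[], tuple[int, Any]], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> tuple[int, Any]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if not is_retryable(status) or attempt == max_attempts - 1:
            return status, body
        sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

Non-retryable statuses such as 400 or 401 are returned immediately, so the caller can correct the request instead of repeating it.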

## Registered but Unsupported

The following OpenAI-compatible route families are registered only so that they return a clear 501 response. They are not supported product features.

- `/v1/files`
- `/v1/fine_tuning/jobs`
- `/v1/assistants`
- `/v1/threads`
- `/v1/images/variations`
- `DELETE /v1/models/:model`

## Links

- [Quickstart Guide](https://devtoks.com/docs/quickstart)
- [Dashboard](https://devtoks.com/dashboard)