Complete API reference for the Distributed Compute Platform. Base URL: /api/v1
Four authentication methods are used across different API areas:
X-API-Key: your-api-key
X-Internal-Token: generated-token (persisted in data/internal_token)
X-Admin-Key: admin-secret (from ADMIN_KEY env var)
X-SDK-Token: session-token (issued on device registration)
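As an illustration, the four header-based schemes listed above could be wrapped in one small helper. This is only a sketch: the header names come from this reference, while the `auth_header` function and its method labels (`api_key`, `internal`, `admin`, `sdk`) are hypothetical.

```python
# Header names are taken from the list above; the labels on the left
# are hypothetical shorthand used only by this sketch.
AUTH_HEADERS = {
    "api_key": "X-API-Key",
    "internal": "X-Internal-Token",
    "admin": "X-Admin-Key",
    "sdk": "X-SDK-Token",
}


def auth_header(method: str, secret: str) -> dict[str, str]:
    """Return the single HTTP header that carries `secret` for `method`."""
    if method not in AUTH_HEADERS:
        raise ValueError(f"unknown auth method: {method!r}")
    if not secret:
        raise ValueError("secret must be non-empty")
    return {AUTH_HEADERS[method]: secret}
```

For example, `auth_header("api_key", "test-api-key")` yields the same header the curl examples in this reference pass with `-H`.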
Default API Keys:
test-api-key - General testing
dashboard-api-key - Dashboard access
benchmark-api-key - Benchmark operations
Internal endpoints for service-to-service communication. Authenticated via X-Internal-Token header.
Returns per-SDK-UUID compute usage stats (successful tasks only), bucketed by hour. Rolling 3-day window. Only hours with actual activity are included.
| Field | Type | Description |
|---|---|---|
| from (required) | string | Start hour in format YYYY-MM-DDTHH |
| to | string | End hour (defaults to current hour if omitted) |
{
"from": "2026-03-15T21",
"to": "2026-03-16T07",
"data": [
{ "uuid": "sdk-python-c5d145", "hour": "2026-03-15T21", "cpu_ms": 0, "gpu_ms": 72917, "cpu_tasks": 0, "gpu_tasks": 1867 }
]
}
Same format as above, but for failed tasks only. Tracked separately from successful task stats.
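The `from` and `to` hour keys used by both usage-stats endpoints can be generated from a timestamp. The sketch below assumes the keys are UTC and that requests older than 72 hours fall outside the rolling 3-day window; neither detail is confirmed by this reference, and `hour_key` and `usage_window` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

HOUR_FORMAT = "%Y-%m-%dT%H"  # YYYY-MM-DDTHH, as in the parameter table
MAX_HOURS = 72               # assumption: the 3-day window equals 72 hours


def hour_key(moment: datetime) -> str:
    """Format a datetime as an hour bucket key."""
    return moment.strftime(HOUR_FORMAT)


def usage_window(now: datetime, hours_back: int) -> dict[str, str]:
    """Build from/to parameters, clamped to the rolling window."""
    hours_back = max(0, min(hours_back, MAX_HOURS))
    return {
        "from": hour_key(now - timedelta(hours=hours_back)),
        "to": hour_key(now),
    }


if __name__ == "__main__":
    print(usage_window(datetime.now(timezone.utc), 10))
```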
Health check endpoints do not require authentication.
Returns basic health status and version information.
{
"status": "healthy",
"version": "1.1.0-abc1234",
"timestamp": "2026-02-08T10:30:00.000Z"
}
Returns detailed health information including all service statuses and resource counts.
{
"status": "healthy",
"services": {
"device_registry": { "status": "healthy", "devices": 5 },
"resource_pool": { "status": "healthy", "available": 3 },
"job_manager": { "status": "healthy", "active_jobs": 2 },
"model_store": { "status": "healthy", "models": 4 },
"websocket": { "status": "healthy", "connections": 5 }
},
"resources": { ... },
"jobs": { ... }
}
Upload, manage, and retrieve ML models. Models are automatically distributed to SDK devices when needed.
Upload an ONNX or TorchScript model file. Use multipart/form-data encoding. Specify built-in preprocessing and/or postprocessing adapters to enable server-side format conversion via AWS Lambda.
| Field | Type | Description |
|---|---|---|
| model (required) | file | Model file (.onnx or .pt) |
| name | string | Model name (defaults to filename) |
| format | string | "onnx" or "torchscript" |
| input_shape | json | Input tensor shape, e.g., [1, 10] |
| output_shape | json | Output tensor shape, e.g., [1, 2] |
| input_schema | json | JSON Schema for input validation |
| output_schema | json | JSON Schema for output documentation |
| labels | json | Label names array, e.g., ["cat","dog"] — used by adapters |
| preprocessing | string | Built-in adapter: image_classification, tabular, text_tokens, or passthrough |
| postprocessing | string | Built-in adapter: top_k_labels, binary_classification, regression, or passthrough |
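The optional text fields in the table above could be checked client-side before the multipart upload is sent. The following is a minimal sketch, assuming that fields typed `json` are sent as JSON-encoded strings; `upload_fields` is a hypothetical helper and it does not handle the `model` file part itself.

```python
import json

# Fields the table types as "json"; assumed to travel as JSON strings.
JSON_FIELDS = {"input_shape", "output_shape", "input_schema",
               "output_schema", "labels"}

# Allowed values copied from the table above.
CHOICES = {
    "format": {"onnx", "torchscript"},
    "preprocessing": {"image_classification", "tabular",
                      "text_tokens", "passthrough"},
    "postprocessing": {"top_k_labels", "binary_classification",
                       "regression", "passthrough"},
}
TEXT_FIELDS = {"name"} | set(CHOICES)


def upload_fields(**fields) -> dict[str, str]:
    """Validate optional upload fields and return them as form strings."""
    form = {}
    for key, value in fields.items():
        if key in JSON_FIELDS:
            form[key] = json.dumps(value)
        elif key in TEXT_FIELDS:
            if key in CHOICES and value not in CHOICES[key]:
                raise ValueError(f"{key} must be one of {sorted(CHOICES[key])}")
            form[key] = value
        else:
            raise ValueError(f"unknown upload field: {key}")
    return form
```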
{
"id": "model_abc12345",
"name": "my-model",
"format": "onnx",
"size": 4096,
"checksum": "sha256:...",
"uploaded_at": "2026-02-08T10:30:00.000Z",
"preprocessing": "image_classification",
"postprocessing": "top_k_labels",
"input_schema": { ... },
"labels": ["cat", "dog"]
}
Returns all models owned by the authenticated user.
{
"models": [
{ "id": "model_abc", "name": "my-model", ... }
],
"total": 1
}
Returns detailed information about a specific model.
Deletes a model. Returns 204 No Content on success.
Update input/output schema, labels, or adapter selections on an existing model without re-uploading. Only provided fields are modified.
| Field | Type | Description |
|---|---|---|
| input_schema | object | JSON Schema for input validation |
| output_schema | object | JSON Schema for output documentation |
| labels | array | Label names |
| preprocessing | string | Built-in preprocessing adapter name (or null) |
| postprocessing | string | Built-in postprocessing adapter name (or null) |
Returns updated model info.
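Because only provided fields are modified, a client needs to distinguish "field omitted" from "field set to null". One possible way to build such a body is sketched below; `model_patch` is a hypothetical helper, and the interpretation that an explicit null clears an adapter is an assumption based on the "(or null)" notes in the table.

```python
import json

# Field names copied from the update table above.
PATCHABLE = {"input_schema", "output_schema", "labels",
             "preprocessing", "postprocessing"}


def model_patch(**fields) -> str:
    """Serialize only the fields the caller passed explicitly.

    A field passed as None is kept and serialized as JSON null;
    a field that is not passed at all is left out of the body.
    """
    unknown = set(fields) - PATCHABLE
    if unknown:
        raise ValueError(f"not updatable: {sorted(unknown)}")
    return json.dumps(fields, sort_keys=True)
```

Calling `model_patch(labels=["cat", "dog"], postprocessing=None)` produces a body with two keys, leaving schemas and the preprocessing adapter untouched.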
Submit inference requests in synchronous or asynchronous mode. Batch inference is also supported.
Submit a single inference request. By default runs synchronously (waits for result). Use async mode for fire-and-forget.
If the model has an input_schema, the input is validated against it (returns 400 on mismatch). If the model has a postprocessing adapter (other than passthrough), the result is the business format produced by the dcmp-postprocess Lambda (e.g. { predictions: [...] }) instead of raw {data, shape}. When preprocessing is set, input is expected in the business format the adapter consumes.
Note: adapter conversion currently applies only to synchronous single inference. Asynchronous calls (async=true) and POST /inference/batch return raw {data, shape} regardless of adapter configuration.
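The rules in the two paragraphs above reduce to a small decision function. This sketch restates them for illustration only; `result_format` and its return labels `"business"` and `"raw"` are hypothetical names, not part of the API.

```python
from typing import Optional


def result_format(postprocessing: Optional[str], *,
                  is_async: bool = False, is_batch: bool = False) -> str:
    """Predict the result shape for a request, per the rules above."""
    # Async and batch calls bypass adapters and return raw tensors.
    if is_async or is_batch:
        return "raw"
    # No adapter, or the passthrough adapter, leaves {data, shape} as-is.
    if postprocessing in (None, "passthrough"):
        return "raw"
    # Synchronous single inference with a real adapter is converted.
    return "business"
```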
| Field | Type | Description |
|---|---|---|
| model_id (required) | string | ID of the model to use |
| input (required) | object | Input tensor: { "data": [...], "shape": [1, 10] } |
| async | boolean | Run asynchronously (default: false) |
| project_id | string | Associate with a project for tracking |
| options.timeout | number | Timeout in milliseconds |
| options.prefer_gpu | boolean | Prefer GPU devices |
Async Mode: Set via query param ?async=true, header X-Async: true, or body field "async": true
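The three ways of requesting async mode can be compared side by side with a request builder. The sketch below uses only the Python standard library and builds the request without sending it; `inference_request` and its `async_via` parameter are hypothetical, and the base URL is the localhost address used in the curl examples of this reference.

```python
import json
from typing import Optional
from urllib.request import Request


def inference_request(base_url: str, api_key: str, body: dict,
                      async_via: Optional[str] = None) -> Request:
    """Build (but do not send) a POST /inference request."""
    url = f"{base_url}/inference"
    headers = {"X-API-Key": api_key, "Content-Type": "application/json"}
    payload = dict(body)  # copy so the caller's dict is not mutated
    if async_via == "query":
        url += "?async=true"
    elif async_via == "header":
        headers["X-Async"] = "true"
    elif async_via == "body":
        payload["async"] = True
    elif async_via is not None:
        raise ValueError("async_via must be 'query', 'header', 'body' or None")
    data = json.dumps(payload).encode("utf-8")
    return Request(url, data=data, headers=headers, method="POST")


if __name__ == "__main__":
    req = inference_request(
        "http://localhost:3000/api/v1", "test-api-key",
        {"model_id": "model_abc123",
         "input": {"data": [0.1] * 10, "shape": [1, 10]}},
        async_via="query",
    )
    print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would send it; that step is omitted here because it needs a running server.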
{
"job_id": "infer_abc123",
"status": "completed",
"result": { "data": [0.8, 0.2], "shape": [1, 2] },
"latency_ms": 45,
"device_id": "dev_xyz789"
}
{
"job_id": "infer_abc123",
"status": "queued",
"message": "Inference job queued",
"poll_url": "/api/v1/inference/infer_abc123"
}
Submit multiple inference requests as a batch. Always runs asynchronously.
Adapter limitation: the batch route does not currently invoke preprocessing / postprocessing adapters. Inputs must be raw {data, shape} tensors and results are returned as raw tensors, even for models that declare adapter fields.
| Field | Type | Description |
|---|---|---|
| model_id (required) | string | ID of the model to use |
| inputs (required) | array | Array of input tensors |
| name | string | Batch job name |
| project_id | string | Project ID (auto-generated if not provided) |
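Since the batch route accepts only raw tensors, it may help to check each input before submitting. The sketch below assumes `data` is a flat list whose length equals the product of `shape`, which matches the examples in this reference but is not stated as a rule; `check_tensor` and `batch_body` are hypothetical helpers.

```python
from math import prod
from typing import Optional


def check_tensor(tensor: dict) -> None:
    """Raise ValueError unless tensor looks like a raw {data, shape} pair."""
    if set(tensor) != {"data", "shape"}:
        raise ValueError("tensor must have exactly 'data' and 'shape'")
    expected = prod(tensor["shape"])
    if len(tensor["data"]) != expected:
        raise ValueError(
            f"shape {tensor['shape']} needs {expected} values, "
            f"got {len(tensor['data'])}"
        )


def batch_body(model_id: str, inputs: list,
               name: Optional[str] = None,
               project_id: Optional[str] = None) -> dict:
    """Build the JSON body for a batch submission."""
    if not inputs:
        raise ValueError("inputs must not be empty")
    for tensor in inputs:
        check_tensor(tensor)
    body = {"model_id": model_id, "inputs": inputs}
    if name is not None:
        body["name"] = name
    if project_id is not None:
        body["project_id"] = project_id
    return body
```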
{
"job_id": "batch_abc123",
"project_id": "batch_1707384600000",
"status": "queued",
"total_tasks": 100
}
Get the status and results of an inference job. For batch jobs, includes progress and partial results.
Cancel a running or queued inference job.
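For asynchronous and batch jobs, a client typically polls the job's status until it reaches a final state. The loop below is a sketch under assumptions: only the `queued` and `completed` status values appear in this reference, so `failed` and `cancelled` are guesses, and `wait_for_job` with its injected `fetch_status` callable is hypothetical. The callable is expected to perform the actual GET against the job's `poll_url`.

```python
import time
from typing import Callable

# "completed" is documented; "failed" and "cancelled" are assumed names.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(fetch_status: Callable[[], dict],
                 interval_s: float = 1.0,
                 max_polls: int = 60,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status until the job is terminal or polls run out."""
    for attempt in range(max_polls):
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if attempt < max_polls - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"job not finished after {max_polls} polls")
```

Injecting `sleep` and `fetch_status` keeps the loop independent of any HTTP client and makes it easy to exercise without a server.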
Start and manage distributed training jobs across multiple devices.
Start a distributed training job. The system automatically distributes batches across available devices and aggregates gradients.
| Field | Type | Description |
|---|---|---|
| model_id (required) | string | ID of the model to train |
| data_config (required) | object | Data configuration (see below) |
| training_config (required) | object | Training configuration (see below) |
| resource_config | object | Resource requirements |
data_config fields:
| Field | Type | Description |
|---|---|---|
| type | string | "stream" (generated data) or "dataset" |
| total_samples | number | Total training samples |
| batch_size | number | Batch size per device |
training_config fields:
| Field | Type | Description |
|---|---|---|
| epochs | number | Number of epochs |
| learning_rate | number | Learning rate (e.g., 0.001) |
| optimizer | string | "sgd" or "adam" |
| target_accuracy | number | Target accuracy for early stopping (e.g., 0.95) |
| sync_mode | string | FedAvg sync mode: "async" (default), "semi_sync", "full_sync" |
| local_steps | number | Batches before weight sync (default: 50) |
| sync_interval_ms | number | Weight pull interval in ms (default: 60000) |
resource_config fields:
| Field | Type | Description |
|---|---|---|
| min_devices | number | Minimum devices required (default: 1) |
| max_devices | number | Maximum devices to use |
| prefer_gpu | boolean | Prefer GPU devices |
| sdk_types | array | Filter by SDK type, e.g. ["python"] |
| min_sdk_version | string | Minimum SDK version, e.g. "1.1.6" |
Devices are allocated in proportion to job weight (total_samples * epochs). If no devices are free, the job queues until the scheduler rebalances and drains a device from an over-allocated job.
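To make the weighting concrete, the sketch below computes job weights and splits a device pool proportionally. The weight formula is the one stated above; the rounding rule (largest remainder) is an assumption chosen for illustration, since this reference does not describe how the scheduler rounds fractional shares. `job_weight` and `proportional_shares` are hypothetical names.

```python
def job_weight(total_samples: int, epochs: int) -> int:
    """Job weight as described above: total_samples * epochs."""
    return total_samples * epochs


def proportional_shares(weights: dict[str, int], devices: int) -> dict[str, int]:
    """Split `devices` across jobs in proportion to their weights.

    Fractional shares are rounded with the largest-remainder method,
    which is an assumption and not documented scheduler behaviour.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("total weight must be positive")
    exact = {job: devices * w / total for job, w in weights.items()}
    shares = {job: int(x) for job, x in exact.items()}
    leftover = devices - sum(shares.values())
    by_remainder = sorted(exact, key=lambda j: exact[j] - shares[j], reverse=True)
    for job in by_remainder[:leftover]:
        shares[job] += 1
    return shares


if __name__ == "__main__":
    weights = {
        "train_a": job_weight(10000, 10),
        "train_b": job_weight(5000, 10),
        "train_c": job_weight(5000, 10),
    }
    print(proportional_shares(weights, 5))
```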
{
"job_id": "train_abc123",
"status": "queued",
"estimated_devices": 3
}
Get detailed status of a training job including progress, loss, and assigned devices.
{
"job_id": "train_abc123",
"status": "running",
"model_id": "model_xyz",
"config": { "epochs": 10, "learning_rate": 0.001 },
"progress": {
"current_epoch": 3,
"total_epochs": 10,
"batches_completed": 150,
"total_batches": 500,
"percent_complete": 30,
"current_loss": 0.4523,
"best_loss": 0.4102
},
"assigned_devices": ["dev_a", "dev_b", "dev_c"]
}
Pause a running training job. Can be resumed later.
Resume a paused training job.
Cancel a training job. Stops all distributed training tasks.
View connected SDK devices and their status.
Returns all connected devices with their status and capabilities.
| Field | Type | Description |
|---|---|---|
| status | string | Filter by status: "idle", "busy", "offline" |
Returns aggregated statistics about connected devices.
Returns detailed information about a specific device including hardware, capabilities, and metrics.
Requests logs from a connected SDK device via WebSocket. Returns the device's recent log entries.
| Field | Type | Description |
|---|---|---|
| lines | number | Number of log lines (default: 100) |
| level | string | Filter by level (default: "all") |
| timeout | number | Timeout in ms (default: 30000) |
Group related inference or training tasks into projects for tracking and analytics.
Create a new project for tracking tasks.
| Field | Type | Description |
|---|---|---|
| name | string | Project name |
| type | string | "realtime", "batch", "training", or "benchmark" |
| model_id | string | Associated model ID |
| total_tasks | number | Expected total tasks (for progress tracking) |
Returns all projects for the authenticated user.
Returns full project details including all metrics.
Returns computed analytics including latency percentiles, throughput data, and instance type comparisons.
Manually mark a realtime project as complete.
Cancel a project and all associated jobs.
Track compute usage for billing purposes. Tracks CPU/GPU time, task counts, and data transfer.
Returns total compute usage for the authenticated customer including GPU/CPU hours, task counts, and estimated cost.
Returns compute usage for a specific project.
Returns hourly usage data for time-series analysis.
| Field | Type | Description |
|---|---|---|
| hours | number | Hours to return (default: 24, max: 72) |
Returns usage data for all customers. Requires X-Admin-Key header.
Training efficiency metrics: time-to-accuracy (TTA), scaling efficiency, and power consumption.
Returns aggregate training efficiency statistics.
| Field | Type | Description |
|---|---|---|
| instance_type | string | "aws", "user", or omit for both |
Returns side-by-side comparison of AWS instances vs user-contributed devices.
Returns chart-ready data. Supported chart types: tta, scaling, power, throughput.
Returns currently active training/inference runs with real-time metrics.
Detailed efficiency metrics for a specific job.
Specialized endpoints for the web dashboard with aggregated data.
Returns device statistics grouped by platform (nodejs, python, windows).
Returns jobs from the last 5 days, sorted by activity.
| Field | Type | Description |
|---|---|---|
| type | string | Filter by type: "inference" or "training" |
Returns up to 50 most recent projects, active first.
Server-Sent Events endpoint for real-time project updates. Pushes stats every 2 seconds.
| Field | Type | Description |
|---|---|---|
| format | string | "json" (default) or "text" (ANSI for CLI) |
| interval | number | Update interval in ms (default: 2000) |
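With `format=json`, a client has to split the event stream into individual events. The parser below follows the standard Server-Sent Events framing (`data:` lines terminated by a blank line, lines starting with `:` treated as comments). It is a sketch: the assumption that each event's data is one JSON document, and the field names in the usage example, are not taken from this reference.

```python
import json


def parse_sse(stream_text: str) -> list[dict]:
    """Parse SSE text into a list of JSON-decoded event payloads."""
    events = []
    data_lines = []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]  # SSE strips one leading space
            data_lines.append(value)
        elif line == "" and data_lines:
            # A blank line ends the event; multi-line data joins with "\n".
            events.append(json.loads("\n".join(data_lines)))
            data_lines = []
        # Comment lines (":...") and other fields are ignored here.
    return events


if __name__ == "__main__":
    sample = 'data: {"completed": 1}\n\n: keepalive\ndata: {"completed": 2}\n\n'
    print(parse_sse(sample))
```

An event that has not yet been terminated by a blank line is held back, which mirrors how SSE clients dispatch events.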
Returns individual task records across all projects.
| Field | Type | Description |
|---|---|---|
| limit | number | Max tasks to return (default: 100) |
| project_id | string | Filter by project |
Returns time-bucketed throughput data for graphing.
| Field | Type | Description |
|---|---|---|
| project_id | string | Filter by project |
| bucket_size | number | Bucket size in seconds (default: 1) |
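The idea of time-bucketed throughput can be illustrated client-side: task completion times are grouped into fixed-width buckets and counted. This is a sketch of the general technique, not the server's implementation; `bucket_throughput` and the output keys `bucket_start`, `tasks`, and `tasks_per_s` are hypothetical.

```python
from collections import Counter


def bucket_throughput(timestamps_s: list[float],
                      bucket_size: int = 1) -> list[dict]:
    """Group completion times (in seconds) into fixed-width buckets."""
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    counts = Counter(
        int(ts // bucket_size) * bucket_size for ts in timestamps_s
    )
    return [
        {"bucket_start": start, "tasks": n, "tasks_per_s": n / bucket_size}
        for start, n in sorted(counts.items())
    ]
```

Buckets with no completed tasks are simply absent from the output, so a chart may need to fill the gaps with zeros.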
Training efficiency overview with recent TTA, scaling, and power data.
Training and inference comparison between AWS and user devices, formatted for dashboard charts.
Current cluster state: device allocation, CPU/GPU usage, and capacity metrics.
Start a quick inference or training benchmark.
| Field | Type | Description |
|---|---|---|
| mode | string | "inference", "training", or "gpu-training" |
| epochs | number | Training epochs (training mode only) |
| batch_size | number | Batch size (training mode only) |
Start a step-load performance benchmark with configurable RPS targets.
| Field | Type | Description |
|---|---|---|
| duration | number | Total duration in seconds (min: 60) |
| instance_type | string | "aws", "user", or "all" |
| rps_steps | array | RPS targets per step (default: [2,4,6,8,10,15,20,30,40,50]) |
| max_concurrency | number | Max concurrent requests (default: 5) |
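A rough plan for a step-load run can be estimated from these parameters. The sketch below assumes the total duration is divided evenly across the RPS steps, which this reference does not confirm; `step_plan` is a hypothetical helper, while the default step list and the 60-second minimum are copied from the table.

```python
from typing import Optional

DEFAULT_RPS_STEPS = [2, 4, 6, 8, 10, 15, 20, 30, 40, 50]
MIN_DURATION_S = 60


def step_plan(duration_s: int,
              rps_steps: Optional[list[int]] = None) -> list[dict]:
    """Estimate seconds and request count per step (even split assumed)."""
    if duration_s < MIN_DURATION_S:
        raise ValueError(f"duration must be at least {MIN_DURATION_S} seconds")
    steps = rps_steps or DEFAULT_RPS_STEPS
    per_step = duration_s / len(steps)
    return [
        {"rps": rps, "seconds": per_step, "requests": round(rps * per_step)}
        for rps in steps
    ]


if __name__ == "__main__":
    plan = step_plan(100)
    print(sum(step["requests"] for step in plan), "requests in total")
```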
Get status and results of a performance benchmark.
{
"error": {
"code": "ERROR_CODE",
"message": "Human-readable error message"
}
}
Upload a model and run inference
curl -X POST http://localhost:3000/api/v1/models \
-H "X-API-Key: test-api-key" \
-F "model=@my_model.onnx" \
-F "name=my-classifier"
Response: {"id": "model_abc123", ...}
curl -X POST http://localhost:3000/api/v1/inference \
-H "X-API-Key: test-api-key" \
-H "Content-Type: application/json" \
-d '{
"model_id": "model_abc123",
"input": {
"data": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
"shape": [1, 10]
}
}'
Response: {"job_id": "infer_xyz", "result": {"data": [0.8, 0.2], "shape": [1, 2]}, "latency_ms": 45}
Create a project and submit batch inference with progress tracking
curl -X POST http://localhost:3000/api/v1/projects \
-H "X-API-Key: test-api-key" \
-H "Content-Type: application/json" \
-d '{
"name": "Image Classification Batch",
"type": "batch",
"model_id": "model_abc123",
"total_tasks": 100
}'
curl -X POST http://localhost:3000/api/v1/inference/batch \
-H "X-API-Key: test-api-key" \
-H "Content-Type: application/json" \
-d '{
"model_id": "model_abc123",
"project_id": "proj_xyz",
"inputs": [
{"data": [...], "shape": [1, 10]},
{"data": [...], "shape": [1, 10]}
]
}'
curl http://localhost:3000/api/v1/inference/batch_123 \
-H "X-API-Key: test-api-key"
curl http://localhost:3000/api/v1/projects/proj_xyz/analytics \
-H "X-API-Key: test-api-key"
Start a distributed training job and monitor progress
curl -X POST http://localhost:3000/api/v1/training \
-H "X-API-Key: test-api-key" \
-H "Content-Type: application/json" \
-d '{
"model_id": "model_abc123",
"data_config": {
"type": "stream",
"total_samples": 10000,
"batch_size": 32
},
"training_config": {
"epochs": 10,
"learning_rate": 0.001,
"optimizer": "adam"
},
"resource_config": {
"min_devices": 2,
"max_devices": 5,
"prefer_gpu": true
}
}'
Response: {"job_id": "train_abc", "status": "queued", "estimated_devices": 3}
curl http://localhost:3000/api/v1/training/train_abc \
-H "X-API-Key: test-api-key"
curl -X POST http://localhost:3000/api/v1/training/train_abc/pause \
-H "X-API-Key: test-api-key"
curl -X POST http://localhost:3000/api/v1/training/train_abc/resume \
-H "X-API-Key: test-api-key"
For continuous inference, associate requests with a project to track metrics, latency percentiles, and throughput
curl -X POST http://localhost:3000/api/v1/projects \
-H "X-API-Key: test-api-key" \
-H "Content-Type: application/json" \
-d '{"name": "Production API", "type": "realtime", "model_id": "model_abc"}'
curl -X POST http://localhost:3000/api/v1/inference \
-H "X-API-Key: test-api-key" \
-H "Content-Type: application/json" \
-d '{
"model_id": "model_abc",
"project_id": "proj_production",
"input": {"data": [...], "shape": [1, 10]}
}'
curl http://localhost:3000/api/v1/projects/proj_production/analytics \
-H "X-API-Key: test-api-key"
curl -X POST http://localhost:3000/api/v1/projects/proj_production/complete \
-H "X-API-Key: test-api-key"
Distributed Compute Platform API v1.1.33