# fp16.cloud — Agent API Guide

> **Audience:** AI agents, coding assistants, and automation scripts integrating with fp16.cloud.  
> **Human-readable reference:** https://fp16.cloud/docs  
> **API base:** `https://api.fp16.cloud`  
> **Site:** `https://fp16.cloud`

This document is the canonical machine-oriented overview. Prefer `/v1/*` routes with an API key. Do not scrape the React SPA for API details.

---

## Authentication

| Context | How to authenticate |
|--------|---------------------|
| **Programmatic (`/v1/*`)** | `Authorization: Bearer fp16_…` or `X-API-Key: fp16_…` on every request except health |
| **Browser app (`/api/*`)** | Session cookie + Cloudflare Turnstile on POST; no API key required while signed in |

To get an API key, sign up at https://fp16.cloud/signup, then open https://fp16.cloud/account and create a key. The key is shown only once, so store it when it is created.

```
Authorization: Bearer fp16_YOUR_API_KEY
```
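As a minimal sketch of the two header styles in the table above, the following helper builds (but does not send) an authenticated request. The function name `build_request` and the client-side `fp16_` prefix check are illustrative assumptions, not part of the API.

```python
import urllib.request

API_BASE = "https://api.fp16.cloud"


def build_request(path: str, api_key: str, use_x_api_key: bool = False) -> urllib.request.Request:
    """Build an authenticated request for a /v1 route without sending it.

    Hypothetical helper: the prefix check is a local sanity check only.
    """
    if not api_key.startswith("fp16_"):
        raise ValueError("expected an API key starting with 'fp16_'")
    if use_x_api_key:
        headers = {"X-API-Key": api_key}
    else:
        headers = {"Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(API_BASE + path, headers=headers)
```

Pass the returned object to `urllib.request.urlopen` to perform the call.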

---

## Health

```
GET https://api.fp16.cloud/v1/health
```

No auth. Returns `{ "status": "ok", "service": "api.fp16.cloud", "version": "v1" }`.

---

## Tokenizer Compare

Compare token counts across Hugging Face tokenizers.

| Method | Path | Body / notes |
|--------|------|--------------|
| POST | `/v1/tokenizer/compare` | `{ "model": "Qwen/Qwen3-8B", "text": "…" }` |
| GET | `/v1/tokenizer/references` | No body. Community models ranked by usage |
| GET | `/v1/tokenizer/cache` | No body. Models cached on server |

- `model`: HF repo ID or huggingface.co URL (max 256 chars)
- `text`: up to 100,000 chars
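The field limits above can be checked client-side before sending. This is a sketch only: `tokenizer_compare_body` is a hypothetical helper, and it assumes the server counts characters the way Python's `len` does (Unicode code points), which the source does not specify.

```python
import json

MAX_MODEL_CHARS = 256
MAX_TEXT_CHARS = 100_000


def tokenizer_compare_body(model: str, text: str) -> bytes:
    """Validate the documented limits and return the JSON request body."""
    if not model or len(model) > MAX_MODEL_CHARS:
        raise ValueError(f"model must be 1 to {MAX_MODEL_CHARS} chars")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text must be at most {MAX_TEXT_CHARS} chars")
    return json.dumps({"model": model, "text": text}).encode("utf-8")
```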

---

## Embeddings

Fixed model: **Qwen/Qwen3-Embedding-0.6B** (1024 dimensions, L2-normalized).

| Method | Path | Body / notes |
|--------|------|--------------|
| GET | `/v1/embeddings/info` | No body. Worker status |
| POST | `/v1/embeddings` | `{ "texts": ["…", "…"] }` with 1–50 strings, max 8192 chars each |
| POST | `/v1/embeddings/similarity` | `{ "textA": "…", "textB": "…" }` → `{ "similarity": 0.0–1.0 }` |
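Because each request accepts at most 50 strings, longer inputs have to be split across requests. The sketch below shows one way to do that, plus a local similarity computation that relies on the vectors being L2-normalized (so cosine similarity reduces to a dot product). Both function names are hypothetical.

```python
import json
from typing import Iterator, Sequence

MAX_TEXTS_PER_REQUEST = 50
MAX_CHARS_PER_TEXT = 8192


def embedding_batches(texts: Sequence[str]) -> Iterator[bytes]:
    """Yield JSON bodies for POST /v1/embeddings, at most 50 texts each."""
    for text in texts:
        if len(text) > MAX_CHARS_PER_TEXT:
            raise ValueError("each text must be at most 8192 chars")
    for start in range(0, len(texts), MAX_TEXTS_PER_REQUEST):
        chunk = list(texts[start:start + MAX_TEXTS_PER_REQUEST])
        yield json.dumps({"texts": chunk}).encode("utf-8")


def cosine_from_normalized(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))
```

Computing similarity locally from cached embeddings avoids a round trip to `/v1/embeddings/similarity` for each pair.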

---

## YOLO (Computer Vision)

Ultralytics YOLO — detect, segment, classify, pose, obb.

| Method | Path | Notes |
|--------|------|-------|
| GET | `/v1/yolo/catalog?task=detect` | List tasks and model IDs |
| POST | `/v1/yolo/infer` | `multipart/form-data` |

**Infer fields:**

- `image` (file, required) — JPEG/PNG/WebP/GIF, max 10 MB
- `task` (required) — `detect` \| `segment` \| `classify` \| `pose` \| `obb`
- `models` (required) — JSON array of catalog IDs, max 10, e.g. `["yolo26-n-detect"]`
- `stream` (optional) — `true` → NDJSON stream (`{"type":"result",…}` per line)
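A rough sketch of preparing the non-file form fields and reading a streamed response follows. The helper names are hypothetical, and the NDJSON reader assumes one complete JSON object per line, as the `stream` description suggests.

```python
import json
from typing import Dict, Iterable, Iterator, List

TASKS = {"detect", "segment", "classify", "pose", "obb"}
MAX_MODELS = 10


def yolo_form_fields(task: str, models: List[str], stream: bool = False) -> Dict[str, str]:
    """Return the text fields for /v1/yolo/infer (the image is attached separately)."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {sorted(TASKS)}")
    if not 1 <= len(models) <= MAX_MODELS:
        raise ValueError("models must contain 1 to 10 catalog IDs")
    fields = {"task": task, "models": json.dumps(models)}
    if stream:
        fields["stream"] = "true"
    return fields


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Parse an NDJSON stream line by line, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```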

---

## Diffusion Timelapse

Model: **Sana 600M** (`Efficient-Large-Model/Sana_600M_1024px_diffusers`). Output 1024×1024.

| Method | Path | Notes |
|--------|------|-------|
| GET | `/v1/diffusion/info` | steps, queue stats, worker status |
| GET | `/v1/diffusion/jobs/{jobId}/frames/{step}/hq?token=…` | full-quality JPEG for one step |
| WS | `wss://api.fp16.cloud/v1/diffusion/ws` | timelapse generation |

**WebSocket flow**

1. Connect with `Authorization: Bearer fp16_…` on the upgrade.
2. Send: `{ "type": "generate", "prompt": "…", "steps": 20, "seed": 42 }`
   - `steps`: 1–50 (default 20)
   - `seed`: optional integer
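3. Events (JSON objects; entries listed without a value show the field name only, and a trailing `?` marks an optional field):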
   - `{ "type": "queue", "position": 1, "total": 3 }` — waiting for GPU
   - `{ "type": "meta", "jobId", "frameToken", "steps", "seed?" }` — job started
   - `{ "type": "frame", "step", "totalSteps", "previewBase64" }` — low-res JPEG (~256px) per denoising step
   - `{ "type": "done", "inferenceMs", "seed", "totalSteps" }`
   - `{ "type": "error", "message" }`
4. Fetch HQ frame: `GET /v1/diffusion/jobs/{jobId}/frames/{step}/hq?token={frameToken}`
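The flow above can be sketched as a small event handler that is independent of any particular WebSocket library. Everything here is illustrative: `JobState`, `generate_message`, `apply_event`, and `hq_frame_url` are hypothetical names, and the transport (connecting, sending, receiving) is left to the caller.

```python
import json
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote, urlencode

API_BASE = "https://api.fp16.cloud"


@dataclass
class JobState:
    job_id: Optional[str] = None
    frame_token: Optional[str] = None
    queue_position: Optional[int] = None
    frames_seen: int = 0
    done: bool = False
    error: Optional[str] = None


def generate_message(prompt: str, steps: int = 20, seed: Optional[int] = None) -> str:
    """Build the JSON text for the 'generate' message (step 2)."""
    if not 1 <= steps <= 50:
        raise ValueError("steps must be between 1 and 50")
    message = {"type": "generate", "prompt": prompt, "steps": steps}
    if seed is not None:
        message["seed"] = seed
    return json.dumps(message)


def apply_event(state: JobState, event: dict) -> JobState:
    """Update the job state from one decoded server event (step 3)."""
    kind = event.get("type")
    if kind == "queue":
        state.queue_position = event["position"]
    elif kind == "meta":
        state.job_id = event["jobId"]
        state.frame_token = event["frameToken"]
    elif kind == "frame":
        state.frames_seen += 1
    elif kind == "done":
        state.done = True
    elif kind == "error":
        state.error = event.get("message")
    return state


def hq_frame_url(state: JobState, step: int) -> str:
    """Build the HQ frame URL (step 4) once a 'meta' event has arrived."""
    if state.job_id is None or state.frame_token is None:
        raise ValueError("no 'meta' event received yet")
    path = f"/v1/diffusion/jobs/{quote(state.job_id, safe='')}/frames/{step}/hq"
    return f"{API_BASE}{path}?{urlencode({'token': state.frame_token})}"
```

In use, each text message received from the socket would be decoded with `json.loads` and passed to `apply_event` until `state.done` is set or `state.error` is populated.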

**Limits:** 4 generations/min, 20/hour, 200/day per key. One global GPU job at a time; excess requests queue FIFO.

---

## Rate limits (summary)

| Service | Typical limit |
|---------|----------------|
| Diffusion generate | 4/min, 20/hour, 200/day per key |
| Diffusion HQ fetch | 60/min per job |
| `/v1` per key | 120 req/min |
| Tokenizer compare | 20/min |
| YOLO infer | 16/min |
| Embeddings | 20/min |

When a limit is exceeded, the API returns HTTP 429 with retry guidance. The diffusion WebSocket may instead close with code 4429.
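One way to stay under these limits is to throttle on the client. The sketch below is a sliding-window limiter with an injectable clock value. It is an assumption that the server counts requests in a comparable way, so treat it as a courtesy throttle and still handle 429 responses.

```python
from collections import deque
from typing import Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any `window_s`-second window."""

    def __init__(self, max_calls: int, window_s: float) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self._stamps: Deque[float] = deque()

    def wait_time(self, now: float) -> float:
        """Return how many seconds to wait before a call is allowed at `now`."""
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        return self.window_s - (now - self._stamps[0])

    def record(self, now: float) -> None:
        """Record that a call was made at time `now`."""
        self._stamps.append(now)
```

For example, `SlidingWindowLimiter(4, 60.0)` mirrors the 4/min diffusion limit; pass `time.monotonic()` as `now` and sleep for the returned duration before sending.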

---

## Errors

| Code | Meaning |
|------|---------|
| 400 | Invalid request body or parameters |
| 401 | Missing or invalid API key |
| 403 | Revoked key, Turnstile failure (browser), or unauthorized resource |
| 404 | Unknown route or expired diffusion job |
| 422 | Tokenizer/model could not be loaded |
| 429 | Rate limited |
| 503 | GPU worker unavailable |

JSON errors often look like: `{ "error": "…", "message": "…" }`.
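Since the error body shape is only typical, a client should parse it defensively. The following sketch does that and flags which statuses are worth retrying. Treating 429 and 503 as retryable is a judgment call made here, not something the API specifies, and `describe_error` is a hypothetical helper.

```python
import json
from typing import Any, Dict

# Assumption: only rate limiting and GPU unavailability are transient.
RETRYABLE_STATUSES = {429, 503}


def describe_error(status: int, body: bytes) -> Dict[str, Any]:
    """Summarize an error response, tolerating non-JSON or unexpected bodies."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except ValueError:  # covers invalid UTF-8 and invalid JSON
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return {
        "status": status,
        "error": payload.get("error"),
        "message": payload.get("message"),
        "retryable": status in RETRYABLE_STATUSES,
    }
```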

---

## Quick examples

**Embeddings**

```bash
curl -s https://api.fp16.cloud/v1/embeddings \
  -H "Authorization: Bearer fp16_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"texts":["Hello world"]}'
```

**YOLO detect**

```bash
curl -s https://api.fp16.cloud/v1/yolo/infer \
  -H "Authorization: Bearer fp16_YOUR_API_KEY" \
  -F "image=@photo.jpg" \
  -F "task=detect" \
  -F 'models=["yolo26-n-detect"]'
```

**Diffusion info**

```bash
curl -s https://api.fp16.cloud/v1/diffusion/info \
  -H "Authorization: Bearer fp16_YOUR_API_KEY"
```

---

## Version

API version: **1.0.0**  
Last updated: 2026-05-25

For interactive examples and full response schemas, see https://fp16.cloud/docs.