From zero to production-ready in eight steps. Companion: API Reference, PHP Recipes.
Before wiring anything into Laravel, confirm the three pieces of connection data are in your hands and the service is actually reachable from where your app runs. The only external call in this section is a no-auth liveness probe, so you can run it from any shell that can reach the backend host.
| You need | Where it comes from |
|---|---|
| BASE_URL | Crystallise ops: the host where the service is deployed (e.g. https://api.example.com). |
| X-API-Key | One of the keys in the server's CRYSTALLISE_API_KEYS env var. Rotated per tenant. |
| X-OpenAI-API-Key (optional) | Your own OpenAI key if you want per-tenant billing. Otherwise the server's env fallback is used. |
Any language works for the examples in this playbook — curl is the baseline. The parallel Recipes document assumes Laravel 10+ on PHP 8.2+ for the code snippets you'll paste into a real app later.
export BASE_URL=https://api.example.com
export API_KEY=your-service-key
export OPENAI_KEY=sk-...   # only needed for live calls

curl -s "$BASE_URL/health"
# → {"status": "ok"}

curl -s "$BASE_URL/health/ready"
# → {"status": "ready", "checks": {"database": "ok", "openai_key": "configured"}}
/health is a pure liveness probe — it returns 200 even when OpenAI is unreachable. /health/ready is the one to wire your monitoring against: it flips to 503 with a degraded status if the DB or OpenAI key check fails. Both are public (no X-API-Key needed).
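As a rough illustration of how a monitor might interpret the readiness probe, the Python sketch below takes the HTTP status and raw body of a /health/ready response and returns the names of the checks that look unhealthy. The function name readiness_failures is hypothetical, and the set of passing values ("ok", "configured") is an assumption lifted from the sample response above, not a documented contract.

```python
import json

# Values the sample response uses for passing checks; assumed, not a documented contract.
HEALTHY_VALUES = {"ok", "configured"}


def readiness_failures(status_code, body):
    """Return the names of failing checks, or [] when /health/ready answered 200."""
    if status_code == 200:
        return []
    try:
        checks = json.loads(body)["checks"]
        failed = [name for name, value in checks.items() if value not in HEALTHY_VALUES]
    except (ValueError, KeyError, TypeError, AttributeError):
        # Not the expected JSON (for example a proxy error page): blame the probe itself.
        return ["probe"]
    # A non-200 with every check passing is still a failure worth alerting on.
    return failed or ["probe"]
```

Feeding the result into your alert text tells the on-call person which dependency to look at first instead of just "readiness failed".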
- /health returns 200 with {"status": "ok"} from the machine your app will run on.
- You have an X-API-Key and can paste it into $API_KEY without copy-paste errors.
- You have decided whether to send X-OpenAI-API-Key per request (per-tenant billing) or rely on the server's env var fallback.

Mock mode is the fastest way to confirm the request/response shape and your polling loop without spending a cent on OpenAI. Every mutating endpoint accepts "mock": true; the service then returns deterministic canned data and makes no OpenAI call, so no X-OpenAI-API-Key is required. Submit, then poll until status flips from "pending" to "completed".
# Submit: minimal screening job with 2 papers, 1 criterion, mock mode
curl -sX POST "$BASE_URL/v1/screening/jobs" \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "mock": true,
    "papers": [
      {"id": "p1", "title": "RCT of drug X in adults", "abstract": "Randomized trial of 150 adults..."},
      {"id": "p2", "title": "Case report on rare syndrome", "abstract": "A single patient with..."}
    ],
    "criteria": [{"name": "Population", "type": "include", "value": "Adults"}]
  }'
# → {"job_id": "abc-123", "status": "pending", "progress": 0.0, "stage": ""}
# Poll: run every 1-2 seconds until status is "completed" (or "failed")
curl -s "$BASE_URL/v1/screening/jobs/abc-123" -H "X-API-Key: $API_KEY"
# → {"job_id": "abc-123", "status": "completed", "progress": 1.0, "stage": "assignment",
#    "results": [{"id": "p1", "final_score": 4.2, "cluster_id": 1, "reasoning": "..."},
#                {"id": "p2", "final_score": 2.1, "cluster_id": 2, "reasoning": "..."}],
#    "clusters": [...], "duration_ms": 3421, "estimated_cost_usd": 0.0}
The POST returns a job_id immediately with status: "pending" — work happens asynchronously across four pipeline stages (labelling → reasoning → clustering → assignment). The GET response is the same shape regardless of current status; only results and clusters populate once status becomes "completed". In mock mode estimated_cost_usd is 0.0 and scores are deterministic canned values.
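A minimal polling loop for this flow might look like the following Python sketch. The fetch callable is a hypothetical stand-in for whatever HTTP client performs GET /v1/screening/jobs/{job_id} and decodes the JSON; the 1.5-second interval and the poll budget are assumptions chosen to fit the 1-2 second guidance, and "failed" is treated as terminal alongside "completed".

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}


def poll_job(fetch, job_id, interval_s=1.5, max_polls=200, sleep=time.sleep):
    """Poll until the job reaches a terminal status or the poll budget runs out.

    fetch(job_id) is a hypothetical callable returning the decoded job JSON.
    sleep is injectable so tests can run without real delays.
    """
    for attempt in range(max_polls):
        job = fetch(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")
```

Bounding the loop with max_polls keeps a stuck job from pinning a worker forever; the caller decides what a timeout means for the user.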
- The POST returns a job_id and an initial status: "pending".
- Polling reaches status: "completed" within a few seconds.
- results[] has exactly 2 entries, one per paper you submitted.

Same request, one header added, one flag flipped. A live call actually invokes OpenAI and costs up to a few cents per paper depending on model and repetitions. Scores now reflect the real content of each paper's title and abstract rather than canned data, so expect them to vary between papers in a meaningful way.
curl -sX POST "$BASE_URL/v1/screening/jobs" \
  -H "X-API-Key: $API_KEY" \
  -H "X-OpenAI-API-Key: $OPENAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "mock": false,
    "papers": [
      {"id": "p1", "title": "RCT of drug X in adults", "abstract": "Randomized trial of 150 adults..."},
      {"id": "p2", "title": "Case report on rare syndrome", "abstract": "A single patient with..."}
    ],
    "criteria": [{"name": "Population", "type": "include", "value": "Adults"}]
  }'
# → {"job_id": "def-456", "status": "pending", "progress": 0.0, "stage": ""}

curl -s "$BASE_URL/v1/screening/jobs/def-456" -H "X-API-Key: $API_KEY"
# → {"status": "completed", "progress": 1.0, "stage": "assignment",
#    "results": [{"id": "p1", "final_score": 4.6, "cluster_id": 1, "reasoning": "Clear RCT..."},
#                {"id": "p2", "final_score": 1.8, "cluster_id": 2, "reasoning": "Case report..."}],
#    "duration_ms": 12840, "estimated_cost_usd": 0.018}
final_score is a mean on a 1–5 scale across the AI calls made per paper (the repetitions setting, 5 by default; see backend-guide.html for how the pipeline computes it). Two differently-eligible papers should land at noticeably different scores; if both come back at the same value, double-check that mock really flipped to false. estimated_cost_usd will be a small non-zero number. This is the service's own pricing estimate, not the invoice OpenAI sends you.
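The two sanity checks in the paragraph above can be automated. The Python sketch below is a heuristic only: mock_warning_signs is a hypothetical helper name, and it assumes the completed job JSON has the results and estimated_cost_usd fields shown in the sample responses.

```python
def mock_warning_signs(job):
    """List the reasons a supposedly live job looks like it ran in mock mode."""
    signs = []
    scores = [result["final_score"] for result in job.get("results", [])]
    # Identical scores across several different papers suggest canned data.
    if len(scores) > 1 and len(set(scores)) == 1:
        signs.append("all final_score values are identical")
    # A live call should report a small non-zero cost estimate.
    if not job.get("estimated_cost_usd"):
        signs.append("estimated_cost_usd is zero or missing")
    return signs
```

An empty list means neither warning sign fired; it does not prove the job was live.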
- The job reaches status: "completed" with final_score values that vary per paper (not deterministic).
- estimated_cost_usd is non-zero and in a plausible range (fractions of a cent per paper for gpt-5-nano).

The service classifies every error into one of five error_code buckets so client code can decide, per category, whether to retry, surface the message to the user, or abort. For synchronous endpoints the code appears in the HTTP body under detail.error_code; for async screening and indexer jobs it appears inside the job response (HTTP 200) as error_category when status === "failed". The two fields share the same taxonomy.
# Failed async job: poll response for a job whose OpenAI key was invalid
curl -s "$BASE_URL/v1/screening/jobs/ghi-789" -H "X-API-Key: $API_KEY"
# → {"job_id": "ghi-789", "status": "failed",
#    "error": "Invalid OpenAI key",
#    "error_category": "auth",
#    "error_retryable": false}
| error_code | Retryable | Recommended action |
|---|---|---|
| auth | no | Surface to user / ops: the X-API-Key or X-OpenAI-API-Key is bad. Do not retry. |
| rate_limit | yes | Retry with exponential backoff (e.g. 2s, 4s, 8s, 16s, then give up). |
| validation | no | Surface to the caller: the request body is malformed. Fix and resubmit, don't retry blindly. |
| server_restart | yes | Async-only; the service restarted mid-job. Resubmit the entire job. |
| internal / unknown | once | Retry once with a delay, then surface. Usually a transient 500. |
Every branch of your integration that calls this service should pattern-match on error_code / error_category and route accordingly. Blind "retry everything three times" is the wrong default — hammering auth or validation failures wastes requests and delays the real fix.
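One way to encode that routing is a small lookup keyed on the category, sketched below in Python. The action names ("surface", "backoff", "resubmit", "retry_once") and the helper names are illustrative labels of my own, not values the service returns; the two response shapes and the 2s, 4s, 8s, 16s schedule come from the text and table above.

```python
# Mirrors the table above: error category -> client action (labels are illustrative).
ACTIONS = {
    "auth": "surface",
    "validation": "surface",
    "rate_limit": "backoff",
    "server_restart": "resubmit",
    "internal": "retry_once",
    "unknown": "retry_once",
}


def error_category(body):
    """Read the category from either error shape."""
    detail = body.get("detail")
    if isinstance(detail, dict) and detail.get("error_code"):
        return detail["error_code"]  # synchronous endpoint: HTTP error body
    return body.get("error_category") or "unknown"  # async job with status "failed"


def action_for(body):
    """Map an error response to an action; unrecognised categories retry once."""
    return ACTIONS.get(error_category(body), "retry_once")


def backoff_delays(base_s=2.0, attempts=4):
    """Exponential schedule in seconds; the defaults give 2, 4, 8, 16."""
    return [base_s * 2 ** i for i in range(attempts)]
```

Keeping the mapping in one table means a new category added by the backend shows up as a single-line change rather than a hunt through scattered if-statements.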
- Retryable errors (rate_limit, server_restart) back off exponentially rather than retrying in a tight loop.
- Non-retryable errors (auth, validation) surface immediately to the user or ops channel.

Every Crystallise job should carry a project_id, an opaque integer from your side that correlates the Crystallise job back to the row in your app's database. The service uses this for two things: a one-active-job-per-project lock (returns 409 Conflict on concurrent submit), and the GET /v1/screening/active-job?project_id=X lookup so you can find an in-flight job after a client restart. Retention is your responsibility: pull the result JSON and store it before the service forgets.
# Second submit while a job is already running for project_id=42
curl -sX POST "$BASE_URL/v1/screening/jobs" \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"project_id": 42, "mock": true, "papers": [...], "criteria": [...]}'
# → HTTP 409
# → {"detail": "Project 42 already has an active screening job"}

# Recovery: find the existing job and decide whether to wait or cancel
curl -s "$BASE_URL/v1/screening/active-job?project_id=42" -H "X-API-Key: $API_KEY"
# → {"job_id": "abc-123", "status": "running", ...} (or null if none)
On 409, the choice is yours: poll the existing job to completion and use its result, cancel and resubmit, or queue your new request in your own app and fire it when the existing job finishes. The server does not queue for you. The service's in-memory job store is ephemeral — a restart loses running jobs (you'll see error_category: "server_restart" on next poll) and old completed jobs are garbage-collected over time. Persist the results payload on your side the moment a job completes.
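Of the three options on a 409, "poll the existing job" is the simplest to automate. The Python sketch below shows one possible shape for it. The submit and find_active callables are hypothetical wrappers around POST /v1/screening/jobs and GET /v1/screening/active-job; the single retry when the lock has already been released is an assumption about what you would want, not documented service behaviour.

```python
def submit_or_attach(submit, find_active, payload):
    """Return the job_id to poll: a new job, or the one already holding the lock.

    submit(payload) -> (http_status, decoded_body)      hypothetical wrapper
    find_active(project_id) -> decoded job or None      hypothetical wrapper
    """
    status, body = submit(payload)
    if status == 409:
        active = find_active(payload["project_id"])
        if active is not None:
            return active["job_id"]
        # The earlier job finished between the two calls: try the submit once more.
        status, body = submit(payload)
    if not 200 <= status < 300:
        raise RuntimeError(f"submit failed with HTTP {status}: {body}")
    return body["job_id"]
```

Whichever job_id comes back, persist the results under your own project_id as soon as polling reports completion.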
- Every job you submit carries a project_id from your app's schema.
- Your app persists the results JSON in its own database as soon as a job reaches status: "completed".

Three levers control spend: a preflight cost estimate, a hard ceiling carried in the POST, and per-tenant billing via the OpenAI key header. Use all three on any production path. The numbers the service reports are approximate: the pricing table is hardcoded in crystallise.llm.cost.DEFAULT_PRICING_PER_1M and can drift from OpenAI's current rates by single-digit percent.
# Preflight estimate: no papers submitted, just sizing
curl -sX POST "$BASE_URL/v1/screening/estimate" \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5-nano", "papers_count": 500, "repetitions": 5, "criteria_count": 10}'
# → {"estimated_input_tokens": 775000, "estimated_output_tokens": 75000,
#    "estimated_cost_usd": 0.0625, "confidence": "approximate",
#    "disclaimer": "Estimate based on empirical averages. Actual cost may vary +-30%..."}

# Submit with a hard cap: service rejects with 400 if the live estimate exceeds it
curl -sX POST "$BASE_URL/v1/screening/jobs" \
  -H "X-API-Key: $API_KEY" \
  -H "X-OpenAI-API-Key: $OPENAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"papers": [...], "criteria": [...], "max_estimated_cost_usd": 0.50}'
# → 400: {"detail": "Estimated cost $1.23 exceeds max_estimated_cost_usd=$0.50"}
Run the estimate first for any job whose size isn't bounded by hand, then pass a max_estimated_cost_usd that gives you some headroom over the estimate but aborts on pathological inputs (a 50,000-paper submission by accident, say). For user-triggered jobs, forward the tenant's OpenAI key as X-OpenAI-API-Key so the charge lands on their invoice rather than yours.
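Sizing the cap is simple arithmetic; a sketch in Python follows. The 1.5x headroom and the one-cent floor are assumptions of mine (the headroom is picked to sit above the "may vary +-30%" wording in the estimate's disclaimer), and cost_cap is a hypothetical helper name.

```python
def cost_cap(estimate_usd, headroom=1.5, floor_usd=0.01):
    """Turn a preflight estimated_cost_usd into a max_estimated_cost_usd value.

    headroom: multiplier over the estimate (assumed 1.5, above the quoted 30% variance).
    floor_usd: minimum cap so that tiny jobs are not rejected over rounding noise.
    """
    if estimate_usd < 0:
        raise ValueError("estimate must be non-negative")
    if headroom < 1:
        raise ValueError("headroom below 1 would reject the job you just estimated")
    return max(estimate_usd * headroom, floor_usd)


# The 500-paper estimate from the example above (0.0625 USD) yields a cap of 0.09375 USD.
cap = cost_cap(0.0625)
```

A cap this tight still lets the estimated job through while rejecting a submission that is accidentally a hundred times larger.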
Treat estimated_cost_usd, both pre- and post-job, as a planning figure, not a billing line. Reconcile monthly against the OpenAI invoice. If the drift is more than a few percent, flag it to the backend team so the pricing table can be refreshed.

- Every production POST carries a max_estimated_cost_usd cap sized against the preflight estimate.
- User-triggered jobs send X-OpenAI-API-Key per request so the charge flows to the right tenant.
- You reconcile estimated_cost_usd across jobs against OpenAI's invoice at least monthly.

Run everything in mock: true by default in CI. Mock mode exercises the full request/response/polling contract without an OpenAI call, so it's free, fast, and deterministic. Save real OpenAI traffic for a single nightly smoke test against a canary, not for every PR.
# PR test: mock only, no OPENAI_KEY set
curl -sX POST "$BASE_URL/v1/screening/jobs" \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"mock": true, "papers": [{"id": "p1", "title": "t", "abstract": "a"}], "criteria": []}'
# → {"job_id": "...", "status": "pending"}
# Then assert structure, not content:
# - response has job_id, status, progress, stage fields
# - status transitions pending → completed within N polls
# - results[] length equals papers[] length
# - each result has id, final_score, cluster_id, reasoning
LLM outputs are non-deterministic — asserting on a specific score or phrasing will give you a flaky test suite. Assert the shape of the response and the transitions of the state machine, not the content. The nightly live smoke test runs one tiny real job against the canary environment with a tight max_estimated_cost_usd cap and alerts on failure; this catches environmental breakage (bad key, network, cost-table drift) without burning money per PR.
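The structural assertions listed in the PR test comments can be collected into one helper, sketched here in Python. shape_problems is a hypothetical name; the field lists are taken from the sample responses in this playbook, so treat them as assumptions to check against the API Reference.

```python
JOB_FIELDS = {"job_id", "status", "progress", "stage"}
RESULT_FIELDS = {"id", "final_score", "cluster_id", "reasoning"}


def shape_problems(job, papers):
    """Return a list of structural problems; an empty list means the shape is fine.

    Checks field presence and counts only, never score values or reasoning text.
    """
    problems = []
    missing = JOB_FIELDS - set(job)
    if missing:
        problems.append("job is missing " + ", ".join(sorted(missing)))
    if job.get("status") == "completed":
        results = job.get("results") or []
        if len(results) != len(papers):
            problems.append(f"expected {len(papers)} results, got {len(results)}")
        for result in results:
            gone = RESULT_FIELDS - set(result)
            if gone:
                problems.append(f"result {result.get('id')} is missing " + ", ".join(sorted(gone)))
    return problems
```

In a test, assert that the returned list is empty; on failure the list itself is the diagnostic message.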
- Every PR test runs with mock: true.

Before you point the first production user at the integration, walk this list end-to-end. Each item is a dependency for the next, so skipping one turns the others into theatre. Monitoring and the kill-switch come last because they're useless without the first six items in place.
- CRYSTALLISE_API_KEYS: confirm the key in $API_KEY is the production key, not the dev-shared one. Use per-tenant keys if you have multiple customers on one deployment.
- max_estimated_cost_usd on all outbound POSTs. No unbounded jobs.
- auth, rate_limit, validation, server_restart, internal each land somewhere explicit in your code (retry, surface, abort).
- /health/ready: wire a probe that hits this every 30–60s and alerts on 503 or on the degraded body.
- Exponential backoff on rate_limit, with a cap on total wait. Don't retry forever.
- Alert on error_category: server_restart spikes. One is normal; a cluster in 10 minutes means the backend is flapping.
- Persist results to your own database before the job ages out.

# Monitoring hook: the probe your uptime checker should hit
# (-i prints the status line and headers along with the JSON body)
curl -si "$BASE_URL/health/ready"
# → HTTP/1.1 200 OK  (healthy)
# → HTTP/1.1 503     (degraded; inspect the JSON body for which check failed)
Tick every item before pointing a production user at the integration.
- Monitoring of /health/ready status is wired up and reviewed.
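The server_restart guidance in the checklist ("one is normal, a cluster in 10 minutes means the backend is flapping") can be turned into an alert rule along these lines. This Python sketch assumes you record a Unix timestamp each time a job fails with error_category "server_restart"; the threshold of 3 is my own reading of "a cluster", and restart_spike is a hypothetical name.

```python
def restart_spike(timestamps, window_s=600.0, threshold=3):
    """True when `threshold` or more server_restart failures fall within `window_s` seconds.

    timestamps: Unix times (seconds) at which server_restart failures were observed.
    """
    ordered = sorted(timestamps)
    # Slide over every run of `threshold` consecutive events and measure its span.
    for i in range(len(ordered) - threshold + 1):
        if ordered[i + threshold - 1] - ordered[i] <= window_s:
            return True
    return False
```

Run it over the recent failure log each time a new server_restart arrives, and page only when it returns True.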