Crystallise AI Backend — Integration Playbook

From zero to production-ready in eight steps. Companion: API Reference, PHP Recipes.

Contents

1. Before you start
2. Send your first mock request
3. Switch to a live call
4. Handle errors properly
5. Integrate with your app's job model
6. Cost management
7. Testing in CI
8. Go-live checklist

1. Before you start

Before wiring anything into Laravel, confirm the three pieces of connection data are in your hands and the service is actually reachable from where your app runs. The only external call in this section is a no-auth liveness probe, so you can run it from any shell that can reach the backend host.

You need	Where it comes from
`BASE_URL`	Crystallise ops — the host where the service is deployed (e.g. `https://api.example.com`).
`X-API-Key`	One of the keys in the server's `CRYSTALLISE_API_KEYS` env var. Rotated per tenant.
`X-OpenAI-API-Key` (optional)	Your own OpenAI key if you want per-tenant billing. Otherwise the server's env fallback is used.

Any language works for the examples in this playbook; curl is the baseline. The companion PHP Recipes document assumes Laravel 10+ on PHP 8.2+ for the code snippets you'll paste into a real app later.

export BASE_URL=https://api.example.com
export API_KEY=your-service-key
export OPENAI_KEY=sk-...         # only needed for live calls

curl -s "$BASE_URL/health"
# → {"status": "ok"}

curl -s "$BASE_URL/health/ready"
# → {"status": "ready", "checks": {"database": "ok", "openai_key": "configured"}}

/health is a pure liveness probe — it returns 200 even when OpenAI is unreachable. /health/ready is the one to wire your monitoring against: it flips to 503 with a degraded status if the DB or OpenAI key check fails. Both are public (no X-API-Key needed).
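
For monitoring scripts it can help to turn the status code into a named state. The following is a minimal sketch, not part of the service: the function names are hypothetical, and treating anything other than 200 or 503 as "unreachable" is an assumption of this example.

```shell
# Map the HTTP status of GET /health/ready to a monitoring state.
# 200 and 503 follow the text above; treating every other code
# (including curl's 000 on connection failure) as "unreachable"
# is an assumption of this sketch.
readiness_state() {
  case "$1" in
    200) echo "ready" ;;
    503) echo "degraded" ;;
    *)   echo "unreachable" ;;
  esac
}

# Probe the endpoint and print the state. Uses $BASE_URL as exported above.
# -o /dev/null discards the body; -w prints only the status code.
probe_ready() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$BASE_URL/health/ready") || code=000
  readiness_state "$code"
}
```

Calling probe_ready from a cron job or uptime checker then yields one word per probe, which is easy to alert on.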

Done when:

/health returns 200 with {"status": "ok"} from the machine your app will run on.
You have a value for X-API-Key and have exported it as $API_KEY with no stray whitespace or truncated characters.
You've decided whether you'll send X-OpenAI-API-Key per request (per-tenant billing) or rely on the server's env var fallback.

2. Send your first mock request

Mock mode is the fastest way to confirm the request/response shape and your polling loop without spending a cent on OpenAI. Every mutating endpoint accepts "mock": true — the service returns deterministic canned data and does not make an OpenAI call, so no X-OpenAI-API-Key is required. Submit, then poll until status flips from "pending" to "completed".

# Submit — minimal screening job with 2 papers, 1 criterion, mock mode
curl -sX POST "$BASE_URL/v1/screening/jobs" \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "mock": true,
    "papers": [
      {"id": "p1", "title": "RCT of drug X in adults", "abstract": "Randomized trial of 150 adults..."},
      {"id": "p2", "title": "Case report on rare syndrome", "abstract": "A single patient with..."}
    ],
    "criteria": [{"name": "Population", "type": "include", "value": "Adults"}]
  }'
# → {"job_id": "abc-123", "status": "pending", "progress": 0.0, "stage": ""}

# Poll: repeat every 1-2 seconds until status is "completed" (or "failed")
curl -s "$BASE_URL/v1/screening/jobs/abc-123" -H "X-API-Key: $API_KEY"
# → {"job_id": "abc-123", "status": "completed", "progress": 1.0, "stage": "assignment",
#    "results": [{"id": "p1", "final_score": 4.2, "cluster_id": 1, "reasoning": "..."},
#                {"id": "p2", "final_score": 2.1, "cluster_id": 2, "reasoning": "..."}],
#    "clusters": [...], "duration_ms": 3421, "estimated_cost_usd": 0.0}

The POST returns a job_id immediately with status: "pending"; the work happens asynchronously across four pipeline stages (labelling → reasoning → clustering → assignment). The GET response has the same shape whatever the current status, but results and clusters are populated only once status becomes "completed". In mock mode estimated_cost_usd is 0.0 and scores are deterministic canned values.
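
The submit-then-poll cycle described above could be scripted roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the status is extracted with grep and sed rather than a real JSON parser (a tool such as jq would be more robust), and the default of 60 polls at 2-second intervals is an arbitrary choice.

```shell
# Extract the first "status" value from a JSON body. Crude by design:
# good enough for the flat job envelope shown above, not a JSON parser.
job_status() {
  printf '%s' "$1" \
    | grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/'
}

# The only networked piece; kept separate so it can be stubbed in tests.
fetch_job() {
  curl -s "$BASE_URL/v1/screening/jobs/$1" -H "X-API-Key: $API_KEY"
}

# Poll until the job is completed or failed, or the poll budget runs out.
# Args: job_id [max_polls=60] [interval_seconds=2]
# Prints the final body. Exit status: 0 completed, 1 failed, 2 timed out.
poll_job() {
  poll_id=$1
  poll_max=${2:-60}
  poll_interval=${3:-2}
  poll_n=0
  while [ "$poll_n" -lt "$poll_max" ]; do
    poll_body=$(fetch_job "$poll_id")
    case "$(job_status "$poll_body")" in
      completed) printf '%s\n' "$poll_body"; return 0 ;;
      failed)    printf '%s\n' "$poll_body"; return 1 ;;
    esac
    poll_n=$((poll_n + 1))
    sleep "$poll_interval"
  done
  return 2
}
```

For example, poll_job abc-123 30 2 polls at most 30 times, 2 seconds apart. The bounded loop matters: without a poll budget, a job stuck in "pending" would hang the caller indefinitely.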

Done when:

The POST returned a job_id and an initial status: "pending".
A subsequent GET returned status: "completed" within a few seconds.
results[] has exactly 2 entries, one per paper you submitted.

3. Switch to a live call

Same request, one header added, one flag flipped. A live call actually invokes OpenAI and costs real money, from a fraction of a cent per paper upward depending on model and repetitions. Scores now reflect the real content of each paper's title and abstract rather than canned data, so expect them to vary between papers in a meaningful way.

curl -sX POST "$BASE_URL/v1/screening/jobs" \
  -H "X-API-Key: $API_KEY" \
  -H "X-OpenAI-API-Key: $OPENAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "mock": false,
    "papers": [
      {"id": "p1", "title": "RCT of drug X in adults", "abstract": "Randomized trial of 150 adults..."},
      {"id": "p2", "title": "Case report on rare syndrome", "abstract": "A single patient with..."}
    ],
    "criteria": [{"name": "Population", "type": "include", "value": "Adults"}]
  }'
# → {"job_id": "def-456", "status": "pending", "progress": 0.0, "stage": ""}

curl -s "$BASE_URL/v1/screening/jobs/def-456" -H "X-API-Key: $API_KEY"
# → {"status": "completed", "progress": 1.0, "stage": "assignment",
#    "results": [{"id": "p1", "final_score": 4.6, "cluster_id": 1, "reasoning": "Clear RCT..."},
#                {"id": "p2", "final_score": 1.8, "cluster_id": 2, "reasoning": "Case report..."}],
#    "duration_ms": 12840, "estimated_cost_usd": 0.018}

final_score is a mean on a 1–5 scale across the AI calls set by the repetitions parameter (5 by default; see backend-guide.html for how the pipeline computes it). Two differently-eligible papers should land at noticeably different scores; if both come back at the same value, double-check that mock really flipped to false. estimated_cost_usd will be a small non-zero number. It is the service's own pricing estimate, not the invoice OpenAI sends you.

Done when:

The job reached status: "completed" with final_score values that differ between papers (not the canned mock values).
estimated_cost_usd is non-zero and in a plausible range (fractions of a cent per paper for gpt-5-nano).
OpenAI's dashboard shows a corresponding token spend roughly matching that estimate.

4. Handle errors properly

The service classifies every error into one of five error_code buckets so client code can decide, per category, whether to retry, surface the message to the user, or abort. For synchronous endpoints the code appears in the HTTP body under detail.error_code; for async screening and indexer jobs it appears inside the job response (HTTP 200) as error_category when status === "failed". The two fields share the same taxonomy.

# Failed async job — poll response for a job whose OpenAI key was invalid
curl -s "$BASE_URL/v1/screening/jobs/ghi-789" -H "X-API-Key: $API_KEY"
# → {"job_id": "ghi-789", "status": "failed",
#    "error": "Invalid OpenAI key",
#    "error_category": "auth",
#    "error_retryable": false}

error_code	Retryable	Recommended action
`auth`	no	Surface to user / ops — the `X-API-Key` or `X-OpenAI-API-Key` is bad. Do not retry.
`rate_limit`	yes	Retry with exponential backoff (e.g. 2s, 4s, 8s, 16s, then give up).
`validation`	no	Surface to the caller — the request body is malformed. Fix and resubmit, don't retry blindly.
`server_restart`	yes	Async-only; the service restarted mid-job. Resubmit the entire job.
`internal` / `unknown`	—	Retry once with a delay, then surface. Usually a transient 500.

Every branch of your integration that calls this service should pattern-match on error_code / error_category and route accordingly. Blind "retry everything three times" is the wrong default — hammering auth or validation failures wastes requests and delays the real fix.
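
The routing in the table above can be expressed as a single case statement. The sketch below is illustrative only: the function names and the action labels (surface, backoff, resubmit, retry_once) are hypothetical, and treating an unrecognised category as non-retryable is an assumption of this example rather than documented service behaviour.

```shell
# Pull a string field out of a JSON body (first match wins).
# Crude by design: fine for flat keys such as error_category,
# not a substitute for a real JSON parser.
json_string_field() {
  printf '%s' "$1" \
    | grep -o "\"$2\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" \
    | head -n 1 \
    | sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/'
}

# Map an error_code / error_category to an action label.
route_error() {
  case "$1" in
    auth|validation)  echo "surface" ;;     # not retryable: tell the user or ops
    rate_limit)       echo "backoff" ;;     # retry with exponential backoff
    server_restart)   echo "resubmit" ;;    # resubmit the entire job
    internal|unknown) echo "retry_once" ;;  # one delayed retry, then surface
    *)                echo "surface" ;;     # unrecognised: assume non-retryable
  esac
}
```

A caller would typically combine the two, for example route_error "$(json_string_field "$body" error_category)", and branch on the printed label.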

Done when:

Your integration has explicit handling for each of the five buckets — no fall-through.
Retryable categories (rate_limit, server_restart) back off exponentially rather than retrying in a tight loop.
Non-retryable categories (auth, validation) surface immediately to the user or ops channel.
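
The backoff requirement in the list above could look like the following sketch. It reproduces the 2s, 4s, 8s, 16s schedule from the table and then gives up. The function names and the SLEEP_CMD override are hypothetical conveniences for this example, and the wrapped command is assumed to exit non-zero only for a retryable failure.

```shell
# Delay in seconds before retry number $1 (1-based): 2, 4, 8, 16.
backoff_delay() {
  echo $((1 << $1))
}

# Run a command until it succeeds, waiting 2s, 4s, 8s, 16s between attempts.
# Gives up after the fourth retry fails (five attempts in total).
# SLEEP_CMD is a hypothetical hook so tests can skip the real waits.
retry_with_backoff() {
  attempt=0
  until "$@"; do
    attempt=$((attempt + 1))
    if [ "$attempt" -gt 4 ]; then
      return 1
    fi
    ${SLEEP_CMD:-sleep} "$(backoff_delay "$attempt")"
  done
  return 0
}
```

Wrap only calls whose failure is known to be retryable (rate_limit, for instance); auth and validation failures should bypass this helper entirely.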

5. Integrate with your app's job model

Every Crystallise job should carry a project_id — an opaque integer from your side that correlates the Crystallise job back to the row in your app's database. The service uses this for two things: a one-active-job-per-project lock (returns 409 Conflict on concurrent submit), and the GET /v1/screening/active-job?project_id=X lookup so you can find an in-flight job after a client restart. Retention is your responsibility — pull the result JSON and store it before the service forgets.

# Second submit while a job is already running for project_id=42
curl -sX POST "$BASE_URL/v1/screening/jobs" \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"project_id": 42, "mock": true, "papers": [...], "criteria": [...]}'
# → HTTP 409
# → {"detail": "Project 42 already has an active screening job"}

# Recovery: find the existing job and decide whether to wait or cancel
curl -s "$BASE_URL/v1/screening/active-job?project_id=42" -H "X-API-Key: $API_KEY"
# → {"job_id": "abc-123", "status": "running", ...}   (or null if none)

On 409, the choice is yours: poll the existing job to completion and use its result, cancel and resubmit, or queue your new request in your own app and fire it when the existing job finishes. The server does not queue for you.

The service's in-memory job store is ephemeral: a restart loses running jobs (you'll see error_category: "server_restart" on next poll), and old completed jobs are garbage-collected over time. Persist the results payload on your side the moment a job completes.
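
To branch on a 409 in a script you need the status code as well as the body. One way to get both from a single curl call is to append the code with -w and split it off afterwards. The helper names and action labels below are hypothetical, job.json stands in for your request body, and attaching to the active job is just one of the three options described above.

```shell
# Split "body<newline>HTTP_CODE", as produced by curl -w '\n%{http_code}'.
http_code_of() { printf '%s\n' "$1" | tail -n 1; }
http_body_of() { printf '%s\n' "$1" | sed '$d'; }

# Decide what to do after a submit, based on the HTTP status.
submit_action() {
  case "$1" in
    2??) echo "poll_new_job" ;;          # accepted: poll the returned job_id
    409) echo "attach_to_active_job" ;;  # look up /v1/screening/active-job
    *)   echo "route_error" ;;           # hand over to your error routing
  esac
}

# Usage (networked, so left as a comment; job.json is a hypothetical file):
#   raw=$(curl -s -w '\n%{http_code}' -X POST "$BASE_URL/v1/screening/jobs" \
#           -H "X-API-Key: $API_KEY" -H "Content-Type: application/json" \
#           -d @job.json)
#   submit_action "$(http_code_of "$raw")"
```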

Done when:

Every Crystallise job you create carries a stable project_id from your app's schema.
Your app stores the results JSON in its own database as soon as a job reaches status: "completed".
A 409 from POST is handled explicitly — either by polling the existing active job or by cancelling and resubmitting.
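
As a stand-in for "store it in your own database", the sketch below persists a completed job's JSON to a directory, writing to a temporary file first and renaming it into place so a crash mid-write never leaves a truncated result. The function name and the one-file-per-job layout are assumptions of this example; a real app would more likely write to its own job table.

```shell
# Persist a job's JSON as <dir>/<job_id>.json.
# Args: target_dir job_id json_body
persist_result() {
  pr_dir=$1
  pr_job=$2
  pr_body=$3
  mkdir -p "$pr_dir" || return 1
  # Temp file lives in the same directory so the final mv is a rename.
  pr_tmp=$(mktemp "$pr_dir/.tmp.XXXXXX") || return 1
  printf '%s\n' "$pr_body" > "$pr_tmp" && mv "$pr_tmp" "$pr_dir/$pr_job.json"
}
```

Call it as soon as a poll returns status "completed", before doing anything else with the payload.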

6. Cost management

Three levers control spend: a preflight cost estimate, a hard ceiling carried in the POST, and per-tenant billing via the OpenAI key header. Use all three on any production path. The numbers the service reports are approximate — the pricing table is hardcoded in crystallise.llm.cost.DEFAULT_PRICING_PER_1M and can drift from OpenAI's current rates by single-digit percent.

# Preflight estimate — no papers submitted, just sizing
curl -sX POST "$BASE_URL/v1/screening/estimate" \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5-nano", "papers_count": 500, "repetitions": 5, "criteria_count": 10}'
# → {"estimated_input_tokens": 775000, "estimated_output_tokens": 75000,
#    "estimated_cost_usd": 0.0625, "confidence": "approximate",
#    "disclaimer": "Estimate based on empirical averages. Actual cost may vary +-30%..."}

# Submit with a hard cap — service rejects with 400 if the live estimate exceeds it
curl -sX POST "$BASE_URL/v1/screening/jobs" \
  -H "X-API-Key: $API_KEY" \
  -H "X-OpenAI-API-Key: $OPENAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"papers": [...], "criteria": [...], "max_estimated_cost_usd": 0.50}'
# → 400: {"detail": "Estimated cost $1.23 exceeds max_estimated_cost_usd=$0.50"}

Run the estimate first for any job whose size you don't set by hand, then pass a max_estimated_cost_usd that leaves some headroom over the estimate but still aborts on pathological inputs (an accidental 50,000-paper submission, say). For user-triggered jobs, forward the tenant's OpenAI key as X-OpenAI-API-Key so the charge lands on their invoice rather than yours.

Caveat: treat estimated_cost_usd — both pre- and post-job — as a planning figure, not a billing line. Reconcile monthly against the OpenAI invoice. If the drift is more than a few percent, flag it to the backend team so the pricing table can be refreshed.
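
The two pieces of arithmetic involved, sizing a cap from the preflight estimate and measuring drift against the invoice, are small enough to sketch with awk. The function names are hypothetical and the default headroom factor of 1.5 is an arbitrary assumption of this example, not a recommendation from the service.

```shell
# Cap = estimate x headroom factor, printed with 4 decimal places.
# Args: estimated_cost_usd [headroom_factor=1.5]
cost_cap() {
  awk -v est="$1" -v factor="${2:-1.5}" 'BEGIN { printf "%.4f\n", est * factor }'
}

# Signed drift of the invoiced amount relative to the summed estimates,
# in percent. Prints "n/a" when the estimate is zero (e.g. mock-only months).
# Args: sum_of_estimates_usd invoiced_usd
drift_pct() {
  awk -v est="$1" -v inv="$2" 'BEGIN {
    if (est == 0) { print "n/a"; exit }
    printf "%.1f\n", (inv - est) / est * 100
  }'
}
```

For instance, cost_cap 0.0625 2 doubles the preflight estimate from the example above, and drift_pct 10 10.3 reports how far a 10.30 USD invoice sits above 10.00 USD of summed estimates.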

Done when:

Every production job carries a max_estimated_cost_usd cap sized against the preflight estimate.
User-triggered jobs pass X-OpenAI-API-Key per request so the charge flows to the right tenant.
You reconcile the sum of estimated_cost_usd across jobs against OpenAI's invoice at least monthly.

7. Testing in CI

Run everything with mock: true by default in CI. Mock mode exercises the full request/response/polling contract without an OpenAI call, so it's free, fast, and deterministic. Save real OpenAI traffic for a single nightly smoke test against a canary, not for every PR.

# PR test — mock only, no OPENAI_KEY set
curl -sX POST "$BASE_URL/v1/screening/jobs" \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"mock": true, "papers": [{"id": "p1", "title": "t", "abstract": "a"}], "criteria": []}'
# → {"job_id": "...", "status": "pending"}

# Then assert structure, not content:
#   - response has job_id, status, progress, stage fields
#   - status transitions pending → completed within N polls
#   - results[] length equals papers[] length
#   - each result has id, final_score, cluster_id, reasoning

Live LLM outputs are non-deterministic, and mock values are canned placeholders, so asserting on a specific score or phrasing gives you a flaky or brittle test suite. Assert the shape of the response and the transitions of the state machine, not the content. The nightly live smoke test runs one tiny real job against the canary environment with a tight max_estimated_cost_usd cap and alerts on failure; this catches environmental breakage (bad key, network, cost-table drift) without burning money per PR.
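
Structural assertions of this kind can be written without looking at any values. The helpers below are a deliberately crude sketch: the names are hypothetical, and key presence is checked with grep rather than a JSON parser, which is adequate for the flat field names listed above but would not cope with arbitrary nesting.

```shell
# Succeeds only if every named key appears in the body as "key":
# Args: json_body key...
has_fields() {
  hf_body=$1
  shift
  for hf_key in "$@"; do
    printf '%s' "$hf_body" | grep -q "\"$hf_key\"[[:space:]]*:" || return 1
  done
  return 0
}

# Number of result entries, counted by occurrences of the final_score key.
count_results() {
  printf '%s' "$1" | grep -o '"final_score"[[:space:]]*:' | wc -l | tr -d ' '
}
```

A CI step might then run has_fields "$body" job_id status progress stage and compare count_results "$body" with the number of papers submitted, failing the build on any mismatch.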

Done when:

CI never calls OpenAI on PR builds — every fixture uses mock: true.
CI asserts only structural properties (fields present, transitions valid), never LLM content.
There is exactly one nightly live smoke test against a canary with a cost cap and an alert on failure.

8. Go-live checklist

Before you point the first production user at the integration, walk this list end-to-end. The items reinforce each other, so skipping one undermines the rest. Monitoring and the kill-switch in particular only pay off once the other six items are in place.

Rotate CRYSTALLISE_API_KEYS — confirm the key in $API_KEY is the production key, not the dev-shared one. Per-tenant keys if you have multiple customers on one deployment.
Cap every production call — set max_estimated_cost_usd on all outbound POSTs. No unbounded jobs.
Route all five error categories — auth, rate_limit, validation, server_restart, internal each land somewhere explicit in your code (retry, surface, abort).
Monitor /health/ready — wire a probe that hits this every 30–60s and alerts on 503 or on the degraded body.
Exponential backoff on rate_limit — with a cap on total wait. Don't retry forever.
Alert on error_category: server_restart spikes — one is normal, a cluster in 10 minutes means the backend is flapping.
Retain result JSON on your side — the Crystallise job store is ephemeral; persist every completed job's results to your own database before it ages out.
Kill-switch — a feature flag or queue pause that lets you stop all outbound Crystallise traffic without shipping a deploy, in case of a cost incident or an upstream outage.

# Monitoring hook: the probe your uptime checker should hit.
# -i (not -I) keeps this a GET, so the JSON body is printed after the headers.
curl -si "$BASE_URL/health/ready"
# → HTTP/1.1 200 OK     (healthy)
# → HTTP/1.1 503        (degraded; inspect the JSON body for which check failed)

Tick every item before pointing a production user at the integration.
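
The kill-switch item can be as simple as a guard in front of every outbound call. The sketch below assumes a hypothetical CRYSTALLISE_KILL_SWITCH environment variable; the service defines no such flag, and in a real app the value would more likely come from a feature-flag system or a queue pause than from the environment.

```shell
# Returns 0 when outbound Crystallise traffic is allowed.
# CRYSTALLISE_KILL_SWITCH is a hypothetical flag name chosen for this sketch.
crystallise_enabled() {
  case "${CRYSTALLISE_KILL_SWITCH:-off}" in
    on|1|true) return 1 ;;
    *)         return 0 ;;
  esac
}

# Run the given command only while the kill-switch is off.
# Exit status 75 (the conventional "temporary failure, try later" code)
# tells the caller the request was skipped rather than rejected.
guarded_call() {
  if ! crystallise_enabled; then
    echo "kill-switch active: skipping Crystallise call" >&2
    return 75
  fi
  "$@"
}
```

Because the flag is read on every call, flipping it takes effect on the next request without a deploy.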

Done when:

Every item in the eight-point list above has an owner and is ticked off.
You have a rollback plan (feature-flag flip, queue pause) if a post-launch signal looks off.
A dashboard showing rolling job cost, error-category mix, and /health/ready status is wired up and reviewed.

Next: once the playbook is done, bookmark Recipes for reusable PHP snippets and Troubleshooting for when something doesn't go to plan.
