# Crystallise AI Backend — Integration Playbook

From zero to production-ready in eight steps. Companion: [API Reference](api-reference.md), [PHP Recipes](recipes.md).

### Contents

1. [Before you start](#prereqs)
2. [Send your first mock request](#first-mock)
3. [Switch to a live call](#first-live)
4. [Handle errors properly](#errors)
5. [Integrate with your app's job model](#app-job-model)
6. [Cost management](#cost)
7. [Testing in CI](#ci)
8. [Go-live checklist](#go-live)

## 1. Before you start

Before wiring anything into Laravel, confirm that you have the three pieces of connection data and that the service is actually reachable from where your app runs. The only external calls in this section are two no-auth health probes, so you can run them from any shell that can reach the backend host.

| You need | Where it comes from |
| --- | --- |
| `BASE_URL` | Crystallise ops — the host where the service is deployed (e.g. `https://api.example.com`). |
| `X-API-Key` | One of the keys in the server's `CRYSTALLISE_API_KEYS` env var. Rotated per tenant. |
| `X-OpenAI-API-Key` (optional) | Your own OpenAI key if you want per-tenant billing. Otherwise the server's env fallback is used. |

Any language works for the examples in this playbook — curl is the baseline. The parallel [Recipes](recipes.md) document assumes Laravel 10+ on PHP 8.2+ for the code snippets you'll paste into a real app later.
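As a minimal sketch of how the values in the table above might be carried in client code: the `build_headers` helper below is hypothetical (not part of the service) and simply mirrors the header names listed in the table. It adds `X-OpenAI-API-Key` only when a per-tenant key is supplied.

```python
from typing import Optional


def build_headers(api_key: str, openai_key: Optional[str] = None) -> dict:
    """Return HTTP headers for a Crystallise request (illustrative helper).

    X-OpenAI-API-Key is included only when a key is passed in; leaving it
    out means the server falls back to its own env var.
    """
    if not api_key or api_key != api_key.strip():
        # Catches an empty value and stray whitespace from copy-paste.
        raise ValueError("API key is empty or has surrounding whitespace")
    headers = {"X-API-Key": api_key, "Content-Type": "application/json"}
    if openai_key:
        headers["X-OpenAI-API-Key"] = openai_key
    return headers
```

Rejecting keys with surrounding whitespace is a design choice made here so that a bad paste fails locally instead of surfacing later as an `auth` error.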

```
export BASE_URL=https://api.example.com
export API_KEY=your-service-key
export OPENAI_KEY=sk-...         # only needed for live calls

curl -s "$BASE_URL/health"
# → {"status": "ok"}

curl -s "$BASE_URL/health/ready"
# → {"status": "ready", "checks": {"database": "ok", "openai_key": "configured"}}
```

`/health` is a pure liveness probe — it returns 200 even when OpenAI is unreachable. `/health/ready` is the one to wire your monitoring against: it flips to 503 with a `degraded` status if the DB or OpenAI key check fails. Both are public (no `X-API-Key` needed).
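A monitoring check could interpret the readiness response roughly as follows. This is a sketch under assumptions: the helper names are invented, and it assumes the degraded body keeps the same `checks` map as the healthy example, with any value other than `ok` or `configured` meaning that check failed.

```python
import json

# Values seen in the healthy example response; anything else is treated
# as a failing check (an assumption, not documented service behaviour).
HEALTHY_VALUES = {"ok", "configured"}


def is_ready(status_code: int, body: str) -> bool:
    """True only for HTTP 200 with a JSON body whose status is "ready"."""
    if status_code != 200:
        return False
    try:
        return json.loads(body).get("status") == "ready"
    except (ValueError, AttributeError):
        # Body was not JSON, or not a JSON object.
        return False


def failed_checks(body: str) -> list:
    """Sorted names of checks whose value is not a known healthy value."""
    try:
        checks = json.loads(body).get("checks", {})
        return sorted(k for k, v in checks.items() if v not in HEALTHY_VALUES)
    except (ValueError, AttributeError):
        return []
```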

**Done when:**

- `/health` returns 200 with `{"status": "ok"}` from the machine your app will run on.
- You have a value for `X-API-Key` and have exported it as `$API_KEY` exactly as issued (no stray whitespace or truncated characters).
- You've decided whether you'll send `X-OpenAI-API-Key` per request (per-tenant billing) or rely on the server's env var fallback.

## 2. Send your first mock request

Mock mode is the fastest way to confirm the request/response shape and your polling loop without spending a cent on OpenAI. Every mutating endpoint accepts `"mock": true` — the service returns deterministic canned data and does not make an OpenAI call, so no `X-OpenAI-API-Key` is required. Submit, then poll until `status` flips from `"pending"` to `"completed"`.

```
# Submit — minimal screening job with 2 papers, 1 criterion, mock mode
curl -sX POST "$BASE_URL/v1/screening/jobs" \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "mock": true,
    "papers": [
      {"id": "p1", "title": "RCT of drug X in adults", "abstract": "Randomized trial of 150 adults..."},
      {"id": "p2", "title": "Case report on rare syndrome", "abstract": "A single patient with..."}
    ],
    "criteria": [{"name": "Population", "type": "include", "value": "Adults"}]
  }'
# → {"job_id": "abc-123", "status": "pending", "progress": 0.0, "stage": ""}
```

```
# Poll: run every 1-2 seconds until status is "completed" (stop on "failed" too)
curl -s "$BASE_URL/v1/screening/jobs/abc-123" -H "X-API-Key: $API_KEY"
# → {"job_id": "abc-123", "status": "completed", "progress": 1.0, "stage": "assignment",
#    "results": [{"id": "p1", "final_score": 4.2, "cluster_id": 1, "reasoning": "..."},
#                {"id": "p2", "final_score": 2.1, "cluster_id": 2, "reasoning": "..."}],
#    "clusters": [...], "duration_ms": 3421, "estimated_cost_usd": 0.0}
```

The POST returns a `job_id` immediately with `status: "pending"` — work happens asynchronously across four pipeline stages (`labelling → reasoning → clustering → assignment`). The GET response is the same shape regardless of current status; only `results` and `clusters` populate once `status` becomes `"completed"`. In mock mode `estimated_cost_usd` is `0.0` and scores are deterministic canned values.
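The submit-then-poll flow can be sketched as a small loop. `fetch_job` is a hypothetical callable that wraps the GET shown above and returns the parsed JSON; the 1.5 s interval and the 120-poll limit are arbitrary choices for illustration, not service requirements.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_job: Callable[[], dict],
    interval_s: float = 1.5,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job is "completed" or "failed", or give up.

    Both terminal states are returned to the caller, which decides what
    to do with a failed job; running out of polls raises TimeoutError.
    """
    for _ in range(max_polls):
        job = fetch_job()
        if job.get("status") in ("completed", "failed"):
            return job
        sleep(interval_s)
    raise TimeoutError(f"job not finished after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real delays.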

**Done when:**

- The POST returned a `job_id` and an initial `status: "pending"`.
- A subsequent GET returned `status: "completed"` within a few seconds.
- `results[]` has exactly 2 entries, one per paper you submitted.

## 3. Switch to a live call

Same request, one header added, one flag flipped. A live call actually invokes OpenAI and costs real money: typically a fraction of a cent per paper with `gpt-5-nano`, with the exact figure depending on model and repetitions. Scores now reflect the real content of each paper's title and abstract rather than canned data, so expect them to vary between papers in a meaningful way.

```
curl -sX POST "$BASE_URL/v1/screening/jobs" \
  -H "X-API-Key: $API_KEY" \
  -H "X-OpenAI-API-Key: $OPENAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "mock": false,
    "papers": [
      {"id": "p1", "title": "RCT of drug X in adults", "abstract": "Randomized trial of 150 adults..."},
      {"id": "p2", "title": "Case report on rare syndrome", "abstract": "A single patient with..."}
    ],
    "criteria": [{"name": "Population", "type": "include", "value": "Adults"}]
  }'
# → {"job_id": "def-456", "status": "pending", "progress": 0.0, "stage": ""}
```

```
curl -s "$BASE_URL/v1/screening/jobs/def-456" -H "X-API-Key: $API_KEY"
# → {"status": "completed", "progress": 1.0, "stage": "assignment",
#    "results": [{"id": "p1", "final_score": 4.6, "cluster_id": 1, "reasoning": "Clear RCT..."},
#                {"id": "p2", "final_score": 1.8, "cluster_id": 2, "reasoning": "Case report..."}],
#    "duration_ms": 12840, "estimated_cost_usd": 0.018}
```

`final_score` is a 1–5 mean across `repetitions` AI calls (5 by default — see `backend-guide.md` for how the pipeline computes it). Two differently-eligible papers should land at noticeably different scores; if both come back at the same value, double-check that `mock` really flipped to `false`. `estimated_cost_usd` will be a small non-zero number — this is the service's own pricing estimate, not the invoice OpenAI sends you.

**Done when:**

- The job reached `status: "completed"` with `final_score` values that vary per paper (not deterministic).
- `estimated_cost_usd` is non-zero and in a plausible range (fractions of a cent per paper for `gpt-5-nano`).
- OpenAI's dashboard shows a corresponding token spend roughly matching that estimate.

## 4. Handle errors properly

The service classifies every error into one of five `error_code` buckets so client code can decide, per category, whether to retry, surface the message to the user, or abort. For synchronous endpoints the code appears in the HTTP body under `detail.error_code`; for async screening and indexer jobs it appears inside the job response (HTTP 200) as `error_category` when `status === "failed"`. The two fields share the same taxonomy.

```
# Failed async job — poll response for a job whose OpenAI key was invalid
curl -s "$BASE_URL/v1/screening/jobs/ghi-789" -H "X-API-Key: $API_KEY"
# → {"job_id": "ghi-789", "status": "failed",
#    "error": "Invalid OpenAI key",
#    "error_category": "auth",
#    "error_retryable": false}
```

| `error_code` / `error_category` | Retryable | Recommended action |
| --- | --- | --- |
| `auth` | no | Surface to user / ops — the `X-API-Key` or `X-OpenAI-API-Key` is bad. Do not retry. |
| `rate_limit` | yes | Retry with exponential backoff (e.g. 2s, 4s, 8s, 16s, then give up). |
| `validation` | no | Surface to the caller — the request body is malformed. Fix and resubmit, don't retry blindly. |
| `server_restart` | yes | Async-only; the service restarted mid-job. Resubmit the entire job. |
| `internal` / `unknown` | once | Retry once with a delay, then surface. Usually a transient 500. |

Every branch of your integration that calls this service should pattern-match on `error_code` / `error_category` and route accordingly. Blind "retry everything three times" is the wrong default — hammering `auth` or `validation` failures wastes requests and delays the real fix.
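One way to make that routing explicit is a single function that maps a category and attempt count to an action. This is a sketch, not the service's client library: the function name, the action labels, and the 5-second delay for `internal` are assumptions; the backoff schedule copies the 2s, 4s, 8s, 16s example from the table.

```python
# Backoff schedule taken from the example in the table above.
BACKOFF_DELAYS = [2, 4, 8, 16]
# Delay before the single retry of internal/unknown errors (assumed value).
INTERNAL_RETRY_DELAY = 5


def next_action(category: str, attempt: int) -> tuple:
    """Return (action, delay_seconds) after a failure.

    attempt counts tries already made (1 after the first failure).
    action is "retry", "resubmit", or "surface".
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    if category in ("auth", "validation"):
        return ("surface", None)  # never retried
    if category in ("rate_limit", "server_restart"):
        if attempt > len(BACKOFF_DELAYS):
            return ("surface", None)  # schedule exhausted, give up
        action = "retry" if category == "rate_limit" else "resubmit"
        return (action, BACKOFF_DELAYS[attempt - 1])
    # "internal", "unknown", and anything unrecognised: one delayed retry.
    if attempt == 1:
        return ("retry", INTERNAL_RETRY_DELAY)
    return ("surface", None)
```

Treating unrecognised categories like `internal` means a new category added on the server side degrades to one retry rather than an unhandled branch.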

**Done when:**

- Your integration has explicit handling for each of the five buckets — no fall-through.
- Retryable categories (`rate_limit`, `server_restart`) back off exponentially rather than retrying in a tight loop.
- Non-retryable categories (`auth`, `validation`) surface immediately to the user or ops channel.

## 5. Integrate with your app's job model

Every Crystallise job should carry a `project_id` — an opaque integer from your side that correlates the Crystallise job back to the row in your app's database. The service uses this for two things: a one-active-job-per-project lock (returns 409 Conflict on concurrent submit), and the `GET /v1/screening/active-job?project_id=X` lookup so you can find an in-flight job after a client restart. Retention is your responsibility — pull the result JSON and store it before the service forgets.

```
# Second submit while a job is already running for project_id=42
curl -sX POST "$BASE_URL/v1/screening/jobs" \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"project_id": 42, "mock": true, "papers": [...], "criteria": [...]}'
# → HTTP 409
# → {"detail": "Project 42 already has an active screening job"}
```

```
# Recovery: find the existing job and decide whether to wait or cancel
curl -s "$BASE_URL/v1/screening/active-job?project_id=42" -H "X-API-Key: $API_KEY"
# → {"job_id": "abc-123", "status": "running", ...}   (or null if none)
```

On 409, the choice is yours: poll the existing job to completion and use its result, cancel and resubmit, or queue your new request in your own app and fire it when the existing job finishes. The server does not queue for you.

The service's in-memory job store is ephemeral: a restart loses running jobs (you'll see `error_category: "server_restart"` on the next poll), and old completed jobs are garbage-collected over time. Persist the `results` payload on your side the moment a job completes.
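The "poll the existing job" option might look like the sketch below. `submit` and `find_active` are hypothetical callables wrapping the POST and the `active-job` lookup shown above; the code assumes any status below 400 from the POST means the job was accepted.

```python
from typing import Callable, Optional


def submit_or_attach(
    submit: Callable[[dict], tuple],
    find_active: Callable[[int], Optional[dict]],
    payload: dict,
) -> str:
    """Submit a job; on 409, return the project's active job_id instead.

    submit(payload) returns (http_status, body_dict).
    find_active(project_id) returns the active job dict, or None.
    """
    status, body = submit(payload)
    if status == 409:
        active = find_active(payload["project_id"])
        if active is not None:
            return active["job_id"]
        # The other job finished between the two calls: try once more.
        status, body = submit(payload)
    if status >= 400:
        raise RuntimeError(f"submit failed with HTTP {status}: {body}")
    return body["job_id"]
```

The returned `job_id` can then be polled the same way whether the job is new or was already running.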

**Done when:**

- Every Crystallise job you create carries a stable `project_id` from your app's schema.
- Your app stores the `results` JSON in its own database as soon as a job reaches `status: "completed"`.
- A 409 from POST is handled explicitly — either by polling the existing active job or by cancelling and resubmitting.

## 6. Cost management

Three levers control spend: a preflight cost estimate, a hard ceiling carried in the POST, and per-tenant billing via the OpenAI key header. Use all three on any production path. The numbers the service reports are approximate — the pricing table is hardcoded in `crystallise.llm.cost.DEFAULT_PRICING_PER_1M` and can drift from OpenAI's current rates by single-digit percent.

```
# Preflight estimate — no papers submitted, just sizing
curl -sX POST "$BASE_URL/v1/screening/estimate" \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5-nano", "papers_count": 500, "repetitions": 5, "criteria_count": 10}'
# → {"estimated_input_tokens": 775000, "estimated_output_tokens": 75000,
#    "estimated_cost_usd": 0.0625, "confidence": "approximate",
#    "disclaimer": "Estimate based on empirical averages. Actual cost may vary +-30%..."}
```

```
# Submit with a hard cap — service rejects with 400 if the live estimate exceeds it
curl -sX POST "$BASE_URL/v1/screening/jobs" \
  -H "X-API-Key: $API_KEY" \
  -H "X-OpenAI-API-Key: $OPENAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"papers": [...], "criteria": [...], "max_estimated_cost_usd": 0.50}'
# → 400: {"detail": "Estimated cost $1.23 exceeds max_estimated_cost_usd=$0.50"}
```

Run the estimate first for any job whose size you do not control directly, then pass a `max_estimated_cost_usd` that gives you some headroom over the estimate but aborts on pathological inputs (a 50,000-paper submission by accident, say). For user-triggered jobs, forward the tenant's OpenAI key as `X-OpenAI-API-Key` so the charge lands on their invoice rather than yours.
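One way to size the cap, as a sketch: multiply the preflight estimate by a headroom factor. The `cost_cap` helper, the 1.5 factor, and the one-cent floor are illustrative assumptions, not service defaults; 1.5 is chosen only so that the 30% variance quoted in the estimate's disclaimer stays under the cap.

```python
def cost_cap(estimated_cost_usd: float, headroom: float = 1.5,
             floor_usd: float = 0.01) -> float:
    """Return a max_estimated_cost_usd value derived from an estimate.

    The cap is estimate * headroom, but never below floor_usd so that
    tiny jobs are not rejected over rounding noise.
    """
    if estimated_cost_usd < 0:
        raise ValueError("estimate must not be negative")
    if headroom < 1.0:
        raise ValueError("headroom below 1.0 would reject the estimate itself")
    return max(estimated_cost_usd * headroom, floor_usd)
```

For the 500-paper estimate of `0.0625` shown above, this yields a cap of about `0.094`.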

**Caveat:** treat `estimated_cost_usd` — both pre- and post-job — as a planning figure, not a billing line. Reconcile monthly against the OpenAI invoice. If the drift is more than a few percent, flag it to the backend team so the pricing table can be refreshed.
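The monthly reconciliation reduces to a percentage comparison. In this sketch the function names are invented and the 5% threshold is an assumed reading of "a few percent"; pick whatever threshold your team agrees on.

```python
def drift_percent(estimated_total_usd: float, invoiced_usd: float) -> float:
    """Signed drift of the summed estimates relative to the invoice, in %.

    Positive means the service over-estimated; negative, under-estimated.
    """
    if invoiced_usd <= 0:
        raise ValueError("invoiced amount must be positive")
    return (estimated_total_usd - invoiced_usd) / invoiced_usd * 100.0


def should_flag(job_estimates: list, invoiced_usd: float,
                threshold_pct: float = 5.0) -> bool:
    """True when the absolute drift exceeds the chosen threshold."""
    return abs(drift_percent(sum(job_estimates), invoiced_usd)) > threshold_pct
```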

**Done when:**

- Every production job carries a `max_estimated_cost_usd` cap sized against the preflight estimate.
- User-triggered jobs pass `X-OpenAI-API-Key` per request so the charge flows to the right tenant.
- You reconcile the sum of `estimated_cost_usd` across jobs against OpenAI's invoice at least monthly.

## 7. Testing in CI

Run everything in `mock: true` by default in CI. Mock mode exercises the full request/response/polling contract without an OpenAI call, so it's free, fast, and deterministic. Save real OpenAI traffic for a single nightly smoke test against a canary, not for every PR.

```
# PR test — mock only, no OPENAI_KEY set
curl -sX POST "$BASE_URL/v1/screening/jobs" \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"mock": true, "papers": [{"id": "p1", "title": "t", "abstract": "a"}], "criteria": []}'
# → {"job_id": "...", "status": "pending"}

# Then assert structure, not content:
#   - response has job_id, status, progress, stage fields
#   - status transitions pending → completed within N polls
#   - results[] length equals papers[] length
#   - each result has id, final_score, cluster_id, reasoning
```

LLM outputs are non-deterministic — asserting on a specific score or phrasing will give you a flaky test suite. Assert the *shape* of the response and the *transitions* of the state machine, not the content. The nightly live smoke test runs one tiny real job against the canary environment with a tight `max_estimated_cost_usd` cap and alerts on failure; this catches environmental breakage (bad key, network, cost-table drift) without burning money per PR.
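The structural checks listed in the block above can be collected into one helper that a CI test calls on each polled response. The helper name and error strings are illustrative; the field names come from the example responses in this playbook.

```python
REQUIRED_JOB_FIELDS = ("job_id", "status", "progress", "stage")
REQUIRED_RESULT_FIELDS = ("id", "final_score", "cluster_id", "reasoning")


def shape_errors(job: dict, paper_count: int) -> list:
    """Return structural problems in a job response; empty means valid.

    Only field presence and result count are checked, never score
    values or reasoning text.
    """
    errors = [f"job missing field: {name}"
              for name in REQUIRED_JOB_FIELDS if name not in job]
    if job.get("status") != "completed":
        return errors  # results are only populated on completion
    results = job.get("results")
    if not isinstance(results, list):
        return errors + ["results is not a list"]
    if len(results) != paper_count:
        errors.append(f"expected {paper_count} results, got {len(results)}")
    for index, result in enumerate(results):
        for name in REQUIRED_RESULT_FIELDS:
            if name not in result:
                errors.append(f"results[{index}] missing field: {name}")
    return errors
```

A test would then assert `shape_errors(job, len(papers)) == []`, which reports every problem at once instead of stopping at the first.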

**Done when:**

- CI never calls OpenAI on PR builds — every fixture uses `mock: true`.
- CI asserts only structural properties (fields present, transitions valid), never LLM content.
- There is exactly one nightly live smoke test against a canary with a cost cap and an alert on failure.

## 8. Go-live checklist

Before you point the first production user at the integration, walk this list end-to-end. The items depend on each other, so skipping one undermines the rest. The kill-switch comes last because it is the backstop for everything above it.

1. **Rotate `CRYSTALLISE_API_KEYS`** — confirm the key in `$API_KEY` is the production key, not the dev-shared one. Use per-tenant keys if you have multiple customers on one deployment.
2. **Cap every production call** — set `max_estimated_cost_usd` on all outbound POSTs. No unbounded jobs.
3. **Route all five error categories** — `auth`, `rate_limit`, `validation`, `server_restart`, `internal` each land somewhere explicit in your code (retry, surface, abort).
4. **Monitor `/health/ready`** — wire a probe that hits this every 30–60s and alerts on 503 or on the `degraded` body.
5. **Exponential backoff on `rate_limit`** — with a cap on total wait. Don't retry forever.
6. **Alert on `error_category: server_restart` spikes** — one is normal, a cluster in 10 minutes means the backend is flapping.
7. **Retain result JSON on your side** — the Crystallise job store is ephemeral; persist every completed job's `results` to your own database before it ages out.
8. **Kill-switch** — a feature flag or queue pause that lets you stop all outbound Crystallise traffic without shipping a deploy, in case of a cost incident or an upstream outage.

```
# Monitoring hook: print only the HTTP status of a GET to the readiness probe
curl -s -o /dev/null -w "%{http_code}\n" "$BASE_URL/health/ready"
# → 200   (healthy)
# → 503   (degraded; rerun with plain `curl -s` to see which check failed in the JSON body)
```
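The kill-switch in item 8 can be as small as a guard in front of every outbound call. This is a sketch with invented names: `is_disabled` stands in for whatever flag source you already have (a feature-flag service, a database row, a cache key) and must be readable at call time so that flipping it needs no deploy.

```python
from typing import Callable


class KillSwitchEngaged(RuntimeError):
    """Raised instead of sending a request while the switch is on."""


def guarded_submit(
    submit: Callable[[dict], dict],
    payload: dict,
    is_disabled: Callable[[], bool],
) -> dict:
    """Call submit(payload) unless the kill-switch is currently engaged.

    The flag is checked on every call, so a flip takes effect on the
    next request without restarting the app.
    """
    if is_disabled():
        raise KillSwitchEngaged("outbound Crystallise traffic is paused")
    return submit(payload)
```

Raising a dedicated exception lets the caller decide whether to queue the request for later or tell the user the feature is paused.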

Tick every item before pointing a production user at the integration.

**Done when:**

- Every item in the eight-point list above has an owner and is ticked off.
- You have a rollback plan (feature-flag flip, queue pause) if a post-launch signal looks off.
- A dashboard showing rolling job cost, error-category mix, and `/health/ready` status is wired up and reviewed.

**Next:** once the playbook is done, bookmark [Recipes](recipes.md) for reusable PHP snippets and [Troubleshooting](troubleshooting.md) for when something doesn't go to plan.