# Crystallise AI Backend — Troubleshooting

The most common points of confusion and their fixes. Check here before filing a ticket.

### Contents

- [Authentication & Keys](#auth)
- [Jobs & Polling](#jobs)
- [Costs & Models](#cost)
- [Local Dev & Docker](#devops)
- [Unexpected Responses](#responses)

## Authentication & Keys

### "401 Unauthorized but my API key is correct"

Diagnosis

Two different keys flow through this service. `X-API-Key` authenticates your call to the Crystallise backend; `X-OpenAI-API-Key` is a per-request passthrough to OpenAI. A 401 on an otherwise well-formed request usually means these got swapped — your OpenAI key is being treated as the service API key.

Fix

Inspect your request headers and confirm `X-API-Key` matches an entry in `CRYSTALLISE_API_KEYS` (or is any non-empty string in dev mode). If you also need per-user OpenAI billing, set `X-OpenAI-API-Key` separately. See the [Authentication section](api-reference.md#auth) for header examples.
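As a minimal client-side sketch of keeping the two keys apart, the helper below builds the header dict for a request. The function names `build_headers` and `looks_swapped` are hypothetical, and the `sk-` prefix test is only a heuristic based on the usual shape of OpenAI keys, not something the service itself checks.

```python
from typing import Dict, Optional


def build_headers(service_key: str, openai_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers with the two keys in their own fields."""
    if not service_key:
        raise ValueError("X-API-Key must be a non-empty string")
    headers = {"X-API-Key": service_key, "Content-Type": "application/json"}
    if openai_key:
        # Optional per-request passthrough used for per-user OpenAI billing.
        headers["X-OpenAI-API-Key"] = openai_key
    return headers


def looks_swapped(headers: Dict[str, str]) -> bool:
    """Heuristic only: an OpenAI-style key ('sk-...') sitting in X-API-Key."""
    return headers.get("X-API-Key", "").startswith("sk-")
```

Running `looks_swapped` on the headers of a request that returned 401 is a quick way to spot the swap before digging further.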

### "Job finishes with `status: failed`, `error_category: auth`"

Diagnosis

The service-level `X-API-Key` passed the gate — otherwise the POST would have 401'd synchronously — but the downstream OpenAI call rejected its key. That key came either from the per-request `X-OpenAI-API-Key` header or from the server-side `CRYSTALLISE_OPENAI_API_KEY` environment variable. One of those is invalid, revoked, or out of quota.

Fix

1. If you sent `X-OpenAI-API-Key`, mint a fresh key in the OpenAI dashboard and retry.
2. If you rely on the server env var, have the operator verify `CRYSTALLISE_OPENAI_API_KEY` is set and still valid.
3. See [Backend Guide § LLM](backend-guide.md#llm) for how these two sources are resolved at request time.

### "Dev mode (empty `CRYSTALLISE_API_KEYS`) accepts any key — is that safe?"

Diagnosis

When `CRYSTALLISE_API_KEYS` is unset or empty, the auth middleware falls through and accepts any non-empty `X-API-Key` value. That is deliberate for local development so integrators don't need credentials to run the stack, but it means anyone who can reach the port can submit jobs that spend OpenAI credits.

Fix

For any deployment beyond a developer laptop, set `CRYSTALLISE_API_KEYS` to a comma-separated list of strong random strings and restart the service. Treat the list as a shared secret — it gates all access to the API.
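One way to produce suitable values is Python's `secrets` module. The sketch below also mirrors the gate behaviour described above (an empty list accepts any non-empty key) so you can reason about it locally. It is an illustration under those stated assumptions, not the service's actual middleware, and all function names are hypothetical.

```python
import hmac
import secrets
from typing import List


def generate_api_keys(count: int = 2) -> str:
    """Return a comma-separated value suitable for CRYSTALLISE_API_KEYS."""
    return ",".join(secrets.token_urlsafe(32) for _ in range(count))


def parse_keys(raw: str) -> List[str]:
    """Split the env var value, ignoring blanks and surrounding whitespace."""
    return [k.strip() for k in raw.split(",") if k.strip()]


def is_authorized(presented: str, raw_config: str) -> bool:
    """Approximate the documented gate; the real implementation may differ."""
    if not presented:
        return False
    allowed = parse_keys(raw_config)
    if not allowed:
        # Dev mode: no keys configured, so any non-empty key passes.
        return True
    # Constant-time comparison avoids leaking key prefixes through timing.
    return any(hmac.compare_digest(presented.encode(), k.encode()) for k in allowed)
```

`secrets.token_urlsafe` emits only URL-safe characters, so the generated keys never contain the comma used as a separator.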

### "How do I rotate the service API key?"

Diagnosis

The service has no per-user key database; `CRYSTALLISE_API_KEYS` is the single source of truth and is read at process start. Rotation therefore means editing that env var and cycling the process, not issuing revocations in a dashboard.

Fix

1. Add the new key to `CRYSTALLISE_API_KEYS` alongside the old one and restart.
2. Roll out the new value in the client's `X-API-Key` header.
3. Once all clients are migrated, remove the old key from `CRYSTALLISE_API_KEYS` and restart again.

### "Can I send the OpenAI key once and have it remembered?"

Diagnosis

No. The service is intentionally [stateless](glossary.md#stateless-compute) and never persists per-request keys — that's why a job's `X-OpenAI-API-Key` only affects that job. There is no login, no session, no server-side store of caller credentials.

Fix

Either send `X-OpenAI-API-Key` on every request that needs per-caller OpenAI billing, or configure the server-side `CRYSTALLISE_OPENAI_API_KEY` env var as a shared fallback. The per-request header always wins when both are present.
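The precedence can be pictured with a short sketch. `resolve_openai_key` is a hypothetical name, and the exact-case header lookup is a simplification (real HTTP headers are case-insensitive); only the "header wins, env var is the fallback" order comes from the behaviour described here.

```python
import os
from typing import Mapping, Optional


def resolve_openai_key(
    headers: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick the OpenAI key for one request: header first, then env fallback."""
    header_key = headers.get("X-OpenAI-API-Key")
    if header_key:
        return header_key
    # Nothing is cached between requests; the fallback is read each time.
    return env.get("CRYSTALLISE_OPENAI_API_KEY") or None
```

When this returns `None`, there is no key to forward, which would surface downstream as a job with `error_category: auth`.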

## Jobs & Polling

### "Job is stuck in `status: pending` forever"

Diagnosis

Screening jobs run as a background task kicked off at POST time. "Forever pending" means that task either never started (the kickoff raised before it scheduled) or crashed without updating the job record. The in-memory job store still holds the `pending` row because no writer ever moved it forward.

Fix

Check the server logs for an exception traced to the job id. If the process has restarted since submission, the job is gone — resubmit. Otherwise cancel via DELETE and resubmit; see [API Reference § Screening](api-reference.md#screening).

### "Results are empty but `status: completed`"

Diagnosis

Completion just means the pipeline ran cleanly end-to-end; it doesn't guarantee any paper survived filtering. A strict [threshold](glossary.md#threshold-screening) combined with a narrow cluster type can eliminate every candidate, and you'll see an empty result set rather than an error.

Fix

Inspect the `clusters` array in the response to see how papers were bucketed before filtering. Lower the threshold (default `1.0`) or widen the cluster configuration and resubmit. The [threshold glossary entry](glossary.md#threshold-screening) explains the sensitivity/specificity trade-off.

### "Job returns `status: failed`, `error_category: server_restart`"

Diagnosis

The server restarted while your job was in flight. Because job state lives in process memory, any partial progress was lost and the job is marked failed with this category so clients can distinguish it from a real pipeline error. It is not retriable in place — there is no row to resume from.

Fix

Resubmit the same request. If you see this category repeatedly, the service is probably crash-looping — have the operator check logs and `CRYSTALLISE_OPENAI_API_KEY` wiring before you retry further.

### "How often should I poll?"

Diagnosis

There is no push notification; clients discover completion by polling GET on the job id. Poll too fast and you burn request budget on a status field that hasn't changed; poll too slow and your end-to-end latency is dominated by sleep, not by inference.

Fix

Polling every 1–2 seconds is a sensible default. For very long jobs you can back off to a longer interval after the first minute. Don't poll more often than once per second: the server does not coalesce requests, so shorter intervals add nothing but load.
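A polling loop along these lines might look like the sketch below. `fetch_status` is a hypothetical callable that performs the GET and returns the job as a dict. The 1.5x growth factor, the 10-second cap, and the 30-minute timeout are illustrative choices, and the terminal set only covers the `completed` and `failed` statuses named in this guide.

```python
import time
from typing import Callable, Dict

TERMINAL_STATUSES = {"completed", "failed"}


def poll_job(
    fetch_status: Callable[[], Dict],
    *,
    base_interval: float = 1.0,
    max_interval: float = 10.0,
    backoff_after: float = 60.0,
    timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the job reaches a terminal status, backing off on long runs."""
    start = time.monotonic()
    interval = base_interval
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        elapsed = time.monotonic() - start
        if elapsed > timeout:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout}s")
        if elapsed > backoff_after:
            # After the first minute, stretch the interval up to the cap.
            interval = min(interval * 1.5, max_interval)
        sleep(interval)
```

Injecting `sleep` keeps the loop testable without real delays; in production the default `time.sleep` is used.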

### "I see `409 Conflict` on POST /v1/screening/jobs"

Diagnosis

The service enforces one active job per `project_id` to keep cost and concurrency bounded. If a project already has a job in `pending` or `running`, a new POST with the same `project_id` is rejected with 409 rather than queued.

Fix

1. Poll the existing job to completion, or DELETE it if you want to abandon that run.
2. Alternatively, drop `project_id` entirely for a one-off submission — the uniqueness check only fires when the field is present.
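The rule can be summarised with a small in-memory sketch. The job-store shape (a dict of job id to job dict) and the name `conflicting_job` are assumptions for illustration; only the "one active job per `project_id`, skipped when the field is absent" behaviour is taken from the description above.

```python
from typing import Dict, Optional

ACTIVE_STATUSES = {"pending", "running"}


def conflicting_job(project_id: Optional[str], jobs: Dict[str, dict]) -> Optional[str]:
    """Return the id of an active job for this project, or None if POST may proceed."""
    if project_id is None:
        # No project_id on the request: the uniqueness check does not apply.
        return None
    for job_id, job in jobs.items():
        if job.get("project_id") == project_id and job.get("status") in ACTIVE_STATUSES:
            return job_id
    return None
```

A non-`None` result corresponds to the 409 case; the returned id is the job to poll or DELETE.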

## Costs & Models

### "`estimated_cost_usd` doesn't match my OpenAI billing"

Diagnosis

The estimate is computed from a hardcoded table (`DEFAULT_PRICING_PER_1M`) snapshotted from OpenAI's public rate card at build time. OpenAI adjusts those rates periodically and the table does not auto-update, so the estimate drifts from your real invoice — usually by single-digit percent, but more during a pricing change.

Fix

Treat `estimated_cost_usd` as a planning estimate, not a billing line or a guaranteed ceiling: stale rates can push it below your actual spend. Reconcile against the OpenAI dashboard for actuals. If the drift is large, the table in the server code needs refreshing; flag it to the backend team.
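To see why a stale table causes drift, here is a minimal sketch of a per-million-token estimate. The table layout (separate `input` and `output` rates) and every number in it are placeholders chosen for illustration, not the values in the server's `DEFAULT_PRICING_PER_1M` or OpenAI's rate card.

```python
from typing import Dict

# Placeholder rates in USD per 1M tokens; NOT real prices.
EXAMPLE_PRICING_PER_1M: Dict[str, Dict[str, float]] = {
    "gpt-5-nano": {"input": 0.10, "output": 0.40},
    "gpt-5-mini": {"input": 0.40, "output": 1.60},
}


def estimate_cost_usd(
    model: str,
    input_tokens: int,
    output_tokens: int,
    pricing: Dict[str, Dict[str, float]] = EXAMPLE_PRICING_PER_1M,
) -> float:
    """Estimate cost from token counts and a static per-1M-token table."""
    rates = pricing[model]  # KeyError for models missing from the table
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000
```

Because the rates are fixed at build time, any later change to the provider's prices shifts the real invoice while this function keeps returning the old figure.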

### "How do I cap spend on a job?"

Diagnosis

The screening request accepts an optional `max_estimated_cost_usd`. The server computes the estimate up-front and, if it exceeds your cap, rejects the request with a 400 before any OpenAI call is made. The cap is a preflight check, not a runtime kill switch.

Fix

Set `max_estimated_cost_usd` in the POST body to your budget. On rejection, either raise the cap, narrow the paper set, or switch to a cheaper model (see ["Which model should I pick for screening?"](#c3)). See [API Reference § Screening](api-reference.md#screening) for the field.
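The preflight behaviour amounts to a single comparison before any work starts. The sketch below assumes a hypothetical `CostCapExceeded` exception standing in for the HTTP 400, and treats an estimate exactly equal to the cap as acceptable, which follows the "exceeds your cap" wording but is not confirmed against the server code.

```python
from typing import Optional


class CostCapExceeded(ValueError):
    """Hypothetical error; the service answers such requests with HTTP 400."""


def preflight_cost_check(
    estimated_cost_usd: float,
    max_estimated_cost_usd: Optional[float],
) -> None:
    """Reject up front when the estimate exceeds the caller's cap."""
    if max_estimated_cost_usd is None:
        return  # No cap supplied: nothing to enforce.
    if estimated_cost_usd > max_estimated_cost_usd:
        raise CostCapExceeded(
            f"estimate ${estimated_cost_usd:.4f} exceeds cap ${max_estimated_cost_usd:.4f}"
        )
```

Note what the check does not do: once a job has passed preflight, nothing here stops it if real usage ends up higher than the estimate.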

### "Which model should I pick for screening?"

Diagnosis

Screening is I/O-bound over many small prompts, so model choice is a straight cost/quality trade. `gpt-5-nano` is the default and the cheapest; `gpt-5-mini` improves labelling quality, especially on borderline abstracts, at roughly 4x the cost per token.

Fix

Start with `gpt-5-nano` and only upgrade if you see too many low-confidence or clearly-wrong labels in the [reasoning](glossary.md#reasoning-screening) output. For tight budgets, keep `max_estimated_cost_usd` in place as a guardrail regardless of model.

### "Model X isn't supported — why?"

Diagnosis

Every request is validated against `crystallise.config.model_capabilities`, which encodes context window and feature support (structured outputs, function calling) for the models we've qualified. Unknown models, or models lacking a required feature for the endpoint, fail preflight with a `validation` error rather than being forwarded to OpenAI.

Fix

Pick a model listed in `model_capabilities`. If you need a newer or specialist model, have the backend team add an entry — the gate is deliberate so an incompatible model can't silently produce malformed results.
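Conceptually, the gate is a table lookup followed by a feature check. The table below is a hypothetical stand-in for `crystallise.config.model_capabilities`: its shape and flag names are assumptions, and the real table also records context window, which is omitted here.

```python
from typing import Dict, Iterable

# Hypothetical capability flags; the real table may differ in shape and content.
EXAMPLE_MODEL_CAPABILITIES: Dict[str, Dict[str, bool]] = {
    "gpt-5-nano": {"structured_outputs": True, "function_calling": True},
    "gpt-5-mini": {"structured_outputs": True, "function_calling": True},
}


def validate_model(
    model: str,
    required_features: Iterable[str],
    capabilities: Dict[str, Dict[str, bool]] = EXAMPLE_MODEL_CAPABILITIES,
) -> None:
    """Raise ValueError for unknown models or models missing a required feature."""
    caps = capabilities.get(model)
    if caps is None:
        raise ValueError(f"unsupported model: {model!r}")
    missing = [f for f in required_features if not caps.get(f, False)]
    if missing:
        raise ValueError(f"model {model!r} lacks required features: {missing}")
```

Failing at this step, before any request leaves the server, is what keeps an unqualified model from returning output the pipeline cannot parse.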

## Local Dev & Docker

### "`ModuleNotFoundError: No module named 'fastapi'` running `pytest`"

Diagnosis

Your shell resolved `pytest` to a system-wide install (often `/usr/bin/pytest` or a pyenv shim) instead of the project's virtualenv. That interpreter has no access to the project's dependencies, so the first FastAPI import blows up.

Fix

1. Activate the venv: `source .venv/bin/activate` before running `pytest`.
2. Or bypass `PATH`: `.venv/bin/python -m pytest`.
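If you are unsure which interpreter a command is using, a short diagnostic script can confirm it. This sketch relies on the standard `sys.prefix` / `sys.base_prefix` comparison, which works for environments created with `venv`; the function names are hypothetical.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv-style environment."""
    return sys.prefix != sys.base_prefix


def module_available(name: str) -> bool:
    """True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("in virtualenv:", in_virtualenv())
    print("fastapi importable:", module_available("fastapi"))
```

Run it with the same launcher you use for the tests (for example `.venv/bin/python check_env.py`, where the file name is whatever you saved it as) and compare the reported interpreter path with the one you expected.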

### "Port 5337 already in use on Docker Compose"

Diagnosis

The `docker-compose.yml` maps the Postgres container's `5432` to host port `5337` to avoid clashing with a default local Postgres on `5432`. If another process (a different Postgres, a prior Compose stack still running, or a leftover bind) holds `5337`, the bind fails.

Fix

Change the host-side port in `docker-compose.yml` (e.g. `"5338:5432"`) and update any local connection strings. Alternatively, find the offender with `ss -ltn 'sport = :5337'` (or `lsof -i :5337`) and stop it.
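Where `ss` or `lsof` is not available, a few lines of Python can tell you whether the port is free. This sketch tries to bind the port on the loopback interface; the function name is hypothetical, and a bind can also fail for reasons other than an existing listener (for example, insufficient privileges on low ports).

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding host:port fails, which usually means it is taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True
        return False


if __name__ == "__main__":
    print("5337 in use:", port_in_use(5337))
```

The probe socket is closed as soon as the check finishes, so it does not hold the port itself.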

### "Mock-mode tests pass but real calls 401"

Diagnosis

The test suite installs an auth-bypass fixture that short-circuits the `X-API-Key` check so unit tests don't need credentials. When you point the same client code at a real deployment, that fixture isn't there and a missing or wrong `X-API-Key` header 401s immediately.

Fix

Confirm your production client is setting `X-API-Key` to a value present in the server's `CRYSTALLISE_API_KEYS`. See ["401 Unauthorized but my API key is correct"](#a1) for the key-confusion variant and [mock mode](glossary.md#mock-mode) for what the fixture actually bypasses.

### "How do I run only the integration (live) tests?"

Diagnosis

Unit tests live alongside the code and run in [mock mode](glossary.md#mock-mode); integration tests under `tests/integration/` hit a real OpenAI endpoint and are skipped by default unless an API key is present in the environment. They're separated so CI doesn't accidentally spend money.

Fix

Export a real key and run the directory explicitly: `CRYSTALLISE_OPENAI_API_KEY=sk-... pytest tests/integration -v`. Either `CRYSTALLISE_OPENAI_API_KEY` or `OPENAI_API_KEY` is accepted.
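The skip-unless-key-present pattern can be sketched as a small helper plus a `pytest.mark.skipif` marker. The helper name and the lookup order between the two variables are assumptions; the project's own test configuration may implement this differently.

```python
import os
from typing import Mapping, Optional

LIVE_KEY_VARS = ("CRYSTALLISE_OPENAI_API_KEY", "OPENAI_API_KEY")


def live_openai_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty key found in the accepted variables."""
    for name in LIVE_KEY_VARS:
        value = env.get(name)
        if value:
            return value
    return None


# In a test module this could gate live tests, for example:
#
#   import pytest
#   requires_live_key = pytest.mark.skipif(
#       live_openai_key() is None, reason="no OpenAI key in environment"
#   )
```

With a marker like this, running the integration directory without a key reports the tests as skipped rather than failed.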

## Unexpected Responses

### "`/criteria/consolidate` returns empty lists but a populated `warnings` array"

Diagnosis

A server-side quality filter sits between the LLM output and the response body. Proposals that are too low-confidence, too long, or structurally malformed are dropped and logged into `warnings` rather than returned. A fully-empty response with warnings means every candidate failed that filter.

Fix

Read the `warnings` entries for the specific reason (confidence, length, schema). Usually the input criteria are too sparse to consolidate — expand them and retry. See [Backend Guide § Criteria](backend-guide.md#criteria) for the filter rules.

### "`/indexer/run` returns per-record `indexing_status` other than `ok`"

Diagnosis

The indexer is a best-effort batch: one malformed abstract doesn't fail the whole call. Records that couldn't be extracted — typically because the abstract is missing, extremely short, or structurally broken — come back with a non-`ok` `indexing_status` and an `extraction_error` message, while successful records sit next to them in the same response.

Fix

Walk the `errors` array and each record's `extraction_error` to identify the bad inputs. Usually the fix is upstream in your data cleaning (fill missing abstracts, strip HTML), not in the API call. See the [AutoIndexer glossary entry](glossary.md#autoindexer) for what fields the extractor expects.
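A short helper can pull the failed records out of a response for review. The `indexing_status` and `extraction_error` fields come from the behaviour described above; the top-level `records` key and the function name are assumptions about the response shape, so adjust them to match what the endpoint actually returns.

```python
from typing import Dict, List, Optional, Tuple


def failed_records(response: Dict) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """Return (position, indexing_status, extraction_error) for non-ok records."""
    failures = []
    # "records" is an assumed key name for the per-record list.
    for position, record in enumerate(response.get("records", [])):
        status = record.get("indexing_status")
        if status != "ok":
            failures.append((position, status, record.get("extraction_error")))
    return failures
```

The positions let you map each failure back to the input batch, which is usually where the fix belongs.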

**Still stuck?** Check [Backend Guide § LLM](backend-guide.md#llm) for retry semantics, or [the glossary](glossary.md) for terminology.