# Crystallise AI Backend

Stateless AI processing API for systematic review screening and structured data extraction. Built for integration with Evidence Mapper.

This is a **backend-only service** — no UI, no data CRUD. Evidence Mapper owns projects, papers, and decisions. This service receives input, runs AI processing, and returns results.

## Contents

1. [What This Does](#what-this-does)
2. [Quick Start](#quick-start)
3. [Mock Mode](#mock-mode) — exercise every endpoint without OpenAI credits
4. [Authentication](#authentication)
5. [API Endpoints](#api-endpoints) (overview — full reference in [`docs/md/api-reference.md`](docs/md/api-reference.md))
6. [Environment Variables](#environment-variables)
7. [Project Layout](#project-layout)
8. [Architecture](#architecture)
9. [Testing](#testing)
10. [End-to-End Smoke Test](#end-to-end-smoke-test) — drive every AI endpoint with real CSV fixtures
11. [Documentation](#documentation)

## What This Does

| Capability | Description |
|-----------|-------------|
| **AI Screening** | 4-stage pipeline (score, reason, cluster, assign) for title/abstract screening |
| **AutoIndexer** | Structured field extraction from title/abstract via function calling with evidence + confidence |
| **Criteria AI** | Generate, refine, and consolidate eligibility criteria with PICO extraction |

## Quick Start

Requires **Python 3.11+**.

```bash
# 1. Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate            # bash/zsh — keep this activated for all commands below

# 2. Install (runtime deps + pytest, ruff, etc. from pyproject.toml [dev] extras)
pip install -e ".[dev]"

# 3. Configure
cp .env.example .env
# Edit .env — at minimum set CRYSTALLISE_OPENAI_API_KEY, or leave blank and
# pass the key per-request via the X-OpenAI-API-Key header.

# 4. Run
uvicorn api.main:app --reload --port 8005

# 5. Smoke-test
curl http://localhost:8005/health
# → {"status":"ok"}

# 6. Explore the interactive API docs
open http://localhost:8005/docs      # Swagger UI (macOS; use xdg-open on Linux)
open http://localhost:8005/redoc     # ReDoc
```

> If you prefer not to activate the venv, prefix every command with `.venv/bin/` instead, e.g. `.venv/bin/pip install -e ".[dev]"` or `.venv/bin/pytest tests/`. **Do not** run bare commands without one of these two setups. With the venv activated, bare `pytest` usually works. Without activation, bare `pytest` silently runs your system Python's pytest, which lacks the project's dependencies and produces a wall of `ModuleNotFoundError`s.
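
To confirm which interpreter a command is actually using, a quick standard-library check along these lines can help. The function name `in_virtualenv` is illustrative and not part of this repo.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtualenv active:", in_virtualenv())
```

Saving this to a scratch file and running it with both `python` and `.venv/bin/python` shows which one picks up the project environment.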

### Docker

```bash
docker compose up --build
```

| Service | Host port | Description |
|---------|-----------|-------------|
| `backend` | 8005 | FastAPI server |
| `db` | 5337 | PostgreSQL 16 (5432 inside the container, remapped to 5337 on the host to avoid clashes) |

## Mock Mode

Every mutating endpoint accepts a `"mock": true` flag in the request body. In mock mode the endpoint returns canned-but-realistic data **without calling OpenAI** — no API key needed, no cost, deterministic output.

Use this during integration development to:

- exercise the full request/response shape of every endpoint
- write and run Evidence Mapper's client code against a real server
- run CI against this service without leaking keys

```json
{
  "mock": true,
  "papers": [ ... ],
  "criteria": [ ... ]
}
```

All 314 offline tests in this repo run in mock mode — they require no OpenAI connectivity and complete in ~2 seconds. (Two additional integration tests hit real OpenAI; they auto-skip when no key is configured — see [Testing](#testing).)
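
As a minimal sketch of how a client might toggle mock mode, the helper below adds the flag to an existing request body without mutating it. The helper name `with_mock` and the fields inside the sample `papers` entry are placeholders for illustration; the real request shapes are in the API reference.

```python
import copy
import json


def with_mock(payload: dict, enabled: bool = True) -> dict:
    """Return a copy of a request body with the "mock" flag set."""
    body = copy.deepcopy(payload)  # leave the caller's payload untouched
    body["mock"] = enabled
    return body


# Placeholder payload: the field names inside "papers" are hypothetical.
live_body = {"papers": [{"id": "p1", "title": "Example title"}], "criteria": []}
mock_body = with_mock(live_body)

print(json.dumps(mock_body, indent=2))
```

Keeping the flag in one helper makes it easy to switch a whole client between mock and live behaviour from a single setting.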

## Authentication

**Service API key** — required on every non-public endpoint:

```
X-API-Key: your-service-key
# or
Authorization: Bearer your-service-key
```

Configure valid keys via `CRYSTALLISE_API_KEYS` (comma-separated). If the env var is empty, any non-empty key is accepted (dev mode — **do not use in production**).

Public paths that skip auth: `/health`, `/health/ready`, `/docs`, `/redoc`, `/openapi.json`.
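
The rules above can be sketched as a pure function. This is an illustration of the documented behaviour, not the code in `api/auth_middleware.py`; exact-path matching for public routes and the precedence of `X-API-Key` over `Authorization` are assumptions.

```python
PUBLIC_PATHS = {"/health", "/health/ready", "/docs", "/redoc", "/openapi.json"}


def parse_api_keys(raw: str) -> set[str]:
    """Split a comma-separated CRYSTALLISE_API_KEYS value into a set of keys."""
    return {key.strip() for key in raw.split(",") if key.strip()}


def extract_key(headers: dict[str, str]) -> str:
    """Read the service key from X-API-Key or an Authorization: Bearer header."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("x-api-key"):
        return lowered["x-api-key"].strip()
    auth = lowered.get("authorization", "")
    if auth.lower().startswith("bearer "):
        return auth[len("bearer "):].strip()
    return ""


def is_authorized(path: str, headers: dict[str, str], raw_keys: str) -> bool:
    """Apply the documented auth rules to a single request."""
    if path in PUBLIC_PATHS:
        return True  # public paths skip auth entirely
    key = extract_key(headers)
    if not key:
        return False  # a key is always required on non-public paths
    valid = parse_api_keys(raw_keys)
    if not valid:
        return True  # dev mode: any non-empty key is accepted
    return key in valid
```

For example, `is_authorized("/v1/screening/jobs", {"X-API-Key": "anything"}, "")` is accepted because an empty key list means dev mode, which is why that configuration should never reach production.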

**OpenAI key passthrough** — each request can carry a user-specific OpenAI key:

```
X-OpenAI-API-Key: sk-proj-...
```

If the header is absent the service falls back to the `CRYSTALLISE_OPENAI_API_KEY` env var. This lets each Evidence Mapper user/organisation pay their own OpenAI costs.
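
A hedged sketch of both sides of the passthrough: how a client might build its headers, and how the fallback order described above could be expressed. The function names are illustrative and do not mirror `api/openai_key.py`; a real web framework would also treat header names case-insensitively.

```python
import os
from collections.abc import Mapping
from typing import Optional


def build_headers(service_key: str, openai_key: Optional[str] = None) -> dict[str, str]:
    """Client side: headers for a request to a non-public endpoint."""
    headers = {"X-API-Key": service_key}
    if openai_key:
        # Optional: lets this user or organisation pay for their own usage.
        headers["X-OpenAI-API-Key"] = openai_key
    return headers


def resolve_openai_key(
    headers: Mapping[str, str], env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Server side: prefer the per-request header, else the env var fallback."""
    env = os.environ if env is None else env
    header_key = headers.get("X-OpenAI-API-Key", "").strip()
    if header_key:
        return header_key
    return env.get("CRYSTALLISE_OPENAI_API_KEY") or None
```

When neither source provides a key, `resolve_openai_key` returns `None`, which is the case a non-mock request would have to reject.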

## API Endpoints

All routes are available at both `/v1/*` (canonical) and `/*` (unversioned alias). Three capability families plus health/config:

| Family | Pattern and main routes | Additional routes / notes |
|---|---|---|
| **Screening** | async — `POST /v1/screening/jobs` then poll `GET /v1/screening/jobs/{id}` | + `estimate`, `active-job`, list |
| **AutoIndexer** | sync `POST /v1/indexer/run` for small batches; async `/jobs` for large; plus `suggest-fields`, `refine-fields`, `group-tags`, `estimate` | one route per operation |
| **Criteria AI** | sync — `generate`, `picos`, `refine-context`, `refine`, `consolidate`, `analyze-question` | one round-trip each |
| **Config / Health** | `GET/PUT /v1/config/services[/{id}]`, `GET /v1/config/prompts`, `GET /health`, `GET /health/ready` | |

Every mutating endpoint accepts `"mock": true` to return canned data without an OpenAI call. Async-job errors surface inside the job body (`status: "failed", error_category, error_retryable`); sync errors return a typed HTTP body (`{detail: {message, error_code, retryable}}`).
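
Because failures arrive in two different shapes, client code usually needs two small unpacking helpers. The sketch below uses only the field names quoted above; the helper names and the normalised return dictionaries are assumptions made for illustration.

```python
from typing import Optional


def job_failure(job: dict) -> Optional[dict]:
    """Async path: return failure details from a polled job body, or None."""
    if job.get("status") != "failed":
        return None
    return {
        "category": job.get("error_category"),
        "retryable": bool(job.get("error_retryable")),
    }


def sync_failure(error_body: dict) -> dict:
    """Sync path: unpack the typed body returned with an error response."""
    detail = error_body.get("detail")
    if not isinstance(detail, dict):
        # Fall back gracefully if "detail" is missing or not the typed object.
        detail = {"message": None if detail is None else str(detail)}
    return {
        "message": detail.get("message"),
        "code": detail.get("error_code"),
        "retryable": bool(detail.get("retryable")),
    }
```

Normalising both paths to a `retryable` boolean lets a caller share one retry policy across sync and async endpoints.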

**Full reference** — request/response shapes, every parameter, every example: [`docs/md/api-reference.md`](docs/md/api-reference.md). Live OpenAPI at `http://localhost:8005/docs` when the server is running.

**Integration pattern** for Evidence Mapper / NetReady: see [`docs/md/playbook.md`](docs/md/playbook.md) for the first-hour walkthrough; [`docs/md/recipes.md`](docs/md/recipes.md) for PHP/Laravel snippets; [`docs/md/troubleshooting.md`](docs/md/troubleshooting.md) for common errors.

Indexer jobs follow the same submit-then-poll pattern as screening jobs. Criteria endpoints are synchronous (one request, one response).
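
The submit-then-poll pattern used by async jobs can be sketched as a small loop. Here `fetch_job` stands in for whatever performs `GET /v1/screening/jobs/{id}` and returns the parsed job body. Only the `"failed"` status appears in this README; `"completed"` and the default timings are assumptions, so check the API reference for the real status values.

```python
import time
from typing import Callable


def poll_job(
    fetch_job: Callable[[], dict],
    interval_s: float = 2.0,
    max_polls: int = 150,
    terminal: tuple = ("completed", "failed"),  # "completed" is an assumed name
) -> dict:
    """Call fetch_job until the job reaches a terminal status, then return it."""
    for _ in range(max_polls):
        job = fetch_job()
        if job.get("status") in terminal:
            return job
        time.sleep(interval_s)  # wait before asking again
    raise TimeoutError(f"job still not finished after {max_polls} polls")
```

Passing the fetcher in as a callable keeps the loop independent of any HTTP library, so it can be unit-tested with a fake that yields a scripted sequence of job bodies.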

## Environment Variables

| Variable | Required | Description | Default |
|----------|----------|-------------|---------|
| `CRYSTALLISE_OPENAI_API_KEY` | No | Fallback OpenAI API key. Can be overridden per-request via `X-OpenAI-API-Key` header. | — |
| `CRYSTALLISE_API_KEYS` | No | Comma-separated valid service API keys. Empty = dev mode (accept any non-empty key). | — |
| `CRYSTALLISE_DATABASE_URL` | No | PostgreSQL connection string. Empty = local SQLite file. | SQLite |
| `CRYSTALLISE_DEFAULT_MODEL` | No | Default screening model | `gpt-5-nano` |
| `CRYSTALLISE_CLUSTERING_MODEL` | No | Clustering model (needs 400K context) | `gpt-4.1` |
| `CRYSTALLISE_API_HOST` | No | Host to bind | `0.0.0.0` |
| `CRYSTALLISE_API_PORT` | No | Port to bind | `8005` |

See `.env.example` for a ready-to-copy template.
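
For orientation, the table can be read as the following defaults-plus-overrides logic. This is a sketch of the documented semantics, not the settings code under `src/crystallise/config/`; the function name and the keys of the returned dictionary are hypothetical.

```python
import os
from collections.abc import Mapping

DEFAULTS = {
    "CRYSTALLISE_OPENAI_API_KEY": "",
    "CRYSTALLISE_API_KEYS": "",
    "CRYSTALLISE_DATABASE_URL": "",
    "CRYSTALLISE_DEFAULT_MODEL": "gpt-5-nano",
    "CRYSTALLISE_CLUSTERING_MODEL": "gpt-4.1",
    "CRYSTALLISE_API_HOST": "0.0.0.0",
    "CRYSTALLISE_API_PORT": "8005",
}


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve each variable, treating an empty value the same as unset."""
    values = {name: env.get(name) or default for name, default in DEFAULTS.items()}
    raw_keys = values["CRYSTALLISE_API_KEYS"]
    return {
        "openai_api_key": values["CRYSTALLISE_OPENAI_API_KEY"] or None,
        "api_keys": [k.strip() for k in raw_keys.split(",") if k.strip()],  # [] means dev mode
        "database_url": values["CRYSTALLISE_DATABASE_URL"] or None,  # None means local SQLite
        "default_model": values["CRYSTALLISE_DEFAULT_MODEL"],
        "clustering_model": values["CRYSTALLISE_CLUSTERING_MODEL"],
        "host": values["CRYSTALLISE_API_HOST"],
        "port": int(values["CRYSTALLISE_API_PORT"]),
    }
```

Calling `load_settings({})` yields the defaults from the table, which is a convenient way to see what an unconfigured deployment would run with.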

## Project Layout

```
api/                          FastAPI HTTP layer
├── main.py                   App factory, CORS, auth middleware, health endpoints
├── auth_middleware.py        API key validation
├── dependencies.py           Shared FastAPI deps (OpenAI client)
├── openai_key.py             Per-request OpenAI key resolution
├── utils.py                  classify_and_raise() for LLM errors → HTTPException
├── routers/                  One router per service (screening, indexer, criteria, config)
└── schemas/                  Pydantic request/response models

src/crystallise/              Pure Python package (no FastAPI deps)
├── llm/                      OpenAI client, retry with backoff, cost tracking, error taxonomy
├── screening/                4-stage screening pipeline + MockAIService
├── indexer/                  AutoIndexer: field extraction, refinement, grouping
├── criteria/                 Criteria AI: generate, refine, consolidate, PICO
├── prompts/                  Centralised prompt registry + metadata
├── batch/                    ThreadPoolExecutor (sync) + async semaphore runners
├── db/                       SQLite/PostgreSQL abstraction (job state only)
├── config/                   Settings, service registry, model capabilities
├── common/                   JSON parsing, export helpers, HTTP session
└── openai_resources/         File uploads, vector store management

tests/
├── api/                      FastAPI endpoint tests (use TestClient + auth bypass)
├── unit/                     Pure unit tests for the crystallise package
├── integration/              Live OpenAI smoke test (auto-skips without a key)
└── conftest.py               Shared fixtures; auto-bypasses auth middleware in tests

scripts/
├── smoke_ai_endpoints.py     End-to-end CLI: hits every AI endpoint with real fixtures
├── _fixtures.py              CSV → request-payload loader (single source of truth)
├── view_results.html         Self-contained HTML viewer for the JSON dumps
└── output/                   Per-run JSON dumps (two reference runs committed)

data/                         Real-data fixtures consumed by scripts/_fixtures.py
├── README.md                 Per-file column reference + endpoint-input mapping
├── screening/                Citations, criteria, questions, project description
└── indexing/                 Indexer records + the 11 extraction-field specs

docs/
├── md/                       GitHub-rendered Markdown — primary doc surface
│   ├── api-reference.md      Every endpoint: request, response, examples
│   ├── backend-guide.md      Python-package internals (retry, cost, error taxonomy)
│   ├── playbook.md           First-hour integration walkthrough
│   ├── recipes.md            PHP / Guzzle / Laravel snippets
│   ├── glossary.md           Systematic-review terminology
│   └── troubleshooting.md    Symptom → diagnosis → fix entries
└── html/                     Same content as styled standalone HTML (offline browsing)
```

## Architecture

**Dual-layer design**: `crystallise` Python package (importable, testable, no FastAPI deps) + `api/` FastAPI service (HTTP, auth, async jobs). Defaults: `gpt-5-nano` for screening, `gpt-5-mini` for indexer + criteria, `gpt-4.1` for clustering (needs the 400K context window).

For the layer-by-layer breakdown — what's persisted vs ephemeral, retry/cost/error semantics, why the mock pipeline mirrors the real one verbatim, and the full model-routing rationale — see [`docs/md/backend-guide.md`](docs/md/backend-guide.md).

## Testing

All 314 offline tests run in ~2 seconds (mocked OpenAI, no network):

```bash
# With the venv activated (see Quick Start):
pytest tests/ -v
ruff check src/ api/ tests/

# Or without activating, using explicit venv paths:
.venv/bin/python -m pytest tests/ -v
.venv/bin/ruff check src/ api/ tests/
```

Test layout:

- `tests/api/` — FastAPI endpoint tests using `TestClient`. Auth middleware is bypassed via an autouse fixture in `tests/conftest.py`.
- `tests/unit/` — focused unit tests for the `crystallise` package (retry, cost, pipeline, prompts, etc.).
- `tests/integration/` — **live OpenAI smoke test** (see below).

### Live OpenAI integration test

`tests/integration/test_openai_live.py` makes a real call to OpenAI to verify the full stack works against your key. It auto-skips when no key is configured, so it never breaks a default `pytest` run.

```bash
# Run it (requires a real OpenAI key):
CRYSTALLISE_OPENAI_API_KEY=sk-... .venv/bin/python -m pytest tests/integration -v

# Or select by marker:
.venv/bin/python -m pytest -m integration -v
```

`OPENAI_API_KEY` is accepted as an alternative to `CRYSTALLISE_OPENAI_API_KEY`. Cost per run is a fraction of a cent on `gpt-4.1`.

## End-to-End Smoke Test

`scripts/smoke_ai_endpoints.py` drives **every AI endpoint** with the real CSV fixtures under `data/` and reports pass/fail per endpoint. It is a separate tool from `pytest` — pytest validates code; this validates that the **deployed service** answers correctly.

Two-pass execution:

1. **Mock pass** — every endpoint that supports `mock: true` (full corpus, free, ~30s).
2. **Live pass** — all 14 AI endpoints against OpenAI on a downsampled 20-paper / 20-record subset (~5 min, ~$0.30–$0.80 with `gpt-5-nano` / `gpt-5-mini`).

```bash
# Two-pass run (mock → halts on failure → live)
python scripts/smoke_ai_endpoints.py

# Mock-only (no OpenAI key needed; safe for CI)
python scripts/smoke_ai_endpoints.py --mock-only

# Live-only with a smaller sample for fast iteration
python scripts/smoke_ai_endpoints.py --live-only --live-sample 5

# Pick subset of endpoint families
python scripts/smoke_ai_endpoints.py --only screening,indexer
```
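
The default two-pass behaviour (mock, halt on failure, then live) amounts to the control flow below. This is a conceptual sketch rather than an excerpt from `scripts/smoke_ai_endpoints.py`: the function name and the `{endpoint: passed}` result shape are assumptions.

```python
from typing import Callable, Optional


def run_two_pass(
    run_mock: Callable[[], dict], run_live: Callable[[], dict]
) -> dict:
    """Run the mock pass first; run the live pass only if every mock check passed."""
    report: dict[str, Optional[dict]] = {"mock": run_mock(), "live": None}
    if not all(report["mock"].values()):
        return report  # halt here so a broken build never reaches the paid pass
    report["live"] = run_live()
    return report
```

Gating the live pass on a clean mock pass means request-shape regressions are caught by the free run first.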

**Auth & OpenAI key:** the script defaults to `--api-key dev-key`, which is accepted in dev mode (empty `CRYSTALLISE_API_KEYS`). The OpenAI key is resolved by the **backend** (not the script) via `CRYSTALLISE_OPENAI_API_KEY` in `.env`, so no host-shell env var is needed in the typical Docker setup.

Each run writes JSON dumps (request + response per endpoint, plus `_inputs.json` capturing the full fixture snapshot) to `scripts/output/<ISO timestamp>/`. Two reference runs are checked in:

- `scripts/output/20260424T133623Z/` — full mock + live pass across screening, criteria, and indexer.
- `scripts/output/20260520T142002Z/` — indexer-only live pass on `gpt-5-nano` / `gpt-5-mini`, captured after aligning `get_openai_client` with the shared `CRYSTALLISE_OPENAI_API_KEY` chain (commit `45f64f4`).
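
The directory names of the reference runs suggest a compact UTC timestamp. Assuming that format (inferred from the committed names, not from the script source), a run folder name can be produced and the newest run located as follows; both helper names are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def run_dir_name(moment: datetime) -> str:
    """Format a timezone-aware moment as a compact UTC timestamp."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def latest_run(output_root: Path) -> Optional[Path]:
    """Return the most recent run directory under output_root, or None."""
    runs = sorted(p for p in output_root.iterdir() if p.is_dir())
    return runs[-1] if runs else None  # these names sort chronologically


example = run_dir_name(datetime(2026, 4, 24, 13, 36, 23, tzinfo=timezone.utc))
print(example)  # 20260424T133623Z
```

Because the timestamp is zero-padded and ordered from year to second, a plain lexical sort is enough to find the newest run.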

### Results viewer

`scripts/view_results.html` is a self-contained HTML page (no build, no CDN, no upload) that renders any run directory. Open it locally and click **📁 Select results folder**:

```bash
xdg-open scripts/view_results.html               # Linux (on macOS: open scripts/view_results.html)
# or serve via local HTTP if your browser blocks file:// folder pickers
python -m http.server 8080 --directory scripts
# then visit http://localhost:8080/view_results.html
```

What you get:

- Pass/fail summary per pass
- Test inputs snapshot (project, criteria, questions, indexer fields, papers — all collapsible)
- Per-endpoint cards grouped by **mock / live × screening / criteria / indexer**
- Smart drill-downs: screening papers as a table with score + decision; indexer records as a table with one column per extracted field showing `value (confidence%)`
- Failed endpoints auto-expand with the error body in red

### Fixture inventory

See [`data/README.md`](data/README.md) for the per-file column reference and endpoint-input mapping. The 11 indexer extraction fields live in `data/indexing/TestMVP2_extraction_fields.csv` (the `examples` column is JSON-encoded so it round-trips through Excel/Sheets).
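
Reading a CSV whose `examples` column is JSON-encoded takes one extra decoding step per row. The sketch below shows the idea on an in-memory sample; only the `examples` column name comes from this README, while the `name` column and the sample values are invented for illustration (see `data/README.md` for the real columns).

```python
import csv
import io
import json


def load_fields(csv_text: str) -> list[dict]:
    """Parse extraction-field rows, decoding the JSON-encoded `examples` column."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get("examples") or "").strip()
        row["examples"] = json.loads(raw) if raw else []  # empty cell becomes []
        rows.append(row)
    return rows


# Hypothetical sample: CSV escapes each inner double quote by doubling it.
sample = 'name,examples\nstudy_design,"[""RCT"", ""cohort study""]"\n'
fields = load_fields(sample)
print(fields[0]["examples"])
```

Storing the list as a JSON string keeps it inside a single cell, so spreadsheet tools treat it as opaque text rather than splitting it across columns.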

## Documentation

Markdown lives in `docs/md/` (renders inline on GitHub); the same content as styled standalone HTML lives in `docs/html/` for offline browsing.

| Topic | Markdown (GitHub) | HTML (offline) |
|---|---|---|
| Every endpoint, request/response shape, examples | [`api-reference.md`](docs/md/api-reference.md) | [`api-reference.html`](docs/html/api-reference.html) |
| Python-package internals — retry, cost, error taxonomy, mock-real parity | [`backend-guide.md`](docs/md/backend-guide.md) | [`backend-guide.html`](docs/html/backend-guide.html) |
| First-hour integration walkthrough (auth → go-live) | [`playbook.md`](docs/md/playbook.md) | [`playbook.html`](docs/html/playbook.html) |
| PHP / Guzzle / Laravel snippets for common flows | [`recipes.md`](docs/md/recipes.md) | [`recipes.html`](docs/html/recipes.html) |
| Symptom → diagnosis → fix for the top confusion points | [`troubleshooting.md`](docs/md/troubleshooting.md) | [`troubleshooting.html`](docs/html/troubleshooting.html) |
| Systematic-review terminology used throughout the API | [`glossary.md`](docs/md/glossary.md) | [`glossary.html`](docs/html/glossary.html) |

**Companion artifacts:**

- [`data/README.md`](data/README.md) — what's in each fixture CSV under `data/screening/` and `data/indexing/`, with column → API-field mapping
- [`scripts/view_results.html`](scripts/view_results.html) — open locally to browse the JSON dumps produced by `scripts/smoke_ai_endpoints.py`
- Swagger UI at `http://localhost:8005/docs` and ReDoc at `http://localhost:8005/redoc` when the server is running