Crystallise AI Backend API Reference
Stateless AI processing service for Evidence Mapper integration
Base URL: http://localhost:8005 | All routes at /v1/* and /*
This reference covers every HTTP endpoint on the Crystallise AI backend. The service provides three AI capabilities for systematic literature reviews —
AI Screening (scoring studies against eligibility criteria),
AutoIndexer (extracting structured fields from title/abstract), and
Criteria AI (building/refining eligibility criteria and analysing research questions).
Every endpoint is stateless: Evidence Mapper (EM) owns all persistent data. Start with Health to verify the service is reachable, then read Configuration for service defaults, then move on to the three capability sections.
Service Auth
X-API-Key: <key> or Authorization: Bearer <key> on all non-public requests.
OpenAI Key
X-OpenAI-API-Key: sk-... per-request passthrough. Falls back to server env var.
Async Jobs
Screening and Indexer batches: POST /jobs, poll GET /jobs/{id}.
Sync Jobs
Criteria AI and POST /indexer/run: one request, one response. No polling needed.
Mock Mode
Every mutating endpoint accepts "mock": true — canned data, no OpenAI call, no key needed.
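The header conventions above can be collected in one small client-side helper. This is a minimal sketch: the function name build_headers and its parameters are illustrative and not part of the service; only the header names come from this reference.

```python
from typing import Dict, Optional


def build_headers(
    service_key: str,
    openai_key: Optional[str] = None,
    use_bearer: bool = False,
) -> Dict[str, str]:
    """Build request headers for a non-public endpoint.

    build_headers is a hypothetical helper; only the header names are
    taken from this reference.
    """
    headers = {"Content-Type": "application/json"}
    if use_bearer:
        # Alternative form of service auth.
        headers["Authorization"] = f"Bearer {service_key}"
    else:
        headers["X-API-Key"] = service_key
    if openai_key:
        # Per-request passthrough; when omitted, the server falls back
        # to its own environment variable.
        headers["X-OpenAI-API-Key"] = openai_key
    return headers


print(build_headers("dev-key"))
```

With "mock": true in the request body, the OpenAI key argument can be left out, since mock mode makes no OpenAI call.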
Health
Public endpoints for uptime monitoring. /health is a liveness probe (is the process running?); /health/ready also verifies database connectivity and OpenAI-key presence. Neither requires authentication.
/health returns { "status": "ok" }. It runs no checks and only answers "am I running?".
Response codes: 200
/health/ready runs the database and OpenAI-key checks and returns one of the two bodies below.
Response 200 — healthy
{"status": "ready", "checks": {"database": "ok", "openai_key": "configured"}}
Response 503 — degraded
{"status": "degraded", "checks": {"database": "error: connection refused", "openai_key": "missing"}}
Response codes: 200, 503
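A monitoring script usually wants to know which readiness check failed, not only that the status is degraded. The sketch below reads the checks object from the two example bodies. Treating every value other than "ok" or "configured" as a failure is an assumption of this sketch, and failing_checks is a hypothetical helper name.

```python
from typing import Any, Dict, List

# Values the example responses show for a passing check. Treating any
# other value as a failure is an assumption of this sketch.
PASSING_VALUES = {"ok", "configured"}


def failing_checks(body: Dict[str, Any]) -> List[str]:
    """Return the sorted names of readiness checks that did not pass."""
    checks = body.get("checks", {})
    return sorted(
        name for name, value in checks.items() if value not in PASSING_VALUES
    )


degraded = {
    "status": "degraded",
    "checks": {"database": "error: connection refused", "openai_key": "missing"},
}
print(failing_checks(degraded))  # ['database', 'openai_key']
```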
Configuration
Inspect and, with care, update the runtime model, temperature, and prompt settings for each AI service. The read endpoints list service configs and the centralised prompt registry; the single PUT is for admin-level tuning. Everything here is metadata: no user data flows through.
Response 200 — array of ServiceConfigResponse
| Field | Type | Description |
| service_id | string | e.g. screening, extraction, criteria |
| model | string | |
| system_prompt | string | Current prompt (may be inherited from registry) |
| prompt_template_id | string | |
| temperature | float | |
| max_output_tokens | integer | |
| extra | dict | Service-specific knobs |
Response codes: 200, 401
Returns a single ServiceConfigResponse. Unknown service_id returns a default config rather than 404.
Response codes: 200, 401
Request Body — all fields optional, patch-style
| Field | Type | Required |
| model | string | No |
| system_prompt | string | No |
| prompt_template_id | string | No |
| temperature | float | No |
| max_output_tokens | integer | No |
| extra | dict | No |
Response 200
The updated ServiceConfigResponse (same fields as GET).
Example
curl -s -X PUT http://localhost:8005/v1/config/services/screening \
-H "X-API-Key: dev-key" -H "Content-Type: application/json" \
-d '{"model": "gpt-5-mini", "temperature": 0.2}'
{
"service_id": "screening",
"model": "gpt-5-mini",
"temperature": 0.2,
"max_output_tokens": 8192,
...
}
Response codes: 200, 400, 401, 500
Response 200 — array of prompt metadata
| Field | Type | Description |
| name | string | e.g. criteria.question_analysis |
| service | string | screening, criteria, indexer |
| description | string | One-line purpose |
| has_variables | boolean | Whether the prompt has templated parameters |
| system_or_user | string | system, user, or both |
Response codes: 200, 401
AI Screening
Title/abstract screening pipeline in four stages — scoring each paper against eligibility criteria across multiple AI "repetitions", generating human-readable reasoning, grouping that reasoning into thematic clusters, and assigning each paper to a cluster. Designed for batch runs: submit a job, poll for results. Typical throughput: hundreds to low-thousands of papers per run.
Core
Request Body
| Field | Type | Required |
| papers | dict[] | Yes — each with id, title, abstract |
| criteria | dict[] | No |
| questions | string[] | No |
| model | string | No (default: gpt-5-nano) |
| repetitions | integer | No (default: 5) |
| threshold | float | No (default: 1.0) |
| clusters_type | "include" | "exclude" | No |
| project_id | integer | No — opaque correlation key from EM |
| mock | boolean | No — deterministic scoring, no OpenAI call |
| max_estimated_cost_usd | float | No — job rejected with 400 if estimate exceeds this |
Response 200
| Field | Type | Description |
| job_id | string | UUID for polling |
| status | string | "pending" initially |
| progress | float | 0.0 initially; 0–1 once running |
| stage | string | Empty initially; "labelling", "reasoning", "clustering", "assignment" while running |
Example (success)
curl -s -X POST http://localhost:8005/v1/screening/jobs \
-H "X-API-Key: dev-key" \
-H "Content-Type: application/json" \
-d '{
"papers": [
{"id": "p1", "title": "RCT of drug X in adults", "abstract": "Randomized trial..."}
],
"criteria": [{"name": "Population", "type": "include", "value": "Adults"}],
"questions": ["Is drug X effective?"],
"mock": true
}'
{"job_id": "abc-123", "status": "pending", "progress": 0.0, "stage": ""}
Example (error — cost cap exceeded)
{"detail": "Estimated cost $1.23 exceeds max_estimated_cost_usd=$0.50"}
4-stage pipeline: labelling → reasoning → clustering → assignment. Poll GET /screening/jobs/{job_id} for progress.
Response codes: 200, 400, 401, 500
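Catching a malformed paper before the request is sent avoids a round trip that ends in a 400. The sketch below assembles a request body from the fields in the table above and checks that each paper carries id, title, and abstract. The helper name build_screening_job is hypothetical, and only a subset of the optional fields is covered.

```python
from typing import Any, Dict, List, Optional

REQUIRED_PAPER_KEYS = ("id", "title", "abstract")


def build_screening_job(
    papers: List[Dict[str, Any]],
    criteria: Optional[List[Dict[str, Any]]] = None,
    questions: Optional[List[str]] = None,
    max_estimated_cost_usd: Optional[float] = None,
    mock: bool = False,
) -> Dict[str, Any]:
    """Assemble a request body for POST /v1/screening/jobs.

    Raises ValueError if a paper lacks one of the required keys.
    """
    for index, paper in enumerate(papers):
        missing = [key for key in REQUIRED_PAPER_KEYS if key not in paper]
        if missing:
            raise ValueError(f"paper {index} is missing {missing}")
    body: Dict[str, Any] = {"papers": papers, "mock": mock}
    optional = {
        "criteria": criteria,
        "questions": questions,
        "max_estimated_cost_usd": max_estimated_cost_usd,
    }
    # Leave unset fields out so the server applies its own defaults.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


body = build_screening_job(
    papers=[{"id": "p1", "title": "RCT of drug X in adults", "abstract": "Randomized trial..."}],
    questions=["Is drug X effective?"],
    mock=True,
)
print(sorted(body))  # ['mock', 'papers', 'questions']
```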
Response 200
| Field | Type | Description |
| job_id | string | |
| status | string | "pending", "running", "completed", "failed" |
| progress | float | 0–1 |
| stage | string | Current pipeline stage |
| results | dict[] | Per-paper scores + reasoning (when completed) |
| clusters | dict[] | Reason clusters (when completed) |
| error | string | Error message (when failed) |
| error_category | string | Same taxonomy as error_code |
| error_retryable | boolean | |
| duration_ms | integer | Wall-clock duration |
| estimated_cost_usd | float | Final cost (approximate) |
| stage_timings | dict | Per-stage duration in ms |
Example
curl -s http://localhost:8005/v1/screening/jobs/abc-123 -H "X-API-Key: dev-key"
{
"job_id": "abc-123",
"status": "completed",
"progress": 1.0,
"stage": "assignment",
"results": [{"id": "p1", "final_score": 4.2, "cluster_id": 1, "reasoning": "..."}],
"clusters": [{"cluster_id": 1, "label": "Eligible RCTs", "count": 1}],
"duration_ms": 3421,
"estimated_cost_usd": 0.002
}
Response codes: 200, 401, 404 (unknown job_id)
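The submit-then-poll flow can be sketched as a loop that stops on a terminal status. The HTTP call is passed in as a function so the example runs without a server; poll_job, fetch_job, interval_s, and max_polls are illustrative names, and the polling interval is a guess rather than a documented recommendation.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"completed", "failed"}


def poll_job(
    fetch_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval_s: float = 2.0,
    max_polls: int = 300,
) -> Dict[str, Any]:
    """Poll until the job reaches a terminal status and return its body.

    fetch_job is supplied by the caller and is expected to perform
    GET /v1/screening/jobs/{job_id} and return the decoded JSON.
    """
    for _ in range(max_polls):
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        time.sleep(interval_s)
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")


# Stand-in for the HTTP call so the sketch runs offline.
_responses = iter([
    {"job_id": "abc-123", "status": "pending", "progress": 0.0, "stage": ""},
    {"job_id": "abc-123", "status": "running", "progress": 0.5, "stage": "reasoning"},
    {"job_id": "abc-123", "status": "completed", "progress": 1.0, "stage": "assignment"},
])
final = poll_job(lambda _job_id: next(_responses), "abc-123", interval_s=0)
print(final["status"])  # completed
```

A "failed" job is also returned rather than raised, because failures arrive inside an HTTP 200 body; the caller inspects error_category and error_retryable afterwards.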
Query Parameters
| Param | Type | Default |
| limit | integer | 50 |
Response 200 — array of ScreeningJobListItem
| Field | Type | Description |
| job_id | string | |
| status | string | |
| progress | float | |
| stage | string | |
| papers_count | integer | |
| model | string | |
| project_id | integer | From EM, if provided at create |
| duration_ms | integer | |
| estimated_cost_usd | float | |
| created_at | string | ISO 8601 timestamp |
| completed_at | string | ISO 8601 timestamp (when completed) |
Response codes: 200, 401
Query Parameters
| Param | Type | Required |
| project_id | integer | Yes |
Returns the running/pending job for this project_id, or null. Use it to deduplicate: EM should not start a new job while one is already running or pending for the same project.
Response codes: 200, 401
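The deduplication rule can be sketched as "check for an active job, and only create one if none exists". Both HTTP calls are injected as functions so the example runs offline; start_unless_active, get_active_job, and create_job are hypothetical names for client-side code, not part of the service.

```python
from typing import Any, Callable, Dict, Optional

Job = Dict[str, Any]


def start_unless_active(
    get_active_job: Callable[[int], Optional[Job]],
    create_job: Callable[[Job], Job],
    project_id: int,
    body: Job,
) -> Job:
    """Return the job already in flight for project_id, or start a new one."""
    active = get_active_job(project_id)
    if active is not None:
        return active
    # Attach project_id so the new job can be found by the same lookup later.
    return create_job({**body, "project_id": project_id})


created = []


def fake_create(body: Job) -> Job:
    """Stand-in for POST /v1/screening/jobs."""
    created.append(body)
    return {"job_id": "new-1", "status": "pending"}


job = start_unless_active(lambda _pid: None, fake_create, 42, {"mock": True})
print(job["job_id"])  # new-1
```

The check and the create are two separate requests, so this sketch does not protect against two callers racing each other; it only avoids the common case of a repeated submit.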
Optional
Request Body
| Field | Type | Required |
| papers_count | integer | Yes |
| model | string | No (default: gpt-5-nano) |
| repetitions | integer | No (default: 5) |
| criteria_count | integer | No (default: 0) |
Response 200
| Field | Type | Description |
| estimated_input_tokens | integer | Sum across labelling + reasoning stages |
| estimated_output_tokens | integer | |
| estimated_cost_usd | float | Based on hardcoded model pricing |
| model | string | Echoes request |
| papers_count | integer | Echoes request |
| repetitions | integer | Echoes request |
| confidence | string | "approximate" |
| disclaimer | string | Expected variance (±30%) |
Example
curl -s -X POST http://localhost:8005/v1/screening/estimate \
-H "X-API-Key: dev-key" -H "Content-Type: application/json" \
-d '{"model": "gpt-5-nano", "papers_count": 500, "repetitions": 5, "criteria_count": 10}'
{
"estimated_input_tokens": 775000,
"estimated_output_tokens": 75000,
"estimated_cost_usd": 0.0625,
"model": "gpt-5-nano",
"papers_count": 500,
"repetitions": 5,
"confidence": "approximate",
"disclaimer": "Estimate based on empirical averages. Actual cost may vary +-30%..."
}
Caveat: pricing is hardcoded in crystallise.llm.cost.DEFAULT_PRICING_PER_1M and may drift from OpenAI's public pricing over time. Treat estimated_cost_usd as approximate and cross-check against OpenAI's current rates before relying on it for budget caps.
Response codes: 200, 400, 401, 500
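The arithmetic behind a token-based estimate, and the ±30% band from the disclaimer, can be worked through as below. The prices are hypothetical round numbers chosen to keep the sums easy to follow; they are not the values in DEFAULT_PRICING_PER_1M, and the function names are illustrative.

```python
from typing import Dict, Tuple

# Hypothetical USD prices per 1M tokens as (input, output).
# These are NOT the service's values.
EXAMPLE_PRICING_PER_1M: Dict[str, Tuple[float, float]] = {
    "example-model": (1.00, 2.00),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost = input tokens x input price + output tokens x output price."""
    price_in, price_out = EXAMPLE_PRICING_PER_1M[model]
    return (
        input_tokens / 1_000_000 * price_in
        + output_tokens / 1_000_000 * price_out
    )


def variance_band(cost_usd: float, variance: float = 0.30) -> Tuple[float, float]:
    """Lower and upper bound implied by a symmetric +/- variance."""
    return cost_usd * (1 - variance), cost_usd * (1 + variance)


cost = estimate_cost_usd("example-model", 775_000, 75_000)
low, high = variance_band(cost)
print(round(cost, 4))  # 0.925
print(round(low, 4), round(high, 4))  # 0.6475 1.2025
```

When choosing a value for max_estimated_cost_usd, the upper end of the band is arguably the safer reference point, since the cap is compared against the estimate and not against the eventual bill.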
AutoIndexer
Structured data extraction from title + abstract. Define your extraction fields (by hand, or via the optional AI-suggest / AI-refine helpers), submit a batch, and receive per-paper values with evidence spans (the quote that justified each value) and per-field confidence scores. Use POST /indexer/run for small batches synchronously or POST /indexer/jobs for larger batches asynchronously.
Core
Request Body
| Field | Type | Required |
| records | dict[] | Yes — each with ID, Title, Abstract |
| fields | IndexerField[] | Yes — see Shared Types |
| model | string | No (default: gpt-5-mini) |
| project_context | ProjectContext | No — {description, research_questions} |
| mode | "test" | "sample" | "full" | No (default: full). test processes first 5 records, sample 20, full all. |
| max_workers | integer | No (default: 4) |
| batch_size | integer | No (default: 50) |
| project_id | integer | No — opaque correlation key |
Response 200
| Field | Type | Description |
| results | dict[] | One per record with extracted field values + evidence + confidence |
| errors | string[] | Per-record error messages |
| usage | dict | Token usage: {input_tokens, output_tokens, total_tokens, estimated_cost_usd} |
| model_version | string | Actual OpenAI model string returned (may include date suffix) |
Example (success)
curl -s -X POST http://localhost:8005/v1/indexer/run \
-H "X-API-Key: dev-key" -H "Content-Type: application/json" \
-d '{
"records": [{"ID": "p1", "Title": "RCT of drug X", "Abstract": "150 adults..."}],
"fields": [
{"name": "study_design", "description": "Type of study", "data_type_primary": "text"},
{"name": "sample_size", "description": "Number of participants", "data_type_primary": "number"}
],
"mode": "test"
}'
{
"results": [{
"ID": "p1",
"indexing_status": "ok",
"study_design": {"value": "RCT", "confidence": 0.95, "evidence": [...]},
"sample_size": {"value": 150, "confidence": 0.9, "evidence": [...]}
}],
"errors": [],
"usage": {"input_tokens": 320, "output_tokens": 85, "total_tokens": 405, "estimated_cost_usd": 0.0002},
"model_version": "gpt-5-mini-2025-02-01"
}
Example (error — invalid field type)
{"detail": {"message": "...", "error_code": "validation", "retryable": false}}
Response codes: 200, 400, 401, 429, 500
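Per-field confidence scores are most useful when they drive a review queue. The sketch below flags fields in one result whose confidence falls under a caller-chosen threshold. The helper name and the 0.8 default are assumptions, and the evidence lists are left empty because the example response above elides them.

```python
from typing import Any, Dict, List


def fields_needing_review(
    result: Dict[str, Any],
    field_names: List[str],
    min_confidence: float = 0.8,
) -> List[str]:
    """Return requested fields that are missing or below min_confidence."""
    flagged = []
    for name in field_names:
        cell = result.get(name)
        if not isinstance(cell, dict):
            # Field absent from the result: always worth a human look.
            flagged.append(name)
        elif cell.get("confidence", 0.0) < min_confidence:
            flagged.append(name)
    return flagged


result = {
    "ID": "p1",
    "indexing_status": "ok",
    "study_design": {"value": "RCT", "confidence": 0.95, "evidence": []},
    "sample_size": {"value": 150, "confidence": 0.9, "evidence": []},
}
print(fields_needing_review(result, ["study_design", "sample_size"], 0.92))
# ['sample_size']
```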
Request Body
Accepts the same body as POST /v1/indexer/run. The main fields are repeated here for convenience.
| Field | Type | Required |
| records | dict[] | Yes — each with ID, Title, Abstract |
| fields | IndexerField[] | Yes |
| model | string | No (default: gpt-5-mini) |
| project_context | ProjectContext | No |
| mode | "test" | "sample" | "full" | No (default: full) |
| project_id | integer | No |
Response 200
| Field | Type | Description |
| job_id | string | UUID for polling |
| status | string | "pending" initially |
| progress | float | 0.0 initially |
Example
curl -s -X POST http://localhost:8005/v1/indexer/jobs \
-H "X-API-Key: dev-key" -H "Content-Type: application/json" \
-d '{"records": [...], "fields": [...], "mode": "full"}'
{"job_id": "xyz-789", "status": "pending", "progress": 0.0}
Response codes: 200, 400, 401, 500
Response 200
| Field | Type | Description |
| job_id | string | |
| status | string | "pending", "running", "completed", "failed" |
| progress | float | 0–1 |
| partial_results | dict[] | Records processed so far |
| errors | string[] | |
| usage | dict | Token usage to date |
| error | string | Terminal error message (when failed) |
| error_category | string | |
| error_retryable | boolean | |
| duration_ms | integer | |
| estimated_cost_usd | float | |
| model_version | string | |
| created_at | string | ISO 8601 |
| completed_at | string | ISO 8601 |
Response codes: 200, 401, 404 (unknown job_id)
Query Parameters
| Param | Type | Default |
| limit | integer | 50 |
Response 200 — array of IndexerJobListItem
| Field | Type | Description |
| job_id | string | |
| status | string | |
| progress | float | |
| model | string | |
| record_count | integer | |
| duration_ms | integer | |
| estimated_cost_usd | float | |
| created_at | string | ISO 8601 |
| completed_at | string | ISO 8601 |
Response codes: 200, 401
Query Parameters
| Param | Type | Required |
| project_id | integer | Yes |
Returns the running/pending indexer job for this project_id, or null.
Response codes: 200, 401
Optional
Request Body
| Field | Type | Required |
| fields | IndexerField[] | Yes |
| record_count | integer | Yes |
| model | string | No (default: gpt-5-mini) |
Response 200
| Field | Type | Description |
| estimated_input_tokens | integer | |
| estimated_output_tokens | integer | |
| estimated_cost_usd | float | Based on hardcoded pricing |
| confidence | string | "approximate" |
| disclaimer | string | ±30% expected variance |
Example
curl -s -X POST http://localhost:8005/v1/indexer/estimate \
-H "X-API-Key: dev-key" -H "Content-Type: application/json" \
-d '{"fields": [{"name": "study_design", "description": "...", "data_type_primary": "text"}], "record_count": 100}'
{
"estimated_input_tokens": 32000,
"estimated_output_tokens": 8500,
"estimated_cost_usd": 0.0234,
"confidence": "approximate",
"disclaimer": "Estimate based on empirical averages..."
}
Caveat: pricing is hardcoded in crystallise.llm.cost.DEFAULT_PRICING_PER_1M and may drift from OpenAI's public pricing. Treat this as a rough sizing, not a bill.
Response codes: 200, 400, 401, 500
Request Body
| Field | Type | Required |
| project_context | ProjectContext | No — description + research questions |
| pico | dict | No — PICOS elements from /criteria/picos |
| sample_records | dict[] | No — sample papers for grounding |
| existing_fields | string[] | No — field names already defined |
| model | string | No (default: gpt-4.1) |
| mock | boolean | No |
Response 200
| Field | Type | Description |
| fields | IndexerField[] | Suggested extraction fields |
| warnings | ExtractionWarning[] | Per-field risk flags (e.g. low-signal fields) |
Example
curl -s -X POST http://localhost:8005/v1/indexer/suggest-fields \
-H "X-API-Key: dev-key" -H "Content-Type: application/json" \
-d '{"project_context": {"description": "RCTs of exercise for depression"}, "mock": true}'
{
"fields": [
{"name": "study_design", "description": "Type of study", "data_type_primary": "text", "examples": ["RCT", "cohort"]},
{"name": "sample_size", "description": "Number of participants", "data_type_primary": "number"}
],
"warnings": []
}
Response codes: 200, 400, 401, 429, 500
Request Body
| Field | Type | Required |
| fields | IndexerField[] | Yes — current field set to review |
| project_context | ProjectContext | No |
| sample_records | dict[] | No — ground suggestions against real papers |
Response 200
| Field | Type | Description |
| suggestions | FieldSuggestion[] | Proposed add, modify, remove, or merge actions |
Example
curl -s -X POST http://localhost:8005/v1/indexer/refine-fields \
-H "X-API-Key: dev-key" -H "Content-Type: application/json" \
-d '{"fields": [{"name": "outcome", "description": "...", "data_type_primary": "text"}]}'
{
"suggestions": [
{"action": "modify", "field": {"name": "primary_outcome", ...}, "rationale": "...",
"original_field_name": "outcome"}
]
}
Response codes: 200, 400, 401, 429, 500
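Suggestions are proposals, so the client decides how to apply them. The sketch below applies add, modify, and remove actions to a field list and sets merge actions aside for manual review, because this reference does not spell out merge semantics. apply_suggestions is a hypothetical client-side helper, and the field descriptions in the demo are illustrative.

```python
from typing import Any, Dict, List, Optional, Tuple

Field = Dict[str, Any]
Suggestion = Dict[str, Any]


def _index_of(fields: List[Field], name: Optional[str]) -> Optional[int]:
    """Position of the field called name, or None if absent."""
    for index, field in enumerate(fields):
        if field["name"] == name:
            return index
    return None


def apply_suggestions(
    fields: List[Field], suggestions: List[Suggestion]
) -> Tuple[List[Field], List[Suggestion]]:
    """Apply add/modify/remove; return (new_fields, deferred_suggestions)."""
    result = list(fields)  # Work on a copy; the input list is untouched.
    deferred: List[Suggestion] = []
    for suggestion in suggestions:
        action = suggestion.get("action")
        position = _index_of(result, suggestion.get("original_field_name"))
        if action == "add" and "field" in suggestion:
            result.append(suggestion["field"])
        elif action == "modify" and position is not None and "field" in suggestion:
            result[position] = suggestion["field"]  # Keep the original order.
        elif action == "remove" and position is not None:
            del result[position]
        else:
            # Merges, and anything that cannot be matched, go to a human.
            deferred.append(suggestion)
    return result, deferred


fields = [{"name": "outcome", "description": "Outcome", "data_type_primary": "text"}]
suggestions = [{
    "action": "modify",
    "field": {"name": "primary_outcome", "description": "Primary outcome",
              "data_type_primary": "text"},
    "rationale": "More specific",
    "original_field_name": "outcome",
}]
new_fields, deferred = apply_suggestions(fields, suggestions)
print([field["name"] for field in new_fields])  # ['primary_outcome']
```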
Request Body
| Field | Type | Required |
| field_name | string | Yes — field the values belong to |
| values | string[] | Yes — extracted values to cluster |
| project_context | ProjectContext | No |
| num_groups_hint | integer | No |
Response 200
| Field | Type | Description |
| groups | TagGroup[] | Clustered buckets with labels |
| usage | dict | Token usage |
Example
curl -s -X POST http://localhost:8005/v1/indexer/group-tags \
-H "X-API-Key: dev-key" -H "Content-Type: application/json" \
-d '{"field_name": "study_design", "values": ["RCT", "randomised controlled trial", "cohort study", "case-control"]}'
{
"groups": [
{"name": "Randomised controlled trials", "values": ["RCT", "randomised controlled trial"], "rationale": "..."},
{"name": "Observational", "values": ["cohort study", "case-control"], "rationale": "..."}
],
"usage": {"total_tokens": 150, "estimated_cost_usd": 0.0002}
}
Response codes: 200, 400, 401, 429, 500
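To relabel extracted values with their group, the groups array can be inverted into a value-to-group lookup. A minimal sketch using the example response above; value_to_group is an illustrative name.

```python
from typing import Any, Dict, List


def value_to_group(groups: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map each member value to the name of the group that contains it."""
    lookup: Dict[str, str] = {}
    for group in groups:
        for value in group["values"]:
            lookup[value] = group["name"]
    return lookup


groups = [
    {"name": "Randomised controlled trials",
     "values": ["RCT", "randomised controlled trial"]},
    {"name": "Observational",
     "values": ["cohort study", "case-control"]},
]
lookup = value_to_group(groups)
print(lookup["cohort study"])  # Observational
```

If a value appeared in more than one group, the later group would win in this sketch; whether the service can return overlapping groups is not stated in this reference.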
Criteria AI
Helpers for building and refining the eligibility criteria a screening pipeline runs against. The core endpoint, /criteria/analyze-question, checks whether a single research question is PICOS-ready for a literature search; the optional endpoints generate criteria from context, extract PICOS elements, refine project descriptions, refine criteria from reviewer conflicts, or consolidate duplicate criteria. All endpoints are synchronous: one request, one response.
Core
Request Body
| Field | Type | Required |
| research_question | string | Yes |
| model | string | No (default: gpt-5-mini) |
| mock | boolean | No |
Response 200
| Field | Type | Description |
| status | string | "ready" or "could_improve" |
| missing_elements | string[] | PICOS elements that are unclear or absent |
| suggestion | string | Actionable improvement or confirmation message |
Example (success, mock)
curl -s -X POST http://localhost:8005/v1/criteria/analyze-question \
-H "X-API-Key: dev-key" -H "Content-Type: application/json" \
-d '{"research_question": "Does exercise help depression?", "mock": true}'
{
"status": "could_improve",
"missing_elements": [
"Population is not specified",
"Outcome measures are vague"
],
"suggestion": "Mock mode: specify the population, intervention, and primary outcome to make the question searchable. Run without mock for real analysis."
}
Example (error — missing required field)
{"detail": [{"type": "missing", "loc": ["body", "research_question"], "msg": "Field required"}]}
Demo lineage: this endpoint mirrors the behaviour of the Streamlit research-question demo (demo.py) shared with NetReady earlier. It's the recommended entry point for the "is this question ready for a literature search?" flow.
Response codes: 200, 400, 401, 429, 500
Optional
Request Body
| Field | Type | Required |
| project_description | string | Yes |
| research_questions | string[] | No |
| additional_notes | string | No |
| existing_criteria | dict[] | No — for deduplication |
| criterion_type | "include" | "exclude" | No (default: exclude) |
| model | string | No (default: gpt-4.1) |
| mock | boolean | No |
Response 200
| Field | Type | Description |
| criteria | CriterionResponse[] | Generated criteria — see Shared Types |
Example (success, mock)
curl -s -X POST http://localhost:8005/v1/criteria/generate \
-H "X-API-Key: dev-key" -H "Content-Type: application/json" \
-d '{"project_description": "RCTs of exercise for depression in adults", "mock": true}'
{
"criteria": [
{"category": "Study Design", "text": "Review articles, systematic reviews, meta-analyses", "criterion_type": "exclude", "description": "..."},
{"category": "Publication Type", "text": "Conference abstracts without full publication", "criterion_type": "exclude", "description": "..."}
]
}
Response codes: 200, 400, 401, 429, 500
Request Body
| Field | Type | Required |
| project_description | string | Yes |
| research_questions | string[] | No |
| model | string | No (default: gpt-4.1) |
| mock | boolean | No |
Response 200
| Field | Type | Description |
| elements | dict | Keys: population, intervention, comparison, outcome, study_design |
| gap_flags | string[] | Missing or ambiguous elements |
| contraindications | dict[] | Potential conflicts between elements |
Example (success, mock)
curl -s -X POST http://localhost:8005/v1/criteria/picos \
-H "X-API-Key: dev-key" -H "Content-Type: application/json" \
-d '{"project_description": "RCTs of metformin vs placebo in adults with type 2 diabetes", "mock": true}'
{
"elements": {
"population": "Adults with the condition described in the project",
"intervention": "The primary intervention or exposure under review",
"comparison": "Standard of care, placebo, or no intervention",
"outcome": "Primary clinical outcomes, efficacy, and safety measures",
"study_design": "Study designs relevant to the research question"
},
"gap_flags": ["Mock mode: PICOS elements are placeholders — run without mock for real extraction"],
"contraindications": []
}
Response codes: 200, 400, 401, 429, 500
Request Body
| Field | Type | Required |
| description | string | Yes |
| research_questions | string[] | No |
| model | string | No (default: gpt-4.1) |
| mock | boolean | No |
Response 200
| Field | Type | Description |
| refined_description | string | Improved, more specific project description |
| refined_research_questions | string[] | Questions rewritten for search precision |
| explanation | string | Why these refinements were made |
Example (success, mock)
curl -s -X POST http://localhost:8005/v1/criteria/refine-context \
-H "X-API-Key: dev-key" -H "Content-Type: application/json" \
-d '{"description": "Review of drug X", "research_questions": ["Is drug X effective?"], "mock": true}'
{
"refined_description": "Review of drug X\n\n[Refined for clarity and specificity in systematic review screening.]",
"refined_research_questions": ["Is drug X effective? [refined for precision]"],
"explanation": "Mock mode: minor refinements applied as placeholders. Run without mock for real AI refinement."
}
Response codes: 200, 400, 401, 429, 500
Request Body
| Field | Type | Required |
| current_criteria | dict[] | Yes — the active criteria set |
| conflicts | dict[] | No — AI-vs-human disagreement records |
| project_description | string | No |
| model | string | No (default: gpt-4.1) |
| mock | boolean | No |
Response 200
| Field | Type | Description |
| criteria | CriterionResponse[] | Refined criteria derived from the conflict patterns |
Example (success, mock)
curl -s -X POST http://localhost:8005/v1/criteria/refine \
-H "X-API-Key: dev-key" -H "Content-Type: application/json" \
-d '{
"current_criteria": [{"category": "Population", "text": "Adults only"}],
"conflicts": [{"paper_title": "Study A", "decision_a": "include", "decision_b": "exclude"}],
"mock": true
}'
{
"criteria": [
{"category": "Study Design", "text": "Exclude retrospective observational studies without a control arm", "criterion_type": "exclude", "confidence": 0.72, "rationale": "Derived from 1 reviewer conflict(s) on study design."},
{"category": "Outcome Reporting", "text": "Exclude studies that do not report the primary outcome quantitatively", "criterion_type": "exclude", "confidence": 0.65, "rationale": "Pattern across 1 conflict(s) flagged insufficient outcome data."}
]
}
Response codes: 200, 400, 401, 429, 500
Request Body
| Field | Type | Required |
| criteria | dict[] | Yes — criteria to analyse |
| project_description | string | No |
| research_questions | string[] | No |
| model | string | No (default: gpt-4.1) |
| mock | boolean | No |
Response 200
| Field | Type | Description |
| duplicate_groups | DuplicateGroup[] | Groups of criteria with overlapping scope — see Shared Types |
| consolidation_proposals | ConsolidationProposal[] | Proposed merged criteria |
| warnings | string[] | Low-confidence rejections or notes |
Example (success, mock)
curl -s -X POST http://localhost:8005/v1/criteria/consolidate \
-H "X-API-Key: dev-key" -H "Content-Type: application/json" \
-d '{"criteria": [{"id": 1, "category": "Population", "text": "Adults 18+"}, {"id": 2, "category": "Population", "text": "Adult participants over 18"}], "mock": true}'
{
"duplicate_groups": [],
"consolidation_proposals": [],
"warnings": ["Mock mode: no consolidation performed"]
}
Response codes: 200, 400, 401, 429, 500
Error Responses
The same status codes and body shapes apply everywhere — read this section once and cross-reference from each endpoint. Classified LLM errors carry a structured error_code and a retryable flag so client code can decide whether to back off, surface the message, or abort. Async jobs additionally report terminal errors inside the job response rather than as HTTP error codes.
| HTTP | error_code | Retryable | When you see it |
| 400 | validation | no | Malformed request body, missing required field, Pydantic validation failed |
| 401 | auth | no | Missing/invalid X-API-Key, or invalid X-OpenAI-API-Key |
| 404 | — | no | Resource not found (e.g. unknown job_id) |
| 429 | rate_limit | yes | OpenAI rate limit — retry with exponential backoff |
| 500 | unknown | — | Unexpected server error |
| 503 | — | — | /health/ready only, when DB or OpenAI key check fails |
Standard body — classified LLM error
{
"detail": {
"message": "Rate limit exceeded",
"error_code": "rate_limit",
"retryable": true
}
}
Standard body — FastAPI validation / missing resource
{ "detail": "Field required: research_question" }
The shape of detail varies across the examples in this reference: a plain string (the cost-cap error), a list of error objects (the missing-field error from /criteria/analyze-question), or the classified object shown above. Client code should accept all three.
Async job in-body failure (screening, indexer)
{
"job_id": "abc-123",
"status": "failed",
"error": "Invalid OpenAI key",
"error_category": "auth",
"error_retryable": false
}
Async jobs report terminal errors inside the job response (HTTP 200), not as HTTP error codes. Poll the job, check whether status is "failed", then read error_category and error_retryable to decide what to do next.
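Client code needs one place that turns an error response into a retry decision. The sketch below normalises the detail shapes that appear in this reference's examples (object, string, and list). Treating unclassified errors as non-retryable is an assumption of this sketch, and ApiError and parse_error are hypothetical names.

```python
from typing import Any, Dict, NamedTuple, Optional


class ApiError(NamedTuple):
    status_code: int
    error_code: Optional[str]
    retryable: bool
    message: str


def parse_error(status_code: int, body: Dict[str, Any]) -> ApiError:
    """Normalise an error body into a single ApiError record."""
    detail = body.get("detail")
    if isinstance(detail, dict):
        # Classified LLM error: carries error_code and retryable.
        return ApiError(
            status_code,
            detail.get("error_code"),
            bool(detail.get("retryable", False)),
            str(detail.get("message", "")),
        )
    if isinstance(detail, list):
        # Validation errors: join the individual messages.
        message = "; ".join(
            str(item.get("msg", item)) if isinstance(item, dict) else str(item)
            for item in detail
        )
    else:
        message = "" if detail is None else str(detail)
    # Assumption: anything unclassified is not retried automatically.
    return ApiError(status_code, None, False, message)


rate_limited = {"detail": {"message": "Rate limit exceeded",
                           "error_code": "rate_limit", "retryable": True}}
print(parse_error(429, rate_limited).retryable)  # True
```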
Shared Types
Data types referenced by multiple endpoints. These mirror the Pydantic models in api/schemas/ (source of truth) — documented once here to avoid per-endpoint repetition.
IndexerField
| Field | Type | Description |
| name | string | Field identifier (e.g. study_design) |
| description | string | What the AI should extract |
| data_type_primary | string | text, number, yes_no, list_text, list_number |
| data_type_secondary | string | Sub-type qualifier (default NA); optional |
| examples | string[] | Example values; optional |
| examples_mode | "guide" | "enum" | "guide" = suggestions; "enum" = strict list; optional |
| depth | "minimal" | "full" | Extraction effort level; optional |
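On the client side, the IndexerField table can be mirrored as a small dataclass that validates the primary type and omits unset optional fields. This is a sketch only: the Pydantic models in api/schemas/ remain the source of truth, and treating the five listed primary types as exhaustive is an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional

# Types listed for data_type_primary; assumed exhaustive for this sketch.
PRIMARY_TYPES = {"text", "number", "yes_no", "list_text", "list_number"}


@dataclass
class IndexerField:
    name: str
    description: str
    data_type_primary: str
    data_type_secondary: Optional[str] = None
    examples: Optional[List[str]] = None
    examples_mode: Optional[str] = None
    depth: Optional[str] = None

    def __post_init__(self) -> None:
        if self.data_type_primary not in PRIMARY_TYPES:
            raise ValueError(f"unknown data_type_primary: {self.data_type_primary}")

    def to_payload(self) -> Dict[str, Any]:
        """Dict for the request body, leaving out unset optional fields."""
        return {key: value for key, value in asdict(self).items() if value is not None}


field = IndexerField("sample_size", "Number of participants", "number")
print(field.to_payload())
```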
ProjectContext
| Field | Type | Description |
| description | string | Free-text project description |
| research_questions | string[] | |
CriterionResponse
| Field | Type | Description |
| category | string | PICOS category (Population, Intervention, Outcome, etc.) |
| text | string | The criterion itself |
| description | string | Expanded definition |
| criterion_type | "include" | "exclude" | |
| confidence | float | 0–1 AI confidence; optional |
| rationale | string | Why this criterion was suggested; optional |
| title_abstract_assessable | boolean | Whether the criterion can be decided from title/abstract alone |
DuplicateGroup
| Field | Type | Description |
| group_type | string | e.g. "exact", "semantic" |
| category | string | PICOS category these criteria share |
| criterion_ids | integer[] | IDs of criteria in this group |
| recommended_primary_id | integer | Which criterion to keep |
| merge_rationale | string | |
| ai_confidence | float | 0–1; groups below 0.75 are filtered out server-side |
ConsolidationProposal
| Field | Type | Description |
| category | string | |
| criterion_ids | integer[] | Criteria to merge |
| proposed_merged_criterion | string | New label — rejected server-side if > 10 words |
| proposed_description | string | |
| proposed_type | "include" | "exclude" | |
| merge_rationale | string | |
| ai_confidence | float | 0–1; proposals below 0.75 are filtered out server-side |
TagGroup
| Field | Type | Description |
| name | string | Group label |
| values | string[] | Member values |
| rationale | string | Why these cluster together; optional |
ExtractionWarning
| Field | Type | Description |
| field | string | Field name the warning applies to |
| risk_level | "low" | "medium" | "high" | Default medium |
| reason | string | Why the field is at risk (ambiguous, hard to extract from title/abstract, etc.) |
| suggested_fallback | string | Recommended mitigation |
FieldSuggestion
| Field | Type | Description |
| action | "add" | "modify" | "remove" | "merge" | |
| field | IndexerField | The proposed (new or revised) field |
| rationale | string | |
| original_field_name | string | For modify/remove/merge: which field this applies to; optional |
| target_field_name | string | For merge: the name to merge into; optional |