Crystallise AI Backend — Glossary

Systematic-review terminology used by the API and accompanying docs.

Terms

AutoIndexer

Structured field extraction from a paper's title and abstract, driven by an OpenAI function call that returns a typed JSON object. Each extracted field carries an evidence span quoting the source text and a confidence score so a human reviewer can audit the result.
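
As a rough illustration of the shape described above, the sketch below parses one extracted field into a typed object. The key names (`value`, `evidence_span`, `confidence`) and the payload are assumptions made for this example; the actual response schema may differ.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedField:
    """One extracted field; attribute names are illustrative assumptions."""
    name: str
    value: str
    evidence_span: str  # quote from the title or abstract
    confidence: float   # model's self-reported score in [0, 1]


def parse_fields(payload: str) -> list[ExtractedField]:
    """Parse a JSON object mapping field name -> {value, evidence_span, confidence}."""
    raw = json.loads(payload)
    return [
        ExtractedField(
            name=name,
            value=body["value"],
            evidence_span=body["evidence_span"],
            confidence=float(body["confidence"]),
        )
        for name, body in raw.items()
    ]


# Hypothetical payload written for this example, not a real API response.
example = json.dumps({
    "population": {
        "value": "adults with type 2 diabetes",
        "evidence_span": "in adults with type 2 diabetes",
        "confidence": 0.85,
    }
})
fields = parse_fields(example)
```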

Clustering (screening)

Stage 3 of the 4-stage screening pipeline. It groups the free-text reasoning produced in stage 2 into thematic buckets so a reviewer can see why papers were scored similarly rather than only seeing the scores.

Confidence (extraction)

The model's self-reported 0–1 score attached to each extracted field value by AutoIndexer. Treat it as a signal for human review, not as a calibrated probability — two fields with equal "0.85" confidence may not be equally reliable.
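
One way a client might use the score as a review signal is to queue low-confidence fields for a human, lowest first. This is a sketch only: the `review_order` helper and the 0.9 cutoff are hypothetical and carry no calibration guarantee.

```python
def review_order(confidences: dict[str, float], review_below: float = 0.9) -> list[str]:
    """Return field names scoring below `review_below`, least confident first.

    The cutoff is an arbitrary illustrative value, not an API default.
    """
    flagged = [(score, name) for name, score in confidences.items() if score < review_below]
    return [name for _score, name in sorted(flagged)]


# Hypothetical per-field confidence scores.
scores = {"population": 0.95, "outcome": 0.60, "study_design": 0.85}
queue = review_order(scores)
```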

Criteria (eligibility)

The include/exclude rules a screener applies to decide whether a study belongs in the review. Split into inclusion criteria and exclusion criteria, usually derived from the project's PICOS.

Evidence span

A verbatim quote taken from the title or abstract that supported an extracted value. Returned alongside each field by AutoIndexer so reviewers can verify the extraction without re-reading the source.
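
Because the span is meant to be a verbatim quote, a client can cheaply check that it really occurs in the source text. The sketch below is one possible check, assuming that differences in whitespace (such as line breaks in the abstract) should be tolerated; `span_is_verbatim` is a hypothetical helper, not part of the API.

```python
import re


def _collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace with a single space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def span_is_verbatim(span: str, title: str, abstract: str) -> bool:
    """True if the span occurs in the title or the abstract, ignoring whitespace layout."""
    needle = _collapse_whitespace(span)
    if not needle:
        return False  # an empty span supports nothing
    return needle in _collapse_whitespace(title) or needle in _collapse_whitespace(abstract)


title = "Metformin in adults with type 2 diabetes"
abstract = "We enrolled 120 adults\nwith type 2 diabetes."
```

A span that fails this check is a candidate for manual review, since the quote may have been paraphrased rather than copied.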

Exclusion criterion

A rule that disqualifies a study from the review, for example "non-English language" or "animal model only". A single matched exclusion criterion is enough to drop a paper regardless of how well it matches the inclusion criteria.

Gap flag (PICOS)

An element of PICOS that is missing or ambiguous in the project description, returned by POST /criteria/picos. Gap flags tell the caller which dimensions still need user input before the question is ready to drive a literature search.
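
To make the idea concrete, the sketch below lists PICOS elements that are absent or blank in a draft question. It only illustrates the concept on the client side: the element names and the `find_gaps` helper are assumptions, it does not detect ambiguous elements, and it says nothing about how POST /criteria/picos computes or formats its gap flags.

```python
from typing import Optional

# Element names are illustrative; the API's own field names may differ.
PICOS_ELEMENTS = ("population", "intervention", "comparator", "outcome", "study_design")


def find_gaps(picos: dict[str, Optional[str]]) -> list[str]:
    """Return PICOS elements that are absent or blank, in PICOS order."""
    return [name for name in PICOS_ELEMENTS if not (picos.get(name) or "").strip()]


# Hypothetical draft question with no comparator text and no study design.
draft = {
    "population": "adults with hypertension",
    "intervention": "ACE inhibitors",
    "comparator": "",
    "outcome": "all-cause mortality",
}
gaps = find_gaps(draft)
```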

Inclusion criterion

A rule that a study must satisfy to be considered in scope, for example "adult human participants" or "reports mortality outcomes". A study typically has to meet every inclusion criterion and match no exclusion criterion.
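
The combined rule (every inclusion criterion met, no exclusion criterion matched) can be sketched as a pair of predicate lists. The paper fields and rules below are hypothetical and exist only to show the logic.

```python
from typing import Callable

Paper = dict[str, str]
Criterion = Callable[[Paper], bool]


def is_eligible(paper: Paper, inclusion: list[Criterion], exclusion: list[Criterion]) -> bool:
    """Eligible only if no exclusion rule matches and every inclusion rule is met."""
    if any(rule(paper) for rule in exclusion):
        return False  # one matched exclusion criterion is enough to drop the paper
    return all(rule(paper) for rule in inclusion)


# Hypothetical rules over hypothetical paper fields.
inclusion = [
    lambda p: p["population"] == "adult human",
    lambda p: "mortality" in p["outcomes"],
]
exclusion = [
    lambda p: p["language"] != "English",
]

paper_a = {"population": "adult human", "outcomes": "mortality, readmission", "language": "English"}
paper_b = {"population": "adult human", "outcomes": "mortality", "language": "German"}
```

Here `paper_b` meets both inclusion rules but is still dropped, because the language rule excludes it.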

Labelling (screening)

Stage 1 of the screening pipeline. Each paper is scored on a 1–5 relevance scale across N repetitions, and the per-paper mean_score becomes the primary signal used by downstream stages and by the include threshold.
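
The per-paper aggregation amounts to an arithmetic mean over the repetition scores. A minimal sketch, assuming the scores are integers on the 1–5 scale (the `mean_score` function name simply mirrors the field name and is not taken from the backend code):

```python
from statistics import mean


def mean_score(scores: list[int]) -> float:
    """Average the 1-5 relevance scores from N labelling repetitions of one paper."""
    if not scores:
        raise ValueError("at least one repetition is required")
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("scores must lie on the 1-5 relevance scale")
    return float(mean(scores))


# Three hypothetical repetitions for one paper.
result = mean_score([4, 5, 4])
```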

Mock mode

Setting "mock": true in a request body returns a canned response shaped like the real one, without calling OpenAI. Useful for wiring up the integration, writing tests, or reproducing bugs without spending tokens.
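
A small sketch of building such a request body. Only the `"mock": true` flag comes from this glossary; the `description` field and the `build_body` helper are hypothetical, and no HTTP call is made here.

```python
import json


def build_body(payload: dict, mock: bool = False) -> str:
    """Serialise a request body, adding "mock": true when mock mode is requested."""
    body = dict(payload)  # copy so the caller's dict is left unchanged
    if mock:
        body["mock"] = True
    return json.dumps(body)


# Hypothetical payload; real field names are defined by each endpoint.
body = build_body({"description": "Statins for primary prevention in adults"}, mock=True)
```

Keeping the flag in one helper makes it easy to switch a whole test suite between canned and live responses.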

PICO / PICOS

The standard framework for specifying a clinical research question: Population, Intervention, Comparator, Outcome, and — in the PICOS variant — Study design. The API uses PICOS throughout: it structures eligibility criteria, drives gap flags, and feeds search-readiness.

Reasoning (screening)

Stage 2 of the screening pipeline. It produces a short human-readable explanation of why a paper received its labelling score, and those explanations are what the clustering stage groups.

Repetitions

The number of independent AI calls made per paper during labelling. More repetitions give a more stable mean_score but cost proportionally more tokens; three to five is typical.
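
One way to see the stability trade-off is the standard error of the mean, which shrinks roughly with the square root of the number of repetitions if the calls really are independent. The sketch below is an illustration under that assumption; `summarise` is a hypothetical helper and the backend is not claimed to report this statistic.

```python
from math import sqrt
from statistics import mean, stdev


def summarise(scores: list[int]) -> tuple[float, float]:
    """Return (mean, standard error of the mean) for one paper's repetition scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two repetitions to estimate spread")
    return float(mean(scores)), stdev(scores) / sqrt(n)


# Three hypothetical repetitions for one paper.
avg, sem = summarise([4, 5, 4])
```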

Screening

Title/abstract eligibility assessment — deciding which studies to include in a review from a larger candidate set. In this API it is a 4-stage pipeline: labelling, reasoning, clustering, and final selection against a threshold.

Search-readiness

Whether a research question has enough PICOS specificity to feed a literature search without returning noise. A question that is "search-ready" has concrete terms for each PICOS element; one that isn't will come back with gap flags.

Sensitivity (SR)

In systematic-review terminology, the recall of a search: the proportion of truly relevant studies that the search actually captured. A sensitive search errs toward retrieving too much rather than missing anything.

Specificity (SR)

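In systematic-review terminology, the precision of a search: the proportion of retrieved studies that are actually relevant. This is not the same quantity as diagnostic-test specificity, which is the proportion of irrelevant items correctly left out. A specific search errs toward a clean result set at the risk of missing borderline papers.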
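
Both search measures can be computed from two sets of study identifiers, as in the sketch below. It assumes the set of truly relevant studies is known, which in practice is only true for a benchmark or gold-standard set; the identifiers are made up.

```python
def search_metrics(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (sensitivity, precision) of a search against a known relevant set."""
    if not retrieved or not relevant:
        raise ValueError("both sets must be non-empty")
    hits = retrieved & relevant                # relevant studies the search captured
    sensitivity = len(hits) / len(relevant)    # share of relevant studies retrieved
    precision = len(hits) / len(retrieved)     # share of retrieved studies that are relevant
    return sensitivity, precision


# Hypothetical identifiers: the search found a-d, the gold standard is a, b, e.
sensitivity, precision = search_metrics({"a", "b", "c", "d"}, {"a", "b", "e"})
```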

Stateless compute

This service holds no user data beyond transient job state — inputs come in on the request, outputs go back on the response, and nothing persists after the job completes. The Evidence Mapper application owns all durable storage of projects, papers, and results.

Study design

The methodological type of a study: randomised controlled trial (RCT), cohort, case-control, cross-sectional, case series, systematic review, and so on. It's the "S" in PICOS and is commonly used as an inclusion or exclusion criterion.

Systematic review

A literature review conducted with an explicit, reproducible methodology — pre-specified question, search strategy, eligibility criteria, and extraction protocol. The endpoints in this API are building blocks for that workflow, not a finished review product.

Threshold (screening)

The mean_score cutoff above which a paper is treated as "include" after labelling. The default is 1.0. Raising it makes screening stricter (higher specificity); lowering it makes it more permissive (higher sensitivity).
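
The effect of moving the cutoff can be sketched as a simple filter. This example reads "above" as a strict comparison, which is an assumption; whether a paper scoring exactly at the threshold is included is not stated here. The paper ids and scores are hypothetical.

```python
def select_included(mean_scores: dict[str, float], threshold: float) -> list[str]:
    """Return ids of papers whose mean_score is strictly above the threshold, sorted."""
    return sorted(pid for pid, score in mean_scores.items() if score > threshold)


# Hypothetical per-paper mean scores from the labelling stage.
scores = {"p1": 4.3, "p2": 3.0, "p3": 2.0}
strict = select_included(scores, threshold=3.0)
permissive = select_included(scores, threshold=1.5)
```

Raising the threshold from 1.5 to 3.0 drops `p2` and `p3`, leaving only the paper with the clearest relevance signal.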

See also: API Reference for how these terms map to HTTP endpoints; Backend Guide for how they're implemented.