Status page
The public /status surface — what each component reports, what the rollup means, and how external monitors consume it.
Tale exposes a public status surface on every instance, at /status (HTML) and /status.json (JSON). Both reflect the same probe: a five-second-cached health check against the three internal backends — application, knowledge base, and web & document services — rolled up into a single operational / degraded / outage verdict. The page is for two readers: the operator who wants a single URL to check before reporting an incident, and the external monitoring agent that polls Tale's public surface.
This page is the wire reference: what each field means, what the values can be, and what the page intentionally doesn't tell you. For per-request error rates or AI-provider availability, the observability stack documented at Operations is the right surface.
Worked example — fetch the status feed
The smallest possible monitor probe is one GET against /status.json:
```
curl -s https://your-tale-instance.com/status.json
```

When everything is healthy, the response is:

```
{
"status": "operational",
"checkedAt": "2026-05-15T13:45:07.123Z",
"components": [
{ "id": "convex", "status": "operational" },
{ "id": "rag", "status": "operational" },
{ "id": "crawler", "status": "operational" }
]
```

Both endpoints respond with 200 OK and Cache-Control: public, max-age=5 — even during an outage, so external monitors get a stable response shape rather than a timeout.
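For contrast, here is what a degraded round could look like, assuming the crawler probe failed while the other two passed (the timestamp and the failing component are illustrative):

```
{
  "status": "degraded",
  "checkedAt": "2026-05-15T13:45:07.123Z",
  "components": [
    { "id": "convex", "status": "operational" },
    { "id": "rag", "status": "operational" },
    { "id": "crawler", "status": "outage" }
  ]
}
```

A full outage looks the same, except every component reports outage and the rollup flips to outage.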
The two endpoints
| Endpoint | Use |
|---|---|
| /status | Human-readable HTML page. Locale picked from Accept-Language (English, German, French). No JavaScript, no auto-refresh. |
| /status.json | Machine-readable feed for external monitors — BetterStack, UptimeRobot, Atlassian Statuspage, Datadog Synthetics, anything else. |
Both endpoints share the same probe (a single in-memory cache fronts both) so the HTML page and the JSON feed cannot drift. They only differ in representation.
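To see the content negotiation in action, here is a minimal sketch that requests the German HTML page and then the JSON feed with plain fetch calls. The host is the example instance used above, and nothing beyond the documented Accept-Language behaviour is assumed:

```
// Request the HTML page in German (locale is negotiated via Accept-Language),
// then the JSON feed. Both responses come from the same cached probe round.
const base = "https://your-tale-instance.com";

const html = await fetch(`${base}/status`, {
  headers: { "Accept-Language": "de" },
});
console.log("HTML:", html.status, html.headers.get("content-type"));

const json = await fetch(`${base}/status.json`);
console.log("JSON:", json.status, await json.json());
```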
Wire shape (/status.json)
| Name | Type | Description |
|---|---|---|
| status | string | Rollup verdict: operational (every component up), degraded (some up, some down), outage (all down). |
| checkedAt | string | ISO 8601 timestamp of the most recent probe round. |
| components | array | Per-component health. The shape and order are stable across versions. |
| components[].id | string | Stable component identifier: convex, rag, or crawler. |
| components[].status | string | operational or outage. There's no per-component degraded value today. |
The fields are stable across versions: new fields may be added, existing ones won't be renamed or removed. Keyword-based uptime monitors can alert on the case-sensitive substring "status":"outage" and trust that match across upgrades.
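For typed consumers of the feed, the documented shape maps onto a handful of types. This is a monitor-side sketch, not the server's own definitions; only the field names and value sets come from the table above:

```
// Monitor-side sketch of the /status.json wire shape.
// Field names and value sets follow the docs; type names are illustrative.
type ComponentId = "convex" | "rag" | "crawler";
type ComponentStatus = "operational" | "outage";
type OverallStatus = "operational" | "degraded" | "outage";

interface StatusFeed {
  status: OverallStatus;
  checkedAt: string; // ISO 8601 timestamp of the most recent probe round
  components: { id: ComponentId; status: ComponentStatus }[];
  // New fields may be added in later versions; existing ones won't be renamed.
}
```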
What each component covers
The IDs map to subsystems, not to the underlying stack names — a deliberate choice so the public surface stays readable when the stack changes.
| ID | Covers |
|---|---|
| convex | The application backend (reads, writes, real-time sync). If this is down, the UI is down. |
| rag | The knowledge base — indexing new documents and searching existing ones. |
| crawler | Web & document services — site crawls and on-demand URL fetches. |
The rollup is binary at the component level: each subsystem is either reachable and serving (operational) or not (outage). A future per-component degraded value (e.g. latency-based) can land without breaking consumers, because status already accepts the wider OverallStatus vocabulary.
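The rollup rule is mechanical: every component up gives operational, every component down gives outage, anything in between gives degraded. A consumer can re-derive the verdict from the components array in a few lines; this sketch reuses the types from the snippet above and is not necessarily the server's own code:

```
// Re-derive the rollup verdict from per-component statuses.
// Mirrors the documented rule, not necessarily the server implementation.
function rollup(components: { status: ComponentStatus }[]): OverallStatus {
  const down = components.filter((c) => c.status === "outage").length;
  if (down === 0) return "operational";
  if (down === components.length) return "outage";
  return "degraded";
}
```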
How the probe works
A single probe round fans out three HTTP requests in parallel — one to each backend health endpoint — with a 2-second per-probe timeout. The result is cached for five seconds in process memory, so an unauthenticated /status route can't be turned into a probe amplifier by a hostile caller. Only the HTTP status of each upstream is inspected; response bodies are discarded immediately, so a misbehaving upstream can't push bytes into the public response.
The platform process itself is implicit in the rollup: if /status is responding at all, the platform is reachable. outage therefore means every backend probe failed — which is what users effectively see, since none of the user-facing flows work without at least one of the three.
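To make the mechanics concrete, here is a minimal sketch of that pattern: three health checks fanned out in parallel with a 2-second timeout each, only the HTTP status inspected, and the rolled-up result cached in process memory for five seconds. The internal URLs and variable names are placeholders; only the timings and the flow come from the description above. It reuses the StatusFeed type and rollup helper from the earlier snippets:

```
// Sketch of a probe round as described above. Internal URLs are placeholders.
const PROBE_TIMEOUT_MS = 2_000;
const CACHE_TTL_MS = 5_000;

const BACKENDS = [
  { id: "convex", url: "http://app.internal/health" },      // placeholder
  { id: "rag", url: "http://rag.internal/health" },         // placeholder
  { id: "crawler", url: "http://crawler.internal/health" }, // placeholder
] as const;

let cached: { at: number; result: StatusFeed } | null = null;

async function probeOne(url: string): Promise<ComponentStatus> {
  try {
    // Only the HTTP status is inspected; the response body is never read.
    const res = await fetch(url, { signal: AbortSignal.timeout(PROBE_TIMEOUT_MS) });
    return res.ok ? "operational" : "outage";
  } catch {
    return "outage"; // timeout, DNS failure, connection refused, ...
  }
}

async function getStatus(): Promise<StatusFeed> {
  // Serve the cached verdict for up to five seconds so an unauthenticated
  // caller can't force a fresh probe round on every request.
  if (cached && Date.now() - cached.at < CACHE_TTL_MS) return cached.result;

  const statuses = await Promise.all(BACKENDS.map((b) => probeOne(b.url)));
  const components = BACKENDS.map((b, i) => ({ id: b.id, status: statuses[i] }));

  const result: StatusFeed = {
    status: rollup(components),
    checkedAt: new Date().toISOString(),
    components,
  };
  cached = { at: Date.now(), result };
  return result;
}
```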
What's not on the page
/status is a coarse-grained surface — "is the platform reachable" — not a metric-level health view. It doesn't report:
- Per-request error rates. Use the Sentry stack documented at Operations.
- AI-provider availability. The provider's own status page is the authoritative source for that.
- Queue depth, latency histograms, or per-tenant metrics. Those live in the Prometheus endpoints, also covered under Operations.
- Internal-only services. The database, the proxy, the background workers — their failure modes route through one of the three named components anyway, so exposing them separately would add noise without information.
What to scrape
For an external uptime monitor, GET /status.json on the interval that suits the alert window — 1–5 minutes is typical. The response is small (~500 bytes) and the endpoint is unauthenticated; it intentionally doesn't gate behind sign-in so monitors can reach it without provisioning credentials.
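If you are wiring a custom check rather than a hosted monitor, the whole probe fits in a few lines. A hedged sketch that polls once and exits non-zero unless the rollup is operational, suitable for a cron job (assumes Node 18+ for the built-in fetch; the host is the example instance from above):

```
// Poll /status.json once; exit non-zero unless the rollup is "operational".
const res = await fetch("https://your-tale-instance.com/status.json");
const feed = (await res.json()) as {
  status: string;
  components: { id: string; status: string }[];
};

if (feed.status !== "operational") {
  const down = feed.components
    .filter((c) => c.status === "outage")
    .map((c) => c.id);
  console.error(`Tale status: ${feed.status} (down: ${down.join(", ")})`);
  process.exit(1);
}
console.log("Tale status: operational");
```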
For internal alerting that goes deeper than the rollup, scrape the Prometheus endpoints documented at Operations instead. /status is the URL you put in an incident channel; Prometheus is the URL Grafana queries.
Where this fits
The status page is the lightest-weight operator surface — the URL someone hits before reporting an incident, the endpoint a third-party monitor polls. The API counterpart to this page is the rest of API reference; the deeper observability stack for self-hosted operators lives at Operations, and the in-app communication channel for upgrades and known issues is What's new.