Observability

Every run is a trace — latency, cost, quality, and scope decisions in one timeline.

Observability on Mynd starts from an unusual luxury: there is exactly one unit of work, the run, and it is born instrumented. Every run emits a structured trace: the plan, each context fetch with the scope that authorized it, each step with its latency and token spend, every kernel decision, and the result. The observability pillar is that trace made navigable at fleet scale.

The dashboard gives you the four views teams actually use:

Timelines: any run rendered step by step, with the slow span obvious.

Aggregates: p50/p95 latency, cost per run, and step-ceiling utilization sliced by workflow, agent, model, or key.

Quality: y0-judge scores from the evals pillar overlaid on the same slices, so 'cheaper but worse' is visible as one chart.

Audit: every kernel allow/deny, filterable by scope, key, and resource. This is the view your security review asks for.

Everything visible in the dashboard is also queryable by API. Traces export as structured JSON, and OpenTelemetry-compatible export streams run spans into the observability stack you already operate, with your trace IDs propagated via standard headers. Retention follows your plan tier. Deleting a run removes its trace, while the kernel's access records remain by design.
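To make the shape of a trace concrete, here is a minimal Python sketch of the fields described above (plan, steps with latency and token spend, kernel decisions). The class and field names are hypothetical and chosen for illustration; they are not Mynd's actual export schema.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    latency_ms: int
    tokens: int


@dataclass
class KernelDecision:
    scope: str
    resource: str
    allowed: bool


@dataclass
class RunTrace:
    # Hypothetical shape: one trace per run, holding everything the run did.
    run_id: str
    plan: list[str]
    steps: list[Step] = field(default_factory=list)
    decisions: list[KernelDecision] = field(default_factory=list)


def slowest_step(trace: RunTrace) -> Step:
    """Return the step with the highest latency (the 'slow span')."""
    return max(trace.steps, key=lambda s: s.latency_ms)


def denied(trace: RunTrace) -> list[KernelDecision]:
    """Return only the kernel decisions that were denials."""
    return [d for d in trace.decisions if not d.allowed]


trace = RunTrace(
    run_id="run-1",
    plan=["fetch", "summarize", "deliver"],
    steps=[
        Step("fetch", latency_ms=220, tokens=0),
        Step("summarize", latency_ms=1400, tokens=950),
        Step("deliver", latency_ms=90, tokens=0),
    ],
    decisions=[
        KernelDecision("docs:read", "notes/weekly", allowed=True),
        KernelDecision("mail:send", "external", allowed=False),
    ],
)

print(slowest_step(trace).name)
print([d.scope for d in denied(trace)])
```

The timeline view corresponds to walking `steps` in order, and the audit view corresponds to filtering `decisions`.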

[ 01 ] Key features

Born instrumented

Traces are not sampled or inferred: every run records its plan, fetches, steps, costs, and kernel decisions by construction.

Cost and latency by slice

p50/p95 latency, spend per run, and step-ceiling utilization broken down by workflow, agent, model, and key, so the bill becomes explainable.
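As a rough illustration of what slicing means here, the sketch below groups run records by one dimension and computes p50/p95 latency and mean cost per run. The record fields and the nearest-rank percentile method are assumptions for the example; the source does not state how Mynd computes its percentiles.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("percentile of empty list")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(runs, key):
    """Group run records by `key` and compute per-slice aggregates."""
    groups = defaultdict(list)
    for run in runs:
        groups[run[key]].append(run)
    summary = {}
    for slice_name, slice_runs in groups.items():
        latencies = [r["latency_ms"] for r in slice_runs]
        summary[slice_name] = {
            "runs": len(slice_runs),
            "p50_ms": percentile(latencies, 50),
            "p95_ms": percentile(latencies, 95),
            "cost_per_run_usd": sum(r["cost_usd"] for r in slice_runs) / len(slice_runs),
        }
    return summary


# Hypothetical run records; field names are illustrative only.
runs = [
    {"workflow": "weekly-brief", "latency_ms": 900, "cost_usd": 0.010},
    {"workflow": "weekly-brief", "latency_ms": 1200, "cost_usd": 0.012},
    {"workflow": "weekly-brief", "latency_ms": 1840, "cost_usd": 0.011},
    {"workflow": "weekly-brief", "latency_ms": 1000, "cost_usd": 0.011},
    {"workflow": "triage", "latency_ms": 300, "cost_usd": 0.002},
    {"workflow": "triage", "latency_ms": 500, "cost_usd": 0.004},
]

by_workflow = summarize(runs, "workflow")
print(by_workflow["weekly-brief"])
```

Passing a different `key` (for example an agent or model field, if the records carry one) yields the same aggregates along another slice.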

Quality overlay

Eval scores chart on the same axes as cost and latency, so trade-offs are decisions instead of surprises.

OpenTelemetry export

Run spans stream into your existing stack with trace-ID propagation — Mynd becomes one more service in your traces, not a silo.
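The page does not say which headers carry the trace ID. A common choice in OpenTelemetry setups is the W3C Trace Context `traceparent` header, so the sketch below assumes that format; whether Mynd uses it is an assumption. It shows how a caller might build and parse such a header using only the Python standard library.

```python
import re
import secrets

# W3C Trace Context, version 00: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def make_traceparent(trace_id=None, sampled=True):
    """Build a traceparent value, reusing the caller's trace ID if one is given."""
    trace_id = trace_id or secrets.token_hex(16)  # 16 bytes -> 32 hex chars
    span_id = secrets.token_hex(8)                # 8 bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header):
    """Split a traceparent value into its parts, or raise ValueError if malformed."""
    match = TRACEPARENT_RE.match(header)
    if not match:
        raise ValueError(f"malformed traceparent: {header!r}")
    trace_id, span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 1),
    }


# Assumed usage: send this header with the request that starts a run,
# so the run's spans join the caller's existing trace.
header = make_traceparent("4bf92f3577b34da6a3ce929d0e0e4736")
parsed = parse_traceparent(header)
print(parsed["trace_id"])
```

Reusing the caller's trace ID rather than minting a new one is what lets the exported spans appear inside an existing trace instead of as a separate one.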

[ 02 ][ trace query ]

GET /v1/observability/runs
    ?workflow=weekly-brief
    &period=7d
    &metric=p95_latency,cost,judge_score

→ { "p95_ms": 1840, "cost_usd": 0.011,
    "judge_score": 0.90, "runs": 312,
    "ceiling_hits": 4 }

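The query above can be assembled and its response read programmatically. The sketch below builds the same URL and parses the sample response shown; the base URL is a placeholder and no request is actually sent, since the host and authentication scheme are not given on this page.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # placeholder host, not a real Mynd endpoint


def build_runs_query(workflow, period, metrics):
    """Build the /v1/observability/runs URL with a comma-separated metric list."""
    params = {
        "workflow": workflow,
        "period": period,
        "metric": ",".join(metrics),
    }
    # safe="," keeps the commas in the metric list unescaped, as in the example above.
    return f"{BASE_URL}/v1/observability/runs?{urlencode(params, safe=',')}"


url = build_runs_query("weekly-brief", "7d", ["p95_latency", "cost", "judge_score"])
print(url)

# The sample response from the example above, parsed as JSON.
sample_body = """
{ "p95_ms": 1840, "cost_usd": 0.011,
  "judge_score": 0.90, "runs": 312,
  "ceiling_hits": 4 }
"""
result = json.loads(sample_body)

# Derived figure: the share of runs that hit the step ceiling.
ceiling_hit_rate = result["ceiling_hits"] / result["runs"]
print(f"{ceiling_hit_rate:.2%} of runs hit the step ceiling")
```

A derived figure such as the ceiling-hit rate is often easier to alert on than the raw count, because it stays comparable as run volume changes.
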