Y0 Benchmarks
The numbers we hold ourselves to, published as measured.
[ 01 ] Spec sheet
This page is not a model — it is the scoreboard. Mynd maintains an internal benchmark suite that every Y0 release must clear before promotion, and we publish the current results here rather than quoting the public leaderboards everyone has learned to discount. The suite is built from the work the platform actually does: long-document faithfulness, schema-exact extraction, plan quality under enforced step ceilings, retrieval precision over realistic private corpora, and end-to-end agent task completion with scope checks on. Two columns matter: y0-fast, the interactive profile most requests use, and y0-deep, the deliberate profile behind reasoning and agent planning. Numbers are re-measured on every release by the evaluation family, judged against frozen rubrics, and the history is kept — when a number moves, the changelog says why. Read the notes column; a benchmark without its caveat is an advertisement.
[ 02 ] Current numbers
| benchmark | metric | y0-fast | y0-deep | notes |
|---|---|---|---|---|
| LongDoc-Faithful | claim accuracy vs. source, 80–120 page docs | 91.2% | 97.4% | Judged by y0-judge against frozen rubric rb_faith_v2; human-calibrated quarterly. |
| SchemaExact | valid-JSON extraction, strict schema match | 96.8% | 98.9% | Invoices, contracts, and forms; a single wrong field fails the whole case. |
| PlanBench-Y0 | plan executes within declared max_steps | 78.5% | 93.1% | Multi-step operational tasks; failure includes both overrun and stall. |
| GraphRecall@5 | retrieval precision over 10k-item private corpora | 88.3% | 88.3% | Retrieval layer is shared; measured with y0-embed-l and hybrid filters. |
| AgentComplete | end-to-end task completion, scopes enforced | 71.9% | 86.2% | Includes correctly halting at approval gates; partial credit not awarded. |
| TraceReplay | byte-identical trace replay across releases | 100% | 100% | A regression here blocks release unconditionally; it has fired twice. |
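
The notes column describes two all-or-nothing scoring rules: a SchemaExact case fails on a single wrong field, and a PlanBench-Y0 case fails on either overrun or stall. The sketch below shows one way such rules could be written down. It is an illustration only, not Mynd's evaluation harness: the function names, the use of plain dictionary equality as a stand-in for strict schema matching, and the `finished` flag used to represent a stall are all assumptions.

```python
import json
from typing import Any


def schema_exact_case(raw_output: str, expected: dict[str, Any]) -> bool:
    """Hypothetical all-or-nothing check: one wrong field fails the case.

    Dictionary equality is a simple stand-in for a real strict schema
    validator, which would also check types and field constraints.
    """
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return False  # invalid JSON fails outright
    return isinstance(parsed, dict) and parsed == expected


def plan_case(steps_used: int, max_steps: int, finished: bool) -> bool:
    """Hypothetical PlanBench-style check.

    Overrun (more steps than declared) and stall (never finishing)
    both count as failure.
    """
    return finished and steps_used <= max_steps


def pass_rate(results: list[bool]) -> float:
    """Percentage of cases that passed; no partial credit per case."""
    if not results:
        return 0.0
    return 100.0 * sum(results) / len(results)


if __name__ == "__main__":
    # Made-up example data, not drawn from the benchmark suite.
    expected = {"invoice_id": "A-17", "total": 120.5}
    outputs = [
        '{"invoice_id": "A-17", "total": 120.5}',  # exact match
        '{"invoice_id": "A-17", "total": 120.0}',  # one wrong field
        "not json",                                 # unparseable
    ]
    results = [schema_exact_case(o, expected) for o in outputs]
    print(results, round(pass_rate(results), 1))
```

Under rules like these a near-miss scores the same as a complete failure, which is why the strict benchmarks read lower than a per-field accuracy figure would.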