Y0 Vision

preview

Documents, screens, and images become queryable context.

[ 01 ] Spec sheet

status: preview

context window: 64k tokens + 20 images

latency p50: 420 ms first token

price: $1.20 in / $3.20 out

Y0 Vision exists because most of the context that matters in real work was never born as clean text. It arrives as a scanned invoice, a whiteboard photo, a dashboard screenshot, or a signed PDF with handwriting in the margins. Y0 Vision reads those artifacts and writes them into the context graph as structured, queryable items: line items with amounts, charts with their underlying trends, forms with their fields filled. The model is layout-aware rather than merely OCR-accurate: it understands that the number at the bottom right of an invoice is a total, that the second column of a bank statement is a debit, and that the red line on a chart is the series the caption names.

Vision runs flow through the same trust kernel as everything else, and that matters more here than anywhere, because images are the leakiest data type in most organizations. Every extraction is scoped, logged, and deletable. The family is in preview: API access is open to all paid tiers, accuracy on dense tables is still improving release over release, and we publish the current numbers on the benchmarks page rather than promising perfection.

[ 02 ] Capabilities

Document understanding — invoices, statements, contracts, forms

Table and chart extraction into typed, queryable structures

Screenshot reasoning over UIs, dashboards, and error states

Handwriting transcription with confidence scores per span

Visual question answering grounded in the attached image set
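Per-span confidence scores are most useful when something downstream decides which spans to trust. The sketch below shows one way to route low-confidence handwriting spans to human review. It is an illustration only: the `Span` shape, the 0.0 to 1.0 confidence range, the 0.85 threshold, and the sample values are all assumptions, not the documented Y0 Vision response format.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical transcription span; not the documented response shape."""
    text: str
    confidence: float  # assumed to lie in the range 0.0 to 1.0


def split_by_confidence(spans, threshold=0.85):
    """Return (accepted, needs_review) lists, preserving span order."""
    accepted, needs_review = [], []
    for span in spans:
        if span.confidence >= threshold:
            accepted.append(span)
        else:
            needs_review.append(span)
    return accepted, needs_review


# Made-up sample values for illustration.
spans = [
    Span("Paid in full", 0.97),
    Span("ref 4471", 0.62),
    Span("J. Ortiz", 0.91),
]
accepted, needs_review = split_by_confidence(spans)
```

Choose the threshold per workflow: a finance lane would likely want a stricter cutoff than a note-taking tool.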

[ 03 ] Best for

Finance-lane ingestion — receipts, invoices, statements

Back-office digitization where layout carries meaning

Agent runs that need to read a screen before acting on it

[ 04 ] Sample request

[ request ] application/json

{
  "model": "y0-vision",
  "prompt": "Extract line items and totals from these receipts.",
  "files": ["file_8c21aa04", "file_8c21aa05"],
  "output_schema": { "type": "ledger_rows" },
  "max_steps": 3
}
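A request body like the one above can be assembled and sanity-checked in code before it is sent. The following is a minimal sketch, not an official client. The field names are taken from the sample request. The helper name `build_vision_request`, the validation rules, and the idea that each file ID counts as one image against the 20-image limit in the spec sheet are assumptions. No endpoint or authentication is shown because this page does not specify them.

```python
import json

MAX_IMAGES = 20  # from the spec sheet: "64k tokens + 20 images"


def build_vision_request(prompt, file_ids, schema_type="ledger_rows", max_steps=3):
    """Build a request body shaped like the sample request.

    Assumption: each file ID counts as one image against MAX_IMAGES.
    """
    file_ids = list(file_ids)
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not file_ids:
        raise ValueError("at least one file ID is required")
    if len(file_ids) > MAX_IMAGES:
        raise ValueError(
            f"{len(file_ids)} files exceeds the {MAX_IMAGES}-image limit"
        )
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")
    return {
        "model": "y0-vision",
        "prompt": prompt,
        "files": file_ids,
        "output_schema": {"type": schema_type},
        "max_steps": max_steps,
    }


body = build_vision_request(
    "Extract line items and totals from these receipts.",
    ["file_8c21aa04", "file_8c21aa05"],
)
payload = json.dumps(body, indent=2)  # JSON text ready to send as the request body
```

Validating on the client side catches an oversized batch before it costs a round trip. The limits the service actually enforces may differ from the ones assumed here.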

Models API reference
