Y0 Audio

research

Speech in, structure out — meetings and voice notes as first-class context.

[ 01 ] Spec sheet

status: research

context window: 4 hours of audio per run

latency p50: 0.3× realtime (batch)
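As a rough worked example of the latency figure above, the sketch below estimates median batch processing time for a full-length run. It assumes "0.3× realtime" means processing time divided by audio duration, which the spec sheet does not state explicitly; the function name `estimated_batch_seconds` is hypothetical and not part of any Y0 API.

```python
def estimated_batch_seconds(audio_seconds: float, realtime_factor: float = 0.3) -> float:
    """Estimate median (p50) batch processing time.

    Assumption: realtime_factor = processing time / audio duration,
    so 0.3 means one hour of audio takes about 18 minutes to process.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds * realtime_factor


# A run at the stated 4-hour context limit.
four_hours_s = 4 * 60 * 60
estimate_min = estimated_batch_seconds(four_hours_s) / 60
print(f"Estimated p50 processing time: {estimate_min:.0f} minutes")
```

Under that reading, a 4-hour recording works out to roughly 72 minutes of processing at the median; tail latencies are not given on this page.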

Y0 Audio is the research track that turns the spoken layer of work (meetings, calls, voice notes dictated while walking) into context the runtime can actually use. Transcription is table stakes; the research problem is everything after it. Who committed to what, by when? Which of the four people talking is the client, and which decision reversed the one made last Tuesday? Y0 Audio produces diarized, speaker-attributed transcripts and then runs a second structured pass that extracts decisions, action items, and open questions. Each item is linked to the moment in the recording where it happened and written into the context graph alongside your documents and calendar.

The trust posture is deliberately strict because audio is intimate data: recordings are processed within your project boundary, never used for training, and the kernel logs every access to a transcript the way it logs every document fetch.

The family is in research status. Quality on clean single-speaker audio is strong, multi-speaker meetings in noisy rooms are an open frontier, and access is limited to research-program partners who can tolerate model revisions between releases. It graduates to preview when diarization accuracy clears the internal bar published on the benchmarks page.
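One way to picture the structured pass described above is as extracted items that point back into a diarized transcript. The sketch below is illustrative only: the `Segment` and `ExtractedItem` types, their field names, and the `segment_at` helper are assumptions made for this example, not the actual Y0 Audio output schema, and the transcript lines are invented sample data.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class Segment:
    """One diarized span of speech (hypothetical shape)."""
    speaker: str
    start_s: float
    end_s: float
    text: str


@dataclass(frozen=True)
class ExtractedItem:
    """A decision, action item, or open question tied to a timestamp (hypothetical shape)."""
    kind: Literal["decision", "action_item", "open_question"]
    summary: str
    at_s: float


def segment_at(segments: list[Segment], at_s: float) -> Optional[Segment]:
    """Return the segment whose [start_s, end_s) interval contains at_s, if any."""
    for seg in segments:
        if seg.start_s <= at_s < seg.end_s:
            return seg
    return None


# Invented sample data, for illustration only.
transcript = [
    Segment("speaker_1", 0.0, 12.5, "Let's ship the beta on Friday."),
    Segment("speaker_2", 12.5, 20.0, "I'll send the release notes by Thursday."),
]
item = ExtractedItem("action_item", "Send release notes by Thursday", at_s=14.0)

source = segment_at(transcript, item.at_s)
if source is not None:
    print(f"{item.kind} attributed to {source.speaker}: {item.summary}")
```

Keeping the timestamp on the item, instead of copying the quoted text, means the link survives if a later model revision re-segments or re-attributes the transcript.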

[ 02 ] Capabilities

Diarized transcription with speaker attribution across 4-hour sessions

Decision and action-item extraction linked to timestamps

Voice-note capture that files structure into the context graph

Cross-meeting recall — query what was said across recordings

[ 03 ] Best for

Meeting-heavy teams that lose decisions between calls

Founders and operators who think out loud into voice notes

Research partners building communication-lane prototypes

[ 04 ] Sample request

[ request ] application/json

{
  "model": "y0-audio",
  "prompt": "Transcribe, diarize, and extract action items.",
  "files": ["file_standup_0611.m4a"],
  "output_schema": { "type": "meeting_minutes" },
  "max_steps": 4
}
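The request body above can be assembled programmatically. The following is a minimal sketch that builds and serializes the same JSON using only the Python standard library. The helper name `build_audio_request` is hypothetical, and because this page does not give an endpoint URL, authentication scheme, or response format, the example stops at producing the request body and does not send it.

```python
import json
from typing import Sequence


def build_audio_request(
    file_ids: Sequence[str],
    prompt: str,
    max_steps: int = 4,
) -> dict:
    """Build a request body mirroring the sample request (hypothetical helper)."""
    if not file_ids:
        raise ValueError("at least one file is required")
    return {
        "model": "y0-audio",
        "prompt": prompt,
        "files": list(file_ids),
        "output_schema": {"type": "meeting_minutes"},
        "max_steps": max_steps,
    }


payload = build_audio_request(
    ["file_standup_0611.m4a"],
    "Transcribe, diarize, and extract action items.",
)
body = json.dumps(payload, indent=2)
print(body)
```

All field names and values come from the sample request; consult the Models API reference for how the body is submitted and what comes back.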

Models API reference
