Red-teaming

we attack the runtime before anyone else does

We are a small company, so our red team is not a department — it is a discipline and a calendar. Before any capability that touches user context ships, it gets attacked: prompt injection through documents, scope escalation through chained tasks, and exfiltration through generated output are the standing three. Our honest position is that internal red-teaming at our scale finds the obvious failures and misses the creative ones. That is why the runtime is built so that a successful attack is bounded — a hijacked run can only read what its task scope granted — and why we treat external reports as a gift rather than an embarrassment. We would rather publish a found hole than pretend the surface is smooth.

[ what we actually run ]

[01]

Pre-ship adversarial pass on every context-touching feature

A feature that reads user context cannot promote to production until it has survived a scripted attack suite — injection payloads embedded in documents, calendar entries, and task titles — plus at least one session of unscripted manual attack by someone who did not build it.

[02]

Injection corpus, versioned and growing

Every injection attempt we find — in testing, in reports, in the wild — is added to a versioned corpus that runs in CI against the context pipeline. The corpus never shrinks. A payload that worked once is tested forever.

[03]

Blast-radius accounting

For each attack class we document the worst case under current architecture: what a fully successful attacker reads, writes, or sends. If the worst case crosses a task-scope boundary, that is an architecture bug and it blocks release — even if no concrete exploit exists yet.

[04]

External reports treated as red-team output

Disclosure reports enter the same pipeline as internal findings: reproduced, added to the corpus, blast-radius assessed, and credited. The reporter is told what changed, not just thanked.

[ open questions — honestly ]

  • Our attack suite tests the injections we can imagine. Indirect injection through content the user legitimately asked us to read — a malicious invoice, a poisoned webpage — is an open arms race, and we have no proof our defenses generalize beyond the corpus.
  • At what point does a company our size need genuinely external, paid red-teaming rather than disclosure-driven free-form testing? We suspect the answer is 'before the communication lane opens' but we cannot yet justify it with evidence.