Technology doc · architecture + roadmap

Kashi — technology

What's actually running behind Kashi today, and where we see room to improve. Split audience: investor-readable on top, engineer-level detail toward the bottom. Every claim traces to a file path in the live codebase.

2026-04-21 · Companion to business.html and governance · Live product: kashi-lilac.vercel.app

In this document

  01 Executive summary
  02 The stack (what Kashi runs on)
  03 The detector pipeline
  04 The 7 detectors, one lane at a time
  05 Data model (engineer-level)
  06 Security posture (honest snapshot)
  07 Gap analysis: where we can improve
  07·5 Next 3 builds (critical review)
  08 Roadmap by sprint
  09 What we will NOT build

Tier 1 · investor-readable (§1–§3)

Conceptual flow, honest capability list, what's running, what matters. Skip straight to §7 if you want the gap analysis first.

Two readers, two paths.

01 Executive summary · investor read

Kashi is a deterministic meeting-governance pipeline. It takes meeting transcripts (Zoom webhook live; Teams and Meet ingestion staged), runs them through a stack of structural detectors, and surfaces repeated interaction asymmetries for human review — not as harassment labels, but as contestable signals with explicit uncertainty.

Detectors live: 7 · 3 structural + 4 hybrid text-informed
Cross-meeting wrappers: 2 · dyadic continuity + baseline drift
Deterministic core: 100% · same input → same output · no LLM in production path
TypeScript, strict mode: ~3,400 LOC · covers pipeline + data model + routes
Seed eval pass rate: 3/3 · zero false-positives on healthy control team
Research documents behind design: 42 · 24 business + 18 technical-dev consideration

The one-sentence architecture

A Next.js 16 + TypeScript + Supabase application with a pipeline of deterministic detectors organized into 3 lanes (structural-only · hybrid text-informed · refused), protected by row-level security, with commitments declared in typed enums and a machine-readable detector registry — and several honest gaps we're naming on the way to pilot-ready.

02 The stack

Layer | Technology | Why
Frontend framework | Next.js 16 (App Router) · React 19 · TypeScript strict | Server-rendered pages for SEO + auth-aware demo routes + client-side interactivity where needed
Styling | Tailwind v4 + shadcn/ui components | Utility-first; matches the serious/governance aesthetic, not playful-SaaS
Charts | Recharts 3.8 | BarChart with LabelList, LineChart with ReferenceLine · covers the Manager Mirror + Executive Brief visuals
Auth | Supabase magic-link OTP via @supabase/ssr | No passwords. Works through corporate email infrastructure. JP-friendly.
Database | Supabase Postgres (Tokyo region) | Row-level security native. Tenant-scoped. Pinned to ap-northeast-1.
Storage | Supabase Storage | Transcript files · per-tenant buckets with policy
Hosting | Vercel production | Auto-deploys from main · edge network · honest gap: primary compute is US-based (see §6)
Detector pipeline | Pure TypeScript (no LLM in prod path) | Deterministic · auditable · fast · no per-API-call cost at runtime
LLM (seed + reasoning only) | Claude Sonnet 4.6 (seed authoring) + Claude Opus 4.7 (reasoning-heavy detector validation) | Used only for seed-data authoring and offline detector tuning · not called on production traffic
Transcript parsers | VTT · TXT · SRT · CSV · JSONL | Normalizes platform-specific formats into Turn[] · src/lib/pipeline/transcript-parser.ts
JP linguistic parsing | Regex-based surface grammar (no MeCab dependency) | Sufficient for keigo classification on meeting-length transcripts · tokenizer upgrade is a Sprint-2 candidate

03 The detector pipeline

Every transcript flows through the same 6-layer pipeline. Every layer is deterministic. Every layer's output is typed. No hidden model calls, no black-box scoring.

[Transcript · Zoom live · Teams / Meet staged] (uploaded via /app/admin/upload)

Layer 1: Ingest & normalize
  VTT/TXT/SRT/CSV/JSONL parsers → canonical Turn[] array
  src/lib/pipeline/transcript-parser.ts

Layer 2: Structural detectors (deterministic, lane 1)
  · Intrusive interruption (overlap + truncation timing)
  · Chilling delta (post-trigger participation drop vs own baseline)
  · Floor-time Gini (speaking-share inequality)
  src/lib/pipeline/layer1-deterministic.ts

Layer 2b: Hybrid text-informed (lane 2, tenant opt-in)
  · Unanswered-question rate (lexical)
  · Topic-credit ignored turns (embedding similarity)
  · Agreement asymmetry 同調圧力 (lexical + directional)
  · Keigo (敬語) peer-asymmetry (surface-grammar classification)
  src/lib/pipeline/{topic-credit, agreement-asymmetry, keigo}.ts

Layer 3: Meeting-level metrics
  Gini · asymmetry matrix · directive density · takeover count · reciprocity

Layer 4: Longitudinal aggregation
  Rolling 90-day window · per-speaker OWN baseline · dyadic-continuity test
  src/lib/pipeline/{aggregate, continuity}.ts

Layer 5: Review-worthy event construction
  Composite scored from directionality × baseline-drop × persistence

Layer 6: Role-based presentation
  RBAC · k-anonymity (declared) · Manager Mirror · Executive Brief · Victim view (planned)

What it means in plain English

Tier 2 · engineer-level (§4 onward)

File paths, algorithms, type signatures, thresholds, RLS. Everything below can be audited against the actual codebase. Investors who want to skip to the gap analysis: jump to §7.

04 The 7 detectors, one lane at a time

Each detector declares its lane at compile time in src/lib/pipeline/detector-registry.ts. Employer-facing surfaces default to lane 1 (structural-only). Lane 2 requires tenant opt-in via semantic_lane_enabled feature flag.

Auto-ingest architecture (design target).

In production, Kashi neither records meetings nor depends on manual transcript uploads. The detectors below are designed to be fed by platform webhooks, fired when the platform finishes transcribing. Current maturity is asymmetric: the Zoom webhook is wired and live; the Microsoft Teams and Google Meet integrations are staged.

  • Zoom: Marketplace app (Server-to-Server OAuth) → webhook recording.transcript_completed → POST /api/webhooks/zoom (HMAC-SHA256 verify via x-zm-signature, 5-minute replay window, URL-validation handshake implemented). Requires the org-level "Cloud Recording + Audio Transcript" toggle to be on.
  • Microsoft Teams: Azure AD app + app permission OnlineMeetingTranscript.Read.All → Graph change-notification subscription → POST /api/webhooks/teams. Requires Teams Premium OR tenant-enforced AllowTranscription policy.
  • Google Meet: Workspace OAuth + domain-wide delegation → Workspace Events API + Cloud Pub/Sub push → POST /api/webhooks/meet. Transcript entries arrive structured; no VTT parsing needed.

All three paths normalize through src/lib/pipeline/transcript-parser.ts → Turn[] → detector pipeline below. The /demo/ingest surface exposes the same normalizer + detectors to a stateless paste for hands-on demos. Production stores structural metrics only; transcript body text is discarded after the detector run.
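The Zoom verification steps named above (HMAC-SHA256 over x-zm-signature, a 5-minute replay window, the URL-validation handshake) can be sketched as follows. This is a minimal illustration and not the code behind /api/webhooks/zoom: the function and parameter names are hypothetical, and it assumes Zoom's documented scheme, in which the signed message is `v0:{x-zm-request-timestamp}:{raw body}` and the header value is `v0=` followed by the hex digest.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const REPLAY_WINDOW_MS = 5 * 60 * 1000; // 5-minute replay window from the bullet above

// Hex-encoded HMAC-SHA256 of `message` under `secret`.
function hmacHex(secret: string, message: string): string {
  return createHmac("sha256", secret).update(message).digest("hex");
}

// Hypothetical helper: names and input shape are illustrative only.
export function verifyZoomSignature(input: {
  rawBody: string;      // exact request body as received
  signature: string;    // x-zm-signature header
  timestampSec: string; // x-zm-request-timestamp header, epoch seconds
  secretToken: string;  // Zoom webhook secret token
  nowMs: number;        // injected clock, so the check is testable
}): boolean {
  const sentAtMs = Number(input.timestampSec) * 1000;
  if (!Number.isFinite(sentAtMs)) return false;
  // Reject anything outside the replay window (stale or replayed requests).
  if (Math.abs(input.nowMs - sentAtMs) > REPLAY_WINDOW_MS) return false;
  const message = `v0:${input.timestampSec}:${input.rawBody}`;
  const expected = Buffer.from(`v0=${hmacHex(input.secretToken, message)}`, "utf8");
  const received = Buffer.from(input.signature, "utf8");
  // Constant-time comparison; lengths must match before timingSafeEqual.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// URL-validation handshake: echo plainToken together with its HMAC.
export function zoomUrlValidationResponse(plainToken: string, secretToken: string) {
  return { plainToken, encryptedToken: hmacHex(secretToken, plainToken) };
}
```

Injecting the clock as `nowMs` keeps the replay check deterministic, which fits the pipeline's same-input-same-output stance.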

Lane 1 — structural-only (3 detectors, default employer-facing)

1. Intrusive-interruption

Detects when speaker B starts while speaker A is still speaking AND A's turn ends within a threshold.

// src/lib/pipeline/layer1-deterministic.ts
const OVERLAP_THRESHOLD_MS = 500;   // overlap window
const TRUNCATION_WINDOW_MS = 500;   // A's turn must end within this

export function detectIntrusiveInterruptions(turns: Turn[]): IntrusiveInterruption[] {
  // O(n) scan over each adjacent pair (A, B), where B is the turn after A.
  // Count the pair when all three conditions hold:
  //   1. B.startMs < A.endMs (B starts while A is still speaking)
  //   2. A.endMs - B.startMs < OVERLAP_THRESHOLD_MS
  //   3. A's turn ends within TRUNCATION_WINDOW_MS
}

Research anchor: Anderson & Leaper 1998 (meta-analysis, 43 studies, d=0.33).

Known confounds: facilitator/chair role · incident-bridge meeting type · overlap-heavy audio. Today these are handled only by a caveat surface on the Mirror UI; they should be handled by meetingType gating (§7 item 2).
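A runnable reading of the interruption scan described above, offered as a sketch only. The result shape (`InterruptionSketch`) is hypothetical because the real IntrusiveInterruption type is not shown here, and the sketch assumes the truncation window is measured from the moment B starts speaking.

```typescript
type Turn = { speakerId: string; startMs: number; endMs: number; text: string };

// Hypothetical result shape, for illustration only.
type InterruptionSketch = {
  interrupterId: string;
  interruptedId: string;
  atMs: number;
  overlapMs: number;
};

const OVERLAP_THRESHOLD_MS = 500;
const TRUNCATION_WINDOW_MS = 500;

export function detectInterruptionsSketch(turns: Turn[]): InterruptionSketch[] {
  // Sorting makes this O(n log n); the scan itself is O(n) on ordered input.
  const ordered = [...turns].sort((x, y) => x.startMs - y.startMs);
  const found: InterruptionSketch[] = [];
  for (let i = 1; i < ordered.length; i++) {
    const a = ordered[i - 1];
    const b = ordered[i];
    if (!a || !b || a.speakerId === b.speakerId) continue;
    const overlapMs = a.endMs - b.startMs; // positive: B started while A was speaking
    const overlaps = overlapMs > 0;
    const shortOverlap = overlapMs < OVERLAP_THRESHOLD_MS;
    // Assumption: "A's turn ends within the window" is measured from B's start.
    const truncated = overlapMs <= TRUNCATION_WINDOW_MS;
    if (overlaps && shortOverlap && truncated) {
      found.push({ interrupterId: b.speakerId, interruptedId: a.speakerId, atMs: b.startMs, overlapMs });
    }
  }
  return found;
}
```

Under this reading the two 500 ms constants bound the same quantity, so conditions 2 and 3 coincide; they would only diverge if the constants were tuned separately.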

2. Chilling-delta

Per-speaker participation drop in a 5-minute window after a trigger turn, compared to that speaker's own rolling baseline (not the team average).

const CHILLING_DELTA_WINDOW_MS = 300000;  // 5 min
const CHILLING_DELTA_THRESHOLD = 0.4;    // 40% drop
const COLD_START_MIN_MEETINGS = 5;

Research anchor: Morrison 2014 (organizational silence); Detert & Burris 2007.

Cold-start rule: no baseline signal emitted if the speaker has fewer than 5 prior meetings. Today this rule lives in perSpeakerBaseline() at layer1-deterministic.ts:118 — it returns null when comparable meetings < 5. See §7 gap: should be lifted to global abstention doctrine.
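The drop calculation and the cold-start rule can be illustrated together. This is a sketch under stated assumptions, not the body of perSpeakerBaseline(): the baseline is taken as a plain mean of the speaker's participation share across prior comparable meetings, the drop is relative to that baseline, and a drop of exactly 40% is treated as crossing the threshold. The function names are hypothetical.

```typescript
const CHILLING_DELTA_THRESHOLD = 0.4; // 40% drop
const COLD_START_MIN_MEETINGS = 5;

// Assumption: one participation share (0..1) per prior comparable meeting.
export function baselineShareSketch(priorShares: number[]): number | null {
  if (priorShares.length < COLD_START_MIN_MEETINGS) return null; // cold start: abstain
  return priorShares.reduce((sum, s) => sum + s, 0) / priorShares.length;
}

export function chillingDeltaSketch(
  postTriggerShare: number,
  priorShares: number[],
): { drop: number; flagged: boolean } | null {
  const baseline = baselineShareSketch(priorShares);
  if (baseline === null || baseline <= 0) return null; // no baseline, no signal
  const drop = (baseline - postTriggerShare) / baseline; // relative drop vs own baseline
  return { drop, flagged: drop >= CHILLING_DELTA_THRESHOLD };
}
```

Returning null rather than a zero score keeps "no evidence" distinct from "evidence of no drop", which is the point of the abstention rule.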

3. Floor-time Gini

Standard Gini coefficient over per-speaker speaking durations within a meeting. Gini 0 = perfectly equal; values near 1 = one person dominates (with n speakers the maximum is (n−1)/n).

Research anchor: Schmid Mast 2002 (Human Communication Research meta-analysis).
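For reference, the standard Gini computation over floor time looks roughly like this. The helper names are hypothetical and the production implementation may differ in details such as how overlapping turns are counted.

```typescript
type Turn = { speakerId: string; startMs: number; endMs: number; text: string };

// Total speaking time per speaker, in milliseconds.
export function floorTimeBySpeaker(turns: Turn[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const t of turns) {
    const duration = Math.max(0, t.endMs - t.startMs);
    totals.set(t.speakerId, (totals.get(t.speakerId) ?? 0) + duration);
  }
  return totals;
}

// G = 2·Σ(i·x_i) / (n·Σx_i) − (n + 1)/n, with x sorted ascending and i = 1..n.
export function giniCoefficient(values: number[]): number {
  const n = values.length;
  const total = values.reduce((sum, v) => sum + v, 0);
  if (n === 0 || total === 0) return 0; // nothing to compare
  const ascending = [...values].sort((a, b) => a - b);
  let rankWeighted = 0;
  ascending.forEach((v, i) => {
    rankWeighted += (i + 1) * v;
  });
  return (2 * rankWeighted) / (n * total) - (n + 1) / n;
}
```

With four speakers and one person holding the entire floor, this formula gives 0.75 rather than 1, so thresholds on Gini should account for meeting size.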

Lane 2 — hybrid text-informed (4 detectors, tenant opt-in required)

Why these are lane 2, not lane 1. All four of these detectors read transcript text — either via lexical pattern matching, embedding similarity, or surface-grammar classification. The v1 deck's blanket "metadata only, no content" claim was falsified by the existence of these detectors. The registry makes the lane explicit and gates them behind a tenant feature flag that defaults off.

4. Unanswered-question rate

A turn ending in lexical question markers (「か?」/「?」/「かな」) that receives no substantive response within N turns. Short acknowledgment responses (<2s or lexical "yes/なるほど/そうですね") don't count as a response for this detector.

Research anchor: Stivers et al. 2009 (PNAS cross-linguistic turn-taking).

5. Topic-credit ignored-turns

A substantive turn (>3s) from speaker A, followed either by (a) silence ≥ 2× the meeting's median inter-turn gap, or (b) a topic redirect by speaker B that captures credit for a similar proposal later in the meeting. Similarity is computed via embedding distance.

Research anchor: Sacks / Schegloff / Jefferson 1974; maps to MHLW power-harassment (パワハラ) types 3 + 5.

6. Agreement asymmetry (同調圧力, conformity pressure)

Directional rate at which position-statements shift toward a specific speaker. Position-statements detected via lexical cues; shift detected by comparing turn-order positional claims.

7. Keigo (敬語) peer-addressee asymmetry

Per-speaker-per-addressee politeness-register score from surface grammar. Classifies each turn into sonkeigo / kenjougo / teineigo / plain / imperative / mixed. Detects when the same speaker uses meaningfully lower register toward one addressee than toward peers.

// src/lib/pipeline/keigo.ts — excerpt
const SONKEIGO_PATTERNS = [/いらっしゃ/, /おっしゃ/, /なさる/, /お[一-龯]+になる/];
const PLAIN_PATTERNS = [/だよ/, /だね/, /でしょ/, /よね/, /じゃん/];

const PEER_ASYMMETRY_THRESHOLD = 0.25;
const MIN_TURNS_PER_TARGET = 2;

On our Kimura/Nao seed: Kimura scores 0.88 toward Nakamura (honorific) vs 0.38 toward Nao (plain form). Gap = 0.50. In our control team (Sato's), the largest gap is 0.0: zero asymmetries.

Research anchor: Cook 2011 (J. Pragmatics 43(15)); Saito 2011 (J. Pragmatics 43(6)); Pizziconi 2003; Ide wakimae framework. Unique to Kashi. No Western product has this.
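The gap arithmetic behind the Kimura example can be sketched as below. This is an assumption-laden illustration rather than the logic in keigo.ts: it takes per-addressee mean register scores as already computed, compares each addressee against the mean of the speaker's other eligible addressees, and treats a gap equal to the threshold as crossing it. The `AddresseeScore` shape is hypothetical.

```typescript
const PEER_ASYMMETRY_THRESHOLD = 0.25;
const MIN_TURNS_PER_TARGET = 2;

// Hypothetical input: one speaker's mean register score toward each addressee.
type AddresseeScore = { addresseeId: string; meanRegister: number; turns: number };

export function registerGapsSketch(scores: AddresseeScore[]): { addresseeId: string; gap: number }[] {
  // Addressees with too few turns are excluded from both sides of the comparison.
  const eligible = scores.filter((s) => s.turns >= MIN_TURNS_PER_TARGET);
  const flagged: { addresseeId: string; gap: number }[] = [];
  for (const target of eligible) {
    const peers = eligible.filter((s) => s.addresseeId !== target.addresseeId);
    if (peers.length === 0) continue; // nothing to compare against
    const peerMean = peers.reduce((sum, s) => sum + s.meanRegister, 0) / peers.length;
    const gap = peerMean - target.meanRegister; // positive: lower register toward target
    if (gap >= PEER_ASYMMETRY_THRESHOLD) flagged.push({ addresseeId: target.addresseeId, gap });
  }
  return flagged;
}
```

Fed the seed numbers (0.88 toward Nakamura, 0.38 toward Nao), the sketch flags only Nao, with a gap of about 0.50.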

Cross-meeting wrappers (in continuity.ts)

Aggregation thresholds (live in aggregate.ts)

const WINDOW_DAYS = 90;
const CONCERN_DIRECTIONALITY = 3.0;  // 3× peer rate → concern
const CONCERN_BASELINE_DROP = 0.4;   // 40% drop → concern
const WATCH_DIRECTIONALITY = 2.0;
const WATCH_BASELINE_DROP = 0.2;
const COST_PER_CASE_MIN_YEN = 3_000_000;
const COST_PER_CASE_MAX_YEN = 8_000_000;

Known simplification: these thresholds are universal. The meeting-type-normalization research says they should be set per meeting type. The fix is §7 item 2.

05 Data model

Core types at src/lib/types.ts:

// The atomic unit of input.
export type Turn = {
  speakerId: string;
  startMs: number;
  endMs: number;
  text: string;
};

export type Meeting = {
  id: string;
  dateIso: string;
  teamId: string;
  title: string;
  turns: Turn[];
  // Declared (not yet threaded through detectors):
  meetingType?: MeetingType;
  meetingTypeConfidence?: number;
  scoringMode?: ScoringMode;
};

// New types committed 2026-04-21 · awaiting end-to-end threading:
export type DetectorClass =
  | "STRUCTURAL_ONLY"
  | "TEXT_DERIVED_DETERMINISTIC"
  | "HYBRID_TEXT_INFORMED"
  | "GENERATIVE_ASSIST"
  | "REFUSED";

export type EvidenceGrade =
  | "BLOCKED"           // input quality below gate
  | "INSUFFICIENT"      // thin exposure
  | "WEAK"
  | "EMERGING"
  | "STABLE"
  | "HIGH_CONFIDENCE_STABLE";

export type AbstentionState =
  | "NO_COMPUTE"
  | "COMPUTE_NO_INTERPRETATION"
  | "WATCH_ONLY"
  | "INTERPRETABLE_PRIVATE_ONLY"
  | "INTERPRETABLE_ROLE_BOUNDED";

export type ConfidenceBundle = {
  inputQuality: number;
  contextSupport: number;
  exposureSupport: number;
  detectorConfidence: number;
  aggregationSupport: number;
  reasonCodes: ReasonCode[];
  abstention: AbstentionState;
  grade: EvidenceGrade;
};

Database schemas at src/lib/db/types.ts — multi-tenant with org_id scoped RLS on every table. Raw transcript text is stripped before DB write; only length_chars is persisted.

Supabase schema (abbreviated)

orgs(id, name, created_at)
profiles(id, org_id, email, role, created_at)
  -- role enum: admin | ceo | member (see §7 gap: needs 6)
teams(id, org_id, name)
meetings(id, org_id, team_id, title, date_iso, turns_metadata jsonb)
  -- turns_metadata = TurnMetadata[] with text STRIPPED
meeting_metrics(meeting_id, speaking_share jsonb, intrusive_interruptions jsonb, ...)
manager_mirrors(manager_profile_id, week_ending_iso, ...)
pattern_summaries(manager_profile_id, pattern_intensity enum, ...)
user_keys(user_id, public_key_jwk jsonb)  -- evidence vault (planned)
evidence_vault(id, user_id, event_id, encrypted_snippet, encrypted_data_key, iv)

Migrations applied

06 Security posture (honest snapshot)

What we've actually built vs what we've only declared. Naming the gap is the first step to closing it.

Guarantee | Built in code? | Declared in docs? | Gap (if any)
Tenant isolation via RLS | YES | YES | No automated RLS isolation tests in CI yet; fix in Sprint 1
Magic-link OTP auth | YES | YES | none
Raw transcript stripping (length only stored) | YES | YES | none
Japan data residency (Supabase Tokyo) | PARTIAL | YES | Vercel primary-compute region is US. Regulated content flowing through Vercel runtime contradicts the claim. Fix in Sprint 1.
ConfidenceBundle on every event | YES (v3) | YES | Emitted end-to-end as of 2026-04-21. Headline EvidenceGrade + AbstentionState + ReasonCode[] visible on every Manager Mirror + Executive Brief event. Per-detector bundles remain Sprint-1.
k-anonymity (k≥5) on aggregates | DECLARED | YES | Enforced at UI layer today, not at query layer. Sprint 2: move enforcement into SECURITY DEFINER RPCs.
Differential privacy (ε≤1) on exec dashboards | NOT YET | YES | Math declared in governance page; not implemented. Sprint 3.
Audit log on every drill-down | PARTIAL | YES | Individual drill-downs logged; app-layer audit event schema with reason codes + SIEM streaming is Sprint 2.
4-tier retention (14d / 24mo / 12mo / legal-hold) | PARTIAL | YES | Retention bands defined in schema, purge jobs exist for raw layer. Tombstones + restore-reconciliation: Sprint 3.
No admin-level content access | YES | YES | Architecturally: Kashi staff do not have routine access to customer transcripts. Break-glass procedures are documented and audit-logged; tabletop exercise Sprint 2.
SOC 2 Type II / ISO 27001 | PRE-CERT | honestly disclosed | Year-1 target. Architecture ready for audit; controls documentation is the gap.
The Japan data-residency gap is the single most urgent fix before any Enterprise-tier JP sale. Solvable without rearchitecture: pin Vercel function regions to Tokyo for sensitive routes, ensure no transcript or vault content flows through Vercel logs/previews, update subprocessor map to disclose Vercel-US honestly.
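On the k-anonymity row: the suppression rule itself is small, whichever layer ends up enforcing it. The sketch below shows the rule in TypeScript purely as an illustration; the row shape and function name are hypothetical, and the stated Sprint-2 target is enforcement inside SECURITY DEFINER RPCs rather than in application code.

```typescript
const K_MIN = 5; // k ≥ 5, per the table above

// Hypothetical aggregate row: one value computed over a group of people.
type AggregateRow = { groupKey: string; memberCount: number; value: number };
type SafeRow = { groupKey: string; memberCount: number | null; value: number | null; suppressed: boolean };

export function suppressSmallGroups(rows: AggregateRow[], k: number = K_MIN): SafeRow[] {
  return rows.map((row) =>
    row.memberCount >= k
      ? { ...row, suppressed: false }
      : // Below k: withhold both the value and the exact group size.
        { groupKey: row.groupKey, memberCount: null, value: null, suppressed: true },
  );
}
```

This covers primary suppression only. It does not address complementary disclosure, where a suppressed value can be recovered from a published total minus the visible groups.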

07 Gap analysis: where we can improve

This section is the honest answer to "what's not yet done?" All items below are surfaced directly from the Ideas_wave3 technical-dev research library (17 memos) + the business research memos' technical implications. Every item has a path to code.

P0 Critical — pilot-blockers

1. ConfidenceBundle · HEADLINE SHIPPED (2026-04-21)

Backward-compatible wrapper shipped at src/lib/pipeline/confidence-bundle.ts. Every ManagerMirrorData + PatternSummary now carries overallGrade + overallAbstention + overallReasonCodes. UI renders grade badge with hover-tooltip rationale on /demo/mirror + /demo/ceo.

Remaining Sprint-1 scope: refactor each of the 7 detectors to emit their OWN bundle natively (not just at the aggregation boundary). The UX commitment is landed; the per-detector audit trail is next.

1b. Per-detector ConfidenceBundle emission (Sprint-1)

Currently the bundle is computed at aggregate.ts boundary. Research wants each detector to emit its own bundle so reviewers can see which detector is weak and why. Full signature refactor of 7 detectors + threading through aggregate + eval harness re-verification. ~1 week.

2. Meeting-type normalization missing

Universal thresholds (CONCERN_DIRECTIONALITY=3.0) pool across all meeting types. A weekly sync, an incident bridge, a 1:1, and a training session are all treated identically. The meeting-type-normalization research says this is the largest red-team attack surface.

Fix: wire meetingType into Meeting end to end (declared in the type, not yet used); gate detector execution on scoringMode; block cross-type baseline pooling in aggregate.ts. ~2h.
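A sketch of what the gating and the pooling block could look like. The MeetingType members and the "scored" mode are placeholders chosen for illustration (only "observation_only" appears elsewhere in this document), and the function names are hypothetical.

```typescript
// Placeholder values: the real MeetingType and ScoringMode unions are not shown here.
type MeetingType = "weekly_sync" | "incident_bridge" | "one_on_one" | "training";
type ScoringMode = "scored" | "observation_only";

type MeetingLite = { id: string; meetingType?: MeetingType; scoringMode?: ScoringMode };

// Detectors run only when the type is known and the meeting is in a scored mode.
export function shouldRunDetectors(meeting: MeetingLite): boolean {
  return meeting.meetingType !== undefined && meeting.scoringMode === "scored";
}

// Baselines are pooled per meeting type, so an incident bridge never shifts
// the baseline used for a weekly sync.
export function poolByMeetingType(meetings: MeetingLite[]): Map<MeetingType, MeetingLite[]> {
  const pools = new Map<MeetingType, MeetingLite[]>();
  for (const meeting of meetings) {
    const meetingType = meeting.meetingType;
    if (meetingType === undefined || !shouldRunDetectors(meeting)) continue;
    const pool = pools.get(meetingType) ?? [];
    pool.push(meeting);
    pools.set(meetingType, pool);
  }
  return pools;
}
```

Meetings with an unknown type are left out of every pool instead of being assigned to a default, which errs toward abstention.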

3. Role enum too narrow (3 values, research requires 6)

Current: admin | ceo | member. Research requires: Individual, Manager, HR/Compliance, Executive, Restricted Investigator, System Admin. Current enum structurally cannot satisfy the §3 visibility matrix in the role-and-visibility architecture memo.

Fix: migration 0004 · expand profiles.role enum · update RLS policies · split admin into support-admin vs behavioral-data investigator. ~3h including RLS policy tests.

4. Contestability state machine doesn't exist

Research (legal-procedural-fairness memo §5) is explicit: challenge / dispute / correction workflow is the single biggest product gap. Without it, every review-worthy event is uncontestable — which breaks the entire fairness story and fails EU AI Act Annex III §4 meaningful-human-review requirement.

Fix: 8 new tables (disputable_object, dispute_ticket, dispute_evidence, review_decision, correction_patch, recompute_job, aggregation_exclusion, access_history) · dispute API endpoints · server-enforced DRAFT privacy. Sprint 2, ~2 weeks.

5. Japan data residency gap · CLOSED (PR #1, 2026-04-25)

Vercel functions pinned to hnd1 (Tokyo) via vercel.json. Verified post-deploy: production responses now return x-vercel-id: hnd1::.... Three-week soak (2026-04-25 → 2026-05-16) showed no region drift. Marketing claim is now accurate.

6. RLS isolation tests in CI · CLOSED (PR #1, 2026-04-25)

RLS isolation suite added at test/rls/ running against a real Postgres + Supabase auth shim in CI. Mutation-test ritual documented in each test file. Migration 0012_rls_helper_security_definer.sql shipped alongside, fixing a latent RLS-helper recursion bug discovered during the test build-out. Now gating every PR.

P1 High priority — next sprint

7. Semantic-detector quarantine in code (not just registry)

The registry declares which detectors are lane-2 hybrid. But the employer-facing output endpoint doesn't filter by detectorsAllowedForTenant(flags). Today the tenant flag is decorative.

Fix: wrap detector invocation in aggregate.ts with detectorsAllowedForTenant(). Employer-facing default = lane-1-only. Semantic lane requires explicit per-tenant flag flip in orgs.feature_flags jsonb. ~4h.
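The filter itself could be as small as the sketch below. The registry entries and detector ids are hypothetical stand-ins for detector-registry.ts, and the function is deliberately named differently from the real detectorsAllowedForTenant(), whose signature is not shown here.

```typescript
type DetectorClass =
  | "STRUCTURAL_ONLY"
  | "TEXT_DERIVED_DETERMINISTIC"
  | "HYBRID_TEXT_INFORMED"
  | "GENERATIVE_ASSIST"
  | "REFUSED";

type RegistryEntry = { id: string; detectorClass: DetectorClass };
type TenantFlags = { semantic_lane_enabled?: boolean };

// Hypothetical ids standing in for the 7 registered detectors.
const REGISTRY_SKETCH: RegistryEntry[] = [
  { id: "intrusive_interruption", detectorClass: "STRUCTURAL_ONLY" },
  { id: "chilling_delta", detectorClass: "STRUCTURAL_ONLY" },
  { id: "floor_time_gini", detectorClass: "STRUCTURAL_ONLY" },
  { id: "unanswered_question", detectorClass: "HYBRID_TEXT_INFORMED" },
  { id: "topic_credit", detectorClass: "HYBRID_TEXT_INFORMED" },
  { id: "agreement_asymmetry", detectorClass: "HYBRID_TEXT_INFORMED" },
  { id: "keigo_asymmetry", detectorClass: "HYBRID_TEXT_INFORMED" },
];

export function allowedDetectorsSketch(
  flags: TenantFlags,
  registry: RegistryEntry[] = REGISTRY_SKETCH,
): RegistryEntry[] {
  return registry.filter((d) => {
    if (d.detectorClass === "STRUCTURAL_ONLY") return true; // lane 1: always on
    if (d.detectorClass === "HYBRID_TEXT_INFORMED") return flags.semantic_lane_enabled === true;
    return false; // every other class is denied by default
  });
}
```

The strict `=== true` check means a missing or malformed flag falls back to lane 1 only, matching the default-off commitment.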

8. Speaker-ID provenance chain

Today Turn.speakerId is a flat string. Research wants 5-layer provenance: UtteranceSegment → DiarizationCluster → MeetingParticipantInstance → CanonicalPerson, with status enum (RESOLVED / UNKNOWN / WRONG / SPLIT_SUSPECTED / MERGE_SUSPECTED / OVERLAP_AMBIGUOUS).

Fix: refactor Turn type + migration + identity-mapping tables with versioning. Metric-eligibility gate suppresses person-level output when unknown_speaker_duration > 15%. ~1 week.
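The 15% eligibility gate can be sketched on its own, ahead of the larger provenance refactor. The `Segment` shape and function name are hypothetical; the sketch assumes only segments with status UNKNOWN count toward unknown_speaker_duration, and that a meeting with no measurable speech is ineligible.

```typescript
type SpeakerStatus =
  | "RESOLVED"
  | "UNKNOWN"
  | "WRONG"
  | "SPLIT_SUSPECTED"
  | "MERGE_SUSPECTED"
  | "OVERLAP_AMBIGUOUS";

// Hypothetical segment shape for illustration.
type Segment = { startMs: number; endMs: number; status: SpeakerStatus };

const UNKNOWN_SPEAKER_MAX_SHARE = 0.15; // suppress person-level output above 15%

export function personLevelOutputAllowed(segments: Segment[]): boolean {
  let totalMs = 0;
  let unknownMs = 0;
  for (const s of segments) {
    const duration = Math.max(0, s.endMs - s.startMs);
    totalMs += duration;
    if (s.status === "UNKNOWN") unknownMs += duration;
  }
  if (totalMs === 0) return false; // no speech measured: abstain
  return unknownMs / totalMs <= UNKNOWN_SPEAKER_MAX_SHARE;
}
```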

9. Telemetry partitioning (anti-retaliation)

Employer must not be able to infer that an employee opened their pattern page, created a vault, marked a confound, or filed a dispute. Today these events could in theory appear in business-analytics. Research (retaliation-risk memo) flags this as directly actionable under MHLW retaliation-prohibition.

Fix: split telemetry namespace · protected-route events go to a separate store (not exposed to tenant BI) · small-team (<5 user) inference suppression · batching + delay on employee-facing events. ~1 week.

10. Per-meeting influence cap + leave-one-out fragility

Current aggregation: any single meeting can dominate the 90-day signal. A noisy meeting creates a fake trend. Research wants: cap single-meeting contribution at 20% of weighted evidence; run leave-one-out to confirm signal survives without any single meeting.

Fix: weight normalization in aggregate.ts. Degrade evidence grade if signal disappears leave-one-out. ~3 days.
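Both checks can be sketched briefly. The cap below uses a water-filling approach: it finds a single ceiling c such that, after clamping every weight to c, no meeting holds more than 20% of the total. Note that a 20% cap is unsatisfiable with fewer than 5 meetings, so the sketch abstains in that case. All names and the `MeetingEvidence` shape are hypothetical, and the leave-one-out test assumes the signal is a weighted mean compared against a threshold.

```typescript
type MeetingEvidence = { meetingId: string; weight: number; score: number };

// Clamp weights so that no single meeting exceeds maxShare of the capped total.
export function capWeights(weights: number[], maxShare = 0.2): number[] | null {
  const n = weights.length;
  if (n * maxShare < 1) return null; // too few meetings for the cap to be satisfiable
  const descending = [...weights].sort((a, b) => b - a);
  let uncappedSum = descending.reduce((sum, w) => sum + w, 0);
  if (uncappedSum <= 0) return null;
  for (let k = 0; k < n; k++) {
    // With the k largest weights clamped to c: c = maxShare * (uncappedSum + k * c).
    const denom = 1 - maxShare * k;
    if (denom <= 0) break;
    const ceiling = (maxShare * uncappedSum) / denom;
    const next = descending[k];
    if (next === undefined || next <= ceiling) return weights.map((w) => Math.min(w, ceiling));
    uncappedSum -= next; // this weight gets clamped; try the next-largest
  }
  return null;
}

function weightedMean(items: MeetingEvidence[]): number {
  const totalWeight = items.reduce((sum, m) => sum + m.weight, 0);
  if (totalWeight <= 0) return 0;
  return items.reduce((sum, m) => sum + m.weight * m.score, 0) / totalWeight;
}

// True only if the signal stays at or above threshold with any one meeting removed.
export function survivesLeaveOneOut(items: MeetingEvidence[], threshold: number): boolean {
  if (items.length < 2) return false;
  return items.every((_, i) => weightedMean(items.filter((__, j) => j !== i)) >= threshold);
}
```

A signal that clears the threshold overall but fails `survivesLeaveOneOut` is the "noisy meeting creates a fake trend" case, which is where the evidence grade would be degraded.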

11. 7-gate input-quality pipeline

Research wants every meeting to pass 7 sequential quality gates BEFORE any detector runs: substrate presence · parser integrity · speaker attribution · transcript text · language regime · meeting type · sample sufficiency. Gates produce a detector_eligibility_map telling each detector if it's allowed to run.

Fix: new InputQualityGate layer before Layer 2. Meetings failing Gate N get scoringMode="observation_only". ~3-4 days.
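The sequential part of that design can be sketched as a short-circuit over an ordered gate list. Gate identifiers are derived from the names above, "scored" is a placeholder for whatever the non-degraded ScoringMode is called, and the sketch stops at picking a scoring mode; it does not build the per-detector detector_eligibility_map.

```typescript
// Gate order follows the list in the text above.
const GATE_ORDER = [
  "substrate_presence",
  "parser_integrity",
  "speaker_attribution",
  "transcript_text",
  "language_regime",
  "meeting_type",
  "sample_sufficiency",
] as const;

type GateName = (typeof GATE_ORDER)[number];
type GateResults = Record<GateName, boolean>;
type ScoringMode = "scored" | "observation_only"; // "scored" is a placeholder name

export function evaluateGatesSketch(results: GateResults): {
  failedGate: GateName | null;
  scoringMode: ScoringMode;
} {
  for (const gate of GATE_ORDER) {
    // Sequential: the first failing gate decides the outcome, later gates are not consulted.
    if (!results[gate]) return { failedGate: gate, scoringMode: "observation_only" };
  }
  return { failedGate: null, scoringMode: "scored" };
}
```

Recording the first failed gate gives reviewers a concrete reason code for why a meeting was not scored.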

12. Vault metadata suppression

E2E evidence vault is planned (ciphertext employer cannot decrypt). But the existence of a vault, its snippet count, last-activity timestamp, and draft state still leak via database metadata. Research: metadata leakage is as dangerous as content leakage.

Fix: vault tables in a separate Postgres schema with no cross-schema joins from tenant BI · admin access requires break-glass · metadata counts suppressed on exports. ~1 week.

P2 Medium — this quarter

13. Adaptation-watch layer (anti-gaming)

Once managers know the detector surface, some will game it (channel-shift dominance into async/1:1). Research: flag suspiciously clean metric improvement as a signal, not success. Multi-metric corroboration required before declaring an "improvement."

14. Multi-metric corroboration rule

No single-metric victory claims. An "improvement" requires ≥2 detector families moving in the right direction + no adjacent metric worsening. Enforce in aggregation layer before event emission.
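As a sketch, the rule reduces to a small predicate. The `FamilyTrend` shape is hypothetical, and the sketch simplifies "adjacent metric" to mean any other tracked detector family.

```typescript
type Trend = "improving" | "flat" | "worsening";

// Hypothetical shape: one trend per detector family over the comparison window.
type FamilyTrend = { family: string; trend: Trend };

const MIN_IMPROVING_FAMILIES = 2;

export function isCorroboratedImprovement(trends: FamilyTrend[]): boolean {
  // Count distinct families, so duplicate rows for one family cannot corroborate themselves.
  const improving = new Set(trends.filter((t) => t.trend === "improving").map((t) => t.family));
  // Simplification: every other family is treated as "adjacent".
  const anyWorsening = trends.some((t) => t.trend === "worsening");
  return improving.size >= MIN_IMPROVING_FAMILIES && !anyWorsening;
}
```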

15. Remediation-quality + human-recovery measurement

Business plan §11 declares the 6-layer success model. Layers 5 (remediation quality) and 6 (human recovery) aren't measured in code yet. Requires post-Lane-B fairness-rating capture from the affected employee + speaking-share-recovery tracking at 30/60/90 days.

16. Dispute API endpoints

The contestability state machine (§7 item 4) needs a paired API: POST /disputes · POST /disputes/:id/submit|triage|resolve · GET /objects/:id/explanation · GET /access-history/:object_id. Build once the state machine lands.

17. 3-lane accountability in code (currently only in docs)

The Lane-A (private self-correction) → Lane-B (governed remediation) → Lane-C (formal review) model is declared on the governance page. The automatic transition trigger (pattern persists after notice → escalates to Lane B) is not implemented. Today it's a policy, not a product feature.

Strategic 6-12 month horizon

18. NAQ-R outcome validation study

Partnership with a Japanese workplace-harassment research lab. 24-month validation: do Kashi's structural signals predict NAQ-R self-report outcomes at 6-month follow-up in treatment vs matched controls? Publishable result. Cohort of 5 JP companies.

19. Cross-platform ASR matrix

Platform × language × feature support matrix. Teams vs Meet vs Zoom, each with different transcription quality. Japanese-specific: the 46.8% single-channel tcpWER baseline is dangerous. Overlap flags, L2 caution surfaces, per-platform eligibility maps.

20. Multilingual locale packs

Universal detector core + per-locale calibration layers. Locale registry: country × language × mixed-language flag × platform × transcript-confidence band × calibration pack × legal pack × UX copy pack. Enables Singapore / NL / UK expansion.

21. Victim-explainer page

Research (manager-adoption + trust memos) endorses. When a review-worthy event fires, the affected individual sees a private explainer page. What happened, what it probably means, what their options are, who they can contact, how to enable the evidence vault.

22. Victim-owned E2E evidence vault

Planned. WebCrypto RSA-OAEP-2048 + AES-256-GCM envelope encryption. Employer stores ciphertext; victim holds private key. Vault metadata suppression (§7 item 12) is a prerequisite.

23. SOC 2 Type II + ISO 27001

Year-1 SOC 2 target. Year-2 ISO 27001. Architecture is ready; what's missing is control documentation, quarterly tabletops, third-party pen-test cycle.

07·5 Critical review: if you can only ship 3 things before pilot

§7 lists 23 improvements ranked by priority. But priority lists ≠ sequencing decisions. This section re-ranks the §7 items by pitch impact × pilot necessity × feasibility with current team size, and picks the three that should be Sprint-1 scope after pre-seed funding.

Pick 1 · Japan data residency fix (§7 item 5)

Smallest implementation cost, biggest procurement-unblock. Configuring Vercel function regions + auditing data flow is ~1 day. Without it, the Japan-data-residency claim is technically false and the first Enterprise-tier JP prospect will catch it in their security review. Must land before any Enterprise conversation starts.

Pick 2 · meeting_type gating (§7 item 2)

Universal thresholds pool across meeting types — a known validity hole that a hostile technical reviewer will notice in the first pilot post-mortem. Adding meetingType + scoringMode to the Meeting schema + gating detector execution in aggregate.ts is ~2h. It closes the biggest red-team attack surface and enables honest observation-only mode for unsupported formats (incident bridges, 1:1s, exec reviews).

Pick 3 · RLS isolation tests in CI (§7 item 6)

One engineer-day. Writes a test harness that logs in as each role, attempts forbidden queries, asserts they fail. Gated in GitHub Actions on every PR. Without this, the tenant-isolation claim is "verified by faith." With it, every PR is automatically checked. The cheapest credibility upgrade in the entire roadmap.

What NOT to build first (despite being P0)

  • Role enum expansion (§7 item 3) — important but 3h of migration + RLS rework. Not the first blocker.
  • Contestability state machine (§7 item 4) — 2-week Sprint-2 scope. Too big to jam into pre-pilot prep.
  • Per-detector ConfidenceBundle (§7 item 1b) — UX commitment is already landed via the wrapper. Full refactor can wait.

Principle: the first three builds post-funding should be ones where shipping pays for itself in unblocked sales conversations, not code hygiene. Data residency, meeting-type gating, and RLS tests all unblock specific investor/customer objections in the first technical-review meeting.

What the gap analysis missed (v3 addition)

08 Roadmap by sprint

Tied to the financing plan (business.html §6).

Sprint | Weeks | Funded by | Deliverables
Sprint 1 | 1–4 | Pre-seed | §7 items 1 (ConfidenceBundle threading) · 2 (meeting-type gating) · 3 (role enum expansion) · 5 (Japan data residency fix) · 6 (RLS isolation tests in CI)
Sprint 2 | 5–8 | Pre-seed | §7 items 4 (contestability state machine) · 7 (semantic-detector quarantine enforced) · 9 (telemetry partitioning) · start 16 (dispute API)
Sprint 3 | 9–12 | Seed | §7 items 8 (speaker provenance) · 10 (influence cap) · 11 (input-quality gates) · 12 (vault metadata suppression) · start 22 (evidence vault MVP)
Sprint 4 | 13–16 | Seed | §7 items 13 (adaptation-watch) · 14 (multi-metric corroboration) · 15 (remediation-quality measurement) · 17 (3-lane in code) · 21 (victim-explainer page)
Sprint 5-6 | 17–24 | Seed | §7 items 18 (NAQ-R study kickoff) · 19 (ASR matrix) · SOC 2 control documentation · third-party pen-test #1
Sprint 7+ | 25–52 | Series A | §7 items 20 (multilingual locale packs) · 23 (SOC 2 Type II achieved · ISO 27001 in progress) · Singapore regional rollout

Velocity assumption

Sprints 1-2 are deliverable by the 2-person founding team. Sprint 3+ assumes 1 additional engineer (the security/DevSecOps hire from business.html §5). Sprint 5+ assumes 2 additional engineers + CS lead from the seed round.

09 What we will NOT build

The companion to governance's "Kashi will not do" list, at the technology level. These are architectural refusals — not "we haven't gotten around to it yet," but "we will actively not ship this."

Every refusal above is simultaneously a technology decision and a market-positioning decision. A competitor who wants to copy Kashi's safety story has to reproduce all 10 refusals. They can't cherry-pick the detectors and skip the discipline.