What's actually running behind Kashi today, and where we see room to improve. Split audience: investor-readable on top, engineer-level detail toward the bottom. Every claim traces to a file path in the live codebase.
Kashi is a deterministic meeting-governance pipeline. It takes meeting transcripts (Zoom webhook live; Teams and Meet ingestion staged), runs them through a stack of structural detectors, and surfaces repeated interaction asymmetries for human review — not as harassment labels, but as contestable signals with explicit uncertainty.
Kashi is a Next.js 16 + TypeScript + Supabase application. Its deterministic detectors are organized into 3 lanes (structural-only · hybrid text-informed · refused) and protected by row-level security, with commitments declared in typed enums and a machine-readable detector registry. Several honest gaps remain on the way to pilot-ready, and we name them here.
| Layer | Technology | Why |
|---|---|---|
| Frontend framework | Next.js 16 (App Router) · React 19 · TypeScript strict | Server-rendered pages for SEO + auth-aware demo routes + client-side interactivity where needed |
| Styling | Tailwind v4 + shadcn/ui components | Utility-first; matches the serious/governance aesthetic not playful-SaaS |
| Charts | Recharts 3.8 | BarChart with LabelList, LineChart with ReferenceLine · covers the Manager Mirror + Executive Brief visuals |
| Auth | Supabase magic-link OTP via @supabase/ssr | No passwords. Works through corporate email infrastructure. JP-friendly. |
| Database | Supabase Postgres (Tokyo region) | Row-level security native. Tenant-scoped. Pinned to ap-northeast-1. |
| Storage | Supabase Storage | Transcript files · per-tenant buckets with policy |
| Hosting | Vercel production | Auto-deploys from main · edge network · honest gap: primary compute is US-based (see §6) |
| Detector pipeline | Pure TypeScript (no LLM in prod path) | Deterministic · auditable · fast · no per-API-call cost at runtime |
| LLM (seed + reasoning only) | Claude Sonnet 4.6 (seed authoring) + Claude Opus 4.7 (reasoning-heavy detector validation) | Used only for seed-data authoring and offline detector tuning · not called on production traffic |
| Transcript parsers | VTT · TXT · SRT · CSV · JSONL | Normalizes platform-specific formats into Turn[] · src/lib/pipeline/transcript-parser.ts |
| JP linguistic parsing | Regex-based surface grammar (no MeCab dependency) | Sufficient for keigo classification on meeting-length transcripts · tokenizer upgrade is a Sprint-2 candidate |
Every transcript flows through the same 6-layer pipeline. Every layer is deterministic. Every layer's output is typed. No hidden model calls, no black-box scoring.
Each detector declares its lane at compile time in src/lib/pipeline/detector-registry.ts. Employer-facing surfaces default to lane 1 (structural-only). Lane 2 requires tenant opt-in via semantic_lane_enabled feature flag.
Kashi does not record meetings, and in production transcripts are not uploaded manually. The detectors below are designed to be fed by platform webhooks once the platform finishes transcribing. Current maturity is asymmetric: the Zoom webhook is wired and live; the Microsoft Teams and Google Meet integrations are staged.
All three paths normalize through src/lib/pipeline/transcript-parser.ts → Turn[] → detector pipeline below. The /demo/ingest surface exposes the same normalizer + detectors to a stateless paste for hands-on demos. Production stores structural metrics only; transcript body text is discarded after detector run.
Detects when speaker B starts while speaker A is still speaking AND A's turn ends within a threshold.
// src/lib/pipeline/layer1-deterministic.ts
const OVERLAP_THRESHOLD_MS = 500; // overlap window
const TRUNCATION_WINDOW_MS = 500; // A's turn must end within this
export function detectIntrusiveInterruptions(turns: Turn[]): IntrusiveInterruption[] {
// O(n) scan: for each adjacent pair (A, B), where B is the turn right after A, check if B.startMs < A.endMs
// AND A.endMs - B.startMs < OVERLAP_THRESHOLD_MS
// AND A's turn ends within TRUNCATION_WINDOW_MS → count it
}
Research anchor: Anderson & Leaper 1998 (meta-analysis, 43 studies, d=0.33).
Known confounds: facilitator/chair role · incident-bridge meeting type · overlap-heavy audio. Today these are handled by a caveat surface on the Mirror UI; they should be handled by meetingType gating (§7 gap).
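The excerpt above leaves the function body as comments. A minimal sketch of one way to read those comments follows. It is not the code in layer1-deterministic.ts: the fields of `IntrusiveInterruption` are assumed, the input is assumed to be sorted by `startMs`, and "ends within" is assumed to be measured from the moment B starts speaking.

```typescript
// Illustrative sketch only; the real detector lives in layer1-deterministic.ts.
type Turn = { speakerId: string; startMs: number; endMs: number; text: string };

// Assumed result shape: the excerpt does not show IntrusiveInterruption's fields.
type IntrusiveInterruption = {
  interruptedId: string;
  interrupterId: string;
  atMs: number;
  overlapMs: number;
};

const OVERLAP_THRESHOLD_MS = 500;
const TRUNCATION_WINDOW_MS = 500;

// Assumes `turns` is already sorted by startMs, so one linear pass suffices.
function detectIntrusiveInterruptions(turns: Turn[]): IntrusiveInterruption[] {
  const found: IntrusiveInterruption[] = [];
  for (let i = 1; i < turns.length; i++) {
    const a = turns[i - 1];
    const b = turns[i];
    if (a.speakerId === b.speakerId) continue; // a speaker cannot interrupt themselves
    const overlapMs = a.endMs - b.startMs;
    const bStartsDuringA = overlapMs > 0;
    const overlapIsShort = overlapMs < OVERLAP_THRESHOLD_MS;
    // Assumption: "ends within" is measured from the moment B starts speaking.
    const aIsCutOff = a.endMs <= b.startMs + TRUNCATION_WINDOW_MS;
    if (bStartsDuringA && overlapIsShort && aIsCutOff) {
      found.push({
        interruptedId: a.speakerId,
        interrupterId: b.speakerId,
        atMs: b.startMs,
        overlapMs,
      });
    }
  }
  return found;
}
```

With both constants at 500 ms the last two checks coincide under this reading; they are kept separate here only so the two windows could be tuned independently.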
Per-speaker participation drop in a 5-minute window after a trigger turn, compared to that speaker's own rolling baseline (not the team average).
const CHILLING_DELTA_WINDOW_MS = 300_000; // 5 min
const CHILLING_DELTA_THRESHOLD = 0.4; // 40% drop
const COLD_START_MIN_MEETINGS = 5;
Research anchor: Morrison 2014 (organizational silence); Detert & Burris 2007.
Cold-start rule: no baseline signal emitted if the speaker has fewer than 5 prior meetings. Today this rule lives in perSpeakerBaseline() at layer1-deterministic.ts:118 — it returns null when comparable meetings < 5. See §7 gap: should be lifted to global abstention doctrine.
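The cold-start rule and the 40% threshold can be sketched together as below. This is an illustration under assumptions, not the body of perSpeakerBaseline(): the helper names are hypothetical, "participation" is taken to be a per-meeting speaking share between 0 and 1, and the rolling baseline is simplified to a plain mean over prior meetings.

```typescript
const CHILLING_DELTA_THRESHOLD = 0.4; // 40% drop
const COLD_START_MIN_MEETINGS = 5;

// Hypothetical helper: mean of the speaker's own prior participation values,
// or null when there is not enough history (cold start).
function baselineOrNull(priorShares: number[]): number | null {
  if (priorShares.length < COLD_START_MIN_MEETINGS) return null;
  const total = priorShares.reduce((sum, v) => sum + v, 0);
  return total / priorShares.length;
}

// Relative drop of the post-trigger window against the baseline.
// Returns null (abstain) when no usable baseline exists.
function chillingDelta(priorShares: number[], postTriggerShare: number): number | null {
  const baseline = baselineOrNull(priorShares);
  if (baseline === null || baseline === 0) return null;
  return (baseline - postTriggerShare) / baseline;
}

// A signal is emitted only when a baseline exists AND the drop meets the threshold.
function isChillingSignal(priorShares: number[], postTriggerShare: number): boolean {
  const delta = chillingDelta(priorShares, postTriggerShare);
  return delta !== null && delta >= CHILLING_DELTA_THRESHOLD;
}
```

Returning null rather than 0 keeps "no evidence" distinguishable from "no drop", which is the point of the cold-start rule.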
Standard Gini coefficient over per-speaker speaking durations within a meeting. A Gini of 0 means perfectly equal speaking time; values approaching 1 mean one person dominates (with n speakers the maximum is (n-1)/n).
Research anchor: Schmid Mast 2002 (Human Communication Research meta-analysis).
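For readers who want the arithmetic, here is a small sketch of the textbook Gini formula applied to speaking durations. The function names and the `participantIds` parameter are illustrative assumptions rather than Kashi's actual signatures. Silent participants are included as zeros, since leaving them out would understate inequality.

```typescript
type Turn = { speakerId: string; startMs: number; endMs: number; text: string };

// Gini over non-negative values using the sorted-rank form:
// G = (2 * sum_i(i * x_i)) / (n * sum(x)) - (n + 1) / n, with i = 1..n over ascending x.
function giniCoefficient(values: number[]): number {
  const n = values.length;
  const total = values.reduce((sum, v) => sum + v, 0);
  if (n === 0 || total === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  let rankWeighted = 0;
  sorted.forEach((v, i) => {
    rankWeighted += (i + 1) * v;
  });
  return (2 * rankWeighted) / (n * total) - (n + 1) / n;
}

// Sum speaking time per participant; participants with no turns count as 0.
function speakingDurations(turns: Turn[], participantIds: string[]): number[] {
  return participantIds.map((id) =>
    turns
      .filter((t) => t.speakerId === id)
      .reduce((sum, t) => sum + (t.endMs - t.startMs), 0),
  );
}
```

Four equal speakers give 0; one speaker holding all the time among four participants gives 0.75, the (n-1)/n ceiling for n = 4.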
A turn ending in lexical question markers (「か?」/「?」/「かな」) that receives no substantive response within N turns. Short acknowledgment responses (<2s or lexical "yes/なるほど/そうですね") don't count as a response for this detector.
Research anchor: Stivers et al. 2009 (PNAS cross-linguistic turn-taking).
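A rough sketch of the unanswered-question rule, under stated assumptions: the source does not fix N, so `lookaheadTurns` and its default of 3 are hypothetical; both ASCII and full-width question marks are matched; and a turn counts as a response only if it comes from a different speaker, lasts at least 2 seconds, and is not a bare acknowledgment.

```typescript
type Turn = { speakerId: string; startMs: number; endMs: number; text: string };

// Question markers from the description: a trailing question mark or かな.
const QUESTION_END = /([?？]|かな)\s*$/;
// Acknowledgment-only turns: the three lexical examples given in the description.
const ACK_ONLY = /^(yes|なるほど|そうですね)[。.!！\s]*$/i;
const MIN_RESPONSE_MS = 2000;

function isSubstantiveResponse(t: Turn): boolean {
  const longEnough = t.endMs - t.startMs >= MIN_RESPONSE_MS;
  return longEnough && !ACK_ONLY.test(t.text.trim());
}

// Returns the question turns that got no substantive reply from another
// speaker within the next `lookaheadTurns` turns (hypothetical default of 3).
// A question asked as the last turn of the meeting is reported as unanswered.
function findUnansweredQuestions(turns: Turn[], lookaheadTurns = 3): Turn[] {
  return turns.filter((q, i) => {
    if (!QUESTION_END.test(q.text)) return false;
    const followUps = turns.slice(i + 1, i + 1 + lookaheadTurns);
    return !followUps.some((t) => t.speakerId !== q.speakerId && isSubstantiveResponse(t));
  });
}
```

This sketch treats any substantive turn from another speaker as a response, including a new question; a production detector would likely need a stricter notion of "response".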
A substantive turn (>3s) from speaker A, followed either by (a) silence ≥ 2× the meeting's median inter-turn gap, or (b) a topic redirect by speaker B that captures credit for a similar proposal later in the meeting. Similarity is computed via embedding distance.
Research anchor: Sacks / Schegloff / Jefferson 1974; maps to MHLW パワハラ 類型 3 + 5.
Directional rate at which position-statements shift toward a specific speaker. Position-statements detected via lexical cues; shift detected by comparing turn-order positional claims.
Per-speaker-per-addressee politeness-register score from surface grammar. Classifies each turn into sonkeigo / kenjougo / teineigo / plain / imperative / mixed. Detects when the same speaker uses meaningfully lower register toward one addressee than toward peers.
// src/lib/pipeline/keigo.ts — excerpt
const SONKEIGO_PATTERNS = [/いらっしゃ/, /おっしゃ/, /なさる/, /お[一-龯]+になる/];
const PLAIN_PATTERNS = [/だよ/, /だね/, /でしょ/, /よね/, /じゃん/];
const PEER_ASYMMETRY_THRESHOLD = 0.25;
const MIN_TURNS_PER_TARGET = 2;
On our Kimura/Nao seed: Kimura scores 0.88 toward Nakamura (honorific) vs 0.38 toward Nao (plain form). Gap = 0.50. The largest gap in our control team (Sato's): 0.0 — zero asymmetries.
Research anchor: Cook 2011 (J. Pragmatics 43(15)); Saito 2011 (J. Pragmatics 43(6)); Pizziconi 2003; Ide wakimae framework. To our knowledge this detector is unique to Kashi: we are not aware of a Western product that offers it.
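The asymmetry check itself can be sketched independently of the regex classifier. The version below is an assumption-laden illustration: it takes per-addressee mean register scores as already computed, and it defines "peers" as the speaker's other eligible addressees, which the source does not spell out. The row type and function name are hypothetical.

```typescript
const PEER_ASYMMETRY_THRESHOLD = 0.25;
const MIN_TURNS_PER_TARGET = 2;

// Hypothetical input row: one speaker's mean register score toward one addressee.
type AddresseeRegister = { addresseeId: string; meanScore: number; turnCount: number };
type RegisterAsymmetry = { addresseeId: string; gap: number };

// Flags addressees who receive a register at least PEER_ASYMMETRY_THRESHOLD
// lower than the mean register the same speaker uses toward everyone else.
function findRegisterAsymmetries(rows: AddresseeRegister[]): RegisterAsymmetry[] {
  const eligible = rows.filter((r) => r.turnCount >= MIN_TURNS_PER_TARGET);
  const flagged: RegisterAsymmetry[] = [];
  for (const target of eligible) {
    const peers = eligible.filter((r) => r.addresseeId !== target.addresseeId);
    if (peers.length === 0) continue; // nothing to compare against
    const peerMean = peers.reduce((sum, r) => sum + r.meanScore, 0) / peers.length;
    const gap = peerMean - target.meanScore;
    if (gap >= PEER_ASYMMETRY_THRESHOLD) {
      flagged.push({ addresseeId: target.addresseeId, gap });
    }
  }
  return flagged;
}
```

Fed the seed figures quoted above (0.88 toward Nakamura, 0.38 toward Nao, with turn counts above the minimum), this flags Nao with a gap of about 0.50.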
const WINDOW_DAYS = 90;
const CONCERN_DIRECTIONALITY = 3.0; // 3× peer rate → concern
const CONCERN_BASELINE_DROP = 0.4; // 40% drop → concern
const WATCH_DIRECTIONALITY = 2.0;
const WATCH_BASELINE_DROP = 0.2;
const COST_PER_CASE_MIN_YEN = 3_000_000;
const COST_PER_CASE_MAX_YEN = 8_000_000;
Known simplification: These thresholds are universal. The meeting-type-normalization research says they should be per-meeting-type. Fix in the §7 gap list.
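To make the two-tier thresholds concrete, here is a hedged sketch of how they might map onto an intensity label. The `PatternIntensity` values and the rule that either metric alone can trigger a tier are assumptions; the source shows only the constants, not how they are combined.

```typescript
const CONCERN_DIRECTIONALITY = 3.0; // 3x peer rate
const CONCERN_BASELINE_DROP = 0.4; // 40% drop
const WATCH_DIRECTIONALITY = 2.0;
const WATCH_BASELINE_DROP = 0.2;

// Assumed labels; the real pattern_intensity enum values are not shown in the source.
type PatternIntensity = "none" | "watch" | "concern";

// Assumption: either metric crossing a tier's threshold is enough for that tier.
function classifyIntensity(directionalityRatio: number, baselineDrop: number): PatternIntensity {
  if (directionalityRatio >= CONCERN_DIRECTIONALITY || baselineDrop >= CONCERN_BASELINE_DROP) {
    return "concern";
  }
  if (directionalityRatio >= WATCH_DIRECTIONALITY || baselineDrop >= WATCH_BASELINE_DROP) {
    return "watch";
  }
  return "none";
}
```

Per-meeting-type calibration would replace the four module-level constants with a lookup keyed by meeting type, leaving the classification logic unchanged.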
Core types at src/lib/types.ts:
// The atomic unit of input.
export type Turn = {
speakerId: string;
startMs: number;
endMs: number;
text: string;
};
export type Meeting = {
id: string;
dateIso: string;
teamId: string;
title: string;
turns: Turn[];
// Declared (not yet threaded through detectors):
meetingType?: MeetingType;
meetingTypeConfidence?: number;
scoringMode?: ScoringMode;
};
// New types committed 2026-04-21 · awaiting end-to-end threading:
export type DetectorClass =
| "STRUCTURAL_ONLY"
| "TEXT_DERIVED_DETERMINISTIC"
| "HYBRID_TEXT_INFORMED"
| "GENERATIVE_ASSIST"
| "REFUSED";
export type EvidenceGrade =
| "BLOCKED" // input quality below gate
| "INSUFFICIENT" // thin exposure
| "WEAK"
| "EMERGING"
| "STABLE"
| "HIGH_CONFIDENCE_STABLE";
export type AbstentionState =
| "NO_COMPUTE"
| "COMPUTE_NO_INTERPRETATION"
| "WATCH_ONLY"
| "INTERPRETABLE_PRIVATE_ONLY"
| "INTERPRETABLE_ROLE_BOUNDED";
export type ConfidenceBundle = {
inputQuality: number;
contextSupport: number;
exposureSupport: number;
detectorConfidence: number;
aggregationSupport: number;
reasonCodes: ReasonCode[];
abstention: AbstentionState;
grade: EvidenceGrade;
};
Database schemas at src/lib/db/types.ts — multi-tenant with org_id scoped RLS on every table. Raw transcript text is stripped before DB write; only length_chars is persisted.
orgs(id, name, created_at)
profiles(id, org_id, email, role, created_at)
-- role enum: admin | ceo | member (see §7 gap: needs 6)
teams(id, org_id, name)
meetings(id, org_id, team_id, title, date_iso, turns_metadata jsonb)
-- turns_metadata = TurnMetadata[] with text STRIPPED
meeting_metrics(meeting_id, speaking_share jsonb, intrusive_interruptions jsonb, ...)
manager_mirrors(manager_profile_id, week_ending_iso, ...)
pattern_summaries(manager_profile_id, pattern_intensity enum, ...)
user_keys(user_id, public_key_jwk jsonb) -- evidence vault (planned)
evidence_vault(id, user_id, event_id, encrypted_snippet, encrypted_data_key, iv)
What we've actually built vs what we've only declared. Naming the gap is the first step to closing it.
| Guarantee | Built in code? | Declared in docs? | Gap (if any) |
|---|---|---|---|
| Tenant isolation via RLS | YES | YES | No automated RLS isolation tests in CI yet — fix in Sprint 1 |
| Magic-link OTP auth | YES | YES | — |
| Raw transcript stripping (length only stored) | YES | YES | — |
| Japan data residency (Supabase Tokyo) | PARTIAL | YES | Vercel primary-compute region is US. Regulated content flowing through Vercel runtime contradicts the claim. Fix in Sprint 1. |
| ConfidenceBundle on every event | YES (v3) | YES | Emitted end-to-end as of 2026-04-21. Headline EvidenceGrade + AbstentionState + ReasonCode[] visible on every Manager Mirror + Executive Brief event. Per-detector bundles remain Sprint-1. |
| k-anonymity (k≥5) on aggregates | DECLARED | YES | Enforced at UI layer today, not at query layer. Sprint 2: move enforcement into SECURITY DEFINER RPCs. |
| Differential privacy (ε≤1) on exec dashboards | NOT YET | YES | Math declared in governance page; not implemented. Sprint 3. |
| Audit log on every drill-down | PARTIAL | YES | Individual drill-downs logged; app-layer audit event schema with reason codes + SIEM streaming is Sprint 2. |
| 4-tier retention (14d / 24mo / 12mo / legal-hold) | PARTIAL | YES | Retention bands defined in schema, purge jobs exist for raw layer. Tombstones + restore-reconciliation: Sprint 3. |
| No admin-level content access | YES | YES | Architecturally: Kashi staff do not have routine access to customer transcripts. Break-glass procedures are documented and audit-logged; tabletop exercise Sprint 2. |
| SOC 2 Type II / ISO 27001 | PRE-CERT | honestly disclosed | Year-1 target. Architecture ready for audit; controls documentation is the gap. |
This section is the honest answer to "what's not yet done?" All items below are surfaced directly from the Ideas_wave3 technical-dev research library (17 memos) + the business research memos' technical implications. Every item has a path to code.
Backward-compatible wrapper shipped at src/lib/pipeline/confidence-bundle.ts. Every ManagerMirrorData + PatternSummary now carries overallGrade + overallAbstention + overallReasonCodes. UI renders grade badge with hover-tooltip rationale on /demo/mirror + /demo/ceo.
Remaining Sprint-1 scope: refactor each of the 7 detectors to emit their OWN bundle natively (not just at the aggregation boundary). The UX commitment is landed; the per-detector audit trail is next.
Currently the bundle is computed at the aggregate.ts boundary. Research wants each detector to emit its own bundle so reviewers can see which detector is weak and why. This requires a full signature refactor of the 7 detectors, threading through aggregate, and eval-harness re-verification. ~1 week.
Universal thresholds (CONCERN_DIRECTIONALITY=3.0) pool across all meeting types. A weekly sync and an incident bridge and a 1:1 and a training session are treated identically. The meeting-type-normalization research says this is the largest red-team attack surface.
Fix: add meetingType to Meeting (declared, not used); gate detector execution on scoringMode; block cross-type baseline pooling in aggregate.ts. ~2h.
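A minimal sketch of what gating plus per-type baselines could look like. The `MeetingType` values and the "full" scoring mode are placeholder names (only "observation_only" appears elsewhere in this document), and `MeetingSignal` is a hypothetical stand-in for whatever aggregate.ts actually consumes.

```typescript
// Hypothetical enum values; the real MeetingType and ScoringMode are not shown here.
type MeetingType = "weekly_sync" | "incident_bridge" | "one_on_one" | "training";
type ScoringMode = "full" | "observation_only";

type MeetingSignal = {
  meetingId: string;
  meetingType?: MeetingType;
  scoringMode?: ScoringMode;
  value: number; // any per-meeting detector output
};

type TypeBaselines = Partial<Record<MeetingType, number>>;

// Computes one baseline per meeting type. Meetings with no type, or not in
// "full" scoring mode, are skipped, so they can never leak into another type's pool.
function baselineByMeetingType(signals: MeetingSignal[]): TypeBaselines {
  const totals: Partial<Record<MeetingType, { sum: number; count: number }>> = {};
  for (const s of signals) {
    if (s.meetingType === undefined || s.scoringMode !== "full") continue;
    const acc = totals[s.meetingType] ?? { sum: 0, count: 0 };
    totals[s.meetingType] = { sum: acc.sum + s.value, count: acc.count + 1 };
  }
  const means: TypeBaselines = {};
  for (const type of Object.keys(totals) as MeetingType[]) {
    const acc = totals[type];
    if (acc) means[type] = acc.sum / acc.count;
  }
  return means;
}
```

Treating a missing `meetingType` as "do not score" is a fail-closed choice: an unclassified meeting is observed but does not move anyone's baseline.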
Current: admin | ceo | member. Research requires: Individual, Manager, HR/Compliance, Executive, Restricted Investigator, System Admin. Current enum structurally cannot satisfy the §3 visibility matrix in the role-and-visibility architecture memo.
Fix: migration 0004 · expand profiles.role enum · update RLS policies · split admin into support-admin vs behavioral-data investigator. ~3h including RLS policy tests.
Research (legal-procedural-fairness memo §5) is explicit: the challenge / dispute / correction workflow is the single biggest product gap. Without it, every review-worthy event is uncontestable, which breaks the entire fairness story and falls short of the meaningful-human-review expectation the EU AI Act attaches to Annex III §4 (employment) systems.
Fix: 8 new tables (disputable_object, dispute_ticket, dispute_evidence, review_decision, correction_patch, recompute_job, aggregation_exclusion, access_history) · dispute API endpoints · server-enforced DRAFT privacy. Sprint 2, ~2 weeks.
Vercel functions pinned to hnd1 (Tokyo) via vercel.json. Verified post-deploy: production responses now return x-vercel-id: hnd1::.... Three-week soak (2026-04-25 → 2026-05-16) showed no region drift. Marketing claim is now accurate.
RLS isolation suite added at test/rls/ running against a real Postgres + Supabase auth shim in CI. Mutation-test ritual documented in each test file. Migration 0012_rls_helper_security_definer.sql shipped alongside, fixing a latent RLS-helper recursion bug discovered during the test build-out. Now gating every PR.
The registry declares which detectors are lane-2 hybrid. But the employer-facing output endpoint doesn't filter by detectorsAllowedForTenant(flags). Today the tenant flag is decorative.
Fix: wrap detector invocation in aggregate.ts with detectorsAllowedForTenant(). Employer-facing default = lane-1-only. Semantic lane requires explicit per-tenant flag flip in orgs.feature_flags jsonb. ~4h.
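The filter described in the fix might look roughly like this. It is a sketch, not the registry's real API: the `DetectorEntry` shape, the placeholder detector ids, and the decision to deny the two classes whose lane is not stated in this document (`TEXT_DERIVED_DETERMINISTIC`, `GENERATIVE_ASSIST`) are all assumptions.

```typescript
type DetectorClass =
  | "STRUCTURAL_ONLY"
  | "TEXT_DERIVED_DETERMINISTIC"
  | "HYBRID_TEXT_INFORMED"
  | "GENERATIVE_ASSIST"
  | "REFUSED";

// Hypothetical registry entry and tenant-flag shapes.
type DetectorEntry = { id: string; detectorClass: DetectorClass };
type TenantFlags = { semantic_lane_enabled?: boolean };

// Placeholder ids, not the real registry contents.
const EXAMPLE_REGISTRY: DetectorEntry[] = [
  { id: "structural_example", detectorClass: "STRUCTURAL_ONLY" },
  { id: "hybrid_example", detectorClass: "HYBRID_TEXT_INFORMED" },
  { id: "refused_example", detectorClass: "REFUSED" },
];

// Lane 1 always runs; lane 2 runs only on explicit opt-in; everything else is
// denied (fail closed), including classes this sketch does not know how to place.
function detectorsAllowedForTenant(
  flags: TenantFlags,
  registry: DetectorEntry[] = EXAMPLE_REGISTRY,
): DetectorEntry[] {
  return registry.filter((d) => {
    switch (d.detectorClass) {
      case "STRUCTURAL_ONLY":
        return true;
      case "HYBRID_TEXT_INFORMED":
        return flags.semantic_lane_enabled === true;
      default:
        return false;
    }
  });
}
```

Comparing against `=== true` means a missing or malformed flag behaves like "off", which matches the lane-1-only default for employer-facing surfaces.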
Today Turn.speakerId is a flat string. Research wants 5-layer provenance: UtteranceSegment → DiarizationCluster → MeetingParticipantInstance → CanonicalPerson, with status enum (RESOLVED / UNKNOWN / WRONG / SPLIT_SUSPECTED / MERGE_SUSPECTED / OVERLAP_AMBIGUOUS).
Fix: refactor Turn type + migration + identity-mapping tables with versioning. Metric-eligibility gate suppresses person-level output when unknown_speaker_duration > 15%. ~1 week.
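The metric-eligibility gate reduces to a duration ratio. The sketch below assumes a hypothetical `AttributedSegment` shape and counts only segments with status `UNKNOWN` toward the 15% limit; whether the other non-resolved statuses should also count is left open in the source.

```typescript
type ProvenanceStatus =
  | "RESOLVED"
  | "UNKNOWN"
  | "WRONG"
  | "SPLIT_SUSPECTED"
  | "MERGE_SUSPECTED"
  | "OVERLAP_AMBIGUOUS";

// Hypothetical segment shape for illustration.
type AttributedSegment = { durationMs: number; status: ProvenanceStatus };

const MAX_UNKNOWN_SPEAKER_SHARE = 0.15;

// Person-level output is allowed only when unknown-speaker time is at most
// 15% of total speaking time. A meeting with no speech fails the gate.
function personLevelOutputAllowed(segments: AttributedSegment[]): boolean {
  const totalMs = segments.reduce((sum, s) => sum + s.durationMs, 0);
  if (totalMs === 0) return false;
  const unknownMs = segments
    .filter((s) => s.status === "UNKNOWN")
    .reduce((sum, s) => sum + s.durationMs, 0);
  return unknownMs / totalMs <= MAX_UNKNOWN_SPEAKER_SHARE;
}
```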
Employer must not be able to infer that an employee opened their pattern page, created a vault, marked a confound, or filed a dispute. Today these events could in theory appear in business-analytics. Research (retaliation-risk memo) flags this as directly actionable under MHLW retaliation-prohibition.
Fix: split telemetry namespace · protected-route events go to a separate store (not exposed to tenant BI) · small-team (<5 user) inference suppression · batching + delay on employee-facing events. ~1 week.
Current aggregation: any single meeting can dominate the 90-day signal. A noisy meeting creates a fake trend. Research wants: cap single-meeting contribution at 20% of weighted evidence; run leave-one-out to confirm signal survives without any single meeting.
Fix: weight normalization in aggregate.ts. Degrade evidence grade if signal disappears leave-one-out. ~3 days.
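One possible shape for the cap and the leave-one-out check is sketched below. The `MeetingEvidence` type, the `threshold` parameter, and the choice to redistribute capped weight proportionally among the remaining meetings are assumptions; the source specifies only the 20% cap and the leave-one-out requirement.

```typescript
// Hypothetical per-meeting evidence: a raw weight and a detector score.
type MeetingEvidence = { meetingId: string; weight: number; score: number };

const MAX_SINGLE_MEETING_SHARE = 0.2;

// Normalizes weights to shares summing to 1, then clips any share above `cap`
// and redistributes the excess proportionally among shares still below it.
function capShares(rawWeights: number[], cap = MAX_SINGLE_MEETING_SHARE): number[] {
  const n = rawWeights.length;
  if (n * cap < 1 - 1e-9 || rawWeights.some((w) => !(w > 0))) {
    throw new Error("need at least 1/cap meetings, each with a positive weight");
  }
  const total = rawWeights.reduce((sum, w) => sum + w, 0);
  let shares = rawWeights.map((w) => w / total);
  for (let pass = 0; pass < n; pass++) {
    const excess = shares.reduce((sum, v) => sum + Math.max(0, v - cap), 0);
    if (excess <= 1e-12) break;
    const freeTotal = shares.reduce((sum, v) => sum + (v < cap ? v : 0), 0);
    shares = shares.map((v) => (v >= cap ? cap : v + (excess * v) / freeTotal));
  }
  return shares;
}

// True when the capped, weighted mean score stays at or above `threshold`
// no matter which single meeting is removed.
function survivesLeaveOneOut(items: MeetingEvidence[], threshold: number): boolean {
  const shares = capShares(items.map((m) => m.weight));
  return items.every((_, skipped) => {
    let num = 0;
    let den = 0;
    items.forEach((m, j) => {
      if (j === skipped) return;
      num += shares[j] * m.score;
      den += shares[j];
    });
    return den > 0 && num / den >= threshold;
  });
}
```

With a 20% cap at least five meetings are needed before every share can fit under it, which lines up with the five-meeting cold-start minimum used elsewhere in the pipeline.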
Research wants every meeting to pass 7 sequential quality gates BEFORE any detector runs: substrate presence · parser integrity · speaker attribution · transcript text · language regime · meeting type · sample sufficiency. Gates produce a detector_eligibility_map telling each detector if it's allowed to run.
Fix: new InputQualityGate layer before Layer 2. Meetings failing Gate N get scoringMode="observation_only". ~3-4 days.
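The sequential-gate idea can be sketched as a short-circuiting loop. Everything beyond the seven gate names is an assumption: the `Gate` and `GateOutcome` shapes, the "full" scoring mode, and the two example predicates are illustrative, and the per-detector detector_eligibility_map is not modeled here.

```typescript
type Turn = { speakerId: string; startMs: number; endMs: number; text: string };
type MeetingInput = { turns: Turn[] };

type GateName =
  | "substrate_presence"
  | "parser_integrity"
  | "speaker_attribution"
  | "transcript_text"
  | "language_regime"
  | "meeting_type"
  | "sample_sufficiency";

// Hypothetical shapes for a gate and its outcome.
type Gate = { name: GateName; passes: (m: MeetingInput) => boolean };
type GateOutcome = { scoringMode: "full" | "observation_only"; failedGate: GateName | null };

// Runs gates in order and stops at the first failure.
function runInputQualityGates(meeting: MeetingInput, gates: Gate[]): GateOutcome {
  for (const gate of gates) {
    if (!gate.passes(meeting)) {
      return { scoringMode: "observation_only", failedGate: gate.name };
    }
  }
  return { scoringMode: "full", failedGate: null };
}

// Two illustrative predicates; the remaining five gates would follow the same shape.
const exampleGates: Gate[] = [
  { name: "substrate_presence", passes: (m) => m.turns.length > 0 },
  {
    name: "parser_integrity",
    passes: (m) => m.turns.every((t) => t.startMs >= 0 && t.endMs >= t.startMs),
  },
];
```

Recording which gate failed gives reviewers a reason for the downgrade instead of a silent drop to observation-only mode.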
E2E evidence vault is planned (ciphertext employer cannot decrypt). But the existence of a vault, its snippet count, last-activity timestamp, and draft state still leak via database metadata. Research: metadata leakage is as dangerous as content leakage.
Fix: vault tables in a separate Postgres schema with no cross-schema joins from tenant BI · admin access requires break-glass · metadata counts suppressed on exports. ~1 week.
Once managers know the detector surface, some will game it (channel-shift dominance into async/1:1). Research: flag suspiciously clean metric improvement as a signal, not success. Multi-metric corroboration required before declaring an "improvement."
No single-metric victory claims. An "improvement" requires ≥2 detector families moving in the right direction + no adjacent metric worsening. Enforce in aggregation layer before event emission.
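The corroboration rule is simple enough to state as a predicate. In this sketch the `FamilyTrend` shape and trend labels are hypothetical, and "adjacent metric" is read as any other tracked detector family, which is broader than the source may intend.

```typescript
type Trend = "improving" | "flat" | "worsening";

// Hypothetical per-family trend summary over the comparison window.
type FamilyTrend = { family: string; trend: Trend };

const MIN_CORROBORATING_FAMILIES = 2;

// An improvement is declared only when at least two detector families improve
// and no tracked family gets worse.
function isCorroboratedImprovement(trends: FamilyTrend[]): boolean {
  const improving = trends.filter((t) => t.trend === "improving").length;
  const anyWorsening = trends.some((t) => t.trend === "worsening");
  return improving >= MIN_CORROBORATING_FAMILIES && !anyWorsening;
}
```

A single sharply improving metric with everything else flat returns false here, which is the "suspiciously clean" case the adaptation-watch item is meant to catch.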
Business plan §11 declares the 6-layer success model. Layers 5 (remediation quality) and 6 (human recovery) aren't measured in code yet. Requires post-Lane-B fairness-rating capture from the affected employee + speaking-share-recovery tracking at 30/60/90 days.
The contestability state machine (§P0) needs paired API: POST /disputes · POST /disputes/:id/submit|triage|resolve · GET /objects/:id/explanation · GET /access-history/:object_id. Build once the state machine lands.
The Lane-A (private self-correction) → Lane-B (governed remediation) → Lane-C (formal review) model is declared on the governance page. The automatic transition trigger (pattern persists after notice → escalates to Lane B) is not implemented. Today it's a policy, not a product feature.
Partnership with a Japanese workplace-harassment research lab. 24-month validation: do Kashi's structural signals predict NAQ-R self-report outcomes at 6-month follow-up in treatment vs matched controls? Publishable result. Cohort of 5 JP companies.
Platform × language × feature support matrix. Teams vs Meet vs Zoom, each with different transcription quality. Japanese-specific: the 46.8% single-channel tcpWER baseline is dangerous. Overlap flags, L2 caution surfaces, per-platform eligibility maps.
Universal detector core + per-locale calibration layers. Locale registry: country × language × mixed-language flag × platform × transcript-confidence band × calibration pack × legal pack × UX copy pack. Enables Singapore / NL / UK expansion.
Research (manager-adoption + trust memos) endorses a private explainer page for the affected individual. When a review-worthy event fires, they see what happened, what it probably means, what their options are, who they can contact, and how to enable the evidence vault.
Planned. WebCrypto RSA-OAEP-2048 + AES-256-GCM envelope encryption. Employer stores ciphertext; victim holds private key. Vault metadata suppression (§12) is a prerequisite.
Year-1 SOC 2 target. Year-2 ISO 27001. The architecture is ready; what's missing is control documentation, quarterly tabletops, and a third-party pen-test cycle.
§7 lists 23 improvements ranked by priority. But priority lists ≠ sequencing decisions. This section re-ranks the §7 items by pitch impact × pilot necessity × feasibility with current team size, and picks the three that should be Sprint-1 scope after pre-seed funding.
Smallest implementation cost, biggest procurement-unblock. Configuring Vercel function regions + auditing data flow is ~1 day. Without it, the Japan-data-residency claim is technically false and the first Enterprise-tier JP prospect will catch it in their security review. Must land before any Enterprise conversation starts.
Universal thresholds pool across meeting types — a known validity hole that a hostile technical reviewer will notice in the first pilot post-mortem. Adding meetingType + scoringMode to the Meeting schema + gating detector execution in aggregate.ts is ~2h. It closes the biggest red-team attack surface and enables honest observation-only mode for unsupported formats (incident bridges, 1:1s, exec reviews).
One engineer-day. Writes a test harness that logs in as each role, attempts forbidden queries, asserts they fail. Gated in GitHub Actions on every PR. Without this, the tenant-isolation claim is "verified by faith." With it, every PR is automatically checked. The cheapest credibility upgrade in the entire roadmap.
Principle: the first three builds post-funding should be ones where shipping pays for itself in unblocked sales conversations, not code hygiene. Data residency, meeting-type gating, and RLS tests all unblock specific investor/customer objections in the first technical-review meeting.
Tied to the financing plan (business.html §6).
| Sprint | Weeks | Funded by | Deliverables |
|---|---|---|---|
| Sprint 1 | 1–4 | Pre-seed | §7 items 1 (ConfidenceBundle threading) · 2 (meeting-type gating) · 3 (role enum expansion) · 5 (Japan data residency fix) · 6 (RLS isolation tests in CI) |
| Sprint 2 | 5–8 | Pre-seed | §7 items 4 (contestability state machine) · 7 (semantic-detector quarantine enforced) · 9 (telemetry partitioning) · start 16 (dispute API) |
| Sprint 3 | 9–12 | Seed | §7 items 8 (speaker provenance) · 10 (influence cap) · 11 (input-quality gates) · 12 (vault metadata suppression) · start 22 (evidence vault MVP) |
| Sprint 4 | 13–16 | Seed | §7 items 13 (adaptation-watch) · 14 (multi-metric corroboration) · 15 (remediation-quality measurement) · 17 (3-lane in code) · 21 (victim-explainer page) |
| Sprint 5-6 | 17–24 | Seed | §7 items 18 (NAQ-R study kickoff) · 19 (ASR matrix) · SOC 2 control documentation · third-party pen-test #1 |
| Sprint 7+ | 25–52 | Series A | §7 items 20 (multilingual locale packs) · 23 (SOC 2 Type II achieved · ISO 27001 in progress) · Singapore regional rollout |
Sprints 1-2 are deliverable by the 2-person founding team. Sprint 3+ assumes 1 additional engineer (the security/DevSecOps hire from business.html §5). Sprint 5+ assumes 2 additional engineers + a CS lead from the seed round.
The companion to governance's "Kashi will not do" list, at the technology level. These are architectural refusals — not "we haven't gotten around to it yet," but "we will actively not ship this."