Kashi — Privacy / Retention / Evidence-Boundary Perspective
Technical research memo for dev handoff
Prepared: 2026-04-21

Purpose
Turn the privacy / retention / evidence-boundary issue into a concrete technical architecture and implementation guide for Kashi.

Working judgment
This is not a “store or delete” question. It is a system-boundary question.
Kashi becomes much more defensible when privacy is defined by:
1) data class,
2) who can see that class,
3) what promotes one class into another,
4) how deletion actually behaves across active storage, caches, logs, and backups,
5) whether some material is employer-readable at all.

The current materials already contain the right instincts: four-tier retention, role-based access, an audit trail, k-anonymity, differential privacy, a user-held evidence vault, and a no-HR-decision posture. But the technical boundary is still under-specified. The biggest missing pieces are deletion semantics, backup caveats, key lifecycle, metadata leakage control, and an unambiguous answer to what is employer-readable versus ciphertext-only.

==================================================================
1. Executive technical decision
==================================================================

Kashi should implement privacy and retention around six distinct classes, not one generic “meeting data” bucket:

A. Raw intake layer
B. Derived analytics layer
C. Review-worthy event layer
D. Formal case / legal-hold layer
E. Private evidence-vault ciphertext layer
F. Security / audit / reliability telemetry layer

Those classes must differ on all of the following:
- purpose
- visibility
- retention timer
- deletion path
- backup handling
- exportability
- legal-hold behavior
- cryptographic boundary

If those are not separated, the product drifts into exactly the surveillance archive Kashi says it is refusing.
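
One way to make that separation enforceable is a single policy registry that every service must consult. The following is a minimal sketch, not a schema decision: the names DataClass, ClassPolicy, POLICIES, unclassified, and policy_for are hypothetical, the boolean values are illustrative readings of Section 3, and 730 / 365 days are approximations of the 24-month and 12-month defaults. Deletion path, backup handling, and exportability are left out to keep the example short.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class DataClass(Enum):
    RAW = "A"
    ANALYTICS = "B"
    REVIEW_EVENT = "C"
    CASE = "D"
    VAULT = "E"
    TELEMETRY = "F"


@dataclass(frozen=True)
class ClassPolicy:
    purpose: str
    employer_readable_by_default: bool
    retention_days: Optional[int]  # None = no fixed timer (hold- or user-driven)
    server_can_decrypt: bool       # cryptographic boundary
    hold_eligible: bool            # may a legal hold ever cover this class?


# Illustrative values only; the real numbers belong in reviewed configuration.
POLICIES = {
    DataClass.RAW: ClassPolicy("parsing, QA, contestability", False, 14, True, True),
    DataClass.ANALYTICS: ClassPolicy("longitudinal comparison", True, 730, True, True),
    DataClass.REVIEW_EVENT: ClassPolicy("triage, escalation support", True, 365, True, True),
    DataClass.CASE: ClassPolicy("formal investigation", True, None, True, True),
    DataClass.VAULT: ClassPolicy("user-controlled preservation", False, None, False, False),
    DataClass.TELEMETRY: ClassPolicy("security, audit", False, 730, True, True),
}


def unclassified() -> List[DataClass]:
    """Return classes that have no policy entry; should always be empty."""
    return [c for c in DataClass if c not in POLICIES]


def policy_for(data_class: DataClass) -> ClassPolicy:
    """Look up the policy; a missing class raises KeyError instead of defaulting."""
    return POLICIES[data_class]
```

The design choice worth keeping even if the fields change: there is no fallback policy, so a new store cannot be added without someone deciding which class it belongs to.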

==================================================================
2. Critical correction: the privacy boundary cannot be slogan-only
==================================================================

Kashi currently says variants of:
- “patterns, not content, not affect”
- “never transcribe for analysis”
- “only structural interaction metadata”

But the shipped / named detector set already includes things like unanswered-question rate, topic-credit ignored-turns, and embedding-distance similarity. That means the real boundary is not “no text ever touches the system.” The real boundary is narrower:

Recommended truth:
- Kashi ingests transcript-linked meeting records.
- Employer-facing risk inference should be structural-first and tightly bounded.
- Any transcript-semantic use must be explicitly classified, documented, and governed separately.
- “Message body text to external services” and broad employer content browsing remain hard red lines.

For privacy architecture, this matters because deletion, retention, and access rules must be written for the actual substrate, not the branding line.

Practical implication for devs:
Do not architect the system as though transcript text “doesn’t count” because it is only used transiently. If text ever enters pipelines, queues, caches, embeddings, or support tooling, it is in scope for retention and deletion design.

==================================================================
3. Data-class architecture
==================================================================

3.1 Class A — Raw intake layer
What it is
- original transcript payloads from Zoom / Teams / Meet
- diarization labels
- timestamps
- raw turn text
- meeting metadata needed for parsing and linkage
- optionally audio pointers or storage references if retained at all

What it is for
- parsing
- quality control
- challenge / correction workflow
- limited formal review when justified

Default visibility
- no normal user access
- not visible to managers
- not visible to HR / leadership by default
- only system services + explicitly authorized restricted investigators under documented procedure

Retention recommendation
- default 7–14 days active retention
- shortest value that still supports contestability and parser QA
- current 14-day Kashi direction is reasonable for MVP

Deletion behavior
- delete from active database/storage at expiry
- purge derivative processing queues within 24 hours
- remove search index entries / vector entries / caches within 24 hours
- if no legal hold is active, raw data must not survive as a quasi-archive in another product table

Critical note
Raw should be isolated physically and logically. Do not let the main application query path treat raw transcript tables as a convenient read source.

3.2 Class B — Derived analytics layer
What it is
- speaking-share metrics
- interruption counts
- directionality scores
- reciprocity / concentration stats
- baseline drift stats
- meeting-type tags and confidence
- confidence / abstention objects

What it is for
- ongoing comparison over 30 / 90 / 180 days
- self-view dashboards
- aggregate organizational views
- manager self-mirror

Default visibility
- employee: own data only
- manager: self-behavior + team aggregate only
- HR / compliance: aggregate / thresholded only
- exec: aggregate / privacy-filtered only

Retention recommendation
- 24 months default is defensible
- make tenant-configurable within guardrails (for example 6 / 12 / 24 months)
- do not let customers set “forever” without exceptional contracting and governance review

Deletion behavior
- user-facing deletion request should remove subject-linked analytics from active query surfaces immediately unless legal hold blocks it
- background compaction can rebuild aggregates excluding deleted source records
- keep a deletion ledger entry, not the deleted analytics themselves

Critical note
Analytics are the long-lived class, so they must be materially less sensitive than raw: query-bounded, role-bounded, and not sufficient to reconstruct content.

3.3 Class C — Review-worthy event layer
What it is
- flagged event object
- event type / rule trigger
- timestamp
- speaker IDs / pseudonymous references
- confidence and reason codes
- bounded context window reference or minimized excerpt pointer
- explanation object for contestability

What it is for
- threshold review
- employee self-understanding
- structured escalation support
- triage

Default visibility
- employee: own affected events
- manager: own behavior summaries, never named subordinate event browsing by default
- HR / compliance: thresholded / approved review objects only
- investigator: case-scoped access only

Retention recommendation
- 12 months default is reasonable
- long enough to support repeated-pattern review and formal intake timing
- shorter than analytics, because event objects are more sensitive

Deletion behavior
- event deletion should destroy the event object and any linked minimized excerpt unless:
  - user has explicitly shared it into a case, or
  - a legal hold or formal institutional procedure preserves it
- event deletion should not silently leave underlying event snapshots in logs, analytics mirrors, debug tables, or notification histories

Critical note
A review-worthy event is not a free pass to keep more context forever. It is a higher-sensitivity object, not an excuse to retain raw transcript indefinitely.

3.4 Class D — Case / legal-hold layer
What it is
- formally approved case package
- preserved evidence references
- reviewer notes
- access history
- override / correction log
- decision / disposition metadata

What it is for
- formal investigation
- legal defense / regulatory response
- documented institutional workflow

Default visibility
- authorized investigators only
- no manager access unless separately justified in process rules
- no casual HR browsing

Retention recommendation
- no generic fixed period
- retained only while case is active, appeals window remains open, or legally required retention applies
- must carry hold_reason, hold_owner, hold_start_at, review_due_at

Deletion behavior
- legal hold suspends ordinary destructive deletion only for linked records, and only to the minimum necessary scope
- hold release should either restart the deletion clock or trigger an immediate purge workflow, depending on hold reason and policy

Critical note
“Legal hold” must be an object with explicit state and auditability, not an informal excuse to keep everything.
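
A hold object along these lines could carry the four required fields plus explicit scope and an audit trail. This is a sketch under assumptions: the field names hold_reason, hold_owner, hold_start_at, and review_due_at come from this memo, while LegalHold, open_hold, scoped_record_ids, and the 90-day review interval are hypothetical (the interval mirrors the example in Section 6).

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import FrozenSet, Iterable, List, Optional, Tuple


@dataclass
class LegalHold:
    hold_reason: str
    hold_owner: str
    hold_start_at: datetime
    review_due_at: datetime
    scoped_record_ids: FrozenSet[str]          # explicit scope, never "everything"
    released_at: Optional[datetime] = None
    audit_log: List[Tuple[datetime, str, str]] = field(default_factory=list)

    def is_active(self) -> bool:
        return self.released_at is None

    def covers(self, record_id: str) -> bool:
        """True only while the hold is active and the record is in scope."""
        return self.is_active() and record_id in self.scoped_record_ids

    def review_overdue(self, now: datetime) -> bool:
        return self.is_active() and now >= self.review_due_at

    def release(self, actor: str, now: datetime) -> None:
        if not self.is_active():
            raise ValueError("hold already released")
        self.released_at = now
        self.audit_log.append((now, actor, "released"))


def open_hold(reason: str, owner: str, record_ids: Iterable[str],
              now: datetime, review_interval_days: int = 90) -> LegalHold:
    """Create a hold; reason and owner are mandatory so no hold is anonymous."""
    if not reason or not owner:
        raise ValueError("hold_reason and hold_owner are required")
    hold = LegalHold(reason, owner, now,
                     now + timedelta(days=review_interval_days),
                     frozenset(record_ids))
    hold.audit_log.append((now, owner, "activated"))
    return hold
```

Because scope is a set of record IDs fixed at creation, widening a hold requires a new, separately logged hold rather than a silent edit.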

3.5 Class E — Private evidence-vault ciphertext layer
What it is
- encrypted context snippets
- optional private notes if supported later
- encrypted package manifest
- key-envelope metadata

What it is for
- user-controlled private preservation
- later selective sharing

Default visibility
- user only
- server stores ciphertext but should not decrypt
- employer-side users must not see plaintext, vault existence, snippet count, activity timestamps, or draft state by default

Retention recommendation
- default retain until user deletes, account is deleted, or policy sunset is reached
- tenant policy may set outer maximum for dormant ciphertext objects, but only if disclosed clearly
- if user shares selected material into a formal case, that shared copy becomes a separate Class D object; the original vault copy remains user-controlled

Deletion behavior
- deleting vault material should delete ciphertext blobs and envelope metadata from active systems immediately
- caches and previews must be purged
- backups follow the normal backup-expiry caveat (Section 7.2) unless cryptographic erasure is achievable by destroying the only decryptable key material

Critical note
This layer is not protected merely because content is encrypted. Metadata leakage can still expose concern formation.

3.6 Class F — Security / audit / reliability telemetry layer
What it is
- auth events
- admin actions
- drill-down access logs
- security alerts
- background job failures
- deletion job status
- hold activation / release

What it is for
- security
- compliance
- incident response
- user-visible access history for sensitive material

Default visibility
- security/admin roles for system telemetry
- affected individuals should see access history relevant to their protected materials
- employer-side business users must not get access to protected-route telemetry like pattern-page opens or vault creation

Retention recommendation
- 12–24 months depending on control requirements
- protected-route telemetry should be minimized and segregated

Deletion behavior
- audit logs generally should not be casually editable or deletable
- but they should be field-minimized so they do not become shadow content stores

Critical note
Telemetry partitioning is not optional. Security logging and employer analytics must live in different permission domains.

==================================================================
4. Employer-readable vs user-held ciphertext-only boundary
==================================================================

This is the most important technical boundary in the whole system.

4.1 Employer-readable by default
Allowed examples
- structural metrics
- aggregate trend views
- privacy-filtered review counts
- bounded event objects after thresholding / approval
- case records after formal opening
- access logs for formal case handling

4.2 Employer-readable only with procedural justification
Allowed only after review activation
- bounded context window around an approved event
- corrected transcript excerpt tied to a specific dispute
- case-package materials preserved under Class D

Must never become default browsing
- full meeting transcript archives
- universal per-employee transcript search
- raw vault material
- draft reports
- concern-formation telemetry

4.3 User-held ciphertext only
Should stay employer-inaccessible by design
- vault plaintext
- vault-derived private notes
- unshared encrypted snippets
- private awareness / concern-formation state

4.4 Not even visible as metadata to employer-side users
This is where teams screw up.
Do not expose:
- that a vault exists
- that the user opened /app/me/pattern
- snippet count
- last vault activity time
- draft creation time
- support-link clicks
- repeated private review frequency

That metadata is retaliation-sensitive even if ciphertext is perfect.

==================================================================
5. Promotion pipeline: what moves data from one class to another
==================================================================

A good architecture makes promotion explicit.

5.1 Raw -> Analytics
Automatic, deterministic, background processing
- parse transcript
- derive structural metrics
- record confidence / quality gates
- raw stays isolated

5.2 Analytics -> Review-worthy event
Thresholded and policy-bounded
Requirements should include at least:
- evidence threshold met
- meeting-type and role context valid enough
- input-quality gate passed
- confidence object present
- not suppressed by confounds / unsupported type / small-team anti-inference rules
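
The gate above can be sketched as a pure function that returns reason codes rather than a bare yes/no, so that a refused promotion is itself explainable. Everything named here is hypothetical: Candidate, promotion_blockers, may_promote, the reason-code strings, and the two thresholds (0.8 evidence score, team size 5) are placeholders, not values from the Kashi detector set.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_EVIDENCE_SCORE = 0.8  # hypothetical threshold
MIN_TEAM_SIZE = 5         # hypothetical small-team anti-inference floor


@dataclass(frozen=True)
class Candidate:
    evidence_score: float
    meeting_type_supported: bool
    input_quality_ok: bool
    confidence: Optional[float]  # None means no confidence object was produced
    confound_marked: bool
    team_size: int


def promotion_blockers(c: Candidate) -> List[str]:
    """Return every reason this candidate may not become a review-worthy event."""
    blockers = []
    if c.evidence_score < MIN_EVIDENCE_SCORE:
        blockers.append("evidence_below_threshold")
    if not c.meeting_type_supported:
        blockers.append("unsupported_meeting_type")
    if not c.input_quality_ok:
        blockers.append("input_quality_gate_failed")
    if c.confidence is None:
        blockers.append("confidence_object_missing")
    if c.confound_marked:
        blockers.append("suppressed_by_confound")
    if c.team_size < MIN_TEAM_SIZE:
        blockers.append("small_team_anti_inference")
    return blockers


def may_promote(c: Candidate) -> bool:
    """Promotion is allowed only when no blocker applies."""
    return not promotion_blockers(c)
```

For example, a candidate with strong evidence but no confidence object on a three-person team would be refused with both "confidence_object_missing" and "small_team_anti_inference".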

5.3 Review-worthy event -> Case / legal hold
Never automatic just because a detector fired.
Only after:
- explicit user sharing, or
- documented institutional threshold workflow with human approval

5.4 Any class -> Deleted / suppressed state
Deletion is not a boolean. The system needs states such as:
- active
- soft-deleted pending purge
- purged from active storage
- retained in backup window only
- hold-protected
- cryptographically inaccessible
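
These states can be modeled as a small state machine so that illegal moves (for example, a backup-only record quietly becoming active again) fail loudly. The six states are the ones listed above; the transition table is an assumption made for illustration and would need review against actual hold and restore policy. DeletionState, ALLOWED_TRANSITIONS, can_transition, and transition are hypothetical names.

```python
from enum import Enum


class DeletionState(Enum):
    ACTIVE = "active"
    SOFT_DELETED = "soft_deleted_pending_purge"
    PURGED_ACTIVE = "purged_from_active_storage"
    BACKUP_ONLY = "retained_in_backup_window_only"
    HOLD_PROTECTED = "hold_protected"
    CRYPTO_INACCESSIBLE = "cryptographically_inaccessible"


S = DeletionState

# Assumed transition table: anything not listed is rejected.
ALLOWED_TRANSITIONS = {
    S.ACTIVE: {S.SOFT_DELETED, S.HOLD_PROTECTED, S.CRYPTO_INACCESSIBLE},
    S.SOFT_DELETED: {S.PURGED_ACTIVE, S.HOLD_PROTECTED},
    S.PURGED_ACTIVE: {S.BACKUP_ONLY},
    S.BACKUP_ONLY: set(),              # ends when the backup window expires
    S.HOLD_PROTECTED: {S.ACTIVE, S.SOFT_DELETED},  # on hold release
    S.CRYPTO_INACCESSIBLE: set(),      # key material destroyed; terminal
}


def can_transition(current: DeletionState, target: DeletionState) -> bool:
    return target in ALLOWED_TRANSITIONS[current]


def transition(current: DeletionState, target: DeletionState) -> DeletionState:
    """Return the new state, or raise if the move is not in the table."""
    if not can_transition(current, target):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```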

==================================================================
6. Retention policy recommendation
==================================================================

Recommended defaults for MVP / pilot

Class A Raw intake
- active retention: 14 days
- optional stricter tenant mode: 7 days
- purpose: QA, contestability, formal review if needed

Class B Analytics
- active retention: 24 months
- tenant options: 6 / 12 / 24 months
- purpose: longitudinal signal, self-view, aggregate governance

Class C Review-worthy events
- active retention: 12 months
- tenant options: 6 / 12 months
- purpose: triage, contestability, escalation support

Class D Case / legal hold
- retained only while justified
- mandatory periodic hold review (for example every 90 days)

Class E Private evidence vault ciphertext
- retained until user deletion, account deletion, or disclosed tenant sunset policy
- shared extracts become separate Class D copies when formally introduced into case handling

Class F Security / audit telemetry
- 12–24 months
- protected-route telemetry minimized and segregated

Do not let the tenant admin reduce retention in a way that destroys contestability while still claiming fairness. Some minimums should be system-enforced.
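
System-enforced guardrails could look roughly like the following, where a tenant may pick only from an allow-list per class. This is a sketch: the day counts approximate the month options above (6 / 12 / 24 months as 180 / 365 / 730 days), and ALLOWED_RETENTION_DAYS, DEFAULT_RETENTION_DAYS, is_allowed, resolve_retention_days, and is_expired are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple

# Tenant-selectable options per class, in days (approximate month conversions).
ALLOWED_RETENTION_DAYS: Dict[str, Tuple[int, ...]] = {
    "raw": (7, 14),
    "analytics": (180, 365, 730),
    "review_event": (180, 365),
}
DEFAULT_RETENTION_DAYS: Dict[str, int] = {
    "raw": 14,
    "analytics": 730,
    "review_event": 365,
}


def is_allowed(data_class: str, days: int) -> bool:
    """True if the tenant may choose this retention for this class."""
    return days in ALLOWED_RETENTION_DAYS.get(data_class, ())


def resolve_retention_days(data_class: str, requested: Optional[int] = None) -> int:
    """Return the effective retention, rejecting values outside the guardrails."""
    if data_class not in ALLOWED_RETENTION_DAYS:
        raise KeyError(f"no tenant-configurable retention for {data_class!r}")
    if requested is None:
        return DEFAULT_RETENTION_DAYS[data_class]
    if not is_allowed(data_class, requested):
        raise ValueError(f"{requested} days is outside guardrails for {data_class!r}")
    return requested


def is_expired(created_at: datetime, retention_days: int, now: datetime) -> bool:
    """A record expires once its full retention period has elapsed."""
    return now >= created_at + timedelta(days=retention_days)
```

An allow-list blocks both failure modes at once: "forever" is not a selectable value, and neither is a retention so short that contestability disappears.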

==================================================================
7. Deletion semantics: the thing the current materials still need badly
==================================================================

The user question “what can be deleted immediately vs after backup window?” must be answered in exact system terms.

7.1 Immediate active-store deletion
Should happen immediately or near-immediately for:
- vault ciphertext the user deletes
- draft reports / draft concern objects
- review-worthy events not on hold
- raw transcript copies past TTL
- cached excerpt material
- search / vector / index entries tied to deleted objects

Operational target
- remove from primary query surfaces immediately
- complete background purge from caches, queues, and search/vector indexes within 24 hours

7.2 Not immediately erasable because of backups
Be honest about this.
Items may remain in backup media until backup expiry when the system uses standard database backups or point-in-time recovery (PITR).
So the system should state:
- deletion from active systems is immediate or near-immediate
- deleted data may persist in encrypted backups until backup rotation expires
- deleted data is not restorable to normal product views except through controlled disaster-recovery operations

7.3 Cryptographic erasure option
For vault material, stronger deletion is possible if the only effective decryption path is destroyed.
That means:
- if Kashi never has the private key,
- and if no plaintext preview copy exists server-side,
- and if wrapped symmetric keys / manifests are deleted,
then vault data may become practically irrecoverable even before backup expiry.

But do not overclaim. If any preview, analytics shadow, export copy, or support dump contains plaintext, the story collapses.

7.4 Legal-hold override
Deletion requests must check hold state.
Rule:
- no destructive purge for held case materials while hold is active
- no blanket hold on unrelated classes
- private awareness and unshared vault existence should not automatically enter hold scope
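
A deletion request handler might apply these rules by splitting the request into records that can be purged now and records blocked by a specific hold. The sketch below assumes active holds are available as a mapping from hold ID to the record IDs in scope; DeletionPlan and plan_deletion are hypothetical names. Note that an unshared vault record is purged normally unless a hold names it explicitly, which matches the rule that vault material does not enter hold scope automatically.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class DeletionPlan:
    purge: List[str] = field(default_factory=list)                # safe to purge now
    blocked: Dict[str, List[str]] = field(default_factory=dict)   # record -> hold IDs


def plan_deletion(requested_ids: Iterable[str],
                  active_holds: Dict[str, Set[str]]) -> DeletionPlan:
    """Split a deletion request by hold state; holds block only records they name."""
    plan = DeletionPlan()
    for record_id in requested_ids:
        holding = sorted(hold_id for hold_id, scope in active_holds.items()
                         if record_id in scope)
        if holding:
            plan.blocked[record_id] = holding
        else:
            plan.purge.append(record_id)
    return plan
```

Returning the blocking hold IDs, not just a refusal, lets the deletion ledger record why a request was deferred.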

7.5 Export / deletion interaction
If the user or customer exports data, export copies become a new control problem.
The system should log:
- who exported
- what class was exported
- when
- under what case or reason code

==================================================================
8. Backup caveats and disaster-recovery reality
==================================================================

This has to be documented in plain language because buyers will ask and users deserve an honest answer.

Recommended wording logic
- Active deletion removes data from live product systems and normal query surfaces.
- Backups may continue to contain deleted data until the relevant backup retention window expires.
- Backup restoration is a controlled disaster-recovery operation, not an ordinary product read path.
- If a restore occurs, deletion tombstones / purge jobs must re-run so previously deleted records do not silently reappear in ordinary views.

Engineering implications
- maintain deletion ledger / tombstone table
- restore procedure must replay deletion state after backup restore
- caches and secondary indexes must rebuild from post-deletion truth, not restored stale state
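
The restore step can be sketched as a reconciliation pass: take what the backup brought back, subtract everything the tombstone ledger says was deleted, and rebuild secondary indexes only from what survives. The row shape (a dict with a "subject_id" key) and the names reconcile_after_restore and rebuild_index are assumptions for illustration; the tombstone ledger itself must live outside the restored database, or be restored from a more recent copy, for this to work.

```python
from typing import Dict, List, Set, Tuple


def reconcile_after_restore(restored: Dict[str, dict],
                            tombstones: Set[str]) -> Tuple[Dict[str, dict], List[str]]:
    """Drop restored rows whose IDs are tombstoned.

    Returns the surviving rows and the sorted IDs that had to be re-purged.
    """
    repurged = sorted(rid for rid in restored if rid in tombstones)
    surviving = {rid: row for rid, row in restored.items() if rid not in tombstones}
    return surviving, repurged


def rebuild_index(surviving: Dict[str, dict]) -> Dict[str, List[str]]:
    """Rebuild a subject -> record IDs index from post-deletion truth only."""
    index: Dict[str, List[str]] = {}
    for rid, row in surviving.items():
        index.setdefault(row["subject_id"], []).append(rid)
    return index
```

The list of re-purged IDs is worth logging in Class F telemetry: it is the evidence that a restore did not resurrect deleted data.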

==================================================================
9. Key management lifecycle for the evidence vault
==================================================================

The current “generate a key pair in browser, store private key locally, recovery phrase backup” direction is good, but it is incomplete until the lifecycle is defined.

Minimum lifecycle decisions required

9.1 Generation
- keys generated client-side only
- use envelope encryption: per-snippet symmetric key + user public key wrap
- current RSA-OAEP-2048 plan is serviceable for MVP
- if refactoring later, modern key-agreement-based envelope design may be cleaner, but not required to ship MVP

9.2 Storage
- private key stays local
- if browser local storage / IndexedDB is used, say that clearly
- never let server-side analytics or support tooling receive raw private key material

9.3 Recovery
- define whether recovery phrase is true escrow, local mnemonic backup, or user-exported encrypted key backup
- if no server escrow exists, say plainly that a lost key may mean unrecoverable vault contents

9.4 Rotation
- define whether rotating a user key re-wraps old snippet keys
- if yes, build rewrap job
- if no, define legacy-key handling and user UX

9.5 Device migration
- specify how a user moves vault access to a new device
- do not improvise this later; otherwise support staff will pressure for a secret backdoor

9.6 Revocation
- define what happens on suspected compromise
- new key issuance
- rewrap future materials
- old materials either rewrapped or explicitly left inaccessible depending on design

9.7 Offboarding / account deletion
- if account is deleted, what happens to local private key, wrapped data, recovery artifact, and already shared case copies?
- user-private vault copy may be destroyed; formal case copies may remain under Class D if justified

9.8 No-escrow vs escrow policy
Kashi must choose and state one of these:
- no escrow: strongest privacy, weaker recoverability
- split escrow / recovery service: easier recovery, weaker trust story

For Kashi’s thesis, no-escrow or extremely narrow opt-in escrow is more aligned.

==================================================================
10. Metadata leakage and anti-retaliation controls
==================================================================

This is not optional polish. It is part of the evidence boundary.

Required controls
- no employer-facing signal when a user opens their own pattern page
- no employer-facing signal when confounds are marked
- no employer-facing signal when the vault is created or used
- no employer-facing draft state
- no employer-facing notification on repeated self-review
- protected-route telemetry segregated from product analytics
- anti-inference suppression for small teams, narrow windows, and obvious role reconstruction
- bounded event-window sharing by default, not full transcript dump

Technical pattern
- protected routes write to security namespace only
- employer analytics warehouse never ingests these events
- BI tools / product dashboards must not even be able to query them
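
In code, the pattern amounts to deciding the namespace at write time and giving the employer warehouse a feed that physically cannot contain protected-route events. A minimal in-memory sketch follows; /app/me/pattern is the route named in this memo, while /app/me/vault, PROTECTED_ROUTE_PREFIXES, namespace_for, and TelemetryRouter are hypothetical. In a real deployment the two sinks would be separate stores with separate credentials, not two lists in one process.

```python
from typing import Dict, List

# Routes whose usage is retaliation-sensitive. The vault prefix is assumed.
PROTECTED_ROUTE_PREFIXES = ("/app/me/pattern", "/app/me/vault")


def namespace_for(route: str) -> str:
    """Protected routes go to the security namespace; everything else to product."""
    return "security" if route.startswith(PROTECTED_ROUTE_PREFIXES) else "product"


class TelemetryRouter:
    def __init__(self) -> None:
        self.sinks: Dict[str, List[dict]] = {"security": [], "product": []}

    def record(self, route: str, user_id: str) -> None:
        """Write the event to exactly one sink, chosen by route."""
        self.sinks[namespace_for(route)].append({"route": route, "user_id": user_id})

    def employer_warehouse_feed(self) -> List[dict]:
        """The only feed BI tooling may read; it never touches the security sink."""
        return list(self.sinks["product"])
```

Prefix matching errs toward over-protection: an unexpected sub-route under a protected prefix lands in the security namespace rather than leaking into product analytics.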

==================================================================
11. Access model and infra boundary
==================================================================

11.1 Role separation
Current Kashi direction is broadly right:
- Individual
- Manager
- HR / Compliance
- Restricted Investigator
- System Admin

But implement with two extra rules:
- System Admin should not get substantive meeting content by default just because they are infra admins.
- Restricted Investigator access must be case-scoped, time-scoped, and reason-coded.
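
The second rule can be expressed as a grant object that is invalid unless it names a case, a time window, and a reason code. The sketch below is illustrative: AccessGrant, is_access_allowed, and the example reason code are hypothetical, and a production check would also write each decision to the audit log.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessGrant:
    investigator_id: str
    case_id: str
    reason_code: str
    valid_from: datetime
    expires_at: datetime

    def __post_init__(self) -> None:
        # A grant without a reason or with an empty window is rejected at creation.
        if not self.reason_code:
            raise ValueError("reason_code is required")
        if self.expires_at <= self.valid_from:
            raise ValueError("grant must have a positive time window")


def is_access_allowed(grant: AccessGrant, investigator_id: str,
                      case_id: str, now: datetime) -> bool:
    """Allow access only for the named investigator, the named case, inside the window."""
    return (grant.investigator_id == investigator_id
            and grant.case_id == case_id
            and grant.valid_from <= now < grant.expires_at)
```

There is deliberately no wildcard case ID: an investigator who needs a second case needs a second grant.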

11.2 Tenant isolation
- enforce org-bound row-level security in database, not only UI
- minimize service-role bypasses
- log privileged access
- negative-test for tenant escape

11.3 Region / processor boundary
For Japan-sensitive pilots:
- keep regulated data plane in Tokyo-region Supabase or equivalent
- do not use Vercel as the system of record for transcripts, vault material, or sensitive case payloads
- ensure logs, analytics, previews, and optional AI paths are documented separately

11.4 Model boundary
- production detection path should remain non-generative and deterministic unless intentionally changed
- any optional model-assisted feature must be separately governed, separately retained, and separately disclosed
- no transcript body should hit external model providers on the live detector path

==================================================================
12. Recommended implementation pattern
==================================================================

Suggested storage / service split

Service 1: Ingestion service
- receives transcript payloads
- stores raw in isolated schema / bucket
- writes minimal meeting manifest
- starts derivation jobs

Service 2: Derivation pipeline
- computes structural analytics
- writes analytics objects
- writes confidence / abstention objects
- writes no employer-readable content excerpts

Service 3: Review-event service
- constructs event objects only when thresholds and policy checks pass
- stores minimal bounded context reference
- applies anti-inference / suppression rules

Service 4: Case service
- opens only on explicit share or approved threshold process
- snapshots only minimum necessary material
- activates hold state and access logging

Service 5: Vault service
- stores ciphertext blobs + envelope metadata only
- never receives decrypt capability
- exposes share-preview flow for the user

Service 6: Audit / deletion service
- manages purge jobs
- deletion tombstones
- hold state
- restore reconciliation after backup restores
- user-visible access history for protected materials

==================================================================
13. Critical red flags still present in the current Kashi materials
==================================================================

1. “Never transcribe for analysis” is too clean for the actual system story.
If transcript-linked records exist and some detectors use bounded semantics or similarity, say so honestly and govern it.

2. “Server cannot decrypt” is not the same as “safe.”
Metadata leakage can still expose concern formation.

3. “Not an employee-monitoring tool” is too absolute.
Technically, Kashi still processes employee-linked interaction data. The defensible claim is narrowed visibility, bounded use, and anti-surveillance architecture.

4. Four-tier retention is a good skeleton, not a full retention policy.
Until deletion SLAs, backup caveats, restore behavior, and hold scope are defined, the policy is incomplete.

5. Admin controls are not nice-to-have enterprise hardening.
They are part of the privacy boundary.
Without them, every customer deployment becomes custom governance improvisation.

==================================================================
14. Dev-ready acceptance criteria
==================================================================

A. Data classes
- System implements separate schemas / stores / object models for raw, analytics, review events, case materials, vault ciphertext, and audit telemetry.
- No generic table or bucket mixes these classes casually.

B. Visibility
- Managers cannot browse named subordinate review events by default.
- Employer-side users cannot see private awareness, concern formation, vault existence, or draft state.
- Restricted investigators can access only case-scoped material.

C. Retention
- Raw TTL automation exists and is tested.
- Analytics / event / case / vault / audit classes each have documented retention config.
- Legal hold suspends deletion only for scoped linked records.

D. Deletion
- User / admin deletion removes data from active query surfaces immediately.
- Background purge clears caches, queues, and secondary indexes within a defined SLA.
- Restore procedure replays deletion tombstones so deleted data does not resurrect.

E. Vault confidentiality
- Server cannot decrypt vault content.
- No plaintext preview copy is stored server-side by default.
- Vault metadata is hidden from employer-side users.

F. Logging
- Protected-route telemetry is segregated from business analytics.
- Drill-down and case access are reason-coded and auditable.
- Affected users can view access history for protected materials.

G. Model / processor boundary
- Production detector path uses no external LLM.
- Optional AI-assisted features are separately flagged, retained, and disclosed.
- Sensitive payloads are not routed through app-delivery logs by accident.

H. Small-team / inference protection
- Aggregate views suppress when team size or role concentration makes inference too easy.
- Event sharing defaults to bounded windows and minimal packages.
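
The first criterion in H can be tested against something as simple as a minimum-group-size rule. The sketch below suppresses a team average when fewer than k people contribute; k = 5 is a placeholder, not a Kashi decision, and MIN_GROUP_SIZE and team_average are hypothetical names. It does not cover role concentration or differencing attacks across overlapping views, which need separate handling.

```python
from typing import Optional, Sequence

MIN_GROUP_SIZE = 5  # hypothetical k; the real floor is a policy decision


def team_average(values: Sequence[float],
                 min_group_size: int = MIN_GROUP_SIZE) -> Optional[float]:
    """Return the team mean, or None when the group is too small to publish."""
    if len(values) < min_group_size:
        return None  # suppressed: callers must render "not enough data"
    return sum(values) / len(values)
```

Returning None rather than a rounded or noised value forces every caller to handle suppression explicitly instead of displaying a number that looks safe.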

==================================================================
15. Build order recommendation
==================================================================

P0
- data-class separation
- raw TTL + purge jobs
- vault metadata suppression
- protected-route telemetry segregation
- case-state / hold-state model
- deletion tombstones + restore reconciliation

P1
- tenant admin retention controls with guardrails
- share-preview workflow
- access-history panel
- export logging
- reason-coded investigator drill-down

P2
- key rotation and device migration flow
- advanced anti-inference batching / delay controls
- tenant-facing retention / deletion assurance page
- formal incident-response hooks and audit export pack

==================================================================
16. Bottom line
==================================================================

The clean technical answer is this:
Kashi should not promise privacy through vibes like “metadata only” or “encrypted.”
It should enforce privacy through class separation, promotion rules, minimized employer readability, explicit deletion semantics, backup honesty, and a vault boundary where the institution literally cannot read what the user has not chosen to share.

That is the difference between a governance product and a surveillance archive with nicer copy.

==================================================================
17. Source notes used for this memo
==================================================================

Internal Kashi materials consulted
- Kashi — Progress & Project Overview (2026-04-21)
- Transparency That Drives Institutional Accountability (concept note)
- Kashi - Procurement / Security-Buyer Readiness Memo
- Kashi - Retaliation-Risk Research Memo
- Kashi - Measurement-Science Research Memo
- Kashi - Legal / Procedural Fairness research synthesis

Official sources re-checked on 2026-04-21
- European Commission: Navigating the AI Act FAQ
- Supabase Docs: Database Backups
- Supabase Docs: Restore to a new project / backup restore behavior
- Vercel: Data Processing Addendum
- Vercel: Shared Responsibility Model
- Anthropic Privacy Center: Data processor / controller and training-use posture for commercial products