KASHI: Detector-Boundary Perspective
Comprehensive technical research memo for developers
Date: 2026-04-21
Purpose: resolve the detector-boundary contradiction and define what is truly structural-only, what is hybrid structural + semantic, and what should be off-limits.

==================================================
0. Executive decision
==================================================

Bottom line:
Kashi cannot honestly keep all three of these claims at once:
1) "metadata only / no content"
2) "all six detectors are structural"
3) keep detectors such as unanswered-question rate, topic-credit ignored-turns, and agreement-asymmetry in the same employer-facing MVP

Those three claims do not coexist. The current project materials say both:
- employer-facing detection is limited to structural interaction metadata rather than semantic content classification; and
- the shipped detector list includes detectors that require transcript interpretation, including "unanswered-question rate" (needs question + substantive-response detection), "topic-credit ignored-turns" (already described as using embedding distance / turn similarity), and "agreement-asymmetry" (needs stance / position-shift interpretation).

Therefore Kashi needs a hard architectural doctrine, not softer wording.

Recommended product doctrine:
- Employer-facing MVP should be STRICT STRUCTURAL-ONLY.
- Semantic or transcript-meaning analysis, if kept at all, should be moved into a separate HYBRID lane with different labels, different confidence rules, different governance, and no "metadata-only" marketing claim.
- User-private, encrypted content support can exist as a separate PRIVATE EVIDENCE lane, but it must not be mixed into employer-facing scoring.
If the team refuses to cut semantic detectors from the MVP, the honest alternative is:
- stop saying "metadata only"
- stop saying "none read meeting content"
- rename the product as a constrained hybrid system
- explicitly separate structural signals from semantic hypotheses everywhere in product, code, and pitch

My recommendation is still the first option: a strict structural-only MVP.

==================================================
1. Why this boundary matters technically
==================================================

This is not just messaging hygiene. It affects:
- construct validity
- legal defensibility
- fairness
- calibration
- dispute handling
- procurement answers
- architecture cleanliness
- what confidence can even mean

A detector boundary is real only if a dev can answer:
- What exact input fields does this detector consume?
- Can it run if transcript text is removed?
- Can it run if token embeddings are removed?
- Can it run if punctuation is removed?
- Can it explain itself from timing / diarization / turn graph alone?
- Can it abstain when upstream quality is weak?
- Can a reviewer challenge it without reading broad transcript context?

If the answer to any of those is "no", the detector is not structural-only.

==================================================
2. Internal contradiction already present in Kashi
==================================================

Current project materials create a direct contradiction:

A. The progress deck says:
- Kashi takes meeting transcripts, speaker attribution, and timestamps.
- Principle 02 says "Store only structural metadata: turn timing, speaking share, overlap, interruption, latency. Never transcribe for analysis, never infer emotion."
- Section 04 says all six detectors are "structural" and "computed from turn timing and speaker attribution alone. None read meeting content."

B. But the same deck also says:
- "Unanswered-question rate" = questions that receive no substantive response within N turns.
- "Topic-credit ignored-turns" = A proposes -> ignored -> B restates similar content -> B is credited.
- The shipped detector list describes topic-credit as "deterministic; similarity via embedding distance."
- "Agreement-asymmetry" = directional rate at which positions shift toward a speaker after they speak.
- Layer 3 also mentions directive concentration, which likewise cannot be derived from timing alone.

That means the current system is already mixing two different epistemic classes:
1) timing / diarization / turn-topology measures
2) transcript-meaning measures

That mix is precisely what needs to be split.

==================================================
3. Recommended boundary model: three lanes
==================================================

LANE A: STRUCTURAL OBSERVATION ENGINE
This is the only lane allowed for employer-facing MVP claims such as:
- "structural-only"
- "no content reading"
- "deterministic interaction metadata"
- "explainable from timestamps / turns / speaker attribution"

Allowed data types:
- speaker ID / pseudonymous participant ID
- turn start / end timestamps
- overlap intervals
- silence / latency intervals
- meeting type / role tags
- diarization confidence
- ASR confidence at segment level, only as a quality gate, never as a meaning input
- turn-graph edges derived from adjacency / interruption / floor transitions
- meeting metadata (meeting class, duration, participant count, organizer role)

Not allowed in this lane:
- raw transcript text as semantic input
- embeddings
- stance classification
- question / intent classification
- directive / sentiment / tone classification
- topic similarity
- content-summary reasoning
- "substantive response" judgment
- proposal / credit attribution judgments

LANE B: HYBRID STRUCTURAL + SEMANTIC ANALYSIS
This lane is honest about using transcript meaning. It may still be deterministic in implementation, but it is NOT structural-only.
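The Lane A / Lane B line can be checked mechanically from each detector's declared inputs rather than from its name or marketing label. The sketch below is illustrative only: the field names loosely mirror the Lane A "allowed" and "not allowed" lists, and `DetectorSpec`, `LANE_A_FIELDS`, `SEMANTIC_FIELDS`, and `assign_lane` are hypothetical, not existing Kashi code. It separates Lane A from Lane B only; Lane C refusals depend on a detector's purpose, not its inputs, and need their own review.

```python
from dataclasses import dataclass, field

# Hypothetical field names mirroring the Lane A "allowed data types" list.
LANE_A_FIELDS = frozenset({
    "speaker_id",
    "turn_start",
    "turn_end",
    "overlap_intervals",
    "silence_intervals",
    "meeting_type",
    "role_tag",
    "diarization_confidence",
    "asr_segment_confidence",  # quality gate only, never a meaning input
    "turn_graph_edges",
    "meeting_metadata",
})

# Hypothetical field names for inputs that carry transcript meaning (Lane B).
SEMANTIC_FIELDS = frozenset({
    "transcript_text",
    "embedding",
    "stance_label",
    "question_label",
    "directive_label",
    "topic_similarity",
})


@dataclass
class DetectorSpec:
    name: str
    inputs: set[str] = field(default_factory=set)


def assign_lane(spec: DetectorSpec) -> str:
    """Return "A" only if every declared input is on the Lane A whitelist."""
    unvetted = spec.inputs - LANE_A_FIELDS - SEMANTIC_FIELDS
    if unvetted:
        # An input nobody has classified is rejected, not silently allowed.
        raise ValueError(f"{spec.name}: unvetted inputs {sorted(unvetted)}")
    if spec.inputs & SEMANTIC_FIELDS:
        return "B"
    return "A"


print(assign_lane(DetectorSpec("floor_time_gini", {"speaker_id", "turn_start", "turn_end"})))  # A
print(assign_lane(DetectorSpec("topic_credit_ignored_turns", {"speaker_id", "transcript_text", "embedding"})))  # B
```

Whitelisting Lane A fields, instead of only blacklisting semantic ones, means a newly added input type is rejected until someone classifies it, so a detector cannot drift into Lane B unnoticed.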
Possible uses:
- user-private analytics
- research / offline validation
- future opt-in modules
- analyst-only experimentation under strong governance

Required labels:
- "hybrid"
- "text-informed"
- "semantic interpretation involved"
- "not available in metadata-only mode"

Extra requirements before production use:
- transcript-quality gating
- diarization-quality gating
- multilingual / code-switch gating
- semantic-model documentation
- subgroup error analysis
- stronger abstention rules
- challenge workflow with bounded context window
- separate retention and access rules

LANE C: OFF-LIMITS / REFUSED
These should not be built for employer-facing scoring, and some should not be built at all.

Refused classes:
- emotion / affect / sentiment inference about workers
- intent detection ("hostile", "bullying", "abusive", "retaliatory")
- harassment / legality classification
- personality or psychological-state inference
- future-behavior prediction
- company-wide relationship health score
- covert cross-channel surveillance fusion (meeting + chat + email + browsing) for employer-side behavior scoring
- any metric that becomes a direct employment decision input

==================================================
4. Detector taxonomy for the current Kashi set
==================================================

4.1 Safe structural-only (keep in employer-facing MVP)

1. Intrusive interruption
Classification: STRUCTURAL-ONLY
Why:
- Can be defined from overlapping speech timing + truncation pattern + diarization
- Does not require transcript semantics if implemented correctly
Required inputs:
- turn start/end
- overlap window
- speaker attribution confidence
- optional ASR word-boundary timing, used only as stronger truncation evidence, never for meaning
Risks:
- fails when diarization is weak
- fails in heavy crosstalk
- meeting-type confound (incident bridge, chair role, standup)
Required gates:
- diarization confidence threshold
- overlap-quality threshold
- meeting-type calibration
- role tags where available

2. Floor-time Gini / speaking-share inequality
Classification: STRUCTURAL-ONLY
Why:
- derived from total speaker durations or turn counts
Required inputs:
- speaker-attributed durations and/or counts
Risks:
- role / facilitator confound
- meeting-type confound
- all-hands / training / demo sessions
Required gates:
- meeting-type eligibility
- role normalization
- minimum participation threshold

3. Dyadic interruption continuity
Classification: STRUCTURAL-ONLY
Why:
- cross-meeting repetition of interruption directionality can be computed from structural interruption events
Required inputs:
- interruption events + dyad history
Risks:
- weak if the underlying interruption detector is weak
Required gates:
- minimum comparable exposure
- meeting-type matching
- diarization confidence threshold

4. Speaker baseline drift
Classification: STRUCTURAL-ONLY
Why:
- changes in speaking share / turn frequency / interruption burden over time can be computed without semantics
Required inputs:
- historical structural metrics
Risks:
- project phase change
- role change
- meeting mix change
Required gates:
- meeting-type normalization
- role normalization
- baseline reset on job/role changes

5. Response-latency asymmetry
Classification: STRUCTURAL-ONLY IF STRICTLY TIME-BASED
Why:
- the time gap after a speaker's turn can be measured structurally
Important caveat:
- once the system claims the latency means disagreement, dismissal, or cold-shouldering, the interpretation becomes theory-laden and context-heavy
Safe use:
- keep as observational latency asymmetry
Unsafe use:
- treating it as direct evidence of negative intent
Required gates:
- cultural / locale pack
- meeting-type normalization
- turn-boundary confidence

6. Chilling-delta
Classification: STRUCTURAL-ONLY ONLY IF THE TRIGGER EVENT IS STRUCTURAL
Why:
- post-event participation drop can be computed from turn lengths / counts before vs after a trigger
Important caveat:
- if the trigger event is semantic (e.g. "harsh response"), the whole detector becomes hybrid
Safe use:
- trigger only from structural events such as intrusive interruption or severe floor takeover
Unsafe use:
- trigger from "dismissive wording" or "directive response"
Required gates:
- comparable exposure
- minimum post-event window
- baseline quality
- meeting-type normalization

7. Turn-taking graph / reciprocity scores
Classification: STRUCTURAL-ONLY IF BASED ON ADJACENCY / FLOOR TRANSITIONS
Why:
- the speaker-transition graph can be built without semantics
Important caveat:
- do not over-interpret graph asymmetry as intent or fairness by itself
Required gates:
- enough turn volume
- meeting-type normalization

4.2 Hybrid structural + semantic (quarantine from metadata-only MVP)

8. Unanswered-question rate
Classification: HYBRID
Why:
- requires at least three semantic steps:
  a) detect that a turn contains a question
  b) determine whether a later turn is actually a response to that question
  c) determine whether the response is substantive
- punctuation alone is not reliable enough in ASR transcripts
Why it cannot stay structural-only:
- "question" and "substantive response" are meaning judgments
Recommendation:
- remove from employer-facing MVP
- if retained, relabel as hybrid and treat as lower-confidence / text-informed
Possible downgraded structural proxy:
- "turn followed by zero adjacent uptake within N turns" or "no return edge after interrogative cue"
But that is NOT the same construct as unanswered-question rate and should be renamed if used. Note also that an interrogative cue taken from punctuation or wording is itself a lexical signal, so the second proxy fails the punctuation-removed test in section 1; only the first is structural-only.

9. Topic-credit ignored-turns
Classification: HYBRID
Why:
- requires proposal detection, semantic similarity, authorship continuity, and credit attribution
- the deck already admits similarity via embedding distance
Why it cannot stay structural-only:
- embeddings are content representations
- "same idea restated" is semantic
- "credited" often needs lexical, discourse, or meeting-outcome interpretation
Recommendation:
- quarantine from MVP
- do not market as metadata-only
- if later used, require a human-reviewable evidence bundle rather than a bare score

10. Agreement-asymmetry / position-shift
Classification: HYBRID, bordering on OFF-LIMITS for MVP
Why:
- requires representing positions before and after a speaker's contribution
- in realistic meetings, an "agreement shift" can mean persuasion, efficient alignment, confusion, hierarchy, or simply the completion of a decision process
- this is not derivable from timing alone
Recommendation:
- remove from MVP
- do not ship in the employer-facing lane until there is strong meeting-type calibration, semantic validation, and a challenge workflow
Current status:
- this is one of the most attackable detectors in the whole concept

11. Directive concentration / directive density
Classification: HYBRID
Why:
- imperative / directive detection is linguistic classification
- also heavily role-dependent (PM, EM, IC, trainer, facilitator)
Recommendation:
- not in metadata-only MVP
- if explored later, only within a meeting-type- and role-tagged hybrid lane

12. Takeover events
Classification: DEPENDS ON DEFINITION
- If defined as repeated interruption + floor capture + prolonged hold: structural-only
- If defined as topic override / proposal replacement / directional control of agenda meaning: hybrid
Recommendation:
- split into two versions:
  a) structural floor-takeover
  b) semantic topic-takeover
- never mix them under one label

4.3 Off-limits / too unstable for employer-facing scoring

13. Harassment classifier
Classification: OFF-LIMITS
Why:
- overclaims construct validity
- legally and politically dangerous
- current evidence does not support it

14. Intent / hostility classifier
Classification: OFF-LIMITS
Why:
- intent cannot be reliably inferred from meeting transcripts alone
- creates pseudo-psychology and dispute hell

15. Emotion / affect / tone inference
Classification: OFF-LIMITS
Why:
- directly conflicts with Kashi's current doctrine
- prohibited in workplace settings under the EU AI Act, except for narrow medical or safety cases

16. Future-risk prediction on specific managers or employees
Classification: OFF-LIMITS
Why:
- turns the product into a prediction / employment-risk system rather than a review-support system

==================================================
5. What the dev team should do with the current shipped detector list
==================================================

Recommended employer-facing MVP detector set:
- Intrusive interruption
- Floor-time / speaking-share inequality
- Response-latency asymmetry (observational only)
- Chilling-delta (structural trigger only)
- Dyadic interruption continuity
- Speaker baseline drift
- Optional: structural reciprocity / turn-topology imbalance

Recommended removals from metadata-only MVP:
- Unanswered-question rate
- Topic-credit ignored-turns
- Agreement-asymmetry
- Directive concentration (if currently planned / implied)
- Any semantic version of takeover events

If the team insists on keeping the current named six:
- rename the product doctrine to constrained hybrid
- separate outputs by class in UI and API
- do not describe all detectors as structural
- do not say "none read meeting content"
- do not say "metadata only"

==================================================
6. Boundary rules at code level
==================================================

Every detector should declare:

DetectorClass:
- structural_only
- hybrid_text_informed
- refused

SemanticDependency:
- none
- transcript_text
- embedding_model
- classifier_model
- human_annotation_required

InputContract:
- required fields
- optional fields
- abstention conditions
- unsupported meeting types
- unsupported language conditions

Example schema:

    {
      "detector_name": "topic_credit_ignored_turns",
      "detector_class": "hybrid_text_informed",
      "semantic_dependency": ["transcript_text", "embedding_model"],
      "requires": [
        "speaker_labeled_transcript",
        "turn_boundaries",
        "embedding_service_or_local_model"
      ],
      "abstain_if": [
        "diarization_confidence < threshold",
        "asr_confidence < threshold",
        "mixed_language_unresolved",
        "meeting_type in unsupported_types"
      ],
      "employer_facing_allowed": false,
      "user_private_allowed": true
    }

Non-negotiable rule:
No detector may be labeled structural_only if it depends on transcript lexical
meaning, embedding similarity, question classification, stance detection, intent classification, or semantic role labeling.

==================================================
7. Acceptance criteria for calling a detector "structural-only"
==================================================

A detector is structural-only only if ALL of these are true:
1. It can execute with transcript text removed.
2. It does not use embeddings.
3. It does not use lexical or semantic classification.
4. Its logic can be expressed over timestamps, turn boundaries, speaker IDs, overlaps, durations, adjacency, and meeting metadata only.
5. A reviewer can understand the score from event topology without reading transcript meaning.
6. The detector has explicit abstention conditions tied to upstream quality.
7. The detector has meeting-type and role normalization, or else is confined to observation-only mode.
8. The detector has a dispute path for diarization / turn-boundary errors.

If any one fails, classify it as hybrid.

==================================================
8. Confidence model: one score is not enough
==================================================

Current Kashi rhetoric risks compressing different uncertainties into one composite confidence. That is too mushy: a single number cannot tell a reviewer whether the input, the detector logic, or the meeting context is the weak link.

Recommended confidence decomposition:

A. Input quality confidence
- diarization confidence
- ASR confidence
- overlap-quality confidence
- speaker-identity stability
- language / code-switch confidence

B. Detector confidence
- how directly the construct follows from the inputs
- sensitivity of the metric to missing / noisy turns
- cold-start status

C. Context confidence
- supported meeting type?
- supported locale pack?
- role tags known?
- enough comparable exposure?

D. Evidence grade
- observational / weak
- repeated / moderate
- repeated + well-calibrated / stronger

No employer-facing detector should fire review-worthy events unless all four layers clear their thresholds.
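The four-layer decomposition can be kept separate in code by returning which layers failed instead of blending them into one number. This is a sketch under stated assumptions: the 0 to 1 layer scores, the `EvidenceGrade` ordering, the threshold values, and every name here are hypothetical placeholders, not calibrated Kashi values. The hybrid row is deliberately stricter, following the rule that hybrid detectors need stricter thresholds than structural ones.

```python
from dataclasses import dataclass
from enum import IntEnum


class EvidenceGrade(IntEnum):
    OBSERVATIONAL = 1  # observational / weak
    REPEATED = 2       # repeated / moderate
    CALIBRATED = 3     # repeated + well-calibrated / stronger


@dataclass(frozen=True)
class Confidence:
    input_quality: float     # A: diarization, ASR, overlap, identity, language
    detector: float          # B: construct directness, noise sensitivity, cold start
    context: float           # C: meeting type, locale pack, role tags, exposure
    evidence: EvidenceGrade  # D: evidence grade


# Placeholder thresholds per detector class; real values need calibration.
# Hybrid thresholds are deliberately stricter than structural ones.
THRESHOLDS = {
    "structural_only": (0.80, 0.70, 0.70, EvidenceGrade.REPEATED),
    "hybrid_text_informed": (0.90, 0.85, 0.85, EvidenceGrade.CALIBRATED),
}


def failed_layers(conf: Confidence, detector_class: str) -> list[str]:
    """Return the layers that miss their threshold; empty means review-worthy is allowed."""
    if detector_class not in THRESHOLDS:
        # "refused" (or any unknown class) can never emit review-worthy events.
        raise ValueError(f"no review-worthy output for class {detector_class!r}")
    min_input, min_detector, min_context, min_grade = THRESHOLDS[detector_class]
    checks = [
        ("input_quality", conf.input_quality >= min_input),
        ("detector", conf.detector >= min_detector),
        ("context", conf.context >= min_context),
        ("evidence", conf.evidence >= min_grade),
    ]
    return [name for name, passed in checks if not passed]


conf = Confidence(input_quality=0.85, detector=0.80, context=0.90, evidence=EvidenceGrade.REPEATED)
print(failed_layers(conf, "structural_only"))       # []
print(failed_layers(conf, "hybrid_text_informed"))  # ['input_quality', 'detector', 'evidence']
```

The same evidence clears the structural bar but not the hybrid one. Returning the failing layer names rather than a boolean also gives a challenge log something concrete to record as gating status.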
Hybrid detectors need stricter thresholds than structural detectors.

==================================================
9. Input-quality gates required before any serious pilot
==================================================

The detector boundary is meaningless without upstream gates.

Required gates:
1. Transcript-confidence gating
2. Speaker-diarization confidence gating
3. Overlap-quality flagging
4. Device-switch / hidden-identity / room-audio detection
5. Multilingual / code-switch detection
6. Meeting-type classification confidence
7. Role-tag completeness check

Why this matters:
- Teams transcripts can include speaker names and timestamps, but users can hide their identity in captions/transcripts, and in some setups room audio may be attributed to the room rather than to the individual. That means speaker-level metrics can become invalid or misleading before detector logic even starts.
- Structural-only does not mean input quality stops mattering. It means the detector logic is simpler; the substrate can still fail.

Minimum rule:
If speaker attribution is weak, any dyadic or person-targeted detector must abstain.

==================================================
10. Meeting-type normalization is part of the detector boundary, not a side note
==================================================

A detector can be structurally pure and still contextually invalid.
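The section 9 gates, the minimum rule on weak speaker attribution, and the point that a structurally pure detector can still be contextually invalid all reduce to an abstention check that runs before any detector logic. A minimal sketch, assuming hypothetical names (`MeetingQuality`, `DetectorGate`, `abstention_reasons`) and placeholder thresholds; it covers only a subset of the listed gates (overlap-quality flagging and device-switch detection are omitted) and returns reasons rather than a boolean so a reviewer can see why a detector stayed silent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeetingQuality:
    asr_confidence: float
    diarization_confidence: float
    room_audio_attributed: bool   # in-room speech credited to the room, not a person
    hidden_identity: bool         # a participant hid their identity in the transcript
    language_resolved: bool       # multilingual / code-switch status resolved
    meeting_type: str
    meeting_type_confidence: float
    roles_complete: bool


@dataclass(frozen=True)
class DetectorGate:
    name: str
    person_targeted: bool         # dyadic or per-speaker output
    needs_roles: bool             # relies on role normalization
    supported_meeting_types: frozenset[str]
    # Placeholder thresholds; real values would come from calibration.
    min_asr: float = 0.7
    min_diarization: float = 0.8
    min_meeting_type_confidence: float = 0.7


def abstention_reasons(gate: DetectorGate, q: MeetingQuality) -> list[str]:
    """Return why the detector must abstain; an empty list means it may run."""
    reasons = []
    if q.asr_confidence < gate.min_asr:
        reasons.append("ASR confidence below threshold")
    if not q.language_resolved:
        reasons.append("multilingual / code-switch unresolved")
    if q.meeting_type_confidence < gate.min_meeting_type_confidence:
        reasons.append("meeting type uncertain")
    elif q.meeting_type not in gate.supported_meeting_types:
        reasons.append(f"unsupported meeting type: {q.meeting_type}")
    # Minimum rule: weak speaker attribution blocks any person-targeted detector.
    weak_attribution = (
        q.diarization_confidence < gate.min_diarization
        or q.room_audio_attributed
        or q.hidden_identity
    )
    if gate.person_targeted and weak_attribution:
        reasons.append("weak speaker attribution")
    if gate.needs_roles and not q.roles_complete:
        reasons.append("role tags incomplete")
    return reasons


training = MeetingQuality(0.9, 0.6, False, False, True, "training", 0.95, True)
dyadic = DetectorGate("dyadic_interruption_continuity", True, False, frozenset({"planning", "training"}))
gini = DetectorGate("floor_time_gini", False, True, frozenset({"planning", "brainstorm"}))
print(abstention_reasons(dyadic, training))  # ['weak speaker attribution']
print(abstention_reasons(gini, training))    # ['unsupported meeting type: training']
```

In the usage lines, the dyadic detector abstains because diarization is weak, while the floor-time detector abstains for a contextual reason even though its inputs are fine: a training session, where one person dominating floor time is expected.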
Examples:
- Incident bridge: high directive concentration and fast interruption may be normal
- Standup: short turns are normal
- Training: one person dominating floor time is normal
- Executive review: challenge patterns differ
- 1:1: asymmetry is built into the format
- Brainstorm: unfinished turns and overlap can reflect production blocking rather than suppression

Therefore:
- no universal meeting norm
- no cross-type pooling for risk interpretation
- unsupported meeting types fall back to observational dashboards only
- review-worthy events should only be allowed for supported meeting classes

This matters especially for hybrid detectors, because semantic meaning also changes by meeting type.

==================================================
11. Legal / governance implications of each lane
==================================================

Structural-only lane:
- strongest defensibility
- easiest procurement story
- cleanest "contestable interaction signals" story
- still requires human review, audit logs, access control, and challenge rights
- should be the only lane visible to employers in MVP

Hybrid lane:
- weaker defensibility
- must be explicitly labeled
- should not be smuggled inside "metadata-only" claims
- requires a stronger review protocol and narrower access
- should be pilot-limited or user-private first

Off-limits lane:
- keep refused
- do not create "just for internal use" versions that later drift into customer expectations

Important governance point:
Once Kashi drifts from observational structural support into monitoring or evaluating worker behaviour for decisions affecting work relationships, AI Act and worker-management risk arguments become much sharper. That is another reason to keep the boundary hard.

==================================================
12. Recommended wording changes
==================================================

Current weak wording:
- "metadata only"
- "none read meeting content"
- "all six detectors are structural"
- "never transcribe for analysis"

Recommended truthful wording if adopting a strict structural-only MVP:
"Kashi ingests transcript-linked meeting records for speaker attribution, turn boundaries, and timestamps. Employer-facing detection is limited to structural interaction metadata rather than semantic content classification. Structural review signals are computed from timing, overlap, turn topology, latency, and comparable-baseline patterns only."

Recommended truthful wording if hybrid detectors remain:
"Kashi has two detector classes: structural interaction signals and text-informed hybrid signals. Employer-facing MVP uses structural signals only. Hybrid text-informed analyses are separately governed, explicitly labeled, and not included in metadata-only claims."

==================================================
13. Immediate action list for the team
==================================================

P0: Decide doctrine now
Choose one:
A. strict structural-only MVP
B. honest hybrid MVP
Do not keep the current contradiction alive.

P0: Reclassify every detector
Tag each detector as:
- structural_only
- hybrid_text_informed
- refused

P0: Cut or quarantine the current semantic detectors from employer-facing MVP
At minimum:
- unanswered-question rate
- topic-credit ignored-turns
- agreement-asymmetry

P0: Split API / UI by detector class
Structural and hybrid outputs must not be visually mixed as if they had the same epistemic status.
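One way to make the API / UI split enforceable rather than cosmetic is to have the API return outputs grouped by detector class, with hybrid groups absent from employer responses instead of merely hidden by the frontend. A sketch only: `DetectorOutput`, `build_response`, and the audience names `employer` and `user_private` are hypothetical; the class strings reuse the `detector_class` values from the section 6 schema.

```python
from dataclasses import dataclass

KNOWN_CLASSES = ("structural_only", "hybrid_text_informed", "refused")

# Which detector classes each audience may see (hypothetical audience names).
VISIBLE_CLASSES = {
    "employer": ("structural_only",),
    "user_private": ("structural_only", "hybrid_text_informed"),
}


@dataclass(frozen=True)
class DetectorOutput:
    detector_name: str
    detector_class: str
    value: float


def build_response(outputs: list[DetectorOutput], audience: str) -> dict[str, list[DetectorOutput]]:
    """Group outputs by detector class so no client ever receives one mixed list."""
    grouped: dict[str, list[DetectorOutput]] = {cls: [] for cls in VISIBLE_CLASSES[audience]}
    for out in outputs:
        if out.detector_class not in KNOWN_CLASSES:
            raise ValueError(f"{out.detector_name}: undeclared class {out.detector_class!r}")
        if out.detector_class == "refused":
            # A refused detector producing output is a pipeline bug, not something to filter quietly.
            raise ValueError(f"{out.detector_name}: refused detectors must not run")
        if out.detector_class in grouped:
            grouped[out.detector_class].append(out)
    return grouped


outputs = [
    DetectorOutput("intrusive_interruption", "structural_only", 0.3),
    DetectorOutput("unanswered_question_rate", "hybrid_text_informed", 0.2),
]
print(sorted(build_response(outputs, "employer")))      # ['structural_only']
print(sorted(build_response(outputs, "user_private")))  # ['hybrid_text_informed', 'structural_only']
```

Because the employer payload has no hybrid key at all, a frontend cannot accidentally render hybrid and structural scores side by side.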
P0: Add abstention logic
No output when:
- diarization is weak
- ASR is weak
- multilingual content is unresolved
- meeting type is unsupported
- comparable exposure is insufficient

P1: Add confidence decomposition
Replace monolithic confidence with:
- input quality
- detector confidence
- context confidence
- evidence grade

P1: Add meeting-type support table
Per detector:
- supported meeting types
- disallowed meeting types
- observational-only meeting types

P1: Add challenge log
Every review-worthy event needs:
- raw turn IDs
- timestamps
- detector class
- abstention/gating status
- known confounds
- correction path

P1: Clean the pitch and governance page
Remove lines that say all detectors are structural if they are not.

==================================================
14. Sharp final judgment
==================================================

The cleanest version of Kashi is not "we can infer more."
The cleanest version is "we infer less, but we can defend it."

That is the actual moat.

A strict structural-only employer-facing MVP is narrower, but it is:
- more coherent
- more defensible
- easier to audit
- easier to explain
- harder to attack politically
- easier to validate
- less fragile under transcript error
- less likely to become fake certainty

The semantic detectors are not worthless. They are just a different product class. Treating them as the same class as interruption and floor-time math is the mistake.

==================================================
15. Source notes
==================================================

Internal Kashi materials used:
- Kashi — Progress & Project Overview (2026-04-21)
- Kashi Measurement-Science Research Memo (2026-04-21)
- Kashi — Research Synthesis: Legal Defensibility, Procedural Fairness, and Governance Design (2026-04-21)
- Kashi Meeting-Type Normalization Research Memo (2026-04-21)

Key external sources used:
- European Commission / AI Act Service Desk FAQ:
  - prohibited emotion recognition in workplace / education
  - phased AI Act timeline
- NIST AI RMF 1.0
- Microsoft Support:
  - Teams live transcription includes speaker names and timestamps
  - speaker identity can be hidden in captions/transcripts
  - in-room speech may be attributed to the room when speaker recognition is absent
- Koenecke et al., PNAS 2020, Racial disparities in automated speech recognition
- Mujtaba et al., NAACL 2024, Lost in Transcription: bias against disfluent speech

External reference URLs:
- https://ai-act-service-desk.ec.europa.eu/en/faq
- https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-ai-rmf-10
- https://support.microsoft.com/en-gb/office/view-live-transcription-in-microsoft-teams-meetings-dc1a8f23-2e20-4684-885e-2152e06a4a8b
- https://support.microsoft.com/en-us/office/hide-your-identity-in-meeting-captions-and-transcripts-in-microsoft-teams-a1095e1a-a2f0-453e-8101-eca76429ff04
- https://support.microsoft.com/en-gb/office/use-microsoft-teams-intelligent-speakers-to-identify-in-room-participants-in-a-meeting-transcription-a075d6c0-30b3-44b9-b218-556a87fadc00
- https://pubmed.ncbi.nlm.nih.gov/32205437/
- https://aclanthology.org/2024.naacl-long.269/