CRB L4 Spec — Module Q: Loss Prevention
Governing note: All L4 activities in this file are derived from the Canary Python prototype. They are implementation-level specifications for the CanaryGO Go build. Go engineer reads this file; the Python code is the reference for behavior; the Go implementation is the deliverable.
Q.1 — Substrate Ingestion (5 L3s)
The bridge from TSP canonical event stream → Chirp detection engine. Sub4 is the consumer; Chirp rule_engine receives the parsed payload.
Source: tsp/consumers/sub4_detect.py, chirp/rule_engine.py
| L4 ID | Activity | Function / Contract |
|---|---|---|
| Q.1.1.1 | XREADGROUP sub4-detect | Consumer group = sub4-detect. BLOCK_MS configurable. BATCH_SIZE=1. |
| Q.1.1.2 | Extract stream message fields | Fields: event_id, merchant_id, source, event_type, event_hash, raw_payload, received_at, parse_failed |
| Q.1.2.1 | Route by event_type | LP-critical event types routed to detection dispatch; log-only types ACK'd without evaluation |
| Q.1.2.2 | Skip on parse_failed=true | If upstream sub2 flagged parse failure, no rule evaluation; XACK and continue |
| Q.1.3.1 | Load active merchant rules | ThresholdManager.get_all_active_rules(merchant_id) — Valkey chirp:active_rules:{merchant_id} (TTL=300s) → DB MerchantRuleConfig → RULE_CATALOG defaults |
| Q.1.4.1 | Load per-rule thresholds | ThresholdManager.get_thresholds(merchant_id, rule_id) — Valkey chirp:thresholds:{merchant_id}:{rule_id} (TTL=300s) → DB → rule.default_thresholds |
| Q.1.5.1 | Build detection payload | JSON-decode raw_payload → merchant-annotated parsed dict (amount_cents, transaction_type, transaction_date, employee_id, location_id, delay_action, approved_amount_cents, etc.) |
Q.2 — Detection Rules (10 L3s → 37 rules, 3 tiers)
Source: chirp/rule_definitions.py, chirp/stateless_engine.py, chirp/rule_engine.py
Rule Catalog — Complete (37 rules, 10 categories)
| Rule ID | Name | Category | Severity | Tier | Threshold Keys |
|---|---|---|---|---|---|
| C-001 | RAPID_REFUND | payment | high | 3 | seconds=900 |
| C-002 | EXCESSIVE_REFUND_RATE | payment | high | 2 | percent=15, min_transactions=5 |
| C-003 | ROUND_AMOUNT_PATTERN | payment | medium | 2 | count=5, window_seconds=3600 |
| C-004 | AFTER_HOURS_TRANSACTION | payment | medium | 1 | open_hour=6, close_hour=22 |
| C-005 | CARD_VELOCITY | payment | high | 2 | count=5, window_seconds=3600 |
| C-006 | SPLIT_TENDER_PATTERN | payment | medium | 2 | count=3, window_seconds=3600 |
| C-007 | HIGH_VALUE_REFUND | payment | high | 1 | amount_cents=10000 |
| C-008 | MANUAL_ENTRY_SPIKE | payment | medium | 2 | count=5, window=shift |
| C-009 | SQUARE_DELAY_HOLD | payment | critical | 1 | (no threshold — fires on delay_action presence) |
| C-010 | PARTIAL_AUTHORIZATION | payment | high | 1 | variance_cents=0 |
| C-011 | NO_SALE_DETECTED | payment | high | 1 | (no threshold — fires on transaction_type=NO_SALE) |
| C-101 | NO_SALE_ABUSE | cash_drawer | high | 2 | count=5, window=shift |
| C-102 | CASH_VARIANCE | cash_drawer | high | 3 | amount_cents=2000 |
| C-103 | PAID_OUT_ANOMALY | cash_drawer | medium | 3 | amount_cents=5000 |
| C-104 | AFTER_HOURS_DRAWER | cash_drawer | critical | 3 | open_hour=6, close_hour=22 |
| C-201 | EXCESSIVE_DISCOUNT_RATE | order | high | 3 | percent=50 |
| C-202 | LINE_ITEM_VOID_RATE | order | high | 3 | percent=10, min_items=10 |
| C-203 | SWEETHEARTING | order | high | 3 | amount_cents=2000 |
| C-204 | UNTENDERED_ORDER | order | critical | 3 | stale_hours=24 |
| C-301 | OFF_CLOCK_TRANSACTION | timecard | critical | 3 | (no threshold) |
| C-302 | BREAK_TRANSACTION | timecard | high | 3 | (no threshold) |
| C-303 | WRONG_LOCATION | timecard | high | 3 | (no threshold) |
| C-501 | HIGH_VOID_RATE | void | high | 2 | count=5, window=shift |
| C-502 | POST_VOID_ALERT | void | critical | 2 | immediate_max_seconds=120 (no alert), watch_max_seconds=900 (high), suspicious_max_seconds=28800 (critical), self_refund_score_boost=10, off_clock_score_boost=15 |
| C-601 | GIFT_CARD_LOAD_VELOCITY | gift_card | high | 2 | count=3, window_seconds=3600 |
| C-602 | GIFT_CARD_DRAIN | gift_card | critical | 3 | seconds_after_load=1800 |
| C-801 | RAPID_POINT_ACCUMULATION | loyalty | medium | 2 | count=5, window_seconds=3600 |
| C-802 | BULK_REDEMPTION | loyalty | high | 3 | points=5000 |
| C-803 | CROSS_LOCATION_VELOCITY | loyalty | high | 2 | location_count=3, window_seconds=7200 |
| C-804 | ENROLLMENT_FRAUD | loyalty | medium | 2 | count=10, window_seconds=86400 |
| C-901 | SRA_THRESHOLD_BREACH | composite | high | 3 | sra_pct_sales_max=3.0 |
| C-D01 | DISPUTE_CREATED | dispute | high | 1 | (no threshold) |
| C-D02 | DISPUTE_LOST | dispute | critical | 1 | (no threshold) |
| C-D03 | DISPUTE_VELOCITY | dispute | high | 2 | count=3, window_days=30 |
| C-I01 | INVOICE_OVERDUE | invoice | medium | 1 | (no threshold) |
| C-I02 | INVOICE_CHARGE_FAILED | invoice | high | 1 | (no threshold) |
| C-I03 | HIGH_VALUE_INVOICE_UNPAID | invoice | high | 1 | amount_cents=50000 |
Tier 1 rules (12 total): C-004, C-007, C-009, C-010, C-011, C-D01, C-D02, C-I01, C-I02, C-I03 — stateless, pure function, no I/O
Tier 2 rules (12 total): C-002, C-003, C-005, C-006, C-008, C-101, C-501, C-502, C-601, C-801, C-803, C-804, C-D03 — Valkey counters
Tier 3 rules (13 total): C-001, C-102, C-103, C-104, C-201, C-202, C-203, C-204, C-301, C-302, C-303, C-602, C-802, C-901 — full DB
Q.2.1 — Tier 1 Stateless Evaluation
Source: chirp/stateless_engine.py → evaluate_stateless()
evaluate_stateless(parsed: Dict, thresholds: Optional[Dict], disabled_rules: Optional[set]) → List[Dict]
| L4 ID | Activity | Implementation Detail |
|---|---|---|
| Q.2.1.1 | Guard: empty payload | Return [] if parsed is falsy |
| Q.2.1.2 | Check C-004 (AFTER_HOURS) | _check_after_hours(): parse transaction_date → extract hour → compare to [open_hour, close_hour) |
| Q.2.1.3 | Check C-007 (HIGH_VALUE_REFUND) | _check_high_value_refund(): guard transaction_type IN (RETURN, REFUND); abs(amount_cents) >= threshold |
| Q.2.1.4 | Check C-009 (DELAY_HOLD) | _check_square_delay_hold(): delay_action present AND txn_type NOT in (SALE, RETURN, VOID, POST_VOID) |
| Q.2.1.5 | Check C-010 (PARTIAL_AUTH) | _check_partial_auth(): approved_amount_cents < amount_cents AND gap > variance_cents |
| Q.2.1.6 | Check C-011 (NO_SALE) | _check_no_sale(): transaction_type == NO_SALE |
| Q.2.1.7 | Build alert dict | _make_alert(rule_id, parsed, details) → |
| Q.2.1.8 | Disabled rule skip | Each checker guarded by if "{rule_id}" not in disabled_rules |
Q.2.2–Q.2.3 — Tier 2 (Valkey) and Tier 3 (DB) Rules
Tier 2 L4 pattern (applies to all Tier 2 rules):
1. Increment Valkey counter key chirp:{rule_id}:{merchant_id}:{employee_id|card_id|location_id}:{window}
2. Set TTL = window_seconds on first write
3. Read counter value → compare to threshold count
4. If count >= threshold: fire alert via _make_alert()
Tier 3 L4 pattern (applies to all Tier 3 rules): 1. Query PostgreSQL for required aggregate (shift totals, order line items, timecard state, etc.) 2. Apply threshold comparison 3. If threshold breached: fire alert
C-502 POST_VOID_ALERT tiering (GRO-289):
- immediate_max_seconds=120: <2 min gap → NO alert (legit correction)
- watch_max_seconds=900: 2-15 min gap → WATCH tier (severity=high)
- suspicious_max_seconds=28800: 15 min - 8 hr gap → SUSPICIOUS tier (severity=critical)
- Same-employee self-refund: +10 score boost
- Off-clock refund employee: +15 score boost
Q.2.4–Q.2.5 — Rule Catalog API and Threshold Lookup
| L4 ID | Function | Behavior |
|---|---|---|
| Q.2.4.1 | get_rule(rule_id) |
RULE_MAP dict lookup → ChirpRuleDefinition or None |
| Q.2.4.2 | get_rules_by_category(category) |
Filter RULE_CATALOG by category |
| Q.2.4.3 | get_rules_by_tier(tier) |
Filter RULE_CATALOG by tier int |
| Q.2.4.4 | get_stateless_rules() |
Alias for get_rules_by_tier(1) |
| Q.2.5.1 | ThresholdManager.get_thresholds(merchant_id, rule_id) |
Cache key chirp:thresholds:{merchant_id}:{rule_id} → DB MerchantRuleConfig.custom_threshold → rule.default_thresholds |
| Q.2.5.2 | ThresholdManager.get_all_active_rules(merchant_id) |
Cache key chirp:active_rules:{merchant_id} (TTL=300s) → DB configs → RULE_CATALOG (rules with is_enabled=False skipped) |
| Q.2.5.3 | ThresholdManager.set_merchant_threshold(merchant_id, rule_id, thresholds) |
Upsert MerchantRuleConfig; db.flush(); call invalidate_cache(merchant_id, rule_id) |
| Q.2.5.4 | ThresholdManager.invalidate_cache(merchant_id, rule_id) |
Delete chirp:thresholds:{m}:{r} AND chirp:active_rules:{m} |
Q.3 — Case Management / Fox (5 L3s)
Source: fox/case_service.py, fox/tools.py
Key invariant: chain_hash computed by PostgreSQL trigger compute_entry_hash() BEFORE INSERT (P0-3, B-001). Application code MUST NOT set chain_hash or previous_chain_hash.
⚠️ GO SCHEMA NOTE — unified model: The Python prototype implemented separate fox_cases, fox_subjects, fox_case_actions, fox_timeline, and fox_evidence tables as a build-speed shortcut. CanaryGO does not use these tables. Fox (cmd/fox) is the LP-specific case creation pathway into the unified Hawk schema. All Fox tools write to hawk_cases, hawk_subjects, hawk_actions, hawk_timeline. The LP-to-Hawk link is the chirp_alert_id FK on hawk_cases. Fox case types are incident_type rows in hawk_incident_types (theft, return_abuse, cash_variance, etc.). The L4 function behaviors documented below are behaviorally accurate; only the table names change. See CRB-L4-W.md §"W Module vs Q Module" for the full Fox/Hawk ownership map.
Data Model (Python prototype VALID_* frozensets → Go: Hawk schema equivalents)
Case statuses: open → investigating → pending_review → escalated → closed
→ referred_to_le
Case types: theft | fraud | policy_violation | cash_variance | return_abuse | transaction_review | other
(CanaryGO: rows in hawk_incident_types with incident_class=internal)
Case priorities: low | medium | high | critical
Subject types: employee | customer | vendor | unknown (hawk_subjects.subject_type)
Action types: investigate | interview | suspend | terminate | refer_to_le | refer_to_hr | coaching | no_action | status_change
(CanaryGO: INTERNAL_ACTION_CODES in HawkCaseService)
Timeline events: created | status_change | note_added | evidence_added | assigned | escalated | closed | action_added
(hawk_timeline.event_type)
Timeline metadata schema:
- status_change: requires {old_status, new_status}
- evidence_added: requires {evidence_id}
- assigned: requires {assigned_to}
- All others: no required keys
FoxCaseService Methods (L4 Activities)
Python prototype uses FoxCase/FoxSubject models. CanaryGO equivalent operates on HawkCase/HawkSubject with chirp_alert_id set on LP-originating cases.
| L4 ID | Method | Inputs | Behavior |
|---|---|---|---|
| Q.3.1.1 | get_cases(merchant_id, status, created_after, page, limit) |
merchant_id required; status/created_after optional filters | Query FoxCase (→ hawk_cases), desc by opened_at, paginated; returns (cases, pagination) |
| Q.3.1.2 | get_case_by_id(case_id, merchant_id) |
case_id + merchant_id | .one() — raises NoResultFound if not found (Hawk equivalent uses .first() + caller handles None — see E.3.1.2) |
| Q.3.2.1 | add_subject(case_id, subject_type, entity_id, name, role) |
subject_type in VALID_SUBJECT_TYPES | Validate subject_type; resolve_entity() for employee type (checks Employee table); INSERT FoxSubject (→ hawk_subjects); flush |
| Q.3.2.2 | add_action(case_id, action_type, description, user_id) |
action_type in VALID_ACTION_TYPES | INSERT FoxCaseAction (→ hawk_actions); INSERT FoxCaseTimeline (→ hawk_timeline) event_type="action_added"; flush |
| Q.3.3.1 | get_case_evidence(case_id) |
case_id | Query FoxEvidence ordered by uploaded_at ASC |
| Q.3.3.2 | get_evidence_by_id(evidence_id, case_id) |
both IDs | .one() |
| Q.3.4.1 | validate_timeline_metadata(event_type, metadata) |
event_type string; metadata dict or JSON string | Validate required keys per _TIMELINE_METADATA_REQUIRED; raises ValueError on violation |
| Q.3.5.1 | resolve_entity(db_session, subject_type, entity_id) |
subject_type="employee" | Query Employee.id; raises ValueError if not found |
Q.3 MCP Tool Surface (from fox/tools.py → CanaryGO cmd/fox)
cmd/fox provides LP-specific case creation tools. All writes land in Hawk tables. cmd/hawk owns the full case management surface (CRUD, workflow, card generation). Fox tools are LP alert→case entry points.
| Tool | Handler | Purpose |
|---|---|---|
| create_case | _handle_create_case |
Open new LP case (sets chirp_alert_id on hawk_cases if alert-linked) |
| get_case | _handle_get_case |
Get case details by ID |
| list_cases | _handle_list_cases |
List cases with status/date filters |
| update_case_status | _handle_update_case_status |
Transition status (validates state machine) |
| add_subject | _handle_add_subject |
Link employee/vendor/party to case |
| get_timeline | _handle_get_timeline |
Get append-only hash-chained audit trail |
| verify_chain | _handle_verify_chain |
Verify evidence hash chain integrity |
| link_alert | _handle_link_alert |
Link alert to existing case (sets chirp_alert_id FK) |
Q.4 — Alert Triggers (4 L3s)
Source: alert_lifecycle.py
Alert model: APPEND-ONLY. Status transitions written to AlertHistory; alert record is immutable.
| L4 ID | Function | Behavior |
|---|---|---|
| Q.4.1.1 | Alert creation on rule fire | Chirp evaluation returns alert dict → persist to alerts table with status="new" |
| Q.4.1.2 | is_stale(alert_created_at, ttl_days=14) |
Returns bool: alert_created_at < (now - ttl_days) |
| Q.4.2.1 | resolve_status(alert_id, history_by_alert) |
Walk AlertHistory entries sorted ASC; return latest status or "new" |
| Q.4.2.2 | is_terminal(status) |
status in TERMINAL_STATUSES |
| Q.4.3.1 | get_active_alerts(alerts, history_by_alert, ttl_days) |
Filter: not terminal AND not stale |
| Q.4.3.2 | get_stale_alerts(alerts, history_by_alert, ttl_days) |
Filter: not terminal AND is_stale |
| Q.4.4.1 | build_archive_entries(stale_alerts, ttl_days) |
Create ArchiveEntry(alert_id, status="archived", changed_by="system:ttl", notes="Auto-archived: unactioned for N+ days") |
| Q.4.4.2 | age_decay_factor(alert_created_at, ttl_days) |
Linear decay: 1.0 at creation → 0.0 at TTL; used by Owl for priority scoring |
| Q.4.4.3 | alert_age_label(alert_created_at) |
Human label: "just now" / "{m}m ago" / "{h}h ago" / "{d}d ago" breakpoints |
| Q.4.4.4 | lifecycle_summary(alerts, history_by_alert) |
Returns {total, active, stale, archived, resolved, dismissed, case_opened} counts |
Alert status model:
Active statuses: new | investigating | escalated
Terminal statuses: resolved | dismissed | case_opened | archived
Q.5 — Alert Delivery (4 L3s)
Source: alerts/tools.py, notification_service.py, notification_dispatcher.py, notification_preferences.py
| L4 ID | Tool / Function | Behavior |
|---|---|---|
| Q.5.1.1 | lifecycle_summary tool |
Pure function — computes active/stale/archived/resolved/dismissed/case_opened counts from alert list + history dict |
| Q.5.1.2 | resolve_alert tool |
Write AlertHistory(status="resolved") for alert_id; merchant_id from context |
| Q.5.1.3 | dismiss_alert tool |
Write AlertHistory(status="dismissed") for alert_id; optional notes |
| Q.5.2.1 | Alert routing config | Route by severity/type/store to correct investigator (notification_preferences model) |
| Q.5.3.1 | Alert acknowledgment | Investigator calls resolve_alert or dismiss_alert via MCP tool |
| Q.5.4.1 | Escalation delivery | Unacknowledged alert beyond TTL threshold → build_archive_entries → escalate |
Q.6 — Allow-List Management (5 L3s)
Source: chirp/threshold_manager.py, alert_config.py
The allow-list IS the threshold model. "Allow-listing" a cashier/discount/void reason = setting merchant-specific thresholds that prevent rule firing for known-good patterns.
| L4 ID | Activity | Implementation |
|---|---|---|
| Q.6.1.1 | Dead-count allow-list | Set MerchantRuleConfig for C-011 (NO_SALE_DETECTED) with allowed employee_ids list |
| Q.6.2.1 | Known-discount allow-list | Set MerchantRuleConfig for C-007/C-201/C-203 with elevated threshold or approved_patterns list |
| Q.6.3.1 | Admin-void allow-list | Set MerchantRuleConfig for C-501/C-502 with allowed void reason codes |
| Q.6.4.1 | Manager-comp allow-list | Set MerchantRuleConfig for C-203 (SWEETHEARTING) with approved comp reason codes |
| Q.6.5.1 | Training-mode flag | Set MerchantRuleConfig.is_enabled=False for ALL rules for duration; invalidate_cache(merchant_id) |
| Q.6.x.1 | set_merchant_threshold(merchant_id, rule_id, thresholds) |
Upsert MerchantRuleConfig; flush; invalidate Valkey cache keys |
| Q.6.x.2 | invalidate_cache(merchant_id, rule_id=None) |
Delete specific rule key AND active_rules list key |
Q.7 — MCP Investigator Tools (6 L3s)
Source: chirp/tools.py, fox/tools.py, alerts/tools.py
Full MCP tool manifest for the Go cmd/chirp + cmd/fox + cmd/alert servers:
| Tool | Server | Handler | Purpose |
|---|---|---|---|
| get_rules | chirp | _handle_get_rules |
Filter RULE_CATALOG by category/tier; returns rule list |
| get_rule | chirp | _handle_get_rule |
Single rule lookup by rule_id |
| evaluate_stateless | chirp | _handle_evaluate_stateless |
Tier 1 eval: parsed payload → alert list |
| apply_sensitivity | chirp | _handle_apply_sensitivity |
Apply sensitivity preset (strict/default/relaxed/minimal) to merchant thresholds |
| get_templates | chirp | _handle_get_templates |
Return available templates (retail_standard/food_service/high_value/high_volume/new_merchant) |
| apply_template | chirp | _handle_apply_template |
Apply a named template to merchant threshold config |
| validate_thresholds | chirp | _handle_validate_thresholds |
Validate threshold dict against rule schema |
| get_config_summary | chirp | _handle_get_config_summary |
Return merchant's current Chirp config summary |
| estimate_sensitivity | chirp | _handle_estimate_sensitivity |
Given threshold set, estimate expected alert volume |
| get_merchant_thresholds | chirp | _handle_get_merchant_thresholds |
ThresholdManager.get_all_active_rules(merchant_id) |
| create_case | fox | _handle_create_case |
Open new LP case |
| get_case | fox | _handle_get_case |
Case detail by ID |
| list_cases | fox | _handle_list_cases |
Case list with filters |
| update_case_status | fox | _handle_update_case_status |
Status transition (state machine validated) |
| add_subject | fox | _handle_add_subject |
Link subject to case |
| get_timeline | fox | _handle_get_timeline |
Append-only timeline |
| verify_chain | fox | _handle_verify_chain |
Chain integrity check |
| link_alert | fox | _handle_link_alert |
Link alert to case |
| lifecycle_summary | alert | _handle_lifecycle_summary |
Alert count breakdown |
| resolve_alert | alert | _handle_resolve_alert |
Write resolved status |
| dismiss_alert | alert | _handle_dismiss_alert |
Write dismissed status |
Notes for Go Implementation
-
Alert model is append-only. No UPDATE on alerts table. All status changes go to AlertHistory (or equivalent). Fox enforces the same: chain_hash computed by PostgreSQL trigger — application NEVER sets it.
-
Threshold cache invalidation is mandatory on write.
set_merchant_threshold()must callinvalidate_cache()before returning. Stale Valkey thresholds cause false negatives in rule evaluation. -
Tier 1 rules are a pure function.
evaluate_stateless()has no DB or Valkey calls. It must remain that way — it's on the hot path of every webhook event. Any rule that requires a lookup is Tier 2 or 3. -
C-502 self-refund boost is a scoring modifier, not a separate rule. The +10 score boost for same-employee sale+refund is applied inside the C-502 handler, not as a separate alert.
-
Rule catalog is a compile-time constant. RULE_CATALOG is a frozen list of frozen dataclasses. It should NOT be DB-backed in Go — it's code, not config. Merchant-specific changes are all in the threshold layer on top.
-
Fox writes to Hawk tables (no fox_cases in Go). The Python prototype's separate fox_* tables are prototype debt. In CanaryGO, cmd/fox tools operate on hawk_cases, hawk_subjects, hawk_actions, hawk_timeline. The LP-specific field is chirp_alert_id FK on hawk_cases. LP incident types are rows in hawk_incident_types. Fox case vocabulary maps directly to HawkIncidentType.incident_class=internal rows.
Pass: GRO-670 / GRO-675 | Q L4 spec complete | 37 rules, all L4s derived from production Python code