What this is

The platform's standard for storing PII as a one-way hash when the field must be queryable for deduplication or cross-reference but the plaintext must never be recoverable. The construction is HMAC-SHA256 with a per-domain server-side secret key. Plain SHA-256 over PII is prohibited: the plaintext domain is small enough to enumerate, so anyone with read access to the hash column can recover the plaintext.

Purpose

Some PII fields should be hashed rather than encrypted because the platform never needs to recover the plaintext, only to test "is this the same person who showed up before?" Loyalty phone-number lookup is the canonical case. The naive answer is plain SHA-256, justified as "one-way." That justification is wrong when the plaintext space is small.

The North American Numbering Plan has roughly 10^10 candidate numbers. The global mobile phone space is under 10^13. An attacker who reads a phone_hash column from a leaked backup, a compromised replica, a SQL-injection vector elsewhere, or through insider DBA access can precompute SHA-256 for the entire NANP space in well under an hour on a single commodity GPU and join the resulting lookup table against every row. Email addresses are similar: the candidate space for any specific customer base is enumerable.

The keyed HMAC blocks the offline brute force. The same input with the same key always produces the same output, so deduplication and cross-reference still work, but an attacker without the key cannot precompute a lookup table. The impact of a leaked hash column drops from "everyone's phone number" to "no recovery without separately compromising the secret."
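The difference can be seen in a few lines. The following is a minimal sketch, not platform code: the function names, the demo key, and the fictional 555 number are all assumptions made for illustration, and the search range is shrunk to 10,000 candidates so it runs instantly.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// plainHash is the prohibited construction: unkeyed SHA-256 over the PII.
func plainHash(s string) [32]byte {
	return sha256.Sum256([]byte(s))
}

// keyedHash is HMAC-SHA256 over the input with a server-side secret.
func keyedHash(key []byte, input string) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(input))
	return mac.Sum(nil)
}

// recoverPlain plays the attacker: it enumerates prefix+0000 .. prefix+(n-1)
// and returns the candidate whose plain SHA-256 equals target. The range is
// tiny here; the real attack runs the same loop over the whole number space.
func recoverPlain(target [32]byte, prefix string, n int) (string, bool) {
	for i := 0; i < n; i++ {
		candidate := fmt.Sprintf("%s%04d", prefix, i)
		if sha256.Sum256([]byte(candidate)) == target {
			return candidate, true
		}
	}
	return "", false
}

func main() {
	phone := "+14155550123"          // fictional number, illustration only
	key := []byte("demo-only-secret") // stand-in for the real secret

	// A leaked plain SHA-256 value is reversed by enumeration.
	got, ok := recoverPlain(plainHash(phone), "+1415555", 10000)
	fmt.Println(got, ok) // +14155550123 true

	// The same enumeration against a keyed hash finds nothing,
	// because the attacker cannot compute HMAC without the key.
	var leaked [32]byte
	copy(leaked[:], keyedHash(key, phone))
	_, ok = recoverPlain(leaked, "+1415555", 10000)
	fmt.Println(ok) // false
}
```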

Construction

phone_hash = HMAC-SHA256(PHONE_HASH_KEY, normalize(phone))
customer_email_hash = HMAC-SHA256(EMAIL_HASH_KEY, normalize(email))

The Go primitive lives in internal/security/pii_hash.go as HashPII(normalized string, key []byte) []byte. The output is 32 bytes (BYTEA in Postgres, never TEXT). Storage as TEXT is a smell — it implies the hash is being treated as opaque text rather than as binary cryptographic output.
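A minimal sketch of what such a primitive might look like, using only the Go standard library. The HashPII signature is the one stated above; the body is an assumption about the contents of internal/security/pii_hash.go, and SamePII is a hypothetical helper added for illustration.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// HashPII returns HMAC-SHA256(key, normalized) as 32 raw bytes.
// The caller is responsible for normalizing the input first.
func HashPII(normalized string, key []byte) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(normalized)) // hash.Hash writes never return an error
	return mac.Sum(nil)
}

// SamePII (hypothetical helper) compares two hashes in constant time.
func SamePII(a, b []byte) bool {
	return hmac.Equal(a, b)
}

func main() {
	key := make([]byte, 32) // all-zero demo key, never use in production

	first := HashPII("+14155550123", key)
	again := HashPII("+14155550123", key)
	other := HashPII("+14155550124", key)

	fmt.Println(len(first))            // 32
	fmt.Println(SamePII(first, again)) // true
	fmt.Println(SamePII(first, other)) // false
}
```

Keeping the return type as []byte rather than a hex string fits the BYTEA rule: Go Postgres drivers generally bind a []byte parameter as binary, so no text encoding step is needed on the way to the column.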

Normalization rules

Normalization must run before HMAC, never after, and must be deterministic so the same person always produces the same hash. Domain-specific:

Per-domain keys

Each PII hash key is its own key class, loaded from Secrets Manager as its own environment variable, with its own rotation schedule. Reusing one key across domains means a single key compromise exposes every domain that shares it.

Key class | Domain | Source SDD
PHONE_HASH_KEY | Phone numbers (loyalty, customer contact) | tsp-parse.md loyalty parser → loyalty_accounts.phone_hash
EMAIL_HASH_KEY | Email addresses written to RaaS chain payloads for cross-channel correlation | ecom-channel.md customer_email_hash

Each key is exactly 32 bytes (256 bits), the same constraint as CANARY_ENCRYPTION_KEY. Keys are generated with openssl rand -hex 32, which emits 64 hex characters encoding 32 random bytes. They are loaded by runtime.InitSecurity() at service startup; a missing key is fatal to the service rather than a silent degradation.
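The fail-fast loading rule could be sketched as follows. This is not the body of runtime.InitSecurity(): decodeHashKey and mustLoadHashKey are hypothetical names, and the sketch assumes the environment variable holds the key in the hex form that openssl rand -hex 32 prints.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"log"
	"os"
)

const hashKeyLen = 32 // bytes, i.e. 256 bits

// decodeHashKey (hypothetical) validates one hex-encoded key value.
// Assumption: the secret is stored as 64 hex characters.
func decodeHashKey(name, hexValue string) ([]byte, error) {
	if hexValue == "" {
		return nil, fmt.Errorf("%s is not set", name)
	}
	key, err := hex.DecodeString(hexValue)
	if err != nil {
		return nil, fmt.Errorf("%s is not valid hex: %w", name, err)
	}
	if len(key) != hashKeyLen {
		return nil, fmt.Errorf("%s must be %d bytes, got %d", name, hashKeyLen, len(key))
	}
	return key, nil
}

// mustLoadHashKey (hypothetical) reads one key class from the environment
// and terminates the process if it is missing or malformed.
func mustLoadHashKey(name string) []byte {
	key, err := decodeHashKey(name, os.Getenv(name))
	if err != nil {
		log.Fatalf("security init: %v", err)
	}
	return key
}

func main() {
	// Demo only: stand in for the value the secrets store would inject.
	demo := hex.EncodeToString(make([]byte, hashKeyLen))
	if err := os.Setenv("PHONE_HASH_KEY", demo); err != nil {
		log.Fatal(err)
	}

	phoneKey := mustLoadHashKey("PHONE_HASH_KEY")
	fmt.Println(len(phoneKey)) // 32
}
```

Each key class gets its own call, so a missing EMAIL_HASH_KEY stops startup even when PHONE_HASH_KEY is present.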

Immutable chain context — key versioning

If the hash is written into an append-only or on-chain payload (e.g., the RaaS chain anchored to Bitcoin L2), key rotation cannot retroactively re-hash historical events. The chain is sealed.

The pattern: store the hash with a key version index. Chain payload carries {key_version: N, hash: <bytes>}. Verification at any point in time looks up the key version, retrieves the appropriate (still-retained) historical key, and recomputes. Old key versions are retired from new writes but never deleted. Rotation procedure must keep all prior versions available indefinitely for verification queries.

This is different from the encryption key rotation pattern, where ciphertext can be re-encrypted under a new key. Hash keys cannot be rotated forward without breaking historical hashes — they can only be added.
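The add-only versioning pattern can be sketched as a small key ring. The payload shape {key_version, hash} comes from the text above; the KeyRing type, its methods, and the demo keys are assumptions made for illustration, not the platform's implementation.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"errors"
	"fmt"
)

// VersionedHash mirrors the chain payload {key_version: N, hash: <bytes>}.
type VersionedHash struct {
	KeyVersion int
	Hash       []byte
}

// KeyRing (hypothetical) holds every key version ever used.
// Index i holds version i+1. There is deliberately no delete method.
type KeyRing struct {
	keys [][]byte
}

func hmacSHA256(key []byte, input string) []byte {
	m := hmac.New(sha256.New, key)
	m.Write([]byte(input))
	return m.Sum(nil)
}

// Add registers a new key and returns its version number.
// New writes use the most recently added key.
func (r *KeyRing) Add(key []byte) int {
	r.keys = append(r.keys, key)
	return len(r.keys)
}

// Hash computes a hash under the newest key version.
func (r *KeyRing) Hash(normalized string) (VersionedHash, error) {
	if len(r.keys) == 0 {
		return VersionedHash{}, errors.New("no hash key loaded")
	}
	v := len(r.keys)
	return VersionedHash{KeyVersion: v, Hash: hmacSHA256(r.keys[v-1], normalized)}, nil
}

// Verify recomputes the hash under the version recorded in the payload.
func (r *KeyRing) Verify(normalized string, vh VersionedHash) (bool, error) {
	if vh.KeyVersion < 1 || vh.KeyVersion > len(r.keys) {
		return false, fmt.Errorf("unknown key version %d", vh.KeyVersion)
	}
	want := hmacSHA256(r.keys[vh.KeyVersion-1], normalized)
	return hmac.Equal(want, vh.Hash), nil
}

func main() {
	k1, k2 := make([]byte, 32), make([]byte, 32) // demo keys only
	k1[0], k2[0] = 1, 2

	ring := &KeyRing{}
	ring.Add(k1)
	sealed, _ := ring.Hash("user@example.com") // written under version 1

	ring.Add(k2) // rotation: version 2 takes over new writes
	fresh, _ := ring.Hash("user@example.com")

	ok, err := ring.Verify("user@example.com", sealed)
	fmt.Println(sealed.KeyVersion, fresh.KeyVersion, ok, err) // 1 2 true <nil>
}
```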

When to use a hash, when to use encryption

The decision is whether the platform ever needs the plaintext back.

Situation | Use
Need to query equality (dedup, cross-reference) and never need plaintext back | Keyed hash (this card)
Need to display the plaintext to an authorized user | AES-256-GCM encryption with CANARY_ENCRYPTION_KEY
Need both: display in normal flow, but support "right to be forgotten" deletion | Per-subject AES-256-GCM (see platform-cryptographic-erasure)
Plaintext is genuinely public and integrity is the only concern | Plain SHA-256 is fine (content addressing of webhooks, Merkle leaf hashing, evidence chain hash)

The mistake to avoid: choosing plain SHA-256 because "it's a hash, hashes are one-way." The one-way property only holds when the input space is large enough to defeat brute force. For PII, the input space is too small.

Invariants

Sources