# Threat Model
This document describes what Shade protects against and what it doesn't. Read this before deploying Shade in any context where the answers matter.
> Each "Mitigations" bullet in the numbered adversary sections below
> carries a `[tests:]` footnote linking to the concrete test file(s)
> that demonstrate the mitigation. If a mitigation has no `[tests:]`
> line, treat it as documentary: there is no automated test holding
> the line yet.
> See [SECURITY.md § Threat-/test-matrix](./SECURITY.md#threat--test-matrix)
> for the consolidated index.
## Assets
The things we're protecting:
- **Message plaintext** — the actual content of encrypted messages between peers
- **Identity private keys** — long-term Ed25519 signing key + X25519 DH key
- **Session state** — Double Ratchet root keys, chain keys, DH keypairs
## Adversaries we consider
### 1. Network attacker (active)
Can intercept, modify, drop, replay, and inject network traffic between clients and the prekey server, and between two clients.
**Mitigations:**
- All identity-key writes to the prekey server are signed (Ed25519). Tampering is detected.
`[tests: packages/shade-server/tests/server.test.ts — "rejects unsigned registration", "rejects registration with wrong signing key"]`
- Signed requests have a 5-minute replay window.
`[tests: packages/shade-server/tests/server.test.ts — "rejects registration with stale signedAt"]`
- The Double Ratchet binds message headers to ciphertext via AES-GCM AAD, so header tampering breaks decryption.
`[tests: packages/shade-core/tests/ratchet.test.ts — "tampered ciphertext fails", "tampered header (counter) fails due to AAD"; packages/shade-streams/tests/tamper.test.ts; packages/shade-streams/tests/aead.test.ts]`
- Forward secrecy: even if an attacker captures all traffic, compromising a key later doesn't help them read past messages.
`[tests: packages/shade-crypto-web/tests/hardening.test.ts; packages/shade-core/tests/ratchet.test.ts — DH ratchet steps + out-of-order delivery]`
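
A minimal sketch of the header-binding idea, using Node's `node:crypto` AES-256-GCM instead of SubtleCrypto so it runs synchronously. The header fields (ratchet public key, previous-chain length, message counter) follow the Double Ratchet specification, but the byte encoding, the random 96-bit IV, and the `sealMessage`/`openMessage` names are assumptions for illustration, not Shade's wire format. Because the encoded header is passed as AAD, changing any header field (for example the counter) makes `final()` throw on decryption, even though the header itself travels in the clear.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical header shape, modeled on the Double Ratchet spec's header fields.
interface Header {
  dhPub: Uint8Array; // sender's current ratchet public key (32 bytes for X25519)
  pn: number; // length of the previous sending chain
  n: number; // message number in the current chain
}

interface Sealed {
  iv: Buffer;
  ct: Buffer;
  tag: Buffer;
}

// Fixed-width encoding (given a 32-byte key), so each header has one byte form.
function encodeHeader(h: Header): Buffer {
  const buf = Buffer.alloc(h.dhPub.length + 8);
  buf.set(h.dhPub, 0);
  buf.writeUInt32BE(h.pn, h.dhPub.length);
  buf.writeUInt32BE(h.n, h.dhPub.length + 4);
  return buf;
}

function sealMessage(messageKey: Buffer, header: Header, plaintext: Buffer): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", messageKey, iv);
  cipher.setAAD(encodeHeader(header)); // header is authenticated, not encrypted
  const ct = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, ct, tag: cipher.getAuthTag() };
}

function openMessage(messageKey: Buffer, header: Header, msg: Sealed): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", messageKey, msg.iv);
  decipher.setAAD(encodeHeader(header));
  decipher.setAuthTag(msg.tag);
  // final() throws if either the ciphertext or the header bytes were altered.
  return Buffer.concat([decipher.update(msg.ct), decipher.final()]);
}
```
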
**NOT mitigated:**
- Initial session establishment can be MITM'd if users don't verify identity fingerprints. The prekey server could distribute a fake bundle on first contact. Always compare safety numbers out-of-band for high-stakes communications.
### 2. Malicious or compromised prekey server
The server holds identity public keys and prekey bundles. It can serve them to anyone.
**Mitigations:**
- The server only stores PUBLIC keys, never private ones.
`[tests: packages/shade-server/tests/server.test.ts — registration, bundle fetch, replenish; packages/shade-storage-sqlite/tests/sqlite-prekey-store.test.ts]`
- Write operations are signed with the identity private key, so the server can't forge new identities or replenishments without the user's key.
`[tests: packages/shade-server/tests/server.test.ts — "rejects replenishment signed by wrong identity", "rejects delete signed by wrong identity"]`
- Bundle fetches are unauthenticated, so a malicious server can serve fake bundles. Detection requires out-of-band fingerprint comparison.
`[tests: packages/shade-core/tests/fingerprint-session.test.ts]`
**NOT mitigated:**
- A malicious server can substitute one user's prekey bundle with the server operator's own keys, enabling MITM at session establishment. Users must verify safety numbers to detect this.
**Partially mitigated by V3.12 Key Transparency** (opt-in):
- When the operator runs the server with `keyTransparency: { ... }` and clients pin the operator's STH-signing public key, every bundle fetch returns a Merkle inclusion proof against an append-only Signed Tree Head. A server that swaps `alice`'s bundle for one client and not another, or rewrites history to hide an earlier swap, is detected by an independent witness. KT does **not** prevent first-contact impersonation — a never-seen-before address can still be served maliciously on its very first registration.
`[tests: packages/shade-key-transparency/tests/manager.test.ts — "rotation: new register replaces old"; packages/shade-transport/tests/kt-split-view-e2e.test.ts — "two divergent views at the same tree_size are caught by witness"; packages/shade-server/tests/kt.test.ts — "bundle response carries verified inclusion proof"]`
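
For readers unfamiliar with inclusion proofs, the check a client performs on each bundle fetch can be sketched as follows. This assumes Shade's log uses RFC 9162 (Certificate Transparency v2) Merkle hashing, with `0x00` leaf and `0x01` interior-node prefixes over SHA-256; this document does not state Shade's exact leaf encoding, and STH signature checks, consistency proofs and witness gossip are omitted. `treeHash` and `inclusionPath` stand in for the server side so the sketch is self-contained.

```typescript
import { createHash } from "node:crypto";

const LEAF_PREFIX = Uint8Array.of(0x00);
const NODE_PREFIX = Uint8Array.of(0x01);

function sha256(...parts: Uint8Array[]): Buffer {
  const h = createHash("sha256");
  for (const p of parts) h.update(p);
  return h.digest();
}

const leafHash = (data: Uint8Array): Buffer => sha256(LEAF_PREFIX, data);

// Largest power of two strictly smaller than n (n >= 2).
function splitPoint(n: number): number {
  let k = 1;
  while (k * 2 < n) k *= 2;
  return k;
}

// Server side (fixture only): Merkle Tree Hash, RFC 9162 section 2.1.1.
function treeHash(leaves: Uint8Array[]): Buffer {
  if (leaves.length === 0) return sha256();
  if (leaves.length === 1) return leafHash(leaves[0]);
  const k = splitPoint(leaves.length);
  return sha256(NODE_PREFIX, treeHash(leaves.slice(0, k)), treeHash(leaves.slice(k)));
}

// Server side (fixture only): inclusion path for leaf m, RFC 9162 section 2.1.3.1.
function inclusionPath(m: number, leaves: Uint8Array[]): Buffer[] {
  if (leaves.length <= 1) return [];
  const k = splitPoint(leaves.length);
  return m < k
    ? [...inclusionPath(m, leaves.slice(0, k)), treeHash(leaves.slice(k))]
    : [...inclusionPath(m - k, leaves.slice(k)), treeHash(leaves.slice(0, k))];
}

// Client side: RFC 9162 section 2.1.3.2 verification against a signed root.
function verifyInclusion(
  leafIndex: number,
  treeSize: number,
  leaf: Uint8Array,
  path: Uint8Array[],
  root: Uint8Array,
): boolean {
  if (leafIndex >= treeSize) return false;
  let fn = leafIndex;
  let sn = treeSize - 1;
  let r = leafHash(leaf);
  for (const p of path) {
    if (sn === 0) return false;
    if ((fn & 1) === 1 || fn === sn) {
      r = sha256(NODE_PREFIX, p, r); // sibling is on the left
      while ((fn & 1) === 0 && fn !== 0) {
        fn >>= 1;
        sn >>= 1;
      }
    } else {
      r = sha256(NODE_PREFIX, r, p); // sibling is on the right
    }
    fn >>= 1;
    sn >>= 1;
  }
  return sn === 0 && r.equals(root);
}
```

A valid proof recomputes the signed root from the leaf and its sibling hashes alone. A server that wants to serve a swapped bundle for `alice` must therefore show that client a different root, which is exactly the divergence a witness comparing STHs catches.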
### 3. Compromised endpoint (post-compromise)
Attacker briefly gains code execution or filesystem access on a user's device, exfiltrates session state, then loses access.
**Mitigations:**
- Forward secrecy: messages sent BEFORE the compromise cannot be decrypted with the leaked state. Old chain keys are zeroed after use.
`[tests: packages/shade-core/tests/ratchet.test.ts — basic send/receive, ping-pong; packages/shade-crypto-web/tests/hardening.test.ts — zeroize]`
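- Post-compromise security: once the compromised device performs a DH ratchet step with a freshly generated keypair after the attacker loses access, the leaked state becomes useless for new messages. Until that step happens, leaked chain keys still let the attacker derive keys for messages in the current chains.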
`[tests: packages/shade-core/tests/ratchet.test.ts — "alternating messages trigger DH ratchets"]`
- Memory zeroization: message keys and chain keys are wiped from JS memory after use (best-effort — V8 may retain copies).
`[tests: packages/shade-crypto-web/tests/hardening.test.ts — "zeroize" describe block]`
- Identity rotation invalidates leaked at-rest stream-resume secrets, because the device key that seals them is derived from the identity signing key.
`[tests: packages/shade-core/tests/identity-rotation.test.ts; packages/shade-transfer/tests/resume.test.ts]`
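
The forward-secrecy and zeroization bullets rest on the Double Ratchet's symmetric-key ratchet. A minimal sketch, assuming the KDF recommended in Signal's Double Ratchet specification (HMAC-SHA256 keyed with the chain key, constant `0x01` for the message key and `0x02` for the next chain key); Shade's actual KDF constants are not stated in this document, and `kdfChain`/`SendingChain` are hypothetical names. Since HMAC is one-way, a leaked chain key lets an attacker derive later message keys in that chain but not earlier ones, and wiping each used chain key removes the only input that could regenerate them.

```typescript
import { createHmac } from "node:crypto";

// KDF_CK as recommended by the Double Ratchet spec (assumed, not confirmed for
// Shade): message key = HMAC-SHA256(ck, 0x01), next chain key = HMAC-SHA256(ck, 0x02).
function kdfChain(chainKey: Uint8Array): { messageKey: Buffer; nextChainKey: Buffer } {
  const messageKey = createHmac("sha256", chainKey).update(Uint8Array.of(0x01)).digest();
  const nextChainKey = createHmac("sha256", chainKey).update(Uint8Array.of(0x02)).digest();
  return { messageKey, nextChainKey };
}

// Hypothetical sending chain that zeroizes each chain key once it has been used.
class SendingChain {
  private chainKey: Uint8Array;

  constructor(initialChainKey: Uint8Array) {
    this.chainKey = initialChainKey;
  }

  nextMessageKey(): Buffer {
    const { messageKey, nextChainKey } = kdfChain(this.chainKey);
    this.chainKey.fill(0); // best effort: other copies in the heap are not reached
    this.chainKey = nextChainKey;
    return messageKey;
  }
}
```

Note that the first `nextMessageKey()` call zeroizes the buffer passed to the constructor; a caller that must keep that value should pass a copy.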
**NOT mitigated:**
- An ongoing endpoint compromise can read messages in real time and exfiltrate identity private keys.
- Attackers with persistent access can intercept new identity rotations.
### 4. Compromised device storage
Attacker gains access to the persistent storage (e.g., steals the SQLite file or dumps the PostgreSQL table).
**Mitigations (default, no at-rest encryption):**
- Stream-resume secrets *are* encrypted at rest under a device-key derived from the identity signing key, so a stolen DB without the live identity key cannot resume in-flight transfers.
`[tests: packages/shade-transfer/tests/resume.test.ts]`
- Filesystem-level encryption (LUKS, FileVault, BitLocker) is recommended but is the user's responsibility.
**Mitigations (with at-rest encryption enabled — V3.2 / `@shade/storage-encrypted`):**
- All sensitive payloads are sealed with AES-256-GCM under per-(table, column) field keys derived from a passphrase (scrypt) / OS keychain / app-injected master key. A stolen DB file alone yields no usable private key material.
`[tests: packages/shade-storage-encrypted/tests/encrypted-sqlite.test.ts]`
- AAD binds (table, column, pk) so an attacker cannot swap rows or move ciphertext between columns without triggering decrypt failure.
`[tests: packages/shade-storage-encrypted/tests/encrypted-sqlite.test.ts — "row swap (sessions) → decrypt fails due to AAD mismatch"]`
- Bit-flips in the ciphertext blob are detected by the AEAD tag; the storage layer raises rather than returning corrupt key material.
`[tests: packages/shade-storage-encrypted/tests/aead.test.ts; encrypted-sqlite.test.ts — "flipped ciphertext byte → decrypt fails"]`
- Wrong passphrase / wrong keychain entry is rejected up-front via a fingerprint check, never silently writing under the wrong key.
`[tests: packages/shade-storage-encrypted/tests/encrypted-sqlite.test.ts — "rejects open with wrong key (fingerprint mismatch)"]`
- Online key rotation re-keys every row without downtime; the old key no longer opens the DB after rotation.
`[tests: packages/shade-storage-encrypted/tests/migrate.test.ts — "re-keys all rows; old key no longer opens DB"]`
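
A sketch of per-(table, column) field keys with AAD binding. It assumes HKDF-SHA256 for the field-key derivation, a hypothetical `shade-field-key` info label, and an `iv || ciphertext || tag` blob layout; the document names the inputs (master key, table, column, primary key) but not the exact KDF or encodings. The master key is taken as given, however it was obtained (scrypt over a passphrase, OS keychain, or app-injected). `JSON.stringify` encodes the AAD tuple so that different `(table, column, pk)` triples can never serialize to the same bytes.

```typescript
import { createCipheriv, createDecipheriv, hkdfSync, randomBytes } from "node:crypto";

const IV_LEN = 12;
const TAG_LEN = 16;

// Hypothetical derivation: HKDF-SHA256(masterKey, info = [label, table, column]).
function fieldKey(masterKey: Uint8Array, table: string, column: string): Buffer {
  const info = Buffer.from(JSON.stringify(["shade-field-key", table, column]), "utf8");
  return Buffer.from(hkdfSync("sha256", masterKey, new Uint8Array(0), info, 32));
}

// AAD binds the ciphertext to its exact table, column and row.
function fieldAad(table: string, column: string, pk: string): Buffer {
  return Buffer.from(JSON.stringify([table, column, pk]), "utf8");
}

function sealField(masterKey: Uint8Array, table: string, column: string, pk: string, plaintext: Uint8Array): Buffer {
  const iv = randomBytes(IV_LEN);
  const cipher = createCipheriv("aes-256-gcm", fieldKey(masterKey, table, column), iv);
  cipher.setAAD(fieldAad(table, column, pk));
  const ct = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return Buffer.concat([iv, ct, cipher.getAuthTag()]); // iv || ct || tag
}

function openField(masterKey: Uint8Array, table: string, column: string, pk: string, blob: Uint8Array): Buffer {
  const buf = Buffer.from(blob);
  if (buf.length < IV_LEN + TAG_LEN) throw new Error("blob too short");
  const iv = buf.subarray(0, IV_LEN);
  const ct = buf.subarray(IV_LEN, buf.length - TAG_LEN);
  const tag = buf.subarray(buf.length - TAG_LEN);
  const decipher = createDecipheriv("aes-256-gcm", fieldKey(masterKey, table, column), iv);
  decipher.setAAD(fieldAad(table, column, pk));
  decipher.setAuthTag(tag);
  // Throws on a swapped row, a moved column, a flipped bit, or the wrong master key.
  return Buffer.concat([decipher.update(ct), decipher.final()]);
}
```

Decrypting a blob under another row's `pk`, another column, a different master key, or after a single flipped byte all fail in `final()`, which is the behavior the row-swap and bit-flip bullets describe.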
**NOT mitigated (even with at-rest enabled):**
- A live process holds the storageKey and field keys in memory; an attacker who can read process memory (e.g., via `/proc/<pid>/mem`, swap dump, hibernation file) recovers the keys and thus the data. At-rest encryption protects the DB *file*, not the running process.
- The kernel's swap partition is not encrypted by Shade. If the OS pages key material to disk, it can be recovered. Use an encrypted swap device.
- A coredump of the live process exposes plaintext private keys.
- Filesystem-level encryption of the DB *backup* (e.g. `.bak` file produced by `shade migrate-storage`) is the operator's responsibility — the backup is plaintext during the brief migration window.
- If the master key is lost (forgotten passphrase, deleted keychain entry, lost injected key) the DB is permanently unrecoverable. V3.10 (Social Recovery) is the long-term mitigation.
### 5. Side-channel attacks (timing)
Attacker measures timing of identity verification operations to recover key bits.
**Mitigations:**
- All comparisons of secret material use constant-time XOR-accumulator comparison (`constantTimeEqual`).
`[tests: packages/shade-crypto-web/tests/hardening.test.ts — "constantTimeEqual", "timing variance stays bounded across mismatch positions"]`
- AES-GCM and the underlying curve operations rely on SubtleCrypto and @noble/curves for constant-time implementations; Shade inherits the timing guarantees those implementations provide.
`[tests: packages/shade-crypto-web/tests/provider.test.ts; packages/shade-streams/tests/aead.test.ts]`
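
The XOR-accumulator comparison named above looks roughly like this. It is a sketch, not Shade's source: every byte pair is visited and differences are OR-ed into one accumulator, so the loop never exits early at the first mismatch. Lengths are treated as public. In Node, `timingSafeEqual` from `node:crypto` provides the same property natively, but it throws on unequal lengths instead of returning `false`.

```typescript
// XOR-accumulator comparison: no early exit on the first mismatching byte.
function constantTimeEqual(a: Uint8Array, b: Uint8Array): boolean {
  // Lengths are treated as public, so this early return leaks nothing secret.
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a[i] ^ b[i]; // once any byte differs, non-zero bits persist
  }
  return diff === 0;
}
```
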
**NOT mitigated:**
- JavaScript JIT compilation can introduce timing variability that's hard to control.
- We don't claim resistance to power-analysis or fault-injection attacks (out of scope for a JS library).
### 6. Malicious or compromised inbox relay (V3.6 store-and-forward)
The inbox relay holds **ciphertext blobs with TTL** so senders can deliver
to offline recipients. It is a separate trust domain from the prekey
server, and exposes a different surface.
**Mitigations:**
- The relay only stores `address || msgId || ciphertext-bytes || expires_at`.
Plaintext, ratchet state, and any private keys live exclusively on the
client. A DB dump leaks no message content.
`[tests: packages/shade-inbox-server/tests/routes.test.ts; packages/shade-inbox-server/tests/lifecycle.test.ts — "Tamper resistance"]`
- Recipient identity is bound to the address via TOFU: first
`POST /v1/inbox/register` claims the slot, and subsequent fetch/ack
must be Ed25519-signed by the same key. A different key claiming an
existing address is rejected with 401.
`[tests: packages/shade-inbox-server/tests/routes.test.ts — "rejects different key claiming same address", "rejects fetch from a different signing key", "rejects ack from a different signing key"]`
- Each PUT is signed by the sender's per-PUT signing key; the relay
verifies the signature before persisting. Bad sigs return 401.
`[tests: packages/shade-inbox-server/tests/routes.test.ts — "rejects bad sender signature"]`
- `msgId = sha256(ciphertext)` is verified server-side on PUT and
recomputed client-side on FETCH. A relay that flips a bit in storage
produces a digest mismatch the recipient flags as
`inbox.message_decrypt_failed` *without* acking, so the divergence
surfaces in operator telemetry instead of being silently consumed.
`[tests: packages/shade-inbox-server/tests/routes.test.ts — "rejects mismatched msgId"; packages/shade-inbox-server/tests/lifecycle.test.ts — "Tamper resistance"; packages/shade-inbox/tests/client.test.ts — "tamper detection"]`
- Replay-window of ±5 minutes on `signedAt` (matches the prekey
server's policy). Replays past that window return 409.
`[tests: packages/shade-inbox-server/tests/routes.test.ts — "rejects stale signature (replay window)"]`
- Idempotent PUT: two clients (or a buggy retry loop) submitting the
same ciphertext do *not* create duplicate rows; the second PUT
returns 200 with `idempotent: true`.
`[tests: packages/shade-inbox-server/tests/routes.test.ts — "idempotent on duplicate ciphertext"]`
- Periodic `InboxPruneTask` drops blobs past their TTL so a slow
consumer never sees a payload past expiry.
`[tests: packages/shade-inbox-server/tests/lifecycle.test.ts — "prune removes expired blobs but keeps live ones"]`
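
The `msgId` and replay-window checks above reduce to a few lines. A sketch, assuming `msgId` travels as hex and `signedAt` is a millisecond timestamp; neither encoding is specified in this document, and `msgIdMatches`/`withinReplayWindow` are hypothetical helpers. In the real flow the Ed25519 signature over the request is verified first, so the `signedAt` being checked is one the sender actually signed; a request outside the window is the case answered with 409.

```typescript
import { createHash } from "node:crypto";

const REPLAY_WINDOW_MS = 5 * 60 * 1000; // ±5 minutes, as stated above

// Recompute msgId = sha256(ciphertext). msgId is not secret, so a plain
// string comparison is fine here.
function msgIdMatches(ciphertext: Uint8Array, msgIdHex: string): boolean {
  const expected = createHash("sha256").update(ciphertext).digest("hex");
  return expected === msgIdHex.toLowerCase();
}

// Accept signedAt (ms since epoch) only within ±5 minutes of the local clock.
function withinReplayWindow(signedAtMs: number, nowMs: number = Date.now()): boolean {
  return Math.abs(nowMs - signedAtMs) <= REPLAY_WINDOW_MS;
}
```
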
**NOT mitigated:**
- **Sender-recipient graph leakage.** The relay sees recipient address +
per-PUT sender pubkey + ciphertext byte-counts. Privacy-sensitive
deployments should use address-hashes (`sha256(real-address || salt)`)
and rotate sender signing keys per session. Mixing/onion-routing is
out of scope for V3.6 and a candidate for a future relay tier.
- **Operator-side queue deletion.** A malicious operator can drop every
blob queued for a target, forcing senders to resend. Recipient-side
ack happens *after* successful decrypt, so a delete only burns one
delivery attempt rather than silently consuming a message.
- **TTL-based reachability signal.** A PUT silently expiring after 7
days reveals that the recipient never came online. Operators concerned
with this metadata should clamp TTLs to a fixed value via the
`quota.maxTtlSeconds` / `quota.minTtlSeconds` knobs.
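
Both countermeasures in the bullets above are simple to apply. A sketch, assuming a salt that sender and recipient share out of band and a hex-encoded digest; `inboxAddressHash` and `clampTtl` are hypothetical helpers, and only `quota.minTtlSeconds` / `quota.maxTtlSeconds` come from this document. Whether a 64-character hex string passes the relay's address validation is not stated here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical: sha256(real-address || salt), hex-encoded. The relay only ever
// sees the digest, never realAddress.
function inboxAddressHash(realAddress: string, salt: Uint8Array): string {
  return createHash("sha256").update(Buffer.from(realAddress, "utf8")).update(salt).digest("hex");
}

// Clamp a sender-requested TTL into [minTtlSeconds, maxTtlSeconds].
function clampTtl(requestedSeconds: number, minTtlSeconds: number, maxTtlSeconds: number): number {
  if (minTtlSeconds > maxTtlSeconds) throw new Error("min TTL exceeds max TTL");
  return Math.min(maxTtlSeconds, Math.max(minTtlSeconds, requestedSeconds));
}
```

Setting the minimum and maximum to the same value gives every blob the same TTL, which is the fixed-TTL configuration the last bullet recommends.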
### 7. Denial of service
Attacker floods the prekey server, the inbox relay, or `@shade/files` RPC to exhaust resources or one-time prekeys.
**Mitigations:**
- Per-IP rate limiting on registration and bundle fetches.
`[tests: packages/shade-server/tests/rate-limit.test.ts — "register endpoint rate-limits per IP", "rate limit returns Retry-After header"]`
- Per-identity rate limiting on replenish and delete.
`[tests: packages/shade-server/tests/rate-limit.test.ts — "different keys have independent limits"]`
- 64 KiB body size limit on POST endpoints.
`[tests: packages/shade-server/tests/server.test.ts — body-size enforcement]`
- Address validation rejects path traversal and malformed inputs.
`[tests: packages/shade-server/tests/server.test.ts — "rejects invalid address format", "rejects invalid address in URL"]`
- Per-sender ops/byte quotas on `@shade/files` filesystem RPC.
`[tests: packages/shade-files/tests/security/quota.test.ts]`
- Per-recipient blob quota on `@shade/inbox-server` (default 1000 blobs
per address) + per-blob byte cap (default 1 MiB) so a single sender
cannot fill a recipient's queue.
`[tests: packages/shade-inbox-server/tests/routes.test.ts — "rejects ciphertext > maxBlobBytes", "enforces per-address quota"]`
- Per-IP token-bucket on inbox PUT/FETCH/DELETE/REGISTER routes.
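
Per-IP and per-identity limits like these are commonly implemented as token buckets. A minimal in-memory sketch keyed by an arbitrary string (an IP address or identity key); the capacity and refill rate Shade uses are not given in this document, and `TokenBucketLimiter` is a hypothetical name. Time is passed in explicitly so the behavior is deterministic and testable.

```typescript
interface Bucket {
  tokens: number;
  lastMs: number;
}

// Hypothetical per-key token bucket (key = client IP or identity key).
class TokenBucketLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
  }

  // Refill in proportion to elapsed time, never above capacity.
  private refill(key: string, nowMs: number): Bucket {
    const b = this.buckets.get(key) ?? { tokens: this.capacity, lastMs: nowMs };
    const elapsedSec = Math.max(0, nowMs - b.lastMs) / 1000;
    b.tokens = Math.min(this.capacity, b.tokens + elapsedSec * this.refillPerSec);
    b.lastMs = nowMs;
    this.buckets.set(key, b);
    return b;
  }

  // Spends one token and returns true if the request may proceed.
  tryTake(key: string, nowMs: number): boolean {
    const b = this.refill(key, nowMs);
    if (b.tokens < 1) return false;
    b.tokens -= 1;
    return true;
  }

  // Whole seconds until one token is available (a Retry-After value).
  retryAfterSec(key: string, nowMs: number): number {
    const b = this.refill(key, nowMs);
    return b.tokens >= 1 ? 0 : Math.ceil((1 - b.tokens) / this.refillPerSec);
  }
}
```

A production limiter would also evict idle keys, so that an attacker rotating source addresses cannot grow the map without bound.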
**NOT mitigated:**
- Volumetric DDoS at the network layer is your hosting platform's responsibility.
### 8. Social-recovery adversaries (V3.10)
Once a user has set up `@shade/recovery`, the guardian set becomes a
new attack surface. We split the threat into four cases:
**8a. Coalition of ≤ k-1 guardians.**
**Mitigations:**
- Shamir Secret Sharing over GF(2^8) is information-theoretically
secure: the shares are points on a polynomial whose constant term
is the secret, and any subset of `< k` points is consistent with
every possible secret. No coalition smaller than the threshold
recovers anything beyond the secret's length.
`[tests: packages/shade-recovery/tests/shamir.test.ts — "k-1 shares yield a wrong (random-looking) result", "property: any k-1 share subset yields a different output than the secret"; packages/shade-recovery/tests/adversarial.test.ts — "property: any (k-1) subset of shares fails to recover the key"]`
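
To make the threshold property concrete, here is a compact Shamir split/combine over GF(2^8). It is a sketch under assumptions: it uses the AES reduction polynomial `0x11b` and share x-coordinates `1..n`, which are common choices but not stated in this document, and it is not `@shade/recovery`'s implementation. Each secret byte gets its own random polynomial of degree `k-1` whose constant term is that byte; `combine` evaluates the Lagrange interpolation at `x = 0`.

```typescript
import { randomBytes } from "node:crypto";

// GF(2^8) multiply, reducing by the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11b).
function gfMul(a: number, b: number): number {
  let product = 0;
  while (b > 0) {
    if (b & 1) product ^= a; // addition in GF(2^8) is XOR
    a <<= 1;
    if (a & 0x100) a ^= 0x11b; // reduce back into 8 bits
    b >>= 1;
  }
  return product;
}

// Inverse via a^254 = a^-1, since every non-zero a satisfies a^255 = 1.
function gfInv(a: number): number {
  if (a === 0) throw new Error("0 has no inverse");
  let result = 1;
  let base = a;
  for (let e = 254; e > 0; e >>= 1) {
    if (e & 1) result = gfMul(result, base);
    base = gfMul(base, base);
  }
  return result;
}

interface Share {
  x: number; // 1..n
  y: Uint8Array; // one polynomial evaluation per secret byte
}

function split(secret: Uint8Array, n: number, k: number): Share[] {
  if (k < 2 || k > n || n > 255) throw new Error("need 2 <= k <= n <= 255");
  const shares: Share[] = [];
  for (let x = 1; x <= n; x++) shares.push({ x, y: new Uint8Array(secret.length) });
  for (let i = 0; i < secret.length; i++) {
    // coeffs[0] is the secret byte; the other k-1 coefficients are random.
    const coeffs = [secret[i], ...randomBytes(k - 1)];
    for (const share of shares) {
      let acc = 0; // Horner's rule, highest coefficient first
      for (let j = coeffs.length - 1; j >= 0; j--) acc = gfMul(acc, share.x) ^ coeffs[j];
      share.y[i] = acc;
    }
  }
  return shares;
}

// Lagrange interpolation at x = 0; in GF(2^8), subtraction is also XOR.
function combine(shares: Share[]): Uint8Array {
  const out = new Uint8Array(shares[0].y.length);
  for (let i = 0; i < out.length; i++) {
    let acc = 0;
    for (const sj of shares) {
      let num = 1;
      let den = 1;
      for (const sm of shares) {
        if (sm.x === sj.x) continue;
        num = gfMul(num, sm.x); // (0 - x_m) = x_m
        den = gfMul(den, sm.x ^ sj.x); // (x_j - x_m) = x_j ^ x_m
      }
      acc ^= gfMul(sj.y[i], gfMul(num, gfInv(den)));
    }
    out[i] = acc;
  }
  return out;
}
```

With `k` or more shares `combine` returns the secret; with `k-1` shares the interpolated constant term is independent of the secret, which is the information-theoretic guarantee stated above. The arithmetic here branches on data, so this sketch is not itself constant-time.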
**8b. Single malicious guardian who forges a share.**
**Mitigations:**
- The reconstructed `recoveryKey` is authenticated by the AES-GCM
tag inside the backup blob (`Shade.exportBackup`'s ciphertext).
A forged share produces a different reconstructed key; AES-GCM
decryption fails.
- `requestRecovery` exhaustively tries every threshold-sized subset
of received grants until one authenticates; if none do, it raises
`RecoveryReconstructionError` and refuses to apply the result.
The user is told that at least one guardian is malicious.
`[tests: packages/shade-recovery/tests/adversarial.test.ts — "a corrupted share never authenticates against the backup AEAD tag"]`
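
The subset search in the second bullet can be sketched as a generic loop. `combinations` and `firstAuthenticating` are hypothetical helpers, and `authenticates` stands in for an AES-GCM decryption attempt on the backup blob under the candidate `recoveryKey`. Where this sketch throws a plain `Error`, Shade raises `RecoveryReconstructionError`.

```typescript
// Yields every k-element subset of items, in lexicographic index order.
function* combinations<T>(items: readonly T[], k: number, start = 0, chosen: T[] = []): Generator<T[]> {
  if (chosen.length === k) {
    yield [...chosen];
    return;
  }
  // Stop once too few items remain to fill the subset.
  for (let i = start; i <= items.length - (k - chosen.length); i++) {
    chosen.push(items[i]);
    yield* combinations(items, k, i + 1, chosen);
    chosen.pop();
  }
}

// Tries each threshold-sized subset until one yields a candidate that authenticates.
function firstAuthenticating<G, K>(
  grants: readonly G[],
  threshold: number,
  reconstruct: (subset: G[]) => K,
  authenticates: (candidate: K) => boolean,
): K {
  for (const subset of combinations(grants, threshold)) {
    const candidate = reconstruct(subset);
    if (authenticates(candidate)) return candidate;
  }
  // Shade raises RecoveryReconstructionError at this point.
  throw new Error("no threshold-sized subset authenticates; at least one share is bad");
}
```

The search costs at most C(n, k) reconstructions and decryption attempts, for example 10 for 3-of-5, which stays cheap at typical guardian counts.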
**8c. Social-engineering (impersonator calls a guardian).**
**Mitigations:**
- The guardian's `approve` callback receives the new device's
TEMPORARY safety number; the spec REQUIRES out-of-band
comparison before approving.
- The shipped `<RecoveryApprove />` widget enforces a two-checkbox
gate ("fingerprint matches" + "I verified OOB") before the
release button is enabled.
- The protocol-level `share-decline` envelope is sent regardless of
whether the guardian's `approve` callback returns false or
throws, so a hard "no" terminates the requesting flow promptly.
`[tests: packages/shade-recovery/tests/adversarial.test.ts — "approve handler that REJECTS a wrong fingerprint never sends a grant", "throwing approve handler counts as decline with descriptive reason"]`
**NOT mitigated:**
- A guardian who is duped by an impersonator AND whose user clicks
through both checkboxes WILL release their share. Defense in
depth requires user education + per-guardian cool-down windows
(a follow-up release).
**8d. Guardian device compromise.**
If an attacker fully owns a guardian's device, they can:
- Read the share + backup blob → contributes one polynomial point.
- Ship `share-grant` envelopes if they convince the guardian's
`approve` callback to return true.
**Mitigations:**
- No single guardian's compromise is sufficient — the threshold
invariant still holds: the attacker needs `k-1` other shares to
rebuild the identity.
- Backup blobs are encrypted at-rest under the guardian's existing
StorageProvider scheme (V3.2 covers this for SQLite/Postgres
backends).
**NOT mitigated:**
- Compromise of `≥ k` guardians simultaneously is a complete break.
This is by design: the recovery flow is meant to survive *device*
loss, not coordinated mass compromise of the social graph.
### 9. Cross-sender X3DH state corruption
Before V3.10, `initReceiverSession` shared a reference to the
receiver's signed prekey keypair with the new session. The first DH
ratchet step zeroed the session's "previous" private key, which
silently zeroed the persisted signed prekey. A second X3DH from a
*different* sender to the same receiver then derived a divergent
root key and decryption failed with "wrong key or tampered data".
This was a pre-existing bug surfaced by the V3.10 multi-sender
recovery flow.
**Mitigations:**
- `initReceiverSession` now copies the localDHKeyPair into the
session so the eventual zeroize touches a scratch buffer, not
the persisted prekey.
`[tests: packages/shade-recovery/tests/integration.test.ts — "recovery from new device with all 5 guardians available"; packages/shade-core/tests/x3dh.test.ts]`
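
The bug and its fix come down to aliasing versus copying a `Uint8Array`. An illustrative sketch: `KeyPair`, `Session` and both `initReceiverSession*` functions are hypothetical stand-ins, not Shade's types or signatures. `Uint8Array.from` is used for the copy because on a Node `Buffer`, `.slice()` returns a view over the same memory, which would reintroduce the aliasing.

```typescript
// Hypothetical stand-ins for Shade's types; only the aliasing behavior matters.
interface KeyPair {
  publicKey: Uint8Array;
  privateKey: Uint8Array;
}

interface Session {
  localDH: KeyPair;
}

const zeroize = (bytes: Uint8Array): void => {
  bytes.fill(0);
};

// Buggy pattern: the session holds the same buffers as the persisted signed prekey.
function initReceiverSessionShared(signedPrekey: KeyPair): Session {
  return { localDH: signedPrekey };
}

// Fixed pattern: copy the bytes, so a later zeroize only wipes the session's copy.
function initReceiverSessionCopied(signedPrekey: KeyPair): Session {
  return {
    localDH: {
      publicKey: Uint8Array.from(signedPrekey.publicKey),
      privateKey: Uint8Array.from(signedPrekey.privateKey),
    },
  };
}
```
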
### 10. MITM bypass via skipped fingerprint verification (V3.3)
The strongest mitigation for §1 / §2 / §6 — out-of-band safety-number
verification — is a *user* responsibility. Shade 4.0 ships
`@shade/sdk` fingerprint gates that move it from "convention" to
"enforced policy on the operations that matter".
**Mitigations:**
- `Shade.beforeFirstLargeFile(threshold, handler)` — runs in `upload()`
when payload ≥ threshold (default 10 MiB) and the peer is unverified.
A handler that returns `false` (or throws, or is missing in
policy-forbid-TOFU mode) raises `FingerprintNotVerifiedError` (HTTP 403).
`[tests: packages/shade-sdk/tests/fingerprint-gates.test.ts; packages/shade-files/tests/security/fingerprint-gate.test.ts]`
- `Shade.beforeBackupImport(handler)` — receives the *backup-embedded*
fingerprint before any state is written. Decrypted backups whose
embedded identity does not match the user's expectation are
rejected before they touch storage.
`[tests: packages/shade-sdk/tests/fingerprint-gates.test.ts]`
- `Shade.beforeNewDeviceTrust(handler)` — runs from
`Shade.acceptIdentityChange()` after the peer's identity-version is
bumped, so any prior verification automatically goes stale and the
user must re-verify.
`[tests: packages/shade-sdk/tests/fingerprint-gates.test.ts]`
- `markPeerVerified` / `isPeerVerified` / `unmarkPeerVerified` are
storage-backed; the `peer_verifications` + `peer_identity_versions`
tables are subject to V3.2 at-rest encryption when the encrypted
storage backend is used.
- `<FingerprintCompare />` and `<FingerprintGate />` widgets present
the safety number side-by-side and require an explicit "matches"
click before children render.
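
The rules for the large-file gate can be summarized as a small decision function. This hypothetical sketch encodes only what the bullets state: the threshold defaults to 10 MiB, verified peers pass, a `false` or throwing handler refuses, and a missing handler refuses only when TOFU is forbidden, otherwise falling back to TOFU with a logged warning. `largeFileGate` and the `policy` values are invented names, not `@shade/sdk`'s API, and the handler is synchronous here for simplicity.

```typescript
type TofuPolicy = "tofu-warn" | "forbid-tofu"; // hypothetical policy names
type GateDecision = "allow" | "allow-with-warning" | "deny";

const DEFAULT_LARGE_FILE_THRESHOLD = 10 * 1024 * 1024; // 10 MiB

interface LargeFileCheck {
  peer: string;
  sizeBytes: number;
  peerVerified: boolean;
  policy: TofuPolicy;
  handler?: (peer: string) => boolean; // synchronous for simplicity
  threshold?: number;
}

function largeFileGate(c: LargeFileCheck): GateDecision {
  const threshold = c.threshold ?? DEFAULT_LARGE_FILE_THRESHOLD;
  // The gate only applies to payloads >= threshold sent to unverified peers.
  if (c.sizeBytes < threshold || c.peerVerified) return "allow";
  if (!c.handler) return c.policy === "forbid-tofu" ? "deny" : "allow-with-warning";
  try {
    return c.handler(c.peer) === true ? "allow" : "deny";
  } catch {
    return "deny"; // a throwing handler is treated as a refusal
  }
}
```

A caller would map `deny` to `FingerprintNotVerifiedError` (HTTP 403) and `allow-with-warning` to the TOFU-plus-logged-warning default.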
**NOT mitigated:**
- Apps that never register handlers default to "TOFU + warning". The
warning is logged, not rendered, so a UX that ignores the log
silently keeps TOFU semantics.
- Once verified, a peer's persisted verification stays valid until
identity rotation. A device-compromise that does **not** trigger
rotation keeps the verification alive.
### 11. WebRTC peer-to-peer transport (V3.11)
`@shade/transport-webrtc` lets two peers ship `@shade/transfer` chunks
over an `RTCDataChannel` instead of HTTP. The DTLS layer is opaque to
Shade; we treat WebRTC strictly as a **byte-pipe** — not a trust
boundary.
**Mitigations:**
- The same Double Ratchet that authenticates Shade messages
authenticates the SDP offer / answer / ICE / bye signaling
envelopes. A network attacker who replaces an SDP offer must
forge a ratcheted message — the receiver decrypts via the
existing peer session and rejects on AEAD failure.
`[tests: packages/shade-transport-webrtc/tests/signaling.test.ts; packages/shade-sdk/tests/webrtc-integration.test.ts]`
- Frame payloads on the DataChannel are AES-GCM-sealed by `@shade/streams`
with deterministic nonce + AAD bound to `streamId || laneId || seq ||
isLast`. A WebRTC implementation that returns altered bytes fails
AEAD verification and the receiver raises `StreamDecryptionError`.
`[tests: packages/shade-streams/tests/tamper.test.ts; packages/shade-transport-webrtc/tests/wire-format.test.ts]`
- Glare resolution is deterministic (lexicographic address compare)
so both sides converge on a single connection without re-running
signaling.
`[tests: packages/shade-transport-webrtc/tests/glare.test.ts]`
- When NAT traversal fails, `MultiTransportFallback([webrtc, http])`
demotes to HTTP within the configured `connectTimeoutMs` (default
5 s) without losing chunks already in flight. No silent stall.
`[tests: packages/shade-sdk/tests/webrtc-failover.test.ts]`
- `IRtcFactory` is pluggable; production uses
`globalThis.RTCPeerConnection` (browser / Workers / Deno),
`MemoryRtcFactory` is in-process for tests.
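
Glare resolution only needs a rule that both peers evaluate identically. A sketch, assuming the peer whose address sorts lower keeps its own offer; the document says the rule is a lexicographic address compare but not which side wins, and `keepsOwnOffer` is a hypothetical name. JavaScript string comparison orders by UTF-16 code units, so both peers must compare the same canonical address strings.

```typescript
// Both peers evaluate the same rule on the same pair of addresses, so exactly
// one of them keeps its offer. Assumption: the lower address wins.
function keepsOwnOffer(localAddress: string, remoteAddress: string): boolean {
  if (localAddress === remoteAddress) throw new Error("peers must have distinct addresses");
  return localAddress < remoteAddress; // UTF-16 code-unit order
}
```

The peer for which this returns `false` would discard its own offer and answer the remote one, so no second signaling round is needed.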
**NOT mitigated:**
- TURN relay metadata. If the deployment ships a TURN server,
the operator sees relayed-byte counts and timing for every flow
that traverses the relay. Use a TURN you control or a hosted
relay you trust.
- Browser/RTC stack vulnerabilities. A compromised
`RTCPeerConnection` implementation is outside the scope of a JS
library; we ride the platform's WebRTC.
- Public STUN exposes the client's public IP to the STUN server.
This is unavoidable without a privacy-preserving NAT discovery
mechanism (out of scope).
### 12. Web-Worker thread boundary (V3.8)
`@shade/crypto-web/worker` runs AEAD, HKDF, HMAC, X25519, Ed25519, and
per-lane stream state inside a dedicated Web Worker so the main thread
never holds key material for very long.
**Mitigations:**
- Lane keys, identity private keys and ratchet chain keys are passed
into the worker once at setup; subsequent operations move plaintext
via transferable `ArrayBuffer`s and never re-export keys.
`[tests: packages/shade-crypto-web/tests/worker-streams.test.ts; packages/shade-crypto-web/tests/worker-provider.test.ts]`
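- Idle timeout (default 30 s) calls `terminate()` on the worker, which
drops the worker's JS heap and releases the memory backing any keys
that were not yet zeroized. Released memory is not guaranteed to be
overwritten, so this bounds how long keys stay reachable rather than
erasing them.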
- `rotate()` and `destroy()` lifecycle controls let apps bound the
worst-case duration any lane key sits in worker memory.
- Worker-protocol version handshake on first message rejects mismatched
workers (e.g. cached old build).
**NOT mitigated:**
- The worker is still inside the same browsing context; an attacker
who can inject script into the page can post a malicious message
and read the worker's reply. CSP and SRI on the worker entrypoint
are the user's responsibility.
- Heap memory is not synchronously wiped when `postMessage` returns
ownership; the runtime may keep deallocated buffers around for
GC. Memory zeroization is best-effort for both threads.
## Assumptions
1. **The user has a secure way to bootstrap trust.** Either:
- Trust on first use (TOFU) — accept the first identity key seen for a peer
- Out-of-band verification — compare safety numbers in person/video before trusting
2. **Cryptographic primitives are sound.** We trust X25519, Ed25519, AES-256-GCM, HKDF-SHA256, HMAC-SHA256, SHA-256, and scrypt.
3. **The runtime is honest.** A malicious Bun/Node/browser runtime can defeat any JS library.
4. **The prekey server is reachable.** If it's offline, new sessions can't be established (but existing sessions continue working).
## Residual risks
| Risk | Severity | Mitigation |
|------|----------|------------|
| MITM at first session establishment | High | Compare safety numbers out-of-band; in 4.0, register `Shade.beforeFirstLargeFile` / `beforeBackupImport` / `beforeNewDeviceTrust` to enforce verification on the operations that matter (V3.3) |
| Identity private key theft from device | Critical | Filesystem encryption, secure enclave (future); V3.10 Social Recovery for *recovery* after loss |
| Prekey server operator runs a "key oracle" attack | Medium | V3.12 Key Transparency (opt-in) detects split-view + history rewrites; gossip via a `LightWitness` raises the cost of a sustained attack |
| TURN relay sees byte-counts of P2P transfers | Low to Medium | Only when WebRTC fails over to TURN. Operate your own TURN if the metadata matters |
| Side-channel via JIT timing variability | Low | Constant-time primitives reduce but don't eliminate the risk; V3.8 Web-Worker isolation bounds the lifetime of in-memory key material |
| Metadata visibility to prekey server | Low | Acceptable for most use cases; mix networks for stronger metadata protection |
| Inbox relay sees recipient address + byte-counts | Low to Medium | Use address-hashes + per-session sender keys (V3.6 §6); mix-net relay tier is a future candidate |