Shade/docs/recovery.md
Sterister e6fdf31b49
2026-05-03 18:35:35 +02:00


Social Key Recovery (@shade/recovery)

V3.10 closes the biggest UX hole in any E2EE system: "What happens if I lose my phone?" Shade's social-recovery flow lets a user designate n guardians (family, friends, co-workers) at setup time so that any k of them (the threshold) can together restore the user's identity onto a new device. No single guardian can do it alone, and the prekey server never sees the recovered key material.

The whole flow runs over existing 1:1 Shade sessions; there is no server-side recovery agent, no escrow service, no "cloud guardian".


Threat model recap

| # | Adversary | Recovered? |
|---|-----------|------------|
| 1 | Coalition of ≤ k-1 guardians | No (information-theoretic, by Shamir construction) |
| 2 | Prekey server alone | No (server only relays Double-Ratchet ciphertext) |
| 3 | Single malicious guardian who forges a share | Detected: AES-GCM tag mismatch on the backup blob; requestRecovery exhaustively tries threshold-sized subsets and rejects when none authenticate |
| 4 | Social engineering (impersonator calls a guardian) | Mitigated, not eliminated: guardians MUST OOB-confirm the new device's safety number before approving (see <RecoveryApprove />) |
| 5 | Compromised guardian device | Out of scope; see "Guardian compromise" below |
| 6 | Compromised primary device at setup time | Out of scope: recovery only protects against loss of the device; if setup material is exfiltrated, all bets are off |

Setup

What the user does

  1. Pick n guardians from their existing peers.
  2. Pick a threshold k (typically ⌈n/2⌉, e.g. 3-of-5: high enough that no small coalition can recover alone, low enough to survive losing a guardian or two).
  3. Run setupRecovery(...).
  4. Print / record a recovery card with:
    • The user's own address
    • setupId
    • k and n
    • The list of guardian addresses
    • Setup-time safety number

The recovery card is the only piece of state the user must remember out-of-band (or store in a password manager). Without it, the user cannot drive recovery on a new device — the new device needs to know who the guardians are.
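
The card contents listed above could be modeled as a small record that the new device validates before it starts contacting guardians. This is only a sketch: the `RecoveryCard` type, its field names (other than `setupId`) and the `parseRecoveryCard` helper are hypothetical and not part of @shade/recovery, and the JSON encoding is an assumption.

```typescript
// Hypothetical shape of the recovery card; the field types are assumptions.
interface RecoveryCard {
  originalAddress: string;
  setupId: string;
  threshold: number;          // k
  guardians: string[];        // n = guardians.length
  setupSafetyNumber: string;  // safety number recorded at setup time
}

// Parse a JSON-encoded card and reject obviously inconsistent values.
function parseRecoveryCard(json: string): RecoveryCard {
  const c = JSON.parse(json) as Partial<RecoveryCard>;
  if (
    typeof c.originalAddress !== "string" ||
    typeof c.setupId !== "string" ||
    typeof c.setupSafetyNumber !== "string"
  ) {
    throw new Error("recovery card: missing address, setupId or safety number");
  }
  if (!Array.isArray(c.guardians) || !c.guardians.every((g) => typeof g === "string")) {
    throw new Error("recovery card: guardian roster must be a list of addresses");
  }
  const n = c.guardians.length;
  const k = c.threshold;
  if (typeof k !== "number" || !Number.isInteger(k) || k < 1 || k > n) {
    throw new Error("recovery card: threshold must satisfy 1 <= k <= n");
  }
  return {
    originalAddress: c.originalAddress,
    setupId: c.setupId,
    threshold: k,
    guardians: [...c.guardians],
    setupSafetyNumber: c.setupSafetyNumber,
  };
}
```

Validating k against the roster size up front avoids starting a recovery that can never collect enough grants.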

What happens cryptographically

recoveryKey  = random(32 bytes)
backupBlob   = Shade.exportBackup(passphrase = "shade-rk:" + base64url(recoveryKey),
                                  knownAddresses = [...])
shares[1..n] = Shamir-split(recoveryKey, k, n)
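
The passphrase derivation in the pseudocode above can be sketched in a few lines. This assumes the unpadded base64url alphabet (what Node's and Bun's `Buffer` produce for `"base64url"`); whether Shade pads the encoding is not stated here, and `recoveryPassphrase` is a hypothetical helper name.

```typescript
import { randomBytes } from "node:crypto";

// Derive the backup passphrase from the 32-byte recovery key.
function recoveryPassphrase(recoveryKey: Uint8Array): string {
  if (recoveryKey.length !== 32) throw new Error("recoveryKey must be 32 bytes");
  // "base64url" is the URL-safe alphabet without "=" padding (assumed encoding).
  return "shade-rk:" + Buffer.from(recoveryKey).toString("base64url");
}

const recoveryKey = randomBytes(32);
const passphrase = recoveryPassphrase(recoveryKey);
recoveryKey.fill(0); // best-effort zeroization once the key is no longer needed
```

Note that `passphrase` is an immutable JavaScript string and cannot be wiped the same way, so it should be kept in scope only as long as the export call needs it.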

For each guardian i:

share-deposit envelope:
  shadeRecovery: 1
  type:          "share-deposit"
  flowId, setupId, originalAddress
  threshold (k), guardianCount (n), shareIndex (i)
  shareBytes:    base64url( encodeShare(shares[i]) )
  backupBlob:    Shade.exportBackup output (identical for every guardian)
  setupFingerprint, createdAt

The envelope rides through Shade.send like any other plaintext — double-ratchet encrypted, AAD-bound, replay-safe.
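
On the receiving side, a guardian app would typically validate the decrypted payload before storing it. The sketch below types the envelope from the field listing above; the TypeScript types (strings for identifiers and blobs, a numeric `createdAt`) and the `isShareDeposit` guard are assumptions for illustration, not the package's actual exports.

```typescript
// Field names come from the envelope listing; the value types are assumptions.
interface ShareDepositEnvelope {
  shadeRecovery: 1;
  type: "share-deposit";
  flowId: string;
  setupId: string;
  originalAddress: string;
  threshold: number;      // k
  guardianCount: number;  // n
  shareIndex: number;     // 1..n
  shareBytes: string;     // base64url-encoded share
  backupBlob: string;
  setupFingerprint: string;
  createdAt: number;
}

// Runtime check for a decrypted payload of unknown shape.
function isShareDeposit(x: unknown): x is ShareDepositEnvelope {
  if (typeof x !== "object" || x === null) return false;
  const e = x as Record<string, unknown>;
  const stringFields = [
    "flowId", "setupId", "originalAddress", "shareBytes", "backupBlob", "setupFingerprint",
  ];
  const isInt = (v: unknown): v is number => typeof v === "number" && Number.isInteger(v);
  const k = e["threshold"];
  const n = e["guardianCount"];
  const i = e["shareIndex"];
  return (
    e["shadeRecovery"] === 1 &&
    e["type"] === "share-deposit" &&
    stringFields.every((f) => typeof e[f] === "string") &&
    typeof e["createdAt"] === "number" &&
    isInt(k) && isInt(n) && isInt(i) &&
    k >= 1 && k <= n &&
    i >= 1 && i <= n
  );
}
```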

The recoveryKey is zeroized on the primary device immediately after the split returns. The primary therefore retains nothing except setupId and the public roster.
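
To make the split and the later recombination concrete, here is a minimal sketch of byte-wise Shamir sharing over GF(2^8) with the AES reduction polynomial. The field choice, the `Share` layout and the function names are illustrative assumptions; this document does not specify how @shade/recovery encodes shares internally.

```typescript
import { randomBytes } from "node:crypto";

// Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11b).
function gfMul(a: number, b: number): number {
  let p = 0;
  for (let i = 0; i < 8; i++) {
    if (b & 1) p ^= a;
    const carry = a & 0x80;
    a = (a << 1) & 0xff;
    if (carry) a ^= 0x1b;
    b >>= 1;
  }
  return p;
}

// Multiplicative inverse: a^254 = a^-1 for nonzero a in GF(2^8).
function gfInv(a: number): number {
  if (a === 0) throw new Error("zero has no inverse");
  let r = 1;
  for (let i = 0; i < 254; i++) r = gfMul(r, a);
  return r;
}

interface Share { x: number; y: Uint8Array }

// Split `secret` into n shares such that any k of them reconstruct it.
function split(secret: Uint8Array, k: number, n: number): Share[] {
  if (k < 1 || k > n || n > 255) throw new Error("need 1 <= k <= n <= 255");
  const shares: Share[] = Array.from({ length: n }, (_, i) => ({
    x: i + 1, // x = 0 is reserved for the secret itself
    y: new Uint8Array(secret.length),
  }));
  for (let b = 0; b < secret.length; b++) {
    // Random polynomial of degree k-1 whose constant term is the secret byte.
    const coeffs = [secret[b], ...randomBytes(k - 1)];
    for (const s of shares) {
      let acc = 0;
      // Horner evaluation at x = s.x.
      for (let j = coeffs.length - 1; j >= 0; j--) acc = gfMul(acc, s.x) ^ coeffs[j];
      s.y[b] = acc;
    }
  }
  return shares;
}

// Lagrange interpolation at x = 0 over exactly the shares given.
function combine(shares: Share[]): Uint8Array {
  const out = new Uint8Array(shares[0].y.length);
  for (const si of shares) {
    let basis = 1; // product of x_j / (x_j - x_i) for j != i; subtraction is XOR
    for (const sj of shares) {
      if (sj.x !== si.x) basis = gfMul(basis, gfMul(sj.x, gfInv(sj.x ^ si.x)));
    }
    for (let b = 0; b < out.length; b++) out[b] ^= gfMul(si.y[b], basis);
  }
  return out;
}

const recoveryKey = randomBytes(32);
const shares = split(recoveryKey, 3, 5);
recoveryKey.fill(0); // mirror the zeroization step described above
```

Because each byte gets an independent random polynomial, fewer than k shares are consistent with every possible secret, which is where the information-theoretic guarantee comes from.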

What each guardian stores

Per (originalAddress, setupId):

{
  shareIndex,            // 1..n
  shareBytes,            // base64url-encoded Shamir share
  backupBlob,            // identical for every guardian
  setupFingerprint,      // for sanity-checks at recovery time
  guardianCount, threshold,
  receivedAt
}

The guardian's app provides a RecoveryStore implementation. The package ships MemoryRecoveryStore for tests and small one-shot demos; production guardian apps MUST supply a persistent store (IndexedDB, AsyncStorage, SQLite, etc.). See "Persistence recommendations" below.
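
The essential property of any store is that records are keyed by the pair (originalAddress, setupId). The sketch below shows one way to do that with an in-memory map. The four method names are hypothetical, since the real RecoveryStore interface is not reproduced in this document, and the record value types are assumptions.

```typescript
// Record layout follows the listing above; the value types are assumptions.
interface GuardianRecord {
  shareIndex: number;
  shareBytes: string;
  backupBlob: string;
  setupFingerprint: string;
  guardianCount: number;
  threshold: number;
  receivedAt: number;
}

// Hypothetical method names; check the package for the actual interface.
interface SketchRecoveryStore {
  put(originalAddress: string, setupId: string, record: GuardianRecord): Promise<void>;
  get(originalAddress: string, setupId: string): Promise<GuardianRecord | undefined>;
  remove(originalAddress: string, setupId: string): Promise<boolean>;
  listSetupIds(originalAddress: string): Promise<string[]>;
}

class MapRecoveryStore implements SketchRecoveryStore {
  private readonly records = new Map<string, GuardianRecord>();

  // JSON-encode the pair so ("a", "b:c") and ("a:b", "c") cannot collide.
  private key(originalAddress: string, setupId: string): string {
    return JSON.stringify([originalAddress, setupId]);
  }

  async put(originalAddress: string, setupId: string, record: GuardianRecord): Promise<void> {
    this.records.set(this.key(originalAddress, setupId), { ...record });
  }

  async get(originalAddress: string, setupId: string): Promise<GuardianRecord | undefined> {
    const found = this.records.get(this.key(originalAddress, setupId));
    return found ? { ...found } : undefined;
  }

  async remove(originalAddress: string, setupId: string): Promise<boolean> {
    return this.records.delete(this.key(originalAddress, setupId));
  }

  async listSetupIds(originalAddress: string): Promise<string[]> {
    const ids: string[] = [];
    for (const k of this.records.keys()) {
      const [addr, setupId] = JSON.parse(k) as [string, string];
      if (addr === originalAddress) ids.push(setupId);
    }
    return ids;
  }
}
```

A persistent implementation would keep the same keying scheme and swap the map for the platform's storage API.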


Recovery

What the user does on the new device

  1. Boot a fresh Shade with a temporary identity.
  2. Read the recovery card.
  3. In the recovery widget, type / paste:
    • originalAddress
    • setupId
    • threshold
    • The guardian roster
  4. Read the new device's safety number (the widget displays it prominently) to each guardian over a side channel — phone call, in person, whatever they trust.
  5. Wait for ≥ k guardians to approve.

What happens cryptographically

For each guardian, the new device sends:

recovery-request envelope:
  shadeRecovery: 1
  type:          "recovery-request"
  flowId, originalAddress, setupId
  requesterFingerprint  (= safety number of the temporary identity)
  requestedAt

Each guardian's attachGuardian handler:

  1. Looks up its stored deposit by (originalAddress, setupId). If missing, replies with share-decline (reason = "unknown setup").
  2. Invokes the approve callback with the requester's address + fingerprint + the original device's setup-time fingerprint. The callback is the OOB-confirmation gate: it MUST require an explicit user click after the guardian has verified the fingerprint. The <RecoveryApprove /> widget enforces this with a two-checkbox gate.
  3. On approve → ships share-grant. On reject → ships share-decline with a short reason.
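
The three handler steps above can be sketched as one function. The request fields follow the recovery-request listing; everything else (the `Deposit` and `Reply` shapes, the contents of a share-grant, the boolean-returning `approve` callback, and the name `handleRecoveryRequest`) is an assumption made for illustration rather than the actual attachGuardian internals.

```typescript
// Request fields follow the listing above; the value types are assumptions.
interface RecoveryRequest {
  shadeRecovery: 1;
  type: "recovery-request";
  flowId: string;
  originalAddress: string;
  setupId: string;
  requesterFingerprint: string;
  requestedAt: number;
}

// Hypothetical shapes for the stored deposit and the two replies.
interface Deposit { shareIndex: number; shareBytes: string; backupBlob: string; setupFingerprint: string }
type Reply =
  | { type: "share-grant"; flowId: string; shareIndex: number; shareBytes: string; backupBlob: string }
  | { type: "share-decline"; flowId: string; reason: string };

interface ApproveContext {
  requesterAddress: string;
  requesterFingerprint: string;
  setupFingerprint: string;
}

async function handleRecoveryRequest(
  from: string,
  req: RecoveryRequest,
  lookup: (originalAddress: string, setupId: string) => Deposit | undefined,
  approve: (ctx: ApproveContext) => Promise<boolean>,
): Promise<Reply> {
  // Step 1: without a stored deposit there is nothing to release; do not prompt the user.
  const deposit = lookup(req.originalAddress, req.setupId);
  if (!deposit) return { type: "share-decline", flowId: req.flowId, reason: "unknown setup" };

  // Step 2: the approve callback is the out-of-band confirmation gate.
  const ok = await approve({
    requesterAddress: from,
    requesterFingerprint: req.requesterFingerprint,
    setupFingerprint: deposit.setupFingerprint,
  });

  // Step 3: grant on approval, decline otherwise.
  if (!ok) return { type: "share-decline", flowId: req.flowId, reason: "rejected by guardian" };
  return {
    type: "share-grant",
    flowId: req.flowId,
    shareIndex: deposit.shareIndex,
    shareBytes: deposit.shareBytes,
    backupBlob: deposit.backupBlob,
  };
}
```

Looking up the deposit before prompting means a guardian is never asked to approve a request for a setup they do not hold.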

The new device collects grants, and as soon as k arrive:

  1. Combines the k shares via Lagrange interpolation at x = 0 to reconstruct recoveryKey.
  2. Derives passphrase = "shade-rk:" + base64url(recoveryKey).
  3. Calls Shade.importBackup(backupBlob, passphrase) — the AES-GCM tag in the blob authenticates the reconstruction. A forged share is detected here.
  4. If a guardian forged a share, importBackup throws. The reconstruction loop then tries every other threshold-sized subset of grants until one authenticates. The Shamir threshold provides the safety invariant (the V3.10 acceptance criterion "no coalition of (k-1) guardians can rebuild the secret"); the AEAD tag identifies which subset is honest.
  5. If every subset fails, RecoveryReconstructionError is raised and the user is told that at least one guardian is malicious.
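
The retry loop in steps 3 to 5 amounts to enumerating k-sized subsets and stopping at the first one whose reconstructed key authenticates. In the sketch below, `combine` and `tryImport` are stand-ins for the share interpolation and the authenticated backup import; `kSubsets`, `reconstruct` and `ReconstructionError` are hypothetical names, not the package's API.

```typescript
// Yield every k-element subset of `items`, in lexicographic index order.
function* kSubsets<T>(
  items: readonly T[],
  k: number,
  start = 0,
  picked: T[] = [],
): Generator<T[]> {
  if (picked.length === k) {
    yield [...picked];
    return;
  }
  // Stop early enough that the remaining items can still complete the subset.
  for (let i = start; i <= items.length - (k - picked.length); i++) {
    picked.push(items[i]);
    yield* kSubsets(items, k, i + 1, picked);
    picked.pop();
  }
}

class ReconstructionError extends Error {}

function reconstruct<S, R>(
  grants: readonly S[],
  k: number,
  combine: (subset: S[]) => Uint8Array,
  tryImport: (key: Uint8Array) => R, // must throw when authentication fails
): R {
  if (grants.length < k) {
    throw new ReconstructionError(`need ${k} grants, have ${grants.length}`);
  }
  for (const subset of kSubsets(grants, k)) {
    const key = combine(subset);
    try {
      return tryImport(key);
    } catch {
      // Authentication failed: at least one share in this subset is bad; try the next.
    } finally {
      key.fill(0); // wipe the candidate key whether or not it authenticated
    }
  }
  throw new ReconstructionError("no threshold-sized subset authenticated");
}
```

With m grants the loop makes at most C(m, k) import attempts (10 for 3-of-5), which stays small for realistic guardian counts.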

After importBackup succeeds, the new device hosts the original identity and immediately calls Shade.rotate() to retire the recovered key material from the conversation graph (the old session keys persisted in the backup blob are now treated as compromised, because they were used for recovery).

The Shade.beforeBackupImport gate fires automatically. Without a registered handler the SDK falls back to TOFU-with-warning (consistent with the V3.3 contract). Production apps SHOULD register a handler that prompts the user for one more confirmation before the identity rotates.


Acceptance criteria status

  • 3-of-5 recovery works end-to-end on two separate Shade instances. See tests/integration.test.ts.
  • No coalition of (k-1) guardians can reconstruct recoveryKey. Property test asserts this with fast-check across random k/n configurations. See tests/shamir.test.ts and tests/adversarial.test.ts.
  • Guardian-side widget requires fingerprint-confirmation before sending. <RecoveryApprove /> enforces a two-checkbox gate; tests/adversarial.test.ts exercises both the matching-OOB and rejecting-OOB code paths.

Persistence recommendations

The RecoveryStore interface is intentionally small (4 methods). Pick the implementation that fits your platform:

| Platform | Suggested backing store |
|----------|-------------------------|
| Browser (PWA) | IndexedDB (one object store, idb) |
| Browser (extension) | chrome.storage.local |
| React Native | AsyncStorage (with crypto-protected blob) |
| Bun / Node server | SQLite via @shade/storage-sqlite extension table OR a side file |
| Android (native) | Room / EncryptedSharedPreferences |

Whatever you pick, the records are not secret on their own (without k-1 other guardians' shares they are useless), but they should still be encrypted at rest like any other Shade state. Do not write them to plaintext logs or network-replicated state.


Guardian-UX guide

How many guardians?

| n, k | Survives | Comment |
|------|----------|---------|
| n=3, k=2 | 1 lost guardian | Minimum useful; one device away from danger |
| n=5, k=3 | 2 lost guardians | Sweet spot for most users |
| n=7, k=4 | 3 lost guardians | Suitable when you genuinely have 7+ trustworthy people |
| n=k | 0 lost | DO NOT USE: single point of failure |

The widget defaults to k = ⌈n/2⌉ which is liberal but collusion-resistant for n ≥ 3. Apps targeting paranoid users may want to bump that to ⌈2n/3⌉.
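
The trade-off behind these numbers is simple arithmetic: a k-of-n setup survives n - k unreachable guardians and resists any coalition of up to k - 1. A small sketch, with `guardianPolicy` and `paranoidThreshold` as hypothetical helper names:

```typescript
interface GuardianPolicy {
  n: number;
  k: number;
  toleratedLosses: number;  // guardians that can be unreachable: n - k
  maxSafeCoalition: number; // colluding guardians that still learn nothing: k - 1
}

// Default threshold k = ceil(n/2), matching the widget default described above.
function guardianPolicy(n: number, k: number = Math.ceil(n / 2)): GuardianPolicy {
  if (!Number.isInteger(n) || !Number.isInteger(k) || k < 1 || k > n) {
    throw new Error("need integers with 1 <= k <= n");
  }
  return { n, k, toleratedLosses: n - k, maxSafeCoalition: k - 1 };
}

// Stricter alternative mentioned above: k = ceil(2n/3).
function paranoidThreshold(n: number): number {
  return Math.ceil((2 * n) / 3);
}

for (const n of [3, 5, 7]) {
  console.log(guardianPolicy(n), guardianPolicy(n, paranoidThreshold(n)));
}
```

Raising k buys collusion resistance at the direct cost of loss tolerance, so the stricter threshold only makes sense when guardians are reliably reachable.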

Replacing a guardian

If a guardian dies, loses their device permanently, or you no longer trust them:

  1. Pick a replacement.
  2. Run setupRecovery again with the new roster — this generates a fresh setupId and a fresh recoveryKey. The old shares become garbage (no guardian set can use them, because the backupBlob is different).

The widget records the new setupId on the recovery card. Treat this as a hard rotation; the user MUST re-record the card.

Guardian health checks

Periodically (the V3.10 plan suggests a quarterly prompt), the user should confirm each guardian is still reachable. A guardian who cannot be reached on two consecutive prompts SHOULD trigger a re-setup with a fresh roster. The widget UX for this is planned for a follow-up release; the underlying primitive is already in place.
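
The "two consecutive prompts" rule can be tracked with a per-guardian miss counter. This is a sketch of app-side bookkeeping only; `GuardianHealth` is a hypothetical helper and not something @shade/recovery is stated to provide.

```typescript
// Track consecutive missed health-check prompts per guardian.
class GuardianHealth {
  private readonly missed = new Map<string, number>();
  private readonly maxMisses: number;

  constructor(maxMisses = 2) {
    this.maxMisses = maxMisses;
  }

  // Record one prompt result; returns true when a re-setup should be suggested.
  record(guardian: string, reachable: boolean): boolean {
    // A successful contact resets the streak; a miss extends it.
    const misses = reachable ? 0 : (this.missed.get(guardian) ?? 0) + 1;
    this.missed.set(guardian, misses);
    return misses >= this.maxMisses;
  }
}

const health = new GuardianHealth();
health.record("bob", false);                      // first miss: no action yet
const needsResetup = health.record("bob", false); // second consecutive miss
```

Resetting the counter on any successful contact keeps a guardian who is only occasionally offline from forcing a roster change.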


Wiring example

import {
  setupRecovery,
  attachGuardian,
  requestRecovery,
  MemoryRecoveryStore,
} from '@shade/recovery';

// On the primary device:
const result = await setupRecovery({
  shade,
  guardians: ['bob', 'carol', 'dan', 'eve', 'faythe'],
  threshold: 3,
  deliver: async (to, envelope) => {
    // wire to your app's existing message-delivery layer
    await myMessageOutbox.send(to, envelope);
  },
});
console.log(result.setupId);

// On each guardian device:
const stop = attachGuardian({
  shade,
  store: myPersistentStore,             // see "Persistence" above
  approve: async (ctx) => {
    // Show ctx.requesterFingerprint to the user.
    // Block until they confirm OOB and click "Release share".
    return await myUI.askApproval(ctx);
  },
  deliver: (to, envelope) => myMessageOutbox.send(to, envelope), // wrap so `send` keeps its `this`
});

// On the new device:
const recovered = await requestRecovery({
  shade: temporaryShade,                // fresh identity for now
  originalAddress: 'alice',
  setupId: 'sid-from-recovery-card',
  threshold: 3,
  guardians: ['bob', 'carol', 'dan', 'eve', 'faythe'],
  deliver: (to, envelope) => myMessageOutbox.send(to, envelope), // wrap so `send` keeps its `this`
  onProgress: (p) => myUI.showProgress(p),
});
// `temporaryShade` now hosts the original identity.

Out of scope (V3.10)

  • Cloud guardian / Shade-operated recovery agent. Explicit non-goal; the spec rejects any centralized component that can recover on its own.
  • Auto-distribution. The user must explicitly pick guardians.
  • Multi-share-per-guardian. Each guardian holds exactly one share. Apps that need redundancy should bump n, not give the same guardian multiple shares.
  • Guardian ZK-proofs of liveness. A guardian who refuses to respond is treated as offline; we don't try to compel them.