Files
Shade/MIGRATION.md
Sterister e6fdf31b49
Some checks failed
Test / test (push) Has been cancelled
Cross-platform vectors / TypeScript vectors (bun) (push) Has been cancelled
Cross-platform vectors / Kotlin vectors (gradle) (push) Has been cancelled
Docker build and publish / docker (push) Has been cancelled
Publish / publish (push) Has been cancelled
release(v4.0.0): Shade GA — V3.x consolidation + audit prep
V3.1 → V3.12 consolidated and tagged for the first GA release. Wire
format unchanged from 0.4.x — 4.0 peers interoperate with 0.4.x peers
byte-for-byte. The version bump is semantic: audit-cycle complete,
opt-in surface fully exposed, threat model refreshed for every new
surface.

Highlights:
- All 24 @shade/* packages bumped to 4.0.0 in lockstep.
- CHANGELOG 4.0.0 section is the canonical manifest of what landed.
- THREAT-MODEL extended (§10 fingerprint gates, §11 WebRTC P2P, §12
  Web-Worker boundary) + residual-risks table refreshed.
- OpenAPI now covers all 27 routes: prekey, transfer, KT, inbox,
  bridge, observer, /metrics, /healthz, /ready.
- MIGRATION 0.3.x → 4.0 documented + smoke-tested against
  shade migrate-storage on a real SQLite DB.
- docs/audit/REVIEW-BUNDLE.md + SCOPE.md ready for external reviewer.
- scripts/soak.ts harness for the GA-stable 2-week soak window.
- All V*.md plans archived under docs/archive/ with Status: Done.
- Voice/Video carved out into V5.0; 4.0 audit focuses on the frozen
  non-realtime stack.

Tests: TS 1000/1000 + Kotlin 11/11 cross-platform vectors green.
Docker: gt.zyon.no/stian/shade-prekey:4.0.0 builds and reports
  version 4.0.0 on /health.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:35:35 +02:00

354 lines
15 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Migration Guide
This document describes how to migrate existing systems with ad-hoc encryption to Shade's Signal Protocol implementation.
## Why migrate?
If you currently use:
- A static AES-256-GCM key per pair (e.g., ECDH at handshake, then never rotated)
- Pre-shared keys distributed at registration time
- Simple per-device symmetric encryption (like Nova's push notifications)
…then you're missing **forward secrecy** and **post-compromise recovery**. Shade gives you both with minimal code changes.
## Migration phases
The recommended migration is a three-phase rollout that lets you ship without downtime:
### Phase 1: Dual-write
- Set up the Shade prekey server alongside your existing system
- New devices register with both systems
- Old devices continue using the legacy encryption
- Both encrypted formats are accepted on read
### Phase 2: Switch reads
- Once the majority of devices are on Shade, prefer Shade for new sessions
- Continue accepting legacy messages for older clients
- Monitor decryption failure rates
### Phase 3: Deprecate
- Remove legacy encryption code
- Force all devices to re-pair via Shade
- Clean up legacy database columns
## Concrete examples
### Example A: Replacing a static AES tunnel
Before (`crypto/e2ee.ts`):
```ts
import { generateKeyPair, deriveSharedSecret, encrypt, decrypt } from './crypto/e2ee.js';
// During pairing
const myKp = await generateKeyPair();
const sharedSecret = await deriveSharedSecret(myKp.privateKey, peerPublicKey);
db.serverConnection.insert({ sharedSecret: exportSecret(sharedSecret) });
// On every message
const { ciphertext, nonce } = await encrypt(sharedSecret, plaintext);
ws.send({ ciphertext, nonce });
```
After (with Shade):
```ts
import { ShadeSessionManager } from '@shade/core';
import { SubtleCryptoProvider } from '@shade/crypto-web';
import { SQLiteStorage } from '@shade/storage-sqlite';
import { ShadeWebSocket, ShadeFetchTransport } from '@shade/transport';
const crypto = new SubtleCryptoProvider();
const storage = new SQLiteStorage('/data/shade.db');
const manager = new ShadeSessionManager(crypto, storage);
await manager.initialize();
// During pairing — fetch peer's bundle and start session
const transport = new ShadeFetchTransport({
baseUrl: 'https://prekey.example.com',
crypto,
signingPrivateKey: (await storage.getIdentityKeyPair())!.signingPrivateKey,
});
const peerBundle = await transport.fetchBundle('peer-id');
await manager.initSessionFromBundle('peer-id', peerBundle);
// On every message — wrap the WebSocket
const shadeWs = new ShadeWebSocket(rawWs, manager, 'peer-id');
shadeWs.onMessage((plaintext) => handleMessage(plaintext));
await shadeWs.send('Hello peer');
```
The key differences:
1. **No static shared secret** — keys ratchet forward with each message
2. **Identity is persistent** — same identity across reconnects, but session keys regenerate
3. **The transport wrapper is transparent** — your application code doesn't change
### Example B: Replacing per-device push encryption
Before (per-device static AES key):
```ts
// Server side
const device = db.pushDevices.findFirst({ where: { id } });
const key = Buffer.from(device.encryptionKey, 'base64');
const encrypted = encryptPayload(notificationJson, key);
sendToFCM({ data: { enc: encrypted, v: '1' } });
```
After (Shade per-device session):
```ts
// Server side
const manager = new ShadeSessionManager(crypto, storage);
await manager.initialize();
// First time per device: fetch their bundle and establish session
if (!await storage.getSession(`device:${deviceId}`)) {
const bundle = await prekeyTransport.fetchBundle(`device:${deviceId}`);
await manager.initSessionFromBundle(`device:${deviceId}`, bundle);
}
const envelope = await manager.encrypt(`device:${deviceId}`, notificationJson);
sendToFCM({ data: { enc: encodeEnvelope(envelope), v: '2' } });
```
Client side:
```kotlin
// Decode the envelope, decrypt via Shade
val envelope = decodeEnvelope(data["enc"]!!)
val plaintext = shadeManager.decrypt("server", envelope)
```
## Database migration
If your existing system stores symmetric keys in the database:
### Before
```sql
CREATE TABLE devices (
id TEXT PRIMARY KEY,
encryption_key TEXT NOT NULL -- base64 AES-256
);
```
### After
```sql
CREATE TABLE devices (
id TEXT PRIMARY KEY,
shade_address TEXT NOT NULL -- e.g. "device:abc123"
-- Shade tables (created automatically by SQLiteStorage):
-- shade_identity, shade_sessions, shade_signed_prekeys, etc.
);
```
The Shade tables are auto-created when you instantiate the storage backend. No manual migration needed.
## Migration for Orchestrator
The Orchestrator project's `orchestrator-shared/src/crypto/e2ee.ts` provides a static ECDH-derived AES-256-GCM key for the workstation↔server sync tunnel. To migrate:
1. **Add Shade dependencies** to `orchestrator-shared/package.json`
2. **Replace `e2ee.ts`** with imports from `@shade/core` and `@shade/transport`
3. **Update the pairing flow** in `sync-server.ts` and `sync-client.ts` to exchange Shade prekey bundles instead of raw ECDH public keys
4. **Wrap the sync WebSocket** with `ShadeWebSocket` for transparent encryption
5. **Migrate the `serverConnection` table** to a `shade_sessions` table (or run dual-write during the rollout)
The key insight: Shade replaces the static `sharedSecret` column with a full ratcheting session, but the WebSocket transport, message types, and application logic don't change.
## Migration for Nova (push notifications)
Nova's `pushDevices.encryptionKey` column is a per-device static AES key. To migrate:
1. **Run a Shade prekey server** (Docker container, see `examples/05-dokploy-deployment`)
2. **On Android device registration**, generate Shade identity + upload prekey bundle to the server (instead of generating a raw AES key)
3. **In the Nova backend**, fetch the device's bundle and establish a Shade session per device
4. **Encrypt notifications via the Shade session** instead of `encryptPayload()`
5. **On the Android client**, decrypt with Shade instead of the static key
6. **Cross-platform interop**: this requires the `shade-android` Kotlin module (not yet built — planned for the M8 milestone)
During the rollout, send notifications with a `v: 1` (legacy) or `v: 2` (Shade) field so old and new clients coexist.
## Migration to at-rest encryption (V3.2)
Shade 0.4.0 ships `@shade/storage-encrypted` — opt-in AES-256-GCM
encryption of every sensitive payload in the local SQLite/Postgres store.
Existing 0.3.x deploys keep their unencrypted DB and behave exactly as
before; encryption is enabled per-deployment with one CLI command.
### One-shot migration (SQLite)
```bash
# Encrypts in place, drops unencrypted tables, leaves a .bak alongside.
shade migrate-storage \
--key-source passphrase \
--passphrase "$SHADE_STORAGE_PASSPHRASE" \
--salt-file /data/shade-client.db.salt
```
For a dry run that validates every row without writing:
`shade migrate-storage … --dry-run`.
### Code-level switch
Replace:
```ts
import { SQLiteStorage } from '@shade/storage-sqlite';
const storage = new SQLiteStorage('/data/shade-client.db');
```
with:
```ts
import { KeyManager, EncryptedSQLiteStorage } from '@shade/storage-encrypted';
const km = await KeyManager.open({
kind: 'passphrase',
passphrase: process.env.SHADE_STORAGE_PASSPHRASE!,
salt: loadSaltFromDisk(),
});
const storage = await EncryptedSQLiteStorage.open({
dbPath: '/data/shade-client.db',
keyManager: km,
});
```
The encrypted store implements the same `StorageProvider`, so
`ShadeSessionManager` and the rest of the wiring is unchanged.
See `docs/storage-encryption.md` for the full design, key sources
(passphrase / OS keychain / app-injected) and rotation.
## Migrating from 0.3.x to 4.0 (GA)
Shade 4.0 is the GA-frozen baseline. Everything from V3.2V3.12 is
merged, externally reviewed, and the wire format is locked. Nothing is
breaking on the wire compared to 0.4.x — peers continue to interoperate.
The 4.0 migration is therefore mostly **opt-in surface activation**
plus a version-bump.
### What stays the same
- Wire envelope `0x02` (RatchetMessage) with u32 length-prefixes.
- Wire envelope `0x11` (stream-chunk) for `@shade/streams`.
- HTTP shape of all `/v1/keys/...` and `/v1/transfer/...` endpoints.
- All `StorageProvider` core method signatures.
- Identity fingerprints, X3DH flow, Ed25519 signature format.
A 0.3.x peer that has not enabled any opt-ins talks to a 4.0 peer
without code changes. The version bump is semantic ("we have completed
the audit cycle"), not breaking.
### What's new (opt-in)
| Surface | Package | How to enable |
|---------|---------|---------------|
| At-rest encryption | `@shade/storage-encrypted` | `shade migrate-storage` (see above) |
| Async store-and-forward | `@shade/inbox`, `@shade/inbox-server` | `createInboxServer()` + `new Inbox()` |
| Bridge transports (SSE, long-poll) | `@shade/transport-bridge`, `createBridgeRoutes()` | mount bridge routes; `FallbackBridgeTransport` |
| Web Workers crypto | `@shade/crypto-web/worker` | `shade.configureWorkerCrypto({ workerUrl })` |
| Social key recovery | `@shade/recovery` | `setupRecovery / attachGuardian / requestRecovery` |
| WebRTC P2P transport | `@shade/transport-webrtc` (peer-dep) | `shade.configureWebRTC({ factory })` |
| Key Transparency | `@shade/key-transparency`, `createPrekeyServerWithKT(...)` | server: `keyTransparency: { ... }` config; client: `keyTransparency: { mode, logPublicKey }` on `createShade` |
| Trust UX gates | built-in to `@shade/sdk` | `shade.beforeFirstLargeFile / beforeBackupImport / beforeNewDeviceTrust(...)` |
| Files RPC | `@shade/files` | `shade.files.serve(handler)` + `shade.files.client(peer)` |
Pulling in **none** of these gives you the 1.0-shape API at 4.0 quality
(audit-completed, soak-tested). Pulling in **all** of them gives the
full 4.0 stack.
### Schema additions
`StorageProvider` implementations (sqlite, postgres, encrypted variants)
auto-create the additional tables on `ensureTables()` /
`initialize()`. The 4.0 superset:
```sql
-- V3.2 (storage encryption) — only when EncryptedSQLiteStorage / EncryptedPostgresStorage is used
shade_master_key_meta(...) -- KeyManager fingerprint + scrypt params
shade_field_keys(...) -- per-(table, column) wrapped DEKs
-- V3.3 (fingerprint gates)
peer_verifications(...) -- markPeerVerified persistence
peer_identity_versions(...) -- bump on acceptIdentityChange
-- V3.6 (inbox relay)
shade_inbox_register(...) -- TOFU bind address ↔ signing key
shade_inbox_blobs(...) -- ciphertext blobs with TTL + msgId
-- V3.10 (recovery)
shade_recovery_setup(...) -- per-recoverer state
shade_recovery_deposits(...) -- per-guardian deposited shares
-- V3.12 (KT — server only)
shade_kt_leaves(...) -- append-only Merkle leaves
shade_kt_index(...) -- address-sorted commitment
shade_kt_sths(...) -- signed tree heads
-- streams resume (V0.2.0+, listed for completeness)
stream_state(...) -- at-rest encrypted streamSecret
```
A 0.3.x deploy that upgrades the package without enabling any new
surface gets these tables created on first start; they stay empty
unless the corresponding feature is wired. There is **no destructive
migration**. To verify before upgrading production:
```bash
shade doctor --db-path /data/shade-client.db
```
The CLI reports any mismatch between the on-disk schema and the version
the installed packages expect.
### Step-by-step upgrade (typical app)
1. **Bump dependencies.** Update every `@shade/*` to `^4.0.0` in your
`package.json`. Bun / npm / pnpm pull from the Gitea registry as
per `.npmrc`.
2. **Re-run install.** `bun install` (or your tool of choice). The new
table definitions ship with the storage backends — no schema-edit
PRs against your DB.
3. **Boot once with no new opt-ins.** Existing send/receive should work
byte-identically. `shade doctor` should print all green.
4. **Pick the opt-ins you actually want.** Wire them one at a time
(storage-encryption first, then fingerprint gates, then any of the
recovery / KT / WebRTC / inbox surfaces). Each surface has its own
doc under `docs/` (`storage-encryption.md`, `trust-ux.md`,
`recovery.md`, `key-transparency.md`, `webrtc.md`, `inbox.md`,
`transport.md`, `web-workers.md`, `files.md`).
5. **Run cross-version smoke.** Boot a 0.3.x peer next to a 4.0 peer in
staging; exchange a session; confirm `shade fingerprint` matches on
both ends and a round-trip message decrypts cleanly.
6. **Ship 4.0 to a canary.** Roll forward; revert path is `bun
install @shade/sdk@^0.4.0` — there is no DB write that 0.4 cannot
also read.
### Operator checklist (prekey container)
If you operate the standalone container (`gt.zyon.no/stian/shade-prekey`):
1. Pull the 4.0 image: `docker pull gt.zyon.no/stian/shade-prekey:4.0.0`.
2. Add new env vars only if you are turning the corresponding surface
on:
- `SHADE_INBOX_PG_URL` / `SHADE_INBOX_DB_PATH` — async store-and-forward.
- `SHADE_INBOX_PRUNE_INTERVAL_MINUTES` — inbox prune cadence.
- `SHADE_BRIDGE_*` — bridge / SSE / long-poll surface.
- `SHADE_KT_*` — Key Transparency mode + signing key path.
- `SHADE_TRANSFER_*` — transfer routes mounted on the same Hono app.
3. Restart with the existing volume; the inbox / KT tables auto-create
on first request.
4. Update `docs/PRODUCTION-CHECKLIST.md` items for any new surface
you've enabled (rate-limit budgets, retention policies, KT
witness-pinning).
5. Verify the [OpenAPI](packages/shade-server/openapi.yaml) endpoints
you advertise to clients now include the routes you mounted.
### What about 4.0 → 4.x?
V4.x is bug-fix only. No wire-bump until V5.0 (voice/video) which
is **additive** — it allocates new envelope types (frame-key prefixes)
that 4.0 clients ignore by design.
## Common pitfalls
1. **Don't store private keys in shared databases without encryption at rest** — for shared infrastructure, enable `@shade/storage-encrypted` (V3.2) or use filesystem encryption / PostgreSQL TDE. The default `SQLiteStorage` and `PostgresStorage` write unencrypted.
2. **Don't skip identity verification** — Shade gives you fingerprints (`getIdentityFingerprint()`), but it's the user's responsibility to compare them out-of-band on first contact.
3. **Don't reuse session storage between identities** — each user/device should have its own Shade storage. Mixing identities in one storage will corrupt the ratchet state.
4. **Keep prekey stocks topped up** — call `ensurePreKeyStock()` periodically (e.g., on app start or every hour). When the server runs out of one-time prekeys, new sessions will fall back to using just the signed prekey, which is slightly less secure.