# Migration Guide
This document describes how to migrate existing systems with ad-hoc encryption to Shade's Signal Protocol implementation.
## Why migrate?
If you currently use:
- A static AES-256-GCM key per pair (e.g., ECDH at handshake, then never rotated)
- Pre-shared keys distributed at registration time
- Simple per-device symmetric encryption (like Nova's push notifications)
…then you're missing **forward secrecy** and **post-compromise recovery**. Shade gives you both with minimal code changes.
## Migration phases
The recommended migration is a three-phase rollout that lets you ship without downtime:
### Phase 1: Dual-write
- Set up the Shade prekey server alongside your existing system
- New devices register with both systems
- Old devices continue using the legacy encryption
- Both encrypted formats are accepted on read
### Phase 2: Switch reads
- Once the majority of devices are on Shade, prefer Shade for new sessions
- Continue accepting legacy messages for older clients
- Monitor decryption failure rates
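
The "majority of devices" and failure-rate checks in Phase 2 can be made concrete with a small counter. The following is a minimal sketch and not part of Shade: `DecryptStats`, `shouldPreferShade`, and the default thresholds are hypothetical names and values chosen for illustration. It counts messages rather than devices, which is only a proxy for device adoption.

```ts
// Hypothetical helper (not a Shade API): counts decryption outcomes per format
// so the Phase 2 switch can be gated on observed traffic.
type Format = 'legacy' | 'shade';

class DecryptStats {
  private readonly ok: Record<Format, number> = { legacy: 0, shade: 0 };
  private readonly failed: Record<Format, number> = { legacy: 0, shade: 0 };

  // Record one decryption attempt for the given format.
  record(format: Format, success: boolean): void {
    const bucket = success ? this.ok : this.failed;
    bucket[format] += 1;
  }

  // Fraction of attempts that failed for one format (0 if none were recorded).
  failureRate(format: Format): number {
    const total = this.ok[format] + this.failed[format];
    return total === 0 ? 0 : this.failed[format] / total;
  }

  // Fraction of all recorded messages that arrived in the Shade format.
  shadeShare(): number {
    const shade = this.ok.shade + this.failed.shade;
    const total = shade + this.ok.legacy + this.failed.legacy;
    return total === 0 ? 0 : shade / total;
  }
}

// Assumed thresholds: tune both for your own fleet.
function shouldPreferShade(
  stats: DecryptStats,
  minShadeShare = 0.5,
  maxShadeFailureRate = 0.01,
): boolean {
  return (
    stats.shadeShare() > minShadeShare &&
    stats.failureRate('shade') <= maxShadeFailureRate
  );
}
```

Call `record()` from both decrypt paths and consult `shouldPreferShade()` before flipping the default for new sessions.
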
### Phase 3: Deprecate
- Remove legacy encryption code
- Force all devices to re-pair via Shade
- Clean up legacy database columns
## Concrete examples
### Example A: Replacing a static AES tunnel
Before (`crypto/e2ee.ts`):
```ts
import { generateKeyPair, deriveSharedSecret, encrypt, decrypt } from './crypto/e2ee.js';
// During pairing
const myKp = await generateKeyPair();
const sharedSecret = await deriveSharedSecret(myKp.privateKey, peerPublicKey);
db.serverConnection.insert({ sharedSecret: exportSecret(sharedSecret) });
// On every message
const { ciphertext, nonce } = await encrypt(sharedSecret, plaintext);
ws.send({ ciphertext, nonce });
```
After (with Shade):
```ts
import { ShadeSessionManager } from '@shade/core';
import { SubtleCryptoProvider } from '@shade/crypto-web';
import { SQLiteStorage } from '@shade/storage-sqlite';
import { ShadeWebSocket, ShadeFetchTransport } from '@shade/transport';
const crypto = new SubtleCryptoProvider();
const storage = new SQLiteStorage('/data/shade.db');
const manager = new ShadeSessionManager(crypto, storage);
await manager.initialize();
// During pairing — fetch peer's bundle and start session
const transport = new ShadeFetchTransport({
  baseUrl: 'https://prekey.example.com',
  crypto,
  signingPrivateKey: (await storage.getIdentityKeyPair())!.signingPrivateKey,
});
const peerBundle = await transport.fetchBundle('peer-id');
await manager.initSessionFromBundle('peer-id', peerBundle);
// On every message — wrap the WebSocket
const shadeWs = new ShadeWebSocket(rawWs, manager, 'peer-id');
shadeWs.onMessage((plaintext) => handleMessage(plaintext));
await shadeWs.send('Hello peer');
```
The key differences:
1. **No static shared secret** — keys ratchet forward with each message
2. **Identity is persistent** — same identity across reconnects, but session keys regenerate
3. **The transport wrapper is transparent** — your application code doesn't change
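
To illustrate what "keys ratchet forward with each message" means, here is a minimal sketch of a symmetric KDF chain. It uses the HMAC-SHA256 construction recommended in Signal's Double Ratchet specification (constant `0x01` for the message key, `0x02` for the next chain key). It is not Shade's actual implementation, and `ratchetStep` is a hypothetical name.

```ts
import { createHmac } from 'node:crypto';

// One step of a symmetric KDF chain: derive a one-off message key and the
// next chain key from the current chain key. Illustrative only.
function ratchetStep(chainKey: Buffer): { messageKey: Buffer; nextChainKey: Buffer } {
  // Keyed with the chain key; the constant input separates the two outputs.
  const messageKey = createHmac('sha256', chainKey).update(Buffer.from([0x01])).digest();
  const nextChainKey = createHmac('sha256', chainKey).update(Buffer.from([0x02])).digest();
  return { messageKey, nextChainKey };
}

// Each message uses a fresh key; the caller discards the old chain key.
let chainKey: Buffer = Buffer.alloc(32, 7); // placeholder secret for the demo
const first = ratchetStep(chainKey);
chainKey = first.nextChainKey;
const second = ratchetStep(chainKey);
console.log(first.messageKey.equals(second.messageKey)); // false
```

Because HMAC is one-way, an attacker who obtains the current chain key cannot recompute earlier message keys, which is the property a static `sharedSecret` lacks.
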
### Example B: Replacing per-device push encryption
Before (per-device static AES key):
```ts
// Server side
const device = db.pushDevices.findFirst({ where: { id } });
const key = Buffer.from(device.encryptionKey, 'base64');
const encrypted = encryptPayload(notificationJson, key);
sendToFCM({ data: { enc: encrypted, v: '1' } });
```
After (Shade per-device session):
```ts
// Server side
const manager = new ShadeSessionManager(crypto, storage);
await manager.initialize();
// First time per device: fetch their bundle and establish session
if (!await storage.getSession(`device:${deviceId}`)) {
  const bundle = await prekeyTransport.fetchBundle(`device:${deviceId}`);
  await manager.initSessionFromBundle(`device:${deviceId}`, bundle);
}
const envelope = await manager.encrypt(`device:${deviceId}`, notificationJson);
sendToFCM({ data: { enc: encodeEnvelope(envelope), v: '2' } });
```
Client side:
```kotlin
// Decode the envelope, decrypt via Shade
val envelope = decodeEnvelope(data["enc"]!!)
val plaintext = shadeManager.decrypt("server", envelope)
```
## Database migration
If your existing system stores symmetric keys in the database:
### Before
```sql
CREATE TABLE devices (
  id TEXT PRIMARY KEY,
  encryption_key TEXT NOT NULL -- base64 AES-256
);
```
### After
```sql
CREATE TABLE devices (
  id TEXT PRIMARY KEY,
  shade_address TEXT NOT NULL -- e.g. "device:abc123"
  -- Shade tables (created automatically by SQLiteStorage):
  -- shade_identity, shade_sessions, shade_signed_prekeys, etc.
);
```
The Shade tables are auto-created when you instantiate the storage backend. No manual migration needed.
## Migration for Orchestrator
The Orchestrator project's `orchestrator-shared/src/crypto/e2ee.ts` provides a static ECDH-derived AES-256-GCM key for the workstation↔server sync tunnel. To migrate:
1. **Add Shade dependencies** to `orchestrator-shared/package.json`
2. **Replace `e2ee.ts`** with imports from `@shade/core` and `@shade/transport`
3. **Update the pairing flow** in `sync-server.ts` and `sync-client.ts` to exchange Shade prekey bundles instead of raw ECDH public keys
4. **Wrap the sync WebSocket** with `ShadeWebSocket` for transparent encryption
5. **Migrate the `serverConnection` table** to a `shade_sessions` table (or run dual-write during the rollout)
The key insight: Shade replaces the static `sharedSecret` column with a full ratcheting session, but the WebSocket transport, message types, and application logic don't change.
## Migration for Nova (push notifications)
Nova's `pushDevices.encryptionKey` column is a per-device static AES key. To migrate:
1. **Run a Shade prekey server** (Docker container, see `examples/05-dokploy-deployment`)
2. **On Android device registration**, generate Shade identity + upload prekey bundle to the server (instead of generating a raw AES key)
3. **In the Nova backend**, fetch the device's bundle and establish a Shade session per device
4. **Encrypt notifications via the Shade session** instead of `encryptPayload()`
5. **On the Android client**, decrypt with Shade instead of the static key
6. **Cross-platform interop**: this requires the `shade-android` Kotlin module (not yet built — planned for the M8 milestone)
During the rollout, tag every notification with `v: '1'` (legacy) or `v: '2'` (Shade), as in Example B, so old and new clients can coexist.
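
A receiver can branch on that version field before touching the ciphertext. The sketch below is written in TypeScript for consistency with the server examples; the Android client would do the equivalent in Kotlin. `routeByVersion` and the handler names are hypothetical, and the handlers stand in for the legacy static-key decrypt and the Shade session decrypt.

```ts
// Payload shape from Example B: `enc` is the encoded ciphertext,
// `v` is the format version as a string.
interface PushData {
  enc: string;
  v: string;
}

interface Handlers<T> {
  legacy: (enc: string) => T; // static AES key path, v = '1'
  shade: (enc: string) => T; // Shade session path, v = '2'
}

// Dispatch on the version field. T may be a Promise when handlers are async.
function routeByVersion<T>(data: PushData, handlers: Handlers<T>): T {
  switch (data.v) {
    case '1':
      return handlers.legacy(data.enc);
    case '2':
      return handlers.shade(data.enc);
    default:
      // Fail closed: never guess a format for an unknown version.
      throw new Error(`Unsupported payload version: ${data.v}`);
  }
}
```

Once Phase 3 is complete, the `'1'` branch can be deleted along with the legacy key column.
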
## Migration to at-rest encryption (V3.2)
Shade 0.4.0 ships `@shade/storage-encrypted` — opt-in AES-256-GCM
encryption of every sensitive payload in the local SQLite/Postgres store.
Existing 0.3.x deploys keep their unencrypted DB and behave exactly as
before; encryption is enabled per-deployment with one CLI command.
### One-shot migration (SQLite)
```bash
# Encrypts in place, drops unencrypted tables, leaves a .bak alongside.
shade migrate-storage \
  --key-source passphrase \
  --passphrase "$SHADE_STORAGE_PASSPHRASE" \
  --salt-file /data/shade-client.db.salt
```
For a dry run that validates every row without writing:
`shade migrate-storage … --dry-run`.
### Code-level switch
Replace:
```ts
import { SQLiteStorage } from '@shade/storage-sqlite';
const storage = new SQLiteStorage('/data/shade-client.db');
```
with:
```ts
import { KeyManager, EncryptedSQLiteStorage } from '@shade/storage-encrypted';
const km = await KeyManager.open({
  kind: 'passphrase',
  passphrase: process.env.SHADE_STORAGE_PASSPHRASE!,
  salt: loadSaltFromDisk(),
});
const storage = await EncryptedSQLiteStorage.open({
  dbPath: '/data/shade-client.db',
  keyManager: km,
});
```
The encrypted store implements the same `StorageProvider` interface, so
`ShadeSessionManager` and the rest of the wiring are unchanged.
See `docs/storage-encryption.md` for the full design, key sources
(passphrase / OS keychain / app-injected) and rotation.
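
The snippet above calls `loadSaltFromDisk()` without defining it. One possible shape is sketched below, under two loudly stated assumptions that this guide does not confirm: that the salt file written by `shade migrate-storage --salt-file` contains raw bytes, and that a salt shorter than 16 bytes should be rejected. Check `docs/storage-encryption.md` for the actual format before relying on this.

```ts
import { readFileSync } from 'node:fs';

// Assumption: raw-byte salt file, at least 16 bytes long.
const MIN_SALT_BYTES = 16;

// Reject salts that are obviously truncated or empty.
function validateSalt(bytes: Uint8Array): Uint8Array {
  if (bytes.length < MIN_SALT_BYTES) {
    throw new Error(
      `Salt too short: got ${bytes.length} bytes, need at least ${MIN_SALT_BYTES}`,
    );
  }
  return bytes;
}

// Hypothetical helper; the default path matches the --salt-file example above.
function loadSaltFromDisk(path = '/data/shade-client.db.salt'): Uint8Array {
  // readFileSync throws if the file is missing. Do not generate a fresh salt
  // here: a different salt derives a different key from the same passphrase,
  // and the existing database would no longer decrypt.
  return validateSalt(new Uint8Array(readFileSync(path)));
}
```
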
## Migrating from 0.3.x to 4.0 (GA)
Shade 4.0 is the GA-frozen baseline. Everything from V3.2 through V3.12 is
merged and externally reviewed, and the wire format is locked. Nothing is
breaking on the wire compared to 0.4.x; peers continue to interoperate.
The 4.0 migration is therefore mostly **opt-in surface activation**
plus a version bump.
### What stays the same
- Wire envelope `0x02` (RatchetMessage) with u32 length-prefixes.
- Wire envelope `0x11` (stream-chunk) for `@shade/streams`.
- HTTP shape of all `/v1/keys/...` and `/v1/transfer/...` endpoints.
- All `StorageProvider` core method signatures.
- Identity fingerprints, X3DH flow, Ed25519 signature format.
A 0.3.x peer that has not enabled any opt-ins talks to a 4.0 peer
without code changes. The version bump is semantic ("we have completed
the audit cycle"), not breaking.
### What's new (opt-in)
| Surface | Package | How to enable |
|---------|---------|---------------|
| At-rest encryption | `@shade/storage-encrypted` | `shade migrate-storage` (see above) |
| Async store-and-forward | `@shade/inbox`, `@shade/inbox-server` | `createInboxServer()` + `new Inbox()` |
| Bridge transports (SSE, long-poll) | `@shade/transport-bridge`, `createBridgeRoutes()` | mount bridge routes; `FallbackBridgeTransport` |
| Web Workers crypto | `@shade/crypto-web/worker` | `shade.configureWorkerCrypto({ workerUrl })` |
| Social key recovery | `@shade/recovery` | `setupRecovery / attachGuardian / requestRecovery` |
| WebRTC P2P transport | `@shade/transport-webrtc` (peer-dep) | `shade.configureWebRTC({ factory })` |
| Key Transparency | `@shade/key-transparency`, `createPrekeyServerWithKT(...)` | server: `keyTransparency: { ... }` config; client: `keyTransparency: { mode, logPublicKey }` on `createShade` |
| Trust UX gates | built-in to `@shade/sdk` | `shade.beforeFirstLargeFile / beforeBackupImport / beforeNewDeviceTrust(...)` |
| Files RPC | `@shade/files` | `shade.files.serve(handler)` + `shade.files.client(peer)` |
Pulling in **none** of these gives you the 1.0-shape API at 4.0 quality
(audit-completed, soak-tested). Pulling in **all** of them gives the
full 4.0 stack.
### Schema additions
`StorageProvider` implementations (sqlite, postgres, encrypted variants)
auto-create the additional tables on `ensureTables()` /
`initialize()`. The 4.0 superset:
```sql
-- V3.2 (storage encryption) — only when EncryptedSQLiteStorage / EncryptedPostgresStorage is used
shade_master_key_meta(...) -- KeyManager fingerprint + scrypt params
shade_field_keys(...) -- per-(table, column) wrapped DEKs
-- V3.3 (fingerprint gates)
peer_verifications(...) -- markPeerVerified persistence
peer_identity_versions(...) -- bump on acceptIdentityChange
-- V3.6 (inbox relay)
shade_inbox_register(...) -- TOFU bind address ↔ signing key
shade_inbox_blobs(...) -- ciphertext blobs with TTL + msgId
-- V3.10 (recovery)
shade_recovery_setup(...) -- per-recoverer state
shade_recovery_deposits(...) -- per-guardian deposited shares
-- V3.12 (KT — server only)
shade_kt_leaves(...) -- append-only Merkle leaves
shade_kt_index(...) -- address-sorted commitment
shade_kt_sths(...) -- signed tree heads
-- streams resume (V0.2.0+, listed for completeness)
stream_state(...) -- at-rest encrypted streamSecret
```
A 0.3.x deploy that upgrades the package without enabling any new
surface gets these tables created on first start; they stay empty
unless the corresponding feature is wired. There is **no destructive
migration**. To verify before upgrading production:
```bash
shade doctor --db-path /data/shade-client.db
```
The CLI reports any mismatch between the on-disk schema and the version
the installed packages expect.
### Step-by-step upgrade (typical app)
1. **Bump dependencies.** Update every `@shade/*` to `^4.0.0` in your
`package.json`. Bun / npm / pnpm pull from the Gitea registry as
per `.npmrc`.
2. **Re-run install.** `bun install` (or your tool of choice). The new
table definitions ship with the storage backends — no schema-edit
PRs against your DB.
3. **Boot once with no new opt-ins.** Existing send/receive should work
byte-identically. `shade doctor` should print all green.
4. **Pick the opt-ins you actually want.** Wire them one at a time
(storage-encryption first, then fingerprint gates, then any of the
recovery / KT / WebRTC / inbox surfaces). Each surface has its own
doc under `docs/` (`storage-encryption.md`, `trust-ux.md`,
`recovery.md`, `key-transparency.md`, `webrtc.md`, `inbox.md`,
`transport.md`, `web-workers.md`, `files.md`).
5. **Run cross-version smoke.** Boot a 0.3.x peer next to a 4.0 peer in
staging; exchange a session; confirm `shade fingerprint` matches on
both ends and a round-trip message decrypts cleanly.
6. **Ship 4.0 to a canary.** Roll forward; the revert path is
`bun install @shade/sdk@^0.4.0`, since there is no DB write that 0.4
cannot also read.
### Operator checklist (prekey container)
If you operate the standalone container (`gt.zyon.no/stian/shade-prekey`):
1. Pull the 4.0 image: `docker pull gt.zyon.no/stian/shade-prekey:4.0.0`.
2. Add new env vars only if you are turning the corresponding surface
   on:
   - `SHADE_INBOX_PG_URL` / `SHADE_INBOX_DB_PATH`: async store-and-forward.
   - `SHADE_INBOX_PRUNE_INTERVAL_MINUTES`: inbox prune cadence.
   - `SHADE_BRIDGE_*`: bridge / SSE / long-poll surface.
   - `SHADE_KT_*`: Key Transparency mode + signing key path.
   - `SHADE_TRANSFER_*`: transfer routes mounted on the same Hono app.
3. Restart with the existing volume; the inbox / KT tables auto-create
on first request.
4. Update `docs/PRODUCTION-CHECKLIST.md` items for any new surface
you've enabled (rate-limit budgets, retention policies, KT
witness-pinning).
5. Verify the [OpenAPI](packages/shade-server/openapi.yaml) endpoints
you advertise to clients now include the routes you mounted.
### What about 4.0 → 4.x?
V4.x is bug-fix only. There is no wire bump until V5.0 (voice/video),
and that bump is **additive**: it allocates new envelope types
(frame-key prefixes) that 4.0 clients ignore by design.
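
The "ignore by design" behavior amounts to a dispatcher that treats unknown envelope types as a no-op instead of an error. The sketch below uses the two type values named earlier (`0x02` and `0x11`). It assumes the type tag is the first byte of the envelope, which this guide does not state, and `classifyEnvelope` is a hypothetical name, not a Shade API.

```ts
// Envelope type values listed under "What stays the same".
const ENVELOPE_RATCHET_MESSAGE = 0x02;
const ENVELOPE_STREAM_CHUNK = 0x11;

type EnvelopeKind = 'ratchet-message' | 'stream-chunk' | 'ignored';

// Assumption: the first byte carries the envelope type.
function classifyEnvelope(bytes: Uint8Array): EnvelopeKind {
  if (bytes.length === 0) {
    throw new Error('Empty envelope');
  }
  switch (bytes[0]) {
    case ENVELOPE_RATCHET_MESSAGE:
      return 'ratchet-message';
    case ENVELOPE_STREAM_CHUNK:
      return 'stream-chunk';
    default:
      // Unknown types (for example, ones a later version allocates) are
      // skipped rather than rejected, so newer peers do not break older ones.
      return 'ignored';
  }
}
```
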
## Common pitfalls
1. **Don't store private keys in shared databases without encryption at rest** — for shared infrastructure, enable `@shade/storage-encrypted` (V3.2) or use filesystem encryption / PostgreSQL TDE. The default `SQLiteStorage` and `PostgresStorage` write unencrypted.
2. **Don't skip identity verification** — Shade gives you fingerprints (`getIdentityFingerprint()`), but it's the user's responsibility to compare them out-of-band on first contact.
3. **Don't reuse session storage between identities** — each user/device should have its own Shade storage. Mixing identities in one storage will corrupt the ratchet state.
4. **Keep prekey stocks topped up.** Call `ensurePreKeyStock()` periodically (e.g., on app start or every hour). When the server runs out of one-time prekeys, new sessions fall back to the signed prekey alone, which weakens forward secrecy for the first messages of those sessions.
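
The top-up advice in the last item can be wired as a small scheduler. This is a sketch under assumptions: the guide names `ensurePreKeyStock()` but not its receiver or return type, so the `PreKeyMaintainer` interface below is hypothetical, as is `schedulePreKeyTopUp`.

```ts
// Hypothetical interface: any object exposing ensurePreKeyStock() fits.
// That the call returns a Promise is an assumption.
interface PreKeyMaintainer {
  ensurePreKeyStock(): Promise<unknown>;
}

// Runs a top-up immediately (app start) and then on a fixed interval.
// Returns a function that cancels the interval.
function schedulePreKeyTopUp(
  maintainer: PreKeyMaintainer,
  intervalMs: number = 60 * 60 * 1000, // hourly, as suggested above
  onError: (err: unknown) => void = (err) => console.error('prekey top-up failed', err),
): () => void {
  const run = (): void => {
    // Errors are reported, not thrown, so one failed run does not stop the schedule.
    maintainer.ensurePreKeyStock().catch(onError);
  };
  run();
  const timer = setInterval(run, intervalMs);
  return () => clearInterval(timer);
}
```

Keep the returned stop function and call it on shutdown so the timer does not hold the process open.
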