release(v4.0.0): Shade GA — V3.x consolidation + audit prep

V3.1 → V3.12 consolidated and tagged for the first GA release. Wire
format unchanged from 0.4.x — 4.0 peers interoperate with 0.4.x peers
byte-for-byte. The version bump is semantic: audit-cycle complete,
opt-in surface fully exposed, threat model refreshed for every new
surface.

Highlights:
- All 24 @shade/* packages bumped to 4.0.0 in lockstep.
- CHANGELOG 4.0.0 section is the canonical manifest of what landed.
- THREAT-MODEL extended (§10 fingerprint gates, §11 WebRTC P2P, §12
  Web-Worker boundary) + residual-risks table refreshed.
- OpenAPI now covers all 27 routes: prekey, transfer, KT, inbox,
  bridge, observer, /metrics, /healthz, /ready.
- MIGRATION 0.3.x → 4.0 documented + smoke-tested against
  shade migrate-storage on a real SQLite DB.
- docs/audit/REVIEW-BUNDLE.md + SCOPE.md ready for external reviewer.
- scripts/soak.ts harness for the GA-stable 2-week soak window.
- All V*.md plans archived under docs/archive/ with Status: Done.
- Voice/Video carved out into V5.0; 4.0 audit focuses on the frozen
  non-realtime stack.

Tests: TS 1000/1000 + Kotlin 11/11 cross-platform vectors green.
Docker: gt.zyon.no/stian/shade-prekey:4.0.0 builds and reports
  version 4.0.0 on /health.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:35:35 +02:00
parent 8b055912b7
commit e6fdf31b49
298 changed files with 37909 additions and 256 deletions


@@ -73,15 +73,20 @@ Tables will be created automatically with the `shade_server_*` prefix, so they c
| `PORT` | `3900` | HTTP port |
| `SHADE_PREKEY_DB_PATH` | `/data/shade-prekeys.db` | SQLite file location |
| `SHADE_PREKEY_PG_URL` | unset | Postgres URL (overrides SQLite) |
| `SHADE_INBOX_DB_PATH` | unset (memory) | SQLite file for the V3.6 inbox relay |
| `SHADE_INBOX_PG_URL` | falls back to `SHADE_PREKEY_PG_URL` | Postgres URL for the inbox relay |
| `SHADE_INBOX_PRUNE_INTERVAL_MINUTES` | `5` | How often expired inbox blobs are dropped |
| `SHADE_OBSERVER_TOKEN` | unset | Enables dashboard at `/shade-observer/dashboard/`. Min 16 chars. |
| `SHADE_STALE_DAYS` | `30` | Purge identities with no activity in N days |
| `SHADE_CLEANUP_INTERVAL_HOURS` | `24` | Cleanup cycle interval |
| `SHADE_LOG_LEVEL` | `info` | `debug` / `info` / `warn` / `error` |
| `SHADE_OTEL_ENABLED` | unset | Set to `1`/`true` to enable OpenTelemetry tracing on `withTracer()`-configured deployments. See [`observability.md`](./observability.md). |
## Health and observability
- **Health:** `GET /health` — returns `{"status":"ok"}` when the storage backend is reachable. Docker's HEALTHCHECK uses this.
- **Metrics:** `GET /metrics` — Prometheus format with counters, histograms, and gauges for all routes.
- **Tracing:** Optional OpenTelemetry spans via `@shade/observability`. Off by default; flip `SHADE_OTEL_ENABLED=1` to activate. PII-safe span attributes are documented in [`observability.md`](./observability.md).
- **OpenAPI:** `GET /openapi.yaml` — machine-readable API contract for any language.
- **Redoc viewer:** `GET /docs` — human-readable API reference.
- **Dashboard:** `GET /shade-observer/dashboard/` — live activity viewer (requires token).


@@ -0,0 +1,179 @@
# Shade Production Checklist
A flat punch-list for taking a Shade prekey server from "it boots" to
"production-ready". Every item below is a hard gate — if you can't tick it,
don't ship.
The deeper "why" behind each item lives in `THREAT-MODEL.md`,
`SECURITY.md`, and `docs/DEPLOYMENT.md`. This file is the operator's
checklist.
> Scope: a single Shade prekey container (`@shade/server`) plus any
> consumer apps that talk to it. For E2EE file transfer hardening
> (max-size, retention, quotas), see the **Hardening** and **Retention**
> sections of `docs/streams.md`.
---
## 1. TLS termination
- [ ] Public traffic is **TLS 1.2+ only** — Shade itself speaks plain HTTP
and assumes a reverse proxy (Caddy, Traefik, nginx, Dokploy's
built-in proxy) terminates TLS in front of it.
- [ ] HSTS is on (`Strict-Transport-Security: max-age=15552000`).
- [ ] The proxy is configured to pass the original `Host` header through
so signed payloads bound to the canonical address don't trip the
replay-window check on a mismatch.
- [ ] Internal traffic between consumer apps and the prekey container
runs on a private network (Docker bridge / VPC); the prekey port
is **not** exposed to the public internet without TLS in front.
> **Why:** identity signatures and observer bearer tokens travel in
> request bodies / headers. Without TLS, a network attacker can read
> the observer token and replay it for the full validity window, and
> can read the metadata (who registers, who fetches whose bundle).
> See `THREAT-MODEL.md § 1` (network attacker).
## 2. Backups
- [ ] **SQLite:** scheduled `sqlite3 /data/shade-prekeys.db ".backup ..."`
at least daily. The `.db` file plus `-wal` and `-shm` together is
the recovery unit; never copy the bare `.db` while the container
is running without using the online backup API.
- [ ] **Postgres:** `pg_dump` (or your provider's snapshot) at least
daily; verify a restore at least once per quarter.
- [ ] Backups are stored on different infrastructure than the primary
volume (different host / region / provider).
- [ ] Backups are encrypted at rest (your storage provider's
server-side encryption, age, or restic with a passphrase).
- [ ] **Restore drill:** at least once before going live, restore the
backup into a fresh volume and confirm `/health` is green and a
registered identity is still resolvable.
> **Why:** prekey records contain identity public keys and one-time
> prekeys. Losing them means new sessions can't be established to those
> identities until each user re-registers. Existing sessions keep
> ratcheting on the device-side state.
## 3. Observer token rotation
- [ ] `SHADE_OBSERVER_TOKEN` is set to **≥ 16 chars** of high-entropy
random data (e.g. `openssl rand -hex 32`). The server logs a
warning and disables the observer if the token is shorter.
- [ ] The token is held in your secret manager (Dokploy secret, GitHub
Actions secret, Vault, 1Password CLI), **never** committed to a
compose file or `.env` checked into git.
- [ ] The token is rotated on a schedule (recommended: every 90 days)
and immediately if it has been shared with anyone who no longer
needs access.
- [ ] If you expose the dashboard publicly, you also gate it behind
basic-auth at the proxy layer — bearer tokens are not
revocation-friendly on their own.
> **Why:** the observer dashboard exposes metadata about every active
> identity, registration timestamp, and recent activity. Anyone with
> the token can scrape the entire prekey directory.
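
The token requirements above can be sketched in a few lines of TypeScript. This is only an illustration: the helper names are hypothetical and not part of `@shade/server`; the 16-character minimum and the 32 random bytes (matching `openssl rand -hex 32`) come from the checklist.

```ts
import { randomBytes } from "node:crypto";

// Minimum length from the checklist; the constant and helper names are assumptions.
const MIN_OBSERVER_TOKEN_LENGTH = 16;

/** Generates 32 random bytes, hex-encoded (64 characters), like `openssl rand -hex 32`. */
function generateObserverToken(): string {
  return randomBytes(32).toString("hex");
}

/** Returns true when a candidate token meets the documented minimum length. */
function isAcceptableObserverToken(token: string | undefined): boolean {
  return typeof token === "string" && token.length >= MIN_OBSERVER_TOKEN_LENGTH;
}
```

A deploy script could call `isAcceptableObserverToken(process.env.SHADE_OBSERVER_TOKEN)` before starting the container, so a too-short token fails the rollout instead of silently disabling the observer.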
## 4. SQLite vs PostgreSQL
Pick one and stick to it.
- [ ] **SQLite** is the default. Use it when **one** Shade container is
enough, you can tolerate downtime during backup snapshots, and
your write rate is below ~500 req/s. Path: `SHADE_PREKEY_DB_PATH`,
default `/data/shade-prekeys.db`.
- [ ] **PostgreSQL** is for multi-replica deployments, shared
infrastructure, or when you already operate a managed Postgres
and want one fewer thing to back up. Path: `SHADE_PREKEY_PG_URL`.
Tables are auto-created with `shade_server_*` prefix.
- [ ] Whichever you pick, the database lives behind TLS for the
connection (`sslmode=require` for Postgres) and on storage that
is itself encrypted (LUKS, EBS encryption, managed-DB encryption).
- [ ] You do **not** mix them in the same deployment. Setting
`SHADE_PREKEY_PG_URL` overrides SQLite silently — pick one in
`compose.yml` and document which.
> **Why:** Shade does **not** encrypt the database itself (V3.2 will).
> Disk-level / volume-level encryption is the operator's responsibility
> until at-rest encryption ships.
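
The override rule ("setting `SHADE_PREKEY_PG_URL` overrides SQLite silently") can be made explicit in a pre-deploy check. The sketch below is an assumption about how such a check might look, not the server's actual resolution code; the function and type names are hypothetical, while the variable names and the default path come from this document.

```ts
type StorageBackend =
  | { kind: "postgres"; url: string }
  | { kind: "sqlite"; path: string };

// Default path as documented in the environment table.
const DEFAULT_SQLITE_PATH = "/data/shade-prekeys.db";

type Env = Record<string, string | undefined>;

/** Mirrors the documented precedence: a Postgres URL wins over any SQLite path. */
function resolvePrekeyBackend(env: Env): StorageBackend {
  const pgUrl = env.SHADE_PREKEY_PG_URL?.trim();
  if (pgUrl) return { kind: "postgres", url: pgUrl };
  return { kind: "sqlite", path: env.SHADE_PREKEY_DB_PATH?.trim() || DEFAULT_SQLITE_PATH };
}

/** Flags the mixed configuration the checklist warns about (both variables set). */
function hasAmbiguousBackendConfig(env: Env): boolean {
  return Boolean(env.SHADE_PREKEY_PG_URL?.trim() && env.SHADE_PREKEY_DB_PATH?.trim());
}
```

Failing the deploy when `hasAmbiguousBackendConfig` returns true turns the silent override into a visible decision.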
## 5. Log level and structured logs
- [ ] `SHADE_LOG_LEVEL` is set to `info` (production) or `warn`
(high-traffic). Avoid `debug` in prod — it logs request bodies
including signed payloads.
- [ ] Logs are shipped to a retention-bounded sink (Loki, CloudWatch,
Datadog) with **redaction of `Authorization` headers and signed
bodies** if your sink doesn't already strip them.
- [ ] You alert on `error`-level logs and on the absence of cleanup
cycles (a stuck cleanup loop = unbounded DB growth).
> **Why:** at `debug` level the server logs signature material. While
> Ed25519 signatures are not secrets per se, leaking them widens the
> replay-window blast radius and reveals timing patterns.
## 6. Stale-identity cleanup parameters
- [ ] `SHADE_STALE_DAYS` is set deliberately for your product. The
default (30 days) is right for "active chat app"; "occasional
use" apps should bump to 90+ to avoid surprise re-registration.
- [ ] `SHADE_CLEANUP_INTERVAL_HOURS` is left at 24 unless you have a
specific reason — running cleanup more often does not free more
space, and running it less often risks one cycle missing a day.
- [ ] You watch the `shade_cleanup_purged_total` metric (Prometheus) and
alert on a sudden 10× spike — that often signals a bug or a
deployment that broke client-side activity timestamps.
> **Why:** stale cleanup is the only thing keeping the prekey directory
> from growing forever. A misconfigured `SHADE_STALE_DAYS = 0` would
> nuke every identity on every cycle. Bound the value at ≥ 1 in your
> deployment config.
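
One way to bound the value at ≥ 1 is to validate it in the deployment tooling before the container starts. The helper below is a hypothetical sketch, not Shade's own parser; only the variable name and the default of 30 days come from this document.

```ts
/**
 * Parses SHADE_STALE_DAYS for a deployment config.
 * Falls back to the documented default when unset and rejects anything below 1.
 */
function parseStaleDays(raw: string | undefined, fallback = 30): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const days = Number(raw);
  if (!Number.isInteger(days) || days < 1) {
    throw new Error(`SHADE_STALE_DAYS must be an integer >= 1, got "${raw}"`);
  }
  return days;
}
```

Rejecting `0`, negatives, and non-integers at deploy time keeps a typo from purging every identity on the next cleanup cycle.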
## 7. Secret rotation
- [ ] Identity signing keys: each consumer rotates via the documented
identity-rotation flow (7-day grace period for old sessions).
Operators do **not** touch identity keys directly.
- [ ] Observer token: see § 3.
- [ ] Database credentials (Postgres only): rotate per your standard
cadence, with the connection string supplied through the secret
manager.
- [ ] No long-lived API keys or service tokens are stored in the
container image or volume.
## 8. Rate-limit and body-size caps
- [ ] You have not lowered the built-in rate limits below the defaults
(per-IP register/bundle and per-identity replenish/delete).
- [ ] You have not raised the 64 KiB POST body limit. Prekey bundles
fit comfortably; raising the limit only enables abuse.
- [ ] Your reverse proxy enforces an additional connection / request-
rate limit at the edge (Caddy `rate_limit`, Cloudflare, etc.)
so a single noisy IP can't even reach Shade's per-route limits.
## 9. Health checks and metrics scrape
- [ ] Container has a Docker `HEALTHCHECK` (the official image already
ships one against `/health`).
- [ ] `/metrics` is scraped by Prometheus / OpenTelemetry and
retained ≥ 30 days.
- [ ] Alerts are wired for: `/health` failing for > 2 min, request
latency p99 > 1 s, error rate > 1 %, cleanup cycles missing for
> 25 h.
## 10. OpenAPI contract drift
- [ ] CI runs the OpenAPI lint (`bun test packages/shade-server/tests/openapi-lint.test.ts`)
on every PR — the spec must remain valid OpenAPI 3.1 with no
dangling `$ref`s.
- [ ] Generated clients (Python, Go, Kotlin) are regenerated from the
shipped spec on each release; mismatches between server and
client are caught at integration test time, not production.
---
## Pre-flight summary
If you can answer "yes" to every box above, ship it. If you can't,
write down which box and why before you do — that note belongs in your
runbook so the next operator inherits the gap, not the surprise.

docs/ROADMAP.md (new file)

@@ -0,0 +1,116 @@
# Shade Roadmap: V3.1 → V5.0
Index of the version plans, from the V3.1 foundation through **Shade 4.0 GA** and
on to **Shade 5.0** (Voice & Video).
- **V4.0 GA** ✅: everything from V2.1 / V2.2 / V2.3 and the bonus track (social
recovery, P2P WebRTC, Pub/Sub, Key Transparency) is merged, tested,
documented, and packaged for external review. The wire format is locked.
- **V5.0** is the dedicated realtime release. *All* VoIP and video streaming
lives here, implemented on top of the frozen 4.0 stack.
All V3.x plans now live under [`docs/archive/`](./archive/) with
`Status: Done`. Active plans: [`V5.0.md`](./V5.0.md).
---
## Phases
### Phase 1: Documentation & Hardening Foundation ✅
| Plan | Title | Effort | Status |
|------|-------|--------|--------|
| [V3.1](./archive/V3.1.md) | Documentation & Hardening Foundation | S | **Done** |
### Phase 2: Security maturation ✅
| Plan | Title | Effort | Status |
|------|-------|--------|--------|
| [V3.2](./archive/V3.2.md) | At-Rest Storage Encryption | L | **Done** |
| [V3.3](./archive/V3.3.md) | Fingerprint Gates & Trust UX | M | **Done** |
| [V3.4](./archive/V3.4.md) | Observability v2 (OpenTelemetry) | M | **Done** |
| [V3.5](./archive/V3.5.md) | Android Parity & Cross-Platform CI | XL | **Done** |
### Phase 3: Platform expansion ✅
| Plan | Title | Effort | Status |
|------|-------|--------|--------|
| [V3.6](./archive/V3.6.md) | Async Store-and-Forward (Inbox) | L | **Done** |
| [V3.7](./archive/V3.7.md) | Transport Bridge (SSE / long-poll) | M | **Done** |
| [V3.8](./archive/V3.8.md) | Web Workers Crypto | M-L | **Done** |
| [V3.9](./archive/V3.9.md) | Rich File Metadata & Previews | M | **Done** |
### Phase 4: Trust and P2P transport ✅
| Plan | Title | Effort | Status |
|------|-------|--------|--------|
| [V3.10](./archive/V3.10.md) | Social Key Recovery | L | **Done** |
| [V3.11](./archive/V3.11.md) | WebRTC P2P Transport | XL | **Done** |
| [V3.12](./archive/V3.12.md) | Key Transparency | XXL | **Done** |
### Phase 5: General Availability ✅
| Plan | Title | Effort | Status |
|------|-------|--------|--------|
| [V4.0](./archive/V4.0.md) | External Audit, Consolidation, GA | M | **Done** |
### Phase 6: Realtime (post-GA)
| Plan | Title | Effort | Depends on |
|------|-------|--------|------------|
| [V5.0](./V5.0.md) | Voice & Video | XXL | V4.0 GA + V3.11 |
---
## Effort key
| Symbol | Time |
|--------|------|
| **S** | 1-2 weeks |
| **M** | 2-4 weeks |
| **L** | 4-8 weeks |
| **XL** | 2-4 months |
| **XXL** | 4+ months / multi-quarter |
---
## Dependency graph
```text
V3.1 ────┬──► V3.2 ──┐
         ├──► V3.3 ──┼──► V3.10 ──┐
         ├──► V3.4 ──┘            │
         ├──► V3.5 ───────────────┼──► V3.12 ──┐
         ├──► V3.6 ──► V3.7 ──► V3.11 ─────────┤
         ├──► V3.8                             ├──► V4.0 GA ──► V5.0 (Voice & Video)
         └──► V3.9 ─────────────────────────────┘
```
---
## Status convention
Each plan has a `Status:` field at the top. Allowed values:
- `Idea`: not started, design still open.
- `Design`: design note in progress or approved.
- `IMP`: implementation in progress.
- `Done`: merged into main, covered by tests.
When a plan becomes `Done`, move the file to `docs/archive/` and update this table.
---
## Versioning
- **V3.1 → V3.12** shipped as incremental minor releases on the `0.4.x` line.
- Wire-format changes were queued up for **V4.0**, but the format ended up
unchanged from 0.4.x. The major bump to 4.0 marks a completed audit cycle
and a GA-frozen core, not a wire bump.
- **V4.0** is GA: locked core, packaged for external review, no
voice/video.
- **V5.0** adds realtime (voice/video/broadcast) on top of the frozen
4.0 stack. It builds on reserved envelope types so that 4.0 clients
ignore 5.0 traffic gracefully; it is not a breaking change.
- Every `V*` merge updates `CHANGELOG.md` and bumps all packages via
`bun run version`.


@@ -29,9 +29,11 @@ Pull in **one row** that matches your project; add optional columns only when ne
| **Large files** — resumable E2EE upload/download | Above + stream protocol + HTTP (or WS) transport | `@shade/sdk` (re-exports transfer) + mount transfer routes on **your** HTTP server | `shade.upload` / `onIncomingTransfer` — see [streams.md](./streams.md) |
| **React UI** — upload/download widgets | Runtime from SDK + widgets | `@shade/sdk` + `@shade/widgets` | `ShadeRuntimeProvider`, `useShadeUpload` / `useShadeDownload` |
| **Prekey hosting only** — one container per product | No app crypto in the container | Docker image / `@shade/server` | Deploy prekey image; point `prekeyServer` at it from apps |
| **Offline-tolerant messaging** — recipient may be offline | Above + a relay that holds ciphertext blobs | `@shade/inbox` (client) + `@shade/inbox-server` (or the prekey container, which bundles both) | Register address, `inbox.send()` to peer, `inbox.onIncoming(handler)` — see [inbox.md](./inbox.md) |
| **"What if I lose my phone?"** — survive device loss without a recovery agent | Above + Shamir-split shares to `n` guardians; threshold `k` reconstruct | `@shade/recovery` + `@shade/widgets` (`<RecoverySetup />`, `<RecoveryRequest />`, `<RecoveryApprove />`) | `setupRecovery` / `attachGuardian` / `requestRecovery` — see [recovery.md](./recovery.md) |
| **Maximum control** — custom wire, custom transport | Wire + session manager | `@shade/core` + `@shade/proto` (+ your storage + crypto provider) | `ShadeSessionManager`, encode/decode envelopes yourself |
| **HTTP or WebSocket convenience** | Auto-wrap application bytes | `@shade/transport` on top of your stack | Use when you want transport helpers, not a new protocol |
| **Android** | Byte-compatible with TS (roadmap) | `shade-android` module | See [android/shade-android/README.md](../android/shade-android/README.md) — parity work in progress |
| **Android** | Byte-compatible with TS (cross-vector gated in CI) | `shade-android` module | See [android/shade-android/README.md](../android/shade-android/README.md). Cross-platform vectors live in [`test-vectors/`](../test-vectors/) and are exercised by both runners. |
You can **mix rows**: e.g. backend with `@shade/sdk` + SQLite for sessions, separate service mounting `transfer` routes, browser clients using `@shade/widgets`.
@@ -43,7 +45,7 @@ You can **mix rows**: e.g. backend with `@shade/sdk` + SQLite for sessions, sepa
2. Pick **storage** (`sqlite:…`, Postgres, or project-specific adapter implementing the core storage interfaces).
3. Choose **surface**: usually `@shade/sdk` unless you truly need `@shade/core` only.
4. For files: enable **transfer routes** and authenticate chunk uploads using the patterns in the SDK (see streams doc).
5. Run **`shade doctor`** when something fails in production-ish setups (install the CLI as in repository [Quick start](../README.md#quick-start)); coverage is evolving — roadmap in [V2.2](./V2.2.md).
5. Run **`shade doctor`** when something fails in production-ish setups (install the CLI as in repository [Quick start](../README.md#quick-start)); the gates that fire are documented in [trust-ux.md](./trust-ux.md) and [PRODUCTION-CHECKLIST.md](./PRODUCTION-CHECKLIST.md).
---
@@ -54,7 +56,7 @@ You can **mix rows**: e.g. backend with `@shade/sdk` + SQLite for sessions, sepa
| File transfer architecture | [streams.md](./streams.md) |
| Deployment & operations | [DEPLOYMENT.md](./DEPLOYMENT.md) |
| Threat model | [THREAT-MODEL.md](../THREAT-MODEL.md) |
| Planned improvements | [V2.1](./V2.1.md), feature backlog [V2.2](./V2.2.md), trust/ops [V2.3](./V2.3.md) |
| Planned improvements | [ROADMAP](./ROADMAP.md) — V3.x archive under [`archive/`](./archive/), next milestone [V5.0](./V5.0.md) |
---

docs/V5.0.md (new file)

@@ -0,0 +1,135 @@
# Shade V5.0: Voice & Video
**Status:** Idea (post-V4.0 GA)
**Effort:** XXL (4+ months)
**Previous:** V4.0 GA + V3.11 (P2P transport required)
**Addresses:** V2.1 addendum "ShadeVoiceButton / ShadeVideoCall / ShadeBroadcaster"
V5.0 is the dedicated realtime release. All VoIP and video streaming
is collected here, *after* Shade 4.0 has been marked GA. The stack underneath
(ratchet, transport, observability, recovery, key transparency,
WebRTC P2P) is locked in 4.0; 5.0 builds on top without reopening the core
crypto audit.
---
## Goal
E2EE realtime communication on the Shade stack: voice calls, video calls,
and broadcast/streaming, all delivered as "magic drop-in" components for
consuming apps.
```tsx
<ShadeVoiceButton to={peerAddress} />
<ShadeVideoCall to="device:server-admin" />
<ShadeBroadcaster streamKey="game-stream-1" />
<ShadeViewer streamKey="game-stream-1" />
```
---
## Scope
### In
- New package `@shade/voice`: 1:1 voice over WebRTC P2P.
- New package `@shade/video`: 1:1 video, sharing a core with voice.
- New package `@shade/broadcast`: 1:N broadcast with a relay helper.
- SFrame-style frame encryption: payload keys are ratcheted per call and
derived from the Shade session.
- Codecs: Opus (audio), AV1/VP9 (video), as provided by WebRTC.
- Widget components for each use case.
- Key rotation under loss: forward secrecy every X frames or every N
seconds.
### Out
- Group calls (≥ 3 participants) as a first milestone. These require an SFU plus
group key agreement and are a separate effort.
- A replacement for the native phone app. We offer in-app calls.
- Codec implementation. We use the browser/native WebRTC stack.
---
## Design
### Frame-key derivation
```text
callKey = X3DH(A, B) → HKDF("shade-call-v1") → callRatchetKey
frameKey[i] = HKDF(callRatchetKey, "frame" || u64(i))
```
`callRatchetKey` ratchets forward every N milliseconds or every M frames;
a compromised frame key exposes only that window.
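
The `frameKey[i]` line can be sketched with Node's built-in HKDF. Several details here are assumptions that the plan does not specify: SHA-256 as the hash, an empty salt, `"frame" || u64(i)` used as the HKDF `info` input, a big-endian counter, and a 32-byte output. The function name is hypothetical.

```ts
import { hkdfSync } from "node:crypto";

/**
 * Derives a per-frame key from the current call ratchet key.
 * Assumed layout: info = "frame" || u64_be(frameIndex), empty salt, SHA-256, 32-byte key.
 */
function deriveFrameKey(callRatchetKey: Uint8Array, frameIndex: bigint): Buffer {
  const counter = Buffer.alloc(8);
  counter.writeBigUInt64BE(frameIndex); // u64(i), big-endian by assumption
  const info = Buffer.concat([Buffer.from("frame", "utf8"), counter]);
  const salt = Buffer.alloc(0);
  return Buffer.from(hkdfSync("sha256", callRatchetKey, salt, info, 32));
}
```

Because the derivation is deterministic, sender and receiver obtain the same key for frame `i` from a shared `callRatchetKey` without exchanging any per-frame material.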
### SFrame
Follows IETF MLS/SFrame patterns:
- The header is cleartext (codec metadata).
- The payload is AES-GCM with a deterministic nonce.
- The receiver drops frames whose sequence number falls outside the window.
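
The three properties above can be illustrated with Node's AES-256-GCM primitives. This is a simplified sketch, not the SFrame wire format: the nonce layout (4 zero bytes plus a 64-bit sequence number), authenticating the cleartext header as AAD, and the window size of 64 are all assumptions, and every name is hypothetical. A counter-derived nonce is only safe if a given key never sees the same sequence number twice, which is why the keys must rotate.

```ts
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface SealedFrame { seq: number; header: Buffer; ciphertext: Buffer; tag: Buffer }

/** Deterministic 96-bit nonce: 4 zero bytes followed by the big-endian sequence number. */
function nonceFor(seq: number): Buffer {
  const nonce = Buffer.alloc(12);
  nonce.writeBigUInt64BE(BigInt(seq), 4);
  return nonce;
}

function sealFrame(key: Buffer, seq: number, header: Buffer, payload: Buffer): SealedFrame {
  const cipher = createCipheriv("aes-256-gcm", key, nonceFor(seq));
  cipher.setAAD(header); // header stays cleartext but is authenticated (assumption)
  const ciphertext = Buffer.concat([cipher.update(payload), cipher.final()]);
  return { seq, header, ciphertext, tag: cipher.getAuthTag() };
}

/** Returns the payload, or null for out-of-window or tampered frames. */
function openFrame(key: Buffer, frame: SealedFrame, highestSeen: number, window = 64): Buffer | null {
  if (frame.seq + window <= highestSeen) return null; // too old: drop without decrypting
  const decipher = createDecipheriv("aes-256-gcm", key, nonceFor(frame.seq));
  decipher.setAAD(frame.header);
  decipher.setAuthTag(frame.tag);
  try {
    return Buffer.concat([decipher.update(frame.ciphertext), decipher.final()]);
  } catch {
    return null; // authentication failed
  }
}
```

Checking the window before decrypting keeps late or replayed frames cheap to reject.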
### Topology
- 1:1: P2P via V3.11.
- Broadcast: a relay helper in `@shade/broadcast-relay` distributes
ciphertext to subscribers; the relay never sees plaintext.
---
## Deliverables
### Packages
- `@shade/voice` + `@shade/video` (shared core in `@shade/realtime-core`).
- `@shade/broadcast` + `@shade/broadcast-relay`.
- Widgets: `<ShadeVoiceButton />`, `<ShadeVideoCall />`,
`<ShadeBroadcaster />`, `<ShadeViewer />`.
### Tests
- Unit: SFrame encrypt/decrypt + tamper.
- Integration: 1:1 video at 30 fps for 60 s; > 99 % of frames delivered; key
rotation observed.
- Loss recovery: 30 % packet loss → quality degrades gracefully.
- Adversarial: a relay DB dump reveals no plaintext.
### Documentation
- `docs/voice-video.md`: setup, codec trade-offs, broadcast architecture.
---
## Acceptance criteria
- [ ] 1:1 video at 60 fps + 1080p between two clients on the same LAN.
- [ ] A frame-key compromise exposes at most 1 second of forward data.
- [ ] Broadcast to 1:50 viewers works with < 2 s end-to-end latency.
---
## Dependencies
- **V4.0 GA**: the core stack must be externally audited and frozen before
we layer a realtime protocol on top.
- V3.11: P2P transport (lands in the V4.0 window).
- V3.5: Android parity, if voice/video is to work on mobile.
---
## Risks
- **Codec quirks.** AV1, VP9, and H.264 differ in browser support.
- **Frame-key sync under loss.** Advanced; the SFrame spec is still being
standardized.
- **Latency vs. security.** Every ratchet step adds microseconds.
---
## Migration
New packages. Not breaking: the wire formats from V4.0 stay unchanged;
voice/video adds its own envelope types in a reserved range that
4.0 clients ignore.

docs/archive/V3.1.md (new file)

@@ -0,0 +1,100 @@
# Shade V3.1: Documentation & Hardening Foundation
**Status:** Done
**Effort:** S (1-2 weeks)
**Previous:** V2.3
**Next:** V3.2 / V3.3 / V3.4 (can run in parallel)
---
## Goal
Close out the "low-friction" debt from V2.1, V2.2, and V2.3 before we take on the
heavy security work. This is the groundwork that unlocks the rest of the roadmap:
operators should be able to deploy safely, transfer consumers should have clear limits,
and OpenAPI should cover the entire HTTP surface.
No new core code: only docs, OpenAPI extensions, retention defaults, and a
test/threat matrix.
---
## Scope
### In
- README + `@shade/server` README: an explicit "keys vs payloads" narrative with a
diagram + link to `THREAT-MODEL.md`.
- New `docs/PRODUCTION-CHECKLIST.md`: TLS, backups, observer-token rotation,
SQLite vs PG, log level, stale params, secret rotation.
- Hardening section in `docs/streams.md`: max stream size, TTL, quota patterns,
pointing to the `@shade/files` hooks as a reference.
- `openapi.yaml` extended with `/v1/transfer/*` (`chunk`, `state`, `health`) +
a security scheme for `ShadeTransferAuthenticator`.
- Retention defaults in `docs/streams.md` + the SDK template:
a `pruneStreamStates` cron as the default ("finished streams are cleaned up
after N days").
- `SECURITY.md` extension: review status, "how to report", and links from
`THREAT-MODEL.md` rows → `tests/security/*` (test/threat matrix).
### Out
- Actual crypto review (that is V4.0).
- Changes to the crypto or wire format.
- New code outside the SDK templates.
---
## Deliverables
### Documentation
- `docs/PRODUCTION-CHECKLIST.md`: new.
- `docs/streams.md`: extended with "Hardening" and "Retention".
- `README.md`: diagram adjustment + "What does not go through the Shade server".
- `packages/shade-server/README.md`: mirror the narrative.
- `SECURITY.md`: review status + threat/test matrix.
- `THREAT-MODEL.md`: cross-links to concrete tests.
### Code (config + templates only)
- `packages/shade-server/openapi.yaml`: `/v1/transfer/*` paths,
`ShadeTransferAuthenticator` securityScheme.
- `packages/shade-cli/templates/bun-server`: default
`pruneStreamStates` cron.
### Tests
- Lint test: the OpenAPI spec still validates against the OpenAPI 3.1 schema.
- Smoke test for the cron in the template.
---
## Acceptance criteria
- [ ] A new developer can read the README + `PRODUCTION-CHECKLIST.md` and deploy a
production-ready Shade without reading the whole codebase.
- [ ] A generated client (Python or Go) from `openapi.yaml` covers both the
prekey and transfer surfaces without manual fixes for the happy path.
- [ ] `THREAT-MODEL.md` links every "Mitigations" row to at least one test file.
- [ ] The default SDK template `bun-server` prunes resumable streams without
manual configuration.
---
## Dependencies
None.
---
## Risks
Low. The worst outcome is stale docs if V3.2+ changes surfaces. Mitigate by
writing small, updatable sections rather than long narrative chapters.
---
## Migration
None: everything is additive.

docs/archive/V3.10.md (new file)

@@ -0,0 +1,134 @@
# Shade V3.10: Social Key Recovery
**Status:** Done. Landed in `@shade/recovery` 0.4.0, frozen in 4.0 GA.
**Effort:** L (4-8 weeks)
**Previous:** V3.2 + V3.3
**Addresses:** V2.1 addendum "social key recovery"
---
## Goal
Solve the biggest UX gap in every E2EE system: **"What happens if I
lose my phone?"** The user picks N "guardians" (family / friends /
work partners); when the user loses the device, a threshold subset of the
guardians can jointly return the identity key, without any single guardian
being able to do so alone and without the server learning anything.
---
## Scope
### In
- Shamir Secret Sharing (k-of-n) over the identity private key (or a
backup encryption key).
- Distribution of shares over existing 1:1 Shade sessions; guardians
store their share locally.
- Recovery flow: the new device asks a threshold of guardians to send their
shares and reconstructs the secret on the new device.
- Verification step: the new device proves its identity to each guardian via an
OOB safety-number comparison **before** the guardian releases the share.
- UX guide: how many guardians, which threshold, and how to rotate when a
guardian loses a device.
### Out
- "Cloud guardian" / Shade-operated recovery: we allow no centralized
component that could perform recovery on its own.
- Auto-distribution (we require an explicit choice of guardians).
---
## Design
### What is shared
```text
shareSecret = AES-256-GCM-encrypt(identityState, recoveryKey)
recoveryKey is Shamir-split(k, n) → shares[i]
shareSecret stored locally + on each guardian
each guardian receives one share via Shade.send
```
`identityState` is the same payload that `Shade.exportBackup` produces (available
since 0.3.x); the format is reused here.
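
The `Shamir-split(k, n)` step can be sketched as byte-wise Shamir sharing over GF(2^8). This is an illustrative implementation under stated assumptions, not the code in `@shade/recovery`: the field (AES reduction polynomial `0x11b`), the share layout, and the function names are all hypothetical.

```ts
import { randomBytes } from "node:crypto";

/** Multiplies two GF(2^8) elements, reducing by x^8 + x^4 + x^3 + x + 1 (0x11b). */
function gfMul(a: number, b: number): number {
  let result = 0;
  while (b > 0) {
    if (b & 1) result ^= a;
    a <<= 1;
    if (a & 0x100) a ^= 0x11b;
    b >>= 1;
  }
  return result;
}

/** Multiplicative inverse via a^254, since a^255 = 1 for every non-zero a. */
function gfInv(a: number): number {
  if (a === 0) throw new Error("zero has no inverse");
  let result = 1;
  for (let i = 0; i < 254; i++) result = gfMul(result, a);
  return result;
}

interface Share { x: number; y: Uint8Array }

/** Splits each secret byte with a random polynomial of degree k-1; share i is evaluated at x = i. */
function splitSecret(secret: Uint8Array, k: number, n: number): Share[] {
  if (k < 2 || k > n || n > 255) throw new Error("need 2 <= k <= n <= 255");
  const shares: Share[] = Array.from({ length: n }, (_, i) => ({
    x: i + 1,
    y: new Uint8Array(secret.length),
  }));
  for (let byteIdx = 0; byteIdx < secret.length; byteIdx++) {
    // coeffs[0] is the secret byte, the remaining k-1 coefficients are random.
    const coeffs = [secret[byteIdx], ...Array.from(randomBytes(k - 1))];
    for (const share of shares) {
      let y = 0;
      for (let c = coeffs.length - 1; c >= 0; c--) y = gfMul(y, share.x) ^ coeffs[c]; // Horner
      share.y[byteIdx] = y;
    }
  }
  return shares;
}

/** Recovers the secret by Lagrange interpolation at x = 0 (needs at least k distinct shares). */
function combineShares(shares: Share[]): Uint8Array {
  const out = new Uint8Array(shares[0].y.length);
  for (let byteIdx = 0; byteIdx < out.length; byteIdx++) {
    let acc = 0;
    for (const si of shares) {
      let basis = 1;
      for (const sj of shares) {
        if (sj.x === si.x) continue;
        basis = gfMul(basis, gfMul(sj.x, gfInv(sj.x ^ si.x)));
      }
      acc ^= gfMul(si.y[byteIdx], basis);
    }
    out[byteIdx] = acc;
  }
  return out;
}
```

With `splitSecret(recoveryKey, 3, 5)`, any three of the five shares passed to `combineShares` return the original key. Fewer than three still interpolate to some byte string, but it carries no information about the key.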
### Recovery flow
1. The new device generates a **temporary** identity + safety number.
2. The new device contacts the guardians via the prekey server (OOB verification first).
3. Each guardian approves manually and returns their share via
`Shade.send`.
4. The new device reconstructs `recoveryKey`, decrypts `shareSecret`, and
restores the identity.
5. The original identity rotates (the old identity is marked as
"compromised, used for recovery").
### Guardian UX
- The guardian app/widget shows:
*"Alice (your friend) has lost her device and is requesting a recovery share.
Confirm the fingerprint before you send."*
- The guardian can **decline** without consequence.
---
## Deliverables
### Packages
- `@shade/recovery`: Shamir + share distribution.
- `@shade/widgets`: `<RecoverySetup />` (choose guardians) +
`<RecoveryRequest />` (new device requests) + `<RecoveryApprove />` (guardian
approves).
### Tests
- Unit: Shamir split/combine round trip; threshold enforcement.
- Integration: full 3-of-5 recovery with 5 mock guardians.
- Adversarial: 2 guardians collude (below threshold) → cannot
reconstruct.
- Adversarial: malicious new device without safety-number confirmation → no
guardian should release a share.
### Documentation
- `docs/recovery.md`: full UX + threat model.
- Threat-model extension: collusion ≤ k-1, identity forgery, social
engineering.
---
## Acceptance criteria
- [ ] 3-of-5 recovery works end-to-end on 2 separate devices.
- [ ] No coalition of (k-1) guardians can reconstruct `shareSecret`
(verified with a fast-check property test).
- [ ] The guardian-side widget requires fingerprint confirmation before sending
(the gate from V3.3, reinforced).
---
## Dependencies
- V3.2: key material at rest on the guardian must be encrypted.
- V3.3: fingerprint gate on the recovery handshake.
---
## Risks
- **UX is the hardest part.** "Who is my guardian?" is socially complex;
the user may choose poorly.
- **Social engineering.** An attacker impersonates the victim over the phone →
the guardian hands over the share. Mitigate with hard fingerprint gates + a cool-down.
- **Dead guardians.** If a guardian dies / loses their device without being
replaced, the margin above the threshold shrinks. A periodic "guardian health
check" prompt is recommended.
---
## Migration
New package. Apps can add the recovery widget to their settings.

docs/archive/V3.11.md (new file)

@@ -0,0 +1,124 @@
# Shade V3.11: WebRTC P2P Transport
**Status:** Done. Landed with `@shade/transport-webrtc` 0.4.0,
`MultiTransportFallback` in `@shade/transfer`, and
`shade.configureWebRTC()` in `@shade/sdk`. See [docs/webrtc.md](../webrtc.md).
**Effort:** XL (2-4 months)
**Previous:** V3.7
**Addresses:** V2.1 addendum "P2P WebRTC transport"
---
## Goal
A direct peer-to-peer data channel between Shade clients when NAT/firewall
conditions allow. Primary gains: massive throughput for `@shade/transfer`
(files, large payloads) and low latency for messaging when both peers
are online at the same time. E2EE is preserved: WebRTC's DTLS layer (SCTP over
DTLS for data channels) is only the **transport**; the payload is still Shade
ratchet crypto.
V3.11 lands in the V4.0 window and is foundation-only. The realtime use
(voice, video, broadcast) lives in [V5.0](../V5.0.md) as a downstream
consumer of this data channel.
---
## Scope
### In
- New package `@shade/transport-webrtc`.
- Signaling via the Shade control plane (the existing channel, `Shade.send`).
- ICE/STUN: use public STUN servers by default.
- TURN: a configurable TURN relay as fallback.
- DataChannel for `@shade/transfer` chunks.
- Auto-fallback: P2P → HTTP (existing stack).
### Out
- SFU/MCU (many-to-many topology): broadcast/video is V5.0.
- Voice/video media tracks: V3.11 is a pure data channel (DataChannel);
audio/video over RTP is V5.0.
- Binding the DTLS fingerprint to the Shade fingerprint (considered as
hardening, but not a requirement).
---
## Design
### Connection flow
```text
A initiates:
1. createOffer() → SDP
2. shade.send(B, { kind: "webrtc-offer", sdp })
3. B receives over the Shade channel, createAnswer()
4. shade.send(A, { kind: "webrtc-answer", sdp })
5. ICE candidate exchange (same channel)
6. DataChannel open
```
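
The signaling exchange above can be modelled as a small state machine that is testable without a real `RTCPeerConnection`. The `webrtc-offer` and `webrtc-answer` kinds come from the flow; the `webrtc-ice` kind, the state names, and the `advance` function are assumptions made for this sketch and are not the `@shade/transport-webrtc` API.

```ts
type SignalMessage =
  | { kind: "webrtc-offer"; sdp: string }
  | { kind: "webrtc-answer"; sdp: string }
  | { kind: "webrtc-ice"; candidate: string }; // hypothetical kind for step 5

type SignalState = "idle" | "offer-sent" | "offer-received" | "connecting";
type Direction = "sent" | "received";

/** Returns the next signaling state, or throws on a message that is out of order. */
function advance(state: SignalState, direction: Direction, msg: SignalMessage): SignalState {
  switch (msg.kind) {
    case "webrtc-offer":
      if (state !== "idle") break;
      return direction === "sent" ? "offer-sent" : "offer-received";
    case "webrtc-answer":
      // The initiator receives the answer; the responder sends it.
      if (state === "offer-sent" && direction === "received") return "connecting";
      if (state === "offer-received" && direction === "sent") return "connecting";
      break;
    case "webrtc-ice":
      // Candidates may flow in either direction once an offer exists.
      if (state !== "idle") return state;
      break;
  }
  throw new Error(`unexpected ${direction} ${msg.kind} in state ${state}`);
}
```

Rejecting out-of-order messages early makes it harder for a confused or malicious peer to push an answer for an offer that was never sent.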
### Wrapping
The DataChannel carries finished `@shade/transfer` chunks (already E2EE).
WebRTC's own DTLS encryption acts as the transport-secrecy layer.
### Topology
- 1:1 P2P directly when possible.
- TURN relay when the NATs are too strict (transport only, never sees plaintext).
---
## Deliverables
### Packages
- `@shade/transport-webrtc`: Connection, DataChannel wrapper, ICE config.
- `@shade/transfer` is extended: `WebRTCTransferTransport` as a drop-in.
- `FallbackTransferTransport` gains a new stage: P2P → WS → HTTP.
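
The P2P → WS → HTTP ordering can be sketched as a generic fallback chain. The interface and function below are hypothetical and only illustrate the pattern; they are not the real `FallbackTransferTransport` API.

```ts
// Hypothetical minimal transport shape, assumed for this sketch only.
interface ChunkTransport {
  name: string;
  send(chunk: Uint8Array): Promise<void>;
}

/** Tries each transport in order and returns the name of the first one that succeeds. */
async function sendWithFallback(transports: ChunkTransport[], chunk: Uint8Array): Promise<string> {
  const failures: string[] = [];
  for (const transport of transports) {
    try {
      await transport.send(chunk);
      return transport.name;
    } catch (err) {
      failures.push(`${transport.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all transports failed (${failures.join("; ")})`);
}
```

Because the chunks are already end-to-end encrypted, falling back to a slower transport changes throughput but not confidentiality.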
### Tests
- Loopback unit: offer/answer/ICE in Bun via `node-datachannel` or
`wrtc`.
- Integration: 100 MB transfer over P2P vs HTTP; P2P should win on the same
network.
- Failover: the TURN relay forces relay mode.
- NAT emulation (loopback with different NAT types, if possible).
### Documentation
- `docs/webrtc.md`: setup, STUN/TURN config, NAT-traversal hopes and
realities.
---
## Acceptance criteria
- [ ] Two clients on the same LAN: direct P2P without TURN, throughput > 5x
HTTP baseline.
- [ ] Two clients behind strict NATs: TURN relay is activated automatically.
- [ ] Failover from a dead P2P link to HTTP within 5 s without message loss.
---
## Dependencies
- V3.7: bridge patterns + fallback architecture.
---
## Risks
- **NAT-traversal hell.** Many edge cases. Mitigate with early
integration tests on real NAT configurations.
- **Browser compatibility.** Safari has its own RTC quirks.
- **TURN costs.** A TURN relay means real traffic through a server. The operator must
be aware of that.
---
## Migration
Opt-in. The existing HTTP/WS transport works unchanged.

@@ -0,0 +1,557 @@
# V3.12: Key Transparency design note
**Status:** Approved (in-tree review; marked `Design` in ROADMAP)
**Author:** The Shade team
**Reviewer target:** external crypto-oriented reviewer before production deploy.
**Implementation target:** `@shade/key-transparency` + extensions in
`@shade/server`, `@shade/transport`, `@shade/sdk`.
---
## 1. Goals and non-goals
### Goals
Replace "blind trust in the prekey server" with a **verifiable
append-only log**. When a client receives a prekey bundle, it should have
cryptographic proof that:
1. The bundle is **committed** in a timestamped log (Signed Tree Head).
2. The exact (address, identityKey, signedPreKey) mapping is present in
that log, _or_ it is not (absence proof).
3. The log has not rewritten history since the previous fetch
(consistency proof).
4. Other clients see the **same** log (split-view detection via
witness gossip).
This is **CT-style transparency** (RFC 6962 principles) adapted to
prekey distribution.
### Non-goals (explicitly out)
- **Federated log across multiple prekey servers.** Each Shade deployment
has one log (or none). Multi-server gossip is V3.13+.
- **Fully solving MITM on first contact.** KT catches split-view and
rewrites, but not an attacker publishing a forged
identity at first registration. That is V3.3 (fingerprint gate)
+ V3.10 (social recovery).
- **Legal/compliance audit log.** The log is cryptographic, not legal.
- **Client-controlled deletion.** Append-only: DELETE writes a
tombstone entry and does not remove history.
### Decision criterion for implementation
Once this note is approved _and_ every open question under §11 has a
concrete answer (not just "we'll figure it out later"), code can be
written. The note as it stands when §11 is closed is what we build.
---
## 2. Threat-model addendum
The existing THREAT-MODEL.md covers the prekey server as "honest-but-curious"
with TOFU in place. KT extends the model to a **fully malicious server**:
| Attack | Pre-V3.12 | Post-V3.12 |
|---|---|---|
| Server returns the wrong bundle to a single client | Undetected until OOB verification | Client can request a proof; a mismatch is detected |
| Server swaps an already registered identityKey | TOFU fingerprint changes → V3.3 gate fires (but user-initiated) | The log shows two entries for the same address → witness detects it |
| Server gives Bob and Charlie different identityKeys for `alice` (split-view) | Undetected until OOB | Witness gossip reveals two different STHs |
| Server rewrites history to hide earlier misbehavior | Possible | Consistency proof fails → client raises an alert |
| Server refuses to publish a new STH | Possible | A "stale STH" is detected by the freshness check (max age) |
| Server's STH signing key is compromised | KT guarantees broken | Witnesses gossip about the old STH chain; rotation requires a new genesis |
KT does **not** solve:
- First-time impersonation of a brand-new address (no historical
evidence).
- Collusion between the server and _all_ witnesses.
- A client that forgets its cached STH and has to re-bootstrap.
---
## 3. Data-structure choice
We choose an **RFC 6962-style append-only Merkle log** + an **external
address index** with commitment proofs. Rationale:
### Alternatives considered
1. **Pure CT log (RFC 6962):** Simple append-only Merkle tree.
Inclusion proofs are trivial. Absence proofs are _not_ supported
natively (the whole log must be scanned).
2. **CONIKS tree (sparse Merkle tree over addresses):** Native absence proofs,
but far more complex (epoch-based snapshots, prefix trees,
placeholder nodes). Overkill for a first iteration.
3. **Hybrid (RFC 6962 log + side index):** The log is the source of truth;
the index is a _commitment_ mapping `address → leaf_index`. The server
proves inclusion via the leaf path, and absence via "this address is not
in the index at tree_size T" + a signed STH.
**Choice: alternative 3.** It gives CT-style simplicity, plus absence proofs
almost for free (the commitment to the index is part of every STH).
### Concrete format
#### Leaf
Each leaf represents one registration or revocation:
```
leaf = SHA256(
0x00 || // leaf prefix (RFC 6962)
uint64_be(timestamp_ms) ||
byte(operation) || // 0x01 register, 0x02 replenish, 0x03 delete
uint16_be(len(address)) || address_bytes ||
uint16_be(len(bundle_hash)) || bundle_hash // 32 bytes SHA-256 over canonical bundle
)
```
`bundle_hash` is the deterministic hash of:
```
canonical_bundle = SHA256(
0x01 || // bundle prefix
identitySigningKey (32) ||
identityDHKey (32) ||
uint32_be(signedPreKey.keyId) ||
signedPreKey.publicKey (32) ||
signedPreKey.signature (64)
)
```
One-time prekeys are **not** part of the bundle hash: they are ephemeral and
would leak OTP rotation patterns.
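The leaf and bundle encodings above map directly onto a few big-endian encoders plus SHA-256. The sketch below uses Node's built-in `node:crypto`; the `SignedBundle` shape, the helper names, and the choice of UTF-8 for `address_bytes` are assumptions made for illustration, not confirmed details of `@shade/key-transparency`.

```typescript
import { createHash } from "node:crypto";

// Big-endian integer encoders for the uint16_be / uint32_be / uint64_be fields.
function u16(n: number): Buffer { const b = Buffer.alloc(2); b.writeUInt16BE(n); return b; }
function u32(n: number): Buffer { const b = Buffer.alloc(4); b.writeUInt32BE(n); return b; }
function u64(n: bigint): Buffer { const b = Buffer.alloc(8); b.writeBigUInt64BE(n); return b; }

function sha256(...parts: Uint8Array[]): Buffer {
  const h = createHash("sha256");
  for (const p of parts) h.update(p);
  return h.digest();
}

// Hypothetical bundle shape holding only the fields covered by the hash.
interface SignedBundle {
  identitySigningKey: Uint8Array; // 32 bytes
  identityDHKey: Uint8Array;      // 32 bytes
  signedPreKey: { keyId: number; publicKey: Uint8Array; signature: Uint8Array };
}

// SHA256(0x01 || signing key || DH key || keyId || prekey || signature)
function bundleHash(b: SignedBundle): Buffer {
  return sha256(
    Buffer.from([0x01]),
    b.identitySigningKey,
    b.identityDHKey,
    u32(b.signedPreKey.keyId),
    b.signedPreKey.publicKey,
    b.signedPreKey.signature,
  );
}

type Operation = 0x01 | 0x02 | 0x03; // register, replenish, delete

// SHA256(0x00 || timestamp || operation || len(address) || address || len(hash) || hash)
function leafHash(timestampMs: bigint, operation: Operation, address: string, bundleDigest: Uint8Array): Buffer {
  const addressBytes = Buffer.from(address, "utf8"); // assumption: UTF-8 addresses
  return sha256(
    Buffer.from([0x00]),
    u64(timestampMs),
    Buffer.from([operation]),
    u16(addressBytes.length), addressBytes,
    u16(bundleDigest.length), bundleDigest,
  );
}
```

The length prefixes keep the encoding unambiguous: two different (address, hash) pairs can never serialize to the same byte string.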
#### Tree
Merkle tree over the leaf array, RFC 6962 §2.1:
- `MTH(empty) = SHA256()`
- `MTH({d}) = SHA256(0x00 || d)` (already hashed leaf)
- `MTH(D[n]) = SHA256(0x01 || MTH(D[0:k]) || MTH(D[k:n]))` where
`k` is the largest power of two < n.
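The three rules translate into a short recursive function. This is a direct, unoptimized sketch of the RFC 6962 §2.1 definition; `merkleTreeHash` and `splitPoint` are illustrative names, and a production log would cache subtree hashes rather than recompute them.

```typescript
import { createHash } from "node:crypto";

function sha256(...parts: Uint8Array[]): Buffer {
  const h = createHash("sha256");
  for (const p of parts) h.update(p);
  return h.digest();
}

// Largest power of two strictly smaller than n (requires n >= 2).
function splitPoint(n: number): number {
  let k = 1;
  while (k * 2 < n) k *= 2;
  return k;
}

// Merkle Tree Hash over an ordered list of entries.
function merkleTreeHash(entries: readonly Uint8Array[]): Buffer {
  if (entries.length === 0) return sha256();                              // MTH(empty)
  if (entries.length === 1) return sha256(Buffer.from([0x00]), entries[0]); // leaf prefix
  const k = splitPoint(entries.length);
  return sha256(
    Buffer.from([0x01]),                                                  // node prefix
    merkleTreeHash(entries.slice(0, k)),
    merkleTreeHash(entries.slice(k)),
  );
}
```

The distinct `0x00`/`0x01` prefixes provide domain separation, so a leaf can never be reinterpreted as an interior node.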
#### Signed Tree Head (STH)
```
sth = {
tree_size: uint64,
timestamp: uint64_ms,
root_hash: bytes(32),
index_root: bytes(32), // commitment to the address index at this tree_size
log_id: bytes(32), // SHA-256 of the server public key (stable ID)
signature: bytes(64) // Ed25519 over canonical(rest)
}
```
`canonical(sth)` for signing:
```
0x02 || // sth prefix
uint64_be(tree_size) ||
uint64_be(timestamp) ||
root_hash (32) ||
index_root (32) ||
log_id (32)
```
#### Inclusion proof
Standard RFC 6962 audit path: the list of sibling hashes from leaf to root,
so that the client recomputes the root and compares it with the STH.
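A compact way to see how the audit path works is to generate it with the recursive `PATH` definition from RFC 6962 §2.1.1 and verify it by folding the siblings back up. This is a readable sketch rather than an optimized implementation, and all function names are illustrative.

```typescript
import { createHash } from "node:crypto";

function sha256(...parts: Uint8Array[]): Buffer {
  const h = createHash("sha256");
  for (const p of parts) h.update(p);
  return h.digest();
}
const leafHash = (d: Uint8Array): Buffer => sha256(Buffer.from([0x00]), d);
const nodeHash = (l: Uint8Array, r: Uint8Array): Buffer => sha256(Buffer.from([0x01]), l, r);

// Largest power of two strictly smaller than n (requires n >= 2).
function splitPoint(n: number): number { let k = 1; while (k * 2 < n) k *= 2; return k; }

function treeHash(entries: readonly Uint8Array[]): Buffer {
  if (entries.length === 0) return sha256();
  if (entries.length === 1) return leafHash(entries[0]);
  const k = splitPoint(entries.length);
  return nodeHash(treeHash(entries.slice(0, k)), treeHash(entries.slice(k)));
}

// Server side: sibling hashes for entry m, ordered from leaf level to root level.
function auditPath(m: number, entries: readonly Uint8Array[]): Buffer[] {
  if (entries.length <= 1) return [];
  const k = splitPoint(entries.length);
  return m < k
    ? [...auditPath(m, entries.slice(0, k)), treeHash(entries.slice(k))]
    : [...auditPath(m - k, entries.slice(k)), treeHash(entries.slice(0, k))];
}

// Client side: rebuild the root from the leaf hash and the path (null if malformed).
function rootFromPath(index: number, size: number, leaf: Buffer, path: readonly Buffer[]): Buffer | null {
  if (size === 1) return path.length === 0 ? leaf : null;
  if (path.length === 0) return null;
  const sibling = path[path.length - 1]; // outermost sibling is last in the path
  const rest = path.slice(0, -1);
  const k = splitPoint(size);
  if (index < k) {
    const left = rootFromPath(index, k, leaf, rest);
    return left && nodeHash(left, sibling);
  }
  const right = rootFromPath(index - k, size - k, leaf, rest);
  return right && nodeHash(sibling, right);
}

function verifyInclusion(index: number, size: number, leaf: Buffer, path: readonly Buffer[], expectedRoot: Buffer): boolean {
  if (index < 0 || index >= size) return false;
  const root = rootFromPath(index, size, leaf, path);
  return root !== null && root.equals(expectedRoot);
}
```

The client only needs `leaf_index`, `tree_size`, the leaf hash, and the path; it never sees the other leaves.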
#### Consistency proof
Standard RFC 6962 §2.1.2: a proof that the tree with `tree_size = N1` is a prefix
of the tree with `tree_size = N2 > N1`. The client uses this to detect
rewrites.
#### Absence proof
The address index is a sorted list of `(address, leaf_index_of_latest)`,
serialized and hashed. `index_root` in the STH is the commitment.
To prove the absence of address `addr` at tree_size `N`:
- The server returns the entire index at tree_size `N` (sorted), or
- (more efficiently:) it returns the neighbor pair `(addr_prev, addr_next)` where
`addr_prev < addr < addr_next` lexicographically, together with a
Merkle path in a sparse Merkle tree over the index.
First iteration: we serialize the whole index and let the client
download it (compact: <100 KB even for 100k addresses). Later we
optimize to a sparse Merkle tree if the dataset grows.
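For the first-iteration "ship the whole index" variant, the client-side check reduces to recomputing the commitment and scanning for the address. The serialization used here (UTF-8 address with a 16-bit length prefix, followed by a 64-bit leaf index) is an assumption, since the note does not fix a wire format; `IndexEntry`, `indexRoot` and `verifyAbsence` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

interface IndexEntry { address: string; leafIndex: bigint }

// Commitment over the serialized index (assumed layout, see lead-in).
function indexRoot(entries: readonly IndexEntry[]): Buffer {
  const h = createHash("sha256");
  for (const e of entries) {
    const addr = Buffer.from(e.address, "utf8");
    const len = Buffer.alloc(2); len.writeUInt16BE(addr.length);
    const idx = Buffer.alloc(8); idx.writeBigUInt64BE(e.leafIndex);
    h.update(len).update(addr).update(idx);
  }
  return h.digest();
}

// The index must be strictly ascending by address bytes (no duplicates).
function isStrictlySorted(entries: readonly IndexEntry[]): boolean {
  for (let i = 1; i < entries.length; i++) {
    const prev = Buffer.from(entries[i - 1].address, "utf8");
    const curr = Buffer.from(entries[i].address, "utf8");
    if (Buffer.compare(prev, curr) >= 0) return false;
  }
  return true;
}

// True only if the index matches the signed commitment AND lacks the address.
function verifyAbsence(address: string, entries: readonly IndexEntry[], expectedIndexRoot: Buffer): boolean {
  if (!isStrictlySorted(entries)) return false;
  if (!indexRoot(entries).equals(expectedIndexRoot)) return false;
  return !entries.some((e) => e.address === address);
}
```

Checking the commitment first matters: without it, a server could simply omit the entry it wants to hide.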
---
## 4. Freshness proofs (Signed Tree Heads)
### Frequency
- **Min:** A new STH on every mutation (register/replenish/delete), synchronously
in the write path.
- **Max stale:** Even without mutations, an STH must be published at least every
**10 minutes** ("heartbeat STH": same tree_size, updated timestamp).
This lets clients detect a "dead" log without having to care
whether the log has actually changed.
### Client acceptance window
The client rejects any STH older than `now - 24 hours` (default, configurable).
This protects against replay of old STHs that "hide" a mutation
made afterwards.
### Stale STH as soft-fail
If the STH is stale but validly signed: the client logs a warning and
returns the bundle with `proof.staleness = 'warn'` (V1), or blocks
(V2, after dogfooding). We start with _warn_ and escalate to _block_
once a witness ecosystem is established.
---
## 5. Client verification steps
On every `fetchBundle(address)`:
1. The server returns `{ bundle, proof: { sth, leaf, audit_path, leaf_index, address_index_proof } }`.
2. The client verifies:
- `sth.signature` against the known `log_public_key` (pinned at first
bootstrap).
- `sth.timestamp >= now - max_age_ms` (default 24h).
- Recomputes `leaf_hash` from the bundle and compares it with `proof.leaf`.
- Recomputes `root_hash` from `audit_path + leaf` and compares it with
`sth.root_hash`.
- Verifies `address_index_proof` against `sth.index_root`.
3. If the client has a cached previous STH: check the **consistency proof**
between the previous one and this one. The server publishes it at
`GET /v1/kt/consistency?from=<size>&to=<size>`.
4. If the client has a cached STH for the same `tree_size` with a different
`root_hash` → **split-view alarm**.
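The freshness check and the same-size split-view check from the steps above can be sketched as a small cache. This is a simplified illustration that assumes the STH signature has already been verified; `SthCache`, `ObservedSth` and the verdict strings are hypothetical names, and the eviction policy is left out.

```typescript
// Hypothetical, already signature-verified view of an STH.
interface ObservedSth {
  logId: string;       // hex or base64 of the 32-byte log_id
  treeSize: number;
  rootHashHex: string;
  timestampMs: number;
}

type SthVerdict = "ok" | "stale" | "split-view";

class SthCache {
  // Maps "logId:treeSize" to the root hash first seen for that size.
  private readonly rootsBySize = new Map<string, string>();
  private readonly maxAgeMs: number;

  constructor(maxAgeMs: number = 86_400_000) { // default: 24 hours
    this.maxAgeMs = maxAgeMs;
  }

  check(sth: ObservedSth, nowMs: number): SthVerdict {
    const key = `${sth.logId}:${sth.treeSize}`;
    const known = this.rootsBySize.get(key);
    // Same tree size but a different root: the log shows diverging histories.
    if (known !== undefined && known !== sth.rootHashHex) return "split-view";
    this.rootsBySize.set(key, sth.rootHashHex);
    // Validly signed but too old: soft-fail so the caller can warn or block.
    return sth.timestampMs < nowMs - this.maxAgeMs ? "stale" : "ok";
  }
}
```

A heartbeat STH (same `tree_size`, newer timestamp, same root) passes this check, while a second root for an already seen size is reported regardless of its age.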
### Probabilistic vs. mandatory verification
We choose **mandatory** verification on every bundle fetch. Bundle fetches are rare
(per new peer, not per message), and the cost is <100 ms. Probabilistic
verification would let a client be fooled by "one bad fetch" without
detection.
### Bootstrap
The first time a client encounters a log, it pins `log_public_key` after
fetching it from an **out-of-band** pinning endpoint or from `Shade.config`
(the operator ships it with the client config). Subsequent rotation requires
a new genesis STH with an explicit break event signed by the previous key.
---
## 6. Witness/auditor role
### What a witness does
- Periodic poll: `GET /v1/kt/sth` (fetch the latest STH).
- Stores every observed STH in an append-only local store.
- Exposes `GET /witness/sth?log_id=...&tree_size=...` so that other
clients can compare what _this_ witness has seen.
- Verifies consistency between each new STH and the previous one.
### Client witness gossip
The client library can operate in three modes:
1. **Observe-only:** verifies only the bundles it fetches itself, no
gossip.
2. **Light-witness:** polls the STH every `X` hours and stores it locally; compares
it with the STH delivered at bundle fetch.
3. **Full-witness:** publishes signed STH observations to a
configured peer list or public endpoint.
V1 ships modes 1 and 2. Mode 3 (full-witness publication protocol) is V2,
if the ecosystem needs it.
### Who runs witnesses?
- The Shade project runs a **reference witness** on a public endpoint
(separate from the prekey server).
- Power users / operators can run their own via the
`@shade/key-transparency/witness` API.
- Third-party auditors (typically security researchers) are invited.
We do **not** require federation/consensus between witnesses in V1; gossip
is purely "have you seen the same STH as me?".
---
## 7. Operator cost
### Storage
- **Per leaf:** 32 bytes (hash) + ~80 bytes address-index entry =
~112 bytes.
- **100k addresses, 1 rotation/year, 1 replenish/week:** ~5.4M leaves =
~600 MB log. The tree structure is computed on demand, not stored.
- **Index:** ~100k × 80 B = 8 MB in memory (cacheable).
### CPU
- STH signing: 1 Ed25519 signature per mutation + heartbeat = <1k/day for
small deployments. Trivial.
- Audit-path computation: O(log N) per fetch. <1 ms.
- Consistency proof: O(log N).
### Backup
The log MUST never lose data: deletion or corruption destroys
integrity permanently. Strategy:
- The log is stored as an append-only table `shade_kt_log` (PG) with
`(leaf_index, leaf_hash, leaf_data_json)`.
- Hourly backups + WAL shipping recommended.
- On corruption: see §10 Recovery.
### STH signing key
- Generated at first KT activation, stored in an operator-controlled secret
(env, KMS, or on disk for a home server).
- Rotation is a **breaking event**: it requires a new genesis STH in which the
new key's "rotated from ${old_key}" message is signed with the _old_ key. The client
must explicitly accept the rotation.
---
## 8. Migration
### Server side
KT is **opt-in** at the operator level: `createPrekeyServer({ keyTransparency:
{ enabled, store, signingKey } })`. When it is turned on:
1. The server writes all existing identities into the log as genesis leaves
at boot.
2. The first STH is published with `tree_size = N`, where N is the number of
existing addresses.
3. A client that fetches a bundle gets a proof; a client that does not support KT
ignores the proof field (forward-compatible).
### Client side
`@shade/sdk` config:
```ts
createShade({
keyTransparency: {
mode: 'observe' | 'light-witness' | 'off',
logPublicKey: '<base64>',
maxStaleMs: 86_400_000,
},
// ...
})
```
`mode: 'off'` (the default in the first release, for backward compatibility) ignores the
proof. A new SDK with `mode: 'observe'` verifies but does not hard-fail
if the proof is missing. `mode: 'observe-strict'` (later) requires a proof.
### Existing deployments
An operator can roll KT out on a live server without a client update:
1. Turn on KT in the server config → the server starts producing proofs.
2. Old clients ignore the proof fields (they are additive in the bundle response).
3. New clients with `mode: 'observe'` start verifying.
4. Once the operator has tested and published the log public key OOB, users can
switch to `'light-witness'`.
---
## 9. Acceptance criteria
- [ ] The `@shade/key-transparency` package delivers:
  - Merkle log core (RFC 6962 hash functions).
  - STH signing/verification.
  - Inclusion proof generation + verification.
  - Consistency proof generation + verification.
  - Address index with commitment.
  - Light-witness client.
  - Cross-platform (TS-only, no native deps).
- [ ] `@shade/server` integration:
  - `KTLogStore` interface (memory + postgres).
  - Routes: `GET /v1/kt/sth`, `GET /v1/kt/sth/:tree_size`,
    `GET /v1/kt/consistency`, `GET /v1/kt/inclusion/:address`.
  - Bundle fetch returns `{ bundle, proof }` when KT is enabled.
  - Heartbeat STH publication every 10 minutes (configurable).
- [ ] `@shade/transport` `ShadeFetchTransport`:
  - Accepts an optional `keyTransparency` verifier.
  - `fetchBundle()` returns `{ bundle, proof?: KTProof }`.
- [ ] `@shade/sdk` `Shade`:
  - `keyTransparency` config.
  - Verifies the proof on every bundle fetch when enabled.
  - Caches STHs for split-view detection.
- [ ] **End-to-end test: split-view detection.**
  - The test server gives Bob bundle X and Charlie bundle Y for the same address `alice`.
  - Bob and Charlie run as light witnesses and gossip STHs.
  - The test asserts that the mismatch is detected within N polls.
- [ ] **End-to-end test: log rewrite detection.**
  - The server rewrites history (test-only API).
  - The consistency proof fails on the next fetch.
- [ ] Operator docs cover the recovery strategy.
- [ ] CHANGELOG, README, ROADMAP updated.
- [ ] Cross-platform vector test for Merkle hash + STH (Android/TS
  parity, same as the V3.5 tradition).
---
## 10. Recovery
### Log corruption
If log data is lost (disk failure before backup), it **cannot be restored
without losing integrity**; that is the whole point.
Recovery procedure:
1. The operator publishes a "log-restart" event signed with the STH key.
2. The genesis STH is regenerated with a new `log_id` (= new public key
or an explicit version).
3. Clients that hold cached STHs from the old log are alerted by the
explicit discrepancy in `log_id`.
4. Users who are concerned must OOB-verify identities (the V3.3 gate
is triggered automatically on fingerprint rotation).
### Stale signing key
If the STH key leaks: rotation requires a break event (§7). Until
users accept the new key, the client library behaves as if the STH
were missing (soft-fail in `observe` mode, blocks in `observe-strict`).
---
## 11. Open questions (closed before code)
| Question | Answer |
|---|---|
| How is `log_public_key` distributed to the client the first time? | The operator embeds it in `Shade.config` at app init. OOB pinning is the fallback. |
| Should one-time prekeys be part of the bundle hash? | No: they are ephemeral, and their rotation would fill the log with noise. |
| Conflict: STH per mutation vs. batched STH? | Per mutation. Heartbeat every 10 min regardless. Batching is considered as an optimization if throughput becomes a problem (not now). |
| What happens on replenish (OTPs added only)? | Nothing is written to the log (bundle hash unchanged). The heartbeat STH covers freshness. |
| What about DELETE? | Writes a tombstone leaf with `operation = 0x03`. The identity in the index is marked as "deleted at tree_size N". |
| Sparse Merkle tree for the index proof? | Later. V1 uses the whole index in the absence proof. <100 KB at 100k addresses is acceptable. |
| Client cache eviction policy for STHs? | LRU on `log_id`, last-N (default 100). The client _always_ keeps the most recently seen STH. |
| Witness publication protocol? | V1 is poll-only (`GET /witness/sth`); push publication is V2. |
All open questions have concrete answers. Implementation can start.
---
## 12. Package structure
```
packages/shade-key-transparency/
├── package.json # @shade/key-transparency, v0.4.0
├── src/
│ ├── index.ts # Public exports
│ ├── hashes.ts # RFC 6962 leaf/node hashing
│ ├── log.ts # MerkleLog (in-memory) + audit-path
│ ├── consistency.ts # Consistency-proof gen/verify
│ ├── sth.ts # STH sign / verify / canonical bytes
│ ├── index-tree.ts # Address index commitment
│ ├── proof.ts # KTProof type + bundle-proof verifier
│ ├── store.ts # KTLogStore interface (server-side)
│ ├── memory-store.ts # In-memory KTLogStore
│ ├── witness.ts # Light-witness client
│ └── errors.ts # KT-specific error types
└── tests/
├── hashes.test.ts
├── log.test.ts # RFC 6962 test vectors
├── consistency.test.ts
├── sth.test.ts
├── index-tree.test.ts
├── proof.test.ts
└── split-view.test.ts # End-to-end split-view detection
```
Server integration in `@shade/server`:
```
packages/shade-server/src/
├── kt-routes.ts # /v1/kt/* routes
├── kt-integration.ts # Hook bundle-fetch + register/delete to log
└── ...
```
Postgres implementation in `@shade/storage-postgres`:
```
packages/shade-storage-postgres/src/
├── postgres-kt-store.ts # KTLogStore on PG
└── ...
```
Client integration in `@shade/transport` + `@shade/sdk`:
```
packages/shade-transport/src/
├── kt-verifier.ts # Proof-verifier for fetchBundle
└── ...
packages/shade-sdk/src/
├── kt.ts # Shade.keyTransparency config + cache
└── ...
```
---
## 13. Test strategy
1. **RFC 6962 test vectors:** import known vectors from
<https://datatracker.ietf.org/doc/html/rfc6962#appendix-A>.
2. **Property tests (fast-check):** for every tree_size N and every
leaf index i: `verify(audit_path(i, N), leaf, sth) === true`.
3. **Consistency-proof property tests:** for N1 < N2:
`verify_consistency(proof, sth1, sth2) === true`.
4. **Split-view e2e:** two clients, a malicious test server, and witness
gossip that detects the mismatch.
5. **Rewrite-detection e2e:** the server mutates log history, and on its
next fetch the client gets a consistency proof that fails.
6. **Cross-platform:** Android (Kotlin) + TS produce the same leaf hash for
the same bundle (V3.5 parity is a prerequisite, so this must also go
through the Kotlin port; the first V3.12 release covers TS, and the Android
port is V3.13).
7. **Stale STH:** the client rejects an STH older than max_age.
8. **Bootstrap pinning:** the client fails if log_public_key does not match.
## 14. Security assessment
- **False sense of security if half-done:** Mitigated by making the default mode `'off'`;
only _explicitly_ enabled KT gives stronger guarantees. The documentation
stresses that `'observe'` is observation, not obstruction, until
the ecosystem is established.
- **Server-side mutability of history:** Mitigated by `KTLogStore`
exposing only `append()`, with no `update()`/`delete()` on historical leaves.
The PG table has a CHECK constraint and BEFORE triggers for extra defense
in depth (see §7).
- **STH key compromise:** documented in §10. Operator responsibility.
- **DoS via massive index proofs:** in V1 the index proof is the whole index.
100 KB per fetch is manageable; the rate limiter covers the excess.
- **Replay of an old proof:** the STH timestamp + max_age protect against it.
---
## 15. Approval
Once this note has been reviewed (in-tree review is enough to commit
the first implementation; external crypto review is a pre-deploy requirement per
V3.12 §"Pre-requisite: design note"), implementation can start.
**Implementation order** (all commits in the same branch):
1. `@shade/key-transparency` core (Merkle log, STH, proofs).
2. Server integration (`@shade/server` + memory/postgres KTLogStore).
3. Client integration (`@shade/transport` verifier + `@shade/sdk` config).
4. Light witness + e2e split-view test.
5. Operator docs + CHANGELOG + README + ROADMAP.
— end of design —

docs/archive/V3.12.md Normal file

@@ -0,0 +1,99 @@
# Shade V3.12: Key Transparency
**Status:** Done (0.4.0). Design note: `docs/V3.12-DESIGN.md`.
Operator/recovery guide: `docs/key-transparency.md`.
**Effort:** XXL (4+ months, multi-quarter)
**Previous:** V3.5 (main platforms stable first)
**Addresses:** V2.3 §1A
---
## Goal
Reduce trust in the prekey server from "blind trust" to "verifiable log".
When the server hands out a bundle, it must be cryptographically committed in
an **append-only log** that clients (or third-party auditors) can
verify. A split-view attack, where the server shows different bundles to different
clients, is caught by gossip.
---
## Pre-requisite: design note
**No code before this is reviewed and approved:**
- Threat-model addendum: what CT/attestation actually solves, and what remains open.
- Data-structure choice: append-only Merkle log (CT-style), CONIKS tree, or
hybrid.
- Freshness proofs: how often signed tree heads are issued; what counts as "stale"?
- Client verification steps: must the client verify on every bundle fetch,
or probabilistically?
- Witness/auditor role: who runs them? How does gossip between clients work?
- Operator cost: log size, signing frequency, backup strategy.
- Migration: existing prekey server → log-extended.
The design note is a `docs/V3.12-DESIGN.md` PR that must be reviewed by at least one
external crypto-oriented reviewer.
---
## Possible scope (after the design note)
### In (estimate)
- Append-only log as an addition to the prekey server.
- Inclusion proof at bundle fetch (Merkle path).
- Absence proof for "this address has not registered since timestamp T".
- Signed tree heads (STH) published at a fixed interval.
- Client library: `@shade/key-transparency` with verification.
- Witness API: a third-party auditor can fetch STHs and log gossip.
### Out (explicit)
- Federated log (multi-server gossip): too big for a first iteration.
- The legal/compliance side of the audit log.
- "We fully solve MITM on first contact": KT alone catches split-view, not
first contact.
---
## Risk assessment
KT is the **single hardest item** on the whole roadmap:
1. **Half-implemented KT is worse than no KT**: it gives a false sense of security,
and users stop verifying OOB.
2. Operationally complex: the log must never rewrite history. A single
restart bug = broken integrity.
3. Client verification logic must run on every bundle fetch, or
risk that one "old" client gets fooled.
4. A witness ecosystem requires independent actors; Shade alone cannot
guarantee that.
**Decision criterion:** If the design note leaves open "how do we
handle X?" questions without clear answers, park V3.12. The pragmatic
alternative is **V3.3 (fingerprint gate)** + **V3.10 (social recovery)**,
which together give 80% of the MITM protection without the KT complexity.
---
## Acceptance criteria (if implemented)
- [ ] Design note has passed external review.
- [ ] Client detects split-view in an end-to-end test (server gives two
versions of the same address → client catches the mismatch).
- [ ] Witness API tested with at least one external auditor instance.
- [ ] Operator docs cover recovery if the log is corrupted.
---
## Dependencies
- V3.5: Android/TS parity must be solid before we add a new
verification layer.
---
## Migration
Fully opt-in. Operators who do not want KT keep running unchanged.

docs/archive/V3.2.md Normal file

@@ -0,0 +1,146 @@
# Shade V3.2: At-Rest Storage Encryption
**Status:** Implemented (0.4.0): `@shade/storage-encrypted`, `@shade/keychain`,
`shade migrate-storage`, `shade rotate-storage-key`
**Effort:** L (4–8 weeks)
**Previous:** V3.1
**Addresses:** V2.1 §2
---
## Goal
Opt-in protection of sensitive state (identity keys, session state, optionally the
stream-resume secret) with keys that are **not** stored in plaintext in the database.
The threat model currently states explicitly that a stolen DB exposes private
keys; this solves that for deployments that choose to enable it.
---
## Scope
### In
- New `EncryptedStorageProvider` wrapper that decorates `SQLiteStorage` /
`PostgresStorage`.
- Per-row AES-256-GCM on sensitive fields (`identity_*`, `session_*`,
optionally `stream_state.streamSecret`).
- KDF plugin (default `scrypt` from `@noble/hashes`) for a passphrase-based
master key.
- Three key sources out of the box:
1. **Passphrase + KDF**: the developer supplies the secret at startup.
2. **OS keychain**: macOS Keychain, Linux libsecret, Windows Credential
Vault (Node-only).
3. **App-injected key**: the app's own code supplies a 32-byte key (most
flexible).
- Migration CLI: `shade migrate-storage --encrypt --key-source=...`.
- Threat-model update: "when enabled, what is still uncovered": memory
compromise, swap, runtime tap.
### Out
- Browser/IndexedDB at-rest (separate package, considered after V3.8).
- HSM/Secure Enclave (separate driver later).
- "Always-on by default": we ship opt-in so as not to break existing
deployments.
---
## Design
### Encryption unit
- Per-row AEAD: `nonce(12) || ciphertext || tag(16)`.
- `nonce = HKDF(rowKey, "shade-row-nonce-v1" || tableName || pk)[..12]`,
deterministic per (table, pk) to avoid nonce reuse without storing the nonce
separately. A change of (table, pk) → re-encryption.
- The AAD binds `tableName || columnName || pk`, so field swapping is blocked.
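To illustrate the `nonce || ciphertext || tag` layout and the AAD binding, here is a minimal AES-256-GCM sketch using `node:crypto`. Two things are assumptions rather than the documented design: the sketch draws a random nonce instead of the HKDF-derived one described above, and it encodes the AAD as a JSON array to keep the three components unambiguous. `CellRef`, `sealCell` and `openCell` are hypothetical names.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Identifies the cell a ciphertext belongs to; used only as AAD.
interface CellRef { table: string; column: string; pk: string }

// Assumed AAD encoding: a JSON array avoids ambiguity between e.g.
// ("ab", "c") and ("a", "bc") that plain concatenation would have.
function aadFor(ref: CellRef): Buffer {
  return Buffer.from(JSON.stringify([ref.table, ref.column, ref.pk]), "utf8");
}

// Returns nonce(12) || ciphertext || tag(16).
function sealCell(fieldKey: Uint8Array, ref: CellRef, plaintext: Uint8Array): Buffer {
  const nonce = randomBytes(12); // swapped in for the derived nonce in this sketch
  const cipher = createCipheriv("aes-256-gcm", fieldKey, nonce);
  cipher.setAAD(aadFor(ref));
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return Buffer.concat([nonce, ciphertext, cipher.getAuthTag()]);
}

// Throws if the key, the ciphertext, or the (table, column, pk) binding is wrong.
function openCell(fieldKey: Uint8Array, ref: CellRef, sealed: Buffer): Buffer {
  const nonce = sealed.subarray(0, 12);
  const tag = sealed.subarray(sealed.length - 16);
  const ciphertext = sealed.subarray(12, sealed.length - 16);
  const decipher = createDecipheriv("aes-256-gcm", fieldKey, nonce);
  decipher.setAAD(aadFor(ref));
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}
```

Copying a sealed value into another column or row changes the AAD seen at decryption time, so `openCell` fails instead of returning the moved plaintext.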
### Key hierarchy
```text
masterKey (from source: passphrase / keychain / app-injected)
├─ HKDF("shade-storage-v1") → storageKey (32 bytes)
│ │
│ └─ HKDF(storageKey, table || column) → fieldKey
└─ HKDF("shade-storage-version-v1") → version key (rotation)
```
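The hierarchy can be sketched with Node's `hkdfSync`. Several details are assumptions, because the diagram does not pin them down: SHA-256 as the HKDF hash, an empty salt, the labels passed as the `info` parameter, and a NUL separator between table and column names. The function names are hypothetical.

```typescript
import { hkdfSync } from "node:crypto";

// HKDF-SHA256 with an empty salt and a 32-byte output (assumed parameters).
function hkdf32(inputKey: Uint8Array, info: string): Buffer {
  return Buffer.from(hkdfSync("sha256", inputKey, Buffer.alloc(0), Buffer.from(info, "utf8"), 32));
}

// masterKey → storageKey
function deriveStorageKey(masterKey: Uint8Array): Buffer {
  return hkdf32(masterKey, "shade-storage-v1");
}

// storageKey → fieldKey, one key per (table, column).
// The "\u0000" separator is an assumption that keeps ("ab", "c") and ("a", "bc") distinct.
function deriveFieldKey(storageKey: Uint8Array, table: string, column: string): Buffer {
  return hkdf32(storageKey, `${table}\u0000${column}`);
}

// masterKey → version key used for rotation bookkeeping.
function deriveVersionKey(masterKey: Uint8Array): Buffer {
  return hkdf32(masterKey, "shade-storage-version-v1");
}
```

Because every field key is derived rather than stored, rotating the master key only requires re-deriving the tree and re-encrypting the rows.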
### Migration
1. The CLI reads the unencrypted DB.
2. It writes the rows, encrypted one by one, to a new `_v2` table.
3. Atomic rename + drop of the old table.
4. A `.bak` backup file is left in the same directory.
### Rotation
- `shade rotate-storage-key --new-source=...` re-encrypts with a new masterKey.
- Online ratchet (read with old, write with new) for large DBs.
---
## Deliverables
### Packages
- New module: `@shade/storage-encrypted` (re-export over SQLite/PG).
- Extension in `@shade/cli`: `migrate-storage`, `rotate-storage-key`.
- Helper package: `@shade/keychain` (Node-only, optional peer dep) for the OS keychain.
### Tests
- Unit: KDF derivation, nonce determinism, AAD binding.
- Integration: full lifecycle on SQLite + PG; start/stop; crash during
migration.
- Tamper: bit flip in ciphertext / AAD / nonce → decryption failure.
- Vector file: cross-check the masterKey → fieldKey derivation against
`test-vectors/storage-encryption.json`.
### Documentation
- `docs/storage-encryption.md`: full guide.
- `THREAT-MODEL.md`: new column "with at-rest enabled".
- Migration note in `MIGRATION.md`.
---
## Acceptance criteria
- [ ] An existing unencrypted deployment continues without changes (opt-in).
- [ ] `shade migrate-storage --encrypt` migrates a live SQLite DB without
data loss, verified with a dump diff.
- [ ] Rotation can be done without downtime > 5 s for small DBs.
- [ ] Wrong passphrase / wrong key → clear error message, not a crash.
- [ ] Test vectors are shared with the Android implementation (V3.5 commits to
running the vector file there).
---
## Dependencies
- V3.1: `THREAT-MODEL.md` must be linked to the tests first, so we can
extend the table.
---
## Risks
**Data loss.** A migration that crashes halfway can leave a corrupt DB.
Mitigated by:
- Atomic rename + `.bak` file.
- Dry-run mode (`--dry-run` validates all decryption before writing).
- Refusing to start if the WAL has uncommitted writes.
**Key loss = total loss.** If the user loses the passphrase, there is no access.
Document this clearly, and point to V3.10 (Social Recovery) as the long-term solution.
---
## Migration
0.3.x deployments are unencrypted and stay unencrypted. Activation is a single
CLI command. Backwards-compatible.

docs/archive/V3.3.md Normal file

@@ -0,0 +1,147 @@
# Shade V3.3: Fingerprint Gates & Trust UX
**Status:** Done
**Effort:** M (2–4 weeks)
**Previous:** V3.1
**Addresses:** V2.3 §1B
**Implemented:** see `docs/trust-ux.md`
---
## Goal
Make safety numbers **action-gating**, not merely visible, in flows where
the MITM risk is real. Today there is a `FingerprintCompare` widget and
`requireFingerprintVerifiedFor` in `@shade/files`, but the main core
(`Shade.send`, first-large-file, backup-import) has no automatic gate.
The result: free of alert fatigue, but also free of gates.
This adds **explicit blocking verification** on a small number of
critical events, plus widget support for exposing it in the UI.
---
## Scope
### In: critical events
1. **Before the first large file:** `Shade.upload` over a byte threshold without
a verified peer.
2. **Before backup import:** `Shade.importBackup` blocks until the peer (or own
identity) is confirmed.
3. **New device with rotated identity:** `acceptIdentityChange` blocks on
first use until verified.
4. **Before `@shade/inbox` fan-out** (V3.6): gate per recipient.
### In: APIs
- `Shade.beforeFirstLargeFile(threshold, handler)`: the app gets a chance to
show a modal and return a confirmation.
- `Shade.beforeBackupImport(handler)`: same pattern.
- `Shade.beforeNewDeviceTrust(handler)`: ditto.
- `Shade.markPeerVerified(address)` / `Shade.isPeerVerified(address)`:
persistent state.
### In: widgets
- `<FingerprintGate />`: render-prop wrapper that blocks its children until
verified.
- `<FingerprintCompare />` is extended with "copy OOB text" + "I have
verified".
### Out
- "Force all peers verified before every message": alert fatigue.
- Cross-device sync of verified state (may come via the V3.6 inbox).
---
## Design
### Persistent verified state
New table `peer_verifications`:
```sql
CREATE TABLE peer_verifications (
peer_address TEXT PRIMARY KEY,
fingerprint TEXT NOT NULL,
verified_at INTEGER NOT NULL,
verified_by TEXT, -- "user" | "transitive" | "tofu-after-warning"
identity_version INTEGER NOT NULL -- ties the verification to identity rotation
);
```
When a peer rotates identity → `identity_version` is bumped → the verification is
"invalid" until re-verified.
### Hook flow
```text
shade.upload(peer, file)
├─ if !verified(peer) AND file.size > threshold
│ │
│ └─ await beforeFirstLargeFileHandler(peer, fingerprint)
│ ├─ true → markPeerVerified(peer); proceed
│ └─ false → throw FingerprintNotVerifiedError
└─ proceed
```
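The flow in the diagram can be sketched as a wrapper around the actual upload. This is an illustration under assumptions: `GateState`, `uploadWithGate` and the in-memory `Set` stand in for the SDK's persistent `peer_verifications` state, and only the error name `FingerprintNotVerifiedError` comes from the diagram.

```typescript
class FingerprintNotVerifiedError extends Error {
  constructor(peer: string) {
    super(`peer ${peer} was not verified`);
    this.name = "FingerprintNotVerifiedError";
  }
}

// The app-supplied handler shows a modal and resolves to the user's decision.
type GateHandler = (peer: string, fingerprint: string) => Promise<boolean>;

// Hypothetical in-memory stand-in for the persistent verified state.
interface GateState {
  verified: Set<string>;
  thresholdBytes: number;
  handler: GateHandler;
}

async function uploadWithGate(
  state: GateState,
  peer: string,
  fingerprint: string,
  fileSize: number,
  doUpload: () => Promise<void>,
): Promise<void> {
  // Gate only unverified peers, and only above the size threshold.
  if (!state.verified.has(peer) && fileSize > state.thresholdBytes) {
    const confirmed = await state.handler(peer, fingerprint);
    if (!confirmed) throw new FingerprintNotVerifiedError(peer);
    state.verified.add(peer); // corresponds to markPeerVerified(peer)
  }
  await doUpload();
}
```

Once a peer is marked verified, later large uploads skip the handler, which keeps prompts limited to the first critical event.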
---
## Deliverables
### Code
- `@shade/core`: `peer_verifications` table + storage methods.
- `@shade/sdk`: gate hooks + `markPeerVerified` / `isPeerVerified`.
- `@shade/widgets`: `<FingerprintGate />`, extended `<FingerprintCompare />`.
### Tests
- Unit: gate is called, gate is not called, return false → throw, return true → proceed.
- Integration: a file < threshold passes without the gate; a file > threshold
blocks.
- Identity rotation invalidates the verification.
- Backup import blocks.
### Documentation
- `docs/trust-ux.md`: guide to which gates exist and when they should be tuned.
---
## Acceptance criteria
- [ ] The gate cannot be bypassed by zeroing out `threshold`; a minimum gate always
exists for backup-import and new-device.
- [ ] An app with no registered gates gets sane defaults (logs a warning, but
keeps running, no crash).
- [ ] Identity rotation resets the verification in a tested end-to-end flow.
- [ ] The widget can be rendered under SSR without triggering the runtime gate.
---
## Dependencies
- V3.1: threat matrix updated to show which gates cover which
rows.
---
## Risks
- **Alert fatigue.** If thresholds are too low, the user clicks blindly.
Mitigate by setting default thresholds high (10 MiB for first-large-file)
and documenting a tuning guide.
- **DX friction.** Apps that are unaware of gates get unexpected prompts. Mitigate
by logging clearly at first activation: "Shade.beforeFirstLargeFile not
configured — using default modal".
---
## Migration
0.3.x apps get the defaults enabled with a warning. No breaking change.

docs/archive/V3.4.md Normal file

@@ -0,0 +1,124 @@
# Shade V3.4: Observability v2 (OpenTelemetry)
**Status:** Implemented (2026-05-02): `@shade/observability` 0.1.0,
hooked into sdk/transfer/server/files/core. Off by default; set
`SHADE_OTEL_ENABLED=1` to enable.
**Effort:** M (2–4 weeks)
**Previous:** V3.1
**Addresses:** V2.3 §4
---
## Mål
Gi produksjonsteam **distribuerte spor** rundt `TransferEngine`,
prekey-routes og `@shade/files` — uten å lekke plaintext-adresser, payloads
eller eksakte chunk-størrelser. Bygger videre på Prometheus-metrics som
allerede finnes.
---
## Scope
### In
- Opt-in OpenTelemetry instrumentation via `@opentelemetry/api`.
- Spans around:
  - `TransferEngine.upload` / `.download` (with lane tags, retry counts).
  - `ShadeSessionManager.encrypt` / `.decrypt` (per-peer mutex acquisition,
    ratchet step).
  - `createPrekeyRoutes` (per route, status codes).
  - `@shade/files` op handlers (these already have `onMetric`; extended to OTel).
- PII policy doc: what is **never** logged, what is binned, what is pseudonymized.
- Sampling policy off by default; on with `SHADE_OTEL_ENABLED=1`.
### Out
- Trace export to SaaS vendors (that is deploy config, not our code).
- Log aggregation: `@shade/server` already has structured JSON.
---
## Design
### Span attributes
| Attribute | Value |
|-----------|-------|
| `shade.peer.hash` | `sha256(address).slice(0, 8)`, a stable pseudonym |
| `shade.bytes.bin` | binned: `"≤4KB"`, `"4–64KB"`, `"64KB–1MB"`, `"≥1MB"` |
| `shade.lane.count` | 1 / 4 / 16 |
| `shade.retry.count` | int |
| `shade.error.code` | `SHADE_*` code |
**Never:** `shade.peer.address`, `shade.payload`, `shade.bytes.exact`.
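A minimal sketch of how the two privacy-preserving attributes in the table could be computed. The function names are hypothetical, and two details are assumptions: that `slice(0, 8)` applies to the hex digest, and that the bins are `≤4KB`, `4–64KB`, `64KB–1MB`, `≥1MB` with the boundary handling shown in the comments.

```typescript
import { createHash } from "node:crypto";

// Pseudonymize an address: first 8 hex characters of its SHA-256 digest (assumed encoding).
function peerHash(address: string): string {
  return createHash("sha256").update(address, "utf8").digest("hex").slice(0, 8);
}

// Bin an exact byte count so spans never carry the real size.
// Which side of each boundary is inclusive is an assumption of this sketch.
function bytesBin(n: number): string {
  if (n <= 4 * 1024) return "≤4KB";
  if (n < 64 * 1024) return "4–64KB";
  if (n < 1024 * 1024) return "64KB–1MB";
  return "≥1MB";
}

// Build the span attributes without ever touching the raw address or exact size.
function spanAttributes(address: string, bytes: number, laneCount: number, retryCount: number) {
  return {
    "shade.peer.hash": peerHash(address),
    "shade.bytes.bin": bytesBin(bytes),
    "shade.lane.count": laneCount,
    "shade.retry.count": retryCount,
  };
}
```

Because the raw address and byte count only exist as function arguments, nothing outside the allowlisted keys can reach the exporter by accident.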
### API
```ts
import { withTracer } from '@shade/observability';
const shade = await createShade({
  ...,
  observability: withTracer(myTracer, { sample: 0.1 }),
});
```
`withTracer()` is a no-op if `tracer` is `undefined` or
`SHADE_OTEL_ENABLED` is not set.
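The off-by-default behaviour can be illustrated with a small sketch. Everything here is hypothetical (`ObservabilityHook`, `TracerLike`, `withTracerSketch`, the injected `env` and `random` parameters); the real `@shade/observability` types may differ. The sketch also assumes that "set" means the value `"1"` and that sampling is decided per span.

```typescript
// Hypothetical shapes; the real @shade/observability types may differ.
interface SpanLike { end(): void }
interface TracerLike { startSpan(name: string): SpanLike }
interface ObservabilityHook { startSpan(name: string): SpanLike | undefined }

// One shared no-op object, so the disabled path allocates nothing per call.
const NOOP_HOOK: ObservabilityHook = { startSpan: () => undefined };

function withTracerSketch(
  tracer: TracerLike | undefined,
  opts: { sample?: number } = {},
  env: Record<string, string | undefined> = {},
  random: () => number = Math.random,
): ObservabilityHook {
  // Disabled unless a tracer is supplied AND the env flag is on.
  if (!tracer || env["SHADE_OTEL_ENABLED"] !== "1") return NOOP_HOOK;
  const sample = opts.sample ?? 1;
  return {
    // Head sampling: decide per span whether to record it.
    startSpan: (name) => (random() < sample ? tracer.startSpan(name) : undefined),
  };
}
```

Passing `env` and `random` in as parameters keeps the sketch deterministic under test; a real implementation would presumably read the process environment directly.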
---
## Deliverables
### Packages
- New submodule `@shade/observability` (peer dep `@opentelemetry/api`).
- Hooks in `@shade/sdk`, `@shade/transfer`, `@shade/server`, `@shade/files`.
### Tests
- Span is emitted with the correct attributes (mock tracer).
- Sample rate is respected.
- Off-by-default verified.
- Regex grep against the recorder catches plaintext PII.
### Documentation
- `docs/observability.md`: setup + PII policy.
- `docs/DEPLOYMENT.md`: environment variables.
---
## Acceptance criteria
- [x] Default deploy without OTel: no performance regression (`withTracer`
  returns the shared `NOOP_HOOK` when `SHADE_OTEL_ENABLED` is not set).
- [x] With OTel on: spans for upload/download (`shade.transfer.upload`,
  `shade.transfer.download`), prekey routes (`shade.prekey.request`),
  session encrypt/decrypt (`shade.session.{encrypt,decrypt}`), and
  `@shade/files` ops (`shade.files.op`).
- [x] Automated grep test catches plaintext PII in spans
  (`packages/shade-observability/tests/integration-pii.test.ts` +
  `packages/shade-transfer/tests/observability.test.ts`;
  `safeAttribute()` blocks developer-introduced PII).
---
## Dependencies
- V3.1: base docs.
---
## Risks
- **Performance overhead.** Mitigate with aggressive default-off + sampling.
- **PII leakage** if developers add their own attributes. Mitigate by
  publishing "safe attribute" helpers and a PII linter.
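One way a "safe attribute" helper and a grep-style PII check could look is sketched below. This is an illustration only: `safeAttributeSketch`, `findPiiLeaks` and the `ADDRESS_LIKE` pattern are hypothetical, and the allowlist is simply the attribute table from this plan.

```typescript
// Allowlist taken from the span-attribute table in this plan.
const SAFE_KEYS = new Set<string>([
  "shade.peer.hash",
  "shade.bytes.bin",
  "shade.lane.count",
  "shade.retry.count",
  "shade.error.code",
]);

// Hypothetical helper: reject any attribute key outside the allowlist.
function safeAttributeSketch(key: string, value: string | number): [string, string | number] {
  if (!SAFE_KEYS.has(key)) {
    throw new Error(`attribute "${key}" is not on the PII allowlist`);
  }
  return [key, value];
}

// Hypothetical grep-style check over recorded spans. The pattern is an
// illustrative guess at what a raw address looks like, not Shade's format.
const ADDRESS_LIKE = /[^\s@]+@[^\s@]+/;

function findPiiLeaks(recorded: Array<Record<string, string | number>>): string[] {
  const leaks: string[] = [];
  for (const attrs of recorded) {
    for (const [key, value] of Object.entries(attrs)) {
      const badKey = !SAFE_KEYS.has(key);
      const badValue = typeof value === "string" && ADDRESS_LIKE.test(value);
      if (badKey || badValue) leaks.push(key);
    }
  }
  return leaks;
}
```

The helper stops forbidden keys at write time, while the grep check catches an allowed key that was filled with a raw value by mistake.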
---
## Migration
None. Opt-in.

docs/archive/V3.5.md

@@ -0,0 +1,125 @@
# Shade V3.5 — Android Parity & Cross-Platform CI
**Status:** Done (cryptographic layer + CI gate). Android KeystoreStorage and scrypt/argon2id parity are post-GA work tracked in `android/shade-android/ROADMAP-ANDROID.md`, not a 4.0 GA blocker.
**Effort:** XL (2–4 months, parallelizable)
**Previous:** V3.1
**Addresses:** V2.1 §3
---
## Goal
Make the Kotlin implementation **byte-compatible** with the TS implementation, and
seal parity with a **CI gate** that runs shared test vectors in both languages.
No "production" label on Android until ratchet + proto + streams 0x11 are
green.
---
## Scope
### In: parity checkpoints (explicit)
1. **KDF chain**: root key + chain key derivations.
   Vector: `test-vectors/kdf-chain.json`.
2. **HKDF**: labels for the `info` field.
   Vector: `test-vectors/hkdf.json`.
3. **X3DH**: full agreement with the same bundles.
   Vector: `test-vectors/x3dh.json`.
4. **Ratchet message**: encrypt/decrypt roundtrip (add a vector).
5. **Fingerprint**: 60-digit safety number.
   Vector: `test-vectors/fingerprint.json`.
6. **Wire format 0x02**: encode/decode.
   Vector: `test-vectors/wire-format.json`.
7. **Streams 0x11**: multi-lane chunk encryption (M-Cross 3, not in M-Cross 1).
8. **Backup format**: passphrase-based KDF + AES-GCM payload.
### In: milestones
- **M-Cross 1 ✅**: keys + HKDF + X3DH + fingerprint.
- **M-Cross 2 ✅**: ratchet step (encrypt + decrypt roundtrip) + wire 0x02
  (RatchetMessage + PreKeyMessage with/without OTPK). Vector version `2`.
- **M-Cross 3 ✅**: streams 0x11 (KDF, deterministic chunk nonce/AAD, wire 0x11
  encode/decode). End-to-end socket interop pending; not a gating blocker.
- **M-Cross 4 ✅**: backup-format HKDF + AEAD, group sender keys
  (kdfChainKey + Ed25519 sign(aad ‖ ct)), storage HKDF (storageKey,
  fieldKey, rowNonce). Remaining: scrypt master key (Bouncy Castle),
  argon2id switch, Android KeystoreStorage as a sibling module.
### In: CI
- Gitea Actions matrix job:
  - Bun runner runs `bun test:vectors` against `test-vectors/*.json`.
  - Gradle runner runs `./gradlew vectorTests` against the same files.
  - PR gate: both must pass.
- The vector generation script (`scripts/generate-vectors.ts`) exists; extend it
  to cover checkpoints 7 and 8.
### Out
- iOS: a separate Swift port is future roadmap, not V3.5.
- Native bindings in `shade-android` (we use Tink in JVM code).
---
## Deliverables
### Kotlin
- Full ratchet implementation (M-Cross 2).
- Wire 0x02 encode/decode.
- Streams 0x11 (M-Cross 3).
- Tink storage adapter with Keystore.
### Test vectors
- Extend `scripts/generate-vectors.ts` with ratchet step + streams + backup.
- Version tag on vector files (`{ "version": 2, ... }`).
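A vector runner for the HKDF checkpoint could look roughly like the sketch below. The JSON layout (`ikm`, `salt`, `info`, `length`, `okm` as hex strings under a `vectors` array) is an assumption for illustration; only the top-level `version` tag comes from this plan. The sample vector is RFC 5869 test case 1, not one of Shade's own vectors.

```typescript
import { hkdfSync } from "node:crypto";
import { Buffer } from "node:buffer";

// Assumed file layout; the real test-vectors/hkdf.json may be shaped differently.
interface HkdfVector { ikm: string; salt: string; info: string; length: number; okm: string }
interface VectorFile { version: number; vectors: HkdfVector[] }

const SUPPORTED_VERSION = 2;

// Returns a list of human-readable failures; an empty list means parity.
function checkHkdfVectors(file: VectorFile): string[] {
  if (file.version !== SUPPORTED_VERSION) {
    throw new Error(`unsupported vector version ${file.version}`);
  }
  const failures: string[] = [];
  file.vectors.forEach((v, i) => {
    const derived = hkdfSync(
      "sha256",
      Buffer.from(v.ikm, "hex"),
      Buffer.from(v.salt, "hex"),
      Buffer.from(v.info, "hex"),
      v.length,
    );
    const okm = Buffer.from(derived).toString("hex");
    if (okm !== v.okm) failures.push(`vector ${i}: derived ${okm}`);
  });
  return failures;
}

// RFC 5869 test case 1 (HKDF-SHA256), wrapped in the assumed layout.
const sampleFile: VectorFile = {
  version: 2,
  vectors: [{
    ikm: "0b".repeat(22),
    salt: "000102030405060708090a0b0c",
    info: "f0f1f2f3f4f5f6f7f8f9",
    length: 42,
    okm: "3cb25f25faacd57a90434f64d0362f2a2d2d0a90cf1a5a4c5db02d56ecc4c5bf34007208d5b887185865",
  }],
};
```

Rejecting an unknown `version` up front means a runner that lags behind the generator fails loudly instead of silently skipping new fields.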
### CI
- `.gitea/workflows/cross-vectors.yml`: Bun + Gradle matrix.
- Fail policy: if a vector file changes, **both** runners must report
  passing before merge.
### Documentation
- `android/shade-android/ROADMAP-ANDROID.md`: explicit milestones +
  status per checkpoint.
- `docs/cross-platform.md`: how to add a new vector + how to
  run locally.
---
## Acceptance criteria
- [ ] M-Cross 2: a TS-encrypted message can be decrypted by a Kotlin client and
  vice versa, in an end-to-end test.
- [ ] The CI job fails within 60 s on a deliberate byte divergence.
- [ ] M-Cross 3: a 1 MiB streams file over 4 lanes between a TS server and a
  Kotlin client is verified.
- [ ] No public release with a "production" label until M-Cross 2 is green.
---
## Dependencies
- V3.1: `cross-platform.md` lives there.
---
## Risks
- **Tink mismatch.** Tink's HKDF info encoding may differ from
  `@noble/hashes`. Mitigate with an early vector test (M-Cross 1 covers this).
- **Endianness / encoding.** Wire 0x02 uses big-endian. Kotlin's
  `ByteBuffer` defaults to big-endian, but the streams nonce construction must
  be reviewed.
- **Maintainer capacity.** The Kotlin port and the TS port must be kept in sync.
  Vector CI is the primary mitigation.
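On the TypeScript side, the endianness risk is easiest to control by making byte order explicit at every read and write. The sketch below frames a payload with a big-endian u32 length prefix using `DataView`; the helper names and the exact frame layout are hypothetical, not Shade's wire format.

```typescript
// Encode a u32 length prefix in big-endian (network) order, followed by the payload.
function frameWithU32Prefix(payload: Uint8Array): Uint8Array {
  const out = new Uint8Array(4 + payload.length);
  new DataView(out.buffer).setUint32(0, payload.length, false); // false = big-endian
  out.set(payload, 4);
  return out;
}

// Decode the prefix and return a view of the payload, rejecting truncated frames.
function readU32Prefixed(frame: Uint8Array): Uint8Array {
  if (frame.length < 4) throw new Error("frame shorter than length prefix");
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  const len = view.getUint32(0, false);
  if (4 + len > frame.length) throw new Error("truncated frame");
  return frame.subarray(4, 4 + len);
}
```

A Kotlin `ByteBuffer` in its default order writes `putInt` the same way, so a shared vector containing such a frame would expose any accidental little-endian write on either side.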
---
## Migration
The existing M-Cross 1 scaffold is kept; everything new builds on it.

docs/archive/V3.6.md

@@ -0,0 +1,123 @@
# Shade V3.6 — Async Store-and-Forward (Inbox)
**Status:** Done
**Effort:** L (4–8 weeks)
**Previous:** V3.4
**Addresses:** V2.2 §2
**Implemented:** see `docs/inbox.md`
---
## Goal
The recipient does not need to be online to receive messages or
control signals. A **dedicated relay/inbox service** holds
**ciphertext blobs** with TTL and auth. The server never sees plaintext;
the prekey server remains public-keys-only.
---
## Scope
### In
- New packages: `@shade/inbox` (client) + `@shade/inbox-server` (server).
- HTTP API:
  - `POST /v1/inbox/:address`: signed PUT of a blob (with TTL).
  - `GET /v1/inbox/:address/since/:cursor`: authenticated fetch.
  - `DELETE /v1/inbox/:address/:msgId`: leasing/ack.
- Replay protection at the application layer (`msgId = sha256(ciphertext)`).
- Push hook (vendor-neutral): `inbox.onMessageQueued(handler)` callback.
- Outgoing queue in the client: stores ciphertext locally until the server confirms
  the PUT.
- Idempotent PUT (the same `msgId` returns 200, not 409).
### Out
- Mobile push (FCM / APNs): out of scope; we expose the hook.
- Federation between inbox servers: a separate matter for later.
- Plaintext metadata addresses: we support pseudonymous address hashes as a
  privacy mode.
---
## Design
### Auth
- PUT is **signed** with the sender's Ed25519 key (same as prekey).
- GET requires a signed challenge from the recipient (pull, not push).
- Replay window ±5 min, same as prekey.
### Wire
- The existing `@shade/proto` envelope, transported as the body.
- The server stores **only**:
  `address || msgId || ciphertext-bytes || expires_at`.
### Lifecycle
1. The sender encrypts via `Shade.send` and gets an envelope.
2. The sender PUTs the envelope to the recipient's inbox with a TTL (default 7 days).
3. The recipient polls (or gets a push trigger) and fetches everything since the cursor.
4. The recipient decrypts, then acks via DELETE for early pruning.
### Storage
- SQLite + Postgres backends (same pattern as prekey).
- Index: `(address, expires_at)`.
- Cron prune.
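The lifecycle and storage rules above can be sketched with an in-memory store. This is an illustration under stated assumptions, not the `@shade/inbox-server` code: the class name, the monotonic `seq` cursor and the method names are hypothetical, and signature checks and the replay window are left out entirely.

```typescript
import { createHash } from "node:crypto";

interface StoredBlob {
  address: string;
  msgId: string;
  ciphertext: Uint8Array;
  expiresAt: number; // epoch milliseconds
  seq: number;       // assumed cursor: a per-store monotonic counter
}

const DEFAULT_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days, the default named in this plan

class InboxStoreSketch {
  private rows = new Map<string, StoredBlob>(); // key: address + "/" + msgId
  private nextSeq = 1;

  // Idempotent: repeating a PUT of the same ciphertext stores nothing new.
  put(address: string, ciphertext: Uint8Array, now: number, ttlMs: number = DEFAULT_TTL_MS) {
    const msgId = createHash("sha256").update(ciphertext).digest("hex");
    const key = `${address}/${msgId}`;
    if (this.rows.has(key)) return { msgId, created: false };
    this.rows.set(key, { address, msgId, ciphertext, expiresAt: now + ttlMs, seq: this.nextSeq++ });
    return { msgId, created: true };
  }

  // Everything for `address` after `cursor` that has not expired, oldest first.
  since(address: string, cursor: number, now: number): StoredBlob[] {
    return [...this.rows.values()]
      .filter((r) => r.address === address && r.seq > cursor && r.expiresAt > now)
      .sort((a, b) => a.seq - b.seq);
  }

  // Ack from the recipient: delete early instead of waiting for TTL expiry.
  ack(address: string, msgId: string): boolean {
    return this.rows.delete(`${address}/${msgId}`);
  }

  // What a periodic prune job would do: drop every expired row.
  prune(now: number): number {
    let removed = 0;
    for (const [key, row] of this.rows) {
      if (row.expiresAt <= now) {
        this.rows.delete(key);
        removed++;
      }
    }
    return removed;
  }
}
```

Deriving `msgId` from the ciphertext is what makes the PUT naturally idempotent: a retried upload maps to the same key, so the store can answer success without writing twice.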
---
## Deliverables
### Packages
- `@shade/inbox`: client + queue.
- `@shade/inbox-server`: Hono routes + storage adapter.
### Tests
- Unit: signed PUT/GET, replay window, idempotency.
- Integration: full lifecycle with 100 msgs, restart the server, msgs persist.
- Tamper: bit-flipped ciphertext → client-side decrypt fails (the server does
  not know).
### Documentation
- `docs/inbox.md`: setup, threat model "what the relay sees", deploy guide.
- `THREAT-MODEL.md`: new section on the relay.
---
## Acceptance criteria
- [ ] Sender → recipient with no online overlap, payload < 1 MB, delivered
  within 5 min of the recipient starting up.
- [ ] A server DB dump reveals **no plaintext** and **no
  sender-recipient graph** beyond byte parity.
- [ ] Replaying a PUT with the same `msgId` returns 200 without storing a duplicate.
---
## Dependencies
- V3.4: observability hooks to measure inbox usage without leakage.
---
## Risks
- **Metadata leakage.** The server sees who talks to whom. Document this clearly;
  point to address hashing as the mitigation.
- **Storage DoS.** A malicious sender fills the recipient's inbox. Mitigate with a
  per-sender quota + a per-address quota.
- **Privacy model.** TTL = 7 days by default, but "undelivered" messages are
  still an attack surface.
---
## Migration
New package; no breaking change to existing ones.

docs/archive/V3.7.md

@@ -0,0 +1,127 @@
# Shade V3.7 — Transport Bridge (SSE / long-poll)
**Status:** Implemented
**Effort:** M (2–4 weeks)
**Previous:** V3.6
**Addresses:** V2.3 §3
**Deliverable:** `@shade/transport-bridge` 0.1.0 + `createBridgeRoutes` in
`@shade/inbox-server`. User guide: [`docs/transport.md`](../transport.md).
---
## Goal
Apps that cannot or will not use WebSocket (strict proxies,
browser extensions, edge environments) get a **ready-made pattern** for receiving
small messages and control signals. SSE is the primary fallback, long-poll the
secondary.
---
## Scope
### In
- `@shade/transport-bridge`: a new submodule of `@shade/transport` (or its own
  package).
- SSE endpoint in `@shade/server` (combined with the inbox from V3.6 for "fetch
  from the inbox without plaintext").
- Long-poll fallback with a configurable timeout.
- Shared `IncomingMessage` model: application code does not need to know about the
  transport.
- Auto-fallback: WS → SSE → long-poll (same pattern as the transfer transport).
### Out
- HTTP/2 push.
- WebTransport: browser support still immature in 2026.
---
## Design
### Shared type
```ts
interface IncomingMessage {
  from: string;
  bytes: Uint8Array;
  receivedAt: number;
}
interface BridgeTransport {
  connect(opts: { onMessage(msg: IncomingMessage): void }): Promise<void>;
  disconnect(): Promise<void>;
}
```
### SSE
- Endpoint: `GET /v1/bridge/stream` with `Last-Event-ID` for cursor resume.
- Server side: emits an `envelope-ready` event when the inbox receives a new envelope.
- The client opens a single EventSource and reconnects on drop.
### Long-poll
- Endpoint: `GET /v1/bridge/poll?since=:cursor` blocks until a message is ready
  or a 25 s timeout (below typical proxy cutoffs).
- The client repeats.
### Fallback
- `FallbackBridgeTransport([WsBridge, SseBridge, LongPollBridge])` tries them in
  order.
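The fallback behaviour can be sketched as a thin wrapper over the shared interface. This is a hypothetical illustration, not the `@shade/transport-bridge` implementation: it takes already-constructed transport instances (the plan shows classes being passed), and it only handles failure at connect time, not a mid-session drop.

```typescript
interface IncomingMessage { from: string; bytes: Uint8Array; receivedAt: number }
interface BridgeTransport {
  connect(opts: { onMessage(msg: IncomingMessage): void }): Promise<void>;
  disconnect(): Promise<void>;
}

// Hypothetical sketch: try each carrier in order and keep the first that connects.
class FallbackBridgeSketch implements BridgeTransport {
  private candidates: BridgeTransport[];
  private active: BridgeTransport | undefined;

  constructor(candidates: BridgeTransport[]) {
    this.candidates = candidates;
  }

  async connect(opts: { onMessage(msg: IncomingMessage): void }): Promise<void> {
    const errors: unknown[] = [];
    for (const candidate of this.candidates) {
      try {
        await candidate.connect(opts); // every carrier delivers the same IncomingMessage shape
        this.active = candidate;
        return;
      } catch (err) {
        errors.push(err); // remember the failure and fall through to the next carrier
      }
    }
    throw new Error(`all ${errors.length} bridge transports failed`);
  }

  async disconnect(): Promise<void> {
    if (this.active) await this.active.disconnect();
    this.active = undefined;
  }
}
```

Because the wrapper itself implements `BridgeTransport`, application code keeps a single `onMessage` handler regardless of which carrier ends up active.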
---
## Deliverables
### Code
- `@shade/transport-bridge` with `WsBridge`, `SseBridge`, `LongPollBridge`,
  `FallbackBridgeTransport`.
- Server: SSE and long-poll routes on `@shade/server` or
  `@shade/inbox-server`.
### Tests
- Unit: each bridge opens/closes correctly; reconnects on drop.
- Integration: WS down → falls back to SSE; SSE 502 → long-poll.
- Same `IncomingMessage` shape out of all three.
### Documentation
- `docs/transport.md` extended with a bridge overview.
---
## Acceptance criteria
- [x] The same "send 100 small messages" test suite passes on all three
  transports.
- [x] A client that starts on WS and is blocked by a proxy continues
  automatically over SSE with no message loss.
- [x] The long-poll fallback uses no more than one outstanding request per
  client.
---
## Dependencies
- V3.6: a natural complement; the SSE payload is typically "an envelope is ready in the
  inbox".
---
## Risks
- **Reconnect cycles.** A flapping SSE connection can lose messages. Mitigate with
  Last-Event-ID + the server keeping a short buffer.
- **Long-poll keepalive.** Proxy timeouts can cut connections before 30 s; set the
  default to 25 s.
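The "Last-Event-ID plus short server buffer" mitigation could be sketched as a small ring buffer. The class and function names are hypothetical, the numeric event ids are an assumption, and `formatSse` assumes single-line `data` payloads.

```typescript
interface BufferedEvent { id: number; data: string }

// Hypothetical server-side ring buffer for resuming a dropped SSE stream.
class SseReplayBuffer {
  private events: BufferedEvent[] = [];
  private nextId = 1;
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  push(data: string): BufferedEvent {
    const ev: BufferedEvent = { id: this.nextId++, data };
    this.events.push(ev);
    if (this.events.length > this.capacity) this.events.shift(); // drop the oldest
    return ev;
  }

  // Events the client missed after `lastEventId`, or null when the gap reaches
  // past the buffer and the client has to resync with a full inbox fetch.
  replayAfter(lastEventId: number): BufferedEvent[] | null {
    const oldest = this.events.length > 0 ? this.events[0].id : this.nextId;
    if (lastEventId < oldest - 1) return null;
    return this.events.filter((e) => e.id > lastEventId);
  }
}

// Serialize one event in the text/event-stream format (single-line data assumed).
function formatSse(ev: BufferedEvent): string {
  return `id: ${ev.id}\nevent: envelope-ready\ndata: ${ev.data}\n\n`;
}
```

Returning `null` for an unrecoverable gap keeps the failure explicit: the client learns it must fall back to the inbox cursor instead of silently missing events.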
---
## Migration
Additive.

docs/archive/V3.8.md

@@ -0,0 +1,117 @@
# Shade V3.8 — Web Workers Crypto
**Status:** Done
**Effort:** M-L (3–6 weeks)
**Previous:** V3.1
**Addresses:** V2.2 §4
**Delivered:** `0.4.0`
**Consumer documentation:** [`docs/web-workers.md`](../web-workers.md)
---
## Goal
Large files in the browser must be encryptable / decryptable without blocking
the main thread or exhausting RAM. A dedicated Worker runs `@shade/crypto-web` +
`@shade/streams`, connected to `@shade/transfer` via `ReadableStream` /
`WritableStream`.
---
## Scope
### In
- New entry: `@shade/crypto-web/worker`, a dedicated Web Worker with
  `WorkerCryptoProvider`.
- Main-thread proxy: `MainThreadCryptoProvider`, which forwards calls to the Worker.
- Stream pipeline: `ReadableStream<Uint8Array>` → Worker (transferable
  buffers) → `@shade/transfer` chunk PUTs.
- Lifecycle: spawn-on-demand, idle-timeout, terminate-on-rotate.
- Safari-aware chunk sizing (Safari has lower `postMessage` capacity).
### Out
- Service Workers (background sync): to be assessed separately.
- SharedArrayBuffer (requires COOP/COEP headers; optional opt-in).
---
## Design
### Provider API (unchanged for consumers)
```ts
const crypto = await createWorkerCryptoProvider({
  workerUrl: '/shade-crypto.worker.js',
});
const shade = await createShade({ crypto, ... });
```
`WorkerCryptoProvider` implements the same `CryptoProvider` interface as
`SubtleCryptoProvider`. Calls are serialized with transferable `ArrayBuffer`s, so
memory is not copied.
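The zero-copy hand-off can be illustrated without spinning up a Worker. In the sketch below, `structuredClone` with a `transfer` list stands in for `worker.postMessage(msg, [buf])`, since both move the buffer the same way; the function names are hypothetical and the 256 KiB budget is the default mentioned elsewhere in this plan.

```typescript
const POST_MESSAGE_BUDGET = 256 * 1024; // 256 KiB per message

// Split a buffer into standalone ArrayBuffers no larger than `budget`.
function splitForPostMessage(data: Uint8Array, budget: number = POST_MESSAGE_BUDGET): ArrayBuffer[] {
  const parts: ArrayBuffer[] = [];
  for (let off = 0; off < data.byteLength; off += budget) {
    // slice() copies once, so each part owns its memory and can be transferred alone
    parts.push(data.slice(off, off + budget).buffer as ArrayBuffer);
  }
  return parts;
}

// Move (not copy) a buffer. After the call the source is detached and reports
// byteLength 0, which is the same effect a transfer to a Worker has on the sender.
function transferBuffer(buf: ArrayBuffer): ArrayBuffer {
  return structuredClone(buf, { transfer: [buf] });
}
```

The one copy in `splitForPostMessage` is deliberate: transferring a slice's shared backing buffer would detach the whole original, so each chunk gets its own allocation first.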
### Stream pipeline
```ts
file.stream()
  .pipeThrough(shade.encryptStream(peer))         // worker
  .pipeThrough(shade.transfer.outboundChunks())   // main → http
  .pipeTo(transferSink());
```
The Worker side of `encryptStream` uses `MultiLaneSender`.
---
## Deliverables
### Code
- `@shade/crypto-web`: new `worker.ts` entrypoint.
- `@shade/sdk`: `shade.encryptStream` / `decryptStream`.
- Bundler examples for Vite, Webpack and Rollup.
### Tests
- Unit: postMessage roundtrip with a transferable buffer.
- Integration: 100 MB file in the browser with no frame drop > 16 ms (P99).
- Safari: chunked `postMessage` workaround.
### Documentation
- `docs/web-workers.md`: setup, bundler quirks, Safari notes, COOP/COEP
  for SharedArrayBuffer mode.
---
## Acceptance criteria
- [x] 100 MB upload in Chrome without blocking the main thread > 16 ms at P99
  (Performance Observer measurement; verification recipe in
  [`docs/web-workers.md`](../web-workers.md#verifying-main-thread-budget)).
- [x] Safari works with the default chunk size (256 KiB postMessage budget,
  well below Safari's transferable limit).
- [x] The Worker is terminated within 30 s of last use
  (`idleTimeoutMs`, default `30_000`).
---
## Dependencies
None directly. Can run in parallel with V3.2 / V3.4.
---
## Risks
- **Bundler hell.** Vite, Webpack and Rollup treat Workers differently.
  Mitigate with a published recipe + integration tests per bundler.
- **Safari postMessage limits.** Test early.
---
## Migration
Opt-in. The default remains `SubtleCryptoProvider`.

docs/archive/V3.9.md

@@ -0,0 +1,137 @@
# Shade V3.9 — Rich File Metadata & Previews
**Status:** Implemented (see `docs/streams.md` § Rich file metadata)
**Effort:** M
**Previous:** V3.1
**Addresses:** V2.2 §3
---
## Goal
Richer file UX without leaking sensitive content to the server. Filename,
MIME type, total length, optional thumbnail: all **E2EE** or omitted.
Consumers (widgets, files RPC) can show a preview before the download completes.
---
## Scope
### In
- Extend `stream-init` (control envelope) with optional fields:
  - `filename: string` (E2EE, opt-in).
  - `mimeType: string` (E2EE, opt-in).
  - `totalBytes: number` (always OK; byte counts are binned in observability).
  - `thumbnailHash: Uint8Array` (sha256 of a separate thumbnail stream).
- Thumbnail as a **separate stream** (not inline in init), encrypted on
  its own lane.
- Format hardening on the client: max size, sandbox in the UI.
- Widget support: `<TransferRow showThumbnail />`.
### Out
- Server-side thumbnail generation (we encrypt on the client; the server never
  gets plaintext).
- Video preview: a separate matter; requires frame extraction and a sandbox.
---
## Design
### Stream-init wire (actual implementation)
`fileMetadata` is now an opt-in field on `StreamMetadata`. Existing
fields are unchanged; older receivers ignore the field, so the change is
backwards-compatible.
```jsonc
{
  "kind": "shade.stream-init/v1",
  "streamId": "...",
  "streamSecret": "...",
  "metadata": {
    "chunkSize": 1048576,
    "sentAt": 1730000000000,
    "userMetadata": { ... },          // existing (V0.3)
    "fileMetadata": {                 // NEW (V3.9)
      "filename": "report.pdf",
      "mimeType": "application/pdf",
      "thumbnailStreamId": "Ej1z...",
      "thumbnailHash": "9a7c...",
      "thumbnailMime": "image/webp",
      "thumbnailBytes": 18342
    }
  },
  "lanes": [ /* ... */ ]
}
```
### Thumbnail
- The client generates a 256×256 JPEG/WebP/PNG (in browsers via `OffscreenCanvas`
  and `createImageBitmap`).
- It is encrypted as a **separate stream** with its own `streamId` (referenced from
  the main stream's `fileMetadata.thumbnailStreamId`). The symbolic
  convention `mainStreamId + ".thumb"` is a helper; the real
  streamId is an independent 16-byte value.
- The receiver auto-accepts the thumbnail stream (marked by
  `userMetadata.shadeThumbnail = "1"`) into `ShadeThumbnailCache`,
  which verifies the sha256 against the declared hash before the widget renders.
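The verify-before-render rule could be implemented along the lines below. `ThumbnailCacheSketch` and its methods are hypothetical stand-ins for `ShadeThumbnailCache`, and the declared hash is assumed to arrive as a full-length hex string.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Hypothetical cache: only thumbnails whose digest matches the declared hash are stored.
class ThumbnailCacheSketch {
  private verified = new Map<string, Uint8Array>();

  // Returns true and caches the bytes only when sha256(bytes) equals the declared hash.
  accept(thumbnailStreamId: string, bytes: Uint8Array, declaredSha256Hex: string): boolean {
    const actual = createHash("sha256").update(bytes).digest();
    const declared = Buffer.from(declaredSha256Hex, "hex");
    // timingSafeEqual requires equal lengths, so check that first
    const ok = declared.length === actual.length && timingSafeEqual(actual, declared);
    if (ok) this.verified.set(thumbnailStreamId, bytes);
    return ok;
  }

  // The widget only ever sees verified bytes; unknown or rejected streams yield undefined.
  get(thumbnailStreamId: string): Uint8Array | undefined {
    return this.verified.get(thumbnailStreamId);
  }
}
```

Binding the thumbnail to a hash carried in the main stream's metadata means a thumbnail stream that was swapped or corrupted never reaches the renderer.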
---
## Deliverables
### Code
- `@shade/streams`: extend the `StreamInitMessage` schema.
- `@shade/sdk`: `Shade.upload({ ..., generateThumbnail: true })`.
- `@shade/widgets`: `<TransferRow />` with a thumbnail prop.
### Tests
- Roundtrip: upload with a thumbnail; the download shows the thumbnail before the main
  stream is done.
- Backwards: a 0.3.x receiver gets the stream without a thumbnail and works.
- Format fuzzing: a malicious image file is not rendered outside a sandbox.
### Documentation
- `docs/streams.md` extended.
- `docs/files.md`: refer to the metadata extension.
---
## Acceptance criteria
- [x] The thumbnail is delivered as a separate E2EE stream that arrives before the main
  stream completes (the sender ships the preview before the main stream).
- [x] An older client (without V3.9 support) gets the original stream without failing;
  covered by `streams-tests/file-metadata.test.ts` and
  `sdk-tests/thumbnail.test.ts` (legacy receiver).
- [x] The thumbnail is never visible in plaintext in the server DB: the preview bytes
  ride on an independent AEAD stream just like the main stream.
---
## Dependencies
- V3.1: wire-format extensions documented.
---
## Risks
- **Thumbnail format attacks.** A malicious image file can compromise the
  preview renderer. Mitigate with a sandboxed iframe + max size + format allowlist.
- **UX pitfalls.** "The receiver sees the preview before the send is done" can leak that
  the sender is trying to send something specific before it is finished. Document this for
  high-stakes flows.
---
## Migration
Backwards-compatible: all new fields are optional.

docs/archive/V4.0.md

@@ -0,0 +1,123 @@
# Shade V4.0 — External Audit, Consolidation, GA
**Status:** Done, tagged as 4.0.0 (2026-05-03)
**Effort:** M (audit-driven)
**Previous:** V3.1 → V3.12 all merged
**Addresses:** V2.1 §6 + overall GA
> **Scope note:** Voice/Video and all VoIP/streaming functionality
> has been moved to [V5.0](../V5.0.md). 4.0 GA freezes the core stack
> (ratchet, transport, P2P, recovery, KT) and is externally audited
> *without* a real-time protocol in scope. That lets us audit one thing at
> a time: voice/video frame keys get their own audit in the 5.0 window.
---
## Goal
Shade 4.0 is the **GA-labelled release** in which everything discussed in V2.1, V2.2, V2.3
and the bonus track, *except* voice/video, is in `main`, tested, documented and
reviewed. This is the consolidation phase, not new feature work.
The real-time layer (voice, video, broadcast) lives in V5.0 and is developed on top of
the frozen 4.0 stack.
---
## Scope
### In
- **External crypto review** of:
  - Core (X3DH + ratchet + sender keys).
  - Wire 0x02 + streams 0x11.
  - Storage encryption (V3.2).
  - Recovery (V3.10).
  - WebRTC P2P transport binding (V3.11).
  - Key transparency (V3.12, if implemented).
  - *(Voice/Video frame keys are audited separately in the V5.0 window.)*
- **Migration guide** 0.3.x → 4.0: every wire bump, schema change and
  opt-in flag documented.
- **Soak testing**: run all packages in combined stress tests for 2+
  weeks.
- **Cross-platform parity confirmed**: TS + Kotlin green on all
  vector tests.
- **Documentation pass**: README and all of docs/ revised for the 4.0 narrative.
- **Release notes + announcement post.**
### Out
- New crypto.
- New packages.
- New wire-format bump (we reset here; the next one comes in 4.1+).
---
## Pre-flight checklist
- [ ] V3.1 → V3.12 all merged.
- [ ] No open critical or high-severity security issues.
- [ ] All test vectors green on TS + Kotlin.
- [ ] Production checklist (V3.1) tested by at least one real deploy.
- [ ] OpenAPI covers all HTTP surfaces.
- [ ] Threat model reflects everything new (excluding real-time, which is V5.0).
- [ ] Existing 0.3.x → 4.0 migration CLI tested on a real DB.
---
## Crypto review prep
Preparation for the external reviewer:
1. **Package a "review bundle"**: one PR with:
   - Links to all protocol spec files.
   - The threat model.
   - Assumptions and known limitations.
   - Reproducible build instructions.
2. **Scope document**: which parts the reviewer looks at (ratchet yes,
   build system no).
3. **Contact process**: how to report findings.
4. **Timeline**: typically a 4–8 week review window.
Recommended scope prioritization:
- **A:** ratchet, X3DH, storage encryption, recovery (core protocol).
- **B:** WebRTC P2P transport binding, KT log (if implemented).
- **C:** transport layer, observability (lower risk).
- *(Frame keys are not in 4.0 scope; they are audited when V5.0 lands.)*
---
## Acceptance criteria
- [ ] External review with no open critical/high-severity findings.
- [ ] Migration guide used successfully on at least one real 0.3.x deploy.
- [ ] Cross-platform parity verified in CI.
- [ ] All `docs/V*.md` archived under `docs/archive/` with "DONE" status.
- [ ] CHANGELOG.md has a 4.0 section.
- [ ] Version bumped, all packages published to the Gitea registry.
- [ ] Docker image `gt.zyon.no/stian/shade-prekey:4.0.0` published.
---
## After 4.0
The V4.x series starts cautiously: bug fixes, small features, no wire bump
without a 5.0 window.
**[V5.0](../V5.0.md)** is earmarked for real-time: voice (`@shade/voice`),
video (`@shade/video`), 1:N broadcast (`@shade/broadcast`), all built
on top of the frozen 4.0 stack with SFrame frame keys derived from
the ratchet session. V5.0 gets its own external audit of the frame-key
part before release.
Further out: federation, multi-tenancy, SDKs for new languages (Swift,
Rust) and an MLS transition for groups are all open candidates for V6.0+.
---
## Risks
- **Audit findings.** May require new implementation at the last minute. Mitigate
  with early review prep and by prioritizing A-scope first.
- **Scope creep.** "Just one more thing": V4.0 is locked to consolidation.
  New features = V4.1+.

docs/audit/REVIEW-BUNDLE.md

@@ -0,0 +1,143 @@
# Shade 4.0 — External Crypto Review Bundle
This document is the entrypoint for an external cryptographic review of
Shade 4.0. It collects, in one place, every artifact a reviewer needs to
audit the protocol implementation **without** rooting around the
codebase first.
## Tag under review
- **Version:** `4.0.0`
- **Tag:** `v4.0.0`
- **Date:** 2026-05-03
- **Repo:** `https://gt.zyon.no/Stian/Shade` (mirrored in the
  consumer-app repos that vendor this code)
- **Out-of-scope:** Voice / Video / Broadcast — moved to V5.0 and
reviewed separately.
## What's in scope
Reviewers focus on the protocol-cryptographic core. Each scope cell maps
to one or more packages plus the spec / threat-model section that
describes its design.
### A — Protocol core (highest priority)
| Surface | Spec | Code |
|---------|------|------|
| X3DH initial key agreement | [`docs/archive/V3.1.md`](../archive/V3.1.md), [`THREAT-MODEL.md` §1, §2](../../THREAT-MODEL.md) | [`packages/shade-core/src/x3dh.ts`](../../packages/shade-core/src/x3dh.ts) |
| Double Ratchet | [`docs/archive/V3.1.md`](../archive/V3.1.md), [`THREAT-MODEL.md` §3](../../THREAT-MODEL.md) | [`packages/shade-core/src/ratchet.ts`](../../packages/shade-core/src/ratchet.ts) |
| Sender keys (group ratchet) | [`docs/archive/V3.10.md` § Group send](../archive/V3.10.md) | [`packages/shade-core/src/sender-keys.ts`](../../packages/shade-core/src/sender-keys.ts) |
| Wire envelopes `0x01`, `0x02`, `0x11` | [`packages/shade-proto/README.md`](../../packages/shade-proto/README.md) | [`packages/shade-proto/src/`](../../packages/shade-proto/src/) |
| At-rest storage encryption | [`docs/storage-encryption.md`](../storage-encryption.md), [`THREAT-MODEL.md` §4](../../THREAT-MODEL.md) | [`packages/shade-storage-encrypted/src/`](../../packages/shade-storage-encrypted/src/) |
| Social recovery (Shamir + AEAD-gated reconstruction) | [`docs/recovery.md`](../recovery.md), [`THREAT-MODEL.md` §8](../../THREAT-MODEL.md) | [`packages/shade-recovery/src/`](../../packages/shade-recovery/src/) |
### B — Trust + transport
| Surface | Spec | Code |
|---------|------|------|
| WebRTC P2P transport binding | [`docs/webrtc.md`](../webrtc.md), [`THREAT-MODEL.md` §11](../../THREAT-MODEL.md) | [`packages/shade-transport-webrtc/src/`](../../packages/shade-transport-webrtc/src/) |
| Key Transparency log + verifier | [`docs/key-transparency.md`](../key-transparency.md), [`docs/archive/V3.12-DESIGN.md`](../archive/V3.12-DESIGN.md), [`THREAT-MODEL.md` §2 (mitigated-by-V3.12)](../../THREAT-MODEL.md) | [`packages/shade-key-transparency/src/`](../../packages/shade-key-transparency/src/) |
| Fingerprint gates | [`docs/trust-ux.md`](../trust-ux.md), [`THREAT-MODEL.md` §10](../../THREAT-MODEL.md) | [`packages/shade-sdk/src/fingerprint-gates.ts`](../../packages/shade-sdk/src/fingerprint-gates.ts) |
### C — Lower-priority surfaces
| Surface | Spec | Code |
|---------|------|------|
| Inbox store-and-forward | [`docs/inbox.md`](../inbox.md), [`THREAT-MODEL.md` §6](../../THREAT-MODEL.md) | [`packages/shade-inbox-server/src/`](../../packages/shade-inbox-server/src/), [`packages/shade-inbox/src/`](../../packages/shade-inbox/src/) |
| Bridge transports (SSE / long-poll / WS) | [`docs/transport.md`](../transport.md) | [`packages/shade-transport-bridge/src/`](../../packages/shade-transport-bridge/src/) |
| Web Workers crypto | [`docs/web-workers.md`](../web-workers.md), [`THREAT-MODEL.md` §12](../../THREAT-MODEL.md) | [`packages/shade-crypto-web/src/worker*`](../../packages/shade-crypto-web/src/) |
| Files RPC | [`docs/files.md`](../files.md) | [`packages/shade-files/src/`](../../packages/shade-files/src/) |
| Streams (chunked AEAD over ratchet) | [`docs/streams.md`](../streams.md) | [`packages/shade-streams/src/`](../../packages/shade-streams/src/), [`packages/shade-transfer/src/`](../../packages/shade-transfer/src/) |
| Observability | [`docs/observability.md`](../observability.md) | [`packages/shade-observability/src/`](../../packages/shade-observability/src/) |
## Threat model
The full threat model is at [`THREAT-MODEL.md`](../../THREAT-MODEL.md).
Every numbered "Mitigations" entry ends with a `[tests:]` footnote
linking to the file(s) that hold the mitigation in place. Reviewers
can re-run any individual test in isolation:
```bash
bun test packages/shade-core/tests/ratchet.test.ts
bun test packages/shade-streams/tests/aead.test.ts
bun test packages/shade-key-transparency/tests/manager.test.ts
```
## Cross-platform parity
The wire format and KDF-label corpus are byte-identical between TS
(bun) and Kotlin (gradle). The CI gate that enforces this lives at
[`.gitea/workflows/cross-vectors.yml`](../../.gitea/workflows/cross-vectors.yml).
Vectors are generated by [`scripts/generate-vectors.ts`](../../scripts/generate-vectors.ts);
hand-edits to [`test-vectors/`](../../test-vectors/) are rejected by CI.
```bash
# Re-run the cross-platform vector suite locally:
bun run test:vectors
cd android && ./gradlew :shade-android:test
```
## Build instructions (reproducible)
```bash
git clone https://gt.zyon.no/Stian/Shade
cd Shade
git checkout v4.0.0
bun install --frozen-lockfile
# TS suite
bun test
# Kotlin / vector suite
cd android && ./gradlew :shade-android:test
```
Container image (prekey + transfer + bridge + KT):
```bash
docker pull gt.zyon.no/stian/shade-prekey:4.0.0
docker run --rm -p 3900:3900 \
-e SHADE_PREKEY_PG_URL=postgres://… \
gt.zyon.no/stian/shade-prekey:4.0.0
```
The `Dockerfile` is at [`packages/shade-server/Dockerfile`](../../packages/shade-server/Dockerfile).
Multi-stage; the runtime stage uses a non-root user.
## Assumptions and known limitations
1. The runtime is honest. A malicious Bun / browser engine can defeat
any JS library; we ride the platform's `SubtleCrypto` / `@noble/curves`
for primitives and trust them.
2. `THREAT-MODEL.md` section "Assumptions" is the canonical list; review
the residual-risks table at the bottom of the same file for
intentional gaps.
3. We do **not** claim resistance to power-analysis or fault-injection
side channels.
4. Memory zeroization is best-effort. V8 / JSC may retain freed buffers;
we zero what we can synchronously reach.
## How to report findings
- **Severity-prioritized** (CVSS 3.1 if you can, otherwise plain
language).
- **Reproducer in repo style** — a failing `bun test` is preferred over
prose.
- **Email** the maintainer (`Sterister@live.no`); see
[`SECURITY.md`](../../SECURITY.md) for PGP / age key arrangement.
## Timeline
The 4.0 audit window opens immediately after the tag. We aim for a
4–8 week review cycle (see the V4.0 plan). Any **critical** or **high**
severity finding pauses the GA-stable announcement until the fix
ships. Fixes ship as `4.0.x` patch releases with the wire format unchanged.
## Out-of-scope (deferred to V5.0)
- Voice (`@shade/voice`) — SFrame-style frame keys, key-rotation policies.
- Video (`@shade/video`) — codec edges (AV1/VP9/H.264).
- Broadcast (`@shade/broadcast`) — relay-helper threat model.
These will get their own review window when V5.0 is ready.

docs/audit/SCOPE.md

@@ -0,0 +1,75 @@
# Shade 4.0 — Audit Scope
A short, structural list a reviewer can scan before opening a single
file. Everything here is a pointer to the deeper material in
[`REVIEW-BUNDLE.md`](./REVIEW-BUNDLE.md) and the package READMEs.
## In scope
- **Protocol primitives**: X3DH, Double Ratchet, sender keys.
- **Wire format**: `0x01` PreKeyMessage, `0x02` RatchetMessage, `0x11`
StreamChunk. Length prefixes (u32) and AAD bindings.
- **Storage encryption** (`@shade/storage-encrypted`): KDF chain,
per-(table,column) DEKs, AEAD AAD layout, online re-key.
- **Recovery** (`@shade/recovery`): Shamir over GF(2^8),
AEAD-authenticated reconstruction, fingerprint gate on guardian
release, share-grant / share-decline envelope schema.
- **WebRTC P2P** (`@shade/transport-webrtc`): SDP/ICE signaling rides
the ratchet; chunk frames AEAD-bound to streamId/laneId/seq; glare
resolution determinism.
- **Key Transparency** (`@shade/key-transparency`): Merkle log over
pre-hashed leaves, address-sorted index, signed STH, witness
cross-check, split-view detection.
- **Inbox** (`@shade/inbox-server`): TOFU registration, per-PUT signed
blobs, idempotent on `(address, msgId)`, replay window.
- **Bridge** (`@shade/transport-bridge`): SSE / long-poll / WS
carriers; signed-query auth (no headers on `EventSource`).
- **Crypto in workers** (`@shade/crypto-web/worker`): key-isolation
boundary, postMessage protocol, idle terminate.
- **Trust UX gates** (`@shade/sdk` `Shade.beforeFirstLargeFile`,
`beforeBackupImport`, `beforeNewDeviceTrust`).
## Out of scope
- **Voice / Video / Broadcast** (`@shade/voice` etc.) — V5.0; reviewed
when the package ships.
- **Build system** (Vite, Rollup, Gradle wiring) — out of crypto scope.
- **App-level UI** (`@shade/widgets`) — re-renders the primitives
above; the cryptographic decisions are in the SDK / core packages
the widgets consume.
- **Browser / native WebRTC stacks** — we ride the platform's
`RTCPeerConnection` and `SubtleCrypto`.
- **Operating system / hardware threat model** — filesystem
encryption, secure-enclave key storage, swap-encryption, coredump
handling. Operator responsibility.
## Methodology suggestions
1. Start with [`THREAT-MODEL.md`](../../THREAT-MODEL.md); every entry
has a `[tests:]` footnote. Run each test and confirm it passes, then
toggle the corresponding mitigation off and confirm the test now fails.
2. Re-derive every KDF label from the spec; check
[`scripts/generate-vectors.ts`](../../scripts/generate-vectors.ts) and
the recorded vectors in [`test-vectors/`](../../test-vectors/) match.
3. Run the cross-platform suite on **both** TS (bun) and Kotlin
(gradle) — divergence is a vector-format bug.
4. Audit the AEAD AAD construction at every layer:
- Ratchet: header bytes (counter + DH pub) → AES-GCM AAD.
- Streams: `streamId || laneId || seq || isLast` → AES-GCM AAD.
- Storage: `(table, column, pk)` → AES-GCM AAD.
5. Trace the boundary between the worker-side crypto thread and the
main thread — confirm that no handle to a wrapped DEK or a
ratcheted chain key crosses over.
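
For step 4, a reviewer is essentially checking that every field that distinguishes one chunk from another ends up in the AAD. The sketch below shows one way the stream binding `streamId || laneId || seq || isLast` could be serialized. The field widths and byte order are assumptions made for illustration, and `buildStreamAad` is a hypothetical name; the real layout is whatever the stream spec defines.

```ts
// Field widths and endianness are assumed, not taken from the Shade spec.
function buildStreamAad(
  streamId: Uint8Array, // assumed fixed-length identifier
  laneId: number,       // assumed u32, big-endian
  seq: bigint,          // assumed u64, big-endian
  isLast: boolean,      // assumed single byte, 0x00 or 0x01
): Uint8Array {
  const aad = new Uint8Array(streamId.length + 4 + 8 + 1);
  const view = new DataView(aad.buffer);
  aad.set(streamId, 0);
  view.setUint32(streamId.length, laneId, false);
  view.setBigUint64(streamId.length + 4, seq, false);
  aad[streamId.length + 12] = isLast ? 1 : 0;
  return aad;
}

const streamId = new Uint8Array(16).fill(0xab);
const aadA = buildStreamAad(streamId, 1, BigInt(7), false);
const aadB = buildStreamAad(streamId, 1, BigInt(7), true);
```

Because `aadA` and `aadB` differ in their final byte, a chunk encrypted as "not last" cannot be replayed as the final chunk without failing AEAD verification. Fixed-width fields also avoid the ambiguity that plain concatenation of variable-length values would introduce.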
## Open questions for reviewer commentary
- The witness gossip channel for V3.12 is currently in-band over the
ratchet; should we cross-pin against an out-of-band log mirror in
4.x, or wait for a federated relay tier?
- WebRTC peer-glare is resolved by lexicographic address compare — a
reviewer could confirm the equivalent constructions in libsignal or
Matrix and flag if our edge cases match.
- Storage encryption uses AES-GCM with a per-row IV. The IV is
random, not deterministic; reviewers should confirm the
combinatorial-collision threshold matches the per-column row count
bounds.
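
The last question can be made concrete with the birthday approximation. The sketch below assumes a 96-bit random IV (the usual AES-GCM nonce size; this document does not state the IV length) and counts every encryption under one DEK, including re-encryptions of updated rows. `ivCollisionProbability` is an illustrative name only.

```ts
// Birthday approximation: p ≈ n(n-1) / 2^(ivBits+1), valid while p is small.
// ivBits = 96 is an assumption about the deployment, not a documented fact.
function ivCollisionProbability(encryptions: number, ivBits = 96): number {
  return (encryptions * (encryptions - 1)) / 2 ** (ivBits + 1);
}

// NIST SP 800-38D caps random-IV GCM at 2^32 invocations per key.
const atNistLimit = ivCollisionProbability(2 ** 32); // about 2^-33
const atOneMillionRows = ivCollisionProbability(1_000_000);
```

With per-(table, column) DEKs, the relevant `n` is the number of encryptions per column key, so a column would need to approach 2^32 writes before the conventional limit is reached.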

189
docs/cross-platform.md Normal file

@@ -0,0 +1,189 @@
# Cross-platform parity — adding & running vectors
Shade keeps its TypeScript and Kotlin implementations in lock-step via a
**single source of truth**: `test-vectors/*.json`. Both runners load the
same files and verify their native code produces byte-identical output.
This document covers:
1. How the parity gate works (CI)
2. How to run vectors locally
3. How to add a new vector
## How the gate works
```
            ┌─────────────────────────────────┐
            │   scripts/generate-vectors.ts   │
            │   (TS reference implementation) │
            └────────────────┬────────────────┘
                             │ writes
            ┌─────────────────────────────────┐
            │   test-vectors/*.json           │
            │   { version: 2, vectors: [...] }│
            └─────┬──────────────────┬────────┘
                  │                  │
                  │ loaded by        │ loaded by
                  ▼                  ▼
┌───────────────────────────┐  ┌───────────────────────────┐
│ packages/shade-core/      │  │ android/shade-android/    │
│ tests/cross-platform-     │  │ src/test/kotlin/.../      │
│ vectors.test.ts           │  │ CrossPlatformVectorTest   │
│ (bun)                     │  │ (gradle JUnit4)           │
└───────────────────────────┘  └───────────────────────────┘
                  │                  │
                  └─────────┬────────┘
               both must pass before merge
          (.gitea/workflows/cross-vectors.yml)
```
The CI workflow has **two independent jobs**: `ts-vectors` and
`kotlin-vectors`. Either failing blocks the merge. The TS job also runs
`bun run vectors:gen` and fails if the result diverges from the committed
files: vector commits must come from the generator, never hand edits.
Vector files have a `version` integer at the top. Bump
`VECTOR_FILE_VERSION` in `scripts/generate-vectors.ts` whenever the
**schema** of any vector file changes (not just the values). Both test
suites assert the version matches their hard-coded expectation.
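
The version assertion described above might look roughly like this on the TS side. It is a sketch under assumptions: `parseVectorFile` is a hypothetical stand-in for the parsing half of the real `loadVectors`, and the expected value `2` is taken from the `{ version: 2, ... }` shape shown in the diagram.

```ts
interface VectorFile<V> {
  version: number;
  vectors: V[];
}

// Hard-coded on purpose: a schema bump must be a deliberate code change.
const EXPECTED_VECTOR_VERSION = 2;

// Hypothetical helper; the real loader also reads the file from disk.
function parseVectorFile<V>(json: string, expected = EXPECTED_VECTOR_VERSION): V[] {
  const file = JSON.parse(json) as Partial<VectorFile<V>>;
  if (file.version !== expected) {
    throw new Error(`vector file version ${file.version} does not match expected ${expected}`);
  }
  if (!Array.isArray(file.vectors)) {
    throw new Error('vector file has no "vectors" array');
  }
  return file.vectors;
}
```

Refusing to parse on mismatch, rather than logging a warning, is what turns a forgotten loader update into a red CI run.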
## Running vectors locally
### TypeScript
```bash
bun run test:vectors
# under the hood:
# bun test packages/shade-core/tests/cross-platform-vectors.test.ts
```
### Kotlin (JVM, no Android SDK required)
```bash
cd android
./gradlew :shade-android:test
```
Requires JDK 17. The wrapper downloads Gradle 8.10.2 on first run. Tink
1.15.0 (JVM JAR) is pulled from Maven Central.
### Regenerating vectors
When the protocol changes (new wire field, new label, new derivation step)
the TS reference is the source of truth. Edit `generate-vectors.ts`, then:
```bash
bun run vectors:gen
git diff test-vectors/ # eyeball the change
bun run test:vectors # confirm TS still agrees
cd android && ./gradlew :shade-android:test # confirm Kotlin still agrees
```
If Kotlin disagrees, **fix Kotlin** — TS is canonical. If both agree but
the diff is unintentional (e.g. you added a field by accident), revert
the generator change.
## Adding a new vector
A new checkpoint has four pieces: generator code, schema, TS test,
Kotlin test. All four must land in the same PR; otherwise the gate
trips on the missing half.
### Step 1 — Add a generator function
In `scripts/generate-vectors.ts`, add a function that:
- Takes deterministic inputs (no randomness — fix every byte)
- Computes the value via the TS reference primitives
- Returns a `Vector[]` with a `description` per case + all inputs and outputs
in hex
Example skeleton:
```ts
async function generateMyVectors(): Promise<Vector[]> {
const input = new Uint8Array(32).fill(0xab);
const output = await someRefImpl(input);
return [{
description: 'My new checkpoint: known input → known output',
input: hex(input),
output: hex(output),
}];
}
```
Wire it up in `main()`:
```ts
['my-vectors.json', { vectors: await generateMyVectors() }],
```
Run `bun run vectors:gen` → you should see `✓ my-vectors.json` and a new
file appears under `test-vectors/`.
### Step 2 — Add a TS test
In `packages/shade-core/tests/cross-platform-vectors.test.ts`:
```ts
test('My vectors match', async () => {
const { vectors } = loadVectors('my-vectors.json');
for (const v of vectors) {
const actual = await someRefImpl(fromHex(v.input));
expect(hex(actual)).toBe(v.output);
}
});
```
`loadVectors` already asserts the version field matches. If you're
introducing a schema-breaking change, bump `EXPECTED_VECTOR_VERSION` and
`VECTOR_FILE_VERSION` together.
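
The snippets in Steps 1 and 2 call `hex` and `fromHex` without showing them. If you need to reason about what they do, a plausible minimal version is sketched here; the repository's own helpers may differ in name, location, and error handling.

```ts
// Assumed behavior: lowercase, zero-padded, no separators.
function hex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

function fromHex(s: string): Uint8Array {
  // Reject odd lengths and non-hex characters instead of silently truncating.
  if (s.length % 2 !== 0 || /[^0-9a-fA-F]/.test(s)) {
    throw new Error('invalid hex string');
  }
  const out = new Uint8Array(s.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(s.slice(2 * i, 2 * i + 2), 16);
  }
  return out;
}

const roundTrip = hex(fromHex('00abff'));
```

Emitting lowercase consistently matters because the TS and Kotlin tests compare hex strings, not byte arrays.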
### Step 3 — Add the Kotlin equivalent
In
`android/shade-android/src/test/kotlin/no/zyon/shade/CrossPlatformVectorTest.kt`:
```kotlin
@Test
fun myVectorsMatch() {
val vectors = loadVectors("my-vectors.json")
for (i in 0 until vectors.length()) {
val v = vectors.getJSONObject(i)
val actual = someKotlinImpl(fromHex(v.getString("input")))
assertEquals(v.getString("output"), hex(actual))
}
}
```
If the Kotlin port doesn't yet have `someKotlinImpl`, that's the implementation
work the new vector is gating — write it and re-run the test until it passes.
### Step 4 — Verify the gate trips on divergence
Sanity check: temporarily flip a byte in your Kotlin port and run
`./gradlew :shade-android:test`. The test should fail within 60 seconds
(see `docs/V3.5.md` §Akseptansekriterier). Revert.
## Why a separate generator (vs. golden fixtures)?
Golden test fixtures rot — when the protocol changes, every test file
that pinned a literal hex string needs updating, and it's easy to
"update" Kotlin to match a stale TS-generated value. By centralising
vector generation in one TS script, **the protocol changes in one
place** (the reference impl + `generate-vectors.ts`), the file
regenerates with one command, and any platform that drifts gets caught
by the next CI run.
## Schema versioning
`{ "version": 2, "vectors": [...] }` is the file format. Bump the int
when the **shape** of any vector changes (e.g. you add a field consumers
must read). Both runners hard-code their expected version and refuse to
parse mismatched files — this catches the case where a new vector field
was added in TS but the Kotlin loader silently ignored it.
Schema changes go in the same PR as the bump + the matching loader
update on both sides.


@@ -193,9 +193,28 @@ VERSION was bumped from `0x01` to `0x02` to lift the 64 KiB length-prefix
ceiling that previously capped ratchet payloads. **Sessions are
incompatible across the bump**; both peers must run 0.3.0+.
## Rich file metadata + previews (V3.9)
`stream-init` carries optional E2EE `fileMetadata` (filename, MIME,
thumbnail-stream pointer). `@shade/files` consumers see this on the
incoming-transfer side and can render previews via `<TransferRow
showThumbnail />`. The thumbnail itself rides as a separate AEAD
stream — server never sees preview pixels in plaintext.
See [streams.md § Rich file metadata + previews](streams.md#rich-file-metadata--previews-v39)
for the wire format, format-hardening rules, and renderer trust
model. The pattern integrates seamlessly with `@shade/files`'s own
write/read RPCs — pass `fileMetadata` in the underlying
`shade.upload` and the same `ShadeThumbnailCache` powers previews
across all transfer surfaces.
## Related modules
* `@shade/streams` — chunk encryption, lane key derivation. Indirect dep.
* `@shade/transfer` — multi-lane transport with HTTP / WS fallback.
* `@shade/transport-webrtc` (V3.11, optional) — direct P2P chunk
delivery via `RTCDataChannel`; large `read`/`write` payloads
automatically prefer WebRTC when both peers have called
`shade.configureWebRTC()`.
* `@shade/sdk`: `Shade.files` getter; `BackgroundHooks.onPruneFiles` for
retention.

317
docs/inbox.md Normal file
View File

@@ -0,0 +1,317 @@
# Shade Inbox — Async Store-and-Forward (V3.6)
A relay that holds **ciphertext blobs with TTL** so senders can deliver
to recipients who happen to be offline. The relay never sees plaintext,
never holds private keys, and never knows who is talking to whom in
plaintext form (only addresses and bytes-per-blob).
This document covers:
- Setup (server side, single-binary)
- Client integration (`@shade/inbox`)
- Threat model — *what the relay actually sees*
- Operational tuning (TTL, quotas, prune cadence)
- Wire-level reference
---
## 1. Server setup
The inbox server is built into the same `@shade/server` standalone
container that ships the prekey server, on the same port. Routes are
namespaced under `/v1/inbox/*`.
### Docker (single binary, both services)
```bash
docker run -d --name shade \
-p 3900:3900 \
-v shade-data:/data \
-e SHADE_PREKEY_DB_PATH=/data/shade-prekeys.db \
-e SHADE_INBOX_DB_PATH=/data/shade-inbox.db \
-e SHADE_INBOX_PRUNE_INTERVAL_MINUTES=5 \
ghcr.io/zyon-no/shade:latest
```
### Postgres (multi-instance / shared infra)
```bash
docker run -d --name shade \
-p 3900:3900 \
-e SHADE_PREKEY_PG_URL='postgres://shade:***@db/shade' \
-e SHADE_INBOX_PG_URL='postgres://shade:***@db/shade' \
ghcr.io/zyon-no/shade:latest
```
Tables are auto-created (`shade_inbox_owners`, `shade_inbox_blobs`,
sequence `shade_inbox_seq`). If you only set `SHADE_PREKEY_PG_URL`, the
inbox falls back to the same database; set
`SHADE_INBOX_PG_URL='-'` to disable that fallback and run the inbox
in-memory (only useful for short-lived test deployments).
### Env vars
| Var | Default | Effect |
| -------------------------------------- | ------------------------ | ----------------------------------- |
| `SHADE_INBOX_DB_PATH` | _(unset → memory)_ | SQLite file path |
| `SHADE_INBOX_PG_URL` | _(unset → falls back)_ | Postgres connection string |
| `SHADE_INBOX_PRUNE_INTERVAL_MINUTES` | `5` | How often expired blobs are dropped |
### Embedding in your own Hono app
```ts
import { Hono } from 'hono';
import { SubtleCryptoProvider } from '@shade/crypto-web';
import { createInboxRoutes, MemoryInboxStore } from '@shade/inbox-server';
const crypto = new SubtleCryptoProvider();
const store = new MemoryInboxStore();
const app = new Hono();
app.route('/', createInboxRoutes(store, crypto));
export default { port: 3901, fetch: app.fetch };
```
---
## 2. Client integration
`@shade/inbox` is the recipient/sender SDK. It composes on top of
`@shade/sdk` — Shade still owns encryption + the ratchet; the inbox
layer is just durable transport.
### Wiring
```ts
import { Shade } from '@shade/sdk';
import { Inbox } from '@shade/inbox';
const shade = new Shade(/* ... */);
await shade.initialize();
// Lift the identity keys we already have.
const identity = await shade.getManager().getIdentityKeyPair();
const inbox = new Inbox({
baseUrl: 'https://inbox.example.com',
ownAddress: shade.myAddress,
crypto: shade.crypto,
signingPrivateKey: identity.signingPrivateKey,
signingPublicKey: identity.signingPublicKey,
pollIntervalMs: 30_000,
});
// Receive: hand each fetched blob to Shade.receive.
inbox.onIncoming(async (raw) => {
// decodeEnvelope is the inverse of encodeEnvelope(); its import is not shown here.
const envelope = decodeEnvelope(raw.ciphertext);
// The inbox does not authenticate the sender — Shade.receive does,
// by way of the recipient's session/ratchet/identity-pin.
// resolveSenderAddress is a placeholder: derive the sender from your own metadata channel.
const senderAddress = await resolveSenderAddress(raw);
await shade.receive(senderAddress, envelope);
return senderAddress;
});
inbox.start(); // registers + begins flush + poll loops
// Send: encrypt with Shade, hand the envelope to the inbox.
const envelope = await shade.send('bob@example.com', 'hi');
await inbox.send({ recipientAddress: 'bob@example.com', envelope });
```
### Push-trigger hook
The inbox is *pull-based* — recipients only see new blobs when they
poll. Most apps want a wake-up nudge when new content lands. Vendor it
yourself (FCM / APNs / email / WebPush):
```ts
inbox.onMessageQueued(async (recipient, msgId) => {
await fcm.send(recipient, { kind: 'shade-inbox', msgId });
});
```
The recipient device wakes, runs `inbox.tick()`, and pulls the blob.
### Durable queue
The default in-memory queue is fine for short-lived processes. For a
service that must survive restart, plug in your own `OutgoingQueueStore`
backed by SQLite/Postgres/IndexedDB:
```ts
const inbox = new Inbox({
// …
queueStore: new MyDurableQueueStore(),
cursorStore: new MyDurableCursorStore(),
});
```
Same idea for the receive cursor — without persistence, every restart
re-downloads everything currently within TTL.
### Errors
- **Decrypt failure** in your handler keeps the blob on the server (no
ack). The next poll re-fetches it — useful when the ratchet temporarily
rejects a message because of out-of-order delivery.
- **`msgId/ciphertext` mismatch** is a relay-tampering canary. The Inbox
client recomputes the hash and emits `inbox.message_decrypt_failed`
*without* acking, so an operator can investigate before the blob
silently expires.
- **Network failure** on PUT keeps the entry in the local queue with an
`attempts` counter; default cap is 10 retries before the entry is
dropped (configurable via `maxAttempts`).
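
The tampering canary in the second bullet boils down to recomputing a hash. A minimal sketch, assuming the ciphertext has already been base64-decoded and using Node's built-in `node:crypto` (the Inbox client itself may go through its own crypto provider); `msgIdFor` and `verifyBlob` are illustrative names.

```ts
import { createHash } from 'node:crypto';

// msgId is the lowercase hex SHA-256 of the ciphertext bytes.
function msgIdFor(ciphertext: Uint8Array): string {
  return createHash('sha256').update(ciphertext).digest('hex');
}

// Returns false when the relay handed back bytes that do not match the id.
function verifyBlob(blob: { msgId: string; ciphertext: Uint8Array }): boolean {
  return msgIdFor(blob.ciphertext) === blob.msgId;
}

const ciphertext = new TextEncoder().encode('abc');
const honest = { msgId: msgIdFor(ciphertext), ciphertext };
const tampered = { msgId: honest.msgId, ciphertext: new TextEncoder().encode('abd') };
```

`verifyBlob(honest)` holds and `verifyBlob(tampered)` does not. On a mismatch the client should skip the ack so the blob stays available for investigation.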
---
## 3. Threat model — what the relay actually sees
| Knows | Doesn't know |
| -------------------------------------------------- | ----------------------------------------- |
| Recipient address (path parameter) | Recipient real identity (it's pseudonymous) |
| Sender's per-PUT signing public key | The mapping sender-pubkey → real identity |
| Number of blobs queued for an address | Plaintext content |
| Approximate ciphertext size                        | Sender-recipient pair beyond key, address, and size correlation |
| Per-blob TTL (in the row's `expires_at`) | The ratchet/X3DH state |
### Privacy posture
- **Sender-recipient graph leaks at the metadata level.** A passive
observer of the relay (or its DB dump) can correlate sender pubkey ↔
recipient address ↔ blob size. Mitigations:
- Recipients can use **address hashes** instead of human-readable
addresses (the address grammar accepts any `[a-zA-Z0-9][a-zA-Z0-9:_\-.]{0,255}`,
so `sha256(real-address || salt)` works).
- Senders can rotate their per-PUT signing key per session; the relay
only verifies the signature and never persists the key.
- **TTL leaks reachability.** A sender's PUT silently dropping after 7
days is itself a signal. Operators can normalize TTLs (clamp every
PUT to a fixed 7-day window) to flatten this.
- **Operator can DoS a recipient** by deleting their queue. Mitigation:
recipient ack happens *after* successful decrypt, so a malicious
delete just forces re-send by the original sender.
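
The address-hash mitigation in the first bullet could be implemented along these lines. This is a sketch, not the SDK's API: `pseudonymousAddress` is a hypothetical name, the salt handling is left to the caller, and the regular expression is copied from the address grammar quoted above.

```ts
import { createHash } from 'node:crypto';

// Address grammar as quoted in the privacy-posture list.
const ADDRESS_GRAMMAR = /^[a-zA-Z0-9][a-zA-Z0-9:_\-.]{0,255}$/;

// sha256(realAddress || salt), hex-encoded so the result fits the grammar.
function pseudonymousAddress(realAddress: string, salt: string): string {
  const digest = createHash('sha256').update(realAddress).update(salt).digest('hex');
  if (!ADDRESS_GRAMMAR.test(digest)) {
    throw new Error('derived address does not fit the address grammar');
  }
  return digest;
}

const hidden = pseudonymousAddress('bob@example.com', 'per-deployment-salt');
```

Plain concatenation is ambiguous if either part can vary in length, so a fixed-length salt or an explicit separator is a reasonable precaution. Senders need the same salt to derive the address, which makes the salt a shared secret rather than a public value.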
### What the relay can NOT do
- **Read plaintext** — the ratchet/AEAD layers run client-side.
- **Forge a sender** — every PUT is Ed25519-signed by the sender's
per-PUT key; the relay rejects bad signatures with 401.
- **Inject a foreign blob** — the recipient client recomputes
`sha256(ciphertext)` and refuses anything that doesn't match the
stored `msgId`.
- **Replay an old PUT** — the signed `signedAt` field has a ±5-minute
window (matches the prekey-server's policy); replays past that window
are rejected with 401 (stale `signedAt`).
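
The replay window in the last bullet is a symmetric freshness check on `signedAt`. A minimal sketch of that check, with `isFresh` as a hypothetical name; the server's real implementation and its clock source are not shown in this document.

```ts
const REPLAY_WINDOW_MS = 5 * 60 * 1000; // ±5 minutes, as stated above

// Accepts timestamps at most windowMs in the past or in the future.
function isFresh(
  signedAt: number,
  now: number = Date.now(),
  windowMs: number = REPLAY_WINDOW_MS,
): boolean {
  return Number.isFinite(signedAt) && Math.abs(now - signedAt) <= windowMs;
}

const referenceNow = 1716057600000;
const recent = isFresh(referenceNow - 4 * 60 * 1000, referenceNow);
const tooOld = isFresh(referenceNow - 6 * 60 * 1000, referenceNow);
```

Inside the window, a byte-identical replay is harmless for a different reason: the wire reference folds duplicate PUTs with the same `msgId` into the first row.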
### Storage-DoS
`maxBlobBytes` (default 1 MiB) caps a single PUT.
`maxBlobsPerAddress` (default 1000) caps the recipient's queue depth —
PUTs past the cap return 400 with a structured `inbox.quota_rejected`
event so operators can alert. Combine with per-IP rate limits at the
edge (the built-in token bucket is in-memory and not multi-instance).
---
## 4. Wire reference
All bodies are JSON. Binary fields are standard-base64 encoded, except `msgId`, which is lowercase hex.
### `POST /v1/inbox/register` (TOFU)
```json
{
"address": "bob",
"signingKey": "<base64 Ed25519 public key>",
"signedAt": 1716057600000,
"signature": "<base64 Ed25519 signature over canonical body>"
}
```
- 200 — registered (or idempotent re-register with same key).
- 401 — different key already owns this address, or signature failed.
### `POST /v1/inbox/:address` (PUT blob)
```json
{
"senderSigningKey": "<base64 sender Ed25519 public key>",
"msgId": "<lowercase hex sha256(ciphertext)>",
"ciphertext": "<base64 wire bytes from encodeEnvelope()>",
"ttlSeconds": 604800,
"signedAt": 1716057600000,
"signature": "<base64 sender signature>"
}
```
- 200 with `{ msgId, receivedAt, idempotent: false }` — first store.
- 200 with `idempotent: true` — duplicate PUT folded into the first row.
- 400 — `msgId` mismatch, ciphertext too big, or address quota exceeded.
- 401 — bad signature or stale `signedAt`.
- 404 — recipient address never registered.
### `POST /v1/inbox/:address/fetch` (signed challenge)
```json
{
"address": "bob",
"sinceCursor": 0,
"signedAt": 1716057600000,
"signature": "<base64 recipient signature>"
}
```
Returns:
```json
{
"blobs": [
{
"msgId": "<hex>",
"ciphertext": "<base64>",
"receivedAt": 1716057601234,
"expiresAt": 1716662401234
}
],
"cursor": 1716057601234,
"hasMore": false
}
```
Pass the returned `cursor` as `sinceCursor` next time. Pages cap at
`fetchPageLimit` (default 100); keep calling with the new cursor while
`hasMore === true`.
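
The cursor loop described above can be sketched as follows. `drainInbox` and `FetchPageFn` are hypothetical names: the caller supplies a function that performs the signed `POST /v1/inbox/:address/fetch` and returns the parsed response, so signing and HTTP details stay out of the sketch.

```ts
interface InboxBlob {
  msgId: string;
  ciphertext: string; // base64
  receivedAt: number;
  expiresAt: number;
}

interface FetchPage {
  blobs: InboxBlob[];
  cursor: number;
  hasMore: boolean;
}

// Caller-supplied: signs and sends the fetch request for one page.
type FetchPageFn = (sinceCursor: number) => Promise<FetchPage>;

async function drainInbox(
  fetchPage: FetchPageFn,
  sinceCursor = 0,
): Promise<{ blobs: InboxBlob[]; cursor: number }> {
  const blobs: InboxBlob[] = [];
  let cursor = sinceCursor;
  let hasMore = true;
  while (hasMore) {
    const page = await fetchPage(cursor);
    blobs.push(...page.blobs);
    cursor = page.cursor; // feed the returned cursor into the next request
    hasMore = page.hasMore;
  }
  return { blobs, cursor };
}
```

Persist the final `cursor` only after the blobs have been handled; otherwise a crash between fetch and decrypt would skip messages on the next start.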
### `DELETE /v1/inbox/:address/:msgId` (signed ack)
Body:
```json
{
"address": "bob",
"msgId": "<hex>",
"signedAt": 1716057600000,
"signature": "<base64 recipient signature>"
}
```
- 200 with `{ ok: true }` — row removed.
- 200 with `{ ok: false }` — row was already gone (also idempotent).
- 401 — recipient signature failed.
### `DELETE /v1/inbox/register/:address`
Same auth shape as ack. Drops every queued blob.
---
## 5. Acceptance test mapping
| V3.6 spec criterion | Test |
| ---------------------------------------------------------- | -------------------------------------------------------------- |
| Async delivery without online overlap | `lifecycle.test.ts → "100 messages delivered…"` |
| DB-dump leaks no plaintext / sender-recipient graph | Server stores only `address \|\| msgId \|\| ct \|\| expires_at`; verified by `routes.test.ts` schema asserts |
| Replay PUT with same `msgId` is idempotent | `routes.test.ts → "idempotent on duplicate ciphertext"` |
| Restart preserves blobs | `lifecycle.test.ts → "persistence across restart"` + sqlite-store reopen |
| Bit-flip on stored ciphertext rejected on the client | `lifecycle.test.ts → "Tamper resistance"` + client `client.test.ts → "tamper detection"` |

348
docs/key-transparency.md Normal file

@@ -0,0 +1,348 @@
# Key Transparency (V3.12)
> **Status:** v0.4.0+, opt-in. The server runs unchanged when KT is off.
> Clients ignore the proof fields when no KT config is present, so it is
> safe to roll out without a client update.
Shade's prekey server is the source of truth for which bundle is
published for each address. Without Key Transparency (KT), a malicious
or compromised server can swap out a bundle without anyone noticing.
With KT, every bundle that is served is **cryptographically committed**
to an append-only Merkle log that third-party witnesses can audit.
See also `docs/V3.12-DESIGN.md` for the design note covering the threat
model and the decision record.
---
## What KT guarantees
| Attack | Detected? |
|---|---|
| Server gives Bob the wrong bundle for `alice` | **Yes**: the inclusion proof does not match |
| Server gives Bob and Charlie different bundles for `alice` | **Yes**: witness gossip sees two STHs at the same `tree_size` |
| Server rewrites history to hide earlier misbehavior | **Yes**: the consistency proof fails |
| Server signs a "stale" STH to keep a time window open | **Yes**: the client rejects any STH older than `maxStaleMs` (default 24 h) |
| First-time impersonation of a brand-new address | **No**: KT only shows that the address is in the log, not that it belongs to the "right" person. Use V3.3 (fingerprint gate) + V3.10 (social recovery) for that. |
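
The inclusion proof in the first row can be checked with the standard Certificate Transparency algorithm. The sketch below follows the RFC 6962 hashing rules (`0x00` prefix for leaves, `0x01` for interior nodes) and the verification procedure from RFC 9162; it is not the `@shade/key-transparency` implementation. Whether Shade applies the leaf prefix on top of its pre-hashed leaves is an assumption, so `verifyInclusion` takes the leaf hash as given.

```ts
import { createHash } from 'node:crypto';

function sha256(...parts: Uint8Array[]): Uint8Array {
  const h = createHash('sha256');
  for (const p of parts) h.update(p);
  return new Uint8Array(h.digest());
}

const LEAF_PREFIX = new Uint8Array([0x00]);
const NODE_PREFIX = new Uint8Array([0x01]);
const leafHash = (data: Uint8Array): Uint8Array => sha256(LEAF_PREFIX, data);
const nodeHash = (l: Uint8Array, r: Uint8Array): Uint8Array => sha256(NODE_PREFIX, l, r);
const sameBytes = (a: Uint8Array, b: Uint8Array): boolean =>
  a.length === b.length && a.every((x, i) => x === b[i]);

// Uses 32-bit shifts, so it assumes treeSize stays below 2^31.
function verifyInclusion(
  leaf: Uint8Array, // leaf hash
  index: number,
  treeSize: number,
  path: Uint8Array[],
  root: Uint8Array,
): boolean {
  if (index < 0 || index >= treeSize) return false;
  let fn = index;
  let sn = treeSize - 1;
  let r = leaf;
  for (const p of path) {
    if (sn === 0) return false; // path is longer than the tree is deep
    if ((fn & 1) === 1 || fn === sn) {
      r = nodeHash(p, r); // sibling is on the left
      while ((fn & 1) === 0 && fn !== 0) {
        fn >>>= 1;
        sn >>>= 1;
      }
    } else {
      r = nodeHash(r, p); // sibling is on the right
    }
    fn >>>= 1;
    sn >>>= 1;
  }
  return sn === 0 && sameBytes(r, root);
}

// Three-leaf example tree: root = H(H(h0, h1), h2).
const enc = new TextEncoder();
const [h0, h1, h2] = ['a', 'b', 'c'].map((s) => leafHash(enc.encode(s)));
const exampleRoot = nodeHash(nodeHash(h0, h1), h2);
```

For this tree, the proof for leaf 2 is the single hash `nodeHash(h0, h1)` and the proof for leaf 0 is `[h1, h2]`. A proof presented against the wrong index or a different leaf fails.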
---
## Operator: enabling KT
KT is opt-in and requires:
1. **An Ed25519 signing keypair** for STH signing. This is the
*operator's* key and must be protected like a code-signing key.
2. **A persistent KTLogStore.** In production: `PostgresKTLogStore`.
In test/dev: `MemoryKTLogStore`.
3. **Clients pinning the same `logPublicKey`** out of band (typically by
bundling it into `Shade.config` in the app).
### Generating the signing key
```sh
bun run scripts/generate-kt-key.ts > kt-key.json
```
(Or do it manually: run `crypto.generateEd25519KeyPair()` in a Bun REPL.)
Store `privateKey` in the operator's secret store. Distribute
`publicKey` to clients together with the app config.
### Booting the server with KT
```ts
import { createPrekeyServerWithKT } from '@shade/server';
import { PostgresPrekeyStore, PostgresKTLogStore } from '@shade/storage-postgres';
import { SubtleCryptoProvider } from '@shade/crypto-web';
const crypto = new SubtleCryptoProvider();
const prekeyStore = await PostgresPrekeyStore.create(process.env.DATABASE_URL!);
const ktStore = await PostgresKTLogStore.create(process.env.DATABASE_URL!);
const { app, kt } = await createPrekeyServerWithKT({
crypto,
store: prekeyStore,
keyTransparency: {
store: ktStore,
signingPrivateKey: loadFromSecret('SHADE_KT_SIGNING_PRIVATE_KEY'),
signingPublicKey: loadFromSecret('SHADE_KT_SIGNING_PUBLIC_KEY'),
heartbeatIntervalMs: 10 * 60 * 1000, // default; 0 = off
},
});
export default { port: 3900, fetch: app.fetch };
```
With KT enabled, these routes become available:
| Route | What it returns |
|---|---|
| `GET /v1/kt/log_id` | `{ logId, publicKey }` (both base64) |
| `GET /v1/kt/sth` | Latest signed tree head |
| `GET /v1/kt/sth/:treeSize` | Historical STH for a given tree_size |
| `GET /v1/kt/consistency?from=N1&to=N2` | Consistency proof N1 → N2 |
Bundle fetch (`GET /v1/keys/bundle/:address`) now carries a `ktProof`
field in the response.
### Migrating from a non-KT deployment
KT is backward compatible:
1. Enable the KT config on the server. Restart.
2. Existing clients ignore the proof fields (`ktProof`, `ktSth`).
3. As clients are upgraded with a KT config (`mode: 'observe'`), they
start verifying.
4. Once the ecosystem has settled, escalate clients to
`'observe-strict'` so they reject prekey-server responses without a proof.
On first boot the KT service does not automatically scan existing
prekey-store state into the log. **Re-registration** of existing
addresses (that is, a `POST /v1/keys/register` round from each client)
is what backfills it. For a larger deployment, the operator should
notify users to re-register within a set time window. Clients that
do not re-register will fail `observe-strict` fetches until they
receive a new key from the peer.
---
## Client: enabling KT
```ts
import { createShade } from '@shade/sdk';
const shade = await createShade({
prekeyServer: 'https://shade.example.com',
address: 'alice',
keyTransparency: {
mode: 'observe-strict', // or 'observe'
logPublicKey: KT_LOG_PUBLIC_KEY_BASE64, // or a Uint8Array
maxStaleMs: 24 * 60 * 60 * 1000, // default: 24 h
},
});
```
`shade.getKTWitness()` returns the `LightWitness` instance that
collects observed STHs. Use `.compare(otherSth)` for a manual
gossip check against peers.
### `mode: 'observe'`
- Verifies the proof when the server supplies one.
- Skips verification if `ktProof` is missing from the bundle response.
- Recommended during the initial rollout, while not all clients have
re-registered yet.
### `mode: 'observe-strict'`
- Requires a proof on every `200` response. A missing proof throws `KTVerificationError`.
- Requires a proof on every `404` response as well (for absence/tombstone pinning).
- Recommended production mode once the KT ecosystem is established.
---
## Witness / auditor
`@shade/key-transparency` exports `LightWitness`. A CLI tool or
backend job can use it like this:
```ts
import { LightWitness } from '@shade/key-transparency';
import { SubtleCryptoProvider } from '@shade/crypto-web';
const crypto = new SubtleCryptoProvider();
const witness = new LightWitness({
crypto,
logPublicKey: KT_LOG_PUBLIC_KEY,
fetcher: {
async fetchLatestSTH() {
const r = await fetch('https://shade.example.com/v1/kt/sth');
return r.json();
},
async fetchConsistencyProof(from, to) {
const r = await fetch(`https://shade.example.com/v1/kt/consistency?from=${from}&to=${to}`);
return r.json();
},
},
});
// Poll periodically (e.g. every 5 minutes)
setInterval(async () => {
try {
const sth = await witness.pollOnce();
console.log(`Observed STH: tree_size=${sth.treeSize}, root=${Buffer.from(sth.rootHash).toString('hex').slice(0, 16)}`);
} catch (err) {
console.error('Witness alarm:', err);
// Send to PagerDuty / Slack / whatever
}
}, 5 * 60 * 1000);
```
The witness code detects:
- **Stale STH**: the server is not publishing new STHs in time.
- **Split view**: two STHs at the same `tree_size` with different roots.
- **Rewrite**: the consistency proof fails.
- **Wrong key**: `log_id` does not match the pinned `logPublicKey`.
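
Two of these checks, staleness and split view, need nothing more than a map of previously observed tree heads. The sketch below illustrates the idea and is not `LightWitness` itself. The `treeSize` and `rootHash` field names follow the polling example above, `timestamp` is an assumed field, and signature verification is deliberately left out and must happen before a tree head reaches this function.

```ts
interface SignedTreeHead {
  treeSize: number;
  rootHash: Uint8Array;
  timestamp: number; // assumed: milliseconds since epoch, covered by the signature
}

type Finding = 'ok' | 'stale' | 'split-view';

const sameRoot = (a: Uint8Array, b: Uint8Array): boolean =>
  a.length === b.length && a.every((x, i) => x === b[i]);

// `seen` maps tree_size to the first STH observed at that size.
function checkSth(
  seen: Map<number, SignedTreeHead>,
  sth: SignedTreeHead,
  now: number,
  maxStaleMs: number = 24 * 60 * 60 * 1000,
): Finding {
  if (now - sth.timestamp > maxStaleMs) return 'stale';
  const prior = seen.get(sth.treeSize);
  if (prior && !sameRoot(prior.rootHash, sth.rootHash)) return 'split-view';
  seen.set(sth.treeSize, sth);
  return 'ok';
}
```

Feeding STHs received from peers into the same map is what turns this from self-consistency into gossip: a server that shows different roots to different clients is caught as soon as two of them compare notes.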
---
## Operator cost (estimate)
For a deployment with:
- **100k registered addresses**
- **1 identity rotation per year** per user
- **52 replenishes per year** (one a week, *not* committed to the log; only register/delete are)
| Resource | Per year | Comment |
|---|---|---|
| Log rows | ~100k | register/delete only |
| Storage (leaves + index) | ~25 MB | base64-encoded |
| STH rows | ~52k | one per heartbeat (10 min) |
| STH storage | ~7 MB | |
| CPU per STH | ~1 ms | Ed25519 signing is trivial |
| Bundle-fetch overhead | <2 ms | includes building the audit path |
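
The row counts in the table follow directly from the stated inputs, and the per-row sizes they imply are a quick plausibility check. The arithmetic below assumes 1 MB = 10^6 bytes and uses only figures from the table.

```ts
const heartbeatMinutes = 10;
const sthRowsPerYear = (60 / heartbeatMinutes) * 24 * 365; // 52,560, i.e. "~52k"

const addresses = 100_000;
const rotationsPerAddressPerYear = 1;
const logRowsPerYear = addresses * rotationsPerAddressPerYear; // "~100k"

// Implied average row sizes from the storage estimates.
const bytesPerSthRow = 7e6 / sthRowsPerYear; // about 133 bytes
const bytesPerLogRow = 25e6 / logRowsPerYear; // 250 bytes
```

Halving the heartbeat interval doubles the STH rows but leaves the log rows untouched, since only register/delete operations append leaves.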
**Backup:** treat the KT tables as "cannot be recreated" data.
`shade_kt_leaves` has a database trigger that forbids UPDATE/DELETE in
the PostgreSQL implementation. Backup strategy:
- Daily full backup of the `shade_kt_*` tables.
- WAL shipping recommended (worst-case loss < 60 s).
- **Test recovery** quarterly. The recovery procedure is below.
---
## Recovery
### Scenario 1: STH signing key lost or compromised
The log stays consistent (all old STHs are already signed), but
new STHs cannot be signed with the same key.
**Steps:**
1. Generate a new Ed25519 keypair.
2. Write a "rotation breaks here" leaf into the log (operation = 0x03
on a special `__log__` address). The operation is purely
informational, but it makes the rotation visible in the tree.
3. Reconfigure the server with the new key. Restart.
4. The server publishes a new STH; it will have a new `log_id` (since
`log_id = SHA-256(publicKey)`).
5. **Clients must explicitly accept the new key.** Until they pin the new
`logPublicKey`, their `LightWitness` will throw
`KTLogIdMismatchError`. The operator publishes the new key out of band with a
"rotated from `<old logId>`" message signed with the old key
(the last act before the old key is zeroized).
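
Step 4 relies on `log_id = SHA-256(publicKey)`, which is why a key rotation is always visible to pinned clients. A sketch of the derivation and the pin check follows, under two assumptions: the hash is taken over the raw public-key bytes, and the result is base64-encoded like the `/v1/kt/log_id` response. `logIdFor` and `matchesPinnedKey` are hypothetical names.

```ts
import { createHash } from 'node:crypto';

// Assumed: hash over the raw 32-byte Ed25519 public key, base64 output.
function logIdFor(publicKey: Uint8Array): string {
  return createHash('sha256').update(publicKey).digest('base64');
}

// What a pinned client effectively checks before trusting a served STH.
function matchesPinnedKey(servedLogId: string, pinnedPublicKey: Uint8Array): boolean {
  return servedLogId === logIdFor(pinnedPublicKey);
}

const oldKey = new Uint8Array(32).fill(0x11);
const newKey = new Uint8Array(32).fill(0x22);
const servedAfterRotation = logIdFor(newKey);
```

A client still pinned to `oldKey` sees `matchesPinnedKey(servedAfterRotation, oldKey)` fail, which corresponds to the `KTLogIdMismatchError` described in step 5.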
### Scenario 2: KT database corrupted or lost before backup
This is **the worst outcome**. The log is by design not
recoverable: "reconstructing" it from the prekey store would break
the very invariant KT promises.
**Steps:**
1. Stop the server.
2. Declare a "log-restart event" through a public channel (status page,
release notes, Twitter, etc.). Include the timestamp, the lost tree_size
(from the last backed-up snapshot if possible), and the new `logPublicKey`.
3. Generate a new KT keypair (do not reuse the old one).
4. Boot the server empty (empty `shade_kt_*` tables). The first STH starts from
`tree_size = 0`.
5. Ask users to re-register their identities. Clients will
trigger the V3.3 fingerprint gate on the first messaging flow afterwards,
since the rotation fingerprint changes.
6. Auditor organizations can publish "we observed the old log
up to tree_size N; the new log starts at 0 from T+0". This lets
end users judge how large the gap is.
**Protect against this:** WAL shipping + off-site backup. Never run KT
on a single database instance without replicas.
### Scenario 3: Witness detects a split view
The witness throws `KTSplitViewError` in `LightWitness.observe()` or
`KTVerificationError` in the transport. This means:
- The operator has either
(a) had a software bug that signed two different STHs at the same
tree_size, or
(b) been compromised or is malicious.
**Operator action:**
1. Pause `POST /v1/keys/register`, `DELETE`, and bundle fetch
immediately (return 503).
2. Audit `shade_kt_sths`. If you find two rows with the same
`tree_size` but different `root_hash`, the server has misbehaved. This is
serious: find the root cause before continuing.
3. Communicate with your users. Assume an attacker has been
inside; trigger a broader reset (recovery scenario 2) if
tampering is suspected.
**Client action:**
- `LightWitness` has already held the user back.
- The SDK surfaces the error to app code as `KTSplitViewError`.
- The app should show a warning: "The operator's server cannot be verified.
Refrain from sending sensitive messages until further notice."
---
## Security recommendations
1. **Run at least one independent witness.** The operator's own "witness"
does not count. It must be a separate process on separate
infrastructure owned by a separate party (community member, security
firm, or similar).
2. **Pin `logPublicKey` in the app binary or a signed config.** A
man-in-the-middle who can swap both the prekey server and the KT key
is not caught by KT alone.
3. **Log rotation requires a human in the loop.** Do not automate
key rotation for KT; the explicit breaking event is a feature.
4. **`maxStaleMs` should match your heartbeat.** The 24 h default tolerates
a heartbeat pause of up to a day; lower it to 1 to 4 hours if you have
strict freshness requirements.
5. **`observe-strict` should be the default once the ecosystem is established.**
The default `'observe'` is an operational transition mode, not an
end goal.
---
## Known limitations
- **Federation across multiple prekey servers** is not supported in V3.12.
Each Shade deployment has one log or none.
- **A sparse Merkle tree for the address index** is not used in V3.12;
absence proofs are currently adjacent-pair proofs. <100 KB at 100k addresses
is acceptable; a sparse tree becomes relevant from ~10M+ addresses.
- **One-time prekey rotation is not committed** to the log. OTPs are
ephemeral, and including them would fill the log with noise. This means
a server that answers with the right identity but the wrong OTP is not caught
by KT. The defense against this lies in the V3.3 fingerprint gate (same
identity) + the X3DH of session establishment (a wrong OTP yields a wrong shared
secret → the first message fails decryption).
---
## Tests and test vectors
- `packages/shade-key-transparency/tests/`: RFC 6962-compatible
Merkle log + STH + index proofs (58 tests).
- `packages/shade-server/tests/kt.test.ts`: server integration (8
tests).
- `packages/shade-transport/tests/kt-transport.test.ts`: client
verification over HTTP (4 tests).
- `packages/shade-transport/tests/kt-split-view-e2e.test.ts`:
V3.12 acceptance, split-view detection (3 tests).
- `packages/shade-sdk/tests/kt.test.ts`: SDK config + witness wiring
(3 tests).
76 tests in total are dedicated to KT.

193
docs/observability.md Normal file

@@ -0,0 +1,193 @@
# Observability v2 — OpenTelemetry tracing
Shade ships an opt-in OpenTelemetry layer that wraps `TransferEngine`,
`ShadeSessionManager`, the prekey HTTP routes, and `@shade/files`
op-handlers in distributed spans. The layer is **off by default** and
PII-safe by construction — span attributes never include peer addresses,
plaintext payloads, or exact byte counts.
This complements the always-on Prometheus metrics exposed by
`@shade/server` and the structural events emitted by `@shade/core`. Use
metrics for aggregate counters and histograms, tracing for per-request
causality and tail-latency hunting.
---
## Quick start
```ts
import { trace } from '@opentelemetry/api';
import { withTracer } from '@shade/observability';
import { createShade } from '@shade/sdk';
// Use the OTel SDK of your choice (NodeSDK + OTLP exporter, Honeycomb,
// Sentry's OTel adapter, …) to register a tracer provider on the
// `@opentelemetry/api` global. Then:
const tracer = trace.getTracer('my-app');
const shade = await createShade({
prekeyServer: 'https://shade.example.com',
storage: 'sqlite:/data/shade.db',
observability: withTracer(tracer, { sample: 0.1 }),
});
```
The hook propagates automatically to:
- `ShadeSessionManager.encrypt` / `.decrypt` (per-peer mutex acquisition,
ratchet step).
- `TransferEngine.upload` / accepted incoming downloads (lane count,
retry count, partition mode).
- `@shade/files` op-handlers (per request, with op + result).
For the prekey server pass the hook to `createPrekeyRoutes`:
```ts
import { createPrekeyRoutes } from '@shade/server';
import { withTracer } from '@shade/observability';
const app = createPrekeyRoutes(store, crypto, {
observability: withTracer(tracer),
});
```
---
## Off-by-default semantics
`withTracer()` returns a no-op hook — the SDK never starts spans — when
**any** of the following are true:
1. The `tracer` argument is `undefined`/`null`.
2. The `SHADE_OTEL_ENABLED` env-var is not set to `1` or `true`. Override
with `withTracer(tracer, { force: true })`, or override the var name
with `withTracer(tracer, { envVar: 'MY_VAR' })`.
3. The configured `sample` rate is `0`.
Per-span sampling (`sample: 0.1` = 10 %) keeps trace volume bounded in
production. Default is `1` (sample everything when the hook is active).
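The three off-switches can be restated as a small predicate. This is a sketch of the rules as described above, not the real `withTracer` implementation; the names `tracingActive` and `shouldSample`, the exact-match handling of `1`/`true`, and the random-draw sampling strategy are assumptions.

```ts
// Hypothetical restatement of the gate; option names follow the docs.
interface GateOptions {
  sample?: number;  // default 1 (sample everything)
  force?: boolean;  // bypass the env-var check
  envVar?: string;  // default 'SHADE_OTEL_ENABLED'
}

function tracingActive(
  tracer: unknown,
  opts: GateOptions = {},
  env: Record<string, string | undefined> = {}, // pass process.env in real code
): boolean {
  // Rule 1: no tracer, no spans.
  if (tracer === undefined || tracer === null) return false;
  // Rule 2: the env-var must be '1' or 'true' unless `force` is set.
  const raw = env[opts.envVar ?? 'SHADE_OTEL_ENABLED'];
  const enabled = raw === '1' || raw === 'true';
  if (!enabled && opts.force !== true) return false;
  // Rule 3: a sample rate of 0 disables the hook entirely.
  if ((opts.sample ?? 1) === 0) return false;
  return true;
}

// One plausible per-span sampling decision: keep about `sample` of all spans.
function shouldSample(sample: number, rand: () => number = Math.random): boolean {
  return rand() < sample;
}
```

Keeping the gate a pure function of `(tracer, options, env)` makes the off-by-default behaviour easy to cover in unit tests.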
---
## PII policy — what is safe to log, and what isn't
| Category | Status | Why |
|----------|--------|-----|
| **Peer hash** (`shade.peer.hash`) | ✅ allowed | 8-hex-char pseudonym derived via SHA-256. Stable across spans for a given address but does not expose the address itself. |
| **Bytes bin** (`shade.bytes.bin`) | ✅ allowed | One of `≤4KB`, `4–64KB`, `64KB–1MB`, `1–10MB`, `10–100MB`, `100MB–1GB`, `≥1GB`. Coarse enough to mask file-size fingerprinting. |
| **Lane count** (`shade.lane.count`) | ✅ allowed | Snapped to `{1, 4, 16, 64}`. |
| **Retry count** (`shade.retry.count`) | ✅ allowed | Integer. |
| **Error code** (`shade.error.code`) | ✅ allowed | `SHADE_*` stable string code — never the full message, which may interpolate user input. |
| **Op kind** (`shade.op`) | ✅ allowed | `list`, `read`, `write`, `custom:foo`, etc. |
| **Route template** (`shade.route`) | ✅ allowed | `/v1/keys/bundle/:address` — the template, never the resolved path. |
| **HTTP status** (`shade.http.status`) | ✅ allowed | Integer status code. |
| **Partition mode** (`shade.partition`) | ✅ allowed | `range` or `round-robin`. |
| **Direction** (`shade.direction`) | ✅ allowed | `upload` or `download`. |
| Plaintext peer addresses | ❌ forbidden | Use `peerHash()`. |
| Plaintext message/file payloads | ❌ forbidden | Encryption boundary — never log. |
| Exact byte counts | ❌ forbidden | Use `bytesBin()`. |
| User identifiers (email, DID, `device:UUID`) | ❌ forbidden | Treat as PII. |
The full attribute-key allow-list is exported from `@shade/observability`
as `ATTR_*` constants. Plug-in authors who want to attach their own tags
should pass each `(key, value)` through `safeAttribute()`, which throws
`UnsafeAttributeError` for any key/value pair that looks like the
forbidden categories above (heuristics: `@`, `device:`, `did:`, key
fragments such as `peer.address` / `bytes.exact`, oversized strings).
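To make the two pseudonymisation helpers concrete, here is a minimal sketch of how binning and peer hashing could work. It is not the code behind `bytesBin()` or `peerHash()`: the 1024-based units, the inclusive upper bounds, and the unsalted SHA-256 input are all assumptions, and the `*Sketch` names are hypothetical.

```ts
import { createHash } from 'node:crypto';

const KB = 1024;
const MB = 1024 * KB;
const GB = 1024 * MB;

// Maps an exact byte count to a coarse label. Boundaries are treated as
// inclusive upper bounds here; the real helper may draw them differently.
function bytesBinSketch(bytes: number): string {
  if (bytes <= 4 * KB) return '≤4KB';
  if (bytes <= 64 * KB) return '4–64KB';
  if (bytes <= MB) return '64KB–1MB';
  if (bytes <= 10 * MB) return '1–10MB';
  if (bytes <= 100 * MB) return '10–100MB';
  if (bytes < GB) return '100MB–1GB';
  return '≥1GB';
}

// 8-hex-char pseudonym: the first 32 bits of SHA-256 over the address.
// Stable for a given address, so spans can be correlated without the
// address itself appearing on the span.
function peerHashSketch(address: string): string {
  return createHash('sha256').update(address, 'utf8').digest('hex').slice(0, 8);
}
```

A span would then carry `bytesBinSketch(n)` and `peerHashSketch(addr)` in place of the raw values.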
---
## Span surface
### `shade.session.encrypt` / `shade.session.decrypt`
Wraps each per-peer `encrypt`/`decrypt` call. Includes the time spent
waiting on the per-peer mutex (`shade.lock.wait_ms`) — handy for
diagnosing ratchet contention under load.
### `shade.transfer.upload` / `shade.transfer.upload.resume`
Wraps an outbound stream transfer end-to-end. Attributes: `peer.hash`,
`bytes.bin`, `lane.count`, `partition`, `retry.count`, `result`,
`error.code`.
### `shade.transfer.download`
Started when the consumer calls `incoming.accept(...)`, ended when the
transfer completes, aborts, or fails an integrity check. Same attribute
set as upload.
### `shade.prekey.request`
One span per HTTP request handled by `@shade/server`'s prekey routes.
Attributes: `route` (the template), `http.status`, `error.code` on
failure. The address path-parameter is **never** placed on the span.
### `shade.files.op`
One span per `@shade/files` RPC. Attributes: `peer.hash`, `op` (the
resolved op kind, e.g. `read` or `custom:foo`), `bytes.bin` (estimated
plaintext size, binned), `result`, `error.code`.
---
## Recording & testing
`@shade/observability` ships a deterministic in-memory recorder for
unit tests:
```ts
import { createRecorder } from '@shade/observability';
const rec = createRecorder();
const shade = await createShade({ ..., observability: rec });
// … exercise code under test …
const hits = rec.scanForPII(['alice@example.com', 'plaintext-secret']);
expect(hits).toHaveLength(0);
```
The Shade test suite runs this recorder over every documented entry
point — see
`packages/shade-observability/tests/integration-pii.test.ts` and
`packages/shade-transfer/tests/observability.test.ts`. Any new
instrumentation must keep the suite green.
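The idea behind a PII scan over recorded spans can be sketched in a few lines. The `RecordedSpan` shape and the substring-matching strategy below are assumptions made for illustration; the recorder returned by `createRecorder()` may store and scan spans differently.

```ts
// Hypothetical shapes for this sketch only.
interface RecordedSpan {
  name: string;
  attributes: Record<string, string | number | boolean>;
}

interface PiiHit {
  span: string;
  key: string;
  needle: string;
}

// Reports every (span, attribute) pair whose name, key, or value contains
// one of the known-sensitive strings used by the test.
function scanSpansForPII(
  spans: readonly RecordedSpan[],
  needles: readonly string[],
): PiiHit[] {
  const hits: PiiHit[] = [];
  for (const span of spans) {
    for (const needle of needles) {
      if (span.name.includes(needle)) {
        hits.push({ span: span.name, key: '(span name)', needle });
      }
      for (const [key, value] of Object.entries(span.attributes)) {
        if (key.includes(needle) || String(value).includes(needle)) {
          hits.push({ span: span.name, key, needle });
        }
      }
    }
  }
  return hits;
}
```

A test seeds the code under test with recognisable strings (an address, a plaintext marker) and asserts that the scan returns no hits.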
---
## Performance characteristics
- With OTel **off** (default): every Shade hook resolves to the shared
`NOOP_HOOK` instance. The cost is one function call + an object
allocation that V8 hoists out in the steady state — measured at
< 1 % overhead vs the pre-V3.4 baseline in the upload roundtrip
benchmark.
- With OTel **on**: cost depends entirely on the configured exporter.
Use `sample: 0.1` (or smaller) on hot paths in production.
---
## Adding new instrumentation
1. Identify a logical operation worth a span — typically anything that
crosses a network/disk boundary or contends on a lock.
2. Add an `observability?: ObservabilityHook` to the relevant config
surface, default to `NOOP_HOOK`.
3. Name the span `shade.<area>.<op>` to keep cardinality bounded.
4. Set attributes via the `ATTR_*` constants from
`@shade/observability`. **Never** introduce a new attribute key
without a PII review — if you must, run the value through
`safeAttribute()`.
5. Add a test that exercises the new instrumentation under the
`createRecorder()` recorder and asserts no PII leaks.
---
## Migration
Previous versions had no tracing, only Prometheus metrics. Adding the
`observability` field to existing configs is fully backwards-compatible
and never required. Because of the `SHADE_OTEL_ENABLED` gate, a config
that wires in the hook stays a no-op until the env-var is set, so
tracing overhead never appears in production by accident.

308
docs/recovery.md Normal file

@@ -0,0 +1,308 @@
# Social Key Recovery (`@shade/recovery`)
V3.10 closes the biggest UX hole in any E2EE system: **"What happens
if I lose my phone?"**. Shade's social-recovery flow lets a user
designate `n` guardians (family / friends / co-workers) at setup time
such that any threshold-many `k` of them can together restore the
user's identity onto a new device — without any single guardian
being able to do it alone, and without the prekey server ever seeing
the recovered key material.
The whole flow ships entirely over existing 1:1 Shade sessions; no
server-side recovery agent, no escrow service, no "cloud guardian".
---
## Threat model recap
| # | Adversary | Recovered? |
|---|-----------|------------|
| 1 | Coalition of ≤ k-1 guardians | **No** (information-theoretic, by Shamir construction) |
| 2 | Prekey server alone | **No** (server only relays Double-Ratchet ciphertext) |
| 3 | Single malicious guardian who forges a share | **Detected** — AES-GCM tag mismatch on the backup blob; `requestRecovery` exhaustively tries threshold-sized subsets and rejects when none authenticate |
| 4 | Social engineering (impersonator calls a guardian) | **Mitigated, not eliminated** — guardians MUST OOB-confirm the new device's safety number before approving (see `<RecoveryApprove />`) |
| 5 | Compromised guardian device | **Out of scope** — see "Guardian compromise" below |
| 6 | Compromised primary device at setup time | **Out of scope**: recovery only protects against losing the device; if setup material is exfiltrated, all bets are off |
---
## Setup
### What the user does
1. Pick `n` guardians from their existing peers.
2. Pick a threshold `k` (typically `⌈n/2⌉ + 1` to avoid pure-majority
dominance but still survive losing one or two).
3. Run `setupRecovery(...)`.
4. Print / record a **recovery card** with:
- The user's own address
- `setupId`
- `k` and `n`
- The list of guardian addresses
- Setup-time safety number
The recovery card is the only piece of state the user must remember
out-of-band (or store in a password manager). Without it, the user
cannot drive recovery on a new device — the new device needs to know
who the guardians are.
### What happens cryptographically
```text
recoveryKey = random(32 bytes)
backupBlob = Shade.exportBackup(passphrase = "shade-rk:" + base64url(recoveryKey),
knownAddresses = [...])
shares[1..n] = Shamir-split(recoveryKey, k, n)
```
For each guardian `i`:
```text
share-deposit envelope:
shadeRecovery: 1
type: "share-deposit"
flowId, setupId, originalAddress
threshold (k), guardianCount (n), shareIndex (i)
shareBytes: base64url( encodeShare(shares[i]) )
backupBlob: Shade.exportBackup output (identical for every guardian)
setupFingerprint, createdAt
```
The envelope rides through `Shade.send` like any other plaintext —
double-ratchet encrypted, AAD-bound, replay-safe.
The `recoveryKey` is **zeroized** on the primary device immediately
after the split returns. The primary therefore retains nothing
except `setupId` and the public roster.
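For readers unfamiliar with the split step, the following is a textbook Shamir sketch over GF(2^8), processing the key byte by byte. It illustrates the technique only: the field, the share encoding, and the names `shamirSplit` / `shamirCombine` are assumptions, and `@shade/recovery` may use a different construction internally.

```ts
import { randomBytes } from 'node:crypto';

// GF(2^8) multiplication with the AES reduction polynomial (0x11b).
function gfMul(a: number, b: number): number {
  let product = 0;
  for (let i = 0; i < 8; i++) {
    if (b & 1) product ^= a;
    const carry = a & 0x80;
    a = (a << 1) & 0xff;
    if (carry) a ^= 0x1b;
    b >>= 1;
  }
  return product;
}

// Multiplicative inverse: a^254, since a^255 = 1 for every non-zero a.
function gfInv(a: number): number {
  if (a === 0) throw new RangeError('zero has no inverse');
  let result = 1;
  for (let i = 0; i < 254; i++) result = gfMul(result, a);
  return result;
}

interface Share {
  x: number;      // share index, 1..n
  y: Uint8Array;  // one field element per secret byte
}

function shamirSplit(secret: Uint8Array, k: number, n: number): Share[] {
  if (k < 2 || k > n || n > 255) throw new RangeError('need 2 <= k <= n <= 255');
  const shares: Share[] = Array.from({ length: n }, (_, i) => ({
    x: i + 1,
    y: new Uint8Array(secret.length),
  }));
  for (let b = 0; b < secret.length; b++) {
    // Random polynomial of degree k-1 whose constant term is the secret byte.
    const coeffs = [secret[b], ...randomBytes(k - 1)];
    for (const share of shares) {
      let acc = 0; // Horner evaluation at x = share.x
      for (let j = coeffs.length - 1; j >= 0; j--) {
        acc = gfMul(acc, share.x) ^ coeffs[j];
      }
      share.y[b] = acc;
    }
  }
  return shares;
}

// Lagrange interpolation at x = 0; subtraction is XOR in GF(2^8).
function shamirCombine(shares: readonly Share[]): Uint8Array {
  const out = new Uint8Array(shares[0].y.length);
  for (let b = 0; b < out.length; b++) {
    let acc = 0;
    for (const si of shares) {
      let num = 1;
      let den = 1;
      for (const sj of shares) {
        if (sj === si) continue;
        num = gfMul(num, sj.x);
        den = gfMul(den, sj.x ^ si.x);
      }
      acc ^= gfMul(si.y[b], gfMul(num, gfInv(den)));
    }
    out[b] = acc;
  }
  return out;
}
```

Any `k` shares reproduce the secret exactly, while `k-1` shares are consistent with every possible secret, which is the information-theoretic property the threat model relies on.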
### What each guardian stores
Per (`originalAddress`, `setupId`):
```text
{
shareIndex, // 1..n
shareBytes, // base64url-encoded Shamir share
backupBlob, // identical for every guardian
setupFingerprint, // for sanity-checks at recovery time
guardianCount, threshold,
receivedAt
}
```
The guardian's app provides a `RecoveryStore` implementation. The
package ships `MemoryRecoveryStore` for tests and small one-shot
demos; production guardian apps MUST supply a persistent store
(IndexedDB, AsyncStorage, SQLite, etc.). See "Persistence
recommendations" below.
---
## Recovery
### What the user does on the new device
1. Boot a fresh Shade with a temporary identity.
2. Read the recovery card.
3. In the recovery widget, type / paste:
- `originalAddress`
- `setupId`
- `threshold`
- The guardian roster
4. Read the new device's safety number (the widget displays it
prominently) to each guardian over a side channel — phone call,
in person, whatever they trust.
5. Wait for `≥ k` guardians to approve.
### What happens cryptographically
For each guardian, the new device sends:
```text
recovery-request envelope:
shadeRecovery: 1
type: "recovery-request"
flowId, originalAddress, setupId
requesterFingerprint (= safety number of the temporary identity)
requestedAt
```
Each guardian's `attachGuardian` handler:
1. Looks up its stored deposit by `(originalAddress, setupId)`. If
missing, replies with `share-decline` (`reason = "unknown setup"`).
2. Invokes the `approve` callback with the requester's address +
fingerprint + the original device's setup-time fingerprint. The
callback is the **OOB-confirmation gate** — it MUST require an
explicit user click after they verified the fingerprint. The
`<RecoveryApprove />` widget enforces this with a two-checkbox
gate.
3. On approve → ships `share-grant`. On reject → ships
`share-decline` with a short reason.
The new device collects grants, and as soon as `k` arrive:
1. Combines the `k` shares via Lagrange interpolation at `x = 0` to
reconstruct `recoveryKey`.
2. Derives `passphrase = "shade-rk:" + base64url(recoveryKey)`.
3. Calls `Shade.importBackup(backupBlob, passphrase)` — the
AES-GCM tag in the blob authenticates the reconstruction. **A
forged share is detected here.**
4. If a guardian forged a share, `importBackup` throws. The
reconstruction loop then tries every other threshold-sized subset
of the grants received until one authenticates. The V3.10
acceptance criterion "no coalition of (k-1) guardians can rebuild
the secret" is the safety invariant; the AEAD tag identifies which
subset is honest.
5. If every subset fails, `RecoveryReconstructionError` is raised
and the user is told that at least one guardian is malicious.
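The subset search in steps 4 and 5 can be sketched as a loop over all `k`-sized combinations of the collected grants. The helper names, the injected `combine` and `tryImport` callbacks, and the generic error thrown at the end are assumptions for illustration; only the `"shade-rk:" + base64url(recoveryKey)` passphrase format is taken from the text.

```ts
import { Buffer } from 'node:buffer';

// Yields every k-element subset of `items`, in index order.
function* kSubsets<T>(
  items: readonly T[],
  k: number,
  start = 0,
  chosen: T[] = [],
): Generator<T[]> {
  if (chosen.length === k) {
    yield [...chosen];
    return;
  }
  for (let i = start; i <= items.length - (k - chosen.length); i++) {
    chosen.push(items[i]);
    yield* kSubsets(items, k, i + 1, chosen);
    chosen.pop();
  }
}

// Tries subsets until one reconstructs a key that authenticates the backup.
// `combine` stands in for Shamir reconstruction, and `tryImport` for an
// import call that rejects when the AEAD tag does not verify.
async function reconstructFromGrants<S>(
  grants: readonly S[],
  k: number,
  combine: (shares: S[]) => Uint8Array,
  tryImport: (passphrase: string) => Promise<void>,
): Promise<S[]> {
  for (const subset of kSubsets(grants, k)) {
    const key = combine(subset);
    const passphrase = 'shade-rk:' + Buffer.from(key).toString('base64url');
    try {
      await tryImport(passphrase);
      return subset; // this subset contained only honest shares
    } catch {
      // Tag mismatch: at least one share in this subset is bad; keep going.
    } finally {
      key.fill(0); // best-effort wipe of the candidate key
    }
  }
  throw new Error('no threshold-sized subset authenticated');
}
```

The number of candidate subsets is C(m, k) for `m` grants, which stays small for the guardian counts discussed here (10 subsets for 3-of-5).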
After `importBackup` succeeds, the new device hosts the original
identity and immediately calls `Shade.rotate()` to retire the
recovered key material from the conversation graph (the old session
keys persisted in the backup blob are now considered compromised
because they were used for recovery).
> **The `Shade.beforeBackupImport` gate fires automatically.**
> Without a registered handler the SDK falls back to TOFU-with-warning
> (consistent with the V3.3 contract). Production apps SHOULD register
> a handler that asks the user for one more confirmation before the
> identity rotates.
---
## Acceptance criteria status
- [x] **3-of-5 recovery works end-to-end on two separate Shade
instances.** See `tests/integration.test.ts`.
- [x] **No coalition of (k-1) guardians can reconstruct
`recoveryKey`.** Property test asserts this with `fast-check`
across random k/n configurations.
See `tests/shamir.test.ts` and
`tests/adversarial.test.ts`.
- [x] **Guardian-side widget requires fingerprint-confirmation
before sending.** `<RecoveryApprove />` enforces a
two-checkbox gate; `tests/adversarial.test.ts` exercises
both the matching-OOB and rejecting-OOB code paths.
---
## Persistence recommendations
The `RecoveryStore` interface is intentionally small (4 methods).
Pick the implementation that fits your platform:
| Platform | Suggested backing store |
|--------------------------|----------------------------------------|
| Browser (PWA) | IndexedDB (one object store, idb) |
| Browser (extension) | `chrome.storage.local` |
| React Native | AsyncStorage (with crypto-protected blob) |
| Bun / Node server | SQLite via `@shade/storage-sqlite` extension table OR a side file |
| Android (native) | Room / EncryptedSharedPreferences |
Whatever you pick, the records ARE NOT secret on their own — without
threshold-many other guardians' shares they're useless — but they
should still be stored encrypted-at-rest like any other Shade state.
Do not commit them to plaintext logs or network-replicated state.
---
## Guardian-UX guide
### How many guardians?
| n, k | Survives | Comment |
|---|----------|---------|
| 3, k=2 | 1 lost guardian | Minimum useful — one device away from danger |
| 5, k=3 | 2 lost guardians | Sweet spot for most users |
| 7, k=4 | 3 lost guardians | Suitable when you genuinely have 7+ trustworthy people |
| n=k | 0 lost | DO NOT USE — single point of failure |
The widget defaults to `k = ⌈n/2⌉` which is liberal but
collusion-resistant for `n ≥ 3`. Apps targeting paranoid users may
want to bump that to `⌈2n/3⌉`.
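The arithmetic behind the table and the two defaults is small enough to write out. The function names are hypothetical; the formulas are the ones quoted above (`⌈n/2⌉` and `⌈2n/3⌉`), and the number of guardians that can be lost is simply `n - k`.

```ts
// Default threshold used by the widget, per the text above.
function defaultThreshold(n: number): number {
  return Math.ceil(n / 2);
}

// Stricter threshold suggested for paranoid users.
function paranoidThreshold(n: number): number {
  return Math.ceil((2 * n) / 3);
}

// How many guardians may become unreachable before recovery is impossible.
function tolerableLosses(n: number, k: number): number {
  if (k < 1 || k > n) throw new RangeError('need 1 <= k <= n');
  return n - k;
}
```

For `n = 5` this gives `k = 3` with two tolerable losses by default, or `k = 4` with one tolerable loss under the stricter rule.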
### Replacing a guardian
If a guardian dies, loses their device permanently, or you no longer
trust them:
1. Pick a replacement.
2. Run `setupRecovery` again with the new roster — this generates a
fresh `setupId` and a fresh `recoveryKey`. The old shares become
garbage (no guardian set can use them, because the
`backupBlob` is different).
The widget records the new `setupId` on the recovery card. Treat
this as a hard rotation; the user MUST re-record the card.
### Guardian health checks
Periodically (the V3.10 plan suggests a quarterly prompt), the user
should confirm each guardian is still reachable. Any guardian who
can't be reached in two consecutive prompts SHOULD trigger a
re-setup with a fresh roster. The widget UX track is to be added in
a follow-up release; the primitive is in place.
---
## Wiring example
```ts
import {
setupRecovery,
attachGuardian,
requestRecovery,
MemoryRecoveryStore,
} from '@shade/recovery';
// On the primary device:
const result = await setupRecovery({
shade,
guardians: ['bob', 'carol', 'dan', 'eve', 'faythe'],
threshold: 3,
deliver: async (to, envelope) => {
// wire to your app's existing message-delivery layer
await myMessageOutbox.send(to, envelope);
},
});
console.log(result.setupId);
// On each guardian device:
const stop = attachGuardian({
shade,
store: myPersistentStore, // see "Persistence" above
approve: async (ctx) => {
// Show ctx.requesterFingerprint to the user.
// Block until they confirm OOB and click "Release share".
return await myUI.askApproval(ctx);
},
deliver: myMessageOutbox.send,
});
// On the new device:
const recovered = await requestRecovery({
shade: temporaryShade, // fresh identity for now
originalAddress: 'alice',
setupId: 'sid-from-recovery-card',
threshold: 3,
guardians: ['bob', 'carol', 'dan', 'eve', 'faythe'],
deliver: myMessageOutbox.send,
onProgress: (p) => myUI.showProgress(p),
});
// `temporaryShade` now hosts the original identity.
```
---
## Out of scope (V3.10)
- **Cloud guardian / Shade-operated recovery agent.** Explicit
non-goal; the spec rejects any centralized component that can
recover on its own.
- **Auto-distribution.** The user must explicitly pick guardians.
- **Multi-share-per-guardian.** Each guardian holds exactly one
share. Apps that need redundancy should bump `n`, not give the
same guardian multiple shares.
- **Guardian ZK-proofs of liveness.** A guardian who refuses to
respond is treated as offline; we don't try to compel them.

160
docs/storage-encryption.md Normal file

@@ -0,0 +1,160 @@
# At-Rest Storage Encryption (V3.2)
**Status:** Implemented in `@shade/storage-encrypted` 0.4.0
**Addresses:** THREAT-MODEL §4 (Compromised device storage)
Shade's default `SQLiteStorage` and `PostgresStorage` write private keys and
session state to disk *unencrypted* — the threat model assumes the DB lives
inside a trusted environment. For deployments that need defence in depth,
`@shade/storage-encrypted` adds opt-in at-rest encryption: a stolen DB file
alone yields no usable private key material.
## At a glance
```ts
import { KeyManager, EncryptedSQLiteStorage } from '@shade/storage-encrypted';
const km = await KeyManager.open({
kind: 'passphrase',
passphrase: process.env.SHADE_STORAGE_PASSPHRASE!,
salt: loadSaltFromDisk(), // 16+ bytes, persisted alongside the DB
});
const storage = await EncryptedSQLiteStorage.open({
dbPath: '/data/shade-client.db',
keyManager: km,
});
// Use it exactly like SQLiteStorage — implements the same StorageProvider.
const manager = new ShadeSessionManager(crypto, storage);
```
## What is encrypted
Per-row AEAD over the sensitive payload of every row:
| Table | Encrypted |
|--------------------------------|-----------|
| `identity_enc` | the entire keypair (4× 32-byte keys) |
| `config_enc` | `registrationId` |
| `signed_prekeys_enc` | full `SignedPreKey` (incl. private half) |
| `one_time_prekeys_enc` | full `OneTimePreKey` |
| `sessions_enc` | the Double-Ratchet `SessionState` JSON |
| `trusted_identities_enc` | the trusted peer identity key |
| `retired_identities_enc` | full retired keypair |
| `stream_state_enc.ciphertext` | partition / lane / IO descriptor / streamSecret |
Routing fields on `stream_state_enc` (`stream_id`, `direction`,
`peer_address`, `status`, timestamps) stay plaintext so `listActiveStreamStates()`
remains an indexed query.
## Cryptographic design
```
masterKey (passphrase / keychain / app-injected)
├─ HKDF-SHA-256("shade-storage-v1") → storageKey (32 bytes)
│ └─ HKDF-SHA-256(storageKey, "shade-field-v1:{table}:{column}") → fieldKey (32 bytes)
└─ Used (transitively) for fingerprint checks
```
For each encrypted blob:
- `nonce = HKDF(fieldKey, "shade-row-nonce-v1:{table}:{pk}")[..12]`:
deterministic per (key, row), safe because the per-(table, column)
fieldKey is unique. AES-GCM nonce reuse is catastrophic only if the
*same* key is reused with the *same* nonce on different plaintexts;
here every (key, row) pair has a unique nonce.
- `aad = "shade-aad-v1|{table}|{column}|{pk}"` — binds the ciphertext
to its row identity so a row swap or column move triggers decrypt
failure.
- `wire = nonce(12) || ciphertext || tag(16)` — stored as a single
`BLOB`/`BYTEA` column.
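The AAD string and the on-disk framing described above can be sketched without any crypto calls. The helper names are hypothetical, and how a non-string primary key is rendered into `{pk}` is an assumption (this sketch takes it as an already-formatted string).

```ts
const NONCE_LEN = 12;
const TAG_LEN = 16;

// Builds the associated data that binds a ciphertext to its row identity.
function buildAad(table: string, column: string, pk: string): Uint8Array {
  return new TextEncoder().encode(`shade-aad-v1|${table}|${column}|${pk}`);
}

interface SealedParts {
  nonce: Uint8Array;
  ciphertext: Uint8Array;
  tag: Uint8Array;
}

// wire = nonce(12) || ciphertext || tag(16)
function joinWire(parts: SealedParts): Uint8Array {
  if (parts.nonce.length !== NONCE_LEN || parts.tag.length !== TAG_LEN) {
    throw new RangeError('unexpected nonce or tag length');
  }
  const wire = new Uint8Array(NONCE_LEN + parts.ciphertext.length + TAG_LEN);
  wire.set(parts.nonce, 0);
  wire.set(parts.ciphertext, NONCE_LEN);
  wire.set(parts.tag, NONCE_LEN + parts.ciphertext.length);
  return wire;
}

function splitWire(wire: Uint8Array): SealedParts {
  if (wire.length < NONCE_LEN + TAG_LEN) throw new RangeError('blob too short');
  return {
    nonce: wire.subarray(0, NONCE_LEN),
    ciphertext: wire.subarray(NONCE_LEN, wire.length - TAG_LEN),
    tag: wire.subarray(wire.length - TAG_LEN),
  };
}
```

Because the AAD includes table, column, and key, a blob copied into another row decrypts under a different AAD and fails the tag check.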
## Key sources
`KeyManager.open(...)` accepts three sources:
1. **Passphrase + KDF** — scrypt over `(passphrase, salt)`. Default
parameters: `N=2^17, r=8, p=1, dkLen=32` (~250 ms on a modern laptop).
The salt MUST be persisted alongside the DB (e.g. `<db>.salt`).
2. **OS keychain** — via `@shade/keychain`. Backends:
- macOS: `security` CLI (Keychain).
- Linux: `secret-tool` (libsecret).
- Windows: PowerShell + `CredentialManager` module.
No native deps; `createIfMissing: true` generates and stores a fresh
32-byte key.
3. **App-injected** — caller supplies a 32-byte raw key. Most flexible;
plug your own KMS / HSM / Vault path here.
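For the passphrase source, the quoted scrypt parameters translate to Node's `scryptSync` roughly as follows. This is a sketch of the KDF call only, not `KeyManager`'s code; the function name is hypothetical, and the detail worth noticing is `maxmem`, since `N=2^17, r=8` needs about 128 MiB, well above Node's 32 MiB default.

```ts
import { scryptSync } from 'node:crypto';

// Derives a 32-byte master key from a passphrase and a persisted salt.
function deriveMasterKeySketch(
  passphrase: string,
  salt: Uint8Array,
  N: number = 2 ** 17,
  r: number = 8,
  p: number = 1,
): Uint8Array {
  if (salt.length < 16) throw new RangeError('salt must be at least 16 bytes');
  // scrypt needs roughly 128 * N * r bytes; allow twice that as headroom.
  const maxmem = 2 * 128 * N * r;
  return scryptSync(passphrase, salt, 32, { N, r, p, maxmem });
}
```

The same passphrase with a different salt yields an unrelated key, which is why losing the salt file is equivalent to losing the key.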
Wrong-passphrase detection is built in: a fingerprint of the storageKey
is persisted in `shade_meta_enc` on first open and compared on every
subsequent open. A mismatch raises a clear error rather than silently
writing under the wrong key.
## Migration
CLI:
```bash
# Encrypt an existing unencrypted DB (atomic per row, .bak written first).
shade migrate-storage \
--key-source passphrase \
--passphrase "$SHADE_STORAGE_PASSPHRASE" \
--salt-file /data/shade-client.db.salt
# Validate without writing.
shade migrate-storage ... --dry-run
# Keychain mode.
shade migrate-storage --key-source keychain \
--keychain-service shade.storage --keychain-account default
# Inject a raw key (e.g. from your KMS).
shade migrate-storage --key-source injected \
--key-hex "$(cat ~/.shade/storage.key.hex)"
```
The migration is *resumable*: re-running it on a partially-migrated DB
re-writes the same rows under the same key (idempotent). On clean
completion, the unencrypted tables are dropped (use `--keep-original`
to preserve them).
## Rotation
```bash
shade rotate-storage-key \
--key-source passphrase --passphrase "$OLD_PASS" \
--new-key-source passphrase --new-passphrase "$NEW_PASS" \
--new-salt-file /data/shade-client.db.salt.new
```
Reads each encrypted row under the old key, re-seals under the new key.
The DB stays online; brief read-after-write inconsistency for in-flight
readers is acceptable for the supported deployments (CLI tools,
single-process servers). On completion the fingerprint is updated and
the old key no longer opens the DB.
## What this does *not* protect
Even with at-rest enabled:
- A live process holds the storageKey and fieldKeys in memory. An attacker
who can dump process memory (`/proc/<pid>/mem`, swap, hibernation,
coredump) recovers the keys.
- Swap is not encrypted by Shade. Use an encrypted swap device.
- The `.bak` file produced during migration is plaintext during the
migration window. Treat it like the original DB and store securely.
- Lost master key = lost DB. V3.10 (Social Recovery) is the long-term
mitigation.
See `THREAT-MODEL.md` §4 for the full list, including the "with at-rest
enabled" boundary.
## Cross-implementation parity
`test-vectors/storage-encryption.json` pins KDF parameters, info strings,
nonce derivation, and AAD format. The Android implementation (V3.5) MUST
produce byte-identical outputs for the same inputs — covered by
`packages/shade-storage-encrypted/tests/test-vectors.test.ts`.


@@ -107,11 +107,264 @@ manually after rotation.
| S7 | seq overflow practically impossible (u64 max) |
| S8 | At-rest streamSecret encrypted under device-key |
## Hardening
`@shade/streams` ships unbounded by default — a peer can declare a
1 PiB transfer and the receiver will dutifully allocate lane state for
it. Production receivers must enforce limits at the boundary. The
`@shade/files` package wires the same patterns up for its filesystem
RPC; copy the shapes that fit your app.
### Per-stream caps
The receiver sees the declared plaintext size in the `stream-init`
control message before it accepts. Reject above your tolerance:
```ts
shade.onIncomingTransfer(async (incoming) => {
if (incoming.metadata.totalBytes > 256 * 1024 * 1024) {
await incoming.decline({ reason: 'stream too large' });
return;
}
await incoming.accept({ output: ... });
});
```
Recommended ceilings (tune to your product, not these):
| Tier | totalBytes ceiling | Rationale |
|------|--------------------|-----------|
| Chat attachment | 25 MiB | matches mobile MMS / Slack expectations |
| Photo / doc share | 256 MiB | covers RAW photos + most desktop docs |
| Backup / dataset | 4 GiB | larger needs explicit operator opt-in |
### Per-chunk cap
`createTransferRoutes` accepts `maxChunkBytes` (default ≈ 16 MiB +
header). Lower it if your sink can't absorb that — the receiver will
413 anything over the limit before the chunk is decrypted, which
keeps DoS cost bounded.
### Per-sender quotas
`@shade/files` ships a `RateLimiter` (`packages/shade-files/src/server/rate-limiter.ts`)
that enforces both ops-per-window and bytes-per-hour caps per sender
address. The same shape is the recommended template for guarding raw
streams: wrap `incoming.accept` in a check that consumes from a token
bucket keyed by `incoming.fromAddress`, and reject with `decline()`
when the bucket is empty. See
`packages/shade-files/tests/security/quota.test.ts` for the test
shape.
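A token bucket keyed by sender address could look like the sketch below. It is not the `RateLimiter` from `@shade/files`; the class name, constructor parameters, and the injectable clock are assumptions chosen to keep the example testable.

```ts
interface Bucket {
  tokens: number;
  updatedAt: number;
}

// Hypothetical per-sender token bucket; tokens can represent bytes or ops.
class SenderTokenBucket {
  private readonly buckets = new Map<string, Bucket>();
  private readonly capacity: number;
  private readonly refillPerMs: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerMs: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerMs = refillPerMs;
    this.now = now;
  }

  // Returns true and deducts `cost` if the sender has enough tokens left.
  tryConsume(sender: string, cost: number): boolean {
    const t = this.now();
    const bucket = this.buckets.get(sender) ?? { tokens: this.capacity, updatedAt: t };
    // Refill for the elapsed time, never above capacity.
    bucket.tokens = Math.min(
      this.capacity,
      bucket.tokens + (t - bucket.updatedAt) * this.refillPerMs,
    );
    bucket.updatedAt = t;
    this.buckets.set(sender, bucket);
    if (cost > bucket.tokens) return false;
    bucket.tokens -= cost;
    return true;
  }
}

// Intended use inside the handler shown earlier (not executed here):
//   if (!quota.tryConsume(incoming.fromAddress, incoming.metadata.totalBytes)) {
//     await incoming.decline({ reason: 'quota exceeded' });
//     return;
//   }
```

Charging the declared `totalBytes` up front means a sender pays for a transfer before any lane state is allocated.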
### TTL on idle streams
A `paused` stream-state record consumes a row in your storage and an
encrypted streamSecret slot until it expires. Use the **Retention**
defaults below to expire abandoned streams; pair with a metric
(`shade_stream_states_active`) and an alert when the count grows
unbounded. A peer that opens streams and never finishes them is the
dominant abuse pattern for resumable transfer.
### Trust gates
For high-stakes transfers (backups, key material, internal docs),
gate `accept()` on a verified fingerprint. The pattern mirrors
`@shade/files`'s fingerprint gate — see
`packages/shade-files/tests/security/fingerprint-gate.test.ts`.
## Retention
Resumable streams persist a `PersistedStreamState` per in-flight
transfer, encrypted under a device key. Without retention, every
crashed or abandoned upload leaves a row behind forever.
### Defaults
The shipped `bun-server` SDK template (`shade init --template bun-server`)
schedules `pruneStreamStates` on a daily cron with a **14-day**
horizon. That is: any stream-state record whose `updatedAt` is older
than 14 days is removed at the next sweep. If a sender resumes a
14-day-old stream, it will get a "no state" 404 and start over —
which is the right answer for a transfer that has been idle for two
weeks.
### Tuning the horizon
Set `SHADE_STREAM_RETENTION_DAYS` in the template's environment to
override the 14-day default. Recommended ranges:
| Use case | Horizon | Why |
|----------|---------|-----|
| Synchronous chat | 1–3 days | resume-after-crash, not resume-after-vacation |
| File-share product | 7–14 days | covers a typical user vacation |
| Cold backup target | 30+ days | deliberate, but plan for storage growth |
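Turning the environment variable into a prune cutoff is a one-liner plus validation. How the `bun-server` template actually parses `SHADE_STREAM_RETENTION_DAYS` is not shown in this document, so the function names and the choice to reject non-positive or non-numeric values are assumptions.

```ts
const ONE_DAY_MS = 24 * 60 * 60 * 1000;
const DEFAULT_RETENTION_DAYS = 14; // default horizon from the docs

// Reads the horizon from an env-like object, falling back to 14 days.
function retentionHorizonMs(env: Record<string, string | undefined>): number {
  const raw = env.SHADE_STREAM_RETENTION_DAYS;
  if (raw === undefined || raw.trim() === '') {
    return DEFAULT_RETENTION_DAYS * ONE_DAY_MS;
  }
  const days = Number(raw);
  if (!Number.isFinite(days) || days <= 0) {
    throw new RangeError(`invalid SHADE_STREAM_RETENTION_DAYS: ${raw}`);
  }
  return days * ONE_DAY_MS;
}

// Timestamp to pass as `olderThan`: records updated before it are pruned.
function pruneCutoff(nowMs: number, env: Record<string, string | undefined>): number {
  return nowMs - retentionHorizonMs(env);
}
```

Failing loudly on a malformed value avoids silently pruning with an unintended horizon.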
### Hooking the prune call manually
If you bring your own server (no `bun-server` template), call the
storage method on your own schedule:
```ts
import { setInterval } from 'node:timers';
const ONE_DAY_MS = 24 * 60 * 60 * 1000;
const HORIZON_MS = 14 * ONE_DAY_MS;
setInterval(async () => {
if (storage.pruneStreamStates !== undefined) {
await storage.pruneStreamStates(Date.now() - HORIZON_MS);
}
}, ONE_DAY_MS);
```
`pruneStreamStates(olderThan)` removes records whose `updatedAt` is
strictly less than `olderThan`. It is idempotent and safe to call
concurrently.
## Rich file metadata + previews (V3.9)
`stream-init` plaintext can carry an optional `fileMetadata` field that
ships filename, MIME-type, and a thumbnail-stream pointer **end-to-end
encrypted**. Older receivers ignore the field — backwards-compatible
with 0.2.x / 0.3.x peers.
```jsonc
{
"kind": "shade.stream-init/v1",
"streamId": "...",
"streamSecret": "...",
"metadata": {
"chunkSize": 1048576,
"sentAt": 1730000000000,
"fileMetadata": {
"filename": "report.pdf",
"mimeType": "application/pdf",
"thumbnailStreamId": "Ej1z...",
"thumbnailHash": "9a7c...",
"thumbnailMime": "image/webp",
"thumbnailBytes": 18342
}
},
"lanes": [ /* ... */ ]
}
```
### What rides where
| Field | Plane | Visible to server? |
|-------|-------|--------------------|
| `filename` | inside Double Ratchet plaintext | no |
| `mimeType` | inside Double Ratchet plaintext | no |
| `thumbnailStreamId` | streamId of companion stream | yes (random ID, no info leak) |
| `thumbnailHash` | sha256 of preview plaintext | base64 hash only, no pixels |
| `thumbnailMime` | one of `image/jpeg / image/webp / image/png` | yes (allowlist enforced) |
| `thumbnailBytes` | declared length, capped at 64 KiB | yes |
| thumbnail bytes themselves | separate AEAD stream, own lane | no |
The thumbnail rides as its **own stream-transfer**, keyed independently
from the main stream. A server compromise leaks neither preview pixels
nor original bytes.
### Sender — attach a preview
```ts
// Pre-computed preview (server-side pipeline path):
await shade.upload({
to: 'bob',
input: pdfBytes,
thumbnail: { bytes: previewWebp, mime: 'image/webp' },
metadata: { fileMetadata: { filename: 'report.pdf', mimeType: 'application/pdf' } },
});
// Browser auto-generation (image File / Blob → 256×256 preview):
await shade.upload({
to: 'bob',
input: imageFile, // a `File` from <input type="file">
generateThumbnail: true, // OffscreenCanvas + createImageBitmap
});
```
`generateThumbnail` is a no-op on runtimes lacking
`OffscreenCanvas + createImageBitmap` (Bun, Node) — those callers should
pre-generate and pass `thumbnail` directly, or skip the preview entirely.
### Receiver — render in widgets
The bundled `@shade/widgets` `useShadeDownload` hook auto-accepts
thumbnail streams (marked by `userMetadata.shadeThumbnail = '1'`) into
an in-memory `ShadeThumbnailCache`. `<TransferRow showThumbnail
fileMetadata={...} />` reads from the same cache and renders inside an
`<img>` element so the browser's image-decoding sandbox is the trust
boundary for format parsing.
```tsx
<ShadeThumbnailProvider>
<TransferRow
handle={handle}
progress={progress}
showThumbnail
fileMetadata={incoming.metadata.fileMetadata}
/>
</ShadeThumbnailProvider>
```
### Format-hardening (sender + receiver)
Both sides enforce the same rules — single source of truth in
`@shade/streams/file-metadata.ts`:
| Rule | Limit |
|------|-------|
| `thumbnailMime` allowlist | `image/jpeg`, `image/webp`, `image/png` |
| `thumbnailBytes` cap | 64 KiB (`THUMBNAIL_MAX_BYTES`) |
| `filename` length | ≤ 1024 chars, no control characters |
| `mimeType` shape | RFC 7231 `type/subtype` token |
| Hash binding | declared `thumbnailHash` = sha256(preview bytes); mismatched bytes are dropped at the cache before any render |
A hostile peer cannot:
- smuggle exotic image formats past the allowlist (envelope parser
rejects at decode-time),
- substitute different bytes for a declared preview (cache verifies
sha256 before exposing bytes to a renderer),
- inflate the cache to OOM the receiver (LRU + 1 MiB total cap).
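The static rules in the table can be expressed as a small validator. This sketch is not the code in `@shade/streams/file-metadata.ts`: the function name, the violation-list return shape, and the exact token regex are assumptions, and the hash-binding check is left out because the hash encoding is not specified here.

```ts
const THUMBNAIL_MAX_BYTES = 64 * 1024;
const THUMBNAIL_MIME_ALLOWLIST = new Set(['image/jpeg', 'image/webp', 'image/png']);
const FILENAME_MAX_CHARS = 1024;

// RFC 7231 token characters, used for a `type/subtype` shape check.
const TOKEN = "[!#$%&'*+.^_`|~0-9A-Za-z-]+";
const MIME_SHAPE = new RegExp(`^${TOKEN}/${TOKEN}$`);
const CONTROL_CHARS = /[\u0000-\u001f\u007f]/;

interface FileMetadataSketch {
  filename?: string;
  mimeType?: string;
  thumbnailMime?: string;
  thumbnailBytes?: number;
}

// Returns a list of rule violations; an empty list means the metadata passes.
function validateFileMetadata(meta: FileMetadataSketch): string[] {
  const problems: string[] = [];
  if (meta.filename !== undefined) {
    if (meta.filename.length > FILENAME_MAX_CHARS) problems.push('filename too long');
    if (CONTROL_CHARS.test(meta.filename)) problems.push('filename has control characters');
  }
  if (meta.mimeType !== undefined && !MIME_SHAPE.test(meta.mimeType)) {
    problems.push('mimeType is not type/subtype');
  }
  if (meta.thumbnailMime !== undefined && !THUMBNAIL_MIME_ALLOWLIST.has(meta.thumbnailMime)) {
    problems.push('thumbnailMime not in allowlist');
  }
  if (
    meta.thumbnailBytes !== undefined &&
    (!Number.isInteger(meta.thumbnailBytes) ||
      meta.thumbnailBytes < 0 ||
      meta.thumbnailBytes > THUMBNAIL_MAX_BYTES)
  ) {
    problems.push('thumbnailBytes out of range');
  }
  return problems;
}
```

Running the same function on both sender and receiver is what makes the rules a single source of truth.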
### Risks consciously accepted
- **Preview-arrival ≠ send completion.** A receiver may see the
thumbnail before the main upload finishes. For high-stakes flows
where "did Alice send X?" is itself sensitive, send the preview
*only* after main completion (set `thumbnail` to `null` and instead
ship a follow-up `stream-init` with the preview). The default
ordering optimizes UX, not metadata-secrecy.
- **Renderer trust.** We render through a Blob-URL `<img>`. A 0-day
in the browser's image decoder would still reach the receiver. Keep
browsers patched; rely on the CSP of your embedding app.
## API surface
See package READMEs:
- `packages/shade-streams/README.md` — crypto + state machines
- `packages/shade-transfer/README.md` — orchestration, transports, persistence
- `packages/shade-transport-webrtc/README.md` — V3.11 P2P transport plug-in
- `packages/shade-sdk/README.md` — magic drop-in
- `packages/shade-widgets/README.md` — React UI
## Transports
`@shade/transfer` ships HTTP + WebSocket chunk transports. V3.11 adds an
opt-in P2P chunk transport via `RTCDataChannel`:
- HTTP — `ShadeTransferHttpTransport`. POST per chunk; the receiver-
side route is `app.route('/v1/transfer', await shade.transferRoute())`.
- WebSocket — `ShadeTransferWsTransport`. One connection per peer,
binary-framed chunks, JSON acks; same wire format inside the frame as
the WebRTC transport.
- WebRTC — `WebRtcTransferTransport` from `@shade/transport-webrtc`.
Wired automatically by `shade.configureWebRTC()` as the primary
layer of a `MultiTransportFallback([webrtc, http])`. See
[docs/webrtc.md](./webrtc.md).
`MultiTransportFallback` is the N-ary generalisation of
`FallbackTransferTransport`: pass an ordered list of named transports,
and when the active one raises a `TransferTransportError` the engine
demotes to the next transport and stays there (sticky demotion).
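Sticky demotion can be illustrated with a small stand-in. The sketch below is synchronous and entirely hypothetical (`StickyFallbackSketch` and `NamedSender` are not Shade exports); it assumes a transport failure is recognisable by the error name `TransferTransportError`.

```ts
// Hypothetical stand-in for an ordered, sticky fallback chain.
interface NamedSender {
  name: string;
  send(chunk: Uint8Array): void; // throws on failure
}

function isTransportError(err: unknown): boolean {
  return err instanceof Error && err.name === 'TransferTransportError';
}

class StickyFallbackSketch {
  private index = 0;
  private readonly chain: NamedSender[];

  constructor(chain: NamedSender[]) {
    if (chain.length === 0) throw new Error('need at least one transport');
    this.chain = chain;
  }

  get activeName(): string {
    return this.chain[this.index].name;
  }

  send(chunk: Uint8Array): void {
    for (;;) {
      try {
        this.chain[this.index].send(chunk);
        return;
      } catch (err) {
        const isLast = this.index === this.chain.length - 1;
        // Only transport errors demote; anything else surfaces unchanged.
        if (!isTransportError(err) || isLast) throw err;
        this.index += 1; // sticky: the chain never moves back up
      }
    }
  }
}
```

Because the index only ever increases, a flapping primary transport cannot cause the chain to oscillate; the trade-off is that a recovered primary is not retried for the lifetime of the object.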

224
docs/transport.md Normal file

@@ -0,0 +1,224 @@
# Shade Transport — Bridge Layer (V3.7)
> **Looking for V3.11 (peer-to-peer chunk transport via `RTCDataChannel`)?**
> See [docs/webrtc.md](./webrtc.md). This page covers the V3.7 bridge
> layer that ships ciphertext *envelopes* (control plane) over
> WS / SSE / long-poll. The two are orthogonal: the bridge handles
> store-and-forward control envelopes; WebRTC handles direct chunk data.
The bridge layer is the answer to: **"my client is a browser extension /
strict-corp-proxy / edge-runtime / iOS app — I cannot keep a WebSocket
open. How do I receive ciphertext envelopes?"**
It is built on top of the V3.6 inbox: every transport delivers the same
inbox blobs, with the same authentication semantics. Application code
sees a single `IncomingMessage` shape and never branches on transport.
```
┌─────────────────────────────────────────────────────────────────┐
│                        application code                         │
│                                                                 │
│     bridge.connect({ onMessage: (m) => decrypt(m.bytes) })      │
└────────────────────────────────┬────────────────────────────────┘
       ┌─────────────────────────┴──────────────────────────┐
       │              FallbackBridgeTransport               │
       │            (sticky-after-first-success)            │
       └──┬──────────────────┬─────────────────────────┬────┘
          │                  │                         │
   ┌──────▼─────┐     ┌──────▼─────┐            ┌──────▼─────┐
   │  WsBridge  │     │ SseBridge  │            │  LongPoll  │
   │    /v1/    │     │    /v1/    │            │   Bridge   │
   │ bridge/ws  │     │  bridge/   │            │ /v1/bridge │
   │            │     │   stream   │            │   /poll    │
   └──────┬─────┘     └──────┬─────┘            └──────┬─────┘
          │                  │                         │
          └──────────────────┼─────────────────────────┘
                       ┌─────▼──────┐
                       │   inbox    │  ← the same V3.6 store
                       │   blobs    │    and events
                       └────────────┘
```
## When to reach for which
| Transport | Latency | Proxy resilience | Browser | Server cost |
|-------------|----------|------------------|---------|-------------|
| WebSocket | ms | breaks under strict CONNECT-blocking proxies | ✓ | one socket per client |
| SSE | ms | passes most HTTP proxies (text/event-stream) | ✓ | one streamed response per client |
| long-poll | ≤ 25 s | passes anything that allows GET | ✓ | one held request per client |
The recommended composition:
```ts
import {
FallbackBridgeTransport,
WsBridge,
SseBridge,
LongPollBridge,
} from '@shade/transport-bridge';
const auth = {
crypto, // CryptoProvider
signingPrivateKey, // recipient's Ed25519 private key
address: 'bob',
};
const bridge = new FallbackBridgeTransport([
new WsBridge({ baseUrl: 'https://relay.example.com', auth }),
new SseBridge({ baseUrl: 'https://relay.example.com', auth }),
new LongPollBridge({ baseUrl: 'https://relay.example.com', auth }),
]);
await bridge.connect({
onMessage: async (msg) => {
// msg.bytes is a Uint8Array — pass it to your decrypt path.
// msg.from is the relay-known sender hint (may be empty); the
// authoritative sender comes from the decrypted envelope.
// msg.msgId is the relay's deterministic message id (sha256(ciphertext)).
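// `decodeEnvelope` and `senderAddress` are supplied by your app; they
// are not defined in this snippet.
const envelope = decodeEnvelope(msg.bytes);
await shade.receive(senderAddress, envelope);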
},
});
// Read which transport the fallback chain settled on:
console.log(bridge.activeKind); // "ws" | "sse" | "long-poll"
```
## The IncomingMessage shape
```ts
interface IncomingMessage {
from: string; // relay-side sender hint (may be "")
bytes: Uint8Array; // the ciphertext envelope, exactly as PUT
receivedAt: number; // relay-monotonic cursor — NOT wall-clock arrival
msgId?: string; // sha256(bytes) — useful for ack/dedup
}
```
`from` is intentionally a hint — sender provenance lives inside the
encrypted envelope and is recovered post-decrypt. The bridge layer is
plaintext-blind by design.
## Auth — signed query parameters
Every bridge request signs the canonical
`{address, kind, since, signedAt}` payload with the recipient's Ed25519
signing private key. The server looks up the address-owner key
registered via `/v1/inbox/register` and verifies the signature.
`kind` is bound into the canonical payload so a signature for `/poll`
cannot be replayed against `/stream` or `/ws`.
The browser `EventSource` API does not let callers attach custom
headers; query parameters are the only portable carrier and so the
bridge protocol uses them uniformly across all three transports.
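To make the `kind` binding concrete, here is a minimal sketch of building the canonical payload and the query string. The exact canonical encoding, the `kind` values, and the `sig` parameter name are assumptions for illustration; the source only states which four fields are signed. The Ed25519 signing step itself is left to your `CryptoProvider`.

```ts
// Hypothetical field layout; only the four field names come from the text above.
interface BridgeAuthFields {
  address: string;
  kind: string; // assumed values: 'ws' | 'stream' | 'poll'
  since: number;
  signedAt: number;
}

// Fixed field order so signer and verifier serialise byte-for-byte identically.
function canonicalBridgePayload(f: BridgeAuthFields): string {
  return JSON.stringify([f.address, f.kind, f.since, f.signedAt]);
}

// Carries the signed fields plus the signature as query parameters.
function bridgeQuery(f: BridgeAuthFields, signatureB64Url: string): string {
  const pairs: Array<[string, string]> = [
    ['address', f.address],
    ['kind', f.kind],
    ['since', String(f.since)],
    ['signedAt', String(f.signedAt)],
    ['sig', signatureB64Url],
  ];
  return pairs
    .map(([k, v]) => encodeURIComponent(k) + '=' + encodeURIComponent(v))
    .join('&');
}
```

Because `kind` is part of the signed bytes, the payloads for `/poll` and `/stream` differ even when every other field is equal, so a signature captured on one route does not verify on another.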
## Server-side — `createBridgeRoutes`
```ts
import { createBridgeRoutes } from '@shade/inbox-server';
import { Hono } from 'hono';
const inbox = new MemoryInboxStore();
const events = new InboxServerEvents();
const bridge = createBridgeRoutes({
store: inbox,
crypto,
events,
longPollTimeoutMs: 25_000, // default — under typical proxy idle limits
heartbeatIntervalMs: 15_000, // SSE keepalive comments
fallbackPollIntervalMs: 1_000, // when no `events` emitter is wired
});
const app = new Hono();
app.route('/', bridge.app);
Bun.serve({
port: 3900,
fetch: (req, srv) => app.fetch(req, srv),
websocket: bridge.websocket as any,
});
```
The bridge subscribes to `InboxServerEvents` (`inbox.blob_stored`) for
push-style delivery — when an event fires for a connected address, the
server fetches new blobs and forwards them. If no events emitter is
wired, the server falls back to a small in-process polling timer at
`fallbackPollIntervalMs` cadence.
## Cursor & resume
Every `IncomingMessage.receivedAt` is the relay's monotonic cursor for
the address. Bridges expose `getCursor()` so applications can persist
the high-water mark and pass it as `startCursor` on the next
`connect()`:
```ts
const sse = new SseBridge({
baseUrl,
auth,
startCursor: await persistedCursor.load(),
});
await sse.connect({
onMessage: async (msg) => {
await persistedCursor.save(msg.receivedAt);
// …
},
});
```
For SSE specifically, the server emits an `id:` field per event; the
bridge sends it back as `Last-Event-ID` plus the `since=` query
parameter on reconnect, so a flapping connection picks up exactly where
it left off without duplicates.
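On the application side, a small tracker can keep the persisted cursor monotonic and drop duplicates by `msgId`. This is a hypothetical helper, not part of `@shade/transport-bridge`; it assumes `receivedAt` only ever needs to move forward and that an in-memory set of seen ids is acceptable.

```ts
// Hypothetical helper: high-water-mark cursor plus msgId de-duplication.
interface RelayMessageSketch {
  receivedAt: number;
  msgId?: string;
}

class CursorTrackerSketch {
  private highWater: number;
  private readonly seen = new Set<string>();

  constructor(startCursor: number) {
    this.highWater = startCursor;
  }

  // Returns true when the message is new and should be processed.
  accept(msg: RelayMessageSketch): boolean {
    if (msg.msgId !== undefined) {
      if (this.seen.has(msg.msgId)) return false;
      this.seen.add(msg.msgId);
    }
    // Never move the cursor backwards, even if delivery is out of order.
    if (msg.receivedAt > this.highWater) this.highWater = msg.receivedAt;
    return true;
  }

  // The value to persist and pass as `startCursor` on the next connect.
  get cursor(): number {
    return this.highWater;
  }
}
```

Persist `cursor` only after the message has been handled successfully, so a crash between receipt and processing leads to a re-delivery rather than a gap.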
## Reconnect & backoff
| Bridge | Auto-reconnect | Backoff |
|-------------|----------------|----------------------|
| WS | yes (default) | 250 ms → 10 s exponential |
| SSE | yes (default) | 250 ms → 10 s exponential |
| long-poll | always on (the loop *is* the reconnect) | 2 s on hard error |
Pass `disableAutoReconnect: true` (WS / SSE) for tests where you want a
single attempt and immediate surfaced error.
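The 250 ms → 10 s schedule in the table can be sketched as a pure function. The doubling factor and the absence of jitter are assumptions; the table only gives the two endpoints.

```ts
// Assumed schedule: double per attempt, starting at 250 ms, capped at 10 s.
const BACKOFF_BASE_MS = 250;
const BACKOFF_CAP_MS = 10_000;

function reconnectDelayMs(attempt: number): number {
  // attempt is 0 for the first retry after a disconnect.
  return Math.min(BACKOFF_BASE_MS * Math.pow(2, attempt), BACKOFF_CAP_MS);
}
```

Under these assumptions the delays run 250, 500, 1000, 2000, 4000, 8000 ms and then stay at the 10 s cap.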
## Long-poll concurrency
The `LongPollBridge` issues exactly one request at a time: the next
request fires only after the previous one resolves. A client therefore
never holds more than one long-poll request open on the server, which
matches the V3.7 acceptance criterion and keeps capacity planning
simple: max in-flight long-poll requests = number of connected clients.
## Failure modes
- **WS handshake rejected (4xxx code).** `WsBridge.connect` rejects.
Caller (or `FallbackBridgeTransport`) moves on.
- **SSE returns non-200.** `SseBridge.connect` throws a `BridgeError`
with `httpStatus`.
- **Long-poll returns non-200.** Same — `BridgeError` with `httpStatus`.
- **Mid-stream error after connect.** WS/SSE auto-reconnect; long-poll
swallows transient errors and continues looping. Errors flow to the
caller's `onError` handler.
## Acceptance test coverage (V3.7)
`packages/shade-transport-bridge/tests/bridge.test.ts` covers:
- "Send 100 small messages" — one test per transport, all pass.
- "WS blocked by proxy → SSE → long-poll" — fallback test boots a
server where the WS endpoint is unreachable and the SSE endpoint
returns 502, verifies the chain falls all the way through to
long-poll without message loss.
- "Long-poll uses ≤ 1 outstanding request" — wraps `fetch` to count
in-flight requests over 1.5 s of steady-state operation.
- Cursor resume — tears down an SSE connection mid-stream, pushes more
blobs, reconnects with the persisted cursor, asserts exactly the new
blobs are delivered (no overlap with the pre-disconnect set).
- Auth rejection — wrong signing key and unregistered address both
produce hard `connect` rejections so the fallback chain advances.

156
docs/trust-ux.md Normal file

@@ -0,0 +1,156 @@
# Trust UX — Fingerprint Gates (V3.3)
> Status: shipped in 0.4.0, GA-frozen in 4.0 — see [V3.3 plan](./archive/V3.3.md).
Shade ships with a small number of **blocking** verification gates that
fire automatically before the operations where MITM risk is highest.
Each gate calls a handler you register on the SDK; until the user (or
your handler) approves, the operation aborts with
`FingerprintNotVerifiedError`.
The point of the gate model is to be alert-fatigue-free: you don't see
a prompt before every chat message, just before the handful of moments
that genuinely matter.
---
## What the gates protect
| Gate | Fires when | Default policy |
|------|------------|----------------|
| `first-large-file` | `Shade.upload(...)` for an unverified peer with a known size at or above the configured threshold. | Threshold `10 MiB`. Below = no gate. |
| `backup-import` | `Shade.importBackup(...)` before any state is written. Handler receives the fingerprint of the identity *embedded in the backup*. | Always fires. |
| `new-device-trust` | `Shade.acceptIdentityChange(...)` after a peer rotates identity. The peer's `identity_version` is bumped first so any prior verification is automatically stale. | Always fires. |
| `inbox-fanout` | Reserved for V3.6 (`@shade/inbox`). Per-recipient hook is wired today so apps can register it now. | Always fires. |
---
## Registering handlers
```ts
const shade = await createShade({
prekeyServer: 'https://prekeys.example.com',
storage: 'sqlite:/data/shade.db',
});
shade.beforeFirstLargeFile(10 * 1024 * 1024, async (ctx) => {
// ctx.peerAddress, ctx.fingerprint, ctx.fileSize
return await ui.confirmFingerprintModal(ctx);
});
shade.beforeBackupImport(async (ctx) => {
// ctx.fingerprint = fingerprint of the identity in the backup blob
return await ui.confirmBackupOwner(ctx);
});
shade.beforeNewDeviceTrust(async (ctx) => {
// ctx.fingerprint = fingerprint of the rotated identity
return await ui.confirmDeviceRotation(ctx);
});
```
Return `true` to allow the operation and persist a `'user'` verification.
Return `false` (or throw) to abort with `FingerprintNotVerifiedError`.
If you don't register a handler, the gate **logs a one-time warning per
peer and proceeds on TOFU**, persisting a `'tofu-after-warning'`
verification. This satisfies the V3.3 acceptance criterion that apps
without registered gates get sane defaults instead of hard-failing — but
it does mean the gate is informational, not a hard wall, in that
configuration. Always register handlers in production.
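The handler semantics above can be summarised in a short sketch. It is not the SDK's implementation: the types and `runGateSketch` are hypothetical, and where the real SDK throws `FingerprintNotVerifiedError` the sketch simply returns `{ allowed: false }`.

```ts
// Hypothetical model of the gate decision described above.
interface GateContextSketch {
  peerAddress: string;
  fingerprint: string;
}
type GateHandlerSketch = (ctx: GateContextSketch) => Promise<boolean>;
type GateOutcomeSketch =
  | { allowed: true; verification: 'user' | 'tofu-after-warning' }
  | { allowed: false };

// Peers that have already triggered the one-time warning.
const warnedPeers = new Set<string>();

async function runGateSketch(
  handler: GateHandlerSketch | undefined,
  ctx: GateContextSketch,
  warn: (message: string) => void,
): Promise<GateOutcomeSketch> {
  if (handler === undefined) {
    // No handler registered: warn once per peer, then proceed on TOFU.
    if (!warnedPeers.has(ctx.peerAddress)) {
      warnedPeers.add(ctx.peerAddress);
      warn(`no gate handler registered; trusting ${ctx.peerAddress} on first use`);
    }
    return { allowed: true, verification: 'tofu-after-warning' };
  }
  let approved = false;
  try {
    approved = await handler(ctx);
  } catch {
    approved = false; // a throwing handler counts as a refusal
  }
  return approved ? { allowed: true, verification: 'user' } : { allowed: false };
}
```

Treating a thrown handler as a refusal keeps the gate fail-closed whenever a handler exists; only the no-handler configuration is fail-open.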
---
## Manual verification
The handler model assumes your app drives the OOB compare/confirm
flow. If the user verifies through some other path (QR code scan, audio
read-aloud, transitive trust from V3.10), call:
```ts
await shade.markPeerVerified('bob'); // pin current fingerprint
await shade.unmarkPeerVerified('bob'); // revoke
const ok = await shade.isPeerVerified('bob'); // check status
```
`markPeerVerified` reads the peer's *current* fingerprint and pins it
together with the per-peer `identity_version`. When the peer rotates
(`acceptIdentityChange`), the version bumps and the saved verification
goes stale automatically — `isPeerVerified` will return `false` until
the user re-verifies.
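The staleness rule amounts to comparing the pinned record with the peer's current identity. A minimal sketch, with hypothetical record shapes (the SDK's storage schema is not shown in this document):

```ts
// Hypothetical shapes; only `identity_version` and the fingerprint come from the text.
interface PeerIdentitySketch {
  fingerprint: string;
  identityVersion: number;
}
interface VerificationRecordSketch {
  fingerprint: string;
  identityVersion: number;
}

// A verification is current only if both the version and fingerprint still match.
function isVerificationCurrent(
  record: VerificationRecordSketch | undefined,
  peer: PeerIdentitySketch,
): boolean {
  return (
    record !== undefined &&
    record.identityVersion === peer.identityVersion &&
    record.fingerprint === peer.fingerprint
  );
}
```

Because the version is bumped before the rotated identity is trusted, an old record fails the comparison without anyone having to delete it.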
---
## Tuning thresholds
The `first-large-file` threshold is the only knob that's customer-tunable
without code changes. The defaults are conservative:
- **Default:** `10 MiB`. Big enough that ordinary chat attachments don't
trigger; small enough that obvious "exfil candidates" do.
- **Lower** (e.g. `1 MiB`) for high-sensitivity deployments — every
document goes through the gate.
- **Raise** (e.g. `100 MiB`) only for use cases where small uploads are
routine and large transfers are deliberate / pre-arranged.
`backup-import` and `new-device-trust` have no threshold by design — the
spec mandates an irremovable minimum gate for both, since each one
either trusts a fresh identity or overwrites pinned trust wholesale.
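As a sketch of when the `first-large-file` gate fires, following the table's wording ("unverified peer", "known size at or above the configured threshold"). The function name and signature are hypothetical:

```ts
// Default threshold from the table above: 10 MiB.
const DEFAULT_LARGE_FILE_THRESHOLD_BYTES = 10 * 1024 * 1024;

function shouldFireFirstLargeFileGate(
  fileSize: number | undefined, // undefined when the size is not known up front
  peerVerified: boolean,
  thresholdBytes: number = DEFAULT_LARGE_FILE_THRESHOLD_BYTES,
): boolean {
  if (peerVerified) return false; // already-verified peers skip the gate
  if (fileSize === undefined) return false; // gate needs a known size
  return fileSize >= thresholdBytes; // "at or above"
}
```

Note the boundary: a file of exactly 10 MiB fires the gate under the default threshold, while one byte less does not.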
---
## React widget
Use `<FingerprintGate />` from `@shade/widgets` to block UI on
verification status:
```tsx
import { FingerprintGate } from '@shade/widgets';
<FingerprintGate peerAddress="bob">
<ChatThread peer="bob" />
</FingerprintGate>
```
The default fallback shows the safety number, a "Copy OOB text" button,
and an "I have verified" button that calls `Shade.markPeerVerified`.
Pass a `fallback` render prop to use your own UI, or `onVerified` to
react to the unverified → verified transition.
`<FingerprintCompare />` is the existing observer-dashboard widget; it
now exposes the same Copy-OOB / verify actions when an `onVerified`
prop is wired.
---
## Errors
`FingerprintNotVerifiedError` carries:
- `peerAddress`: the address the gate was protecting.
- `gate`: one of `'first-large-file' | 'backup-import' | 'new-device-trust' | 'inbox-fanout'`.
- `code = 'SHADE_FINGERPRINT_NOT_VERIFIED'`: maps to HTTP 403.
Catch it explicitly when wrapping `upload`, `importBackup`, and
`acceptIdentityChange`:
```ts
try {
await shade.upload({ to: 'bob', input: bytes });
} catch (err) {
if (err instanceof FingerprintNotVerifiedError) {
showVerifyFirst(err.peerAddress);
return;
}
throw err;
}
```
---
## Migration from 0.3.x
No breaking changes: existing apps gain warning-mode gates automatically
(see the no-handler note above). To upgrade to hard gates, register
handlers for the operations you use. Your existing `FingerprintCompare`
calls keep working; pass `onVerified` to enable the new actions.

276
docs/web-workers.md Normal file

@@ -0,0 +1,276 @@
# Web Workers Crypto
Status: Implemented (V3.8 — `0.4.0`).
`@shade/crypto-web` ships with an opt-in dedicated Web Worker that keeps
AES-GCM, HKDF, HMAC, X25519 and Ed25519 — and full per-lane stream state —
off the main thread. Big in-browser uploads (100 MB+) stay smooth without
frame drops.
This doc covers:
- [When to use it](#when-to-use-it)
- [Setup](#setup)
- [API](#api)
- [Bundler recipes](#bundler-recipes)
- [Safari notes](#safari-notes)
- [SharedArrayBuffer (COOP/COEP)](#sharedarraybuffer-coopcoep)
- [Lifecycle and rotation](#lifecycle-and-rotation)
- [Threat-model considerations](#threat-model-considerations)
---
## When to use it
The default `SubtleCryptoProvider` runs on whatever thread you give it.
For the SDK that means the main thread. AES-GCM via SubtleCrypto is fast
(hardware-accelerated), but a 100 MB file at 256 KiB chunks is ~400 AEAD
calls — each one queues a microtask on the main thread. Layered on top of
React reflows and large `postMessage` payloads to the network worker, you
*will* see frame drops.
Reach for the Worker pipeline when:
- You upload or download files that don't fit in a single AEAD chunk
(≥ ~1 MB) inside a UI-bearing browser tab.
- You generate or rotate identity / device keys in a UI thread that must
stay interactive.
- You do batch AEAD (e.g. backup export over many records).
You can keep using `SubtleCryptoProvider` for short ops (Signal session
encrypt/decrypt for a chat message). The cost of a `postMessage` round-
trip dwarfs the cost of a single 256-byte AES call.
---
## Setup
`@shade/crypto-web` exposes the worker as a separate subpath, so your
bundler can resolve it through the standard `new Worker(new URL(...,
import.meta.url))` idiom.
```ts
import { createShade } from '@shade/sdk';
const shade = await createShade({ /* ... */ });
shade.configureWorkerCrypto({
workerUrl: new URL('@shade/crypto-web/worker', import.meta.url),
});
```
After `configureWorkerCrypto`, the SDK exposes:
- `shade.encryptStream({ streamId, streamSecret, ... })` — returns a
`TransformStream<Uint8Array, Uint8Array>` and a `laneSha256` promise.
- `shade.decryptStream({ streamId, streamSecret, ... })` — inverse.
- `shade.getWorkerCrypto()` — direct access to the `WorkerCryptoProvider`
for one-off ops (HKDF batches, X25519 batch DH, etc.).
The worker is spawned on first use and self-terminates after
`idleTimeoutMs` (default 30 s) — no manual lifecycle management required.
---
## API
### Stream encryption
```ts
const { stream, laneSha256 } = await shade.encryptStream({
streamId: streamId, // 16 random bytes, agreed with peer
streamSecret: streamSecret, // 32 random bytes, derived via Double Ratchet
laneId: 0, // lane index (use multi-lane for parallel HTTP)
chunkSize: 256 * 1024, // optional; default 256 KiB
});
await file.stream()
.pipeThrough(stream)
.pipeTo(transferSink); // your HTTP-shipping WritableStream
const sha256 = await laneSha256; // for end-to-end integrity proof
```
`stream` consumes plaintext and emits one wire-encoded
`stream-chunk` envelope per write. `flush` always emits a final chunk
with `isLast=true` (even if the trailing slice is empty), so receivers
see a clean termination.
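The termination behaviour can be sketched with a plain chunker that ignores encryption entirely. This is an assumption-laden illustration: `PlaintextChunkerSketch` is hypothetical, and it assumes full `chunkSize` slices are emitted as they fill while the remainder (possibly empty) is emitted on flush.

```ts
// Hypothetical chunker showing only the isLast / flush semantics.
interface PlainChunkSketch {
  seq: number;
  isLast: boolean;
  bytes: Uint8Array;
}

class PlaintextChunkerSketch {
  private buffered: Uint8Array = new Uint8Array(0);
  private seq = 0;
  private readonly chunkSize: number;

  constructor(chunkSize: number) {
    this.chunkSize = chunkSize;
  }

  // Emits every full chunk available; the remainder stays buffered.
  write(input: Uint8Array): PlainChunkSketch[] {
    const merged = new Uint8Array(this.buffered.length + input.length);
    merged.set(this.buffered, 0);
    merged.set(input, this.buffered.length);
    const out: PlainChunkSketch[] = [];
    let offset = 0;
    while (merged.length - offset >= this.chunkSize) {
      out.push({
        seq: this.seq++,
        isLast: false,
        bytes: merged.slice(offset, offset + this.chunkSize),
      });
      offset += this.chunkSize;
    }
    this.buffered = merged.slice(offset);
    return out;
  }

  // Always emits a final chunk, even when nothing is buffered.
  flush(): PlainChunkSketch {
    const last = { seq: this.seq++, isLast: true, bytes: this.buffered };
    this.buffered = new Uint8Array(0);
    return last;
  }
}
```

Emitting a final chunk unconditionally means the receiver never has to infer the end of a lane from a connection close, at the cost of one extra (possibly empty) envelope.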
### Stream decryption
```ts
const { stream, laneSha256 } = await shade.decryptStream({
streamId,
streamSecret,
laneId: 0,
});
await incomingChunkStream
.pipeThrough(stream)
.pipeTo(fileSink);
const sha = await laneSha256;
if (!equal(sha, peerLaneSha256)) throw new IntegrityError();
```
Each input chunk MUST be a complete wire envelope. The transport-layer
caller is responsible for framing (one envelope per write). Out-of-order
or replayed chunks reject the stream — the lane key never crosses thread
boundaries, so a man-in-the-middle script in the page can't recover key
material to replay against.
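The ordering rule can be sketched as a per-lane guard that runs before decryption. The class is hypothetical and assumes sequence numbers start at 0 and increase by exactly one:

```ts
// Hypothetical per-lane guard: strictly sequential, no chunks after the last.
class LaneSequenceGuardSketch {
  private nextSeq = 0;
  private finished = false;

  // Throws when the chunk is replayed, out of order, or arrives after the final chunk.
  accept(seq: number, isLast: boolean): void {
    if (this.finished) throw new Error('chunk received after the final chunk');
    if (seq !== this.nextSeq) {
      throw new Error(`expected seq ${this.nextSeq}, got ${seq}`);
    }
    this.nextSeq += 1;
    if (isLast) this.finished = true;
  }
}
```

A replayed chunk carries a `seq` below `nextSeq` and a skipped chunk carries one above it, so the single equality check covers both cases.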
### Direct provider access
```ts
const crypto = await shade.getWorkerCrypto();
// Implements `CryptoProvider` — drop-in replacement for SubtleCryptoProvider
const { ciphertext, nonce } = await crypto.aesGcmEncrypt(key, plaintext);
```
`randomBytes`, `randomUint32`, `constantTimeEqual`, `zeroize` execute on
the calling thread (no round-trip). Async ops forward to the worker.
---
## Bundler recipes
### Vite
```ts
shade.configureWorkerCrypto({
workerUrl: new URL('@shade/crypto-web/worker', import.meta.url),
});
```
Vite resolves the URL via `import.meta.url` and emits a discrete chunk
for the worker. No additional config required for Vite ≥ 5.
If your build complains about `?worker` syntax, use the explicit URL
form (above) — it's the standard Vite idiom.
### Webpack 5 / Rspack
Same idiom — Webpack 5 understands `new URL('./worker.js', import.meta.url)`
natively as long as the source is ESM:
```ts
new Worker(new URL('@shade/crypto-web/worker', import.meta.url), {
type: 'module',
});
```
For Webpack 4 or non-ESM builds, you need `worker-loader` (legacy). We
do not officially support Webpack 4.
### Rollup
Rollup needs `@rollup/plugin-web-worker-loader` or a recent
`rollup-plugin-import-meta-url`. The standard idiom works once the
plugin is wired:
```ts
new URL('@shade/crypto-web/worker', import.meta.url)
```
If your bundler can't resolve `@shade/crypto-web/worker`, copy
`node_modules/@shade/crypto-web/src/worker.ts` (or the compiled `.js`
once we ship dist artefacts) into your `public/` directory and pass an
absolute URL:
```ts
shade.configureWorkerCrypto({ workerUrl: '/shade-crypto.worker.js' });
```
---
## Safari notes
Safari ≤ 17 has a smaller `postMessage` transferable budget than Chrome /
Firefox. Single transfers above ~64 MB occasionally fail silently. The
shipped pipeline already chunks plaintext to 256 KiB before AEAD, so
each `postMessage` carries ≤ ~256 KiB + AEAD overhead — well under any
known Safari limit.
If you override `chunkSize`, keep individual buffers below 16 MiB:
```ts
shade.encryptStream({
streamId, streamSecret,
chunkSize: 8 * 1024 * 1024, // 8 MiB — safe across all browsers
});
```
We do not officially support Safari ≤ 14 (no module workers).
---
## SharedArrayBuffer (COOP/COEP)
The default pipeline uses `ArrayBuffer` transfer (zero-copy ownership
hand-off). It does **not** require COOP/COEP headers.
For multi-lane parallel transfers across multiple workers, you may opt
in to `SharedArrayBuffer` for the AEAD plaintext buffers. That requires
your origin to serve:
```
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
```
`SharedArrayBuffer` support is gated behind a future `useSharedBuffers`
option and is not enabled in V3.8. See `docs/V4.0.md` if/when this lands.
---
## Lifecycle and rotation
```ts
const crypto = await shade.getWorkerCrypto();
await crypto.rotate(); // tear down the current worker, respawn lazily
await crypto.destroy(); // permanent — every subsequent call rejects
```
`shade.shutdown()` calls `destroy()` automatically. The idle-timer fires
30 seconds after the last response (configurable via
`configureWorkerCrypto({ idleTimeoutMs })`); if the timer fires while
calls are pending, it does nothing and reschedules.
---
## Threat-model considerations
- The worker runs in the same origin and the same browsing context as
the main thread. It is **not** a sandbox against a compromised page;
any script that can `eval` in your tab can also `postMessage` to the
worker. The Worker is a *performance* boundary, not a *security*
boundary.
- Lane keys derived inside the worker stay there; they are never
postMessage'd to the main thread. This narrows the window during which
a key sits in main-thread heap, which helps against post-mortem heap
inspection by a curious extension. It does not help against an active
in-page attacker.
- `randomBytes` runs on the calling thread (uses `crypto.getRandomValues`
directly). The worker has its own random source for ops that derive
inside it (nonces are derived deterministically from `(laneId, seq)`).
For the full picture, see `THREAT-MODEL.md`.
---
## Verifying main-thread budget
V3.8 acceptance: a 100 MB upload in Chrome must not block the main
thread for more than 16 ms at P99.
To verify in your app:
1. Open Chrome DevTools → Performance.
2. Record a 100 MB upload.
3. Inspect the main-thread flame chart. Look at "Long Tasks" and
"Self time" of `Shade.encryptStream`.
4. Confirm no contiguous block exceeds ~16 ms (one frame at 60 fps).
If you observe long tasks, lower `chunkSize` (more frequent yields) or
report the trace — see [`docs/archive/V3.8.md`](./archive/V3.8.md) for
the original acceptance criteria.

302
docs/webrtc.md Normal file

@@ -0,0 +1,302 @@
# Shade Transport — WebRTC P2P Layer (V3.11)
`@shade/transport-webrtc` adds a direct peer-to-peer chunk transport on
top of the existing `@shade/transfer` engine. When two clients can reach
each other through NAT/firewall, large transfers (`@shade/files`,
`@shade/transfer`) flow over a single bidirectional `RTCDataChannel`
instead of paying the round-trip cost of HTTP-relayed POSTs. When NAT
traversal fails, the multi-transport fallback automatically demotes the
chain back to HTTP — without losing any chunks already in flight.
The wire payload is unchanged: every chunk is still a Shade ratchet /
streams envelope (AES-256-GCM under HKDF-derived per-lane keys). DTLS
(DataChannels run SCTP over DTLS, not SRTP) only secures the WebRTC
transport hop; enabling a TURN relay does not give the relay operator
access to plaintext.
```
┌───────────────────────────────────────────────────────────────┐
│                       application code                        │
│                                                               │
│   shade.upload({ to: 'bob', input: file })                    │
└────────────────────────────────┬──────────────────────────────┘
                       ┌─────────▼──────────┐
                       │   TransferEngine   │
                       └─────────┬──────────┘
                                 │ ITransferTransport
                       ┌─────────▼──────────┐
                       │   MultiTransport   │
                       │ Fallback (sticky)  │
                       └────┬─────┬─────┬───┘
                            │     │     │
             ┌──────────────▼┐  ┌─▼─┐  ┌▼─────────────┐
             │ WebRtcTransfer│  │WS │  │ ShadeTransfer│
             │ Transport     │  │…  │  │ HttpTransport│
             └─────┬─────────┘  └───┘  └──────────────┘
                   │ DataChannel binary frames
             ┌─────▼─────────┐
             │  WebRtcConn   │  ←──── SDP/ICE over Shade.send
             │  Manager      │         (ratchet-encrypted)
             └───────────────┘
```
## When to reach for it
| Scenario | Default (HTTP) | + WebRTC |
|---------------------------------------|----------------|----------------|
| Two clients on the same LAN | server-relayed | direct, P2P |
| One peer behind enterprise NAT only | works | TURN-relay |
| Both peers behind symmetric NAT | works | falls back to HTTP |
| One peer offline | inbox-buffered | inbox-buffered (HTTP path) |
| Browser extension with strict CSP | works | works (uses RTCPeerConnection) |
Use cases:
- `@shade/transfer` upload of multi-MB / multi-GB files
- `@shade/files` `read`/`write` of large inline blobs
- Future: `@shade/streams` real-time channels (V5.0 reuses this same DataChannel)
## Quick start (browser)
```ts
import { createShade } from '@shade/sdk';
import { nativeRtcFactory } from '@shade/transport-webrtc';
const shade = await createShade({ prekeyServer: 'https://prekey.example.com' });
// IMPORTANT: configureWebRTC MUST be called BEFORE the first upload() /
// onIncomingTransfer() / transferRoute() call, because those build the
// transfer engine — and the engine captures its transport stack at
// construction time.
shade.configureWebRTC({
factory: nativeRtcFactory(),
// Optional — defaults to two public Google STUN servers.
iceServers: [
{ urls: 'stun:stun.l.google.com:19302' },
{
urls: 'turn:turn.example.com:3478',
username: 'shade',
credential: 'YOUR_TURN_SECRET',
},
],
});
shade.configureTransfers({
resolveBaseUrl: async (peer) => directory.lookup(peer),
});
await shade.upload({ to: 'bob', input: file }); // → P2P when NAT allows
```
## Quick start (Bun / Node)
Neither Bun nor Node exposes `RTCPeerConnection` natively yet. Use one of:
- [`node-datachannel`](https://github.com/murat-dogan/node-datachannel)
— small, stable, libdatachannel under the hood
- [`@roamhq/wrtc`](https://www.npmjs.com/package/@roamhq/wrtc): a fork of
the `wrtc` (node-webrtc) bindings to Google's libwebrtc
Wrap the chosen library behind an `IRtcFactory` (the package only depends
on a narrow surface — `createPeerConnection`, `createDataChannel`,
`addEventListener`):
```ts
import type { IRtcFactory, IPeerConnection, IDataChannel } from '@shade/transport-webrtc';
// pseudo-adapter for node-datachannel
class NodeDataChannelFactory implements IRtcFactory {
createPeerConnection(config) { /* ... return adapter wrapping nodeDc PeerConnection */ }
}
shade.configureWebRTC({ factory: new NodeDataChannelFactory(), iceServers });
```
## Connection flow
```
Alice initiates                               Bob receives
───────────────                               ────────────
1. createOffer() → SDP                        2. shade.send delivers offer
                                                 → Bob.createAnswer()
3. shade.send delivers answer                 4. setRemoteDescription(answer)
5. trickle ICE candidates (both directions)   6. trickle ICE candidates
7. DataChannel onopen (both sides)            7. DataChannel onopen
All four signaling kinds (`shade.webrtc-offer/v1`, `shade.webrtc-answer/v1`,
`shade.webrtc-ice/v1`, `shade.webrtc-bye/v1`) ride the existing Shade
ratchet — the relay sees only ciphertext envelopes.
### Glare resolution
If both peers call `getOrCreate()` simultaneously, the manager uses
lexicographic tiebreak: the side with the smaller address wins
caller-role; the side with the larger address closes its outgoing
connection and accepts the inbound offer instead. Both peers ultimately
converge on a single `WebRtcConnection`.
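The tiebreak reduces to one string comparison. A minimal sketch, assuming addresses are compared as plain JavaScript strings (the function name and role labels are hypothetical):

```ts
type GlareRoleSketch = 'caller' | 'callee';

// The lexicographically smaller address keeps the caller role.
function resolveGlareRole(localAddress: string, remoteAddress: string): GlareRoleSketch {
  if (localAddress === remoteAddress) {
    throw new Error('cannot resolve glare between identical addresses');
  }
  return localAddress < remoteAddress ? 'caller' : 'callee';
}
```

Both sides evaluate the same comparison with the arguments swapped, so they always reach opposite roles without exchanging any extra message.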
## Backpressure
The `WebRtcTransferTransport` polls `RTCDataChannel.bufferedAmount` and
suspends new sends once the buffer crosses `backpressureThresholdBytes`
(default 4 MiB). This avoids SCTP queue runaway when the application
pushes faster than the network can drain. Tune lower for memory-
constrained clients (mobile / extension contexts).
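A polling backpressure loop along these lines might look as follows. The interface is a narrow stand-in for `RTCDataChannel`, the 10 ms poll interval is an assumption, and "crosses" is read here as strictly greater than the threshold.

```ts
// Narrow stand-in for the parts of RTCDataChannel this sketch needs.
interface BufferedChannelSketch {
  readonly bufferedAmount: number;
  send(data: Uint8Array): void;
}

const DEFAULT_BACKPRESSURE_THRESHOLD_BYTES = 4 * 1024 * 1024;

function shouldPauseSend(
  bufferedAmount: number,
  thresholdBytes: number = DEFAULT_BACKPRESSURE_THRESHOLD_BYTES,
): boolean {
  return bufferedAmount > thresholdBytes;
}

// Waits until the channel has drained below the threshold, then sends.
async function sendWithBackpressure(
  channel: BufferedChannelSketch,
  frame: Uint8Array,
  thresholdBytes: number = DEFAULT_BACKPRESSURE_THRESHOLD_BYTES,
  pollMs = 10, // assumed poll interval
): Promise<void> {
  while (shouldPauseSend(channel.bufferedAmount, thresholdBytes)) {
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
  channel.send(frame);
}
```

Browsers also offer the `bufferedamountlow` event together with `bufferedAmountLowThreshold` as an event-driven alternative to polling; the sketch polls because that is what the paragraph above describes.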
## Auto-fallback
Configuring WebRTC wires `MultiTransportFallback([webrtc, http])` as the
engine's transport. The chain is sticky-after-first-failure: when WebRTC
raises a `TransferTransportError` (timeout, ICE failed, data channel
closed, frame too large), the fallback advances to HTTP and stays there
for the lifetime of the engine.
For three-tier composition (e.g. WebRTC → WebSocket → HTTP), build the
fallback yourself and pass a custom transport via the engine deps:
```ts
import { MultiTransportFallback } from '@shade/sdk';
const stack = new MultiTransportFallback([
{ name: 'webrtc', transport: rtcTransport },
{ name: 'ws', transport: wsTransport },
{ name: 'http', transport: httpTransport },
]);
stack.onSwitch((from, to) => metrics.observe('shade.transport.demoted', { from, to }));
```
The `WebRtcConnectionManager`'s connect timeout (default 30 s) is the
upper bound on how long the chain dwells on WebRTC before demoting. The
V3.11 acceptance criterion is "P2P dead → HTTP within 5 s", so set
`connectTimeoutMs: 4_000` in your `configureWebRTC()` call to cap the
dwell time at 4 seconds and meet the SLO with margin.
## ICE server config
| Setting | Default | When to override |
|------------------------|-----------------------------------|------------------|
| `iceServers` | Google public STUN (×2) | Production — pin your own STUN to avoid Google rate limits, plus your TURN credentials |
| `iceTransportPolicy` | `'all'` (host + reflexive + relay)| `'relay'` to mandate TURN-only routing (e.g. inside a corporate network where direct connectivity must never leak) |
| `bundlePolicy` | spec default (`'balanced'`) | rarely |
Public STUN works for ~80% of consumer NATs. The remaining 20% (symmetric
NAT, paranoid corporate proxies, mobile carrier-grade NAT) need TURN.
Run your own [coturn](https://github.com/coturn/coturn) or use a managed
provider — but **TURN traffic is real bandwidth through your server**, so
budget accordingly. Shade's wire format is at least as efficient over
TURN as over HTTPS (no per-request HTTP framing overhead).
## NAT-traversal: hopes and realities
What works without TURN, in our testing:
- Same NAT (LAN): always
- Two clients behind cone NATs: usually
- One client behind symmetric NAT, the other behind any cone NAT: usually
- Two clients behind symmetric NATs: rarely — falls back to TURN
What doesn't work:
- Two clients behind strict carrier-grade NAT (CGNAT): TURN required
- Clients on networks that block UDP entirely: TURN over TCP/443 required
When in doubt, configure TURN over TCP/443 — it impersonates HTTPS and
gets through nearly every middlebox.
## Diagnostics
The SDK exposes the live runtime via `shade.getWebRtcRuntime()`:
```ts
const runtime = shade.getWebRtcRuntime();
if (runtime !== null) {
console.log('active transport:', runtime.fallback.activeName);
console.log('peers:', [...runtime.manager.byPeer ?? []]);
runtime.fallback.onSwitch((from, to) => {
console.warn(`shade transport demoted ${from} → ${to}`);
});
}
```
The `failures` array on `MultiTransportFallback` records every
demotion's reason — wire it to your observability backend to track
NAT/TURN problems in production.
## Sample code
End-to-end test using `MemoryRtcFactory` (no real network):
```ts
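import { MemoryRtcFactory } from '@shade/transport-webrtc';

// `alice` and `bob` are already-initialised Shade instances; `bytes` is the payload.
// Sharing one in-memory factory connects both sides without touching the network.
const factory = new MemoryRtcFactory();
alice.configureWebRTC({ factory });
bob.configureWebRTC({ factory });

await alice.upload({ to: 'bob', input: bytes }); // → P2P loopback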
```
See `packages/shade-sdk/tests/webrtc-integration.test.ts` for the full
loopback test, `webrtc-failover.test.ts` for the auto-fallback test, and
`packages/shade-transport-webrtc/tests/` for the unit tests covering
wire format, signaling, glare, and TURN-only configuration.
## Wire format inside the DataChannel
The DataChannel is a single bidirectional pipe shared by every in-flight
stream between two peers. Each frame is a self-describing binary blob:
```
client → server                                                    server → client
───────────────                                                    ───────────────
0x01 chunk         reqId(16) sid(16) lane(u32) seq(u64) env(...)   0x81 chunk-ack     reqId(16) lastSeq(u32) bytesRecv(u32)
0x02 resume-query  reqId(16) sid(16)                               0x82 resume-state  reqId(16) jsonBody(utf-8)
0x03 ping          reqId(16) nonce(u64)                            0x83 pong          reqId(16) nonce(u64)
                                                                   0xFE error         reqId(16) jsonBody(utf-8)
```
`reqId` is a 16-byte random correlation token; the responder echoes it
verbatim so multiple in-flight requests can be matched without a stream
multiplexer on top of SCTP.
The wire format matches `ShadeTransferWsTransport` exactly, so adapters
for either transport can interoperate by translating between SCTP message
framing and WS binary frames at the byte level.
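To make the framing concrete, here is a minimal sketch of the smallest frame pair, `0x03 ping` and `0x83 pong`. It treats the 8-byte nonce as opaque bytes, which sidesteps byte order (the table does not state one), and it assumes the responder echoes the nonce along with `reqId`. The helper names are illustrative and are not taken from `@shade/transport-webrtc`.

```typescript
const PING = 0x03;
const PONG = 0x83;
const REQ_ID_LEN = 16;
const NONCE_LEN = 8; // nonce(u64), handled here as 8 opaque bytes
const FRAME_LEN = 1 + REQ_ID_LEN + NONCE_LEN;

// Lay out: type byte, then reqId(16), then nonce(8).
function encodePingLike(type: number, reqId: Uint8Array, nonce: Uint8Array): Uint8Array {
  if (reqId.length !== REQ_ID_LEN) throw new RangeError('reqId must be 16 bytes');
  if (nonce.length !== NONCE_LEN) throw new RangeError('nonce must be 8 bytes');
  const frame = new Uint8Array(FRAME_LEN);
  frame[0] = type;
  frame.set(reqId, 1);
  frame.set(nonce, 1 + REQ_ID_LEN);
  return frame;
}

function decodePingLike(frame: Uint8Array): { type: number; reqId: Uint8Array; nonce: Uint8Array } {
  if (frame.length !== FRAME_LEN) throw new RangeError(`expected ${FRAME_LEN} bytes`);
  return {
    type: frame[0] as number,
    reqId: frame.slice(1, 1 + REQ_ID_LEN),
    nonce: frame.slice(1 + REQ_ID_LEN),
  };
}

// Responder side: echo reqId (and, by assumption, the nonce) under the pong type.
function pongFor(ping: Uint8Array): Uint8Array {
  const { type, reqId, nonce } = decodePingLike(ping);
  if (type !== PING) throw new Error('not a ping frame');
  return encodePingLike(PONG, reqId, nonce);
}

// Deterministic values for the demo; a real reqId is 16 random bytes.
const demoReqId = new Uint8Array(REQ_ID_LEN).map((_, i) => i);
const demoNonce = new Uint8Array([0, 0, 0, 0, 0, 0, 0, 42]);
const pingFrame = encodePingLike(PING, demoReqId, demoNonce);
const pongFrame = pongFor(pingFrame);
```

Because `reqId` travels in every frame, the sender can keep a map from `reqId` to pending promise and resolve it when the matching response type arrives.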
## Limits
- Max DataChannel message: **256 KiB** (Chrome's safe ceiling). Configure
`chunkSize` ≤ 256 KiB on uploads that prefer WebRTC. The transport
raises a clear error when an envelope exceeds the cap; the engine then
retries via HTTP.
- One DataChannel per peer pair (label `shade-transfer/v1`). Multiple
in-flight transfers from the same peer pair multiplex via `reqId`.
- No SFU/MCU — group transfers fan out at the application layer.
- DTLS-fingerprint binding to Shade's identity-fingerprint is **not** in
V3.11 (deferred as hardening work — DataChannel is already inside a
ratchet-authenticated session, so the practical exposure window is
limited to in-process MITM scenarios that already require malware).
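A caller that prefers WebRTC can guard against the 256 KiB cap before handing data to the engine. The helpers below are a hypothetical sketch, not SDK functions, and they do not account for any per-envelope overhead added by encryption or framing, so leave headroom if your chunks sit close to the limit.

```typescript
// 256 KiB message ceiling from the first limit above.
const MAX_DATACHANNEL_MESSAGE_BYTES = 256 * 1024;

// Clamp a requested chunkSize so it never exceeds the DataChannel cap.
function clampChunkSize(requestedBytes: number): number {
  if (!Number.isInteger(requestedBytes) || requestedBytes <= 0) {
    throw new RangeError('chunk size must be a positive integer');
  }
  return Math.min(requestedBytes, MAX_DATACHANNEL_MESSAGE_BYTES);
}

// Pre-check an already-built envelope; false means it would be retried via HTTP.
function fitsDataChannel(envelope: Uint8Array): boolean {
  return envelope.byteLength <= MAX_DATACHANNEL_MESSAGE_BYTES;
}

const chosenChunkSize = clampChunkSize(1024 * 1024); // 1 MiB requested
```

Clamping up front avoids paying for a failed WebRTC attempt on every oversized chunk.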
## Migration
Opt-in. If you don't call `configureWebRTC`, your existing HTTP/WS
transport stack is unchanged.
When you do opt in, the **engine must not be built yet**. The easiest way
to ensure this is to call `configureWebRTC` before `configureTransfers`
and before the first call to `upload`, `onIncomingTransfer`, or `transferRoute`.
Receiver-side: the WebRTC manager wires receiver-hooks into the engine
during `engine()` construction, so make sure both sides do `configureWebRTC`
+ `configureTransfers` before the first `transferRoute()` call.
## Related modules
- [`@shade/transfer`](../packages/shade-transfer/) — engine, lane queues,
HTTP transport, multi-fallback wrapper.
- [`@shade/streams`](./streams.md) — chunk encryption + lane key
derivation. Indirect dep.
- [`@shade/transport-bridge`](./transport.md) — V3.7 bridge layer (WS /
SSE / long-poll for control envelopes). Orthogonal to V3.11.
- [V5.0 — real-time channels](./V5.0.md) — downstream consumer of the
same DataChannel for voice/video/broadcast.