release(v4.0.0): Shade GA — V3.x consolidation + audit prep

V3.1 → V3.12 consolidated and tagged for the first GA release. Wire
format unchanged from 0.4.x — 4.0 peers interoperate with 0.4.x peers
byte-for-byte. The version bump is semantic: audit-cycle complete,
opt-in surface fully exposed, threat model refreshed for every new
surface.

Highlights:
- All 24 @shade/* packages bumped to 4.0.0 in lockstep.
- CHANGELOG 4.0.0 section is the canonical manifest of what landed.
- THREAT-MODEL extended (§10 fingerprint gates, §11 WebRTC P2P, §12
  Web-Worker boundary) + residual-risks table refreshed.
- OpenAPI now covers all 27 routes: prekey, transfer, KT, inbox,
  bridge, observer, /metrics, /healthz, /ready.
- MIGRATION 0.3.x → 4.0 documented + smoke-tested against
  shade migrate-storage on a real SQLite DB.
- docs/audit/REVIEW-BUNDLE.md + SCOPE.md ready for external reviewer.
- scripts/soak.ts harness for the GA-stable 2-week soak window.
- All V*.md plans archived under docs/archive/ with Status: Done.
- Voice/Video carved out into V5.0; 4.0 audit focuses on the frozen
  non-realtime stack.

Tests: TS 1000/1000 + Kotlin 11/11 cross-platform vectors green.
Docker: gt.zyon.no/stian/shade-prekey:4.0.0 builds and reports
  version 4.0.0 on /health.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:35:35 +02:00
parent 8b055912b7
commit e6fdf31b49
298 changed files with 37909 additions and 256 deletions

docs/webrtc.md Normal file

@@ -0,0 +1,302 @@
# Shade Transport — WebRTC P2P Layer (V3.11)
`@shade/transport-webrtc` adds a direct peer-to-peer chunk transport on
top of the existing `@shade/transfer` engine. When two clients can reach
each other through NAT/firewall, large transfers (`@shade/files`,
`@shade/transfer`) flow over a single bidirectional `RTCDataChannel`
instead of paying the round-trip cost of HTTP-relayed POSTs. When NAT
traversal fails, the multi-transport fallback automatically demotes the
chain back to HTTP — without losing any chunks already in flight.
The wire payload is unchanged: every chunk is still a Shade ratchet /
streams envelope (AES-256-GCM under HKDF-derived per-lane keys). DTLS
(DataChannels run SCTP over DTLS; DTLS-SRTP applies only to media tracks)
protects the WebRTC transport hop and nothing more, so enabling a TURN
relay does not give the relay operator access to plaintext.
```
┌───────────────────────────────────────────────────────────────┐
│                       application code                        │
│                                                               │
│           shade.upload({ to: 'bob', input: file })            │
└────────────────────────────────┬──────────────────────────────┘
                       ┌─────────▼──────────┐
                       │   TransferEngine   │
                       └─────────┬──────────┘
                                 │ ITransferTransport
                       ┌─────────▼──────────┐
                       │   MultiTransport   │
                       │ Fallback (sticky)  │
                       └────┬─────┬─────┬───┘
                            │     │     │
              ┌─────────────▼┐  ┌─▼─┐  ┌▼────────────┐
              │WebRtcTransfer│  │WS │  │ShadeTransfer│
              │  Transport   │  │ … │  │HttpTransport│
              └─────┬────────┘  └───┘  └─────────────┘
                    │ DataChannel binary frames
              ┌─────▼─────────┐
              │  WebRtcConn   │ ←──── SDP/ICE over Shade.send
              │    Manager    │       (ratchet-encrypted)
              └───────────────┘
```
## When to reach for it
| Scenario | Default (HTTP) | + WebRTC |
|---------------------------------------|----------------|----------------|
| Two clients on the same LAN | server-relayed | direct, P2P |
| Only one peer behind enterprise NAT   | works          | TURN-relay     |
| Both peers behind symmetric NAT       | works          | TURN-relay if configured, otherwise falls back to HTTP |
| One peer offline | inbox-buffered | inbox-buffered (HTTP path) |
| Browser extension with strict CSP | works | works (uses RTCPeerConnection) |
Use cases:
- `@shade/transfer` upload of multi-MB / multi-GB files
- `@shade/files` `read`/`write` of large inline blobs
- Future: `@shade/streams` real-time channels (V5.0 reuses this same DataChannel)
## Quick start (browser)
```ts
import { createShade } from '@shade/sdk';
import { nativeRtcFactory } from '@shade/transport-webrtc';

const shade = await createShade({ prekeyServer: 'https://prekey.example.com' });

// IMPORTANT: configureWebRTC MUST be called BEFORE the first upload() /
// onIncomingTransfer() / transferRoute() call, because those build the
// transfer engine, and the engine captures its transport stack at
// construction time.
shade.configureWebRTC({
  factory: nativeRtcFactory(),
  // Optional: defaults to two public Google STUN servers.
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    {
      urls: 'turn:turn.example.com:3478',
      username: 'shade',
      credential: 'YOUR_TURN_SECRET',
    },
  ],
});

shade.configureTransfers({
  resolveBaseUrl: async (peer) => directory.lookup(peer),
});

await shade.upload({ to: 'bob', input: file }); // → P2P when NAT allows
```
## Quick start (Bun / Node)
Neither Bun nor Node exposes `RTCPeerConnection` natively yet. Use one of:
- [`node-datachannel`](https://github.com/murat-dogan/node-datachannel):
  small, stable, libdatachannel under the hood
- [`@roamhq/wrtc`](https://www.npmjs.com/package/@roamhq/wrtc): a fork of
  the `wrtc` (node-webrtc) bindings to Google's libwebrtc
Wrap the chosen library behind an `IRtcFactory` (the package depends only
on a narrow surface: `createPeerConnection`, `createDataChannel`,
`addEventListener`):
```ts
import type { IRtcFactory, IPeerConnection, IDataChannel } from '@shade/transport-webrtc';

// Pseudo-adapter for node-datachannel.
class NodeDataChannelFactory implements IRtcFactory {
  createPeerConnection(config) {
    /* ... return an adapter wrapping node-datachannel's PeerConnection ... */
  }
}

// `iceServers` has the same shape as in the browser quick start.
shade.configureWebRTC({ factory: new NodeDataChannelFactory(), iceServers });
```
## Connection flow
```
Alice initiates                               Bob receives
───────────────                               ────────────
1. createOffer() → SDP                        2. shade.send delivers offer
                                                 → Bob.createAnswer()
3. shade.send delivers answer                 4. setRemoteDescription(answer)
5. trickle ICE candidates (both directions)   6. trickle ICE candidates
7. DataChannel onopen (both sides)            7. DataChannel onopen
```
All four signaling kinds (`shade.webrtc-offer/v1`, `shade.webrtc-answer/v1`,
`shade.webrtc-ice/v1`, `shade.webrtc-bye/v1`) ride the existing Shade
ratchet — the relay sees only ciphertext envelopes.
### Glare resolution
If both peers call `getOrCreate()` simultaneously, the manager applies a
lexicographic tiebreak: the side with the smaller address keeps the
caller role; the side with the larger address closes its outgoing
connection and accepts the inbound offer instead. Both peers ultimately
converge on a single `WebRtcConnection`.
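The tiebreak can be sketched as a pure function. This is an illustration only: `resolveGlare` and `GlareRole` are hypothetical names, and the sketch assumes addresses are compared as plain strings by code unit, which may differ from how the manager actually orders addresses.

```ts
// Hypothetical helper, not part of @shade/transport-webrtc.
type GlareRole = 'caller' | 'callee';

function resolveGlare(localAddress: string, remoteAddress: string): GlareRole {
  if (localAddress === remoteAddress) {
    throw new Error('cannot resolve glare against our own address');
  }
  // ASSUMPTION: plain string comparison. Any total order works as long as
  // both peers evaluate it identically, so they reach opposite roles.
  return localAddress < remoteAddress ? 'caller' : 'callee';
}

// 'alice' sorts before 'bob', so alice keeps her outgoing offer ...
console.log(resolveGlare('alice', 'bob'));
// ... and bob closes his outgoing connection and answers alice's offer.
console.log(resolveGlare('bob', 'alice'));
```

Because each side runs the same comparison with the arguments swapped, exactly one peer ends up as caller without any extra signaling round-trip.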
## Backpressure
The `WebRtcTransferTransport` polls `RTCDataChannel.bufferedAmount` and
suspends new sends once the buffer crosses `backpressureThresholdBytes`
(default 4 MiB). This avoids SCTP queue runaway when the application
pushes faster than the network can drain. Tune lower for memory-
constrained clients (mobile / extension contexts).
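A minimal sketch of the polling approach, assuming nothing about the transport's internals: `sendWithBackpressure`, `BufferedChannel`, and the 10 ms poll interval are illustrative choices, and only `bufferedAmount` and `send()` come from the standard `RTCDataChannel` interface.

```ts
// Structural subset of RTCDataChannel, declared locally so the sketch also
// type-checks outside a browser.
interface BufferedChannel {
  readonly bufferedAmount: number;
  send(data: Uint8Array): void;
}

const DEFAULT_THRESHOLD_BYTES = 4 * 1024 * 1024; // 4 MiB, the documented default

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: suspends the caller while the send buffer is above
// the threshold, then hands the frame to the channel.
async function sendWithBackpressure(
  channel: BufferedChannel,
  frame: Uint8Array,
  thresholdBytes: number = DEFAULT_THRESHOLD_BYTES,
  pollMs: number = 10, // ASSUMPTION: the real poll interval is not documented
): Promise<void> {
  while (channel.bufferedAmount > thresholdBytes) {
    await sleep(pollMs);
  }
  channel.send(frame);
}
```

An event-driven variant could use `bufferedAmountLowThreshold` and the `bufferedamountlow` event instead of polling; the polling form is shown because it matches the behaviour described above.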
## Auto-fallback
Configuring WebRTC wires `MultiTransportFallback([webrtc, http])` as the
engine's transport. The chain is sticky-after-first-failure: when WebRTC
raises a `TransferTransportError` (timeout, ICE failed, data channel
closed, frame too large), the fallback advances to HTTP and stays there
for the lifetime of the engine.
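The sticky-after-first-failure behaviour can be illustrated with a small stand-in class. This is a sketch, not the real `MultiTransportFallback`: `StickyFallback`, `ChunkTransport`, and `sendChunk` are hypothetical names, and the sketch demotes on any thrown error, whereas the real chain reacts to `TransferTransportError`.

```ts
// Hypothetical stand-ins; the real ITransferTransport surface is richer.
interface ChunkTransport {
  sendChunk(chunk: Uint8Array): Promise<void>;
}
interface NamedTransport {
  name: string;
  transport: ChunkTransport;
}

class StickyFallback implements ChunkTransport {
  private readonly chain: NamedTransport[];
  private index = 0;
  readonly failures: { name: string; reason: string }[] = [];

  constructor(chain: NamedTransport[]) {
    if (chain.length === 0) throw new Error('fallback chain must not be empty');
    this.chain = chain;
  }

  get activeName(): string {
    return this.chain[this.index].name;
  }

  async sendChunk(chunk: Uint8Array): Promise<void> {
    for (;;) {
      const current = this.chain[this.index];
      try {
        await current.transport.sendChunk(chunk);
        return;
      } catch (err) {
        this.failures.push({ name: current.name, reason: String(err) });
        // The last transport failed: there is nothing left to demote to.
        if (this.index === this.chain.length - 1) throw err;
        // Demote permanently (the index never moves back) and retry the
        // same chunk on the next transport so it is not lost.
        this.index += 1;
      }
    }
  }
}
```

The key property is that the index only ever advances: once a chunk has been retried over HTTP, later chunks skip WebRTC entirely instead of paying the failure cost again.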
For three-tier composition (e.g. WebRTC → WebSocket → HTTP), build the
fallback yourself and pass a custom transport via the engine deps:
```ts
import { MultiTransportFallback } from '@shade/sdk';

const stack = new MultiTransportFallback([
  { name: 'webrtc', transport: rtcTransport },
  { name: 'ws', transport: wsTransport },
  { name: 'http', transport: httpTransport },
]);
stack.onSwitch((from, to) => metrics.observe('shade.transport.demoted', { from, to }));
```
The `WebRtcConnectionManager`'s connect timeout (default 30 s) is the
upper bound on how long the chain dwells on WebRTC before demoting. The
V3.11 acceptance criterion is "P2P dead → HTTP within 5 s", which the
30 s default does not meet: set `connectTimeoutMs: 4_000` in your
`configureWebRTC()` call to cap the dwell time at 4 seconds and meet the
SLO with margin.
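To illustrate why the connect timeout bounds the dwell time, here is a generic timeout wrapper of the kind a connection manager might use. It is a sketch under assumptions: `withTimeout` is a hypothetical helper, and the manager's actual timeout mechanism is not shown in this document.

```ts
// Hypothetical helper: rejects if `work` has not settled within `ms`.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards so it
  // cannot fire late or keep the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// A connect attempt that never completes (e.g. ICE stuck in "checking")
// is converted into a rejection after 4 s, which a fallback chain can
// then treat as the signal to demote.
const stuckConnect = new Promise<void>(() => {});
withTimeout(stuckConnect, 4_000, 'webrtc connect').catch((err) => {
  console.warn(String(err));
});
```

With a 4 s cap, the remaining second of a 5 s budget is what is left for the first HTTP request to go out.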
## ICE server config
| Setting | Default | When to override |
|------------------------|-----------------------------------|------------------|
| `iceServers` | Google public STUN (×2) | Production — pin your own STUN to avoid Google rate limits, plus your TURN credentials |
| `iceTransportPolicy` | `'all'` (host + reflexive + relay)| `'relay'` to mandate TURN-only routing (e.g. inside a corporate network where direct connectivity must never leak) |
| `bundlePolicy` | spec default (`'balanced'`) | rarely |
Public STUN works for ~80% of consumer NATs. The remaining 20% (symmetric
NAT, paranoid corporate proxies, mobile carrier-grade NAT) need TURN.
Run your own [coturn](https://github.com/coturn/coturn) or use a managed
provider — but **TURN traffic is real bandwidth through your server**, so
budget accordingly. Shade's wire format is at least as efficient over
TURN as over HTTPS (no per-request HTTP framing overhead).
## NAT-traversal: hopes and realities
What works without TURN, in our testing:
- Same NAT (LAN): always
- Two clients behind cone NATs: usually
- One client behind symmetric NAT, the other behind any cone NAT: usually
- Two clients behind symmetric NATs: rarely; expect to need TURN
What doesn't work:
- Two clients behind strict carrier-grade NAT (CGNAT): TURN required
- Clients on networks that block UDP entirely: TURN over TCP/443 required
When in doubt, configure TURN over TCP/443, ideally with TLS (`turns:`
URLs): it resembles HTTPS traffic and gets through nearly every middlebox.
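As a sketch, an `iceServers` list that offers both a UDP and a TCP/443 relay path might look like the following. The host names and credential are placeholders, and the `IceServer` / `IceConfig` types are local stand-ins that mirror the standard `RTCConfiguration` dictionary; whether your TURN server listens with TLS on 443 depends on your own deployment.

```ts
// Local mirrors of the standard RTCIceServer / RTCConfiguration shapes, so
// the snippet type-checks outside a browser.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}
interface IceConfig {
  iceServers: IceServer[];
  iceTransportPolicy?: 'all' | 'relay';
}

const relayFriendlyConfig: IceConfig = {
  iceServers: [
    { urls: 'stun:stun.example.com:3478' }, // placeholder host
    {
      urls: [
        // Preferred: plain TURN over UDP.
        'turn:turn.example.com:3478?transport=udp',
        // Last resort: TURN over TLS on TCP/443 for networks that block UDP.
        'turns:turn.example.com:443?transport=tcp',
      ],
      username: 'shade',
      credential: 'YOUR_TURN_SECRET',
    },
  ],
  // 'all' lets ICE still pick a direct path when one exists.
  iceTransportPolicy: 'all',
};
```

ICE tries every candidate pair, so listing the TCP/443 relay costs nothing on networks where UDP works; it only gets selected when nothing cheaper connects.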
## Diagnostics
The SDK exposes the live runtime via `shade.getWebRtcRuntime()`:
```ts
const runtime = shade.getWebRtcRuntime();
if (runtime !== null) {
  console.log('active transport:', runtime.fallback.activeName);
  console.log('peers:', [...(runtime.manager.byPeer ?? [])]);
  runtime.fallback.onSwitch((from, to) => {
    console.warn(`shade transport demoted ${from} → ${to}`);
  });
}
```
The `failures` array on `MultiTransportFallback` records every
demotion's reason — wire it to your observability backend to track
NAT/TURN problems in production.
## Sample code
End-to-end test using `MemoryRtcFactory` (no real network):
```ts
import { MemoryRtcFactory } from '@shade/transport-webrtc';
const factory = new MemoryRtcFactory();
alice.configureWebRTC({ factory });
bob.configureWebRTC({ factory });
await alice.upload({ to: 'bob', input: bytes }); // → P2P loopback
```
See `packages/shade-sdk/tests/webrtc-integration.test.ts` for the full
loopback test, `webrtc-failover.test.ts` for the auto-fallback test, and
`packages/shade-transport-webrtc/tests/` for the unit tests covering
wire format, signaling, glare, and TURN-only configuration.
## Wire format inside the DataChannel
The DataChannel is a single bidirectional pipe shared by every in-flight
stream between two peers. Each frame is a self-describing binary blob:
```
client → server                                                   server → client
───────────────                                                   ───────────────
0x01 chunk        reqId(16) sid(16) lane(u32) seq(u64) env(...)   0x81 chunk-ack    reqId(16) lastSeq(u32) bytesRecv(u32)
0x02 resume-query reqId(16) sid(16)                               0x82 resume-state reqId(16) jsonBody(utf-8)
0x03 ping         reqId(16) nonce(u64)                            0x83 pong         reqId(16) nonce(u64)
                                                                  0xFE error        reqId(16) jsonBody(utf-8)
```
`reqId` is a 16-byte random correlation token; the responder echoes it
verbatim so multiple in-flight requests can be matched without a stream
multiplexer on top of SCTP.
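The ping/pong pair is the simplest frame to sketch end to end. The following is illustrative only: the function names are hypothetical, and the integer byte order is an assumption (big-endian), since the layout above does not state it.

```ts
const FRAME_PING = 0x03;
const FRAME_PONG = 0x83;
const REQ_ID_BYTES = 16;
const PING_FRAME_BYTES = 1 + REQ_ID_BYTES + 8; // type + reqId + nonce(u64)

function encodePing(reqId: Uint8Array, nonce: bigint): Uint8Array {
  if (reqId.length !== REQ_ID_BYTES) throw new Error('reqId must be 16 bytes');
  const frame = new Uint8Array(PING_FRAME_BYTES);
  frame[0] = FRAME_PING;
  frame.set(reqId, 1);
  // ASSUMPTION: big-endian; the byte order is not specified in this document.
  new DataView(frame.buffer).setBigUint64(1 + REQ_ID_BYTES, nonce, false);
  return frame;
}

function pongFor(ping: Uint8Array): Uint8Array {
  if (ping.length !== PING_FRAME_BYTES || ping[0] !== FRAME_PING) {
    throw new Error('not a ping frame');
  }
  // The responder echoes reqId and nonce verbatim; only the type changes.
  const pong = ping.slice();
  pong[0] = FRAME_PONG;
  return pong;
}

// Requester side: match each response to its pending request by reqId.
const hex = (bytes: Uint8Array): string =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
const pending = new Map<string, (frame: Uint8Array) => void>();

function dispatch(frame: Uint8Array): boolean {
  const key = hex(frame.subarray(1, 1 + REQ_ID_BYTES));
  const resolve = pending.get(key);
  if (resolve === undefined) return false; // unknown or already answered
  pending.delete(key);
  resolve(frame);
  return true;
}
```

A requester would register a callback in `pending` under the hex-encoded `reqId` before sending, then feed every inbound frame to `dispatch`. Keying on `reqId` alone is what lets several transfers share one DataChannel without a separate stream multiplexer.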
The wire matches `ShadeTransferWsTransport` exactly — adapters for
either transport can interoperate by translating between SCTP message-
framing and WS binary frames at the byte level.
## Limits
- Max DataChannel message: **256 KiB** (Chrome's safe ceiling). Configure
`chunkSize` ≤ 256 KiB on uploads that prefer WebRTC. The transport
raises a clear error when an envelope exceeds the cap; the engine then
retries via HTTP.
- One DataChannel per peer pair (label `shade-transfer/v1`). Multiple
in-flight transfers from the same peer pair multiplex via `reqId`.
- No SFU/MCU — group transfers fan out at the application layer.
- DTLS-fingerprint binding to Shade's identity-fingerprint is **not** in
V3.11 (deferred as hardening work — DataChannel is already inside a
ratchet-authenticated session, so the practical exposure window is
limited to in-process MITM scenarios that already require malware).
## Migration
Opt-in. If you don't call `configureWebRTC`, your existing HTTP/WS
transport stack is unchanged.
When you do opt in, the **engine must not be built yet** — the easy way
to ensure this is to call `configureWebRTC` before `configureTransfers`
or before any of `upload` / `onIncomingTransfer` / `transferRoute`.
Receiver-side: the WebRTC manager wires receiver-hooks into the engine
during `engine()` construction, so make sure both sides call
`configureWebRTC` and `configureTransfers` before the first
`transferRoute()` call.
## Related modules
- [`@shade/transfer`](../packages/shade-transfer/) — engine, lane queues,
HTTP transport, multi-fallback wrapper.
- [`@shade/streams`](./streams.md) — chunk encryption + lane key
derivation. Indirect dep.
- [`@shade/transport-bridge`](./transport.md) — V3.7 bridge layer (WS /
SSE / long-poll for control envelopes). Orthogonal to V3.11.
- [V5.0 — real-time channels](./V5.0.md) — downstream consumer of the
same DataChannel for voice/video/broadcast.