Sterister e6fdf31b49
release(v4.0.0): Shade GA — V3.x consolidation + audit prep
V3.1 → V3.12 consolidated and tagged for the first GA release. Wire
format unchanged from 0.4.x — 4.0 peers interoperate with 0.4.x peers
byte-for-byte. The version bump is semantic: audit-cycle complete,
opt-in surface fully exposed, threat model refreshed for every new
surface.

Highlights:
- All 24 @shade/* packages bumped to 4.0.0 in lockstep.
- CHANGELOG 4.0.0 section is the canonical manifest of what landed.
- THREAT-MODEL extended (§10 fingerprint gates, §11 WebRTC P2P, §12
  Web-Worker boundary) + residual-risks table refreshed.
- OpenAPI now covers all 27 routes: prekey, transfer, KT, inbox,
  bridge, observer, /metrics, /healthz, /ready.
- MIGRATION 0.3.x → 4.0 documented + smoke-tested against
  shade migrate-storage on a real SQLite DB.
- docs/audit/REVIEW-BUNDLE.md + SCOPE.md ready for external reviewer.
- scripts/soak.ts harness for the GA-stable 2-week soak window.
- All V*.md plans archived under docs/archive/ with Status: Done.
- Voice/Video carved out into V5.0; 4.0 audit focuses on the frozen
  non-realtime stack.

Tests: TS 1000/1000 + Kotlin 11/11 cross-platform vectors green.
Docker: gt.zyon.no/stian/shade-prekey:4.0.0 builds and reports
  version 4.0.0 on /health.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:35:35 +02:00

303 lines
13 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Shade Transport — WebRTC P2P Layer (V3.11)
`@shade/transport-webrtc` adds a direct peer-to-peer chunk transport on
top of the existing `@shade/transfer` engine. When two clients can reach
each other through NAT/firewall, large transfers (`@shade/files`,
`@shade/transfer`) flow over a single bidirectional `RTCDataChannel`
instead of paying the round-trip cost of HTTP-relayed POSTs. When NAT
traversal fails, the multi-transport fallback automatically demotes the
chain back to HTTP — without losing any chunks already in flight.

The wire payload is unchanged: every chunk is still a Shade ratchet /
streams envelope (AES-256-GCM under HKDF-derived per-lane keys). WebRTC's
own DTLS encryption is only an outer transport layer, so enabling a TURN
relay does not give the relay operator access to plaintext.
```
┌───────────────────────────────────────────────────────────────┐
│ application code │
│ │
│ shade.upload({ to: 'bob', input: file }) │
└────────────────────────────────┬──────────────────────────────┘
┌─────────▼──────────┐
│ TransferEngine │
└─────────┬──────────┘
│ ITransferTransport
┌─────────▼──────────┐
│ MultiTransport │
│ Fallback (sticky) │
└────┬─────┬─────┬───┘
│ │ │
┌─────────────▼┐ ┌─▼─┐ ┌▼────────────┐
│ WebRtcTransfer│ │WS │ │ ShadeTransfer│
│ Transport │ │… │ │ HttpTransport│
└─────┬─────────┘ └───┘ └──────────────┘
│ DataChannel binary frames
┌─────▼─────────┐
│ WebRtcConn │ ←──── SDP/ICE over Shade.send
│ Manager │ (ratchet-encrypted)
└───────────────┘
```
## When to reach for it
| Scenario | Default (HTTP) | + WebRTC |
|---------------------------------------|----------------|----------------|
| Two clients on the same LAN | server-relayed | direct, P2P |
| One peer behind enterprise NAT only | works | TURN-relay |
| Both peers behind symmetric NAT | works | TURN-relay if configured, else falls back to HTTP |
| One peer offline | inbox-buffered | inbox-buffered (HTTP path) |
| Browser extension with strict CSP | works | works (uses RTCPeerConnection) |

Use cases:
- `@shade/transfer` upload of multi-MB / multi-GB files
- `@shade/files` `read`/`write` of large inline blobs
- Future: `@shade/streams` real-time channels (V5.0 reuses this same DataChannel)
## Quick start (browser)
```ts
import { createShade } from '@shade/sdk';
import { nativeRtcFactory } from '@shade/transport-webrtc';

// Provided by your application, not by Shade: a peer → base-URL lookup
// and the File to send (e.g. from an <input type="file">).
declare const directory: { lookup(peer: string): Promise<string> };
declare const file: File;

const shade = await createShade({ prekeyServer: 'https://prekey.example.com' });

// IMPORTANT: configureWebRTC MUST be called BEFORE the first upload() /
// onIncomingTransfer() / transferRoute() call, because those build the
// transfer engine, and the engine captures its transport stack at
// construction time.
shade.configureWebRTC({
  factory: nativeRtcFactory(),
  // Optional: defaults to two public Google STUN servers.
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    {
      urls: 'turn:turn.example.com:3478',
      username: 'shade',
      credential: 'YOUR_TURN_SECRET',
    },
  ],
});

shade.configureTransfers({
  resolveBaseUrl: async (peer) => directory.lookup(peer),
});

await shade.upload({ to: 'bob', input: file }); // → P2P when NAT allows
```
## Quick start (Bun / Node)
Bun does not yet expose `RTCPeerConnection` natively. Use one of:
- [`node-datachannel`](https://github.com/murat-dogan/node-datachannel)
— small, stable, libdatachannel under the hood
- [`@roamhq/wrtc`](https://www.npmjs.com/package/@roamhq/wrtc) — fork of
the Google `wrtc` bindings
Wrap the chosen library behind an `IRtcFactory` (the package only depends
on a narrow surface — `createPeerConnection`, `createDataChannel`,
`addEventListener`):
```ts
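import type { IRtcFactory, IPeerConnection, IDataChannel } from '@shade/transport-webrtc';
// Pseudo-adapter for node-datachannel. Parameter and return types are read off
// IRtcFactory itself so the sketch doesn't guess at their exact shape.
class NodeDataChannelFactory implements IRtcFactory {
  createPeerConnection(
    config: Parameters<IRtcFactory['createPeerConnection']>[0],
  ): ReturnType<IRtcFactory['createPeerConnection']> {
    // TODO: build a node-datachannel PeerConnection from `config` and wrap it
    // in an IPeerConnection whose data channels implement IDataChannel.
    throw new Error('NodeDataChannelFactory: adapter not implemented');
  }
}
// `shade` is the instance from the browser quick start; use your own STUN/TURN list.
const iceServers = [{ urls: 'stun:stun.l.google.com:19302' }];
shade.configureWebRTC({ factory: new NodeDataChannelFactory(), iceServers });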
```
## Connection flow
```
Alice (caller)                                Bob (callee)
──────────────                                ────────────
1. createOffer() → SDP offer
   sent via shade.send               ──────►  2. setRemoteDescription(offer)
                                                 createAnswer() → SDP answer
4. setRemoteDescription(answer)      ◄──────  3. answer sent via shade.send
5. trickle ICE candidates            ◄─────►  5. trickle ICE candidates
6. DataChannel onopen                         6. DataChannel onopen
```
All four signaling kinds (`shade.webrtc-offer/v1`, `shade.webrtc-answer/v1`,
`shade.webrtc-ice/v1`, `shade.webrtc-bye/v1`) ride the existing Shade
ratchet — the relay sees only ciphertext envelopes.
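
On the receiving side, the four kinds can be modelled as a discriminated union. This is a minimal sketch: only the `kind` strings come from this document; the payload fields (`sdp`, `candidate`, `reason`) and the helper names `isSignaling` / `nextStep` are hypothetical, and the message is assumed to have already been decrypted by the ratchet.

```typescript
// Hypothetical payload shapes; only the four `kind` strings are from the Shade docs.
type SignalingMessage =
  | { kind: 'shade.webrtc-offer/v1'; sdp: string }
  | { kind: 'shade.webrtc-answer/v1'; sdp: string }
  | { kind: 'shade.webrtc-ice/v1'; candidate: string }
  | { kind: 'shade.webrtc-bye/v1'; reason?: string };

const SIGNALING_KINDS = new Set<string>([
  'shade.webrtc-offer/v1',
  'shade.webrtc-answer/v1',
  'shade.webrtc-ice/v1',
  'shade.webrtc-bye/v1',
]);

// Narrow an already-decrypted application message to a signaling message.
function isSignaling(msg: { kind: string }): msg is SignalingMessage {
  return SIGNALING_KINDS.has(msg.kind);
}

// Map each signaling kind to the WebRTC step it triggers on the receiver.
function nextStep(msg: SignalingMessage): string {
  switch (msg.kind) {
    case 'shade.webrtc-offer/v1':
      return 'apply remote offer, create answer';
    case 'shade.webrtc-answer/v1':
      return 'apply remote answer';
    case 'shade.webrtc-ice/v1':
      return 'add remote ICE candidate';
    case 'shade.webrtc-bye/v1':
      return 'close peer connection';
  }
}
```

Keeping the kinds in one closed union lets the compiler flag any unhandled kind if a fifth message type is ever added.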
### Glare resolution
If both peers call `getOrCreate()` simultaneously, the manager uses
lexicographic tiebreak: the side with the smaller address wins
caller-role; the side with the larger address closes its outgoing
connection and accepts the inbound offer instead. Both peers ultimately
converge on a single `WebRtcConnection`.
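
The tiebreak can be sketched as a pure function (hypothetical name `resolveGlare`). It assumes addresses are compared as plain JavaScript strings, i.e. by UTF-16 code units; whether Shade normalises addresses before comparing is not stated here.

```typescript
type GlareOutcome = 'keep-outgoing' | 'accept-inbound';

// Lexicographic glare tiebreak: the smaller address keeps the caller role,
// the larger address drops its own offer and answers the inbound one.
function resolveGlare(localAddress: string, remoteAddress: string): GlareOutcome {
  if (localAddress === remoteAddress) {
    throw new Error('glare between identical addresses');
  }
  return localAddress < remoteAddress ? 'keep-outgoing' : 'accept-inbound';
}
```

Because each peer evaluates the same comparison with the arguments swapped, exactly one side keeps its outgoing offer, so both converge on one connection without an extra coordination round.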
## Backpressure
The `WebRtcTransferTransport` polls `RTCDataChannel.bufferedAmount` and
suspends new sends once the buffer crosses `backpressureThresholdBytes`
(default 4 MiB). This avoids SCTP queue runaway when the application
pushes faster than the network can drain. Tune lower for memory-constrained
clients (mobile / extension contexts).
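
A minimal sketch of the polling approach, assuming a channel that exposes a live `bufferedAmount` counter (as `RTCDataChannel` does). The helper name `sendWithBackpressure` and the 10 ms poll interval are hypothetical; only the 4 MiB default comes from this document.

```typescript
// Structural subset of RTCDataChannel needed for this sketch.
interface BufferedChannel {
  readonly bufferedAmount: number;
  send(data: Uint8Array): void;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Wait until the outbound buffer drops below the threshold, then send.
async function sendWithBackpressure(
  channel: BufferedChannel,
  frame: Uint8Array,
  thresholdBytes = 4 * 1024 * 1024, // 4 MiB, the documented default
  pollMs = 10, // hypothetical poll interval
): Promise<void> {
  while (channel.bufferedAmount >= thresholdBytes) {
    await sleep(pollMs);
  }
  channel.send(frame);
}
```

Browsers also offer an event-driven alternative to polling: set `bufferedAmountLowThreshold` on the channel and wait for its `bufferedamountlow` event.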
## Auto-fallback
Configuring WebRTC wires `MultiTransportFallback([webrtc, http])` as the
engine's transport. The chain is sticky-after-first-failure: when WebRTC
raises a `TransferTransportError` (timeout, ICE failed, data channel
closed, frame too large), the fallback advances to HTTP and stays there
for the lifetime of the engine.
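
The sticky behaviour can be sketched in a few lines. This is not the SDK's `MultiTransportFallback`: the `Transport` interface and `StickyFallback` class are hypothetical, and the sketch demotes on any thrown error rather than only on `TransferTransportError`. The frame that triggered the failure is retried on the next tier, which is how in-flight chunks avoid being lost.

```typescript
// Hypothetical transport shape for this sketch (not Shade's ITransferTransport).
interface Transport {
  send(frame: Uint8Array): Promise<void>;
}

interface Tier {
  name: string;
  transport: Transport;
}

// Sticky-after-first-failure: once a tier fails, later sends never return to it.
class StickyFallback {
  private readonly chain: Tier[];
  private index = 0;
  readonly failures: { name: string; reason: string }[] = [];

  constructor(chain: Tier[]) {
    if (chain.length === 0) throw new Error('empty transport chain');
    this.chain = chain;
  }

  get activeName(): string {
    return this.chain[this.index].name;
  }

  async send(frame: Uint8Array): Promise<void> {
    for (;;) {
      const at = this.index;
      const { name, transport } = this.chain[at];
      try {
        await transport.send(frame);
        return;
      } catch (err) {
        this.failures.push({ name, reason: String(err) });
        // Last tier: nothing left to demote to, so surface the error.
        if (at === this.chain.length - 1) throw err;
        // Demote permanently (unless a concurrent send already did), then
        // retry this same frame on the next tier.
        if (this.index === at) this.index = at + 1;
      }
    }
  }
}
```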
For three-tier composition (e.g. WebRTC → WebSocket → HTTP), build the
fallback yourself and pass a custom transport via the engine deps:
```ts
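import { MultiTransportFallback } from '@shade/sdk';

// rtcTransport, wsTransport and httpTransport are transport instances you have
// already built; `metrics` stands for your own observability client.
const stack = new MultiTransportFallback([
  { name: 'webrtc', transport: rtcTransport },
  { name: 'ws', transport: wsTransport },
  { name: 'http', transport: httpTransport },
]);

// Record every demotion (e.g. webrtc → ws) as a metric.
stack.onSwitch((from, to) => metrics.observe('shade.transport.demoted', { from, to }));

// Then hand `stack` to the engine as its transport via the engine deps.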
```
The `WebRtcConnectionManager`'s connect timeout (default 30 s) is the
upper bound on how long the chain dwells on WebRTC before demoting. The
V3.11 acceptance criterion is "P2P dead → HTTP within 5 s": set
`connectTimeoutMs: 4_000` in your `configureWebRTC()` call to cap that
dwell time at 4 seconds and meet the SLO with margin.
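
How a connect timeout bounds the dwell time can be illustrated with a generic promise race. `withTimeout` is a hypothetical helper, not part of `@shade/transport-webrtc`; with `ms = 4_000`, the chain gives up on WebRTC at most 4 s after the attempt starts, leaving roughly 1 s of the 5 s budget for the HTTP retry.

```typescript
// Reject if `promise` has not settled within `ms`; always clears the timer.
function withTimeout<T>(promise: Promise<T>, ms: number, label = 'operation'): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

The race only stops waiting; a real implementation should also close the abandoned `RTCPeerConnection` so it does not keep gathering candidates in the background.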
## ICE server config
| Setting | Default | When to override |
|------------------------|-----------------------------------|------------------|
| `iceServers` | Google public STUN (×2) | Production — pin your own STUN to avoid Google rate limits, plus your TURN credentials |
| `iceTransportPolicy` | `'all'` (host + reflexive + relay)| `'relay'` to mandate TURN-only routing (e.g. inside a corporate network where direct connectivity must never leak) |
| `bundlePolicy`         | spec default (`'balanced'`)       | rarely |

Public STUN works for ~80% of consumer NATs. The remaining 20% (symmetric
NAT, paranoid corporate proxies, mobile carrier-grade NAT) need TURN.
Run your own [coturn](https://github.com/coturn/coturn) or use a managed
provider — but **TURN traffic is real bandwidth through your server**, so
budget accordingly. Shade's wire format is at least as efficient over
TURN as over HTTPS (no per-request HTTP framing overhead).
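
A sketch of a production-style ICE configuration built from your own STUN/TURN hosts. The hostnames and credentials are placeholders, `buildIceConfig` is a hypothetical helper, and the `IceConfig` type is declared locally to mirror the relevant fields of the standard `RTCConfiguration` dictionary; it assumes `configureWebRTC()` accepts the settings listed in the table above. The second TURN URL uses the `turns:` scheme on port 443 with `?transport=tcp`, for networks that block UDP.

```typescript
// Local mirror of the RTCConfiguration fields used here (no DOM lib needed).
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}
interface IceConfig {
  iceServers: IceServer[];
  iceTransportPolicy: 'all' | 'relay';
}

function buildIceConfig(opts: {
  stunHost: string;
  turnHost: string;
  username: string;
  credential: string;
  relayOnly?: boolean; // true → mandate TURN-only routing
}): IceConfig {
  return {
    iceServers: [
      // Your own STUN instead of Google's public servers.
      { urls: `stun:${opts.stunHost}:3478` },
      {
        // Plain TURN over UDP, plus TURN over TLS/TCP on 443.
        urls: [`turn:${opts.turnHost}:3478`, `turns:${opts.turnHost}:443?transport=tcp`],
        username: opts.username,
        credential: opts.credential,
      },
    ],
    iceTransportPolicy: opts.relayOnly ? 'relay' : 'all',
  };
}
```

With `iceTransportPolicy: 'relay'` the STUN entry is simply unused: every candidate the browser offers is a TURN relay candidate.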
## NAT-traversal: hopes and realities
What works without TURN, in our testing:
- Same NAT (LAN): always
- Two clients behind cone NATs: usually
- One client behind symmetric NAT, the other behind any cone NAT: usually
- Two clients behind symmetric NATs: rarely — falls back to TURN
What doesn't work:
- Two clients behind strict carrier-grade NAT (CGNAT): TURN required
- Clients on networks that block UDP entirely: TURN over TCP/443 required
When in doubt, configure TURN over TCP/443 — it impersonates HTTPS and
gets through nearly every middlebox.
## Diagnostics
The SDK exposes the live runtime via `shade.getWebRtcRuntime()`:
```ts
const runtime = shade.getWebRtcRuntime();
if (runtime !== null) {
  console.log('active transport:', runtime.fallback.activeName);
  console.log('peers:', [...(runtime.manager.byPeer ?? [])]);
  runtime.fallback.onSwitch((from, to) => {
    console.warn(`shade transport demoted ${from} → ${to}`);
  });
}
```
The `failures` array on `MultiTransportFallback` records every
demotion's reason — wire it to your observability backend to track
NAT/TURN problems in production.
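
One way to turn demotions into a metric, assuming only the `onSwitch((from, to) => ...)` callback shape used in the diagnostics snippet. `SwitchSource` and `trackDemotions` are hypothetical names; export the returned map to your observability backend on whatever schedule suits you.

```typescript
// The only SDK surface assumed: an onSwitch(listener) registration method.
interface SwitchSource {
  onSwitch(listener: (from: string, to: string) => void): void;
}

// Count demotions per edge, keyed like "webrtc->http".
function trackDemotions(source: SwitchSource): Map<string, number> {
  const counts = new Map<string, number>();
  source.onSwitch((from, to) => {
    const edge = `${from}->${to}`;
    counts.set(edge, (counts.get(edge) ?? 0) + 1);
  });
  return counts;
}
```

Counting per edge rather than per transport makes it easy to tell "WebRTC never connects" (`webrtc->http`) apart from "the WebSocket tier is also failing" (`ws->http`).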
## Sample code
End-to-end test using `MemoryRtcFactory` (no real network):
```ts
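import { MemoryRtcFactory } from '@shade/transport-webrtc';

// `alice` and `bob` are two Shade instances (e.g. from createShade) and
// `bytes` is the payload; both peers share one in-memory factory.
const factory = new MemoryRtcFactory();
alice.configureWebRTC({ factory });
bob.configureWebRTC({ factory });
await alice.upload({ to: 'bob', input: bytes }); // → P2P loopback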
```
See `packages/shade-sdk/tests/webrtc-integration.test.ts` for the full
loopback test, `webrtc-failover.test.ts` for the auto-fallback test, and
`packages/shade-transport-webrtc/tests/` for the unit tests covering
wire format, signaling, glare, and TURN-only configuration.
## Wire format inside the DataChannel
The DataChannel is a single bidirectional pipe shared by every in-flight
stream between two peers. Each frame is a self-describing binary blob:
```
client → server server → client
─────────────── ───────────────
0x01 chunk reqId(16) sid(16) lane(u32) seq(u64) env(...) 0x81 chunk-ack reqId(16) lastSeq(u32) bytesRecv(u32)
0x02 resume-query reqId(16) sid(16) 0x82 resume-state reqId(16) jsonBody(utf-8)
0x03 ping reqId(16) nonce(u64) 0x83 pong reqId(16) nonce(u64)
0xFE error reqId(16) jsonBody(utf-8)
```
`reqId` is a 16-byte random correlation token; the responder echoes it
verbatim so multiple in-flight requests can be matched without a stream
multiplexer on top of SCTP.
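
A sketch of encoding and decoding the `0x01 chunk` frame from the table above. Field order and widths come from the table; the byte order is **not** specified in this document, so big-endian (network order, which is `DataView`'s default) is an assumption, as are the names `encodeChunk`, `decodeChunk` and `newReqId`. The 16-byte `reqId` is drawn from `crypto.getRandomValues`.

```typescript
const FRAME_CHUNK = 0x01;
// type(1) + reqId(16) + sid(16) + lane(u32) + seq(u64) = 45 bytes
const CHUNK_HEADER_BYTES = 1 + 16 + 16 + 4 + 8;

interface ChunkFrame {
  reqId: Uint8Array; // 16-byte random correlation token
  sid: Uint8Array; // 16-byte stream id
  lane: number; // u32
  seq: bigint; // u64
  env: Uint8Array; // opaque, already-encrypted Shade envelope
}

function newReqId(): Uint8Array {
  return crypto.getRandomValues(new Uint8Array(16));
}

function encodeChunk(f: ChunkFrame): Uint8Array {
  if (f.reqId.length !== 16 || f.sid.length !== 16) throw new Error('reqId/sid must be 16 bytes');
  const out = new Uint8Array(CHUNK_HEADER_BYTES + f.env.length);
  const view = new DataView(out.buffer);
  out[0] = FRAME_CHUNK;
  out.set(f.reqId, 1); // bytes 1..16
  out.set(f.sid, 17); // bytes 17..32
  view.setUint32(33, f.lane); // big-endian (assumed byte order)
  view.setBigUint64(37, f.seq); // big-endian (assumed byte order)
  out.set(f.env, CHUNK_HEADER_BYTES);
  return out;
}

function decodeChunk(frame: Uint8Array): ChunkFrame {
  if (frame.length < CHUNK_HEADER_BYTES || frame[0] !== FRAME_CHUNK) throw new Error('not a chunk frame');
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  return {
    reqId: frame.slice(1, 17),
    sid: frame.slice(17, 33),
    lane: view.getUint32(33),
    seq: view.getBigUint64(37),
    env: frame.slice(CHUNK_HEADER_BYTES),
  };
}
```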
The wire matches `ShadeTransferWsTransport` exactly, so adapters for
either transport can interoperate by translating between SCTP message
framing and WS binary frames at the byte level.
## Limits
- Max DataChannel message: **256 KiB** (Chrome's safe ceiling). The cap
applies to the whole frame, so on uploads that prefer WebRTC configure
`chunkSize` somewhat below 256 KiB to leave room for the frame header and
envelope overhead. The transport raises a clear error when an envelope
exceeds the cap; the engine then retries via HTTP.
- One DataChannel per peer pair (label `shade-transfer/v1`). Multiple
in-flight transfers from the same peer pair multiplex via `reqId`.
- No SFU/MCU — group transfers fan out at the application layer.
- DTLS-fingerprint binding to Shade's identity-fingerprint is **not** in
V3.11 (deferred as hardening work — DataChannel is already inside a
ratchet-authenticated session, so the practical exposure window is
limited to in-process MITM scenarios that already require malware).
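
A worked sizing example under stated assumptions. The 256 KiB cap and the 45-byte `0x01` frame header (1 + 16 + 16 + 4 + 8 bytes, per the wire-format table) follow from this document; `envelopeOverhead` is a hypothetical parameter you would measure for your Shade version, since the exact envelope overhead is not given here. `maxChunkSize` and `chunkCount` are illustrative helpers.

```typescript
const MAX_DC_MESSAGE = 256 * 1024; // 262144 bytes, the documented cap
const CHUNK_HEADER_BYTES = 45; // 0x01 frame header: 1 + 16 + 16 + 4 + 8

// Largest plaintext chunk whose frame still fits under the cap.
// `envelopeOverhead` (IV, auth tag, envelope header, ...) is an input you measure.
function maxChunkSize(envelopeOverhead: number): number {
  const room = MAX_DC_MESSAGE - CHUNK_HEADER_BYTES - envelopeOverhead;
  if (room <= 0) throw new Error('overhead exceeds DataChannel cap');
  return room;
}

// Number of chunks needed for a payload of `totalBytes`.
function chunkCount(totalBytes: number, chunkSize: number): number {
  return Math.ceil(totalBytes / chunkSize);
}
```

For scale: a 1 GiB file at a 256 KiB chunk size is 2^30 / 2^18 = 4096 chunks, so trimming the chunk size by a few hundred bytes to fit the cap costs only a handful of extra frames.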
## Migration
Opt-in. If you don't call `configureWebRTC`, your existing HTTP/WS
transport stack is unchanged.
When you do opt in, the **engine must not be built yet**. The easy way
to ensure this is to call `configureWebRTC` first: before `configureTransfers`
and before any of `upload` / `onIncomingTransfer` / `transferRoute`.
Receiver-side: the WebRTC manager wires receiver-hooks into the engine
during `engine()` construction, so make sure both sides do `configureWebRTC`
+ `configureTransfers` before the first `transferRoute()` call.
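
Why the call order matters can be modelled with a lazily built engine that snapshots its transport list on first use. `LazyEngineHost` is a hypothetical stand-in, not the `@shade/sdk` API; in particular, this document doesn't say whether the real SDK throws when `configureWebRTC` comes too late, so the error below is illustrative only.

```typescript
// Hypothetical model: the engine is built on first use and captures the
// transport stack configured at that moment; later changes are not seen.
class LazyEngineHost {
  private transports: string[] = ['http'];
  private engine: { transports: readonly string[] } | null = null;

  configureWebRTC(): void {
    if (this.engine !== null) {
      throw new Error('configureWebRTC() called after the transfer engine was built');
    }
    this.transports = ['webrtc', 'http'];
  }

  // Stands in for upload() / onIncomingTransfer() / transferRoute().
  getEngine(): { transports: readonly string[] } {
    if (this.engine === null) {
      this.engine = { transports: [...this.transports] };
    }
    return this.engine;
  }
}
```

Without the guard, a late `configureWebRTC` would silently leave the engine on HTTP only, which is the failure mode the ordering rule is meant to prevent.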
## Related modules
- [`@shade/transfer`](../packages/shade-transfer/) — engine, lane queues,
HTTP transport, multi-fallback wrapper.
- [`@shade/streams`](./streams.md) — chunk encryption + lane key
derivation. Indirect dep.
- [`@shade/transport-bridge`](./transport.md) — V3.7 bridge layer (WS /
SSE / long-poll for control envelopes). Orthogonal to V3.11.
- [V5.0 — real-time channels](./V5.0.md) — downstream consumer of the
same DataChannel for voice/video/broadcast.