# Shade Transport: WebRTC P2P Layer (V3.11)

`@shade/transport-webrtc` adds a direct peer-to-peer chunk transport on
top of the existing `@shade/transfer` engine. When two clients can reach
each other through NAT/firewall, large transfers (`@shade/files`,
`@shade/transfer`) flow over a single bidirectional `RTCDataChannel`
instead of paying the round-trip cost of HTTP-relayed POSTs. When NAT
traversal fails, the multi-transport fallback automatically demotes the
chain back to HTTP without losing any chunks already in flight.

The wire payload is unchanged: every chunk is still a Shade ratchet /
streams envelope (AES-256-GCM under HKDF-derived per-lane keys). DTLS is
only the WebRTC transport-layer encryption (DataChannels run SCTP over
DTLS; SRTP applies only to media tracks); turning on a TURN relay does
not give the relay operator access to plaintext.

```
┌───────────────────────────────────────────────────────────────┐
│ application code                                              │
│                                                               │
│   shade.upload({ to: 'bob', input: file })                    │
└────────────────────────────────┬──────────────────────────────┘
                                 │
                       ┌─────────▼──────────┐
                       │  TransferEngine    │
                       └─────────┬──────────┘
                                 │ ITransferTransport
                       ┌─────────▼──────────┐
                       │  MultiTransport    │
                       │  Fallback (sticky) │
                       └────┬─────┬─────┬───┘
                            │     │     │
             ┌──────────────▼┐  ┌─▼─┐  ┌▼─────────────┐
             │ WebRtcTransfer│  │WS │  │ ShadeTransfer│
             │   Transport   │  │…  │  │ HttpTransport│
             └─────┬─────────┘  └───┘  └──────────────┘
                   │ DataChannel binary frames
             ┌─────▼─────────┐
             │  WebRtcConn   │ ←──── SDP/ICE over Shade.send
             │    Manager    │        (ratchet-encrypted)
             └───────────────┘
```

## When to reach for it

| Scenario                            | Default (HTTP) | + WebRTC                                               |
|-------------------------------------|----------------|--------------------------------------------------------|
| Two clients on the same LAN         | server-relayed | direct, P2P                                            |
| One peer behind enterprise NAT only | works          | TURN-relay                                             |
| Both peers behind symmetric NAT     | works          | TURN-relay if configured, otherwise falls back to HTTP |
| One peer offline                    | inbox-buffered | inbox-buffered (HTTP path)                             |
| Browser extension with strict CSP   | works          | works (uses RTCPeerConnection)                         |

Use cases:

- `@shade/transfer` upload of multi-MB / multi-GB files
- `@shade/files` `read`/`write` of large inline blobs
- Future: `@shade/streams` real-time channels (V5.0 reuses this same DataChannel)

## Quick start (browser)

```ts
import { createShade } from '@shade/sdk';
import { nativeRtcFactory } from '@shade/transport-webrtc';

const shade = await createShade({ prekeyServer: 'https://prekey.example.com' });

// IMPORTANT: configureWebRTC MUST be called BEFORE the first upload() /
// onIncomingTransfer() / transferRoute() call, because those build the
// transfer engine, and the engine captures its transport stack at
// construction time.
shade.configureWebRTC({
  factory: nativeRtcFactory(),
  // Optional: defaults to two public Google STUN servers.
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    {
      urls: 'turn:turn.example.com:3478',
      username: 'shade',
      credential: 'YOUR_TURN_SECRET',
    },
  ],
});

shade.configureTransfers({
  // `directory` stands in for your application's own peer lookup.
  resolveBaseUrl: async (peer) => directory.lookup(peer),
});

await shade.upload({ to: 'bob', input: file }); // → P2P when NAT allows
```

## Quick start (Bun / Node)

Neither Bun nor Node exposes `RTCPeerConnection` natively yet. Use one of:

- [`node-datachannel`](https://github.com/murat-dogan/node-datachannel):
  small, stable, libdatachannel under the hood
- [`@roamhq/wrtc`](https://www.npmjs.com/package/@roamhq/wrtc): a fork of
  the `wrtc` (node-webrtc) bindings to Google's libwebrtc

Wrap the chosen library behind an `IRtcFactory` (the package only depends
on a narrow surface: `createPeerConnection`, `createDataChannel`,
`addEventListener`):

```ts
import type { IRtcFactory, IPeerConnection, IDataChannel } from '@shade/transport-webrtc';

// Pseudo-adapter for node-datachannel; the wrapping itself is left to you.
class NodeDataChannelFactory implements IRtcFactory {
  createPeerConnection(config) { /* ... return adapter wrapping nodeDc PeerConnection */ }
}

// `iceServers` has the same shape as in the browser quick start.
shade.configureWebRTC({ factory: new NodeDataChannelFactory(), iceServers });
```

## Connection flow

```
Alice initiates                                Bob receives
───────────────                                ────────────
1. createOffer() → SDP,
   sent to Bob via shade.send      ────────►   2. offer arrives → createAnswer()
4. setRemoteDescription(answer)    ◄────────   3. answer sent back via shade.send
5. trickle ICE candidates          ◄───────►   6. trickle ICE candidates
7. DataChannel onopen                          7. DataChannel onopen
```

All four signaling kinds (`shade.webrtc-offer/v1`, `shade.webrtc-answer/v1`,
`shade.webrtc-ice/v1`, `shade.webrtc-bye/v1`) ride the existing Shade
ratchet, so the relay sees only ciphertext envelopes.

### Glare resolution

If both peers call `getOrCreate()` simultaneously, the manager uses a
lexicographic tiebreak: the side with the smaller address takes the
caller role; the side with the larger address closes its outgoing
connection and accepts the inbound offer instead. Both peers ultimately
converge on a single `WebRtcConnection`.

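
The tiebreak above can be sketched as a pure function. This is a minimal illustration, not the manager's actual code: `resolveGlare` and `GlareRole` are hypothetical names, and the sketch assumes addresses are compared as plain strings, with both peers using the same ordering.

```ts
// Hypothetical helper: which role does the local side end up with on glare?
type GlareRole = 'caller' | 'callee';

function resolveGlare(localAddress: string, remoteAddress: string): GlareRole {
  if (localAddress === remoteAddress) {
    throw new Error('glare tiebreak needs two distinct addresses');
  }
  // Assumption: plain string comparison (UTF-16 code units). The smaller
  // address keeps the caller role; the larger one drops its own offer and
  // answers the inbound one.
  return localAddress < remoteAddress ? 'caller' : 'callee';
}

const aliceRole = resolveGlare('alice', 'bob');
const bobRole = resolveGlare('bob', 'alice');
console.log({ aliceRole, bobRole });
```

Because each side evaluates the same comparison with the arguments swapped, the two peers always reach opposite roles without exchanging any extra message.
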
## Backpressure

The `WebRtcTransferTransport` polls `RTCDataChannel.bufferedAmount` and
suspends new sends once the buffer crosses `backpressureThresholdBytes`
(default 4 MiB). This avoids SCTP queue runaway when the application
pushes faster than the network can drain. Tune lower for
memory-constrained clients (mobile / extension contexts).

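
A polling loop of this kind might look like the sketch below. It is an illustration under stated assumptions, not the transport's implementation: `BufferedChannel`, `shouldPause`, `sendWithBackpressure` and the 10 ms poll interval are all hypothetical, and only `bufferedAmount` and `send` are taken from the real `RTCDataChannel` surface.

```ts
// Minimal structural type so the sketch also runs outside a browser.
interface BufferedChannel {
  readonly bufferedAmount: number;
  send(data: Uint8Array): void;
}

const DEFAULT_THRESHOLD_BYTES = 4 * 1024 * 1024; // 4 MiB, the documented default

// True while the channel's send buffer is above the threshold.
function shouldPause(bufferedAmount: number, thresholdBytes = DEFAULT_THRESHOLD_BYTES): boolean {
  return bufferedAmount > thresholdBytes;
}

// Waits (by polling) until the buffer has drained, then sends one frame.
async function sendWithBackpressure(
  channel: BufferedChannel,
  frame: Uint8Array,
  thresholdBytes = DEFAULT_THRESHOLD_BYTES,
  pollIntervalMs = 10, // assumption: the real poll interval is not documented
): Promise<void> {
  while (shouldPause(channel.bufferedAmount, thresholdBytes)) {
    await new Promise<void>((resolve) => setTimeout(resolve, pollIntervalMs));
  }
  channel.send(frame);
}
```

On a real `RTCDataChannel`, the `bufferedAmountLowThreshold` property and the `bufferedamountlow` event offer an event-driven alternative to polling; the sketch polls only because that is what the text describes.
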
## Auto-fallback

Configuring WebRTC wires `MultiTransportFallback([webrtc, http])` as the
engine's transport. The chain is sticky-after-first-failure: when WebRTC
raises a `TransferTransportError` (timeout, ICE failed, data channel
closed, frame too large), the fallback advances to HTTP and stays there
for the lifetime of the engine.

For three-tier composition (e.g. WebRTC → WebSocket → HTTP), build the
fallback yourself and pass a custom transport via the engine deps:

```ts
import { MultiTransportFallback } from '@shade/sdk';

// rtcTransport, wsTransport, httpTransport and metrics are constructed elsewhere.
const stack = new MultiTransportFallback([
  { name: 'webrtc', transport: rtcTransport },
  { name: 'ws', transport: wsTransport },
  { name: 'http', transport: httpTransport },
]);
stack.onSwitch((from, to) => metrics.observe('shade.transport.demoted', { from, to }));
```

The `WebRtcConnectionManager`'s connect timeout (default 30 s) is the
upper bound on how long the chain dwells on WebRTC before demoting. The
V3.11 acceptance criterion is "P2P dead → HTTP within 5 s"; set
`connectTimeoutMs: 4_000` in your `configureWebRTC()` call to keep the
upper bound at 4 seconds and meet the SLO with margin.

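
To make the sticky-after-first-failure behaviour concrete, here is a small self-contained sketch. It is not the real `MultiTransportFallback`: `StickyFallbackSketch` and `NamedTransport` are hypothetical, and for brevity it demotes on any thrown error, whereas the text says the real chain reacts to `TransferTransportError`.

```ts
interface NamedTransport<T> {
  name: string;
  send(payload: T): Promise<void>;
}

class StickyFallbackSketch<T> {
  readonly failures: { name: string; reason: string }[] = [];
  private readonly chain: NamedTransport<T>[];
  private index = 0;

  constructor(chain: NamedTransport<T>[]) {
    if (chain.length === 0) throw new Error('need at least one transport');
    this.chain = chain;
  }

  get activeName(): string {
    return this.chain[this.index].name;
  }

  // Tries the active transport; on failure, demotes and retries the SAME
  // payload on the next one, so the in-flight chunk is not lost.
  async send(payload: T): Promise<void> {
    for (;;) {
      const current = this.chain[this.index];
      try {
        await current.send(payload);
        return;
      } catch (err) {
        const isLast = this.index === this.chain.length - 1;
        if (isLast) throw err; // nothing left to fall back to
        this.failures.push({ name: current.name, reason: String(err) });
        this.index += 1; // sticky: the index never moves back
      }
    }
  }
}
```

The key design point is that the index only ever increases, so one failed WebRTC attempt moves every later chunk to the next transport instead of re-probing P2P on each send.
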
## ICE server config

| Setting              | Default                            | When to override |
|----------------------|------------------------------------|------------------|
| `iceServers`         | Google public STUN (×2)            | Production: pin your own STUN to avoid Google rate limits, plus your TURN credentials |
| `iceTransportPolicy` | `'all'` (host + reflexive + relay) | `'relay'` to mandate TURN-only routing (e.g. inside a corporate network where direct connectivity must never leak) |
| `bundlePolicy`       | spec default (`'balanced'`)        | rarely |

Public STUN works for ~80% of consumer NATs. The remaining 20% (symmetric
NAT, paranoid corporate proxies, mobile carrier-grade NAT) need TURN.
Run your own [coturn](https://github.com/coturn/coturn) or use a managed
provider, but **TURN traffic is real bandwidth through your server**, so
budget accordingly. Shade's wire format is at least as efficient over
TURN as over HTTPS (no per-request HTTP framing overhead).

## NAT-traversal: hopes and realities

What works without TURN, in our testing:

- Same NAT (LAN): always
- Two clients behind cone NATs: usually
- One client behind symmetric NAT, the other behind any cone NAT: usually
- Two clients behind symmetric NATs: rarely; in practice this case needs TURN

What doesn't work:

- Two clients behind strict carrier-grade NAT (CGNAT): TURN required
- Clients on networks that block UDP entirely: TURN over TCP/443 required

When in doubt, configure TURN over TCP/443: it resembles HTTPS traffic
(most closely when run over TLS as `turns:`) and gets through nearly
every middlebox.

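
As an illustration, an `iceServers` array that offers both the usual UDP path and a TCP/443 path could look like this. The hostnames and credentials are placeholders, `IceServerEntry` is a local stand-in for the standard `RTCIceServer` shape, and it assumes your TURN server is actually set up to listen with TLS on port 443.

```ts
// Local stand-in for RTCIceServer so the fragment type-checks without DOM typings.
interface IceServerEntry {
  urls: string[];
  username?: string;
  credential?: string;
}

const iceServers: IceServerEntry[] = [
  { urls: ['stun:stun.example.com:3478'] },
  {
    urls: [
      // Preferred: plain TURN over UDP.
      'turn:turn.example.com:3478?transport=udp',
      // Escape hatch for UDP-blocking networks: TURN over TLS on TCP/443.
      'turns:turn.example.com:443?transport=tcp',
    ],
    username: 'shade',
    credential: 'YOUR_TURN_SECRET',
  },
];
```

The array is passed as the `iceServers` option of `configureWebRTC()`, as in the browser quick start. ICE tries all candidates and typically prefers the direct or UDP paths when they work, so listing the TCP/443 URL mainly adds a last-resort route.
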
## Diagnostics

The SDK exposes the live runtime via `shade.getWebRtcRuntime()`:

```ts
const runtime = shade.getWebRtcRuntime();
if (runtime !== null) {
  console.log('active transport:', runtime.fallback.activeName);
  console.log('peers:', [...runtime.manager.byPeer ?? []]);

  runtime.fallback.onSwitch((from, to) => {
    console.warn(`shade transport demoted ${from} → ${to}`);
  });
}
```

The `failures` array on `MultiTransportFallback` records every
demotion's reason; wire it to your observability backend to track
NAT/TURN problems in production.

## Sample code

End-to-end test using `MemoryRtcFactory` (no real network):

```ts
import { MemoryRtcFactory } from '@shade/transport-webrtc';

// alice and bob are two Shade instances; bytes is the payload to send.
const factory = new MemoryRtcFactory();
alice.configureWebRTC({ factory });
bob.configureWebRTC({ factory });

await alice.upload({ to: 'bob', input: bytes }); // → P2P loopback
```

See `packages/shade-sdk/tests/webrtc-integration.test.ts` for the full
loopback test, `webrtc-failover.test.ts` for the auto-fallback test, and
`packages/shade-transport-webrtc/tests/` for the unit tests covering
wire format, signaling, glare, and TURN-only configuration.

## Wire format inside the DataChannel

The DataChannel is a single bidirectional pipe shared by every in-flight
stream between two peers. Each frame is a self-describing binary blob:

```
client → server
───────────────
0x01  chunk         reqId(16) sid(16) lane(u32) seq(u64) env(...)
0x02  resume-query  reqId(16) sid(16)
0x03  ping          reqId(16) nonce(u64)

server → client
───────────────
0x81  chunk-ack     reqId(16) lastSeq(u32) bytesRecv(u32)
0x82  resume-state  reqId(16) jsonBody(utf-8)
0x83  pong          reqId(16) nonce(u64)
0xFE  error         reqId(16) jsonBody(utf-8)
```

`reqId` is a 16-byte random correlation token; the responder echoes it
verbatim so multiple in-flight requests can be matched without a stream
multiplexer on top of SCTP.

The wire matches `ShadeTransferWsTransport` exactly, so adapters for
either transport can interoperate by translating between SCTP
message framing and WS binary frames at the byte level.

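
As a worked example of the layout, the sketch below encodes and decodes a `0x81 chunk-ack` frame. The field order and widths come from the table above; the byte order is not stated in this document, so big-endian (network order) is an assumption here, and `encodeChunkAck` / `decodeChunkAck` are hypothetical helpers rather than package exports.

```ts
const FRAME_CHUNK_ACK = 0x81;
const REQ_ID_BYTES = 16;
const CHUNK_ACK_BYTES = 1 + REQ_ID_BYTES + 4 + 4; // type + reqId + lastSeq + bytesRecv

interface ChunkAck {
  reqId: Uint8Array; // 16-byte correlation token, echoed from the request
  lastSeq: number;   // u32
  bytesRecv: number; // u32
}

function encodeChunkAck(ack: ChunkAck): Uint8Array {
  if (ack.reqId.length !== REQ_ID_BYTES) throw new Error('reqId must be 16 bytes');
  const frame = new Uint8Array(CHUNK_ACK_BYTES);
  const view = new DataView(frame.buffer);
  frame[0] = FRAME_CHUNK_ACK;
  frame.set(ack.reqId, 1);
  // Assumption: integers are big-endian on the wire.
  view.setUint32(1 + REQ_ID_BYTES, ack.lastSeq, false);
  view.setUint32(1 + REQ_ID_BYTES + 4, ack.bytesRecv, false);
  return frame;
}

function decodeChunkAck(frame: Uint8Array): ChunkAck {
  if (frame.length !== CHUNK_ACK_BYTES || frame[0] !== FRAME_CHUNK_ACK) {
    throw new Error('not a chunk-ack frame');
  }
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  return {
    reqId: frame.slice(1, 1 + REQ_ID_BYTES),
    lastSeq: view.getUint32(1 + REQ_ID_BYTES, false),
    bytesRecv: view.getUint32(1 + REQ_ID_BYTES + 4, false),
  };
}
```

A requester would keep a map keyed by the hex form of `reqId` and resolve the matching pending promise when a decoded frame arrives, which is how several in-flight requests can share the one channel.
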
## Limits

- Max DataChannel message: **256 KiB** (Chrome's safe ceiling). Configure
  `chunkSize` ≤ 256 KiB on uploads that prefer WebRTC. The transport
  raises a clear error when an envelope exceeds the cap; the engine then
  retries via HTTP.
- One DataChannel per peer pair (label `shade-transfer/v1`). Multiple
  in-flight transfers from the same peer pair multiplex via `reqId`.
- No SFU/MCU: group transfers fan out at the application layer.
- DTLS-fingerprint binding to Shade's identity-fingerprint is **not** in
  V3.11 (deferred as hardening work; the DataChannel is already inside a
  ratchet-authenticated session, so the practical exposure window is
  limited to in-process MITM scenarios that already require malware).

## Migration

Opt-in. If you don't call `configureWebRTC`, your existing HTTP/WS
transport stack is unchanged.

When you do opt in, the **engine must not be built yet**. The easy way
to ensure this is to call `configureWebRTC` before `configureTransfers`
or before any of `upload` / `onIncomingTransfer` / `transferRoute`.
Receiver-side: the WebRTC manager wires receiver-hooks into the engine
during `engine()` construction, so make sure both sides call
`configureWebRTC` and `configureTransfers` before the first
`transferRoute()` call.

## Related modules

- [`@shade/transfer`](../packages/shade-transfer/): engine, lane queues,
  HTTP transport, multi-fallback wrapper.
- [`@shade/streams`](./streams.md): chunk encryption + lane key
  derivation. Indirect dep.
- [`@shade/transport-bridge`](./transport.md): V3.7 bridge layer (WS /
  SSE / long-poll for control envelopes). Orthogonal to V3.11.
- [V5.0: real-time channels](./V5.0.md): downstream consumer of the
  same DataChannel for voice/video/broadcast.