Shade Transport — WebRTC P2P Layer (V3.11)
@shade/transport-webrtc adds a direct peer-to-peer chunk transport on
top of the existing @shade/transfer engine. When two clients can reach
each other through NAT/firewall, large transfers (@shade/files,
@shade/transfer) flow over a single bidirectional RTCDataChannel
instead of paying the round-trip cost of HTTP-relayed POSTs. When NAT
traversal fails, the multi-transport fallback automatically demotes the
chain back to HTTP — without losing any chunks already in flight.
The wire payload is unchanged: every chunk is still a Shade ratchet / streams envelope (AES-256-GCM under HKDF-derived per-lane keys). DTLS, WebRTC's own transport encryption, is only an outer layer; enabling a TURN relay does not give the relay operator access to plaintext.
```
┌───────────────────────────────────────────────────────────────┐
│ application code                                              │
│                                                               │
│ shade.upload({ to: 'bob', input: file })                      │
└───────────────────────────────┬───────────────────────────────┘
                                │
                      ┌─────────▼──────────┐
                      │   TransferEngine   │
                      └─────────┬──────────┘
                                │ ITransferTransport
                      ┌─────────▼──────────┐
                      │   MultiTransport   │
                      │ Fallback (sticky)  │
                      └────┬─────┬─────┬───┘
                           │     │     │
            ┌──────────────▼─┐ ┌─▼─┐ ┌─▼─────────────┐
            │ WebRtcTransfer │ │WS │ │ ShadeTransfer │
            │ Transport      │ │ … │ │ HttpTransport │
            └─────┬──────────┘ └───┘ └───────────────┘
                  │ DataChannel binary frames
            ┌─────▼─────────┐
            │ WebRtcConn    │ ←──── SDP/ICE over Shade.send
            │ Manager       │       (ratchet-encrypted)
            └───────────────┘
```
When to reach for it
| Scenario | Default (HTTP) | + WebRTC |
|---|---|---|
| Two clients on the same LAN | server-relayed | direct, P2P |
| One peer behind enterprise NAT only | works | TURN-relay |
| Both peers behind symmetric NAT | works | falls back to HTTP |
| One peer offline | inbox-buffered | inbox-buffered (HTTP path) |
| Browser extension with strict CSP | works | works (uses RTCPeerConnection) |
Use cases:
- `@shade/transfer` upload of multi-MB / multi-GB files
- `@shade/files` read/write of large inline blobs
- Future: `@shade/streams` real-time channels (V5.0 reuses this same DataChannel)
Quick start (browser)
```ts
import { createShade } from '@shade/sdk';
import { nativeRtcFactory } from '@shade/transport-webrtc';

const shade = await createShade({ prekeyServer: 'https://prekey.example.com' });

// IMPORTANT: configureWebRTC MUST be called BEFORE the first upload() /
// onIncomingTransfer() / transferRoute() call, because those build the
// transfer engine, and the engine captures its transport stack at
// construction time.
shade.configureWebRTC({
  factory: nativeRtcFactory(),
  // Optional: defaults to two public Google STUN servers.
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    {
      urls: 'turn:turn.example.com:3478',
      username: 'shade',
      credential: 'YOUR_TURN_SECRET',
    },
  ],
});

// `directory` is a placeholder for your application's own peer lookup.
shade.configureTransfers({
  resolveBaseUrl: async (peer) => directory.lookup(peer),
});

// `file` is the payload to send (setup omitted).
await shade.upload({ to: 'bob', input: file }); // → P2P when NAT allows
```
Quick start (Bun / Node)
Neither Bun nor Node currently exposes RTCPeerConnection natively. Use one of:
- `node-datachannel`: small, stable, libdatachannel under the hood
- `@roamhq/wrtc`: fork of the `wrtc` (node-webrtc) bindings to Google's libwebrtc
Wrap the chosen library behind an IRtcFactory (the package only depends
on a narrow surface — createPeerConnection, createDataChannel,
addEventListener):
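```ts
import type { IRtcFactory, IPeerConnection } from '@shade/transport-webrtc';

// Pseudo-adapter for node-datachannel (skeleton only).
class NodeDataChannelFactory implements IRtcFactory {
  createPeerConnection(config: Parameters<IRtcFactory['createPeerConnection']>[0]): IPeerConnection {
    // Build a node-datachannel PeerConnection from `config` and wrap it in the IPeerConnection surface.
    throw new Error('NodeDataChannelFactory: adapter not implemented');
  }
}

// `iceServers`: the same array as in the browser quick start.
shade.configureWebRTC({ factory: new NodeDataChannelFactory(), iceServers });
```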
Connection flow
```
Alice initiates                                Bob receives
───────────────                                ────────────
1. createOffer() → SDP
                                               2. shade.send delivers offer
                                                  → Bob.createAnswer()
3. shade.send delivers answer
4. setRemoteDescription(answer)
5. trickle ICE candidates (both directions)     6. trickle ICE candidates
7. DataChannel onopen (both sides)             7. DataChannel onopen
```
All four signaling kinds (shade.webrtc-offer/v1, shade.webrtc-answer/v1,
shade.webrtc-ice/v1, shade.webrtc-bye/v1) ride the existing Shade
ratchet — the relay sees only ciphertext envelopes.
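The four signaling kinds can be pictured as a discriminated union that the receiving side dispatches on once the ratchet has decrypted an envelope. This is a minimal sketch: only the `kind` strings come from the list above; the payload fields (`sdp`, `candidate`, `reason`), the `SignalHandlers` interface and `routeSignal` are hypothetical and not Shade's actual message schema.

```typescript
// Hypothetical shapes for the four signaling messages carried over Shade.send.
// Only the `kind` strings are taken from the docs; payload fields are assumed.
type WebRtcSignal =
  | { kind: 'shade.webrtc-offer/v1'; sdp: string }
  | { kind: 'shade.webrtc-answer/v1'; sdp: string }
  | { kind: 'shade.webrtc-ice/v1'; candidate: string }
  | { kind: 'shade.webrtc-bye/v1'; reason?: string };

interface SignalHandlers {
  onOffer(sdp: string): void;
  onAnswer(sdp: string): void;
  onIce(candidate: string): void;
  onBye(reason: string | undefined): void;
}

// Route an already-decrypted signal to the matching handler.
// The switch is exhaustive, so adding a fifth kind becomes a type error here.
function routeSignal(signal: WebRtcSignal, h: SignalHandlers): void {
  switch (signal.kind) {
    case 'shade.webrtc-offer/v1':
      return h.onOffer(signal.sdp);
    case 'shade.webrtc-answer/v1':
      return h.onAnswer(signal.sdp);
    case 'shade.webrtc-ice/v1':
      return h.onIce(signal.candidate);
    case 'shade.webrtc-bye/v1':
      return h.onBye(signal.reason);
  }
}
```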
Glare resolution
If both peers call getOrCreate() simultaneously, the manager uses
lexicographic tiebreak: the side with the smaller address wins
caller-role; the side with the larger address closes its outgoing
connection and accepts the inbound offer instead. Both peers ultimately
converge on a single WebRtcConnection.
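The tiebreak reduces to one comparison that both peers evaluate on the same pair of addresses, so exactly one of them keeps the caller role. A minimal sketch, assuming addresses are strings compared with JavaScript's `<` (lexicographic by UTF-16 code unit); `resolveGlare` is a hypothetical helper, and Shade's own comparison may operate on a different encoding of the address.

```typescript
type GlareDecision = 'keep-outgoing' | 'accept-inbound';

// Decide what this side does when both peers have sent an offer.
// The smaller address keeps the caller role; the larger one closes its
// outgoing connection and answers the inbound offer instead.
function resolveGlare(localAddress: string, remoteAddress: string): GlareDecision {
  if (localAddress === remoteAddress) {
    throw new Error('glare between identical addresses cannot be resolved');
  }
  return localAddress < remoteAddress ? 'keep-outgoing' : 'accept-inbound';
}
```

Because the two calls see the same pair with the arguments swapped, their results are always complementary, which is what lets both sides converge on a single connection without another round trip.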
Backpressure
The WebRtcTransferTransport polls RTCDataChannel.bufferedAmount and
suspends new sends once the buffer crosses backpressureThresholdBytes
(default 4 MiB). This avoids SCTP queue runaway when the application
pushes faster than the network can drain. Tune lower for memory-
constrained clients (mobile / extension contexts).
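A minimal sketch of the polling pattern, assuming only the `bufferedAmount` and `send()` members that `RTCDataChannel` already has; `BackpressureQueue` and its `pump()` method are hypothetical names, not the package's API. Frames wait in a local queue while the channel's buffer is at or above the threshold, and `pump()` is meant to be driven by a poll timer.

```typescript
// Minimal view of a data channel; RTCDataChannel has both members.
interface BufferedChannel {
  readonly bufferedAmount: number;
  send(data: Uint8Array): void;
}

class BackpressureQueue {
  private readonly queue: Uint8Array[] = [];
  private readonly channel: BufferedChannel;
  private readonly thresholdBytes: number;

  // 4 MiB mirrors the documented backpressureThresholdBytes default.
  constructor(channel: BufferedChannel, thresholdBytes = 4 * 1024 * 1024) {
    this.channel = channel;
    this.thresholdBytes = thresholdBytes;
  }

  enqueue(frame: Uint8Array): void {
    this.queue.push(frame);
    this.pump();
  }

  // Send queued frames while the channel buffer is below the threshold.
  // Returns how many frames were handed to the channel in this pass.
  pump(): number {
    let sent = 0;
    while (this.queue.length > 0 && this.channel.bufferedAmount < this.thresholdBytes) {
      this.channel.send(this.queue.shift()!);
      sent++;
    }
    return sent;
  }

  get pending(): number {
    return this.queue.length;
  }
}
```

In a browser you might drive it with `setInterval(() => queue.pump(), 10)`. Because the check happens before each send, the buffer can overshoot the threshold by at most one frame. The standard `bufferedAmountLowThreshold` property and `bufferedamountlow` event offer an event-driven alternative to polling.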
Auto-fallback
Configuring WebRTC wires MultiTransportFallback([webrtc, http]) as the
engine's transport. The chain is sticky-after-first-failure: when WebRTC
raises a TransferTransportError (timeout, ICE failed, data channel
closed, frame too large), the fallback advances to HTTP and stays there
for the lifetime of the engine.
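The sticky-after-first-failure behaviour can be sketched as a small wrapper that walks an ordered list of tiers and never moves back up. This is an illustration, not Shade's `MultiTransportFallback`: `StickyFallback`, `run()` and the `Demotion` record are hypothetical, although `activeName`, `onSwitch` and a `failures` array mirror names used elsewhere on this page.

```typescript
interface Tier<T> { name: string; transport: T }
interface Demotion { from: string; to: string; reason: string }

class StickyFallback<T> {
  private index = 0;
  private readonly tiers: readonly Tier<T>[];
  private readonly listeners: Array<(from: string, to: string) => void> = [];
  readonly failures: Demotion[] = [];

  constructor(tiers: readonly Tier<T>[]) {
    if (tiers.length === 0) throw new Error('at least one transport is required');
    this.tiers = tiers;
  }

  get activeName(): string {
    return this.tiers[this.index].name;
  }

  onSwitch(cb: (from: string, to: string) => void): void {
    this.listeners.push(cb);
  }

  // Run `op` on the active tier. On failure, demote for good and retry on the
  // next tier; the error from the last tier is rethrown.
  async run<R>(op: (transport: T) => Promise<R>): Promise<R> {
    for (;;) {
      const at = this.index;
      const tier = this.tiers[at];
      try {
        return await op(tier.transport);
      } catch (err) {
        if (at === this.tiers.length - 1) throw err;
        // Only the first failure on a tier demotes; concurrent failures just retry.
        if (this.index === at) {
          this.index = at + 1;
          const to = this.tiers[this.index].name;
          this.failures.push({ from: tier.name, to, reason: String(err) });
          for (const cb of this.listeners) cb(tier.name, to);
        }
      }
    }
  }
}
```

Never promoting back keeps behaviour predictable: after one ICE failure the engine stops paying connect timeouts on every subsequent chunk.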
For three-tier composition (e.g. WebRTC → WebSocket → HTTP), build the fallback yourself and pass a custom transport via the engine deps:
```ts
import { MultiTransportFallback } from '@shade/sdk';

// rtcTransport, wsTransport and httpTransport are transports you have already
// constructed; `metrics` is your own metrics client.
const stack = new MultiTransportFallback([
  { name: 'webrtc', transport: rtcTransport },
  { name: 'ws', transport: wsTransport },
  { name: 'http', transport: httpTransport },
]);
stack.onSwitch((from, to) => metrics.observe('shade.transport.demoted', { from, to }));
```
The WebRtcConnectionManager's connect timeout (default 30 s) is the
upper bound on how long the chain dwells on WebRTC before demoting. The
V3.11 acceptance criterion is "P2P down → HTTP within 5 s": set
`connectTimeoutMs: 4_000` in your `configureWebRTC()` call to cap the
WebRTC dwell at 4 seconds and meet that target with margin.
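The dwell bound behaves like a race between connection setup and a timer. A minimal sketch, assuming connection setup is exposed as a promise; `withConnectTimeout` is a hypothetical helper, not part of Shade's API.

```typescript
// Reject if `connect` has not settled within `timeoutMs`.
function withConnectTimeout<T>(connect: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`WebRTC connect timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([connect, timeout]).finally(() => clearTimeout(timer));
}
```

With a 4 s timeout the chain gives up on WebRTC at most 4 s after setup starts, leaving about 1 s of the 5 s budget for the HTTP retry to get going. The 30 s default cannot meet that target.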
ICE server config
| Setting | Default | When to override |
|---|---|---|
| `iceServers` | Google public STUN (×2) | Production: pin your own STUN to avoid Google rate limits, plus your TURN credentials |
| `iceTransportPolicy` | `'all'` (host + reflexive + relay) | `'relay'` to mandate TURN-only routing (e.g. inside a corporate network where direct connectivity must never leak) |
| `bundlePolicy` | spec default (`'balanced'`) | rarely |
Public STUN works for ~80% of consumer NATs. The remaining 20% (symmetric NAT, paranoid corporate proxies, mobile carrier-grade NAT) need TURN. Run your own coturn or use a managed provider — but TURN traffic is real bandwidth through your server, so budget accordingly. Shade's wire format is at least as efficient over TURN as over HTTPS (no per-request HTTP framing overhead).
NAT-traversal: hopes and realities
What works without TURN, in our testing:
- Same NAT (LAN): always
- Two clients behind cone NATs: usually
- One client behind symmetric NAT, the other behind any cone NAT: usually
- Two clients behind symmetric NATs: rarely — falls back to TURN
What doesn't work:
- Two clients behind strict carrier-grade NAT (CGNAT): TURN required
- Clients on networks that block UDP entirely: TURN over TCP/443 required
When in doubt, configure TURN over TLS on TCP/443 (a `turns:` URL): it looks like HTTPS on the wire and gets through nearly every middlebox.
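Putting this advice together, an ICE configuration for hostile networks typically lists your own STUN server, TURN over UDP, and TURN over TLS on port 443. A sketch, assuming a single host (for example one coturn instance) serves all three; `buildIceConfig` is hypothetical, while the `stun:`, `turn:` and `turns:` URL forms, the `?transport=` parameter and the `iceTransportPolicy` values are standard WebRTC.

```typescript
// Same fields as the standard RTCIceServer dictionary.
interface IceServer { urls: string | string[]; username?: string; credential?: string }

interface IceConfig {
  iceServers: IceServer[];
  iceTransportPolicy: 'all' | 'relay';
}

// Hypothetical helper. relayOnly = true mandates TURN-only routing.
function buildIceConfig(host: string, username: string, credential: string, relayOnly = false): IceConfig {
  const auth = { username, credential };
  return {
    iceServers: [
      { urls: `stun:${host}:3478` },                        // your own STUN
      { urls: `turn:${host}:3478?transport=udp`, ...auth }, // TURN over UDP
      { urls: `turns:${host}:443?transport=tcp`, ...auth }, // TURN over TLS on 443
    ],
    iceTransportPolicy: relayOnly ? 'relay' : 'all',
  };
}
```

Pass the resulting `iceServers` to `configureWebRTC()`, and `iceTransportPolicy` too if your options accept it as the settings table suggests.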
Diagnostics
The SDK exposes the live runtime via shade.getWebRtcRuntime():
```ts
const runtime = shade.getWebRtcRuntime();
if (runtime !== null) {
  console.log('active transport:', runtime.fallback.activeName);
  console.log('peers:', [...(runtime.manager.byPeer ?? [])]);
  runtime.fallback.onSwitch((from, to) => {
    console.warn(`shade transport demoted ${from} → ${to}`);
  });
}
```
The failures array on MultiTransportFallback records every
demotion's reason — wire it to your observability backend to track
NAT/TURN problems in production.
Sample code
End-to-end test using MemoryRtcFactory (no real network):
```ts
import { MemoryRtcFactory } from '@shade/transport-webrtc';

// alice, bob (two Shade instances) and bytes come from the test setup (omitted).
const factory = new MemoryRtcFactory();
alice.configureWebRTC({ factory });
bob.configureWebRTC({ factory });
await alice.upload({ to: 'bob', input: bytes }); // → P2P loopback
```
See packages/shade-sdk/tests/webrtc-integration.test.ts for the full
loopback test, webrtc-failover.test.ts for the auto-fallback test, and
packages/shade-transport-webrtc/tests/ for the unit tests covering
wire format, signaling, glare, and TURN-only configuration.
Wire format inside the DataChannel
The DataChannel is a single bidirectional pipe shared by every in-flight stream between two peers. Each frame is a self-describing binary blob:
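| Direction | Opcode | Frame | Fields (sizes in bytes or integer type) |
|---|---|---|---|
| client → server | `0x01` | chunk | `reqId(16) sid(16) lane(u32) seq(u64) env(...)` |
| client → server | `0x02` | resume-query | `reqId(16) sid(16)` |
| client → server | `0x03` | ping | `reqId(16) nonce(u64)` |
| server → client | `0x81` | chunk-ack | `reqId(16) lastSeq(u32) bytesRecv(u32)` |
| server → client | `0x82` | resume-state | `reqId(16) jsonBody(utf-8)` |
| server → client | `0x83` | pong | `reqId(16) nonce(u64)` |
| server → client | `0xFE` | error | `reqId(16) jsonBody(utf-8)` |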
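To make the byte layout concrete, here is a sketch of an encoder and decoder for the `0x01` chunk frame. Field order and sizes come from the frame listing above; the byte order is an assumption (big-endian, `DataView`'s default), and `encodeChunk` / `decodeChunk` are hypothetical names.

```typescript
const OP_CHUNK = 0x01;
// opcode(1) + reqId(16) + sid(16) + lane u32(4) + seq u64(8) = 45 bytes
const CHUNK_HEADER_LEN = 1 + 16 + 16 + 4 + 8;

interface ChunkFrame { reqId: Uint8Array; sid: Uint8Array; lane: number; seq: bigint; env: Uint8Array }

function encodeChunk(f: ChunkFrame): Uint8Array {
  if (f.reqId.length !== 16 || f.sid.length !== 16) throw new RangeError('reqId and sid must be 16 bytes');
  const out = new Uint8Array(CHUNK_HEADER_LEN + f.env.length);
  const view = new DataView(out.buffer);
  out[0] = OP_CHUNK;
  out.set(f.reqId, 1);
  out.set(f.sid, 17);
  view.setUint32(33, f.lane);   // ASSUMED big-endian
  view.setBigUint64(37, f.seq); // ASSUMED big-endian
  out.set(f.env, CHUNK_HEADER_LEN);
  return out;
}

function decodeChunk(buf: Uint8Array): ChunkFrame {
  if (buf.length < CHUNK_HEADER_LEN || buf[0] !== OP_CHUNK) throw new RangeError('not a chunk frame');
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  return {
    reqId: buf.slice(1, 17),
    sid: buf.slice(17, 33),
    lane: view.getUint32(33),
    seq: view.getBigUint64(37),
    env: buf.slice(CHUNK_HEADER_LEN), // the ratchet / streams envelope, opaque here
  };
}
```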
reqId is a 16-byte random correlation token; the responder echoes it
verbatim so multiple in-flight requests can be matched without a stream
multiplexer on top of SCTP.
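A sketch of the correlation side, assuming every response frame carries the echoed `reqId` at bytes 1..16, right after the opcode, as the frame listing shows. `RequestCorrelator` is hypothetical; it keys pending requests by the hex form of the 16-byte token and drops each entry once its response arrives.

```typescript
const toHex = (b: Uint8Array): string =>
  Array.from(b, (x) => x.toString(16).padStart(2, '0')).join('');

class RequestCorrelator {
  private readonly pending = new Map<string, (frame: Uint8Array) => void>();

  // Allocate a random 16-byte reqId and remember who is waiting for it.
  newRequest(onResponse: (frame: Uint8Array) => void): Uint8Array {
    const reqId = crypto.getRandomValues(new Uint8Array(16));
    this.pending.set(toHex(reqId), onResponse);
    return reqId;
  }

  // Route an incoming response by its echoed reqId. Returns false for
  // unknown or already-answered ids.
  dispatch(frame: Uint8Array): boolean {
    if (frame.length < 17) return false;
    const key = toHex(frame.subarray(1, 17));
    const handler = this.pending.get(key);
    if (handler === undefined) return false;
    this.pending.delete(key);
    handler(frame);
    return true;
  }

  get inFlight(): number {
    return this.pending.size;
  }
}
```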
The wire matches ShadeTransferWsTransport exactly — adapters for
either transport can interoperate by translating between SCTP message-
framing and WS binary frames at the byte level.
Limits
- Max DataChannel message: 256 KiB (Chrome's safe ceiling). Configure `chunkSize` ≤ 256 KiB on uploads that prefer WebRTC, leaving headroom for the frame header and envelope overhead that wrap each chunk. The transport raises a clear error when an envelope exceeds the cap; the engine then retries via HTTP.
- One DataChannel per peer pair (label `shade-transfer/v1`). Multiple in-flight transfers from the same peer pair multiplex via `reqId`.
- No SFU/MCU: group transfers fan out at the application layer.
- DTLS-fingerprint binding to Shade's identity-fingerprint is not in V3.11 (deferred as hardening work — DataChannel is already inside a ratchet-authenticated session, so the practical exposure window is limited to in-process MITM scenarios that already require malware).
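The 256 KiB cap turns into a simple budget for `chunkSize`. A sketch, assuming the 45-byte `0x01` header (1 + 16 + 16 + 4 + 8 bytes) from the wire-format listing and treating the per-envelope encryption overhead as an input, since this page does not specify it; both helpers are hypothetical.

```typescript
const MAX_DC_MESSAGE = 256 * 1024; // 262144 bytes
const CHUNK_HEADER = 45;           // opcode + reqId + sid + lane + seq

// Largest plaintext chunkSize that still fits in one DataChannel message,
// given your envelope's per-chunk overhead (an assumed input).
function maxChunkSize(envelopeOverheadBytes: number): number {
  return MAX_DC_MESSAGE - CHUNK_HEADER - envelopeOverheadBytes;
}

// Fail loudly on an oversized frame, in the spirit of the transport's own error.
function assertFitsDataChannel(frame: Uint8Array): void {
  if (frame.length > MAX_DC_MESSAGE) {
    throw new RangeError(`frame of ${frame.length} bytes exceeds the ${MAX_DC_MESSAGE}-byte DataChannel cap`);
  }
}
```

For example, with 28 bytes of assumed envelope overhead, `maxChunkSize(28)` leaves 262 071 bytes of plaintext per chunk.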
Migration
Opt-in. If you don't call configureWebRTC, your existing HTTP/WS
transport stack is unchanged.
When you do opt in, the engine must not be built yet — the easy way
to ensure this is to call configureWebRTC before configureTransfers
or before any of upload / onIncomingTransfer / transferRoute.
Receiver side: the WebRTC manager wires receiver hooks into the engine
during `engine()` construction, so make sure both sides call `configureWebRTC`
and then `configureTransfers` before the first `transferRoute()` call.
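Why the ordering matters can be shown with a toy host whose engine is built lazily and snapshots its transport stack on first use. This is a hypothetical illustration, not Shade's SDK: whether the real SDK rejects a late `configureWebRTC()` call or silently keeps the HTTP-only stack is not stated here, so the sketch simply fails loudly.

```typescript
// Hypothetical host, NOT Shade's SDK.
class EngineHost {
  private transports: string[] = ['http'];
  private built: { readonly transports: readonly string[] } | null = null;

  configureWebRTC(): void {
    // Once the engine exists its stack is frozen, so a late call could not take effect.
    if (this.built !== null) {
      throw new Error('configureWebRTC() must be called before the transfer engine is built');
    }
    this.transports = ['webrtc', 'http'];
  }

  // Built on first use (think upload / onIncomingTransfer / transferRoute),
  // capturing a copy of the transport stack at that moment.
  engine(): { readonly transports: readonly string[] } {
    if (this.built === null) this.built = { transports: [...this.transports] };
    return this.built;
  }
}
```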
Related modules
- `@shade/transfer`: engine, lane queues, HTTP transport, multi-fallback wrapper.
- `@shade/streams`: chunk encryption + lane key derivation. Indirect dep.
- `@shade/transport-bridge`: V3.7 bridge layer (WS / SSE / long-poll for control envelopes). Orthogonal to V3.11.
- V5.0 real-time channels: downstream consumer of the same DataChannel for voice/video/broadcast.