release(v4.0.0): Shade GA — V3.x consolidation + audit prep
Some checks failed
Test / test (push) Has been cancelled
Cross-platform vectors / TypeScript vectors (bun) (push) Has been cancelled
Cross-platform vectors / Kotlin vectors (gradle) (push) Has been cancelled
Docker build and publish / docker (push) Has been cancelled
Publish / publish (push) Has been cancelled
V3.1 → V3.12 consolidated and tagged for the first GA release. Wire format
unchanged from 0.4.x: 4.0 peers interoperate with 0.4.x peers byte-for-byte.
The version bump is semantic: audit-cycle complete, opt-in surface fully
exposed, threat model refreshed for every new surface.

Highlights:
- All 24 @shade/* packages bumped to 4.0.0 in lockstep.
- CHANGELOG 4.0.0 section is the canonical manifest of what landed.
- THREAT-MODEL extended (§10 fingerprint gates, §11 WebRTC P2P, §12
  Web-Worker boundary) + residual-risks table refreshed.
- OpenAPI now covers all 27 routes: prekey, transfer, KT, inbox, bridge,
  observer, /metrics, /healthz, /ready.
- MIGRATION 0.3.x → 4.0 documented + smoke-tested against
  shade migrate-storage on a real SQLite DB.
- docs/audit/REVIEW-BUNDLE.md + SCOPE.md ready for external reviewer.
- scripts/soak.ts harness for the GA-stable 2-week soak window.
- All V*.md plans archived under docs/archive/ with Status: Done.
- Voice/Video carved out into V5.0; 4.0 audit focuses on the frozen
  non-realtime stack.

Tests: TS 1000/1000 + Kotlin 11/11 cross-platform vectors green.
Docker: gt.zyon.no/stian/shade-prekey:4.0.0 builds and reports version
4.0.0 on /health.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
302
docs/webrtc.md
Normal file
@@ -0,0 +1,302 @@
# Shade Transport: WebRTC P2P Layer (V3.11)

`@shade/transport-webrtc` adds a direct peer-to-peer chunk transport on
top of the existing `@shade/transfer` engine. When two clients can reach
each other through NAT/firewall, large transfers (`@shade/files`,
`@shade/transfer`) flow over a single bidirectional `RTCDataChannel`
instead of paying the round-trip cost of HTTP-relayed POSTs. When NAT
traversal fails, the multi-transport fallback automatically demotes the
chain back to HTTP without losing any chunks already in flight.

The wire payload is unchanged: every chunk is still a Shade ratchet /
streams envelope (AES-256-GCM under HKDF-derived per-lane keys). DTLS
only protects the WebRTC transport hop (DataChannels run SCTP over DTLS);
enabling a TURN relay does not give the relay operator access to
plaintext.

```
┌───────────────────────────────────────────────────────────────┐
│  application code                                             │
│                                                               │
│    shade.upload({ to: 'bob', input: file })                   │
└────────────────────────────────┬──────────────────────────────┘
                                 │
                       ┌─────────▼──────────┐
                       │  TransferEngine    │
                       └─────────┬──────────┘
                                 │ ITransferTransport
                       ┌─────────▼──────────┐
                       │  MultiTransport    │
                       │  Fallback (sticky) │
                       └────┬─────┬─────┬───┘
                            │     │     │
              ┌─────────────▼┐  ┌─▼─┐  ┌▼────────────┐
              │ WebRtcTransfer│ │WS │  │ ShadeTransfer│
              │ Transport     │ │…  │  │ HttpTransport│
              └─────┬─────────┘ └───┘  └──────────────┘
                    │ DataChannel binary frames
              ┌─────▼─────────┐
              │ WebRtcConn    │ ←──── SDP/ICE over Shade.send
              │ Manager       │       (ratchet-encrypted)
              └───────────────┘
```

## When to reach for it

| Scenario                              | Default (HTTP) | + WebRTC                                               |
|---------------------------------------|----------------|--------------------------------------------------------|
| Two clients on the same LAN           | server-relayed | direct, P2P                                            |
| One peer behind enterprise NAT only   | works          | TURN relay                                             |
| Both peers behind symmetric NAT       | works          | TURN relay if configured, otherwise falls back to HTTP |
| One peer offline                      | inbox-buffered | inbox-buffered (HTTP path)                             |
| Browser extension with strict CSP     | works          | works (uses RTCPeerConnection)                         |

Use cases:

- `@shade/transfer` upload of multi-MB / multi-GB files
- `@shade/files` `read`/`write` of large inline blobs
- Future: `@shade/streams` real-time channels (V5.0 reuses this same DataChannel)

## Quick start (browser)

```ts
import { createShade } from '@shade/sdk';
import { nativeRtcFactory } from '@shade/transport-webrtc';

const shade = await createShade({ prekeyServer: 'https://prekey.example.com' });

// IMPORTANT: configureWebRTC MUST be called BEFORE the first upload() /
// onIncomingTransfer() / transferRoute() call, because those build the
// transfer engine, and the engine captures its transport stack at
// construction time.
shade.configureWebRTC({
  factory: nativeRtcFactory(),
  // Optional: defaults to two public Google STUN servers.
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    {
      urls: 'turn:turn.example.com:3478',
      username: 'shade',
      credential: 'YOUR_TURN_SECRET',
    },
  ],
});

// `directory` stands for your own peer → base-URL lookup; it is not part of the SDK.
shade.configureTransfers({
  resolveBaseUrl: async (peer) => directory.lookup(peer),
});

// `file` is the File / Blob / bytes you want to send.
await shade.upload({ to: 'bob', input: file }); // → P2P when NAT allows
```

## Quick start (Bun / Node)

Bun and Node do not yet expose `RTCPeerConnection` natively. Use one of:

- [`node-datachannel`](https://github.com/murat-dogan/node-datachannel):
  small, stable, libdatachannel under the hood
- [`@roamhq/wrtc`](https://www.npmjs.com/package/@roamhq/wrtc): a fork of
  the `wrtc` bindings to Google's WebRTC library

Wrap the chosen library behind an `IRtcFactory`. The package only depends
on a narrow surface (`createPeerConnection`, `createDataChannel`,
`addEventListener`):

```ts
import type { IRtcFactory, IPeerConnection, IDataChannel } from '@shade/transport-webrtc';

// Pseudo-adapter for node-datachannel. The wrapping itself is left to you.
class NodeDataChannelFactory implements IRtcFactory {
  createPeerConnection(config: unknown): IPeerConnection {
    // Return an adapter that wraps a node-datachannel PeerConnection and
    // exposes createDataChannel / addEventListener (IDataChannel).
    throw new Error('NodeDataChannelFactory.createPeerConnection: not implemented');
  }
}

// `iceServers` has the same shape as in the browser quick start.
shade.configureWebRTC({ factory: new NodeDataChannelFactory(), iceServers });
```

## Connection flow

```
Alice initiates                            Bob receives
───────────────                            ────────────
1. createOffer() → SDP
                                           2. shade.send delivers offer
                                              → Bob.createAnswer()
3. shade.send delivers answer
4. setRemoteDescription(answer)
5. trickle ICE candidates  ←───────────→   6. trickle ICE candidates
7. DataChannel onopen                      7. DataChannel onopen
```

All four signaling kinds (`shade.webrtc-offer/v1`, `shade.webrtc-answer/v1`,
`shade.webrtc-ice/v1`, `shade.webrtc-bye/v1`) ride the existing Shade
ratchet, so the relay sees only ciphertext envelopes.

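
How a peer reacts to each of the four signaling kinds can be sketched as a small dispatch table. This is only an illustration under assumptions: `isSignalingKind` and `nextAction` are hypothetical helpers, not part of `@shade/transport-webrtc`, and the only values taken from this document are the four kind strings. The actions named are the standard `RTCPeerConnection` calls.

```ts
// The four kind strings come from the text above; everything else in
// this block is a hypothetical illustration.
const SIGNALING_KINDS = [
  'shade.webrtc-offer/v1',
  'shade.webrtc-answer/v1',
  'shade.webrtc-ice/v1',
  'shade.webrtc-bye/v1',
] as const;

type SignalingKind = (typeof SIGNALING_KINDS)[number];

// Narrow an arbitrary decrypted message kind to one of the four signaling kinds.
function isSignalingKind(kind: string): kind is SignalingKind {
  return (SIGNALING_KINDS as readonly string[]).indexOf(kind) !== -1;
}

// Which standard RTCPeerConnection step a received message triggers.
function nextAction(kind: SignalingKind): string {
  switch (kind) {
    case 'shade.webrtc-offer/v1':
      return 'setRemoteDescription(offer), then createAnswer()';
    case 'shade.webrtc-answer/v1':
      return 'setRemoteDescription(answer)';
    case 'shade.webrtc-ice/v1':
      return 'addIceCandidate(candidate)';
    case 'shade.webrtc-bye/v1':
      return 'close()';
  }
}
```

Messages whose kind fails the guard would go to the normal application message handler rather than the connection manager.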
### Glare resolution

If both peers call `getOrCreate()` simultaneously, the manager uses a
lexicographic tiebreak: the side with the smaller address wins the
caller role; the side with the larger address closes its outgoing
connection and accepts the inbound offer instead. Both peers ultimately
converge on a single `WebRtcConnection`.

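
The tiebreak rule can be sketched as a pure function. This is a minimal sketch, not the manager's actual code: `resolveGlare` and `GlareRole` are hypothetical names, and it assumes addresses are compared as plain strings with JavaScript's default (UTF-16 code unit) ordering.

```ts
type GlareRole = 'caller' | 'callee';

// Decide which role the local side keeps when both sides sent an offer.
// Assumption: addresses are plain strings compared with the `<` operator.
function resolveGlare(localAddress: string, remoteAddress: string): GlareRole {
  if (localAddress === remoteAddress) {
    throw new Error('glare resolution needs two distinct addresses');
  }
  // Smaller address keeps its outgoing offer; larger address drops its own
  // offer and answers the inbound one.
  return localAddress < remoteAddress ? 'caller' : 'callee';
}
```

Because both peers evaluate the same comparison on the same pair of addresses, they always reach opposite roles without exchanging any extra message.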
## Backpressure

The `WebRtcTransferTransport` polls `RTCDataChannel.bufferedAmount` and
suspends new sends once the buffer crosses `backpressureThresholdBytes`
(default 4 MiB). This avoids SCTP queue runaway when the application
pushes faster than the network can drain. Tune lower for
memory-constrained clients (mobile / extension contexts).

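
The polling approach described above can be sketched as follows. This is a simplified illustration, not the transport's implementation: `BufferedChannel`, `sendWithBackpressure`, and the 10 ms polling interval are assumptions made for the example. Only `bufferedAmount` and the 4 MiB default come from the text.

```ts
// Minimal slice of RTCDataChannel that the sketch needs.
interface BufferedChannel {
  readonly bufferedAmount: number;
  send(data: Uint8Array): void;
}

// 4 MiB, the default named in the text above.
const DEFAULT_BACKPRESSURE_THRESHOLD_BYTES = 4 * 1024 * 1024;

// Wait until the channel's send buffer is at or below the threshold, then send.
// Returns how many times it had to poll before sending.
async function sendWithBackpressure(
  channel: BufferedChannel,
  frame: Uint8Array,
  thresholdBytes: number = DEFAULT_BACKPRESSURE_THRESHOLD_BYTES,
  pollIntervalMs: number = 10, // hypothetical polling interval
): Promise<number> {
  let polls = 0;
  while (channel.bufferedAmount > thresholdBytes) {
    polls += 1;
    await new Promise<void>((resolve) => setTimeout(resolve, pollIntervalMs));
  }
  channel.send(frame);
  return polls;
}
```

Browsers also offer an event-driven alternative via `bufferedAmountLowThreshold` and the `bufferedamountlow` event. Polling is shown here only because it matches the description above.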
## Auto-fallback

Configuring WebRTC wires `MultiTransportFallback([webrtc, http])` as the
engine's transport. The chain is sticky-after-first-failure: when WebRTC
raises a `TransferTransportError` (timeout, ICE failed, data channel
closed, frame too large), the fallback advances to HTTP and stays there
for the lifetime of the engine.

For three-tier composition (e.g. WebRTC → WebSocket → HTTP), build the
fallback yourself and pass a custom transport via the engine deps:

```ts
import { MultiTransportFallback } from '@shade/sdk';

// rtcTransport, wsTransport, httpTransport and metrics are assumed to be
// constructed elsewhere in your application.
const stack = new MultiTransportFallback([
  { name: 'webrtc', transport: rtcTransport },
  { name: 'ws', transport: wsTransport },
  { name: 'http', transport: httpTransport },
]);
stack.onSwitch((from, to) => metrics.observe('shade.transport.demoted', { from, to }));
```

The `WebRtcConnectionManager`'s connect timeout (default 30 s) is the
upper bound on how long the chain dwells on WebRTC before demoting. The
V3.11 acceptance criterion is "P2P dead → HTTP within 5 s". Set
`connectTimeoutMs: 4_000` in your `configureWebRTC()` call to keep the
upper bound at 4 seconds and meet the SLO with margin.

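
The sticky-after-first-failure behaviour can be illustrated with a small stand-alone class. This is a sketch under assumptions, not the real `MultiTransportFallback`: `StickyFallbackSketch` and `NamedTransport` are hypothetical, and for brevity the sketch demotes on any thrown error, whereas the text above ties demotion to `TransferTransportError`.

```ts
interface NamedTransport<T> {
  name: string;
  send(payload: T): Promise<void>;
}

class StickyFallbackSketch<T> {
  // One entry per demotion, in the order they happened.
  readonly failures: { name: string; reason: string }[] = [];
  private readonly chain: NamedTransport<T>[];
  private index = 0;

  constructor(chain: NamedTransport<T>[]) {
    if (chain.length === 0) throw new Error('need at least one transport');
    this.chain = chain;
  }

  get activeName(): string {
    return this.chain[this.index].name;
  }

  // Try the active transport; on failure, advance and retry the SAME payload
  // on the next one. The index never moves back, which makes the chain sticky.
  async send(payload: T): Promise<void> {
    for (;;) {
      const current = this.chain[this.index];
      try {
        await current.send(payload);
        return;
      } catch (err) {
        this.failures.push({ name: current.name, reason: String(err) });
        if (this.index === this.chain.length - 1) throw err; // nothing left to try
        this.index += 1;
      }
    }
  }
}
```

Retrying the failed payload on the next transport is what keeps an in-flight chunk from being lost during a demotion. Later sends skip the failed transport entirely.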
## ICE server config

| Setting                | Default                            | When to override |
|------------------------|------------------------------------|------------------|
| `iceServers`           | Google public STUN (×2)            | Production: pin your own STUN to avoid Google rate limits, plus your TURN credentials |
| `iceTransportPolicy`   | `'all'` (host + reflexive + relay) | `'relay'` to mandate TURN-only routing (e.g. inside a corporate network where direct connectivity must never leak) |
| `bundlePolicy`         | spec default (`'balanced'`)        | rarely |

Public STUN works for ~80% of consumer NATs. The remaining 20% (symmetric
NAT, paranoid corporate proxies, mobile carrier-grade NAT) need TURN.
Run your own [coturn](https://github.com/coturn/coturn) or use a managed
provider, but **TURN traffic is real bandwidth through your server**, so
budget accordingly. Shade's wire format is at least as efficient over
TURN as over HTTPS (no per-request HTTP framing overhead).

## NAT-traversal: hopes and realities

What works without TURN, in our testing:

- Same NAT (LAN): always
- Two clients behind cone NATs: usually
- One client behind symmetric NAT, the other behind any cone NAT: usually
- Two clients behind symmetric NATs: rarely; in practice this needs TURN

What doesn't work:

- Two clients behind strict carrier-grade NAT (CGNAT): TURN required
- Clients on networks that block UDP entirely: TURN over TCP/443 required

When in doubt, configure TURN over TCP/443. It shares the HTTPS port and
gets through nearly every middlebox.

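
A configuration that includes a TCP/443 TURN entry, and optionally forces relay-only routing, might be built like this. It is a sketch under assumptions: `buildIceConfig` and the two interfaces are hypothetical helpers, the host and credentials are placeholders, and the port choices (3478 for UDP, 443 for TCP) are illustrative. The URL forms follow the standard `turn:` URI syntax with a `?transport=tcp` parameter.

```ts
// Local copies of the relevant RTCConfiguration fields, so the sketch does
// not depend on DOM typings.
interface IceServerSketch {
  urls: string;
  username?: string;
  credential?: string;
}

interface IceConfigSketch {
  iceServers: IceServerSketch[];
  iceTransportPolicy: 'all' | 'relay';
}

// Build an ICE config with STUN, TURN over UDP, and TURN over TCP/443.
// relayOnly = true corresponds to iceTransportPolicy 'relay' (TURN-only).
function buildIceConfig(
  turn: { host: string; username: string; credential: string },
  relayOnly: boolean,
): IceConfigSketch {
  const auth = { username: turn.username, credential: turn.credential };
  return {
    iceServers: [
      { urls: 'stun:stun.l.google.com:19302' },
      { urls: `turn:${turn.host}:3478`, ...auth },
      // Fallback for networks that block UDP entirely.
      { urls: `turn:${turn.host}:443?transport=tcp`, ...auth },
    ],
    iceTransportPolicy: relayOnly ? 'relay' : 'all',
  };
}
```

The resulting object has the same `iceServers` shape used in the quick start, so its fields could be spread into the `configureWebRTC()` options alongside `factory`.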
## Diagnostics

The SDK exposes the live runtime via `shade.getWebRtcRuntime()`:

```ts
const runtime = shade.getWebRtcRuntime();
if (runtime !== null) {
  console.log('active transport:', runtime.fallback.activeName);
  console.log('peers:', [...runtime.manager.byPeer ?? []]);

  runtime.fallback.onSwitch((from, to) => {
    console.warn(`shade transport demoted ${from} → ${to}`);
  });
}
```

The `failures` array on `MultiTransportFallback` records every
demotion's reason. Wire it to your observability backend to track
NAT/TURN problems in production.

## Sample code

End-to-end test using `MemoryRtcFactory` (no real network):

```ts
import { MemoryRtcFactory } from '@shade/transport-webrtc';

// alice and bob are two Shade instances; both share one in-memory factory.
const factory = new MemoryRtcFactory();
alice.configureWebRTC({ factory });
bob.configureWebRTC({ factory });

await alice.upload({ to: 'bob', input: bytes }); // → P2P loopback
```

See `packages/shade-sdk/tests/webrtc-integration.test.ts` for the full
loopback test, `webrtc-failover.test.ts` for the auto-fallback test, and
`packages/shade-transport-webrtc/tests/` for the unit tests covering
wire format, signaling, glare, and TURN-only configuration.

## Wire format inside the DataChannel

The DataChannel is a single bidirectional pipe shared by every in-flight
stream between two peers. Each frame is a self-describing binary blob:

```
client → server
───────────────
0x01  chunk         reqId(16) sid(16) lane(u32) seq(u64) env(...)
0x02  resume-query  reqId(16) sid(16)
0x03  ping          reqId(16) nonce(u64)

server → client
───────────────
0x81  chunk-ack     reqId(16) lastSeq(u32) bytesRecv(u32)
0x82  resume-state  reqId(16) jsonBody(utf-8)
0x83  pong          reqId(16) nonce(u64)
0xFE  error         reqId(16) jsonBody(utf-8)
```

`reqId` is a 16-byte random correlation token; the responder echoes it
verbatim so multiple in-flight requests can be matched without a stream
multiplexer on top of SCTP.

The wire format matches `ShadeTransferWsTransport` exactly, so adapters
for either transport can interoperate by translating between SCTP
message framing and WS binary frames at the byte level.

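
As an illustration of the frame layout, here is a sketch that encodes and decodes the `chunk-ack` frame (`0x81`). The field order and sizes follow the listing above. The byte order is not stated in this document, so big-endian is an assumption here, and `encodeChunkAck` / `decodeChunkAck` are hypothetical helpers rather than the package's codec.

```ts
const FRAME_CHUNK_ACK = 0x81;
const REQ_ID_BYTES = 16;
// type(1) + reqId(16) + lastSeq(u32) + bytesRecv(u32)
const CHUNK_ACK_LENGTH = 1 + REQ_ID_BYTES + 4 + 4;

interface ChunkAck {
  reqId: Uint8Array; // 16-byte correlation token echoed from the request
  lastSeq: number;   // u32
  bytesRecv: number; // u32
}

function encodeChunkAck(ack: ChunkAck): Uint8Array {
  if (ack.reqId.length !== REQ_ID_BYTES) throw new Error('reqId must be 16 bytes');
  const frame = new Uint8Array(CHUNK_ACK_LENGTH);
  const view = new DataView(frame.buffer);
  frame[0] = FRAME_CHUNK_ACK;
  frame.set(ack.reqId, 1);
  // Assumption: integers are big-endian (littleEndian = false).
  view.setUint32(1 + REQ_ID_BYTES, ack.lastSeq, false);
  view.setUint32(1 + REQ_ID_BYTES + 4, ack.bytesRecv, false);
  return frame;
}

function decodeChunkAck(frame: Uint8Array): ChunkAck {
  if (frame.length !== CHUNK_ACK_LENGTH || frame[0] !== FRAME_CHUNK_ACK) {
    throw new Error('not a chunk-ack frame');
  }
  // Respect byteOffset in case the frame is a view into a larger buffer.
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  return {
    reqId: frame.slice(1, 1 + REQ_ID_BYTES),
    lastSeq: view.getUint32(1 + REQ_ID_BYTES, false),
    bytesRecv: view.getUint32(1 + REQ_ID_BYTES + 4, false),
  };
}
```

A receiver would look up the pending request by the echoed `reqId` and resolve it with the decoded fields.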
## Limits

- Max DataChannel message: **256 KiB** (Chrome's safe ceiling). Configure
  `chunkSize` ≤ 256 KiB on uploads that prefer WebRTC. The transport
  raises a clear error when an envelope exceeds the cap; the engine then
  retries via HTTP.
- One DataChannel per peer pair (label `shade-transfer/v1`). Multiple
  in-flight transfers from the same peer pair multiplex via `reqId`.
- No SFU/MCU: group transfers fan out at the application layer.
- DTLS-fingerprint binding to Shade's identity-fingerprint is **not** in
  V3.11. It is deferred as hardening work: the DataChannel is already
  inside a ratchet-authenticated session, so the practical exposure
  window is limited to in-process MITM scenarios that already require
  malware.

## Migration

Opt-in. If you don't call `configureWebRTC`, your existing HTTP/WS
transport stack is unchanged.

When you do opt in, the **engine must not be built yet**. The easy way
to ensure this is to call `configureWebRTC` before `configureTransfers`
or before any of `upload` / `onIncomingTransfer` / `transferRoute`.
Receiver-side: the WebRTC manager wires receiver-hooks into the engine
during `engine()` construction, so make sure both sides call
`configureWebRTC` and `configureTransfers` before the first
`transferRoute()` call.

## Related modules

- [`@shade/transfer`](../packages/shade-transfer/): engine, lane queues,
  HTTP transport, multi-fallback wrapper.
- [`@shade/streams`](./streams.md): chunk encryption + lane key
  derivation. Indirect dep.
- [`@shade/transport-bridge`](./transport.md): V3.7 bridge layer (WS /
  SSE / long-poll for control envelopes). Orthogonal to V3.11.
- [V5.0: real-time channels](./V5.0.md): downstream consumer of the
  same DataChannel for voice/video/broadcast.