Shade/docs/webrtc.md
Sterister e6fdf31b49
2026-05-03 18:35:35 +02:00

Shade Transport — WebRTC P2P Layer (V3.11)

@shade/transport-webrtc adds a direct peer-to-peer chunk transport on top of the existing @shade/transfer engine. When two clients can reach each other through NAT/firewall, large transfers (@shade/files, @shade/transfer) flow over a single bidirectional RTCDataChannel instead of paying the round-trip cost of HTTP-relayed POSTs. When NAT traversal fails, the multi-transport fallback automatically demotes the chain back to HTTP — without losing any chunks already in flight.

The wire payload is unchanged: every chunk is still a Shade ratchet / streams envelope (AES-256-GCM under HKDF-derived per-lane keys). DTLS only protects the WebRTC transport hop (DataChannels run over SCTP inside DTLS; SRTP applies to media tracks, not DataChannels), so turning on a TURN relay does not give the relay operator access to plaintext.

┌───────────────────────────────────────────────────────────────┐
│                     application code                          │
│                                                               │
│      shade.upload({ to: 'bob', input: file })                 │
└────────────────────────────────┬──────────────────────────────┘
                                 │
                       ┌─────────▼──────────┐
                       │  TransferEngine    │
                       └─────────┬──────────┘
                                 │  ITransferTransport
                       ┌─────────▼──────────┐
                       │ MultiTransport     │
                       │ Fallback (sticky)  │
                       └────┬─────┬─────┬───┘
                            │     │     │
              ┌─────────────▼┐  ┌─▼─┐  ┌▼────────────┐
              │ WebRtcTransfer│  │WS │  │ ShadeTransfer│
              │ Transport     │  │…  │  │ HttpTransport│
              └─────┬─────────┘  └───┘  └──────────────┘
                    │ DataChannel binary frames
              ┌─────▼─────────┐
              │ WebRtcConn    │ ←──── SDP/ICE over Shade.send
              │ Manager       │       (ratchet-encrypted)
              └───────────────┘

When to reach for it

| Scenario | Default (HTTP) | + WebRTC |
| --- | --- | --- |
| Two clients on the same LAN | server-relayed | direct, P2P |
| One peer behind enterprise NAT | only works | TURN-relay |
| Both peers behind symmetric NAT | works | falls back to HTTP |
| One peer offline | inbox-buffered | inbox-buffered (HTTP path) |
| Browser extension with strict CSP | works | works (uses RTCPeerConnection) |

Use cases:

  • @shade/transfer upload of multi-MB / multi-GB files
  • @shade/files read/write of large inline blobs
  • Future: @shade/streams real-time channels (V5.0 reuses this same DataChannel)

Quick start (browser)

import { createShade } from '@shade/sdk';
import { nativeRtcFactory } from '@shade/transport-webrtc';

const shade = await createShade({ prekeyServer: 'https://prekey.example.com' });

// IMPORTANT: configureWebRTC MUST be called BEFORE the first upload() /
// onIncomingTransfer() / transferRoute() call, because those build the
// transfer engine — and the engine captures its transport stack at
// construction time.
shade.configureWebRTC({
  factory: nativeRtcFactory(),
  // Optional — defaults to two public Google STUN servers.
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    {
      urls: 'turn:turn.example.com:3478',
      username: 'shade',
      credential: 'YOUR_TURN_SECRET',
    },
  ],
});

shade.configureTransfers({
  resolveBaseUrl: async (peer) => directory.lookup(peer),
});

await shade.upload({ to: 'bob', input: file });   // → P2P when NAT allows

Quick start (Bun / Node)

Bun does not yet expose RTCPeerConnection natively, so a third-party WebRTC binding is needed (the adapter sketch below uses node-datachannel as one example).

Wrap the chosen library behind an IRtcFactory (the package only depends on a narrow surface — createPeerConnection, createDataChannel, addEventListener):

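import type { IRtcFactory, IPeerConnection } from '@shade/transport-webrtc';

// Pseudo-adapter for node-datachannel. The body is left unimplemented on purpose.
class NodeDataChannelFactory implements IRtcFactory {
  createPeerConnection(config: unknown): IPeerConnection {
    // Return an adapter that wraps a node-datachannel PeerConnection here.
    throw new Error('NodeDataChannelFactory.createPeerConnection: not implemented');
  }
}

// iceServers has the same shape as in the browser quick start above.
shade.configureWebRTC({ factory: new NodeDataChannelFactory(), iceServers });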

Connection flow

Alice initiates                                Bob receives
───────────────                                ────────────
1. createOffer() → SDP                         2. shade.send delivers offer
                                                  → Bob.createAnswer()
3. shade.send delivers answer                  4. setRemoteDescription(answer)
5. trickle ICE candidates (both directions)    6. trickle ICE candidates
7. DataChannel onopen (both sides)             7. DataChannel onopen

All four signaling kinds (shade.webrtc-offer/v1, shade.webrtc-answer/v1, shade.webrtc-ice/v1, shade.webrtc-bye/v1) ride the existing Shade ratchet — the relay sees only ciphertext envelopes.
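
As a rough illustration of how a receiver might route these messages, the sketch below recognises the four kinds and maps each to the step of the flow it drives. Only the kind strings come from this document; `isWebRtcSignalKind` and `describeSignal` are hypothetical names, not part of the package API.

```typescript
// The four kind strings are taken from the text above; the rest is illustrative.
const WEBRTC_SIGNAL_KINDS = [
  'shade.webrtc-offer/v1',
  'shade.webrtc-answer/v1',
  'shade.webrtc-ice/v1',
  'shade.webrtc-bye/v1',
] as const;

type WebRtcSignalKind = (typeof WEBRTC_SIGNAL_KINDS)[number];

// Type guard: true only for the four WebRTC signaling kinds.
function isWebRtcSignalKind(kind: string): kind is WebRtcSignalKind {
  return (WEBRTC_SIGNAL_KINDS as readonly string[]).includes(kind);
}

// Hypothetical dispatcher: names the action each kind triggers on the receiver.
function describeSignal(kind: WebRtcSignalKind): string {
  switch (kind) {
    case 'shade.webrtc-offer/v1':
      return 'apply remote offer, create answer';
    case 'shade.webrtc-answer/v1':
      return 'apply remote answer';
    case 'shade.webrtc-ice/v1':
      return 'add remote ICE candidate';
    case 'shade.webrtc-bye/v1':
      return 'close the connection';
  }
}
```

Any message whose kind fails the guard would simply be handled by the normal application message path.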

Glare resolution

If both peers call getOrCreate() simultaneously, the manager uses a lexicographic tiebreak: the side with the smaller address keeps the caller role; the side with the larger address closes its outgoing connection and accepts the inbound offer instead. Both peers ultimately converge on a single WebRtcConnection.
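
The tiebreak can be sketched as a pure function. This is a minimal sketch, assuming addresses are compared as plain strings with the built-in `<` operator; the real manager may normalise addresses first, and `resolveGlare` is a hypothetical name.

```typescript
type GlareRole = 'caller' | 'callee';

// Decide which role the local side keeps when both peers have sent an offer.
// Assumption: plain string comparison (UTF-16 code unit order).
function resolveGlare(localAddress: string, remoteAddress: string): GlareRole {
  if (localAddress === remoteAddress) {
    throw new Error('glare resolution needs two distinct addresses');
  }
  // Smaller address keeps the caller role; larger address yields and
  // accepts the inbound offer instead.
  return localAddress < remoteAddress ? 'caller' : 'callee';
}
```

Because both sides evaluate the same comparison with the arguments swapped, they always reach opposite answers, which is what lets them converge on one connection.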

Backpressure

The WebRtcTransferTransport polls RTCDataChannel.bufferedAmount and suspends new sends once the buffer crosses backpressureThresholdBytes (default 4 MiB). This avoids SCTP queue runaway when the application pushes faster than the network can drain. Tune lower for memory-constrained clients (mobile / extension contexts).
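
The polling idea looks roughly like the sketch below. It is not the transport's actual implementation: `BufferedChannel` is a local stand-in for the subset of RTCDataChannel that is needed, `pollIntervalMs` is a hypothetical knob, and treating "at or above the threshold" as the pause condition is an assumption.

```typescript
// Local stand-in for the subset of RTCDataChannel this sketch needs.
interface BufferedChannel {
  readonly bufferedAmount: number;
  send(data: Uint8Array): void;
}

// 4 MiB, the default quoted in the text above.
const DEFAULT_BACKPRESSURE_THRESHOLD_BYTES = 4 * 1024 * 1024;

// True when the sender should stop queueing new frames.
function shouldPause(
  channel: BufferedChannel,
  thresholdBytes: number = DEFAULT_BACKPRESSURE_THRESHOLD_BYTES,
): boolean {
  return channel.bufferedAmount >= thresholdBytes;
}

// Poll until the buffer has drained below the threshold, then send one frame.
async function sendWithBackpressure(
  channel: BufferedChannel,
  frame: Uint8Array,
  thresholdBytes: number = DEFAULT_BACKPRESSURE_THRESHOLD_BYTES,
  pollIntervalMs: number = 10, // hypothetical, not a documented option
): Promise<void> {
  while (shouldPause(channel, thresholdBytes)) {
    await new Promise<void>((resolve) => setTimeout(resolve, pollIntervalMs));
  }
  channel.send(frame);
}
```

A lower threshold trades throughput for a smaller worst-case send queue, which is the tuning the paragraph above recommends for constrained clients.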

Auto-fallback

Configuring WebRTC wires MultiTransportFallback([webrtc, http]) as the engine's transport. The chain is sticky-after-first-failure: when WebRTC raises a TransferTransportError (timeout, ICE failed, data channel closed, frame too large), the fallback advances to HTTP and stays there for the lifetime of the engine.
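
To make "sticky-after-first-failure" concrete, here is a toy model of the behaviour. It is a simplified synchronous sketch, not the real MultiTransportFallback: `StickyFallback` and `NamedTransport` are hypothetical, and the `activeName` / `failures` members only mirror the names used elsewhere in this document.

```typescript
// Toy model: a transport is a function that sends one chunk or throws.
interface NamedTransport {
  name: string;
  send(chunk: string): string;
}

class StickyFallback {
  private readonly chain: NamedTransport[];
  private index = 0;
  readonly failures: { name: string; reason: string }[] = [];

  constructor(chain: NamedTransport[]) {
    if (chain.length === 0) throw new Error('fallback chain must not be empty');
    this.chain = chain;
  }

  get activeName(): string {
    return this.chain[this.index].name;
  }

  // Try the active transport. On failure, record the reason, demote for good,
  // and retry the same chunk on the next transport so it is not dropped.
  send(chunk: string): string {
    for (;;) {
      const active = this.chain[this.index];
      try {
        return active.send(chunk);
      } catch (err) {
        this.failures.push({ name: active.name, reason: String(err) });
        if (this.index === this.chain.length - 1) throw err; // nothing left to try
        this.index += 1; // sticky: never moves back up the chain
      }
    }
  }
}
```

The key property is that the index only ever increases, so a failed WebRTC attempt is paid for once rather than on every chunk.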

For three-tier composition (e.g. WebRTC → WebSocket → HTTP), build the fallback yourself and pass a custom transport via the engine deps:

import { MultiTransportFallback } from '@shade/sdk';

const stack = new MultiTransportFallback([
  { name: 'webrtc', transport: rtcTransport },
  { name: 'ws',     transport: wsTransport },
  { name: 'http',   transport: httpTransport },
]);
stack.onSwitch((from, to) => metrics.observe('shade.transport.demoted', { from, to }));

The WebRtcConnectionManager's connect timeout (default 30 s) is the upper bound on how long the chain dwells on WebRTC before demoting. The V3.11 acceptance criterion is "P2P dead → HTTP within 5 s"; set connectTimeoutMs: 4_000 in your configureWebRTC() call to cap the dwell time at 4 seconds and meet that target with margin.

ICE server config

| Setting | Default | When to override |
| --- | --- | --- |
| iceServers | Google public STUN (×2) | Production: pin your own STUN to avoid Google rate limits, plus your TURN credentials |
| iceTransportPolicy | 'all' (host + reflexive + relay) | 'relay' to mandate TURN-only routing (e.g. inside a corporate network where direct connectivity must never leak) |
| bundlePolicy | spec default ('balanced') | rarely |

Public STUN works for ~80% of consumer NATs. The remaining 20% (symmetric NAT, paranoid corporate proxies, mobile carrier-grade NAT) need TURN. Run your own coturn or use a managed provider — but TURN traffic is real bandwidth through your server, so budget accordingly. Shade's wire format is at least as efficient over TURN as over HTTPS (no per-request HTTP framing overhead).

NAT-traversal: hopes and realities

What works without TURN, in our testing:

  • Same NAT (LAN): always
  • Two clients behind cone NATs: usually
  • One client behind symmetric NAT, the other behind any cone NAT: usually
  • Two clients behind symmetric NATs: rarely — falls back to TURN

What doesn't work:

  • Two clients behind strict carrier-grade NAT (CGNAT): TURN required
  • Clients on networks that block UDP entirely: TURN over TCP/443 required

When in doubt, configure TURN over TCP/443 — it impersonates HTTPS and gets through nearly every middlebox.
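
A configuration along those lines might look like the sketch below. The host name, credentials and the `buildIceConfig` helper are placeholders, and the local `IceServer` / `IceConfig` interfaces only copy the standard RTCIceServer and RTCConfiguration fields used here so the example type-checks outside a browser.

```typescript
// Local copies of the standard WebRTC config fields used in this sketch.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface IceConfig {
  iceServers: IceServer[];
  iceTransportPolicy: 'all' | 'relay';
}

// Hypothetical helper: one self-hosted server offering STUN, TURN over UDP,
// and TURN over TCP on port 443 for networks that block UDP.
function buildIceConfig(
  turnHost: string,
  username: string,
  credential: string,
  relayOnly: boolean = false,
): IceConfig {
  return {
    iceServers: [
      { urls: `stun:${turnHost}:3478` },
      {
        urls: [
          `turn:${turnHost}:3478?transport=udp`,
          `turn:${turnHost}:443?transport=tcp`, // use the turns: scheme for TURN over TLS
        ],
        username,
        credential,
      },
    ],
    // 'relay' forces every candidate through TURN; 'all' still allows direct paths.
    iceTransportPolicy: relayOnly ? 'relay' : 'all',
  };
}
```

The resulting `iceServers` array has the shape used in the quick start; whether `iceTransportPolicy` is passed alongside it in the same call is assumed from the settings table above.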

Diagnostics

The SDK exposes the live runtime via shade.getWebRtcRuntime():

const runtime = shade.getWebRtcRuntime();
if (runtime !== null) {
  console.log('active transport:', runtime.fallback.activeName);
  console.log('peers:', [...runtime.manager.byPeer ?? []]);

  runtime.fallback.onSwitch((from, to) => {
    console.warn(`shade transport demoted ${from} → ${to}`);
  });
}

The failures array on MultiTransportFallback records every demotion's reason — wire it to your observability backend to track NAT/TURN problems in production.

Sample code

End-to-end test using MemoryRtcFactory (no real network):

import { MemoryRtcFactory } from '@shade/transport-webrtc';

const factory = new MemoryRtcFactory();
alice.configureWebRTC({ factory });
bob.configureWebRTC({ factory });

await alice.upload({ to: 'bob', input: bytes });    // → P2P loopback

See packages/shade-sdk/tests/webrtc-integration.test.ts for the full loopback test, webrtc-failover.test.ts for the auto-fallback test, and packages/shade-transport-webrtc/tests/ for the unit tests covering wire format, signaling, glare, and TURN-only configuration.

Wire format inside the DataChannel

The DataChannel is a single bidirectional pipe shared by every in-flight stream between two peers. Each frame is a self-describing binary blob:

client → server                                                  server → client
───────────────                                                  ───────────────
0x01 chunk         reqId(16) sid(16) lane(u32) seq(u64) env(...) 0x81 chunk-ack    reqId(16) lastSeq(u32) bytesRecv(u32)
0x02 resume-query  reqId(16) sid(16)                              0x82 resume-state reqId(16) jsonBody(utf-8)
0x03 ping          reqId(16) nonce(u64)                           0x83 pong         reqId(16) nonce(u64)
                                                                  0xFE error        reqId(16) jsonBody(utf-8)

reqId is a 16-byte random correlation token; the responder echoes it verbatim so multiple in-flight requests can be matched without a stream multiplexer on top of SCTP.
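
As an illustration of the chunk frame layout (0x01), the sketch below packs and unpacks the fields in the order listed above. The byte order is not stated in this document, so big-endian is an assumption, and `encodeChunkFrame` / `decodeChunkFrame` are hypothetical helpers rather than the package's codec. The u64 seq is carried as a JavaScript number, so only values up to Number.MAX_SAFE_INTEGER are accepted by the encoder.

```typescript
const FRAME_CHUNK = 0x01;
const CHUNK_HEADER_BYTES = 1 + 16 + 16 + 4 + 8; // type + reqId + sid + lane + seq = 45
const TWO_POW_32 = 0x100000000;

interface ChunkFrame {
  reqId: Uint8Array; // 16 bytes
  sid: Uint8Array;   // 16 bytes
  lane: number;      // u32
  seq: number;       // u64 on the wire, safe integers only in this sketch
  env: Uint8Array;   // opaque Shade envelope
}

// Assumption: integers are big-endian (DataView's default).
function encodeChunkFrame(f: ChunkFrame): Uint8Array {
  if (f.reqId.length !== 16 || f.sid.length !== 16) {
    throw new Error('reqId and sid must be 16 bytes each');
  }
  if (!Number.isSafeInteger(f.seq) || f.seq < 0) {
    throw new Error('seq must be a non-negative safe integer');
  }
  const out = new Uint8Array(CHUNK_HEADER_BYTES + f.env.length);
  const view = new DataView(out.buffer);
  out[0] = FRAME_CHUNK;
  out.set(f.reqId, 1);
  out.set(f.sid, 17);
  view.setUint32(33, f.lane);
  view.setUint32(37, Math.floor(f.seq / TWO_POW_32)); // high 32 bits of seq
  view.setUint32(41, f.seq >>> 0);                    // low 32 bits of seq
  out.set(f.env, CHUNK_HEADER_BYTES);
  return out;
}

function decodeChunkFrame(bytes: Uint8Array): ChunkFrame {
  if (bytes.length < CHUNK_HEADER_BYTES || bytes[0] !== FRAME_CHUNK) {
    throw new Error('not a chunk frame');
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    reqId: bytes.slice(1, 17),
    sid: bytes.slice(17, 33),
    lane: view.getUint32(33),
    seq: view.getUint32(37) * TWO_POW_32 + view.getUint32(41),
    env: bytes.slice(CHUNK_HEADER_BYTES),
  };
}
```

With this layout the fixed header costs 45 bytes per frame, which is worth keeping in mind when picking a chunk size close to the DataChannel message cap.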

The wire matches ShadeTransferWsTransport exactly. Adapters for either transport can interoperate by translating between SCTP message framing and WS binary frames at the byte level.

Limits

  • Max DataChannel message: 256 KiB (Chrome's safe ceiling). Configure chunkSize ≤ 256 KiB on uploads that prefer WebRTC. The transport raises a clear error when an envelope exceeds the cap; the engine then retries via HTTP.
  • One DataChannel per peer pair (label shade-transfer/v1). Multiple in-flight transfers from the same peer pair multiplex via reqId.
  • No SFU/MCU — group transfers fan out at the application layer.
  • DTLS-fingerprint binding to Shade's identity-fingerprint is not in V3.11 (deferred as hardening work — DataChannel is already inside a ratchet-authenticated session, so the practical exposure window is limited to in-process MITM scenarios that already require malware).
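
The reqId multiplexing mentioned in the second limit above can be pictured as a table of pending requests keyed by the 16-byte token. This is a minimal sketch of that idea only; `PendingRequests` and `reqIdKey` are hypothetical names and say nothing about how the transport stores its state internally.

```typescript
// Hex-encode a reqId so it can serve as a Map key.
function reqIdKey(reqId: Uint8Array): string {
  let key = '';
  for (let i = 0; i < reqId.length; i++) {
    key += (reqId[i] + 0x100).toString(16).slice(1); // two hex digits per byte
  }
  return key;
}

class PendingRequests<T> {
  private readonly pending = new Map<string, (reply: T) => void>();

  // Remember who is waiting for the reply that will echo this reqId.
  register(reqId: Uint8Array, onReply: (reply: T) => void): void {
    const key = reqIdKey(reqId);
    if (this.pending.has(key)) throw new Error('duplicate reqId');
    this.pending.set(key, onReply);
  }

  // Deliver a reply to its waiter. Returns false for an unknown or stale reqId.
  resolve(reqId: Uint8Array, reply: T): boolean {
    const key = reqIdKey(reqId);
    const onReply = this.pending.get(key);
    if (onReply === undefined) return false;
    this.pending.delete(key);
    onReply(reply);
    return true;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

Because replies are matched by token rather than by arrival order, several transfers can share the single DataChannel without a separate stream multiplexer.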

Migration

Opt-in. If you don't call configureWebRTC, your existing HTTP/WS transport stack is unchanged.

When you do opt in, the engine must not have been built yet. The easiest way to ensure this is to call configureWebRTC before configureTransfers, or at least before any of upload / onIncomingTransfer / transferRoute. Receiver side: the WebRTC manager wires its receiver hooks into the engine during engine() construction, so make sure both sides call configureWebRTC + configureTransfers before the first transferRoute() call.