release(v4.2.1): fix concurrent-ratchet desync via OutboundQueue waiter cursor
Some checks failed
Publish / publish (push) Has been cancelled
Docker build and publish / docker (push) Has been cancelled

Pull-mode httpClient + drainer + parallel RPCs against the same peer
deteriorated after ~10s with `DecryptionError`. Two bugs combined:

- `OutboundQueue.enqueue` woke `drain` waiters with a `since=0`
  snapshot, replaying already-processed events into
  `Shade.acceptTransferEnvelope` → `manager.decrypt` twice. The
  duplicate consumed an already-used skipped key and corrupted the
  Double Ratchet receive chain.

- `ratchetDecrypt` then propagated the corruption: a same-DH
  message behind the chain with no cached skipped key fell through
  to `kdfChainKey` on the ahead state and rewound `chain.counter`,
  permanently desyncing the chain.

Fix `OutboundQueue` to honor each waiter's `since`, and harden
`ratchetDecrypt` so any future duplicate fails cleanly without
mutating state. Adds regression coverage at all three layers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-04 22:58:26 +02:00
parent 7520b11b25
commit b77b7e771c
30 changed files with 380 additions and 29 deletions

View File

@@ -5,6 +5,49 @@ All notable changes to Shade are documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [4.2.1] — 2026-05-04 — Concurrent-ratchet desync under pull-mode drainer
A consumer running `shade.files.httpClient(server, { outboundQueueUrl, ... })`
alongside parallel RPC traffic against the same peer would, after ~10s of
load, see every subsequent message fail with
`DecryptionError: Failed to decrypt message — wrong key or tampered data`.
Two bugs combined to cause this; both are fixed in `4.2.1` with regression
coverage.
### Fixed
#### `@shade/transfer` — `OutboundQueue` waiter cursor
`enqueue` woke pending `drain` waiters with a `since=0` snapshot — the
full event log — instead of using the waiter's own `since`. A poll that
parked at the head and was woken by a fresh enqueue therefore replayed
every event the waiter had already processed. Downstream the queue
fed `Shade.acceptTransferEnvelope`, so the duplicate replayed an
envelope into `manager.decrypt` twice. The second decrypt consumed an
already-used skipped key and corrupted the Double Ratchet receive
chain. Each `PendingWaiter` now records its `since` cursor and is
delivered only events with `id > since`.
#### `@shade/core` — `ratchetDecrypt` defense-in-depth
A same-DH message whose `counter` was already behind the chain — and
that did NOT match a cached skipped key — fell through to a path that
called `kdfChainKey` on the *current* (ahead) chain key and then set
`chain.counter = message.counter + 1`, permanently desyncing the
ratchet so every subsequent decrypt returned wrong-key. Such messages
are now rejected with `DecryptionError` without any state mutation, so
a downstream replay (transport bug, retry, intermitent network) cannot
poison the session.
### Tests
- `packages/shade-files/tests/integration/concurrent-ratchet.test.ts`
100 parallel `httpClient` RPCs while the drainer runs, plus a mixed
workload of 50 RPCs + 50 raw `shade.send` deliveries with Bob
echoing replies through the queue. Both surface the bug pre-fix.
- `packages/shade-transfer/tests/outbound-queue.test.ts` — direct
regression on the waiter `since` cursor.
- `packages/shade-core/tests/ratchet.test.ts` — replay of an
already-decrypted message must throw cleanly without breaking
subsequent decrypts on the same chain.
## [4.2.0] — 2026-05-03 — Pull-mode streams for browser @shade/files
`4.1.0` shipped HTTP RPC for browser clients but capped them at inline