release(v4.8.5): kill flushOnce 15s success-backoff + per-recipient parallel drain

Prism filed a per-recipient-flush-concurrency FR pointing at
serial-per-flush. Investigation surfaced the actual culprit:
`scheduleFlush` was using a 15 s backoff on **both** the success and
failure paths, so envelopes enqueued *during* an in-flight flush
sat ~15 s behind the next drain — visible as "10 s of silence then
25-frame burst" on the receiving side under sustained sender output.

Two fixes:

1. `scheduleFlush` now uses 0 ms delay when `flushOnce` delivered
   ≥1 envelope and more is queued (network healthy → drain
   remainder immediately). 15 s reserved for the actual failure
   case where every attempt this round failed. `flushOnce` returns
   `{ delivered, remaining } | null` so concurrent-flush early
   returns don't double-schedule.

2. `flushOnce` groups the outgoing queue by `recipientAddress` and
   drains buckets via `Promise.all`. Per-peer order preserved
   (sequential within a bucket); a slow PUT to recipient A no
   longer head-of-line-blocks frames bound for B.

`Inbox.tick` public shape unchanged. `OutgoingQueueStore`
implementations see the same per-entry list/remove/bumpAttempts/
size contract; only cross-recipient interleaving changes.

Tests cover (1) 25-envelope burst behind a 100 ms slow PUT drains
within 1 s, and (2) carol's PUT lands within 150 ms even when bob's
PUT stalls 200 ms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 22:56:27 +02:00
parent a98ea8a1bd
commit 3c0db14904
28 changed files with 334 additions and 59 deletions


@@ -5,6 +5,85 @@ All notable changes to Shade are documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [4.8.5] — 2026-05-08 — `Inbox.flushOnce`: kill the 15 s success-backoff + per-recipient parallel drain
Prism filed a "typing-into-a-chatty-shell" UX FR pointing at
serial-per-flush behavior. The investigation surfaced a more
important latent bug: `scheduleFlush` was using a 15 s backoff timer
on **both** the success and failure paths, so any envelopes enqueued
*during* an in-flight flush had to wait ~15 s for the next drain to
fire — visible to Prism's web client as "10 s of silence then a
25-frame burst" whenever the PC sidecar was emitting steady output.
Two fixes ship together:
**(1) `scheduleFlush` distinguishes healthy-drain from all-failed.**
After `flushOnce` returns, if the round delivered ≥1 envelope and
items are still queued, the next flush fires with **0 ms** delay
(network is fine — drain whatever piled up immediately). The 15 s
backoff is reserved for the actual failure case (every attempt this
round threw / was rejected). `flushOnce` now returns
`{ delivered, remaining } | null` so the scheduler can also tell
"someone else is flushing, don't double-schedule" apart from
"queue is empty, idle." Externally-visible API unchanged
(`Inbox.tick()` still returns `{ flushed, received }`).
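
The scheduling decision described in (1) can be sketched as a small pure function. This is only an illustration, not the actual `@shade/inbox` code: the names `FlushResult`, `FAILURE_BACKOFF_MS`, and `nextFlushDelay` are hypothetical, and treating `remaining === 0` as the idle case is an assumption. Only the `{ delivered, remaining } | null` shape and the 0 ms / 15 s delays come from this entry.

```typescript
// Hypothetical names; only the result shape and the two delays are from the changelog.
type FlushResult = { delivered: number; remaining: number } | null;

const FAILURE_BACKOFF_MS = 15_000;

/**
 * Returns the delay in milliseconds before the next flush,
 * or null when this caller should not schedule anything.
 */
function nextFlushDelay(result: FlushResult): number | null {
  // null: another flush is already in flight, so do not double-schedule.
  if (result === null) return null;
  // Assumption: an empty queue means the scheduler goes idle.
  if (result.remaining === 0) return null;
  // Progress was made and more is queued: the network is healthy, drain now.
  if (result.delivered > 0) return 0;
  // Every attempt this round failed: back off to avoid a tight retry loop.
  return FAILURE_BACKOFF_MS;
}
```

Keeping the decision free of timers makes the four cases (in-flight, idle, healthy drain, all-failed) easy to unit-test separately from the code that arms the timer.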
**(2) Per-recipient parallel drain inside `flushOnce`.** The queue
is grouped by `recipientAddress`; each bucket is drained
sequentially (preserves per-peer enqueue order — the relay assigns
`receivedAt` on PUT arrival, so concurrent PUTs to the same peer
would let the second one land first), but distinct buckets run
concurrently via `Promise.all`. Pre-fix, a slow PUT to recipient A
head-of-line-blocked every other recipient's frames. Future N-peer
broadcast fan-outs (multiple devices viewing the same Prism PTY)
benefit immediately; single-recipient deployments are unaffected
since N=1 is the trivial parallel case.
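
A minimal sketch of the grouping-and-drain pattern from (2), under stated assumptions: `QueuedEnvelope`, `groupByRecipient`, `drainByRecipient`, and the `send` callback are hypothetical names, and stopping a bucket at its first failure is an assumption made here to keep per-peer order. The real implementation's `bumpAttempts` / `maxAttempts` handling is not modelled.

```typescript
// Hypothetical entry shape; only `recipientAddress` is named in the changelog.
interface QueuedEnvelope {
  id: string;
  recipientAddress: string;
}

// Bucket entries by recipient, preserving enqueue order inside each bucket.
function groupByRecipient(entries: QueuedEnvelope[]): Map<string, QueuedEnvelope[]> {
  const buckets = new Map<string, QueuedEnvelope[]>();
  for (const entry of entries) {
    const bucket = buckets.get(entry.recipientAddress);
    if (bucket) bucket.push(entry);
    else buckets.set(entry.recipientAddress, [entry]);
  }
  return buckets;
}

// Drain buckets concurrently; entries within one bucket are sent one at a time.
async function drainByRecipient(
  entries: QueuedEnvelope[],
  send: (entry: QueuedEnvelope) => Promise<void>,
): Promise<{ delivered: number; remaining: number }> {
  const buckets = Array.from(groupByRecipient(entries).values());
  const counts = await Promise.all(
    buckets.map(async (bucket) => {
      let delivered = 0;
      for (const entry of bucket) {
        try {
          await send(entry); // sequential: the next PUT waits for this one
          delivered += 1;
        } catch {
          break; // assumption: stop this bucket so later frames cannot overtake
        }
      }
      return delivered;
    }),
  );
  const delivered = counts.reduce((sum, n) => sum + n, 0);
  return { delivered, remaining: entries.length - delivered };
}
```

Because each bucket's loop awaits its own sends, a stalled recipient only delays its own frames; `Promise.all` resolves once every bucket has finished or stopped.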
Reported by Prism (multi-device E2EE terminal). Acceptance: under
sustained typing, web's `recv` rate is roughly proportional to PC's
emit rate, with no multi-second silences punctuated by burst catch-ups.
### Fixed
#### `@shade/inbox` — `scheduleFlush` 15 s success-backoff
- After a successful drain, the next flush is rescheduled with
`delayMs=0` when `delivered > 0`. The 15 s timer is reserved for
rounds where every attempt failed (no progress, avoid tight retry
loop).
- Concurrent `scheduleFlush` calls during an in-flight flush are
detected via `flushOnce` returning `null`; the no-op early return
no longer double-schedules a 15 s retry for a flush that's
already running.
#### `@shade/inbox` — `flushOnce` per-recipient parallelism
- Outgoing queue is grouped by `recipientAddress`; buckets drain
via `Promise.all`. Per-peer order preserved (sequential within a
bucket); cross-peer order has no guarantee in Shade's wire model
to begin with.
- Failure handling unchanged: per-entry `bumpAttempts` /
`maxAttempts` semantics are identical to V4.8.4.
### Tests
- `packages/shade-inbox/tests/client.test.ts`:
1. "burst enqueued during a flush drains immediately, not after
15 s backoff" — slow first PUT (100 ms), pile 24 more during,
assert `pendingCount === 0` within 1 s.
2. "per-recipient parallel drain — slow POST to A does not block
POSTs to B" — `bob` PUT stalls 200 ms; `carol` envelope queued
after; assert `inbox.message_delivered` for carol fires within
150 ms (would be ≥200 ms pre-fix).
### Migration
None. `Inbox.flushOnce` is a private method; the
`{ delivered, remaining } | null` shape is internal. `Inbox.tick`
public return `{ flushed, received }` is unchanged. Apps that hand
custom `OutgoingQueueStore` implementations to `Inbox` see no
contract change — `list()` / `remove()` / `bumpAttempts()` / `size()`
are called the same way per entry; only the *order* of `remove()`
calls across distinct recipients changes (interleaved instead of
strictly sequential).
## [4.8.4] — 2026-05-08 — Server-side cross-channel dedup via `BridgeDeliveryLog`
V4.8.3 shipped the *client-side* cross-channel dedup hook