release(v4.8.5): kill flushOnce 15s success-backoff + per-recipient parallel drain

Prism filed a per-recipient-flush-concurrency FR pointing at serial-per-flush. Investigation surfaced the actual culprit: `scheduleFlush` was using a 15 s backoff on **both** the success and failure paths, so envelopes enqueued *during* an in-flight flush sat ~15 s behind the next drain, visible as "10 s of silence then 25-frame burst" on the receiving side under sustained sender output.

Two fixes:

1. `scheduleFlush` now uses 0 ms delay when `flushOnce` delivered ≥1 envelope and more is queued (network healthy → drain remainder immediately). 15 s reserved for the actual failure case where every attempt this round failed. `flushOnce` returns `{ delivered, remaining } | null` so concurrent-flush early returns don't double-schedule.
2. `flushOnce` groups the outgoing queue by `recipientAddress` and drains buckets via `Promise.all`. Per-peer order preserved (sequential within a bucket); a slow POST to recipient A no longer head-of-line-blocks frames bound for B.

`Inbox.tick` public shape unchanged. `OutgoingQueueStore` implementations see the same per-entry list/remove/bumpAttempts/size contract; only cross-recipient interleaving changes.

Tests cover (1) 25-envelope burst behind a 100 ms slow PUT drains within 1 s, and (2) carol's PUT lands within 150 ms even when bob's PUT stalls 200 ms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
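
For readers without the source at hand, the two fixes described in this message can be sketched roughly as below. This is an illustration under stated assumptions, not the actual shade-inbox code: `QueuedEnvelope`, `FlushResult`, `nextFlushDelayMs`, `groupByRecipient`, `drainByRecipient`, and the `deliver` callback are hypothetical names. The sketch also assumes that an empty queue needs no reschedule, that a bucket stops at its first failed delivery to keep per-peer order, and that `deliver` resolves to `false` on failure instead of throwing.

```typescript
// Hypothetical shapes for illustration; not the real shade-inbox types.
interface QueuedEnvelope {
  id: number;
  recipientAddress: string;
}

interface FlushResult {
  delivered: number;
  remaining: number;
}

const FAILURE_BACKOFF_MS = 15_000;

// Decide the delay before the next flush. `null` means "do not schedule".
function nextFlushDelayMs(result: FlushResult | null): number | null {
  // A null result models the concurrent-flush early return: the flush
  // already in flight owns rescheduling, so scheduling here would double up.
  if (result === null) return null;
  // Assumption: nothing left in the queue means nothing to schedule.
  if (result.remaining === 0) return null;
  // Something was delivered, so the network looks healthy: drain at once.
  // Otherwise every attempt failed this round: back off.
  return result.delivered > 0 ? 0 : FAILURE_BACKOFF_MS;
}

// Group the queue by recipient, preserving enqueue order inside each bucket.
function groupByRecipient(queue: QueuedEnvelope[]): Map<string, QueuedEnvelope[]> {
  const buckets = new Map<string, QueuedEnvelope[]>();
  for (const entry of queue) {
    const bucket = buckets.get(entry.recipientAddress) ?? [];
    bucket.push(entry);
    buckets.set(entry.recipientAddress, bucket);
  }
  return buckets;
}

// Drain buckets concurrently, but each bucket sequentially.
async function drainByRecipient(
  queue: QueuedEnvelope[],
  deliver: (entry: QueuedEnvelope) => Promise<boolean>,
): Promise<FlushResult> {
  const buckets = [...groupByRecipient(queue).values()];
  const counts = await Promise.all(
    buckets.map(async (bucket) => {
      let delivered = 0;
      for (const entry of bucket) {
        // Assumption: stop at the first failure so later frames for the
        // same peer cannot overtake an earlier one.
        if (!(await deliver(entry))) break;
        delivered++;
      }
      return delivered;
    }),
  );
  const delivered = counts.reduce((a, b) => a + b, 0);
  // `remaining` here counts only the snapshot passed in; a real queue may
  // also have gained entries while the flush was in flight.
  return { delivered, remaining: queue.length - delivered };
}
```

With this shape, a slow `deliver` for one recipient delays only that recipient's bucket, and a round that delivered anything while leaving entries behind yields a 0 ms delay rather than the 15 s backoff.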
2026-05-08 22:56:27 +02:00
parent a98ea8a1bd
commit 3c0db14904
28 changed files with 334 additions and 59 deletions
--- a/packages/shade-inbox/tests/client.test.ts
+++ b/packages/shade-inbox/tests/client.test.ts
@@ -284,6 +284,141 @@ describe('Inbox orchestrator', () => {
    expect(dispatched).toEqual([msgId]);
  });

+  test('burst enqueued during a flush drains immediately, not after 15 s backoff (V4.8.5)', async () => {
+    // Reproduces Prism FR `per-recipient-flush-concurrency-v4.8`: a
+    // burst of envelopes enqueued *during* a slow POST used to sit
+    // ~15 s behind the next flush because both the success path and
+    // the failure path of `flushOnce` rescheduled with the same 15 s
+    // backoff. The fix uses 0 ms when the round delivered something
+    // (network is healthy — drain remainder) and reserves 15 s for
+    // the all-attempts-failed case.
+    const store = new MemoryInboxStore();
+    const app = createInboxServer({ crypto, store, disableRateLimit: true });
+    const bob = await makeIdentity();
+    const alice = await makeIdentity();
+    const bobClient = new InboxClient({
+      baseUrl: 'http://localhost',
+      crypto,
+      signingPrivateKey: bob.signingPrivateKey,
+      fetch: honoFetch(app),
+    });
+    await bobClient.register({ address: 'bob', signingKey: bob.signingPublicKey });
+
+    // Wrap fetch so the FIRST PUT (only) takes 100 ms — long enough
+    // for many enqueues to land while it's in flight.
+    let firstPutSeen = false;
+    const slowFirstFetch: typeof fetch = (async (input, init) => {
+      const u =
+        typeof input === 'string'
+          ? input
+          : input instanceof URL
+            ? input.toString()
+            : (input as Request).url;
+      const isPut = u.includes('/v1/inbox/bob') && !u.includes('/fetch');
+      if (isPut && !firstPutSeen) {
+        firstPutSeen = true;
+        await new Promise((r) => setTimeout(r, 100));
+      }
+      return honoFetch(app)(input, init);
+    }) as typeof fetch;
+
+    const aliceInbox = new Inbox({
+      baseUrl: 'http://localhost',
+      ownAddress: 'alice',
+      crypto,
+      signingPrivateKey: alice.signingPrivateKey,
+      signingPublicKey: alice.signingPublicKey,
+      pollIntervalMs: 0,
+      fetch: slowFirstFetch,
+    });
+
+    aliceInbox.start();
+
+    // First send — this kicks the slow-PUT path.
+    await aliceInbox.send({ recipientAddress: 'bob', envelope: randBytes(20) });
+
+    // Pile 24 more on top while the first PUT is still in flight. The
+    // first PUT will finish at ~T+100 ms; the subsequent 24 should
+    // drain immediately after, NOT after a 15 s backoff.
+    for (let i = 0; i < 24; i++) {
+      await aliceInbox.send({ recipientAddress: 'bob', envelope: randBytes(20) });
+    }
+
+    // Wait long enough for the slow first PUT + the immediate
+    // reschedule + the 24-envelope drain. Pre-fix this would still
+    // have ≥1 entry pending after 1 s (waiting for the 15 s timer).
+    await new Promise((r) => setTimeout(r, 1_000));
+    expect(await aliceInbox.pendingCount()).toBe(0);
+    aliceInbox.stop();
+  });
+
+  test('per-recipient parallel drain — slow POST to A does not block POSTs to B (V4.8.5)', async () => {
+    const store = new MemoryInboxStore();
+    const app = createInboxServer({ crypto, store, disableRateLimit: true });
+    const alice = await makeIdentity();
+    const bob = await makeIdentity();
+    const carol = await makeIdentity();
+    // Register bob + carol.
+    const reg = async (name: string, kp: { signingPrivateKey: Uint8Array; signingPublicKey: Uint8Array }) => {
+      const c = new InboxClient({
+        baseUrl: 'http://localhost',
+        crypto,
+        signingPrivateKey: kp.signingPrivateKey,
+        fetch: honoFetch(app),
+      });
+      await c.register({ address: name, signingKey: kp.signingPublicKey });
+    };
+    await reg('bob', bob);
+    await reg('carol', carol);
+
+    // bob's send route (matched as POST below) stalls 200 ms; carol's is
+    // instant. Pre-fix this would head-of-line block carol behind bob.
+    const slowedFetch: typeof fetch = (async (input, init) => {
+      const u =
+        typeof input === 'string'
+          ? input
+          : input instanceof URL
+            ? input.toString()
+            : (input as Request).url;
+      const m = (init as RequestInit | undefined)?.method ?? 'GET';
+      if (m === 'POST' && u.includes('/v1/inbox/bob') && !u.includes('/fetch')) {
+        await new Promise((r) => setTimeout(r, 200));
+      }
+      return honoFetch(app)(input, init);
+    }) as typeof fetch;
+
+    const aliceInbox = new Inbox({
+      baseUrl: 'http://localhost',
+      ownAddress: 'alice',
+      crypto,
+      signingPrivateKey: alice.signingPrivateKey,
+      signingPublicKey: alice.signingPublicKey,
+      pollIntervalMs: 0,
+      fetch: slowedFetch,
+    });
+
+    const carolDeliveredAt = new Promise<number>((resolve) => {
+      aliceInbox.on((e) => {
+        if (e.name === 'inbox.message_delivered' && e.data.recipientAddress === 'carol') {
+          resolve(Date.now());
+        }
+      });
+    });
+
+    const t0 = Date.now();
+    // Queue bob first, carol second. Pre-fix, carol would wait 200 ms
+    // behind bob's slow request. With per-recipient parallelism, carol's
+    // request runs concurrently and lands first.
+    await aliceInbox.send({ recipientAddress: 'bob', envelope: randBytes(20) });
+    await aliceInbox.send({ recipientAddress: 'carol', envelope: randBytes(20) });
+
+    aliceInbox.start();
+    const carolAt = await carolDeliveredAt;
+    const carolElapsed = carolAt - t0;
+    expect(carolElapsed).toBeLessThan(150);
+    aliceInbox.stop();
+  });
+
  test('flush retries on transient server failure', async () => {
    const store = new MemoryInboxStore();
    const app = createInboxServer({ crypto, store, disableRateLimit: true });